Crystal Structure of a Heptameric Sm-like Protein Complex from Archaea: Implications for the Structure and Evolution of snRNPs

Brett M. Collins1, Stephen J. Harrop2, Geoffrey D. Kornfeld3, Ian W. Dawes3, Paul M. G. Curmi2 and Bridget C. Mabbutt1*

1Department of Chemistry, Macquarie University, NSW 2109, Australia
2Initiative in Biomolecular Structure, School of Physics, University of New South Wales, Sydney, NSW 2052, Australia
3School of Biochemistry and Molecular Genetics, University of New South Wales, NSW 2052, Australia

*Corresponding author

J. Mol. Biol. (2001) 309, 915–923. doi:10.1006/jmbi.2001.4693, available online at http://www.idealibrary.com. 0022-2836/01/040915–9 $35.00/0 © 2001 Academic Press

Present address: Brett M. Collins, The Wellcome Trust Centre for Molecular Mechanisms in Disease, Addenbrooke's Hospital Site, Hills Road, Cambridge CB2 2XY, UK.

Abbreviations used: snRNPs, small nuclear ribonucleoproteins; snRNA, small nuclear RNA; snoRNP, small nucleolar ribonucleoprotein; Lsm, Sm-like; MtLsmα, Methanobacterium thermoautotrophicum Lsmα; GST, glutathione-S-transferase; NCS, non-crystallographic symmetry; rmsd, root-mean-square deviation; TRAP, trp RNA-binding attenuation protein.

E-mail address of the corresponding author: [email protected]

The Sm/Lsm proteins associate with small nuclear RNA to form the core of small nuclear ribonucleoproteins, required for processes as diverse as pre-mRNA splicing, mRNA degradation and telomere formation. The Lsm proteins from archaea are likely to represent the ancestral Sm/Lsm domain. Here, we present the crystal structure of the Lsmα protein from the thermophilic archaeon Methanobacterium thermoautotrophicum at 2.0 Å resolution. The Lsmα protein crystallizes as a heptameric ring comprised of seven identical subunits interacting via β-strand pairing and hydrophobic interactions. The heptamer can be viewed as a propeller-like structure in which each blade consists of a seven-stranded antiparallel β-sheet formed from neighbouring subunits. There are seven slots on the inner surface of the heptamer ring, each of which is lined by Asp, Asn and Arg residues that are highly conserved in the Sm/Lsm sequences. These conserved slots are likely to form the RNA-binding site. In archaea, the gene encoding Lsmα is located next to the L37e ribosomal protein gene in a putative operon, suggesting a role for the Lsmα complex in ribosome function or biogenesis.

Keywords: snRNP; spliceosome; Sm proteins; archaea; ribonucleoprotein

Introduction

The eukaryotic nucleoplasm contains many small nuclear ribonucleoproteins (snRNPs),1,2 stable complexes of small nuclear RNA (snRNA) bound to several specific proteins. The snRNP particles appear to be important for RNA processing events, and have been implicated in mRNA degradation,3,4 telomere synthesis,5 histone pre-mRNA processing6–8 and the addition of spliced leader RNA during trans-splicing in lower eukaryotes.1,9,10

The best-characterised of the snRNP complexes are those containing the spliceosomal U-snRNPs, U1, U2, U4/U6 and U5, required for the splicing of nuclear pre-mRNA.11–13 The core structure of these snRNPs contains seven different Sm proteins (SmB, SmD1, SmD2, SmD3, SmE, SmF and SmG) bound to the snRNA at a single-stranded uracil-rich sequence termed the Sm-binding site.14,15 Different seven-membered complexes of Sm-like (Lsm) proteins form the cores of snRNP particles involved in other RNA processing steps (e.g. the spliceosomal U6 snRNP and snRNPs involved in mRNA degradation).3,4,16–19 Members of the Sm/Lsm family of proteins additionally have roles in snRNP biogenesis and transport, as well as recruitment of other snRNP proteins.14,20–23

The Sm/Lsm proteins are characterized by a common domain of approximately 80 amino acid residues, consisting mainly of the conserved Sm sequence motif.24–26 The structures of two components of the human snRNP complex, consisting of the Sm dimers D1.D2 and D3.B, have recently
been determined by X-ray crystallography. These revealed the common fold for the Sm protein domain, as well as details of the protein-protein interactions governing assembly of oligomers of the Sm/Lsm proteins.27 The Sm fold consists of a tightly bent β-sheet structure with a short α-helix attached at the N terminus. In forming dimers, protein-protein interactions are mediated, in part, by β-strand backbone hydrogen bonding between the two Sm protein subunits, as well as hydrophobic interactions at the various dimeric interfaces. Extension of the observed β-strand dimer pairing by modelling predicted that seven Sm proteins can stack together in the core of the snRNP particle so as to form a heptameric ring. This model highlighted a positively charged inner surface to the ring, identified as a potential site for RNA binding.27 The circular arrangement of Sm proteins has now been reconstructed successfully within the 10 Å electron cryomicroscopy map of the human U1 snRNP, yielding significant detail of the organisation of RNA and protein components.28

At a functional level, the Sm/Lsm proteins have been characterized only in eukaryotes. However, following sequencing of several archaeal genomes, genes for several new Lsm proteins have been identified.18,19 Archaea appear to contain only one or two Lsm genes within each genome, in contrast to the large number (16 or more) found in eukaryotes. The archaeal Lsm genes can be divided into two types on the basis of sequence homology and genomic location: one highly conserved gene common to nearly all archaea (Lsmα), and a second gene type, which varies in sequence and is not present in all species (Lsmβ and Lsmγ).

We present here the crystal structure of the Lsmα protein from the archaeon Methanobacterium thermoautotrophicum (MtLsmα) at 2.0 Å resolution. This archaeal Lsm protein forms a heptameric ring complex in the crystal, providing direct experimental support for the model proposed for Sm protein interaction in eukaryotes. While the function of these proteins in archaea remains unclear, our structure reveals a highly conserved pocket on the inner surface of the homo-oligomeric complex that is a likely candidate for RNA binding. This suggests a common mode of action for all Sm/Lsm proteins.

Results

Isolation of an archaeal Lsm protein

Using all known yeast and human Sm/Lsm amino acid sequences, genomic searches were carried out to identify Sm/Lsm homologs in archaea. Homologous proteins were identified for the euryarchaeal species M. thermoautotrophicum (MT0649; Lsmα), Pyrococcus horikoshii (PHS042; Lsmα), Archaeoglobus fulgidus (AF0875; Lsmα, AF0362; Lsmγ) and Pyrococcus abyssi (PAB8160), extending the previously defined family.17–19 These sequences are aligned in Figure 1, together with the sequence of a new Lsm protein (APES022), found by searching the genome of the crenarchaeote Aeropyrum pernix. In order to determine the structure of the Sm/Lsm domain, the gene Lsmα was isolated from the thermophilic archaeon M. thermoautotrophicum, and inserted into an Escherichia coli expression plasmid for high-level production via a GST-fusion recombinant protein.

Figure 1. Sequence alignment of archaeal Lsmα proteins and human Sm proteins: Lsmα from M. thermoautotrophicum (Mthermo), Pyrococcus horikoshii (Phoriko), Pyrococcus abyssi (Pabyssi), Archaeoglobus fulgidus (Afulgid) and Aeropyrum pernix (Apernix) and human SmB, SmD3, SmD1 and SmD2. Sm sequence motif locations are shown,24–26 as is the secondary structure for MtLsmα derived from this study. The four sequences from the euryarchaeal species, M. thermoautotrophicum, A. fulgidus, P. horikoshii and P. abyssi are coloured green (75 % conserved), red (100 % conserved) and blue (100 % conserved in the four sequences, and greater than 90 % conserved over all archaeal, yeast and human species). The remaining sequences are coloured using this scheme.

Structure of the MtLsmα monomer

Within the heptameric structure, each MtLsmα monomer consists of five β-strands arranged in a strongly bent anti-parallel sheet from residue Ser24 onwards. N-terminal to this, residues Asp16 to Ser21 form an α-helix with residues Arg13 to Leu15 forming a short 3₁₀-helical extension (Figure 2(a)). The highly curved nature of the MtLsmα β-sheet allows the formation of a compact hydrophobic core. The resulting disruptions to regular β-sheet hydrogen bonding are essentially the same as those reported for the human Sm structures.27 Least-squares alignment of the backbone fold determined for MtLsmα with the family of previously defined human Sm structures (SmB, SmD1, SmD2 and SmD3) shows the close structural homology between these protein molecules. Over the fold defined by ~70 Cα atoms, the root-mean-square deviation (rmsd) does not exceed 1.2 Å for any pair-wise combination (Figure 2(b)). This compares closely to the rmsd of <0.8 Å between different subunits in the MtLsmα heptamer.

Figure 2. (a) Structure of the MtLsmα monomeric subunit. The side-chains of the conserved residues Asp44, Asn48, and Arg72 are shown, as is the hydrogen bonding between Asp44, Asn48 and Gly73. (b) Cα overlay of the MtLsmα monomer (magenta) with human Sm proteins SmB (cyan), SmD1 (blue), SmD2 (red) and SmD3 (orange). Structures overlay well in conserved β-sheet regions, but helix A and loop L4 show significant variations. The images were created using SETOR,51 with structures overlaid using the LSQMAN program implemented in O.49
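The pair-wise rmsd comparison of Cα traces described above reduces to a simple formula once two structures have been superposed. The following is a minimal sketch of that final step only, assuming the coordinates are already aligned and paired atom by atom; it is not the LSQMAN implementation, and the function name and toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in the units of the input, e.g. Å)
    between two equally long lists of (x, y, z) coordinates.
    Assumes the two sets are already superposed and paired in order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical toy "Cα traces": the second is the first shifted by 0.6 Å in y.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
trace_b = [(x, y + 0.6, z) for (x, y, z) in trace_a]
toy_rmsd = rmsd(trace_a, trace_b)
```

A real comparison would first find the rotation and translation that minimise this quantity (the least-squares step), which the sketch deliberately leaves out.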

The archaeal Lsmα sequence is essentially made up of the core Sm motifs, and has none of the extended N-terminal, C-terminal or loop sequences common to many members of the eukaryotic Sm/Lsm family.24–26 MtLsmα does contain a slightly longer N-terminal sequence relative to the other archaeal Lsmα proteins (Figure 1), which forms a short extended structure from residues Arg8 to Gln12 just prior to the α-helix. Density was not observed for amino acid residues N-terminal to Arg8 in any of the seven subunits, indicating disorder and/or mobility for this region. These N-terminal residues are not essential for complex formation, as a deletion mutant MtLsmα[10-81] can still organise as multimers in solution (unpublished results). At the C terminus, the β-sheet structure extends to incorporate the penultimate residue of the protein, Ser80.

Residues Glu57 to Arg65 of MtLsmα fall within a region of highly variable sequence for the Sm/Lsm family (Figure 1). This region includes loop L4 of the structure, as well as parts of the β3 and β4 strands, and shows significant variations in conformation amongst the different subunits in the oligomer. Consistent with this, subunits D-G have ill-defined density and high B-factors for these side-chains, indicating flexibility of this loop. When compared with the human Sm structures,27 this area shows a large degree of variation (Figure 2(b)), confirming this to be a common characteristic of the Sm/Lsm family.

As seen in the structures of the human Sm proteins, loops L3 and L5 of the MtLsmα fold are connected by a hydrogen bonding network that includes the highly conserved residues Asp44, Asn48 and Gly73 (Figure 2(a)). The side-chain of Asp44 is hydrogen bonded to the side-chain amide group of Asn48, while the side-chain carbonyl group of Asn48 forms a hydrogen bond with the backbone amide group of Gly73. This network is further stabilized by hydrogen bonding between the second oxygen atom of the Asp44 side-chain and main-chain amide groups of His46 and Asn48. Consistent with the hypothesis put forward by Kambach et al.,27 this region appears to be suitably structured for RNA interactions (see below).

Quaternary structure

MtLsmα forms an oligomer arranged as a ring structure of seven identical subunits, with a width of 30 Å and a diameter of 65 Å. The central hole has a diameter of approximately 10-15 Å (Figure 3). The individual adjacent monomers interact via pairing of the β4 and β5-strands, as well as hydrophobic interactions and other side-chain interactions, as summarized in Figure 4. The organisation is exactly analogous to that seen within the human Sm dimers, confirming the model that β-strand pairing between Sm/Lsm proteins will result in an extended β-sheet structure that ultimately forms a closed ring.27 Each MtLsmα subunit interacts with its adjoining neighbours by β-sheet extension, whereby its β4-strand interacts via hydrogen bonding with the β5-strand of a neighbouring subunit, at the same time as its own β5-strand hydrogen bonds to the β4-strand of a second subunit at the opposite interface. Overall, this β-strand pairing results in a left-handed propeller-like structure, in which three strands from chain n form an extended β-sheet with four strands of chain n + 1 (Figure 3(a) and (b)). The loops L2 (Lys31 and Gly32), L3 (Leu45 and His46) and L5 (Gly73, Asp74 and Asn75) of all seven subunits line the inner surface of the archaeal MtLsmα ring complex. For clarity, we refer to one face of the ring as the helix side, and to the other as the loop L4 side (Figure 3(c) and (d)).
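The ring dimensions quoted above can be turned into a rough geometric picture of 7-fold symmetry. The sketch below places seven subunit centres on a circle and checks that seven successive rotations by 360/7 degrees return to the starting point. The hole diameter of 12.5 Å (midpoint of the reported 10-15 Å range), the choice of a radius midway between hole and rim, and all function names are assumptions made for illustration, not values or methods taken from the structure determination.

```python
import math

N_SUBUNITS = 7          # heptamer, from the text
OUTER_DIAMETER = 65.0   # Å, from the text
HOLE_DIAMETER = 12.5    # Å, assumed midpoint of the reported 10-15 Å range

def ring_centres(n=N_SUBUNITS, outer_d=OUTER_DIAMETER, hole_d=HOLE_DIAMETER):
    """Place n subunit centres evenly on a circle lying midway
    between the edge of the central hole and the outer rim (an assumption)."""
    radius = (outer_d + hole_d) / 4.0
    step = 2.0 * math.pi / n
    return [(radius * math.cos(k * step), radius * math.sin(k * step))
            for k in range(n)]

def rotate(point, angle_rad):
    """Rotate a 2-D point about the ring axis (the origin)."""
    x, y = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y)

step_deg = 360.0 / N_SUBUNITS          # rotation relating neighbouring subunits
centres = ring_centres()

# Applying the 7-fold rotation seven times should map a subunit onto itself.
p = centres[0]
for _ in range(N_SUBUNITS):
    p = rotate(p, math.radians(step_deg))
closure_error = math.dist(p, centres[0])

# Centre-to-centre spacing of neighbouring subunits under these assumptions.
neighbour_spacing = math.dist(centres[0], centres[1])
```

The same rotation by about 51.4° is what a self-rotation function would be expected to pick up for a heptameric ring.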

A simple understanding of the subunit interactions is that each monomer must associate with another so as to shield its hydrophobic core. The curved nature of the β-sheet of an individual monomer creates a hydrophobic core that is exposed on both sides to solvent (Figure 2). Following β-strand pairing with another MtLsmα protein, both subunits are positioned so that one end of each hydrophobic core is capped; the formation of a ring results in enclosure of the hydrophobic cores of all component monomers.

Figure 3. Overall structure of the MtLsmα heptamer and potential RNA binding sites. (a) Stereo ribbon diagram of the MtLsmα complex, including conserved Asp44, Asn48 and Arg72 side-chains. The helix side of the ring is facing upwards. (b) Side-view of (a). (c) Stereo image of the van der Waals surface (excluding hydrogen atoms) of the MtLsmα complex, oriented as in (a). Conserved hydrophobic residues (>40 % over all archaeal, yeast and human Sm/Lsm sequences) are coloured blue, conserved polar and charged residues (>40 %) are magenta and highly conserved (>90 %) residues are red. All remaining residues are yellow. (d) van der Waals surface showing the opposite loop L4 face of the ring. Figure 3(a) and (b) were produced with SETOR.51

To discuss subunit interactions, we will adopt the terminology introduced by Kambach et al.,27 and define chain A as the β4-neighbour of chain B, and chain B as the β5-neighbour of A. At the chain A/chain B interface, the complex is stabilized by hydrophobic contacts, hydrophilic interactions and a salt-bridge between Glu35 (A) and Arg65 (B) (Figure 4). This latter salt-bridge is almost identical in position (but involves different residues) to that between Glu21 and Arg65 of the human D3.B dimer. The hydrophobic pocket (consisting of Ile27, Val77 and Tyr78 of chain A, and Leu30, Phe36, Leu66, Val69 and Ile71 of chain B) caps one end of the B subunit. On the other side of the β-sheet, a second hydrophobic pocket is formed between side-chains of the amphipathic α-helix and hydrophobic core of chain A, with Val50 and Leu70 of chain B (Figure 4). In this way, the hydrophobic core at one end of chain A is also enclosed.

Figure 4. Stereo images of the subunit interface between two MtLsmα monomers. The β5 strand of chain A (yellow) is hydrogen bonded to the β4 strand of chain B (magenta), forming an extended β-sheet. The salt-bridge between Glu35 of A and Arg64 of B is also shown. The hydrophobic core of B is capped by interaction with A, and hydrophobic interactions on the opposite side of the β-sheet, between residues from helix A of chain A and the surface of the β-sheet of chain B, also contribute to the stability of the interface. The Figure was produced with SETOR.51

Potential sites for RNA binding

In eukaryotes, the Sm/Lsm proteins bind single-stranded RNA. The spliceosomal Sm proteins bind to the consensus sequence PuAU₄₋₆GPu within the snRNA.14,15 They have also been shown to bind specifically to a synthetic Sm-site nonanucleotide, AAUUUUUGA, although the kinetics of protein/RNA association differ from those measured for the intact snRNA.29 It is highly likely that the archaeal Lsmα complex will also bind to single-stranded RNA (or possibly DNA) in vivo.
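The Sm-site consensus above (a purine, then A, then four to six U residues, then G, then a purine) maps directly onto a regular expression. The sketch below is one way to scan an RNA string for such sites; the function name and the second test sequence are hypothetical, and the pattern encodes only the consensus as written here, not any experimentally derived binding rule.

```python
import re

# Pu = purine (A or G); U4-6 = four to six consecutive uridines.
SM_SITE = re.compile(r"[AG]AU{4,6}G[AG]")

def find_sm_sites(rna):
    """Return (start_index, matched_sequence) for each non-overlapping
    match of the PuAU(4-6)GPu consensus in an RNA string."""
    return [(m.start(), m.group()) for m in SM_SITE.finditer(rna.upper())]

# The synthetic nonanucleotide mentioned in the text fits the consensus.
nonamer_hits = find_sm_sites("AAUUUUUGA")

# Hypothetical sequence with only three uridines: should not match.
short_u_hits = find_sm_sites("GGCAAUUUGACC")
```

Because `finditer` reports non-overlapping matches only, overlapping candidate sites in a long uridine-rich stretch would need a lookahead-based pattern instead.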

Several recent structures of protein/RNA complexes have revealed common determinants of protein/RNA recognition. These include base stacking with aromatic protein side-chains, electrostatic interactions between Arg/Lys side-chains and the phosphate backbone, hydrophobic interactions, and hydrogen bond associations between the RNA and Arg, Asp, Glu, Gln and Asn protein side-chains that generally govern the specificity of binding.30–34 Kambach et al. have previously proposed that the conserved Asp, Asn, Gly and Arg residues of the human Sm proteins are important for RNA binding, and suggested that a region of positive charge on the inner surface of the human Sm ring model further implicates this area as the RNA-binding site.27

We are particularly interested in the structural location of residues that are highly conserved in the Sm/Lsm protein family, from archaeal through to yeast and human sequences. The residues that are most highly conserved across the family (>90 %) are Gly38, Asp44, Asn48 (100 %), Leu51, Arg72 and Gly73 (Figure 1). Leu51 forms part of the hydrophobic core of the MtLsmα monomer, and Gly38 is a structural component of a conserved β-bulge on the external edge of the ring. The remaining conserved residues, however, form a highly ordered hydrogen bonding network and are positioned on the inside surface of the ring (Figures 2(a) and 3(a), see above).
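Conservation percentages of the kind quoted above are usually read off column by column from a multiple sequence alignment. The sketch below shows one simple way to compute them: for each column, the most common residue and the percentage of sequences carrying it. The four short aligned fragments are invented for illustration and are not taken from Figure 1; ignoring gap characters in the denominator is a design choice of this sketch, not a rule stated in the paper.

```python
from collections import Counter

def column_conservation(column):
    """Return (most_common_residue, percent_of_non_gap_sequences)
    for one alignment column given as an iterable of one-letter codes."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, 100.0 * count / len(residues)

# Hypothetical aligned fragments (NOT the sequences of Figure 1).
toy_alignment = [
    "GDLN",
    "GDMN",
    "GDLN",
    "GEVN",
]

# zip(*rows) transposes the alignment into columns.
conservation = [column_conservation(col) for col in zip(*toy_alignment)]
```

With a real alignment, columns scoring above a chosen threshold (for example 90 %) could then be mapped onto the structure to look for spatial clusters.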

These residues are spatially clustered in a fashion that strongly suggests they play a functional role in the Sm/Lsm complex. The three loops, L2, L3 and L5 from any given subunit are aligned with each other, parallel with the 7-fold rotation axis (Figure 3(a)). A cleft is formed between the loops of neighbouring subunits, resulting in seven slots on the helix side of the ring that lead into the central hole. These clefts are lined by the strictly conserved hydrogen bonded residues Asp44, Asn48, Gly73 and the neighbouring Arg72 (Figure 3), as well as the highly conserved residues Asp74 and Asn75. In contrast, the loop L4 side of the ring shows no clusters of strictly conserved residues (Figure 3(d)).

We propose that these clefts lining the inner surface are appropriately structured to bind individual bases of a single-stranded nucleic acid. The amino acid side-chains present at this site are capable of hydrogen bonding to RNA bases, and the distance between each slot appears to be highly suitable for accommodating a single-stranded RNA (or DNA) coiled around the inside surface of the ring. Thus, our proposed model for RNA binding somewhat resembles that seen in the RNA complex of the trp RNA-binding attenuation protein (TRAP),35 except that the snRNA contacts the inner surface of the Sm ring (cf. the outer surface in TRAP). TRAP also contains some of the features of protein architecture of the Sm ring, with subunits interacting via β-strand hydrogen bonding. Our model for RNA binding is further supported by the fact that the conserved polar side-chains that line the slots do not appear to be essential for ring formation. This is consistent with mutagenesis and two-hybrid studies, which show that the equivalents of Asn48, Arg72, Gly73 and Asn75 in yeast SmE do not affect its ability to interact with other Sm proteins.36

Our model is also consistent with the electron microscopy reconstruction of the human U1 snRNP recently described by Stark et al.28 This shows the uracil-rich Sm motif making contact with several of the Sm proteins in the heptameric ring as the snRNA spirals around within its central cavity, traversing the ring from back to front (in the orientation depicted in Figure 3(a)). In the human snRNP, the neighbouring binding clefts of SmF, SmE and SmG are likely to be involved in this specific protein/RNA interaction.

As expected, analysis of the location of conserved hydrophobic residues shows that the majority form either the hydrophobic core of individual subunits, or occur at subunit interfaces. Interestingly, there is an additional region of conserved hydrophobicity in the MtLsmα complex, on the surface of the ring on the helix face, the same face as the putative RNA-binding surface (Figure 3(c)). This is formed by the interaction of Leu18 of the N-terminal helix with the Phe43 side-chain in strand β2 of the same subunit. This region might also be important for the function of the Lsmα complex, possibly involving other protein-protein interactions.

Discussion

The role of Lsmα in archaea

It is likely that the major function of the archaeal Lsm complex will involve RNA binding. In eukaryotes, without exception, proteins containing the Sm domain have been found to bind RNA in snRNP complexes. These complexes have various functions, including pre-mRNA splicing, mRNA degradation, histone pre-mRNA processing, telomere processing and trans-splicing. Archaea do not perform any sort of spliceosome-mediated RNA splicing, and without a nuclear membrane there is no requirement for RNA transport. Although tRNA and rRNA splicing does occur, these processes involve simple enzymes.37–39 At present, it remains difficult to positively identify a role for archaeal Lsmα, or its potential RNA-binding partners. Possible stable RNA partners might include ribosomal and transfer RNA species, the ribonuclease P RNA subunit or the 7S RNA component of the signal-recognition particle complex. Even with five archaeal genomic sequences now completed, no putative snRNA-like sequences have yet been identified. More sophisticated sequence searches, as recently adopted in the search for small nucleolar RNA (snoRNA)-like genes in Pyrococcus species,40 may change this.

We note that in five of the six archaeal genomes sequenced to date, including M. thermoautotrophicum, the Lsmα gene is located directly upstream of the putative ribosomal protein gene L37e (Figure 5). Examination of these genomic sequences for potential transcription and translation start sites suggests that Lsmα and L37e are present in a conserved operon. We have used this observed genomic structure to identify for the first time the gene APES022 from the crenarchaeote A. pernix as an Lsmα orthologue (Figures 1 and 5). This is despite a relatively high degree of sequence divergence from the previously identified euryarchaeal Lsmα genes.18,19

Figure 5. Genetic organization of the Lsmα and ribosomal protein L37e genes in several species of archaea; euryarchaeotes P. horikoshii, P. abyssi, A. fulgidus, M. thermoautotrophicum and the crenarchaeote A. pernix. The genes for Lsmα and L37e appear to be located in a conserved putative operon. GenBank accessions: P. horikoshii, AP000006, P. abyssi, AJ248285, A. fulgidus, AE001044, M. thermoautotrophicum AE000666, A. pernix, AP000059.

It is also important to recognize that genes for ribosomal proteins and related factors are commonly found in gene clusters in archaea and bacteria.41 If we assume that the genes for Lsmα and L37e are co-regulated and transcribed, then it is likely that the archaeal Lsmα complex plays a role in ribosomal function or biogenesis. This would mirror the example of the operon structure of genes for the ribosomal methylation guide snoRNP components Nop1/fibrillarin and Nop5/Nop58.42,43 In support of this, we note that levels of yeast pre-5S ribosomal RNA are reduced in strains with deletions of Lsm genes.17

Evolution of the Sm domain

The functions of Sm/Lsm proteins in eukaryotes appear to rely on heterologous interactions between different Sm/Lsm proteins. In eukaryotes, there are at least 16 different genes coding for separate Sm/Lsm proteins. In marked contrast, archaeal genomes appear to code for one, or at most two Lsm proteins. In the case of Methanococcus jannaschii no Lsm genes are evident at all, even when the region surrounding the M. jannaschii L37e gene is examined carefully. This study provides conclusive evidence that stable interactions are possible between identical Sm/Lsm proteins, and these interactions result in formation of a heptameric ring complex with strong similarity to the eukaryotic snRNP cores. It is likely that archaeal Lsmα represents the most ancient of the Sm/Lsm proteins and might therefore provide important clues to the evolution of the Sm domain and the spliceosomal machinery.

The exact evolutionary relationship between the archaea and eukarya is still under debate, but the most generally accepted view is that the two domains have evolved from a common ancestor that diverged from the bacterial domain.44 If this is the case, then it is possible that the Sm domain was present in the common ancestor and was inherited by both the archaeal and eukaryotic domains after the splitting of their lineages. Another possibility is that the Sm/Lsm genes were passed between domains by lateral gene transfer.

It has been proposed that the evolution of the spliceosomal machinery from group II introns involved ancestral Sm/Lsm proteins acting as positively charged RNA chaperones to facilitate interactions between trans-acting RNA species.27 The ease with which the Lsmα protein forms a homo-oligomer suggests it as a likely candidate for the ancestral Sm/Lsm RNA chaperone. The gene may have been co-inherited by archaea and eukaryotes or borrowed by eukaryotes via lateral gene transfer, eventually co-evolving with group II introns and other RNA species to form the modern spliceosome and related snRNPs.

Conclusion

Here, we report the crystal structure of an archaeal Lsm complex comprised of seven identical Sm/Lsm domains and show how this structure, via conserved Asp, Asn and Arg residues, might direct the binding of single-stranded RNA or DNA. The organization of the genes for Lsmα and L37e suggests a possible role for Lsmα in ribosomal function or biogenesis and has allowed us to define a new Lsmα ortholog from the crenarchaeote, A. pernix. Our results provide strong experimental evidence for the heptameric model of Sm/Lsm protein association in eukaryotes, and have important implications for the mechanism of complex formation and RNA binding in snRNPs.

Materials and Methods

Cloning, expression and purification of MtLsmα

The gene encoding Lsmα from M. thermoautotrophicum (MtLsmα) was isolated by PCR from genomic DNA using primers that incorporated BamHI (5′) and SmaI/XmaI (3′) restriction sites. The gene was inserted into the pGEX-4T-2 expression vector (Amersham Pharmacia Biotech) and co-transformed into the E. coli strain BL21 with the pRI952 rare codon plasmid.45 The pGEX plasmid coded for expression of MtLsmα with an N-terminal glutathione-S-transferase (GST) fusion. The expression strain was grown at 37 °C in Luria broth containing ampicillin (50 μg ml⁻¹) and chloramphenicol (25 μg ml⁻¹). Protein expression was induced by addition of 0.1 mM IPTG when the culture reached an A595 of 0.5-0.6 cm⁻¹ and the cells were harvested by centrifugation (1500 g) after a further four to five hours of growth. Cells were suspended in buffer A (20 mM Tris, 50 mM NaH2PO4, 100 mM NaCl, pH 8.0) and lysed using a French press in the presence of 0.5 % (w/v) Tween 20 (Calbiochem), 10 μg ml⁻¹ ribonuclease A and protease inhibitor cocktail (Sigma). Lysate from 1 l of culture was incubated with 3 ml of glutathione-Sepharose 4B (Amersham Pharmacia Biotech) and unbound proteins removed after pouring the slurry into a gravity column. The column was washed thoroughly with buffer A and bound fusion protein cleaved with thrombin (12 hours, 22 °C). Cleaved MtLsmα was eluted in buffer A and further purified by size-exclusion chromatography (Superose 12) into 10 mM Tris, 200 mM NaCl, pH 8.0. Purified MtLsmα was concentrated (ca 14 mg ml⁻¹) and stored at −80 °C after flash-freezing in liquid nitrogen.

Protein crystallization

Crystallization conditions were screened using the Hampton Research Crystal Screen kit by sitting drop vapour diffusion at 21 °C. Large monoclinic crystals could be grown in magnesium formate, but diffraction did not extend beyond 3.0 Å. Other conditions were optimized for crystal growth with the best crystals growing in 0.1 M Tris, 16 % (v/v) PEG3350, 0.25 M lithium sulfate at pH 8.5. These primitive monoclinic crystals grew to maximum dimensions of 0.3 mm × 0.35 mm × 0.25 mm within several days, diffracted to 2.0 Å resolution and belong to space group P21 (a = 40.4 Å, b = 71.9 Å, c = 94.7 Å and β = 93.7°).
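As a rough plausibility check on the reported cell, the volume of a monoclinic cell follows from V = a·b·c·sin β, and a Matthews coefficient can be estimated from it. The sketch below is illustrative only. It assumes one heptamer per asymmetric unit (consistent with the seven-chain model described later) and two asymmetric units per P21 cell. The subunit mass of 9 kDa is a hypothetical round number that is not taken from the text, and the solvent estimate uses the common 1.23/V_M approximation.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume in cubic angstroms of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, z, mass_per_asu_da):
    """Matthews coefficient V_M in cubic angstroms per dalton: V / (Z * M)."""
    return cell_volume / (z * mass_per_asu_da)


# Cell parameters as reported for the P21 crystal form.
CELL = {"a": 40.4, "b": 71.9, "c": 94.7, "beta_deg": 93.7}

Z_P21 = 2                  # general positions in space group P21
CHAINS_PER_ASU = 7         # assumption: one heptamer per asymmetric unit
SUBUNIT_MASS_DA = 9000.0   # ASSUMPTION: hypothetical subunit mass, not from the text

volume = monoclinic_cell_volume(**CELL)
vm = matthews_coefficient(volume, Z_P21, CHAINS_PER_ASU * SUBUNIT_MASS_DA)
# Approximate solvent fraction from the standard Matthews relation.
solvent_fraction = 1.0 - 1.23 / vm

print(f"cell volume      : {volume:,.0f} A^3")
print(f"Matthews V_M     : {vm:.2f} A^3/Da")
print(f"solvent fraction : {solvent_fraction:.2f}")
```

With a different subunit mass only `SUBUNIT_MASS_DA` needs changing; the cell volume itself depends only on the reported cell parameters.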

Data collection and structure determination

Diffraction data were collected on a Macscience DIP2030 imaging plate detector mounted on a Nonius Cu rotating anode generator with focusing mirrors. Crystals were flash-frozen in a stream of nitrogen at 100 K after soaking in a cryoprotection buffer consisting of mother liquor with 30 % (w/v) glucose. Data from a single crystal were processed using the programs DENZO and SCALEPACK.46 The self-rotation function of the MtLsma crystal was calculated using the POLARRFN program in the CCP4 suite.47 This unambiguously revealed a 7-fold axis of symmetry for the MtLsma complex, indicating that MtLsma is a heptameric ring.
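A 7-fold axis appears in a self-rotation function as a peak at a rotation angle of 360°/7 (about 51.4°) and its multiples. The following sketch is plain Python and is not POLARRFN. It builds the seven rotation operators of such an axis, placed along z purely for convenience, and checks that applying the generating rotation seven times returns the identity. The ring radius used to place subunit centres is a hypothetical value chosen for illustration.

```python
import math


def rotation_about_z(angle_deg):
    """3x3 matrix for a rotation by angle_deg about the z axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(p, q):
    """Product of two 3x3 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(r, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


N_FOLD = 7
KAPPA = 360.0 / N_FOLD  # rotation angle of the generating operator, in degrees

# The seven operators of the 7-fold axis (k = 0 is the identity).
ncs_operators = [rotation_about_z(k * KAPPA) for k in range(N_FOLD)]

# Applying the generator N_FOLD times must give back the identity.
product = rotation_about_z(0.0)
for _ in range(N_FOLD):
    product = matmul(rotation_about_z(KAPPA), product)
identity = rotation_about_z(0.0)
max_dev = max(abs(product[i][j] - identity[i][j])
              for i in range(3) for j in range(3))

RING_RADIUS = 20.0  # ASSUMPTION: hypothetical radius in angstroms, not from the text
centres = [apply(r, [RING_RADIUS, 0.0, 0.0]) for r in ncs_operators]
neighbour_distance = math.dist(centres[0], centres[1])

print(f"kappa = {KAPPA:.2f} degrees, deviation from identity = {max_dev:.1e}")
```

The same operators are what 7-fold NCS averaging relies on: density at each point is averaged with density at its six symmetry-related positions.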

Initial phases were determined using a heptameric model constructed from human SmD2 (1B34), essentially as described by Kambach et al.27 This initial model consisted of seven SmD2 molecules arranged in a ring as poly-serine chains. Using molecular replacement (as implemented in CNS),48 clear solutions were obtained for the cross-rotation and translation functions. CCP4 programs were used to calculate a preliminary map using 7-fold non-crystallographic symmetry (NCS) averaging and density modification. The map showed clearly interpretable electron density for main-chain and side-chain atoms, including density for N-terminal and loop sequences present in MtLsma but not in the original phasing model. This map was used for the first round of model rebuilding in the program O.49 The molecular model was subsequently improved with several rounds of rigid-body, simulated annealing and group B-factor refinement (CNS) and manual rebuilding in O. NCS restraints were maintained throughout refinement. Releasing these restraints did not yield significantly better R-factors, and in some circumstances made them worse. Unambiguous density was observed for all residues except those at the N terminus. Clear density was seen for loop residues Glu58 to Glu61 for chains A, B and C, while the other subunits showed poor side-chain density, but clearly interpretable main-chain density for these residues. This allowed positioning of the loop in these subunits. The final model consists of seven chains (A-G) with residues A8-A81, B8-B81, C8-C81, D9-D81, E9-E81, F12-F81 and G9-G81. The average B-factors for each subunit varied significantly around the ring (A = 34 Å2, B = 26 Å2, C = 27 Å2, D = 44 Å2, E = 51 Å2, F = 57 Å2 and G = 48 Å2 calculated for backbone atoms) and this appears to be due to crystal packing. The model contains 210 water molecules arranged primarily around the A, B and C subunits. The final model has an R-factor of 0.22, an Rfree of 0.26 and shows good stereochemical properties. Table 1 shows the statistics for data collection and structure refinement.

Table 1. Crystal data collection and refinement statistics

Number of crystals                            1
Number of measured reflections                119,267
Number of unique reflections                  35,595
Resolution range (Å)                          15.0-2.0
Completeness of data (2.07-2.00 Å shell)      98.2 (85.6)
I/σ (2.07-2.00 Å shell)                       20.7 (2.8)
Rmerge (2.07-2.00 Å shell)                    0.049 (0.307)
Number of protein atoms                       4022
Number of water molecules                     155
Crystallographic R-factor                     0.218
Rfree (a)                                     0.257
rmsd bond lengths (Å) (b)                     0.005
rmsd bond angles (deg.) (b)                   1.3
Ramachandran plot (c)
  Most favored region (%)                     84.8
  Additionally allowed (%)                    15.2

a Rfree was calculated on a test set consisting of 7.5 % of the data.
b From CNS.48
c From PROCHECK.50
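Several of the figures quoted above and in Table 1 can be cross-checked with simple arithmetic. The sketch below derives the average multiplicity of the data set, the approximate size of the Rfree test set, and the mean backbone B-factor for the two groups of subunits. It assumes that "7.5 % of the data" refers to 7.5 % of the unique reflections, and the split into an A-C group and a D-G group is an illustrative choice that follows the text's observation that the A, B and C subunits are better ordered.

```python
# Values copied from Table 1 and the refinement paragraph.
MEASURED_REFLECTIONS = 119_267
UNIQUE_REFLECTIONS = 35_595
TEST_SET_FRACTION = 0.075  # assumption: fraction of the unique reflections

# Average backbone B-factor per chain, in square angstroms.
BACKBONE_B = {"A": 34, "B": 26, "C": 27, "D": 44, "E": 51, "F": 57, "G": 48}


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average number of times each unique reflection was measured.
multiplicity = MEASURED_REFLECTIONS / UNIQUE_REFLECTIONS

# Approximate number of reflections set aside for Rfree.
test_set_size = round(TEST_SET_FRACTION * UNIQUE_REFLECTIONS)

mean_b_all = mean(BACKBONE_B.values())
mean_b_abc = mean(BACKBONE_B[c] for c in "ABC")    # better-ordered subunits
mean_b_defg = mean(BACKBONE_B[c] for c in "DEFG")  # remaining subunits

print(f"multiplicity        : {multiplicity:.2f}")
print(f"Rfree test set size : {test_set_size}")
print(f"mean B (all chains) : {mean_b_all:.1f}")
print(f"mean B (A-C, D-G)   : {mean_b_abc:.1f}, {mean_b_defg:.1f}")
```

The gap between the two group means puts a number on the variation around the ring that the text attributes to crystal packing.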

Protein Data Bank accession numbers

The coordinates and X-ray amplitudes and phases have been deposited with the Protein Data Bank (accession code 1I81).

Acknowledgments

We thank Rick Cavicchioli, Julie Lim and Torsten Thomas for the kind gifts of M. thermoautotrophicum DNA and the plasmid pRI952. We acknowledge the lively discussions and ongoing assistance from Liza Cubeddu. This work was supported by the Australian Research Council and Macquarie University research funds.

References

1. Burge, C. B., Tuschl, T. & Sharp, P. A. (1999). Splicing of precursors to mRNAs by the spliceosomes. In The RNA World (Gesteland, R. F., Cech, T. R. & Atkins, J. F., eds), pp. 525-560, Cold Spring Harbor Press, Cold Spring Harbor, NY.

2. Zieve, G. W. & Sauterer, R. A. (1990). Cell biology of the snRNP particles. Crit. Rev. Biochem. Mol. Biol. 25, 1-46.

3. Tharun, S., He, W., Mayes, A. E., Lennertz, P., Beggs, J. D. & Parker, R. (2000). Yeast Sm-like proteins function in mRNA decapping and decay. Nature, 404, 515-518.

4. Bouveret, E., Rigaut, G., Shevchenko, A., Wilm, M. & Séraphin, B. (2000). A Sm-like protein complex that participates in mRNA degradation. EMBO J. 19, 1661-1671.

5. Seto, A. G., Zaug, A. J., Sobel, S. G., Wolin, S. L. & Cech, T. R. (1999). Saccharomyces cerevisiae telomerase is an Sm small nuclear ribonucleoprotein particle. Nature, 401, 177-180.

6. Strub, K., Galli, G., Busslinger, M. & Birnstiel, M. L. (1984). The cDNA sequences of the sea urchin U7 small nuclear RNA suggest specific contacts between histone mRNA precursor and U7 RNA during RNA processing. EMBO J. 3, 2801-2807.

7. Schaufele, F., Gilmartin, G. M., Bannwarth, W. & Birnstiel, M. L. (1986). Compensatory mutations suggest that base-pairing with a small nuclear RNA is required to form the 3′ end of messenger-RNA. Nature, 323, 777-781.

8. Galli, G., Hofstetter, H., Stunnenberg, H. G. & Birnstiel, M. L. (1983). Biochemical complementation with RNA in the Xenopus oocyte: a small RNA is required for the generation of 3′ histone mRNA termini. Cell, 34, 823-828.

9. Blumenthal, T. (1995). Trans-splicing and polycistronic transcription in Caenorhabditis elegans. Trends Genet. 11, 132-136.

10. Nilsen, T. W. (1995). Trans-splicing: an update. Mol. Biochem. Parasitol. 73, 1-6.

11. Moore, M. J., Query, C. C. & Sharp, P. A. (1993). Splicing of precursors to mRNA by the spliceosome. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 303-357, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

12. Newman, A. (1998). RNA splicing. Curr. Biol. 8, R903-R905.

13. Staley, J. P. & Guthrie, C. (1998). Mechanical devices of the spliceosome: motors, clocks, springs and things. Cell, 92, 315-326.

14. Mattaj, I. W. & De Robertis, E. M. (1985). Nuclear segregation of U2 snRNP requires binding of specific snRNP proteins. Cell, 40, 111-118.

15. Liautard, J. P., Sri-Widada, J., Brunel, C. & Jeanteur, P. (1982). Structural organization of ribonucleoproteins containing small nuclear RNAs from HeLa cells. Proteins interact closely with a similar structural domain of U1, U2, U4 and U5 small nuclear RNAs. J. Mol. Biol. 162, 623-643.

16. Vidal, V. P. I., Verdone, L., Mayes, A. E. & Beggs, J. D. (1999). Characterization of U6 snRNA-protein interactions. RNA, 5, 1470-1481.

17. Mayes, A. E., Verdone, L., Legrain, P. & Beggs, J. D. (1999). Characterisation of Sm-like proteins in yeast and their association with U6 snRNA. EMBO J. 18, 4321-4331.

18. Salgado-Garrido, J., Bragado-Nilsson, E., Kandels-Lewis, S. & Séraphin, B. (1999). Sm and Sm-like proteins assemble in two related complexes of deep evolutionary origin. EMBO J. 18, 3451-3462.

19. Achsel, T., Brahms, H., Kastner, B., Bachi, A., Wilm, M. & Lührmann, R. (1999). A doughnut-shaped heteromer of human Sm-like proteins binds to the 3′-end of U6 snRNA, thereby facilitating U4/U6 duplex formation in vitro. EMBO J. 18, 5789-5802.

20. Fischer, U., Sumpter, V., Sekine, M., Satoh, T. & Lührmann, R. (1993). Nucleocytoplasmic transport of U snRNPs: definition of a nuclear location signal in the Sm core domain that binds a transport receptor independently of the m3G cap. EMBO J. 12, 573-583.

21. Mattaj, I. W. (1986). Cap trimethylation of U snRNA is cytoplasmic and dependent on U snRNP protein binding. Cell, 46, 905-911.

22. Malatesta, M., Fakan, S. & Fischer, U. (1999). The Sm core domain mediates targeting of the U1 snRNP to subnuclear compartments involved in transcription and splicing. Expt. Cell Res. 249, 189-198.

23. Nelissen, R. L. H., Will, C. L., van Venrooij, W. J. & Lührmann, R. (1994). The association of the U1-specific 70 K and C proteins with U1 snRNPs is mediated in part by the common U snRNP proteins. EMBO J. 13, 4113-4125.

24. Séraphin, B. (1995). Sm and Sm-like proteins belong to a large family: identification of proteins of the U6 as well as the U1, U2, U4 and U5 snRNPs. EMBO J. 14, 2089-2098.

25. Hermann, H., Fabrizio, P., Raker, V. A., Foulaki, K., Hornig, H., Brahms, H. & Lührmann, R. (1995). snRNP Sm proteins share two evolutionarily conserved sequence motifs which are involved in Sm protein-protein interactions. EMBO J. 14, 2076-2088.

26. Cooper, M., Johnston, L. H. & Beggs, J. D. (1995). Identification and characterisation of Uss1p (Sdb23p): a novel U6 snRNA-associated protein with significant similarity to core proteins of small nuclear ribonucleoproteins. EMBO J. 14, 2066-2075.

27. Kambach, C., Walke, S., Young, R., Avis, J. M., de la Fortelle, E., Raker, V. A., Lührmann, R., Li, J. & Nagai, K. (1999). Crystal structures of two Sm protein complexes and their implications for the assembly of the spliceosomal snRNPs. Cell, 96, 375-387.

28. Stark, H., Dube, P., Lührmann, R. & Kastner, B. (2001). Arrangement of RNA and proteins in the spliceosomal U1 small nuclear ribonucleoprotein particle. Nature, 409, 539-542.

29. Raker, V. A., Hartmuth, K., Kastner, B. & Lührmann, R. (1999). Spliceosomal U snRNP core assembly: Sm proteins assemble onto an Sm site nonanucleotide in a specific and thermodynamically stable manner. Mol. Cell. Biol. 19, 6554-6565.

30. Nagai, K. (1996). RNA-protein complexes. Curr. Opin. Struct. Biol. 6, 53-61.

31. Varani, G. & Nagai, K. (1998). RNA recognition by RNP proteins during RNA processing. Annu. Rev. Biophys. Biomol. Struct. 27, 407-445.

32. Muto, Y., Oubridge, C. & Nagai, K. (2000). RNA-binding proteins: TRAPping RNA bases. Curr. Biol. 10, 19-21.

33. De Guzman, R. N., Turner, R. B. & Summers, M. F. (1998). Protein-RNA recognition. Biopolymers, 48, 181-195.

34. Antson, A. A. (2000). Single stranded RNA binding proteins. Curr. Opin. Struct. Biol. 10, 87-94.

35. Antson, A. A., Dodson, E. J., Dodson, G., Greaves, R. B., Chen, X.-P. & Gollnick, P. (1999). Structure of the trp RNA binding attenuation protein, TRAP, bound to RNA. Nature, 401, 235-242.

36. Camasses, A., Bragado-Nilsson, E., Martin, R., Séraphin, B. & Bordonné, R. (1998). Interactions within the yeast Sm core complex: from proteins to amino acids. Mol. Cell. Biol. 18, 1956-1966.

37. Lykke-Andersen, J., Aagaard, C., Semionenkov, M. & Garrett, R. A. (1997). Archaeal introns: splicing, intercellular mobility and evolution. Trends Biochem. Sci. 22, 326-331.

38. Abelson, J., Trotta, C. R. & Li, H. (1998). tRNA splicing. J. Biol. Chem. 273, 12685-12688.

39. Belfort, M. & Weiner, A. (1997). Another bridge between kingdoms: tRNA splicing in archaea and eukaryotes. Cell, 89, 1003-1006.

40. Gaspin, C., Cavaillé, J., Erauso, G. & Bachellerie, J.-P. (2000). Archaeal homologs of eukaryotic methylation guide small nucleolar RNAs: lessons from the Pyrococcus genomes. J. Mol. Biol. 297, 895-906.

41. Dennis, P. P. (1997). Ancient ciphers: translation in archaea. Cell, 89, 1007-1010.

42. Smith, D. R. et al. (1997). Complete genome sequence of Methanobacterium thermoautotrophicum ΔH: functional analysis and comparative genomics. J. Bacteriol. 179, 7135-7155.

43. Hickey, A. J., Macario, A. J. & Conway de Macario, E. (2000). Identification of genes in the genome of the archaeon, Methanosarcina mazeii that code for homologs of nuclear eukaryotic molecules involved in RNA processing. Gene, 253, 77-85.

44. Woese, C. R., Kandler, O. & Wheelis, M. L. (1990). Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eukarya. Proc. Natl Acad. Sci. USA, 87, 4576-4579.

45. Del Tito, B. J., Ward, J. M., Hodgson, J., Gershater, C. J., Edwards, H., Wysocki, L. A., Watson, F. A., Sathe, G. & Kane, J. F. (1995). Effects of minor isoleucyl tRNA on heterologous protein translation in Escherichia coli. J. Bacteriol. 177, 7086-7091.

46. Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307-326.

47. CCP4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallog. sect. D, 50, 760-763.

48. Brunger, A. T., Adams, P. D., Clore, G. M., Delano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallog. sect. D, 54, 905-921.

49. Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallog. sect. A, 47, 110-119.

50. Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1994). PROCHECK: a program to check stereochemical quality of protein structures. J. Appl. Crystallog. 26, 283-291.

51. Evans, S. V. (1993). SETOR: hardware-lighted three-dimensional solid model representations of macromolecules. J. Mol. Graph. 11, 134-138.

Edited by J. Doudna

(Received 24 January 2001; received in revised form 12 April 2001; accepted 12 April 2001)