a novel mitochondrial genome organization for the blue ... · matode, meloidogyne javanica (okimoto...

16
Copyright 0 1992 by the Genetics Society of America A Novel Mitochondrial Genome Organization for the Blue Mussel, Mytilus edulis Richard J. Hoffmann,*" Jeffrey L. Boore+ and Wesley M. Brownt *Department of Zoology and Genetics, Iowa State University, Ames, Iowa 5001 1-3223, and ?Department of Biology, University of Michigan, Ann Arbor, Michigan 48109-1048 Manuscript received December 1 1, 1991 Accepted for publication February 7, 1992 ABSTRACT The sequence of 13.9 kilobases (kb) of the 17.1-kb mitochondrial genome of Mytilus edulis has been determined, and the arrangement of all genes has been deduced. Mytilus mitochondrial DNA (mtDNA) contains 37 genes, all ofwhich are transcribed from the sameDNA strand. The gene content of Mytilus is typically metazoan in that it includes genes for large and small ribosomal RNAs, for a complete set of transfer RNAs and for 12 proteins. The protein genes encode the cytochrome b apoenzyme, cytochrome c oxidase (GO) subunits 1-111, NADH dehydrogenase (ND) subunits 1-6 and 4L, and ATP synthetase (ATPase) subunit 6. No gene for ATPase subunit 8 could be found. The reading frames for the ND1, COI, and COIII genes contain long extensions relative to those genes in other metazoaq mtDNAs. There are 23 tRNA genes, one more than previously found in any metazoan mtDNA. The additional tRNA appears to specify methionine, making Mytilus mtDNA unique in having twotRNAMet genes. Five lengthy unassigned intergenic sequences are present, four of which vary in length from 79 to 119 nucleotides and the largest ofwhichis 1.2 kb. The base compositions of these are unremarkable and do not differ significantly from that of the remainder of the mtDNA. The arrangement of genes in Mytilus mtDNA is remarkably unlike that found in any other known metazoan mtDNA. - - T HE mitochondrial genome of metazoans is a closed circular DNA; only among hydrazoans (Hydra spp.) is this DNA linear (WARRIOR and GALL 1985). Although metazoan mitochondrial DNA (mtDNA) ranges in size from ca. 14-40 kb (SEDEROFF 1984; MORITZ, DOWLING, and BROWN 1987; SNYDER et al. 1987), its gene content is highly conserved. The same set of genes is usually present: 12 or 13 for proteins, one each for the small and large subunit ribosomal RNAs (s-rRNA and I-rRNA), and 22 for transfer RNAs (tRNAs;a sufficient number to trans- late the simplified mitochondrial genetic code). In addition, at least one extensive noncoding sequence is present which, in vertebrates (CLAYTON 1982, 1984) and in insects (CLARY and WOLSTENHOLME 1985), is known to contain elements controlling the initiation of replication and transcription. Size variation in mtDNA is usually due to differences in this noncoding sequence (BROWN 1985; HARRISON 1989), but is oc- casionally due to duplications of other portions of the genome (MORITZ and BROWN 1986, 1987; ZEVERING et al. 199 1). Although the nucleotide sequence of mtDNA changes at a rate as much as ten times higher than that of nuclear DNA (BROWN, GEORGE and WILSON 1979), the order in which the genes are arranged appears to change at a muchslowerrate (BROWN ' To whom correspondence should be addressed. Genetics 131: 397-412 (June, 1992) 1985). For example, among chordate mtDNAs there appears to be one basic arrangement (ANDERSON et al. 198 1, 1982; BIBB et al. 198 1; GADALETA et al. 1989; ROE et al. 1985; JOHANSEN, GUDDAL and JOHANSEN 1990) with only minor variations (PAABo et al. 1991 ; DESJARDINS and MORAIS1990, 1991; L. SZURA and W. BROWN, unpublished data); this within-phylum stability also appears to hold true among echinoderm (JACOBS et al. 1988; CANTATORE et al. 1989; DE GIORGI et al. 1991; HIMENO et al. 1987; SMITH et al. 1989, 1990; M. SMITH, personal communication) and arthropod mtDNAs (CLARY and WOLSTENHOLME 1985; BATUECAS et al. 1988; HSUCHEN, KOTIN and DUBIN 1984; DUBIN, HSUCHEN and TILLOTSON 1986; MCCRACKEN, UHLENBUSCH and GELLISSEN1987; UHLENBUSCH, MCCRACKEN and GELLISSEN 1987; D. STANTON, L. SZURA and W. BROWN, unpublished data), even though the chordate, echinoderm and arthropod arrangements differ substantially from each other. One exception to the conservation of gene arrange- ment within a phylum occurs in nematodes. Ascaris suum and Caenorhabditis elegans have gene arrange- ments that areidentical except for the location of an A + T-rich noncoding region (WOLSTENHOLME et al. 1987; OKIMOTO and WOLSTENHOLME 1990; OKI- MOTO, MACFARLANE and WOLSTENHOLME 1990), but that are radically different from that of a third ne-

Upload: ngobao

Post on 27-May-2019

216 views

Category:

Documents


0 download

TRANSCRIPT

Copyright 0 1992 by the Genetics Society of America

A Novel Mitochondrial Genome Organization for the Blue Mussel, Mytilus edulis

Richard J. Hoffmann,*" Jeffrey L. Boore+ and Wesley M. Brownt *Department of Zoology and Genetics, Iowa State University, Ames, Iowa 5001 1-3223, and ?Department of Biology,

University of Michigan, Ann Arbor, Michigan 48109-1048 Manuscript received December 1 1, 199 1

Accepted for publication February 7, 1992

ABSTRACT The sequence of 13.9 kilobases (kb) of the 17.1-kb mitochondrial genome of Mytilus edulis has

been determined, and the arrangement of all genes has been deduced. Mytilus mitochondrial DNA (mtDNA) contains 37 genes, all of which are transcribed from the same DNA strand. The gene content of Mytilus is typically metazoan in that it includes genes for large and small ribosomal RNAs, for a complete set of transfer RNAs and for 12 proteins. The protein genes encode the cytochrome b apoenzyme, cytochrome c oxidase (GO) subunits 1-111, NADH dehydrogenase (ND) subunits 1-6 and 4L, and ATP synthetase (ATPase) subunit 6. No gene for ATPase subunit 8 could be found. The reading frames for the ND1, COI, and COIII genes contain long extensions relative to those genes in other metazoaq mtDNAs. There are 23 tRNA genes, one more than previously found in any metazoan mtDNA. The additional tRNA appears to specify methionine, making Mytilus mtDNA unique in having two tRNAMet genes. Five lengthy unassigned intergenic sequences are present, four of which vary in length from 79 to 119 nucleotides and the largest of which is 1.2 kb. The base compositions of these are unremarkable and do not differ significantly from that of the remainder of the mtDNA. The arrangement of genes in Mytilus mtDNA is remarkably unlike that found in any other known metazoan mtDNA.

- -

T HE mitochondrial genome of metazoans is a closed circular DNA; only among hydrazoans

(Hydra spp.) is this DNA linear (WARRIOR and GALL 1985). Although metazoan mitochondrial DNA (mtDNA) ranges in size from ca. 14-40 kb (SEDEROFF 1984; MORITZ, DOWLING, and BROWN 1987; SNYDER et al. 1987), its gene content is highly conserved. T h e same set of genes is usually present: 12 or 13 for proteins, one each for the small and large subunit ribosomal RNAs (s-rRNA and I-rRNA), and 22 for transfer RNAs (tRNAs; a sufficient number to trans- late the simplified mitochondrial genetic code). In addition, at least one extensive noncoding sequence is present which, in vertebrates (CLAYTON 1982, 1984) and in insects (CLARY and WOLSTENHOLME 1985), is known to contain elements controlling the initiation of replication and transcription. Size variation in mtDNA is usually due to differences in this noncoding sequence (BROWN 1985; HARRISON 1989), but is oc- casionally due to duplications of other portions of the genome (MORITZ and BROWN 1986, 1987; ZEVERING et al. 199 1).

Although the nucleotide sequence of mtDNA changes at a rate as much as ten times higher than that of nuclear DNA (BROWN, GEORGE and WILSON 1979), the order in which the genes are arranged appears to change at a much slower rate (BROWN

' To whom correspondence should be addressed.

Genetics 131: 397-412 (June, 1992)

1985). For example, among chordate mtDNAs there appears to be one basic arrangement (ANDERSON et al. 198 1, 1982; BIBB et al. 198 1; GADALETA et al. 1989; ROE et al. 1985; JOHANSEN, GUDDAL and JOHANSEN 1990) with only minor variations (PAABo et al. 1991 ; DESJARDINS and MORAIS 1990, 1991; L. SZURA and W. BROWN, unpublished data); this within-phylum stability also appears to hold true among echinoderm (JACOBS et al. 1988; CANTATORE et al. 1989; DE GIORGI et al. 1991; HIMENO et al. 1987; SMITH et al. 1989, 1990; M. SMITH, personal communication) and arthropod mtDNAs (CLARY and WOLSTENHOLME 1985; BATUECAS et al. 1988; HSUCHEN, KOTIN and DUBIN 1984; DUBIN, HSUCHEN and TILLOTSON 1986; MCCRACKEN, UHLENBUSCH and GELLISSEN 1987; UHLENBUSCH, MCCRACKEN and GELLISSEN 1987; D. STANTON, L. SZURA and W. BROWN, unpublished data), even though the chordate, echinoderm and arthropod arrangements differ substantially from each other.

One exception to the conservation of gene arrange- ment within a phylum occurs in nematodes. Ascaris suum and Caenorhabditis elegans have gene arrange- ments that are identical except for the location of an A + T-rich noncoding region (WOLSTENHOLME et al. 1987; OKIMOTO and WOLSTENHOLME 1990; OKI- MOTO, MACFARLANE and WOLSTENHOLME 1990), but that are radically different from that of a third ne-

398 R. J. Hoffmann, J. L. Boore and W. M. Brown

matode, Meloidogyne javanica (OKIMOTO et al. 1991). Whether this difference is due to a change in the rearrangement rate or to a very ancient split in the lineages leading to these taxa is unknown, although such a split in nematodes has been postulated (POINAR 1983). Neither of the nematode gene arrangements shows an affinity with that of any other metazoan mtDNA. At present, nematodes are the only phylum known in which representatives of different subgroups fail to share a basic mitochondrial gene arrangement, and they are further distinguished from other meta- zoans by their lack of a mitochondrial gene for ATPase8 (OKIMOTO et al. 199 1).

The slow but perceptible rate of rearrangement and the very large number of arrangements possible suggest that mitochondrial gene order may provide useful information about the phylogeny of distantly related metazoan groups. Although present data pro- vide encouragement for this approach, and methods for analyzing the evolutionary relationships among gene rearrangements are being developed (SANKOFF, CEDERCREN and ABEL 1990), only a few groups have been investigated and data from many more are needed for even a preliminary assessment. T o this end, we have determined the mitochondrial gene arrangement for a bivalve mollusk, the blue mussel Mytilus edulis, the first such reported for any mollusk. Mytilus mtDNA is especially interesting in that it is commonly inherited biparentally (HOEH, BLAKLEY and BROWN 1991), providing a possible opportunity for recombination. Its gene arrangement is unlike that reported for any other metazoan mtDNA. Like nem- atodes, Mytilus apparently lacks a mitochondrial ATPase8 gene. Further, Mytilus mtDNA contains a unique tRNA gene which appears to specify methio- nine and, thus, has two tRNA genes for this amino acid.

MATERIALS AND METHODS

A X EMBL3 vector containing an entire Mytilus mtDNA was a gift of D. 0. F. SKIBINSKI. The mtDNA had been inserted into X EMBL3 DNA using a unique BamHI site, as described by SKIBINSKI and EDWARDS (1 987). Although the mtDNA was isolated from Mytilus galloprovincialis-like mus- sels, it appears to be M . edulis mtDNA, perhaps acquired by hybridization between the two species (see below). Frag- ments produced by digesting DNA from the X EMBL3 clone with combinations of BamHI, XbaI, EcoRI, XhoI and NheI endonucleases were subcloned into pBluescript I1 KS(-) plasmids (Stratagene), producing five nonoverlapping sub- clones that jointly comprise the entire mtDNA, as well as additional overlapping subclones. Further subclones of the largest pBluescript I1 subclone were made using unique NheI, NcoI, BstXI and PstI restriction sites found within it. Detailed cleavage maps of each of the five original nonov- erlapping subclones were constructed using combination digests of restriction endonucleases.

Double stranded DNA sequencing reactions employed modified T7 DNA polymerase (Sequenase 2.0, United

States Biochemical Gorp.), used according to supplier's in- structions. [55S]dATP (Amersham) was used to label the products, which were separated on 5-6% polyacrylamide gels containing 8 M urea and cast with wedge spacers. Initially, sequences from each end of each subcloned frag- ment were obtained using primers complementary to the T7 and T 3 priming sites in pBluescript 11; sequences inter- nal to those were obtained using synthetic 17 nucleotide (nt) primers designed on the basis of the sequences obtained. Some automated sequencing, primarily to confirm se- quences manually determined from only one strand, was performed by the Iowa State University Nucleic Acid Facil- ity on an Applied Biosystems model 373A DNA sequencing system, following manufacturer's protocols. The subcloning and sequencing strategies are illustrated in Figure 1.

Computer analysis was performed with the University of Wisconsin Genetics Computer Group package of programs (DEVEREUX, HAEBERLI and SMITHIES 1984). Tentative iden- tifications of protein coding genes were made using FastA searches of the animal mtDNA nucleotide sequences in GenBank and confirmed using BestFit comparisons of in- ferred amino acid sequences. Because both its nucleotide and amino acid sequences are known to be poorly conserved, ND6 gene identification was further confirmed by hydrop- athy profile comparisons (KYTE and DOOLITTLE 1982) with the ND6 proteins of Drosophila and the sea urchin Paracen- trotus. Final amino acid alignments were made using Gap with program defaults (gap weight = 3.0, gap length weight = 0.1). The s-rRNA gene was identified by its sequence similarity with known s-rRNA genes, and ends were assigned by adjacency to inferred tRNA genes. The I-rRNA gene has no directly adjacent tRNA gene to define its 3' end and, therefore, its approximate 3' end was designated as the average position of the end of its sequence similarity with several known metazoan mitochondrial I-rRNA genes. Se- quences encoding tRNA genes were identified generically by their ability to fold into typical metazoan mitochondrial tRNA structures, and specifically by their anticodon se- quences.

RESULTS AND DISCUSSION

The total length of this Mytilus mtDNA is 17.1 kb, as determined by the sequence data combined with the restriction fragment size estimates for the few unsequenced portions. The cleavage map generated for this cloned mtDNA appears to be identical to that for the M. edulis mtDNA that was designated A in the report by HOEH, BLAKLEY and BROWN (1991). We did not map BclI sites in this study, but all other cleavage sites reported in that study appear identical to those we found. Comparison of 588 nt from the 5' and 3' ends of the cloned ND4 gene with ND4 se- quence data obtained by S. MCCAFFERTY (personal communication) from two mtDNAs representing os- tensibly pure populations of M. edulis and M. gallo- provincialis revealed differences at 32 positions; the cloned DNA sequence was identical with M. edulis at 20, with M . galloprovincialis at five, and differed from both at seven. Although the original X clone was produced using DNA from M. galloprovincialis-like mussels (SKIBINSKI and EDWARDS 1987), the ND4 sequence and the cleavage map similarities suggest that the cloned mtDNA is derived from M. edulis, and

Mytilus mtDNA Organization

Mytilus edulis

I-rRNA COlI COIIl N D 3 ATPsscd ND6 s-rRNA

c ND4L

1 1 1 1 1 1 1 1 ~ 1 1 ~ ~ ~ ~ ~ ~ 1

0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7

Kilobase Pairs

399

FIGURE 1.-Map of the genes con- tained in the M . edulis mitochondrial ge- nome, together with subcloning and se- quencing strategies. The cleavage map shows only those sites employed in sub- cloning. The original genome was cloned into X EMBLS using a unique BamHl site, which appears at both ends of the linear- ized map illustrated here. Ranges encom- passed by the pBluescript I1 subclones ap- pear immediately above the scale line, with the five nonoverlapping subclones shown as a single line marked by the cleavage sites used to make them. Arrows show the locations and directions of individual se- quencing runs. Locations of tRNA genes are marked by their single-letter amino acid codes. Alternative tRNA genes for methionine, leucine and serine are noted by subscripts designating the codons they recognize. Numbers indicate the quantity of nucleotides contained within five lengthy unassigned intergenic regions. The direction of transcription of all genes is left to right.

it is so identified in this study. This observation is not especially surprising, since hybridization between the two species is frequent (SKIBINSKI, AHMED and BEARD- MORE 1978) and could easily result in interspecific exchange of mtDNA. A caution in this assignment to M. edulis is that on-going analysis of 1-rRNA and ATPase6 sequence has so far failed to distinguish M. edulis from M . galloprovinicialis (R. KOEHN, personal communication).

Although 3.2 kb of the mtDNA remain unsequ- enced, the information from the 13.9-kb sequence obtained is sufficient to identify unambiguously all genes, to locate all gene boundaries and to sequence fully on both DNA strands 21 of 23 tRNA genes, all noncoding regions, plus unusual protein gene termini. The few nucleotides that were not determined by sequencing both strands of the two remaining tRNA genes are at their 3’ ends; thus, even if there are errors at these positions, the correct identification of the amino acid specificity of the two tRNAs is uncom- promised. Aside from the five noncoding regions dis- cussed in a following section, Mytilus mitochondrial genes either abut directly or are separated by short (1 - 15 nt) sequences, as in other metazoan mtDNAs.

Gene content: M. edulis mtDNA (Figure 1, APPEN- DIX I) contains 12 open reading frames (ORFs) that correspond to genes for cytochrome b (Cyt b) , cyto- chrome c oxidase subunits 1-111 (COI-COIII), NADH dehydrogenase subunits 1-6 and 4L (ND1-6 and ND4L), and ATPase6. The ATPase8 gene appears to be missing (see below), as is also the case in nematodes (WOLSTENHOLME et al. 1987; OKIMOTO and WOLSTEN- HOLME 1990; OKIMOTO, MACFARLANE and WOLSTEN-

HOLME 1990; OKIMOTO et al. 199 1). Both rRNA genes are also present. There are 23 regions that can be folded into structures that are characteristic for tRNA genes, one more than is present in other metazoan mtDNAs. The additional tRNA gene employs an an- ticodon (TAT) for methionine that is unique among animal mtDNAs, giving Mytilus two mitochondrial tRNAMet genes rather than the single one found in other metazoans. (This unique tRNA gene is herein designated tRNAy&, see below.) All of the genes are transcribed from the same DNA strand.

Unassigned DNA: A 1.2-kb region with no readily assignable function lies between the I-rRNA and the tRNATyr genes. This region contains several poly(A) tracts near the tRNATyr gene and one G + C-rich tract (27 of 35 nt; see APPENDIX I). These sequence features and the size of the region suggest it as a candidate for a role in initiating replication and/or transcription, but experimental data are needed before a functional assignment can be made. If the region does contain the origin of replication, it is much less A + T-rich (6 1 %) than the control region in Drosophila, which is >90% A + T (CLARY and WOLSTENHOLME 1985).

This 1.2-kb region plus the adjacent 3’ end of the I-rRNA gene contain six ORFs that begin with the metazoan mitochondrial initiation codons, ATG, ATA or ATT, end with a termination codon and are >40 codons long (Segment #1, APPENDIX I). One ORF, of 11 1 codons, begins 13 nt beyond the 3’ end of the I-rRNA gene, on the same mtDNA strand as the known protein coding genes. The remaining five are on the opposite DNA strand. They range in size from 47 to 170 codons and overlap the 11 1 codon

400 R. J. Hoffmann, J. L. Boore and W. M. Brown

TABLE 1

Stabilities (kcal/mol) of predicted RNA secondary structures (ZUKER and STIEGER 1981) for five unassigned intergenic

sequences and a 5’ extension to the NDl reading frame from M. edulis mitochondrial DNA

Energy

True Randomized Region sequence sequences

1.2-kb unassigned sequence -277.3* -253.5 f 4.5 Cyt b to COII -19.1* -16.4 f 1.3 ND3 to tRNA:& -9.8 -11.0 f 1.2 tRNA:h to CO1 -20.6* -16.6 f 1.9 COIIl to tRNA!$,, -48.8* -39.1 f 2.1 ND1 5’ extension -30.8 -32.7 f 2.0

Stabilities of the true sequences are compared to the means f 9.5% confidence intervals of 10 random sequences of nucleotides with the same nucleotide composition to aid in assessing signifi- cance. The 5 ‘ extension to the COI and the 3’ extension to the < : 0 1 1 I reading frames are included with their flanking unassigned sequence.

* Significantly more stable than random, in that they fall outside o f the 95% confidence intervals for 10 randomized sequences with the sane nucleotide composition as the native sequence.

ORF, each other, and/or the 3’ end of the 1-rRNA gene extensively. None of the ORFs or their inferred protein products show any similarity to sequences in the databases, and none appears to correspond to the ATPase8 gene (see below). Because of this, and be- cause of the extensive overlaps with each other and the 1-rRNA gene, we doubt that any of these ORFs represent functional genes. However, without direct experimentation no strong conclusion is possible.

Optimal folding of the 1.2-kb region by the method of ZUKER and STIEGER (1 98 1) yields a stable secondary structure (Table 1). Comparison of its free energy value with the mean energy value for 10 randomized sequences with the same nucleotide composition and size suggest that the true sequence is significantly more stable than a random collection of nucleotides having the same nucleotide composition (Table 1). The predicted secondary structure of the 1.2-kb re- gion contains several long potential stem and loop structures. There is evidence that one such structure mediates initiation of lagging strand synthesis during vertebrate mtDNA replication (WONG and CLAYTON 1985), and it has been speculated that others might serve as recognition signals for replication, transcrip- tion, and RNA processing (BROWN 1985; FORAN, HIXSON and BROWN 1988; JACOBS et al. 1988).

There are four shorter noncoding regions in addi- tion to the 1.2-kb region just discussed. One, of 115 nt, is between Cyt b and COII; another, of 79 nt, lies between the genes for COT11 and tRNAy&; and two, of 88 and 119 nt, flank the tRNA& gene at its 5’ and 3’ ends, respectively (Figure 1, APPENDIX I). Like the 1.2-kb region, each contains sequence elements capable of secondary structure formation (Table 1)

and, thus, could be potentially important for process- ing or transcription. The sequences between the Cyt b and COII genes and between the tRNA& and COI genes fold into structures that are significantly more stable than random. The predicted structure at the 5’ end of the tRNA& gene, however, is not signifi- cantly different in stability than random. Finally, the 79-nt sequence between the end of the COIII reading frame and the beginning of the tRNA%$ gene could be included in a stable structure including a 3’ exten- sion of the COIII gene (discussed below). Assessing biological significance of these structures, if any, re- quires further investigation. In the aggregate, these four smaller unassigned regions represent more non- coding nucleotides outside of the control region than has been found previously. Finally, because of uncer- tainties about the exact lengths of some of the protein genes (discussed below), there may be additional non- coding regions in Mytilus mtDNA.

A missing ATPase8 gene: Release 17 of the Swiss- Prot database includes 13 entries for ATPase8 genes of metazoan mtDNAs: eight from vertebrates, two from echinoids and three from insects. There is very little sequence conservation among these, even at the amino acid level, which, when combined with the small size of the gene (159-204 nt), makes its identi- fication exceedingly difficult. A few features aid in its recognition, however. In 12 of the 13 entries, ATPase8 is inferred to begin with Met-Pro-Gln (in some, the initial codon used for Met is a nonstandard one), and Leu follows Gln in all but the three insect proteins, which have Met instead. All 13 are identical in having Trp as the sixth amino acid following Pro. Additionally, all but the five mammals have the se- quence Trp-X-Trp at or within one amino acid of the C terminus. Examination of hydropathy profiles (KYTE and DOOLITTLE 1982) indicates that all of the proteins are hydrophobic over the amino terminal half of their length and hydrophilic over the C ter- minal half. Finally, for all metazoan mtDNAs in which an ATPase8 gene has been identified, it immediately precedes (and sometimes overlaps in another reading frame) the gene for ATPase6. In human cells, both gene products are translated from a single mRNA (OJALA, MONTOYA and ATTARDI 198 1).

Translation of all portions of the sequence from Mytilus mtDNA in all possible reading frames and examination of all inferred polypeptides, including those with nonstandard initiation codons, yielded one with an initial Met-Pro-Gln-Leu sequence. This read- ing frame is 34 codons long, lacks any Trp codons, has a hydropathy profile that is different from those of all other ATPase8 proteins, and lies entirely within the ND2 gene but on the opposite DNA strand. The initial sequence Met-Pro-Pro-Gln-(X)5-Trp occurred once, in an ORF of only 21 codons that also lacks the

Mytilus mtDNA Organization 40 1

characteristic hydropathy profile of ATPase8. Allow- ing for the possibility of a nonstandard initiation co- don by searching for Pro-Gln-(X)5-Trp failed to reveal additional candidates. Two open reading frames with the sequence Trp-X-Trp within one codon of a ter- mination codon were found, but neither had the other characteristic features of ATPase8. None of the six ORFs in the 1.2-kb unassigned region discussed above has the characteristics of ATPase8. Finally, there is no reading frame with any of these characteristics in the vicinity of the ATPase6 gene. In summary, we have been unable to find evidence of an ATPase8 gene in the sequence obtained from Mytilus mtDNA and, barring the unlikely possibility that it is entirely within an unsequenced portion of one of the other protein genes, we conclude that, like nematodes, My- tilus lacks a mitochondrial ATPase8 gene.

Although not resolvable with present data, at least two interesting and testable evolutionary questions are worth noting in this regard. One concerns the fate of the missing ATPase8 gene. Has it been transferred to the nucleus, or has its function been subsumed by a different nuclear-encoded protein that is trans- ported into the mitochondrion, or is that function somehow dispensable for some organisms, such as Mytilus and nematodes? The other concerns the shared absence of this gene among the bivalve and nematode species examined. Does this reflect two independent losses from two separate lineages or one loss from a taxon that was ancestral to both bivalves and nematodes? Both questions can be addressed by further investigation.

Comparisons of gene order: As Figure 2 shows, Mytilus mtDNA is organized in a radically different fashion from that of Drosophila yakuba, the only pro- tostome for which a complete mtDNA sequence is available (CLARY and WOLSTENHOLME 1985). Few genes share the same boundaries in both; if tRNA genes are ignored, only the s- and 1-rRNA genes are found in the same relative order and transcriptional polarity. However, in Drosophila (and in other arthro- pods and chordates as well) the two rRNA genes are separated by one tRNA gene, for valine, whereas in Mytilus they are separated by seven tRNA genes, none of which specifies valine. With regard to mitochon- drial gene order, there is substantially less similarity between Mytilus and other representative protos- tomes than between those of protostomes and deuter- ostomes, in marked contradiction to expectations based on an assumption of monophyly among protos- tomes.

The only other metazoans that have a spiral pattern of cleavage during development and for which a com- plete mitochondrial gene order is published are two nematodes, Ascaris m u m and Caenorhabditis elegans ( WOLSTENHOLME et al. 1987; OKIMOTO and WOLSTEN-

HOLME 1990; OKIMOTO, MACFARLANE and WOLSTEN- HOLME 1990). Comparing Mytilus with these, only the ND5 and ND6 genes have the same relative orienta- tion, but there are three intervening tRNA genes in nematodes and none in Mytilus. Although these two genes are also adjacent in chordates and echinoderms, their relative polarities are opposite, whereas they are the same in Mytilus.

Partial gene orders are known for a few other taxa with spiral cleavage. Although all of its tRNA genes have not been mapped, a third nematode mtDNA, from Meloidogyne javanica (OKIMOTO et al. 1991), shares little of its order with the other two nematodes or with Mytilus. The ND2 and COIII genes are adja- cent in both Mytilus and Meloidogyne, but their rel- ative polarities are different. Although its full mtDNA organization is unknown, in the flatworm Fasciola hepatica the genes for ND3 and COI are separated by those for tRNAp& and tRNATyr (GAREY and WOL- STENHOLME 1989); these two protein genes are in the same order and orientation in Mytilus, but with a tRNAgEN gene and two noncoding regions between them. Finally, no two mitochondrial protein or rRNA genes share the same boundaries in Mytilus and the sea anemone Metridium senile (D. WOLSTENHOLME, personal communication).

If tRNA genes are included in the comparisons, a few additional common organizational features are revealed among the mtDNAs of Mytilus and other metazoans. The COII and tRNALys genes are directly adjacent and have the same relative polarity in Myti- Ius, in Drosophila and several other arthropods, in echinoderms, and in chordates. Also, the 5’ end of the ND1 gene is preceded by the gene for tRNA&R in Mytilus, in chordates and in echinoderms; regard- ing this, we note that the gene for tRNA& is up- stream of the ND1 gene in several arthropod mt- DNAs, which suggests the possibility that the tRNAEN and tRNAE’,”, genes might be homologous in some lineages and simply have “exchanged” anti- codons reciprocally. The tRNATh‘ gene is upstream of the ND4L gene in both Mytilus and Drosophila, albeit with reversed polarity in Drosophila. The 5’ end of the s-rRNA gene is bounded by the tRNAPhe gene in Mytilus, in chordates, and in echinoderms. In Mytilus and echinoderms, the tRNATrp gene is fol- lowed by that for tRNAA1”, but with their relative polarities switched; in chordates, these two tRNA genes are also adjacent with opposite polarities, but in the reverse order. The tRNA?& and tRNAHis genes are adjacent and have the same polarity in Mytilus, chordates and echinoderms, but the order is 5’- tRNASer-tRNAHis-3’ in Mytilus and 5’-tRNAHis- tRNASer-3’ in the others. The tRNAG1” gene follows the tRNA”‘ gene in Mytilus, Drosophila and chor- dates, but the genes are on opposite strands in the

402 R. J. Hoffmann, J. L. Boore and W. M. Brown

Mytilus edulis ATPase6 ND4L ND6 srRNA I rRNA

I s rRNA

H 1 kb // ATPase6 I c k ?,, ATPase8

cot1

Drosophila yakuba latter two. Finally, the genes for tRNAMet and ND2 are directly adjacent in Mytilus, chordates, and Dro- sophila (but not locusts); however, the tRNAMet gene in Mytilus is the one with the anticodon TAT, which has not been found in other animal mtDNAs.

Initiation and termination of translation: Al- though uncertainty about the exact positions of the 5’ ends of the COI and ND1 genes makes a reasonable assignment of their initiation codons difficult, it is less so for the remaining protein genes. Initiation of trans- lation at ATG codons seems clearly indicated for the COII,COIII,Cytb,ATPase6,ND2-ND4,ND4Land ND6 genes. However, ND5 appears to initiate at ATA, because (1) this codon begins 11 nt immediately 3’ of the termination codon for ND4L, (2) this results in a predicted protein of the same length as in Dro- sophila, and (3) the first in-frame ATG is 98 codons downstream. Initiation of COI and NDl , discussed in detail below, could also begin at ATG, although other in-frame codons are also available.

The mitochondrial termination codon TAA is found at the 3’ ends of the reading frames for COI, COIII, Cyt b , ND3, ND4, ND4L and ND6, and the alternative termination codon TAG occurs at the ends of the genes for COII, ATPase6 and ND2. Two genes, ND1 and ND5, may end with an abbreviated termi- nation codon, TA, which is presumably completed by polyadenylation of their mRNAs. For ND1 this must be so, because the nucleotide following the 3‘-TA is T, which is the first nucleotide of the tRNA””’ gene. The case for ND5 is less clear; the nucleotide follow- ing the 3”TA is A, which would complete the ND5

ND4L \ ND6

FIGURE 2.-Comparison of mitochon- drial gene organization between M. edulis and D. yakuba. The circular genomes were linearized and aligned at the Cyt b gene. Lines connect identical large genes, and el- lipses with arrows highlight reversals in po- larity. Positions of tRNA genes, indicated by cross hatching, have been ignored in this comparison. Arrows under gene labels show direction of transcription. The bars spanning the s-rRNA and the I-rRNA genes show the only two large genes that share the same orientation and transcriptional polarity be- tween the genomes. Stippled areas in the Mytilus genome show regions of unassigned function. The D. yakuba gene order is from CLARY and WOLSTENHOLME (1985).

stop codon, but this same A must also participate in the presumptive ATG initiation codon for ND6. While it is possible that ND5 and ND6 do not overlap and that their transcripts are processed by cleavage between these adjacent As, several alternatives that allow ND5 and ND6 to overlap and share this A also exist, such as the initiation of translation from alter- nate start codons on a common, digenic mRNA, al- ternative (semidestructive) RNA processing, or initi- ation of ND6 translation from a codon further down- stream. The present data do not allow us to choose among these alternatives. (Depending upon the status of a 3’ extension of the COIII gene, discussed below, it may terminate with a single T.)

Protein genes: Those portions of the protein genes that were sequenced were translated and their amino acid sequences compared with those of other animals. Most are similar in sequence and size to their homo- logs in other species. However, some Mytilus proteins appear to be longer or shorter than their Drosophila counterparts, and uncertainties about initiation sites of some genes confound inferences about their lengths.

Based upon alignment of conserved amino acids near its ends, the COI gene of Mytilus is potentially longer at both its 5’ and 3’ ends than it is in Drosoph- ila (APPENDIX I). If the amino terminus of COI is the same in both species, translation in Mytilus must start at a TGG codon, which normally specifies tryptophan. The nearest in-frame methionine codon (ATA) occurs nine codons upstream from this TGG; we note in this regard that the inferred start of COI in nematodes is

Mytilus mtDNA Organization 403

also 9-1 0 amino acids upstream from that in Dro- sophila (OKIMOTO, MACFARLANE and WOLSTENHOLME 1990). However, the ORF for COI in Mytilus actually begins with an ATT codon that is 22 codons upstream of the TGG codon just mentioned. ATT, which or- dinarily specifies isoleucine, is an inferred initiation codon in Drosophila mtDNA (CLARY and WOLSTEN- HOLME 1985) and also in mouse mtDNA, where any ATN codon apparently can initiate translation (BIBB et al. 198 1). In Mytilus, three methionine codons (both ATA and ATG) are just downstream from this ATT, and each could serve as a start site. As a final compli- cation, there is evidence that G T T is an initiation codon for one of the protein genes in nematodes (OKIMOTO, MACFARLANE and WOLSTENHOLME 1990), and in Mytilus there is an in-frame GTT three codons upstream of the TGG codon that corresponds to the position of the start codon for COI in Drosophila.

Given the above uncertainties, we are unwilling to assign this sequence to the Mytilus COI gene. How- ever, we are also unable to suggest any alternative assignment for it. The sequence is adjacent to the 1 19- nt unassigned sequence that follows the tRNAg& gene, and it may simply be a part of that sequence that, purely by chance, is open and in-frame with the following COI gene. It is possible to form a stable RNA secondary structure that includes both the 1 19- nt unassigned sequence and all of the 5‘ extension of COI (Table 1). We note that it includes all but the last nucleotide before the first methionine codon (ATA) upstream of the start of the Drosophila protein in the COI reading frame, coincident with the length of COI in nematodes. This structure or another like it might serve as an RNA processing signal and thus result in the elimination of this region before transla- tion.

The COI gene also appears to be longer at its 3’ end in Mytilus than in Drosophila (APPENDIX I). The reading frame remains open to a termination (TAA) codon that is located 17 codons beyond the 3’ end in Drosophila. There are 9 nt between this TAA and the initiation codon of the ATPase6 gene. It may be worth noting that the second codon following the asparagine at the end of the COI alignment between the Mytilus and Drosophila is TAT, part of which could serve as an abbreviated termination signal, but there is no obvious reason to favor such an inference over one in which COI is 16 amino acids longer at its C terminus in Mytilus than in Drosophila. Moreover, no candidate stem-loop processing signals are pre- dicted to be formed in this region.

The apparent 3‘ extension of COI may be related to the absence of an ATPase8 gene in Mytilus mtDNA. As noted, the start of ATPase6 follows the termination (TAA) codon of COI by 9 nt. In all other metazoans, the ATPase8 gene precedes and some-

times extensively overlaps the 3’ end of the ATPase6 gene (BIBB et al. 198 1). If an ATPase8 gene was once present at this location, it is possible that part of it became attached to the COI gene after the ATPase8 gene became dispensable.

The Mytilus COIII gene may also be longer at its 3’ end than that of Drosophila. The first termination codon (TAA) in this gene is 48 codons beyond the C terminus of Drosophila (APPENDIX I). In Mytilus, a TGA (Trp) codon follows the last alignable residue (Gly) between Mytilus and Drosophila, the T of which could serve as an abbreviated termination signal. In this regard, two things are worth noting: One is that this region and the unassigned 79 nt following it in Mytilus have the potential to fold into stable stem- loop structures (Table 1) and thus may be eliminated during RNA processing. The other is that the COIII gene in Lasaea, another bivalve, lacks this 3’ extension (D. O’FOIGHIL and M. SMITH, personal communica- tion), as do all of the other COIII proteins in the databases. That the 3‘ ends of other COIII proteins are highly conserved in length makes it seem unlikely that the 3’ extension in Mytilus is actually translated.

In contrast to the ambiguities just described, the NDl gene of Mytilus is almost certainly truncated at its 3’ end. Although overlap with the downstream tRNAVa’ gene cannot be ruled out absolutely, the dinucleotide TA, which could serve as an abbreviated stop codon for NDl, immediately precedes the first nucleotide of the tRNA””’ gene. Moreover, even if overlap occurred, an in-frame termination (TAA) co- don is only four codons downstream from the start of the tRNAVa’ gene. This 3’ truncation of ND1 results in an inferred protein that is 20 amino acids shorter than its Drosophila counterpart.

The converse may be true at the 5’ end of ND1, where a substantial addition, as compared with Dro- sophila, is possible. If the two NDl proteins had identical lengths at their N-terminal ends, translation in Mytilus would have to initiate with GTG (APPENDIX I). GTG is occasionally used as an initiation codon in bacterial systems (STORMO, SCHNEIDER and GOLD 1982), and a similar use in animal mtDNA has been suggested (CLARY and WOLSTENHOLME 1985; JACOBS et al. 1988; GADALETA et al. 1989; JOHANSEN, GUDDAL and JOHANSEN 1990; DESJARDINS and MORAIS 1990, 1991). However, if GTG can serve as a start signal in Mytilus, the NDl reading frame is open back to another GTG codon, which is the first codon down- stream of the tRNAf6, gene. In this context, GADA- LETA et al. (1 989) have observed that GTG appears to serve as an initiation codon only when it is directly preceded by the last nucleotide of the preceding gene. In Mytilus, this GTG is located at a position that is 54 codons upstream of the inferred 5‘ end of the Dro- sophila ND1 gene.

404 R. J. Hoffmann, J. L. Boore and W. M. Brown

However, RNA sequences in this 54-codon region have the potential to form a stable secondary structure (Table 1) that includes all of the nucleotides before the first GTG mentioned, and this could result in its complete or partial elimination from the final proc- essed transcript. However, the predicted structure is not significantly more stable than random. There are one ATG, two ATTs, two GTTs and another GTG in this region, any of which could serve as start sites if a portion of the region were eliminated. Also, there are three TTGs among the 54 codons, which might function as start codons in Mytilus as well as in nem- atodes (OKIMOTO, MACFARLANE and WOLSTENHOLME 1990). Although we are reluctant to assign this region to the ND1 gene, we have been unable to assign an alternative function to this DNA sequence.

The remaining differences in lengths of gene prod- ucts between Drosophila and Mytilus are minor. The most notable extension among these is at the 5' end of the Cyt 6 gene, which is somewhat longer than other Cyt b genes. There are no alternative start codons in the Cyt b reading frame, however. In addi- tion to truncations or extensions at gene termini, there are a few cases of internal sequence deletions/addi- tions. The longest of these, equivalent to 21 codons, is near the middle of the ND6 gene.

In conclusion, we have found evidence of relative size changes in several Mytilus genes, among which those involving internal deletions or additions are less ambiguous than those suggesting truncations or ad- ditions to the ends of the genes. As a caveat, it must be remembered that all such assignments depend on the validity of inferences drawn by comparative se- quence analysis and that any final conclusions about gene lengths must await independent confirmation by direct analysis of the gene products themselves.

Ribosomal RNA genes: The region between the genes for tRNAPhe and tRNAG'y shows extensive se- quence similarity to other metazoan mitochondrial s- rRNA genes. The Mytilus s-rRNA gene is 945 nt, similar in size to mammalian mtDNAs and consider- ably larger than in Xenopus (8 19 nt; ROE et al. 1985), echinoderms (ca. 880 nt; JACOBS et al. 1988; CANTA- TORE et al. 1989; DE GIORCI et al. 199 l), or Drosophila (ca. 790 nt; CLARY and WOLSTENHOLME 1985).

The boundaries of the 1-rRNA gene are less certain, especially at the 3' end where there is no directly adjacent tRNA or protein gene. The computer pro- gram Gap was used to compare the final 300 nt at the 3' ends of six different 1-rRNA genes (those of Dro- sophila, Xenopus, Paracentrotus, mouse, human, and cow) against the region in Mytilus mtDNA bounded by positions 2222 and 3800 of Segment #1 in APPEN- DIX I . This region includes the entire 1-rRNA gene, plus approximately 300 nt of the downstream unas- signed sequence. The nt positions where the individ-

ual alignments ended were averaged to estimate the approximate 3' end of the gene. By this procedure, the average position of the end of similarity was found to lie at position 3465 of Segment #1 (APPENDIX I). From this, we estimate the size of the Mytilus 1-rRNA gene to be approximately 1245 nt, which, if accurate, makes it the smallest on record (Drosophila's is 1326 nt). However, the largest (1.2 kb) unassigned region is adjacent to the 1-rRNA gene in Mytilus, and it is possible that 100-400 nt of this region are part of the mt-l-rRNA gene. If so, however, sequence similarity between Mytilus and other species must be weak.

Transfer RNA genes: The 23 inferred Mytilus tRNA genes can be folded into the typical cloverleaf secondary structures shown in Figure 3. Structurally, these genes are similar to most other metazoan mt- tRNAs. Mismatches are present in a few stems, G-T pairs (= G-U in RNA) are common in stems, and all genes lack a 3' terminal CCA. The aminoacyl-accept- ing stem is usually of seven nt pairs, although there is the potential for 8 to 10 pairs in a few genes (those for Arg, Leu(CUN), Leu(UUR), Lys, and Ser(AGN); extra nucleotides not shown in Figure 3), and a few also have aminoacyl stems with mismatched termini (those for Cys, Lys, Ser(UCN) and Val; Figure 3). The dihydrouridine (DHU) arm consists of a stem of 3 or 4 paired and a loop of three to 9 unpaired nt. The T#C arm has a stem of 3-5 paired and a loop of 3-7 unpaired nt. The variable arm is of either 4 or 5 nt. The anticodon stem is of 5 nt pairs, with a few genes having one of these pairs mismatched. The anticodon loop is of 7 nt, with the possibility for pairing across the loop in some cases. Anticodons are preceded by T, except in the gene for tRNAy&, which has a C at this position. Anticodons are followed by either A or G with approximately equal likelihood.

There are two leucine tRNA genes (tRNA&N and tRNA&,), as in other animal mtDNAs. These are directly adjacent in Mytilus, in a cluster of tRNA genes that follows the COII gene. There are also two serine tRNA genes, tRNA?& and tRNA%. As in other metazoan mtDNAs, tRNAF&N has an unpaired loop that replaces the DHU arm. In Mytilus, this loop is of 13 nt, the longest so far reported. There is potential for some nucleotide pairing across the loop and also between its 3' end and the variable arm. Aside from the novel tRNAMet gene, the anticodons are the same as in Drosophila mtDNA, except that the anticodon for tRNALys is TTT in Mytilus, as it is in chordate and nematode mtDNAs.

A second tRNA gene specifying methionine? In all previously characterized metazoan mtDNAs there is a single tRNA"' gene that presumably can be charged with either N-formyl-methionine and used for translation initiation, or with methionine and used at internal codons. In mouse, its anticodon (CAT) is

Mytilus mtDNA Organization 405

ARGININE ASPARAGINE ASPARTIC ACID

1 1 - u 1 1 A - 1 A - 1 1 - 1

CYSTEINE

1 1 1 6 - C C - G 1 - u 1 - u

ALANINE

U G l 1 - A

1 G l G - C

A - 1 G - C a - r G T C - G

C - G C - G , - I A - 1

u - T 1 - A G - C

U G l 1 T l C G G

A U G C 1 A a I l l 1

u 1 1 1 c

U U U U G 1 U U I

A - T U 1 - 1 1 - 1 G l G - C

1 I l l 1 T C

1 - u 1 - A C U

1 U C U C A A A l I / G

C C G G 1 G G G A

G G C G A A 1 U G

1 - u u 1 - A 1 - u G - C

1 c A - 1

1 G T C G

G I l l G 1

A - I\-

1

A - 1

1 G G U A C G - C 1

1 A l T G C C l l 1 A 1 1 1 1

1 T A U C 1 1

A A / I l l A

1 U T G

A - 1 G - c 1 T 1

C G T l A 1 C A G U 1

G l C l G

.~

1 1 A I I I I 1 I l l 1 A U

G U l l G

T T A A C 1 A A 1 c

C 1 1 1 G

1 u u u c

u u u

1 I l l 1 A G

1 -

G C A G U I l l

T i

U u u 1 - U A

u - 1 G - C 1 - 1 1 - 1

1 G 1 A

G T C

HISTIDINE

A 1 - 1 u - 1 1 - 1 A - 1 A - 1

U T A - 1 A 1 - 1 G - C A - 1 1 - 1

C U T A

G C U

1 - u G - C A - 1 G l

T U c u

G l l

GLYCINE

1

C - G G - C

u - 1

T U 1 1

T G C

GLUTAMIC ACID GLUTAMINE ISOLEUCINE

A A - 1 6 - C

G - C T -I\

C - G

G T C - G 1 - A 1 - A A - 1 u - 1 G - C G

1 1 C l l u G 1 1 1 G U G U U G

G U A U C 1 U U A

1 - 1 G rl- 1

U A I I I I c

G I l l 1 c u

1 - A A - l 1 - u I - 1 u - 1 A - l

T - u u - T T G i - G

G C 1 1 1 l A

u - l 6 - C G A 1 - 1 U U T l l l C A T G U l l U

C l U A T 1 U

G 1 A l G

1 G l U C

1 / 1 1 c u 4 l l l l l ~ G U A A G U U A U l G

I l l 4 1 1 I l l c A G 1 1 1 U l 1

U G G G T

G T C C U 1

1 - u 1 G G

G - C C - G

T U 1 - 1

G G G I I I I / T U A U A l

1 I l l c A

U G A C G G I 1 - 4 u A - 1

U U A 1 - A G 1 - u 1 - u G - C

c u C - G

1 A T C C

G - C G - C u c

C l 1 u

T T G

I - 1 G - C G - C

G G

T G T G

G l G 1 G

G A l 1 A

1 1 C

METHIONINE IAUAI

1 G l 6 - C 1 - A u - 1 A - 1 d - T

METHIONINE IAUG)

1 G - C G l 1 - A u - 1 u - 1 1 - 1

LEUCINE ICUN) LEUCINE (UUR)

1

LYSINE

1 G U U U G C

1 A G G

G U C G

1 T G C 1 I l l

U U G c 1 1 G A

1 1

1 - a - c - 1 - 1 - 1 - c - G

1 1 C

A U G I l l

T A

1 1 1

- 4 - u - c - 1

U G

U G

1 1 A - 1 A - 1 G - C C - G T - "

C - G A - 1 C - G u - T u - 1 G T

1 - 1 , I ,. .

C G A G A A C G 1

G - C 1 1 C A A l A

C l l G l A 1 T C l G G l l U A T A 1 1 1 1 A A * A 1 1 1 1

G 1 1 C G A 1 / 1 1

A A A G C U A 1 G 1 1 1 1 1

A A G G U C G

G - C A G G l U

1 - A G

A - 1 A - T

A U

T - U A - T

G - c C G

G C

G

1 - A G

I I c

A

G l 1

C A c I 1 G

C G

u u 1 A T C

11 C I /

T

c - 1

A G G

- G T U 1 1 C l 1

A U G U A 1 / 1 1

1 1 c U

1 1 C l G A U G

A I l l C A C G

1 1 G C A I A

T

/ I 1 G G A l

1 A A C T

A - 1 C - G 6 - C 1 - 1

T G 1 c

T A A

PROLINE

A C - G

u - T T - u

G - C u - T

1 - 1 1 1

1 1 1 G U G A

A C T 1 1 G I \ 1 1 1

G I l l C l U 1 T l G

c U U U l U

A - 1 G - C G - C

c c G - C

1 G C A T

A - l A - 1

1 A C A

1 1 1

G l 4 - 1

C G C A

T U 1

THREONINE PHENYLALANINE SERINE IAGNI SERINE WCNl

U A - 1 u - 1 6 - C

c U G T G 1 - A

G - C C - G 1-0. 1 - u T - u

6 - C u - 1

u - 1 T G

G - C 1 - A I - , G - C c

G G

1

A 1 1 C G

1 U U G C

A U

T I l l 1 U G

1 C

- G 1 1

C C A

G G l I l l

T 4

U G G I

- G

1 - G

1

I." A - 1 1 - 1 c

A T l G U U

1 G 1

u c 1 I 1

1 U A G T G A T T A C C G A 1 G G U

U U l G G 1 C C l l l A C T 1 I l l 1 1 1 A 1 1 1 G

r G 1 1 . 1 C A A

1 U A l 1

U G 1 1 C G

1 U U G C A G A U

A - 1 G

1 4 I l l 1

c I l l 1 A A C A

U T C 6 A G A G

c G l T - A A

U A U 1 - A G 1 - u G T 6 - C

1 A C - G

1 G

G T 1 G 1 G

C G 1 - u

1 G T G U

C - G G - C

C - G G l

c u

1 - 4 C - G u - 1

1 A 1 A

G U U 1 u

1 G T T G G

TRYPTOPHAN

G u - l u - 1 G - C A - 1 1 - 1

TYROSINE VALINE

1 Ti G - C 1 - A

A C - G u - 1 1 - A

u - 1 G - C 1

T T C C U T A

G - A -

T G

G G

G - 1 -

A - G -

C T

c 1

1 1

A U c

A A

T U A c 1 C

/ I

U U

U C C G U

G G C 1 I l l

G

G U

1 A

I T A

T U 1 I l l

U A 1 1 c U 1

1

- c - 1 1

G l C l G

C U G U l 1 A G

G - 1 1 - 4 - u - G - 1

I I I I

C U

U T A U

C G A

c C G

G l I

, . L T

1 - A I ,

, .. I ,

G - C 1 - u

T U T U

1 C A G l U T A C

FIGURE 3.--Sequences of 23 tRNA genes from M. edulis folded into cloverleaf forms expected for their gene products. Amino acid identities are given above each sequence. Watson-Crick pairing is indicated by solid lines; G-T pairs are shown with dots.

406 R. J. Hoffmann, J. L. Boore and W. M . Brown

inferred to recognize any initial ATN codon (BIBB et al. 1981). In Mytilus mtDNA, a tRNAMet gene with a CAT anticodon is present and located in a cluster of four tRNA genes, between tRNALys and tRNAgN (Figure 1, APPENDIX I).

Unlike other metazoan mtDNAs, that of Mytilus appears to encode a second tRNAMet gene. Located immediately 5’ of the ND2 gene, it has the anticodon T A T (Figures 1 and 3), which presumably recognizes ATA codons most efficiently. ATA is a normal Met codon only in mitochondria (PEEBLES, GEGENHEIMER and ABELSON 1983); it specifies isoleucine in the cy- toplasm. However, it is unlikely that ATA specifies isoleucine in Mytilus mtDNA, since both ATA and ATG codons are employed at conserved methionine positions of mitochondrial proteins in both Drosophila (CLARY and WOLSTENHOLME 1985) and Mytilus.

The sequence and predicted structure of this novel tRNA are unexceptional; a C precedes the anticodon, but this is also the case in various mt-tRNA genes of other metazoans. Although the conclusion that this gene is transcribed and that its product is functional must await experimental verification, there is no se- quence-based reason to doubt that this is true. We therefore refer to this gene as to distinguish it from the second tRNAMet gene in Mytilus, which we call tRNA!&. Sequence comparison suggests that the two Mytilus tRNAM“ genes arose by duplication and subsequent divergence. At their 5’ ends the genes are identical at 15 of the first 16 positions and, if the genes are aligned by regions (excluding the T$C loop, which cannot be aligned), only 23 positions are differ- ent between them. Gene duplication has been thought to be involved in high rearrangement rates for mt- tRNA genes (e .g . , JACOBS et al. 1988; PAABo et al.

A tRNA!& gene is not only unusual among animal mtDNAs, but among all organisms. The only other tRNA with this anticodon (UAU) listed in SPRINZL et al.’s (1 99 1) compilation of all known tRNAs occurs in yeast, where it specifies isoleucine. It is possible that a few directly sequenced tRNAs have UAU anticodons, but in all cases examined the nucleotide in the anti- codon wobble position is modified and of uncertain derivation. For example, it is either U or C in Halo- bacterium volcanii (GUPTA 1984). Other tRNAs with NAU anticodons include those from phage T 4 (GUTH- RIE and MCCLAIN 1979), Escherichia coli (HARADA and NISHIMURA 1974), spinach chloroplasts (FRANCIS and DUDOCK 1982; KASHDAN and DUDOCK 1982), and potato mitochondria (WEBER et al. 1991). However, the sequences of the tRNA genes in the latter two cases show that CAT is the anticodon and, thus, rule o u t U as the original nucleotide at the modified wob- ble position in those cases.

Even if tRNA!fitA is expressed in Mytilus, it is un-

199 1).

clear how or whether its function would differ from that of tRNA!&. Both ATA and ATG codons are present at internal positions in Mytilus protein genes, and presumably each would be most efficiently rec- ognized by its cognate tRNA. Similarly, a differential function of the two tRNAs in initiation seems unlikely, since both ATG and ATA appear to be used as initiation codons in Mytilus.

Genetic code and codon usage: The mitochondrial genetic codes of Mytilus and Drosophila appear to be identical, with TGA specifying tryptophan, ATA me- thionine, and AGA and AGG serine. With the possible exception of the NDl gene, for which an initiation codon cannot presently be assigned with certainty, it appears that ATT is not used as an initiation codon in Mytilus. Although in vertebrate mtDNAs, AGA and AGG are unused and may be termination codons (e.g., see BIBB et al. 1981), both are used in Mytilus and Drosophila mtDNAs, and the positional similarity of this usage argues that AGA and AGG are serine codons in both cases.

A summary of usage for the 2824 codons sequenced (Table 2) indicates no pronounced codon bias in My- tilus. No unused codons occur in any codon family, and no third position bias is evident. C, G, A and T occur in the third position in 13.3, 23.6, 28.0 and 35.1 % of the codons, respectively, and these amounts hardly differ from those (14.0, 23.8, 28.0 and 34.2%) in the 13.9-kb portion of Mytilus mtDNA sequenced. Excluding termination codons, those used 10 or fewer times are ACG (Thr), TCC (Ser), CGC (Arg) and CCC (Pro).

CONCLUSIONS

The size, sequence, cleavage map data and, as in- ferred from them, the structure and genetic organi- zation of Mytilus mtDNA furnish yet another example of both the unity and diversity that appear to be characteristic of metazoan mtDNAs. With respect to unity: (1) The gene complement is identical to that of other metazoans, except for the presence of two tRNAMet genes (even the absence of ATPase8 has a precedent in nematodes). (2) The genome organiza- tion is very compact; no introns were found and, with the exception of a 1.2-kb noncoding region and four other unassigned regions of 79-1 19 nt, intergenic sequences are minimal or absent. (3) The tRNA genes exhibit the relaxed sequence and structural con- straints that have become a hallmark of metazoan mt- tRNAs.

With respect to diversity, the genetic organization of Mytilus mtDNA furnishes another example of the nonconservation of mitochondrial gene order among metazoans. Based on the assumption that coelomate metazoans are monophyletic, we anticipated that the Mytilus gene order would differ, but still be a recog-

M y t h rntDNA Organization

TABLE 2

Codon usage in sequenced portions of 12 protein genes from M. edulis mitochondrial DNA

Amino acid Codon N

Phe TTT 168 TTC 52

Leu TTA 135 TTG 85

Leu CTT 52 CTC 13 CTA 62 CTG 38

Ile ATT 122 ATC 37

Met ATA 102 ATG 65

Val GTT 79 GTC 27 GTA 93 GTG 96

Amino acid Codon N

Ser TCT 58 TCC 9 TCA 18 TCG 20

Pro CCT 43 ccc 10 CCA 21 CCG 20

Thr ACT 46 ACC 12 ACA 26 ACG 8

Ala GCT 55 GCC 18 GCA 42 GCG 19

Amino acid Codon N

TYr T A T 81 TAC 40

Term TAA 7 TAG 3

His CAT 29 CAC 20

Gln CAA 33 CAG 14

Asn AAT 59 AAC 30

AAG 34 LYS AAA 54

Asp GAT 33 GAC 22

Glu GAA 36 GAG 51

407

Amino acid Codon N

CYS TGT 43 TGC 25 TGA

Trp TGG 35 41

'4% CGT 25 CGC 10 CGA 20 CGG 12

Ser AGT 48 AGC 21 AGA 69 AGG 58

GlY GGT 49 GGC 29 GGA 39 GGG 103

nizable variant on the relatively conserved arrange- ments seen among other coelomates (e.g., chordates, echinoderms and arthropods). Further, based on pres- ent phylogenetic hypotheses, it was also reasonable to expect that traces of a shared ancestry might be pre- served most strongly between mollusk and arthropod mtDNAs. Neither of these expectations is met. The degree to which the arrangement in Mytilus is unique was not only unexpected, but, given the currently accepted placement of mollusks with other coelomate metazoans, is much more surprising than the highly divergent arrangements that have been found in pseu- docoelomate (e.g., nematode) and acoelomate (e.g., flatworm and sea anemone) metazoans. Understand- ing the evolutionary derivation of the Mytilus gene arrangement, for example by the method of SANKOFF, CEDERGREN and ABEL (1990), will require informa- tion about other metazoan groups.

Previous mechanistic speculations about the reasons for conservation of mitochondrial gene organization have included a suggestion that arrangement of genes, particularly rRNA and tRNA genes, is important for regulation of gene expression (e.g., OJALA, MONTOYA and ATTARDI 198 1 ; MONTOYA, GAINES and ATTARDI 1983). The plethora of unique and radically different arrangements, among which that of Mytilus is but the latest example, argue strongly against such a conclu- sion and, instead, suggest that metazoan mitochon- drial gene order is a relatively neutral characteristic with regard to selection.

Although the reasons for the scrambled gene order in Mytilus mtDNA are unknown, it may be germane that mtDNA heteroplasmy is widespread in this genus,

apparently as a result of occasional biparental inher- itance of mtDNA (FISHER and SKIBINSKI 1990; HOEH, BLAKLEY and BROWN 1991). Heteroplasmy may pro- vide an opportunity for recombination among differ- ent mtDNAs that is usually absent. HOEH, BLAKLEY and BROWN (1991) have presented cleavage maps from two Mytilus mtDNAs that were found in heter- oplasmy. The map for one of the mtDNAs is identical to that for the genome characterized in the present study, and it is, therefore, most probably also identical in gene order. However, the map for the other mtDNA differs extensively from this map, and that mtDNA is also longer by about 1.6 kb. Further, while the two mtDNAs share many cleavage sites mapped in the section of the genome containing ND3, COI, ATPase6, ND4L, ND5 and the two rRNA genes (Figure l) , they share essentially no sites in the re- mainder of the genome, raising the possibility that multiple gene arrangements might be found among individual mussels.

Little is known about the mtDNA of other bivalves. S. MCCAFFERTY (personal communication) has enzy- matically amplified and sequenced a region of the mtDNA from four different species of Mytilus that spans the ND4 and COIII genes, and he found that these two genes have the same relative order and polarity in all four species. The ND4 and COIII genes are in a region that appears, by cleavage map compar- isons (HOEH, BLAKLEY and BROWN 1991), to be greatly dissimilar among individuals, thus suggesting that the dissimilarity may not necessarily reflect a dissimilar gene arrangement. D. O'FOIGHIL and M. SMITH (personal communication) have data from an-

408 R. J. Hoffmann, J. L. Boore and W. M. Brown

other bivalve (genus Lasaea) indicating that, as in Mytilus, the ND2 gene is downstream from the COIII gene, but with one unidentified tRNA gene interven- ing. In Mytilus these genes are separated by a 79 nt noncoding region and the tRNAyfitA gene. Their data also suggest that the ND4 gene is not upstream of the COIII gene as it is in Mytilus. In the Japanese scallop, Patinopecten yessoensis, the COIII gene precedes and is proximal to the ATPase6 gene, but it is not yet known whether an ATPase8 gene is located between them (E. BOULDING and A. BECKENBACH, personal communication). COIII and ATPase6 thus have dif- ferent relative positions in Patinopecten than in My- tilus. If confirmed, these results would indicate that bivalves are nonidentical in their mtDNA organization and that the great dissimilarity of Mytilus mitochon- drial gene order from those of other metazoans may stem from an accelerated rate of intragenomic re- arrangement as well as from phylogenetic distance.

These results emphasize the need for a much broader and more detailed survey of mtDNA gene order. Without data for other protostome mtDNAs, including data for other phyla as well as for other molluscan classes, it will be impossible to determine whether Mytilus represents an aberration or whether some protostome groups will simply turn out to be much less conservative in mtDNA gene order than deuterostome groups and arthropods appear to be. It is important in this regard that representatives of two other molluscan classes (a chiton, Katharina tunicata, and a gastropod, Plicopurpura sp.) have gene arrange- ments that are different from that of Mytilus u. BOORE, T. COLLINS and W. BROWN, unpublished data), demonstrating that gene order is not a con- served feature among all molluscan classes, at least as they are presently constituted.

This work was supported by grants from the National Science 1;oundation (BSR-8904633 and BSR-8916466, to R.J.H.; BSR- 8613902 and BSR-8517830, to W.M.B.), and the National Insti- tutes of Health (GM30144, to W.M.B.). We thank LYNNE SZURA and CAROL MANTHEY for expert assistance and DAVID SKIBINSKI for providing cloned Mytilus mtDNA. The DNA sequences pre- sented in this report have been deposited in GenBank (accession numbers M83756, M83757, M83758, M83759, M83760, M83761 and M83762, corresponding to Segments 1-7, respectively, of APPENDIX I).

LITERATURE CITED

ANDERSON, S., A. T. BANKIER, B. G. BARRELL, M. L. H. DE BRUIJN, A. R. COULSON, J. DROUIN, I. C. EPERON, D. P. NIERLICH, B. A. ROE, F. SANGER, P. H. SCHREIER, A. J. H. SMITH, R. STADEN and 1. G . YOUNG, 1981 Sequence and organization of the human mitochondrial genome. Nature 290: 457-465.

ANDERSON, S., M. H. L. DE BRUIJN, A. R. COULSON, 1. C. EPERON, F. SANGER and I. G. YOUNG, 1982 Complete sequence of bovine mitochondrial DNA: conserved features of the mam- malian mitochondrial genome. J. Mol. Biol. 156 683-717.

BATUECAS, B., R. GARRESSE, M. CALLEJA, J. R. VALVERDE and R.

MARGO, 1988 Genome organization ofArtemia mitochondrial DNA. Nucleic Acids Res. 16: 6515-6529.

BIBB, M. J., R. A. VAN ETTEN, C. T . WRIGHT, M. W. WALBERG and D. A. CLAYTON, 1981 Sequence and gene organization of mouse mitochondrial DNA. Cell 26: 167-1 80.

BROWN, W. M., 1985 T h e mitochondrial genome of animals, pp. 95-130 in Molecular Evolutionary Genetics, edited by R. J. MACINTYRE. Plenum Press, New York.

BROWN, W. M., M. GEORGE, JR., and A. C. WILSON, 1979 Rapid evolution of animal mitochondrial DNA. Proc. Natl. Acad. Sci. USA 7 6 1967-1971.

CANTATORE, P., M. ROBERTI, G. RAINALDI, M. N. GADALETA and C. SACCONE, 1989 The complete nucleotide sequence, gene order, and genetic code of the mitochondrial genome of Par- acentrotus liuidus. J. Biol. Chem. 264: 10965-10975.

CLARY, D. O., and D. R. WOLSTENHOLME, 1985 The mitochon- drial DNA molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J. Mol. Evol. 22: 252- 271.

CLAYTON, D. A,, 1982 Replication of mammalian mitochondrial DNA. Cell 28: 693-705.

CLAYTON, D. A., 1984 Transcription of the mammalian mito- chondrial genome. Annu. Rev. Biochem. 53: 573-594.

DE GIORGI, C., C. LANAVE, M. D. MUSCI and C. SACCONE, 1991 Mitochondrial DNA in the sea urchin Arbacia lixula: evolutionary inferences from nucleotide sequence analysis. Mol. Biol. Evol. 8: 5 15-529.

DESJARDINS, P., and R. MORAIS, 1990 Sequence and gene orga- nization of the chicken mitochondrial genome: a novel gene order in higher vertebrates. J. Mol. Biol. 212: 599-634.

DESJARDINS, P., and R. MORAIS, 1991 Nucleotide sequence and evolution of coding and noncoding regions of a quail mito- chondrial genome. J. Mol. Evol. 32: 153-161.

DEVEREUX, J., P. HAEBERLI and 0. SMITHIES, 1984 A comprehen- sive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12: 387-395.

DUBIN, D. T., C.-C. HSUCHEN and L. E. TILLOTSON, 1986 Mosquito mitochondrial tRNAs for valine, glycine and glutamate: RNA and gene sequences and vicinal genome or- ganization. Curr. Genet. 1 0 701-707.

FISHER, C., and D. 0. F. SKIBINSKI, 1990 Sex-biased mitochon- drial DNA heteroplasmy in the marine mussel Mytilus. Proc. R. SOC. Lond. B 242: 149-1 56.

FORAN, D. R., J. E. HIXSON and W. M. BROWN, 1988 Comparisons of ape and human sequences that regulate mitochondrial DNA transcription and D-loop DNA synthesis. Nucleic Acids Res. 16: 5841-5861.

FRANCIS, M. A., and B. DUDOCK, 1982 Nucleotide sequence of a spinach chloroplast isoleucine tRNA. J. Biol. Chem. 257:

GADALETA, G., G. PEPE, G. DE CANDIA, C. QUAGLIARIELLO, E. SBISA and C. SACCONE, 1989 The complete nucleotide sequence of the Rattus norvegzcus mitochondrial genome: cryptic signals revealed by comparative analysis between vertebrates. J. Mol.

CAREY, J. R., and D. R. WOLSTENHOLME, 1989 Platyhelminth mitochondrial DNA: evidence for early evolutionary origin of a tRNA"'AGN that contains a dihydrouridine arm replacement loop, and of serine-specifying AGA and AGG codons. J. Mol.

GUPTA, R., 1984 Halobacterium volcanii tRNAs: identification of 41 tRNAs covering all amino acids, and the sequences of 33 class I tRNAs. J. Biol. Chem. 259: 9461-9471.

GUTHRIE, C., and W. H. MCCLAIN, 1979 Rare transfer ribonucleic acid essential for phage growth. Nucleotide sequence compar- ison of normal and mutant T 4 isoleucine-accepting transfer ribonucleic acid. Biochemistry 18: 3786-3795.

HARADA, F., and S. NISHIMURA, 1974 Purification and character-

11 195-1 1198.

EvoI. 28: 497-5 16.

EvoI. 28: 374-387.

Mytilus mtDNA Organiza t ion 409

ization of AUA specific isoleucine transfer ribonucleic acid from Escherichia coli B. Biochemistry 13: 300-307.

HARRISON, R. G. , 1989 Animal mitochondrial DNA as a genetic marker in population and evolutionary biology. Trends Ecol. Evol. 4: 6-1 1.

HIMENO, H., H. MASAKI, T . KAWAI, T . OHTA, I. KUMAGAI, K. MIURA and K. WATANABE, 1987 Unusual genetic codes and a novel gene structure for tRNA% in starfish mitochondrial DNA. Gene 56: 219-230.

HOEH, W. R., K. H. BLAKLEY and W. M. BROWN, 1991 Heteroplasmy suggests limited biparental inheritance of Mytilus mitochondrial DNA. Science 251: 1488-1490.

HSUCHEN, C.-C., R. M. KOTIN and D. T. DUBIN, 1984 Sequences of the coding and flanking regions of the large ribosomal subunit RNA gene of mosquito mitochondria. Nucleic Acids Res. 12: 7771-7785.

JACOBS, H. T. , D. J. ELLIOTT, V. B. MATH and A. FARQUARSON, 1988 Nucleotide sequence and gene organization of sea ur- chin mitochondrial DNA. J. Mol. Biol. 202: 185-217.

.JOHANSEN, J., P. H. GUDDAL and T. JOHANSEN, 1990 Organization of the mitochondrial genome of Atlantic cod, Gadus morhua. Nucleic Acids Res. 18: 41 1-419.

KASHDAN, M. A., and B. DUDOCK, 1982 The gene for spinach chloroplast isoleucine tRNA has a methionine anticodon. J. Biol. Chem. 257: 11191-1 1194.

KYTE, J., and R. F. DOOLITTLE, 1982 A simple method for dis- playing the hydropathic character of a protein. J. Mol. Biol.

MCCRACKEN, A,, I . UHLENBUSCH and G. GELLISSEN, 1987 Structure of the cloned Locusta migratoria mitochon- drial genome: restriction mapping and sequence of its ND-I (URF-I) gene. Curr. Genet. 11: 625-630.

MONTOYA, J., G. L. GAINES and G. ATTARDI, 1983 The pattern of transcription of the human mitochondrial rRNA genes re- veals two overlapping transcription units. Cell 34: 151-159.

MORITZ, C., and W. M. BROWN, 1986 Tandem duplication of D- loop and ribosomal RNA sequences in lizard mitochondrial DNA. Science 233: 1425-1427.

MORITZ, C., and W. M. BROWN, 1987 Tandem duplications in animal mitochondrial DNAs: variation in incidence and gene content among lizards. Proc. Natl. Acad. Sci. USA 84: 7183- 7 187.

MORITZ, C., T . E. DOWLING and W. M. BROWN, 1987 Evolution of animal mitochondrial DNA: relevance for population biol- ogy and systematics. Annu. Rev. Ecol. Syst. 18: 269-292.

OJALA, D., J. MONTOYA andG. ATTARDI, 1981’ tRNA punctuation model of RNA processing in human mitochondria. Nature

OKIMOTO, R., J. L. MACFARLANE and D. R. WOLSTENHOLME, 1990 Evidence for the frequent use of T T G as the translation initiation codon of mitochondrial protein genes in the nema- todes, Ascaris suum and Caenorhabditis elegans. Nucleic Acids Res. 18:6113-6118.

157: 333-348.

2 9 0 470-474.

OKIMOTO, R., and D. R. WOLSTENHOLME, 1990 A set of tRNAs, that lack either the TGC arm or the dihydrouridine arm: towards a minimal tRNA adaptor. EMBO J. 9 3405-341 1.

OKIMOTO, R., H. M. CHAMBERLIN, J. L. MACFARLANE and D. R. WOLSTENHOLME, 1991 Repeated sequence sets in mitochon- drial DNA molecules of root knot nematodes (Meloidogyne): nucleotide sequences, genome location and potential for host race identification. Nucleic Acids Res. 19: 1619-1626.

I’AABo, S. , W. K. THOMAS, K. M. WHITFIELD, Y. KUMAZAWA and A. C. WILSON, 1991 Rearrangements of mitochondrial trans- fer RNA genes in marsupials. J. Mol. Evol. 33: 426-430.

PEERLES, C. L., P. GEGENHEIMER and J. ABELSON, 1983 Precise excision of intervening sequences from precursor tRNAs by a membrane-associated yeast exonuclease. Cell 32: 525-536.

POINAR, G. O., JR., 1983 The Natural History of Nematodes. Pren- tice-Hall, Englewood Cliffs, N.J.

ROE, B. A., D.-P. MA, R. K. WILsoNandJ. F.-H. WONG, 1985 T h e complete nucleotide sequence of the Xenopus laeuis mitochon- drial genome. J. Biol. Chem. 2 6 0 9759-9774.

SANKOFF, D., R. CEDERGREN and Y . ABEL, 1990 Genomic diver- gence through gene rearrangement. Methods Enzymol. 183: 426-438.

SEDEROFF, R. R., 1984 Structural variation in mitochondrial DNA. Adv. Genet. 22: 1-108.

SKIBINSKI, D. 0. F., M. AHMAD and J. A. BEARDMORE, 1978 Genetic evidence for naturally occurring hybrids be- tween Mytilus edulis and Mytilus galloprovincialis. Evolution 32:

SKIBINSKI, D. 0. F., and C. A. EDWARDS, 1987 Mitochondrial DNA variation in marine mussels (Mytilus), pp. 209-226 in Proceedings of the World Symposium on Selection, Hybridization, and Genetic Engineering in Aquaculture, Vol. I. Heeneman, Berlin.

SMITH, M. J., D. K. BANFIELD, K. DOTEVAL, S. GORSKI and D. J. KOWBEL, 1989 Gene arrangement in sea star mitochondrial DNA demonstrates a major inversion event during echinoderm evolution. Gene 7 6 181-185.

SMITH, M. J., D. K. BANFIELD, K. DOTEVAL, S. GORSKI and D. J. KOWBEL, 1990 Nucleotide sequence of nine protein-coding genes and 22 tRNAs in the mitochondrial DNA of the sea star Pisaster ochraceus. J. Mol. Evol. 31: 195-204.

SNYDER, M., A. R. FRASER, J. LAROCHE, K. E. GARTNER-KEPKAY and E. ZOUROS, 1987 Atypical mitochondrial DNA from the deep-sea scallop Placopecten magellanicus. Proc. Natl. Acad. Sci. USA 84: 7595-7599.

SPRINZL, M., N. DANK, S. NOCK and A. SCHON, 1991 Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 1 9 2127-2171.

STORMO, G. D., T. D. SCHNEIDER and L. M. GOLD, 1982 Characterization of translation initiation sites in E. coli. Nucleic Acids Res. 10: 2971-2996.

UHLENBUSCH, I., A. MCCRACKEN and G. GELLISSEN, 1987 The gene for the large (16s) ribosomal RNA from the Locusta migratoria mitochondrial genome. Curr. Genet. 11: 621-638.

WARRIOR, R., and J. GALL, 1985 T h e mitochondrial DNA of Hydra attenuata and Hydra littoralis consists of two linear mol- ecules. Arch. Sci. 38: 439-445.

WEBER, F., A. DIETRICH, J.-H. WEIL and L. MAR~CHAL-DROUARD, 1990 A potato mitochondrial isoleucine tRNA is coded for by a mitochondrial gene possessing a methionine codon. Nu- cleic Acids Res. 18: 5027-5030.

WOLSTENHOLME, D. R., J. L. MACFARLANE, R. OKIMOTO, D. 0. CLARY and J. A. WAHLEITHNER, 1987 Bizarre tRNAs in- ferred from DNA sequences of mitochondrial genomes of nematode worms. Proc. Natl. Acad. Sci. USA 8 4 1324-1328.

WONG, T . W., and D. A. CLAYTON, 1985 In vitro replication of human mitochondrial DNA: accurate initiation at the origin of light-strand synthesis. Cell 42: 951-958.

ZEVERING, C. E., C. MORITZ, A. HEIDEMAN and R. A. STURM, 1991 Parallel origins of duplications and the formation of pseudogenes in mitochondrial DNA from parthenogenetic liz- ards (Heteronotia binoei; Gekkonidae). J. Mol. Evol. 33: 431- 441.

ZUKER, M., and P. STIEGER, 1981 Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9: 133-148.

354-364.

Communicating editor: A. G. CLARK

APPENDIX I

The nucleotide sequences of seven contiguous seg- ments of the Mytilus edulis mitochondrial genome are

410 R. J. Hoffmann, J. L. Boore and W. M. Brown

$ g % $ g 0 0 0 0 - O O D O 0 0 0 0 o z o o o o o

0 0 0 = 0 0 0 0 0 O R R S S 8 r N 0 ~ ' " " C O I 1 0 g g g g 9 N 0 " g ; ; g g I .:> .e .C .e> .e

z E: E- &- f: f. ga E: 'E' e '- $i ', B '. e= 'E, ' E W g- E> j E, I- ga

.*- . e 2 . 5: 2 .S> .E" .3 .$. .uo .6- e - ?'. 4' c g; ' Y - 'E- E' $: ,j- g E: 5- 2: 8 E- x" P

+a ' 5 - E" '5' '$'

. = .e 3- I- i! 8- .El, - $> 3' .y 'E- l; f - g: 8: 3:

e' a < f- :a <> $= 1:

#- g> 9 5; 1: e'

7" - x

8- & 1: 4- [E

5' g= a .I. 'E, 'E: 8- $- !! e: E-

.g> . x 3: .I. .t' .z; .E- . 3 .eY . g y .g: E' - I> 5: 1' le f- 4 I. p: $= E" 8, e- 6, e< 5' E' 1 p- E' e- s .i: .$I . fa .5. .J:

.yo .Eu ' 6 .E> .E- .t: : '5' & g - E> i E: I' u

e :- I> g 8 Eu 3 2; gu e, C Y

" .e- .E .e .I- .io .& e> c !. t> Bu t 2 c; e= e.> e> 4 e> e, - 5 , .E' .$e . E' 2: E 8- E. E' 2- .j8 '+ g z> !i8

= 3 .<> .e- . t u . .E> .E- E !E- .& .I- i: g go 1- 5: E N 3: g: E y 5: 5- pa 5: !a e: 'i' t t t - 5, E c g< z w ex

g .e- .? .$ .E' .E- .g g: p g. $: # g 3 $> 3:: E 2 2. $: 2.

I> 6- e- E

.E' .g; .e> .g> .tu .I- .eZ .

<> c

'1: 2' .& t- .[: .kE .I; 5: E- E; 5- f E> 5L.i $ g: Eb .fE

E- E> ? e- 'E> ' 4 .e . m . e , .j= ':>

EL E" s* e> /:.E; g' I" g: I8 E' 5 i- gu E: I8

.e> '5. +- .+a. 0s- e e> c cy.

S f f f Z f I - P i E . z g f

'2- 'EY

t :- 5-

!i 2o

5 ,EL , Iy ,; p z Eo .E' 3 5 5' 6 , E '$> t'

8 & E> e' E e L 2- :> 8: :

p. E> E: $:

.Bx .E- E 6 4 .E: .i: . p 8" e- :I f 'z E, 6 , e> a' 6 , si 0 8

'5: ' p 9 5 .i: .E: 'E>

1: 3- 5- t .g; " .& $> .5- 5' a & .& .E' i 1: '1 E: 'I" j- ; 0 $3 >

g g: 3 ! g , j : ,$- 6-

3: .I: 2 E - 5-

iij 4 3- $a @> E' e> 1 t E> in 6;

e 3- 8, , g 3. E> &I f .I: . g . g I c E' .%- .E' .I- io 5- 6' -5 2 p is- 2- 6- 5' E E> ou

.E: . p .E: 2 g i$: . f - ,I: .I: i; [; g ;g g 'E c $' * - e i: 3; c

.E- .5' .0 :> 3 $ i .;> $1 . p

z ti; .$* .$ g g p .E- .E- .E1 H &> g- E" 0 i g E> 8 s30 2- 5- 5 E e yo 1: g

c c

C> c u

c z- 3- t'

o' E- p 2 e,

ke $1 5- r ! ;. $; & ) C' c c- 3> 8"

I - P K """

P S 8 8 O X

f c B

'$ '3' :- t- E- 3- 6; $a

$, 'g;

g: $8

$- 'tu

.E. .3' < oc

Eu g o 5' E.

.E, .E> E, 6'

5> : .E> :>

o c e' .a .e c e'

6' c e>

.f: . E -

fa 5; .I: .I- f$, E.

:> e'

t' 5. EL E' E- , E' .e" .e

e< 3- g; .E; $3 2- "

E

'5" 'E> .?> '3' '3' '2" 'tu ' 5 E- E> E> $ E> 3" E, :> P > EL E c' E- 5- .E- .Fu .g: ti, .g: .E: .Eu .g; 5- 3' E: 2:' i: k' - #< 3. 5'

p t- td I> - 0 pa f;

E E: E: 1- 0

B 5. .%e .& . ; - .:> .E>

y t- e E L 2

2' t> c

Ea E- E.

e C> E. 5-

: : I->

g '5- 'iz '5; cE

. Y .E- .le 4 .*I .I; . j - . l j

Q +- - E'

1 ,# j: ,$' 5: .t' 5' . 2. < .e i::

$ '$1 " : I ,e' .j ,E> 6 t- !i E- ,&

:: 6, 1; f 5 - e> t> E- f $'

3 .& .io .!: .I, .E> .$- $0 t Eo 1: g: 2;

5' g 1- . e ; . .5' 3 .:a .Eu .z: .s: t'? 5: B' : z= P 8. 5

& i< e> 5 $: E- E: - J. .E: 23 .k 1:. .E' z

6, 5- :> ?> 2- 1 c 4 :* %

6' c

u c

z .:

, 8" 23 s= - I< E

.E& E .E' 'E .la .: .e> +> ,&# .pa t- E' t- & p . !: E- :>

'1; [i 1; 3: e> E- E: $ 4 a- 1: %: E- Ye

y o 8 8( e> 6' u t' E'

5 5 6 Z g E f 6 5 N 0 P m z

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 g 8 S 2 g ,o 8 ; ~ ; ~ ~ ~ 5 E : 4 E E t E a p;p;pL%w:zgz zfgfM$$ff:;g f g

I - P K f 5 5 f ' i s i g ? f $ p g $ f f f E i E s z z $ ; w f g f E g $ f g i g g $ $ $ $ $ $ 5 f - - - " _

Mytilus mtDNA Organization

o O o 0 o o o o , 8 8 8 8 ~ 8 8 8 8 ~ g p , o o o 8 8 O R 2 5 4 8 8 8 8 O 4 = c 2 - - - - - - " , o h -

0 0 0 0 0 0 A : " " " N N N

41 1

.e c .<> c . x? , I , .2z . E ~ .z2 .I- .:, .e 1 : .ea .e ,> c C . C , r c "0

, E " ' S 'EO '2' 'p+ '31 '8: '$+ ' g 'f; '2- 'E' 'I: '8; $' E; ?- & 0- c Y EY 1' *' ;> 2" 3, g: c $5: iI !$ tiz $!; i; $= :' y I-' : .t- .?Z i- .g- . E - .3z .E' .& . g q .E; .:< & .g; .@ .i: . g .[! y $< .?" E: 'E, 5: .E' id .$- p: ;: 3- x

%- !I e i- 'E: := E- E' 2: :- E u : I, 8- :$' E e: E' I< %> 3: 2. 8, 6, e. b> :u 1: 1: j x 0- r ( n u

c 0 u u & g e- c u- urn c p 3- f - Ed = ,E: . k .j: .s: .E- .I- E& 8 . F 1, .$ .E: .I: .$= .& .E: . E - .E. .3; .E" .*> . $Z .E" . E ,

1. g; i- 8- 1; E' 2, 5: 3- F" 3 F: f- 3; I > 3; 5: !x ;' 5: e* ix E e> c p c .e g- .E, .$I . I - .& . i; . E - r 3- .E" 2- .8< .E, 2- .2= .E* .j, .;> .:, .g: u 3: .$= .E'

'F' i: 5- e- Z' b> E' Ea 2' g- 6' 8- E" 2; E 8- F: :I I : E: g 2 $1 0 f u k; so 2, ./: E- 8; 5- E- Z' E- 2- :- c b, 3- e> i= g $1 g- f- ,E" .$" ,!; ,%> ,Iw :" c .+" .!+ c . .:- .Y .:- .E, .? c .g .;' .B' - c c 5- ,& ,E' E' e= :" 5 <

'E: j 5' '3: j- 3' i; 3- E- 3: f := -, 2 :> (n I g- -' a= ,I* 's .;> .yo

e' 2, E, E" $9 y 5- 5- & +" 5- 5. i: $' I, 5' - E, 3 E - 5: 5: u E" E* E, 5' 3- E> g ,j= 3; .E. <> .e .:- .I 5'. - .& 3 .!; .?* . 5 , . + .:- .E . .E> .: .s, .g, .$; .j: .E: .I: .s< $:

z .e* 2- E, :- '+> :, :" 5. L i: E' & g i; E, 3- g; E= e" f- '&

E E- j* :> e Jr f * y: 8" 2; 8' E: g g e y o & E- m - E' :- c E- 3" 6, $: E" E; 5, g. .I! .E: $ 4 . " 3- .5- .E- .:= .& 6 E, 3- . so .EL .$= .& . b .@ .p: ,u,

b> 'go (n 5' i; ;, 2: EY gg 'ju 3: Io E' c := :c 3, j : ,id . f n .$ 'E- .e ga 0 r Y !z> g 8: E, 5: I" 2, S > I' r- oc u

,E; .E- .E' .$I .$- .E. .EY .!j- .?> . g 3 .& .;> .5> .E .E' .$> .SI .E> .Ey .F' .:- .& .[: .E& .:- .@ %

c * e c e- c 2' e- y > u & b> 5: gr :- u 8 0 ?> 5' 5 5: 3- E- $> E: !" 5- I : $0 $ 2' g g E: 5- 2' Q :> r> E, ;I & g 6, F" ?I ,C' .iw .: !$a .& .:- .f; .E" .:, 2 .& .E .E' .c> .e2 .: .%' .$, .I: .$a .E. . p .ku .j= .I: .S.- .& 3- +* 5: s3; E' i; Eo 1' y : : E. 8: 5: $= := 2: B Y $0 L- - E, -= u

+- 3- P

t E a- . I , . : q = . E > .: " .e: .$: .& . g .EY 2' .- 8: .+ Z' .e- E" .:, 2. .: 3: 'SI $.= .: 34 ,!- ,j;,[- E>,;; ,E; ,g: E * PI

i p" E- u e+ u.8: Ed 8- 1- 5- 3. 5: & - 1- f u E* 5' g c 3: E- 2- g .j3 ;- $> !I E" 5 ; -5' p 8' 2 2' 1: E" I- r, 2, r. u YI SI- r Y r r UL c 0 +I I 8" F d 3, 3- E - 8 - a: ?x 2' E, 2 8" Zo r 8- :, ; 2' 5, e v)

. o m c uI 'E' ' (n

UI . 'E, e= :> e, < e. E, E- YI '5; 5 :A :< 2 Z > :" (n ;- Ia

E- .& .E' 5- x 2: 8- :a j: f: uc

" r r r Y Y e- e-

1 - I .u- .e

"I E :A k< ei

E'

I -

c e- < c r Y c u J c> .-

r r' -2 .e

E- 3. 5, :' x Fz 5, 5- E- 5' 9: ss E, I:: t u < "< Y y t- 5' EL u+

Y .- E 'J ._ - $0 E, 3: 2- eo c C Y 2 r= - e> 2 rLL

c*.

I- e> .C e- e o : e c-

e' c -0 E' E' 3' 2- :-

u < e> +a r- 2, E' E > go E "< go 6 C Y

L

I - E i ; 6 ; g 6 5 ; 6 6 6 z G z 6 6 6 6 6 " r " "

0 ~ Y ) Y ) C ~ D O I 0" _ " z z - : : 2 z E N : ~ o o O O o O 3 " " " "

N N N I

0 0 0 0 0 I z o O o N O 1 x g E g l i E E l : s z ; s f 1 0 0 O g 0 % ;z;;zgZf J $ E 0 0 D 0 0 a 0 0 0

.$ .I" .?, .6 , .e.- .C .r .8" .co .b .E= .go .E, .E

b> e 5 - 1. E' :A EL r c 8.. e

< r .r .ua .U .r ,p e c '3 ."1 c .u .CY . y r

5: 5- E > E, $, i: E: g f: ur $> 2: iI E-

: c > E, E: g, 8, I' :Y g: E' 5. 6 8' 3, $1 e> Eo 2 i .E, .go .:a .!: . E l t ." $ . y) .5' .gr .I, .& .g- .$- .: .E' r r .bA .E, .;- .+ .:= .f- . g< .a: .:

CY 1 E$=. g 5: b; u E 5 'E g> tu E; g 'i 2 f z : p 2 Y- F> E x '{x

!j r! i '8" . f x .E> .G" 5' .E> .E: .I' .I; .E; .I- .?, r> .Eu ' fE E- .E= .:, : .#, .5' .Y> 2 g :- 5 , E, 3- : .I E 5: 9 $ [ f - f - ,S $' f, E" E> 8' 8; E> 2' g* g- 2

g, g 5: p- 2 .5 .j= E $ 5 i 1;g; E 1- j- '8, y .; .C E .e 3, e' w : 1' !; t 3. 3< 5- E& 1: 5- g 8: g< iE 5- - o c :e 1 is: g; tu 1; ,:> ,E; ,EL E, c 5. :' ,r, +a ,E- .I: ,E' .pr ,5 B 1: j: !,I: 1: I: sE 1 !f c $a .Cy .E' r> i {if E- f 3- ' $ 0 ' g '[: '$ tu 2, U r 3; i; 1; E: 2: 9 'E- 2d p 2, g; r> IE 3 3 E ii 1- E' k: i p n 8: E" b> E- E- Y e " j .

2 8. 5 " 0 ' E E' :, 5' 5' $> 9: u I' " E= 3- c e" CY $ X = 1 .$ .E: i : a E ~ ~ d s a 'td .E { E i <> c c e' e 8' u E 4- 8, +I E' 3 E F 1: E , 1.5: 5- 8 : ? '5 e

.Z> e' .$ .$: .$Z .*- .i" .I. .$* .:: .s; .E- .Ed .;; .:> .F> . i o .a. 23 .$; . f- . Z z -f' g: r u E, 'E

L- g, 2; "+ 1: :- 5' j x E" E- Z Y E' um 6- $1 1: .E: .I: 3" .b' ,? g - ,j ,:' ,e_ .e* .EL .?, .pz .5> c .Er 0- '3' c .r

. j X . E > .p- 3 1 3 I I_ L ;I 1 .1 .E>

EL k- E Y 6' g jE E: 3' ZA I:: E- & EL 3: 2 e' c u

8, E- s= 5 , $x 3' y .I> g j; g; E: E 1: p 8: : : 5 2 s = :$ : - '1: Z' 'i 1 1 1: .!: .E: .E< . j ; 15- 2: . p .& .i; .E: .$- .E" .& .:> .? . $ x .g" .:: f - 1 .:- .E" .5- ,p! 5 5% 1 .$; .iY .%- .e id EO $, % X $: 85' E- io 1' 2- + e> E, e 20 u - s $ 3. 5: ;- E 1 I 1: .l" c .E- r> .[. +' .: .:- . .+a .EY .E' . I - .$I .$: .f- .E .3: .3= .I' .E- .& .$: .Iu -1 ir E E .&t-.P:.I - :a E' EL e- 5: E; 30 $5 g g go 5: E: 1: : X 5 k E E : L E >

I: c f; c * > I - x ;= u 1: $1 {L $ .f: t- 0- 3- 5" !: .[ 11: .k; 5- f: I- :-

5- 1" E, t E- &,I= .E' 8: ,?: E, I" , B - j: ,I 4 i= 1: 1: g % g: ba 5- 3; 1: e= 1 4 1 e E E:.[: s; E; :E C. 8- < r 'e,

.e, f .p $ $2 .E::. ; .E; .Ew i$ .[ .I: .g: 5 !?I f. E - f: E' 2: 8- ?> 7: 5; 3; E * E, :> 8: r $ E f- ':: 'F- 9- i 1;:';- &' 1 $ g 23 g, I u 2. 8- E E. E:

.U

u +' 1 $ %; i x 8, 3; c c c C Y

5< , g x e 0 5 > + 0 - c E t "U S 8 !i> e >

6 p" p E' 5- s I' r

e- 8 :- u 3 E= 30 'E ':

c-

c E- r e r 3- L' 5, C Y r r

.$ .e: E. E, x c u e- c

" c -3' 2- s== E" jr 5; '5' E' t: ; uc E, Ed $, Z := <> E, ' L p FL !& I : -

?; 2- e- EL Ed .t . c

+, E a z 'E- E x ' 8 f: . s; " 5. e- 2: p e- j u n :- s '!" <> 'k- "" e' :> I 1: G : . p { J .I jji. 6 .$ . j : .j: 1 1- r, L3

P .Z" .+" .E, .-- .*' 2. .u q .j- .8: E> <>

I st- z: i- 5, 5: E - 2: I:: 4- $, E- 5' E' 5, 3 9 g: ;> E: g: $ !& ;: :- & !: ; $ $* ;. $ $8" $: I-

E

% 0 Y a M

E .- a

I 8 E Y L

v) ; 3 U 0 2 E

I

PI

c - U

z 4 w 3

Lr i!

I - : z : z j ~ g g ~ g " " " " ~~~z $ $ $ $ $ g I - I z g g g g E s g $ - - " P l Z S I

412 R. J. Hoffmann, J. L. Boore and W. M. Brown

shown in Figure 4. Together these sequences fix the locations of all of the genes and unassigned sequences in the genome. Gene products are identified above the sequences. Translations of protein genes are shown with single letter amino acid codes centered below the second nucleotide of their corresponding codons; they include the extensions of reading frames for COI, COIII and ND1, which are marked with

curly brackets. Unsequenced regions between seg- ments all fall within protein genes, so that the 5’ portions of incompletely sequenced genes are fol- lowed in one segment by their 3’ portions in the following segment. The BamHI site used to make the original X EMBL3 clone begins at nucleotide 1243 in Segment #1 , within the s-rRNA gene. N = unidenti- fied nucleotides. x = unidentified amino acids.