Title NUCLEOTIDE SEQUENCE ANALYSIS OF CHLOROPLAST DNA FROM A LIVERWORT, MARCHANTIA POLYMORPHA L. (Dissertation, full text)
Author(s) Fukuzawa, Hideya
Citation Kyoto University (京都大学)
Issue Date 1986-11-25
URL http://dx.doi.org/10.14989/doctor.k3630
Right
Type Thesis or Dissertation
Textversion author
Kyoto University
CONTENTS

INTRODUCTION
CHAPTER I    Molecular cloning of promoters functional in Escherichia coli from chloroplast DNA
CHAPTER II   Structure and gene organization of the chloroplast genome
    II-1  Transfer RNA genes
    II-2  Genes for photosynthetic polypeptides
    II-3  Genes for ribosomal proteins and α subunit of RNA polymerase (rpoA)
    II-4  Putative gene ndh3 and unidentified open reading frames
CHAPTER III  Split gene for chloroplast ribosomal protein S12
REFERENCES
SUMMARY
LIST OF PUBLICATIONS
ACKNOWLEDGMENT
ABBREVIATIONS

DNA            deoxyribonucleic acid
E. coli        Escherichia coli
IPTG           isopropyl-β-D-thiogalactopyranoside
IR             inverted repeat
LSC            large single copy
M. polymorpha  Marchantia polymorpha L.
OD             optical density
ONPG           o-nitrophenyl-β-D-galactopyranoside
ORF            open reading frame
RNA            ribonucleic acid
Rubisco        ribulose-1,5-bisphosphate carboxylase/oxygenase
SD             Shine-Dalgarno
SDS            sodium dodecyl sulfate
SSC            small single copy
Tris           tris(hydroxymethyl)aminomethane
bp             base pairs
kb             kilobase pairs
kd             kilodaltons
mRNA           messenger RNA
rRNA           ribosomal RNA
tRNA           transfer RNA

Gene symbols

atp            genes for subunits of H+-ATP synthase
infA           gene for initiation factor 1
ndh            ORFs homologous to human mitochondrial NADH dehydrogenase
pet            genes for photoelectron transfer polypeptides
psa            genes for photosystem I chlorophyll a apoproteins
psb            genes for photosystem II chlorophyll a apoproteins
rbcL           gene for the large subunit of Rubisco
rpl            genes for ribosomal proteins of the 50S subunit
rpo            genes for subunits of RNA polymerase
rps            genes for ribosomal proteins of the 30S subunit
rrn            genes for ribosomal RNAs
trn            genes for transfer RNAs
INTRODUCTION
The chloroplast is the organelle of plants and green algae that houses the
photosynthetic apparatus and its own housekeeping machinery. Chloroplasts and
other plastids contain their own autonomously replicating genome, which consists
of double-stranded, covalently closed circular DNA. In the 1960s, DNA molecules
were visualized in chloroplasts by electron microscopy (Ris and Plaut 1962,
Sager and Ishida 1963). In 1976, maize chloroplast DNA was analyzed with restriction
enzymes and physically mapped as a circle of about 140,000 nucleotide pairs
(Bedbrook et al. 1976). This genome included two large inverted repeat regions
(IRA and IRB) of 22 kb, each coding for a set of ribosomal RNA genes. These IR
regions were separated by large and small single-copy regions (LSC and SSC regions).
The chloroplast DNA of most plants and green algae has this general structure, except
for a few legumes (Pisum sativum and Vicia faba) and Euglena gracilis. Pea
chloroplast DNA has no large repeat regions, and Euglena gracilis chloroplast DNA has
tandemly repeated regions coding for ribosomal RNA operons. Among chloroplast DNAs
of the inverted repeat type from green plants, one of the smallest is that of
Marchantia polymorpha (121 kb) with 10 kb IR sequences (Ohyama et al. 1983), whereas
that of Chlamydomonas reinhardii is the largest (195 kb) with 21 kb IR sequences
(Rochaix and Malnoe 1978).
The majority of proteins present in chloroplasts are encoded by nuclear DNA, but
the rest are encoded by chloroplast DNA and synthesized by the chloroplast
transcription-translation machinery. The nucleotide sequences of many chloroplast
genes from various plant species have been determined (Crouse et al. 1985). The
first gene for a chloroplast protein to be sequenced was that
for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL)
(McIntosh et al. 1980). Since that time, a sizable number of plastid genes for
proteins have been sequenced, as summarized in Table 1. However, the complete
sequence has not yet been determined for any plant species.
Table 1. Genes for proteins coded by chloroplast genome.

Gene    Protein product                                    Plant source               Reference
rbcL    Ribulose-1,5-bisphosphate carboxylase/oxygenase    Maize                      McIntosh et al. 1980
                                                           Spinach                    Zurawski et al. 1981
                                                           Tobacco                    Shinozaki & Sugiura 1982
                                                           Anabaena                   Curtis & Haselkorn 1983
                                                           Chlamydomonas reinhardii   Dron et al. 1983
                                                           Synechococcus              Reichelt & Delaney 1983
                                                           Anacystis nidulans         Shinozaki et al. 1983
                                                           Rhodospirillum rubrum      Nargang et al. 1984
                                                           Barley                     Zurawski et al. 1984
                                                           Euglena gracilis           Gingrich & Hallick 1985
psaA    PS I, P700 apoprotein                              Maize                      Fisch et al. 1985
                                                           Spinach                    Kirsch et al. 1986
psaB    PS I, P700 apoprotein                              Maize                      Fisch et al. 1985
                                                           Spinach                    Kirsch et al. 1986
psbA    PS II, "32 kd" protein                             Nicotiana debneyi          Zurawski et al. 1982
                                                           Spinach                    Zurawski et al. 1982
                                                           Amaranthus hybridus        Hirshberg & McIntosh 1983
                                                           Soybean                    Spielmann & Stutz 1983
                                                           Anabaena                   Curtis & Haselkorn 1984
                                                           Chlamydomonas reinhardii   Erickson et al. 1984
                                                           Pea                        Oishi et al. 1984
                                                           Solanum nigrum             Goloubinoff et al. 1984
                                                           Euglena gracilis           Keller & Stutz 1984
                                                           Nicotiana tabacum          Sugita & Sugiura 1984
                                                           Mustard                    Link & Langridge 1984
psbB    PS II, P680 "51 kd" protein                        Spinach                    Morris & Herrmann 1984
psbC    PS II, "44 kd" protein                             Spinach                    Alt et al. 1984
                                                           Spinach                    Holschuh et al. 1984
psbD    PS II, "D2 polypeptide"                            Chlamydomonas reinhardii   Rochaix et al. 1984
                                                           Pea                        Rasmussen et al. 1984
                                                           Spinach                    Alt et al. 1984
                                                           Spinach                    Holschuh et al. 1984
psbE    PS II, cytochrome b-559                            Spinach                    Herrmann et al. 1984
                                                           Oenothera hookeri          Carrillo et al. 1986
                                                           Nicotiana tabacum          Carrillo et al. 1985
                                                           Wheat                      Hird et al. 1986
psbF    PS II, cytochrome b-559 (URF39)                    Spinach                    Herrmann et al. 1984
                                                           Oenothera hookeri          Carrillo et al. 1986
                                                           Nicotiana tabacum          Carrillo et al. 1986
                                                           Wheat                      Hird et al. 1986
psbG    PS II, G-protein                                   Maize                      Steinmetz et al. 1986
petA    Cytochrome f preprotein                            Pea                        Willey et al. 1984
                                                           Spinach                    Alt et al. 1984
                                                           Wheat                      Willey et al. 1984
                                                           Oenothera hookeri          Tyagi & Herrmann 1986
petB    Cytochrome b6                                      Spinach                    Heinemeyer et al. 1984
petD    Cytochrome b6/f complex, subunit 4                 Pea                        Phillips & Gray 1984
                                                           Spinach                    Heinemeyer et al. 1984
atpA    H+-ATP synthase, alpha subunit                     Tobacco                    Deno et al. 1983
atpB    H+-ATP synthase, beta subunit                      Maize                      Krebbers et al. 1982
                                                           Spinach                    Zurawski et al. 1982
                                                           Tobacco                    Shinozaki & Sugiura 1983
                                                           Barley                     Zurawski & Clegg 1984
atpE    H+-ATP synthase, epsilon subunit                   Maize                      Krebbers et al. 1982
                                                           Spinach                    Zurawski et al. 1982
                                                           Tobacco                    Shinozaki et al. 1983
                                                           Barley                     Zurawski & Clegg 1984
atpF    H+-ATP synthase, subunit I                         Wheat                      Bird et al. 1985
                                                           Tobacco                    Shinozaki et al. 1986
atpH    H+-ATP synthase, subunit III                       Wheat                      Howe et al. 1982
                                                           Spinach                    Alt et al. 1983
                                                           Tobacco                    Deno et al. 1984
atpI    H+-ATP synthase, subunit IV                        Pea                        Cozens et al. 1986
rpl2    50S ribosomal protein 2                            Nicotiana debneyi          Zurawski et al. 1984
                                                           Spinach                    Zurawski et al. 1984
rpl16   50S ribosomal protein 16                           Spirodela oligorhiza       Posno et al. 1986
rps4    30S ribosomal protein 4                            Maize                      Subramanian et al. 1983
rps7    30S ribosomal protein 7                            Euglena gracilis           Montandon & Stutz 1984
rps11   30S ribosomal protein 11                           Spinach                    Muller et al. 1986
rps12   30S ribosomal protein 12                           Euglena gracilis           Montandon & Stutz 1984
rps14   30S ribosomal protein 14                           Marchantia polymorpha      Umesono et al. 1984
                                                           Spinach                    Kirsch et al. 1986
rps16   30S ribosomal protein 16                           Tobacco                    Shinozaki et al. 1986
rps19   30S ribosomal protein 19                           Tobacco                    Sugita & Sugiura 1983
                                                           Nicotiana debneyi          Zurawski et al. 1984
                                                           Spinach                    Zurawski et al. 1984
infA    Initiation factor 1                                Spinach                    Muller et al. 1986
tufA    Elongation factor Tu                               Euglena gracilis           Montandon & Stutz 1983
rpoA    RNA polymerase α subunit                           Spinach                    Muller et al. 1986
rpoB    RNA polymerase β subunit                           Tobacco                    Ohme et al. 1986
The chloroplast DNAs from common bean, soybean, the fern Osmunda and the alga
Chlamydomonas reinhardii have been shown to exist as a 50:50 mixture of two
genetically identical but physically distinct molecules, called isomers, which differ
only in the relative orientation of their single-copy regions (Palmer 1983). In the
case of M. polymorpha, isomers have been identified by fragmentation with PstI and
Southern hybridization analysis of the chloroplast DNA with cloned DNA fragments, as
shown in Fig. 1. 32P-labeled DNA fragments derived from either a single-copy region
(probe 1) or an inverted repeat (probe 2) were hybridized to the chloroplast DNA segments
generated by BamHI or PstI restriction enzyme (Fig. 2). Probe 2, containing a DNA
segment of an inverted repeat, hybridized with the four largest fragments (P1, P1',
P2 and P2') produced by PstI. On the other hand, probe 1, containing a DNA segment
of a single-copy region, hybridized with only two fragments (P1 and P1'). Thus two
types of DNA molecules exist in equal amounts in M. polymorpha chloroplasts. It
would be interesting to know whether events in DNA replication are related
to the formation of isomers, but the mechanisms of flip-flop recombination have not
yet been clarified.
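The reasoning behind the hybridization result can be sketched with a toy model. The following is only an illustration under stated assumptions: every length and restriction-site position is hypothetical (none is taken from the M. polymorpha PstI map), the enzyme is assumed not to cut inside the inverted repeats, and the function names are invented for this sketch. It shows why an inverted-repeat probe detects four fragments in a mixture of two isomers while a single-copy probe detects only two.

```python
# Toy model of flip-flop isomers in an inverted-repeat chloroplast genome.
# All lengths (kb) and site positions are hypothetical illustration values.

def ir_spanning_fragments(lsc_to_ira, lsc_to_irb, ir_len, ssc_len, ssc_site):
    """Return the sizes (kb) of the fragments that span each inverted repeat.

    lsc_to_ira / lsc_to_irb: distance from the nearest LSC site to IRA / IRB.
    ssc_site: distance of the single SSC site from the IRA-side end of the
              SSC region in isomer I.
    """
    near, far = ssc_site, ssc_len - ssc_site
    return {
        "I": {"IRA": lsc_to_ira + ir_len + near,
              "IRB": lsc_to_irb + ir_len + far},
        # In isomer II the SSC region is inverted, so its two arms swap.
        "II": {"IRA": lsc_to_ira + ir_len + far,
               "IRB": lsc_to_irb + ir_len + near},
    }

def probe_hits(fragments, probe):
    """Sizes of fragments detected by an 'IR' probe or an 'SSC-near' probe."""
    hits = set()
    for isomer, frags in fragments.items():
        if probe == "IR":
            # Both repeat copies are detected in both isomers.
            hits.update(frags.values())
        elif probe == "SSC-near":
            # The 'near' SSC arm adjoins IRA in isomer I and IRB in isomer II.
            hits.add(frags["IRA"] if isomer == "I" else frags["IRB"])
        else:
            raise ValueError(f"unknown probe: {probe}")
    return sorted(hits)

frags = ir_spanning_fragments(lsc_to_ira=6.0, lsc_to_irb=9.0,
                              ir_len=10.0, ssc_len=20.0, ssc_site=4.0)
print("IR probe detects:", probe_hits(frags, "IR"))
print("single-copy probe detects:", probe_hits(frags, "SSC-near"))
```

With these made-up numbers the inverted-repeat probe detects four distinct fragment sizes and the single-copy probe two, one from each isomer, which is the qualitative pattern described above.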
Chloroplast biogenesis depends on the coordinated expression of genes in both
the chloroplast genome and the nuclear genome, but the interaction between the
chloroplast and nuclear genetic systems has not been clarified. In this study, the
author set out to reveal the mechanisms of gene expression of the chloroplast
genome by molecular biological methods, as follows.
Chapter I: Chloroplast transcriptional promoters were cloned and characterized
using the E. coli gene fusion method. Chapter II: The primary structure of the M.
polymorpha chloroplast DNA was determined, and genes for transfer RNAs,
photosynthetic polypeptides, ribosomal proteins and the α subunit of RNA polymerase
were identified. Chapter III: A unique system of gene expression, which may be
called trans-splicing, was found in the biogenesis of ribosomal protein S12.
Figure 1. Southern hybridization with probes containing DNA segments of an
inverted repeat region and a single-copy region. Chloroplast DNA fragments
digested with BamHI and PstI restriction enzymes were electrophoresed in 0.6%
agarose gels (left lane of each set), transferred to nylon membrane filters and
hybridized with 32P-labeled probes. Filters were hybridized with probe 1,
containing the Bg16 segment of a single-copy region (black box in the right
figure; lane 1), and probe 2, containing the Bg21 fragment of an inverted
repeat (lane 2). Autoradiograms are shown on the right side of each electrophoretic
pattern.
Figure 2. Isomers of the chloroplast DNA from M. polymorpha and their restriction maps.
Two kinds of molecules (forms I and II) of the chloroplast DNA are shown.
IR and LS indicate the inverted repeat sequences containing ribosomal RNA genes and the gene
for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase,
respectively.
CHAPTER I Molecular cloning of promoters functional in Escherichia coli
from chloroplast DNA
Chloroplasts have their own transcriptional and translational systems, including
unique transfer RNAs, ribosomal RNAs, RNA polymerase and ribosomal proteins
different from those of the cytoplasm. Determining the locations of
transcriptional and translational regulatory regions on the chloroplast genome would
lead to a basic understanding of the mechanisms of gene expression in chloroplasts.
It has been suggested that the mechanisms of gene expression in chloroplasts are similar
to those in E. coli (Kidd and Bogorad 1979, Whitfeld and Bottomley 1983, Gruissem et
al. 1983, Gruissem and Zurawski 1985). If this is the case, E. coli plasmid vectors
can be used to clone chloroplast DNA fragments containing transcriptional start
signals that function in E. coli. Casadaban et al. constructed a plasmid in
which β-galactosidase activity is expressed upon insertion of exogenous DNA fragments
having promoter sequences, ribosome binding sites and translation start codons in
the right frame, and showed that the strength of promoters and ribosome binding
activity can be assayed by measuring the enzyme activity (Casadaban et al. 1980).
In this chapter, the author presents evidence that some chloroplast
promoters can be expressed in E. coli, using gene fusion techniques with the
β-galactosidase gene. The nucleotide sequence of the DNA fragment with the highest
enzyme activity in E. coli was determined, and the location of the promoter was
mapped on the chloroplast genome. To find out whether the chloroplast DNA fragment
in fact initiates transcription in E. coli as well as in chloroplasts, S1 nuclease
mapping experiments were done. The mechanisms of gene expression in chloroplasts
are discussed.
MATERIALS AND METHODS
DNA isolation, restriction endonuclease digestion, and hybridization
Chloroplast DNA was isolated from a suspension culture of the liverwort M. polymorpha by
the method described previously (Ohyama et al. 1982) with some modifications.
Chloroplasts were isolated from cultured cells by using a blender containing sea sand
(20-25 mesh). Restriction digestion, hybridization and plasmid DNA isolation were
carried out by the procedures described by Maniatis et al. (1982).
Cloning of chloroplast DNA fragments
Chloroplast DNA was cleaved with EcoRI, Sau3A, BglII, SmaI and BamHI restriction
endonucleases. The restricted chloroplast DNA fragments were ligated into the
dephosphorylated EcoRI, BamHI and SmaI sites of plasmid pMC1403 (Casadaban et al.
1980). The ligated DNA was used to transform competent cells of E. coli strain
MC1061 (araD139, Δ(ara, leu)7697, ΔlacX74, galU-, galK-, hsr-, hsm+, strA)
(Casadaban and Cohen 1980). Transformed cells were selected on lactose MacConkey
agar plates containing 100 µg/ml of ampicillin. Red colonies, which use lactose (lac+)
and are resistant to ampicillin, were picked and cultured in L-broth in the presence
of ampicillin.
β-Galactosidase assay
E. coli cells harboring recombinant plasmids were grown to 2-5 x 10^8 cells/ml
and β-galactosidase activities were assayed as described by Platt et al. (1972).
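As an illustration of how such an assay reading is turned into an activity value, the sketch below uses the widely used Miller-unit formula, which normalizes ONPG hydrolysis to reaction time, sample volume and culture density. This is an assumption made for illustration: Table 1 of this chapter expresses activity per mg of total protein, so the normalization actually used in this work differs, and the function name and example readings are hypothetical.

```python
def miller_units(od420, od550, od600, minutes, volume_ml):
    """Beta-galactosidase activity in Miller units (illustrative sketch).

    od420: absorbance of the reaction mixture (o-nitrophenol plus cell debris)
    od550: light scattering by cell debris, used as a correction term
    od600: density of the culture at the time of sampling
    minutes: reaction time
    volume_ml: volume of culture added to the assay
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)

# Hypothetical readings, not data from this study.
activity = miller_units(od420=0.9, od550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
print(f"{activity:.0f} Miller units")
```

Normalizing to protein instead, as in Table 1, would replace the `minutes * volume_ml * od600` denominator by reaction time multiplied by the mass of protein assayed.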
DNA sequencing
The nucleotide sequence was determined by the dideoxy chain termination method
(Sanger et al. 1977) using M13 phages mp10 and mp11 (Messing 1983). For size
markers in S1 nuclease mapping, a HincII-EcoRI fragment (see Fig. 3) was labeled
with [γ-32P]-ATP (3000 Ci/mmol, Amersham) at the 5' terminus by T4 polynucleotide
kinase after alkaline phosphatase treatment, and sequenced by the method of Maxam and
Gilbert (1977).
RNA isolation and S1 nuclease mapping
Chloroplast RNA was extracted from chloroplast pellets by the phenol-chloroform
extraction described by Yamano et al. (1984). E. coli total RNA was extracted by the
SDS-hot phenol method, and S1 mapping was performed by the procedure described by
Aiba et al. (1981) using a 5' end-labeled EcoRI-Sau3A fragment (10^5-10^6 cpm/µg) and 100
µg of RNA.
RESULTS
Cloning and characterization of chloroplast DNA fragments having promoter function
Eleven recombinants were selected on lactose MacConkey agar plates containing
ampicillin as red colonies showing β-galactosidase production in E. coli. The
β-galactosidase assays of the recombinants showed a variety of levels of activity,
as summarized in Table 1. The recombinants can be grouped into three categories by
comparison with the IPTG-induced β-galactosidase activity in the E. coli wild-type strain W3110:
1, a group with a similar or slightly higher level of the enzyme activity, such as recombinants
harboring plasmids pMP901-903; 2, a group with a rather low level of the activity,
such as recombinants carrying plasmids pMP921 and pMP954-956; and 3, a group with a
very low level of the activity, such as recombinants carrying plasmids pMP910-912 and
pMP953.
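The grouping can be made explicit with a short sketch that uses the mean activities reported in Table 1 for the eleven original recombinants. The cut-off ratios (0.9 and 0.5 of the IPTG-induced W3110 value) are assumptions chosen here only to reproduce the three categories; the text does not state numerical thresholds.

```python
# Mean beta-galactosidase activities (units) from Table 1 of this chapter.
INDUCED_W3110 = 344  # W3110 grown with IPTG

ACTIVITY = {
    "pMP901": 361, "pMP902": 328, "pMP903": 360,
    "pMP954": 200, "pMP955": 267, "pMP956": 233, "pMP921": 253,
    "pMP953": 49, "pMP910": 17, "pMP911": 73, "pMP912": 56,
}

def classify(activity, reference=INDUCED_W3110, high=0.9, low=0.5):
    """Assign group 1, 2 or 3 from activity relative to the induced control.

    The thresholds `high` and `low` are hypothetical illustration values.
    """
    ratio = activity / reference
    if ratio >= high:
        return 1
    if ratio >= low:
        return 2
    return 3

groups = {1: [], 2: [], 3: []}
for plasmid, units in ACTIVITY.items():
    groups[classify(units)].append(plasmid)

for number, members in groups.items():
    print(f"group {number}: {', '.join(sorted(members))}")
```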
Restriction analysis of the recombinant plasmids with high levels of
β-galactosidase activity (pMP901-903) showed that they carried the same 5.1 kb DNA fragment, which
corresponded to the Ec8 fragment (see Fig. 3). To confirm that the DNA fragment inserted in
plasmid pMC1403 originated from M. polymorpha chloroplast DNA, and to locate it
on the chloroplast physical map, plasmid pMP903 was labeled with [α-32P]-dCTP
by nick translation and hybridized with chloroplast DNA fragments on a
nitrocellulose filter (Fig. 1). 32P-labeled pMP903 hybridized to the chloroplast
DNA fragments Ba2, Ba4, Ba6 and Ba11 (Fig. 1B). The 1.1 kb BamHI fragment derived
from the 5.1 kb EcoRI fragment, with an additional 10 bp of pMC1403, was recloned into
pMC1403, and the recombinant plasmid was named pMP904 (Fig. 2). Southern
hybridization with chloroplast DNA fragments showed that the inserted DNA fragment
(1.1 kb) of pMP904 hybridized to Ec4 (located on the other IR region), Ec8, Ba6 and
Ba11 (see Fig. 1C). These results, together with the absence of an EcoRI cleavage site in the Ba11
fragment, indicate that the cloned DNA fragment originates from the Ba6 fragment and
is located in the large single-copy region close to the inverted repeat (IR) (Fig.
3). As the clone harboring plasmid pMP904 still expressed a high level of the
enzyme activity, the internal HincII-EcoRI fragment was further subcloned into
pMC1403 (Fig. 2). The recombinant plasmid, named pMP905, kept its high level of
β-galactosidase production. However, the recombinant plasmid pMP906, which was
constructed by deletion of the HincII-AluI fragment (positions 3 to 348) from pMP905
(see Fig. 4), gave slightly less activity than plasmids pMP904 and pMP905 (Table 1).

Table 1. β-Galactosidase activity and copy numbers in clones containing
recombinant plasmids.

Host strain       Plasmid    β-Galactosidase activity (units) a)   Relative amount of plasmid b)
W3110 (- IPTG)    -          <11                                   -
W3110 (+ IPTG)    -          344 ± 28                              -
MC1061            -          <5                                    -
MC1061            pMC1403    <6                                    1.0
MC1061            pMP901     361 ± 20                              0.9 ± 0.1
                  pMP902     328 ± 27                              0.9 ± 0.1
                  pMP903     360 ± 23                              0.9 ± 0.1
                  pMP904 c)  408 ± 39                              1.0 ± 0.1
                  pMP905 c)  365 ± 20                              0.9 ± 0.1
                  pMP906 c)  308 ± 33                              1.0 ± 0.1
MC1061            pMP954 d)  200 ± 14                              1.0 ± 0.1
                  pMP955 d)  267 ± 11                              0.9 ± 0.1
                  pMP956 d)  233 ± 24                              0.8 ± 0.2
                  pMP921     253 ± 20                              1.1 ± 0.1
MC1061            pMP953 e)  49 ± 7                                0.7 ± 0.2
                  pMP910     17 ± 3                                0.9 ± 0.1
                  pMP911     73 ± 8                                0.9 ± 0.1
                  pMP912     56 ± 11                               1.0 ± 0.1

a) Numbers are averages of three experiments and are expressed as units per mg of
total protein (Platt et al. 1972).
b) Plasmid copy numbers were measured by the method of Projan et al. (1983)
and are expressed relative to pMC1403 in E. coli strain MC1061, taken as 1.0.
c) Derived from pMP903 (see Figure 2).
d) Containing the rbcL gene.
e) Containing the β subunit gene of H+-ATP synthase.
Nucleotide sequence of a promoter and its downstream region in chloroplast DNA
As the HincII-EcoRI chloroplast DNA fragment inserted in pMP905 appeared to carry
transcriptional and translational start signals, the nucleotide sequence between the two
HincII sites was determined by the strategy shown in Fig. 3. The nucleotide
sequence of the 1192 bp chloroplast DNA fragment and the junction region between the
chloroplast DNA fragment and the lac'Z gene of pMP905 are shown in Fig. 4A and 4B,
respectively.
S1 nuclease mapping
The position in the sequence corresponding to the 5' end of the mRNA was
determined by an S1 nuclease mapping procedure. The 533 bp HincII-EcoRI fragment
(positions 4 to 536 in Fig. 4A) was labeled at the 5' ends with [γ-32P]-ATP and
digested with Sau3A restriction enzyme, generating two fragments, 66 nucleotides long
and 467 nucleotides long. The 32P-labeled 467 bp fragment was hybridized with either
E. coli RNA or chloroplast RNA at 37°C. Hybridized DNA-RNA molecules were digested
with S1 nuclease, and the length of the S1 nuclease-resistant DNA fragment was measured
by electrophoresis alongside a Maxam-Gilbert sequence ladder generated from the 467 bp
fragment, as shown in Fig. 5. The 5' end of the mRNA from both E. coli and
chloroplasts was thus mapped on the sequence to 45-46 nucleotides upstream from
the ATG translational start codon of ORF601 (Fig. 4).

Figure 1. Southern hybridization of chloroplast DNA fragments with plasmids
pMP903 (B) and pMP904 (C). Panel A shows agarose gel electrophoresis stained
with ethidium bromide. Lanes 1, 2, 3, 4 and 5 correspond to the electrophoretic
patterns of digests by EcoRI, EcoRI/BamHI, BamHI, BamHI/BglII and BglII
restriction enzymes, respectively.

Figure 2. Construction of plasmids pMP904 and pMP905 from pMP903 containing
promoter regions. Heavy lines indicate chloroplast DNA segments. The
structural gene of β-galactosidase is shown as 'Z.

Figure 3. Location of the DNA fragments having promoter function on the physical
map of chloroplast DNA, and the sequencing strategy. IR indicates the
inverted repeat region containing ribosomal RNA operons including 16S and 23S
rRNA genes. LS and β indicate the genes for the large subunit of ribulose-
1,5-bisphosphate carboxylase/oxygenase and for the β subunit of H+-ATP
synthase, respectively. A, E, Ha, Hc and S indicate cleavage sites of the
restriction enzymes AluI, EcoRI, HaeIII, HincII and Sau3A, respectively.
Bottom arrows indicate sequencing strategies using the M13 phages mp10 and mp11.

Figure 4. Nucleotide sequence of a DNA fragment having promoter function
derived from liverwort chloroplast DNA. The nucleotide sequence of the 1192 bp
HincII fragment (Fig. 3) is shown in (A). Possible stem-and-loop structures
are shown by broken lines with arrows. Bold vertical arrows indicate the
start sites of mRNA transcription in E. coli as well as in chloroplasts. The
regions of "-35", "-10", the Shine-Dalgarno sequence (SD), and the genes for
tRNAIle(CAU) and β-galactosidase are boxed. Deduced amino acid sequences of
ORF601, 602 and 603 are shown under the nucleotide sequence by single-letter
symbols. Double underlines indicate termination codons. The N-terminal region of
the fused protein between ORF601 and β-galactosidase coded by pMP903-906 is shown in (B).

Figure 5. S1 nuclease mapping with E. coli RNA (A) and chloroplast RNA (B) of the
promoter region. The S1 nuclease-protected DNA fragments (lanes 1, 2 and 3) were
electrophoresed in parallel with the Maxam-Gilbert sequence ladders. Lane 1 (A),
and lanes 1, 2 and 3 (B) correspond to S1 nuclease concentrations of 5000, 50,
500 and 5000 units per reaction, respectively. The S1 mapping in lane 1 (B) should be read
as A, because of the smiling pattern of the gel electrophoresis. An arrow
indicates the direction of transcription.
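The fragment sizes quoted for the S1 mapping probe can be checked with simple coordinate arithmetic. The sketch below is a minimal illustration: the helper `digest` is hypothetical, and the Sau3A cut position (after nucleotide 69) is an assumption chosen only because it reproduces the reported 66 and 467 nucleotide fragments; the text itself does not give the coordinate of the site.

```python
def digest(start, end, cut_after):
    """Lengths of the pieces obtained by cutting the inclusive interval
    [start, end] after each position listed in cut_after."""
    lengths = []
    left = start
    for cut in sorted(cut_after):
        if not start <= cut < end:
            raise ValueError(f"cut position {cut} lies outside the fragment")
        lengths.append(cut - left + 1)
        left = cut + 1
    lengths.append(end - left + 1)
    return lengths

# HincII-EcoRI fragment, positions 4 to 536 in Fig. 4A.
whole = digest(4, 536, [])
# Assumed Sau3A cut after position 69 (hypothetical coordinate).
pieces = digest(4, 536, [69])
print(whole, pieces)
```

Under that assumption the two pieces add up to the 533 bp of the intact fragment, as they must.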
DISCUSSION
DNA fragments functional in the E. coli transcription and translation system were
cloned from M. polymorpha chloroplast DNA. Several chloroplast genes have previously been
cloned into E. coli plasmids. Expression of the rbcL gene was observed in an
in vitro coupled transcription-translation system derived from E. coli. Gatenby
et al. (1981) reported that rbcL genes from maize and wheat chloroplasts were expressed in
E. coli. Kong et al. (1984) reported the cloning of promoter-containing restriction
fragments from Nicotiana chloroplast DNA and the location of the fragments on the
chloroplast genome.
In this study, 11 recombinants were obtained by selection on lactose
MacConkey plates as red colonies. These clones, however, varied in their enzyme
activity (Table 1). As plasmid copy numbers did not differ much among the clones,
this variation may reflect the efficiency of the transcriptional start signals and
ribosome binding activities in the chloroplasts. For instance, recombinants
carrying plasmids pMP954-956 (containing the rbcL promoter) had a rather high level
of the enzyme activity, as expected. A recombinant harboring plasmid pMP953
(containing the promoter region of the β subunit gene of H+-ATP synthase) showed
quite a low level of the activity. These results agree with the fact that
mRNA synthesis from the β subunit gene is considerably lower than that from the rbcL gene
(Shinozaki et al. 1983). Therefore, the efficiency of the E. coli transcriptional
system may reflect that of transcription in chloroplasts.
The plasmid pMP905, carrying the promoter region of an unidentified open reading
frame named ORF601, gave the highest level of enzyme activity in E. coli. Analysis
of the nucleotide sequence of the promoter and its downstream region revealed that the
translational initiation codon (ATG) of ORF601 lies 38 bp upstream from the
EcoRI site and that its open reading frame is fused to the lac'Z gene in the right frame
(see Fig. 4B). Twelve bp upstream from the ATG codon, a sequence (TAAaaAG) partially
complementary to the 3' end of E. coli 16S rRNA (Shine-Dalgarno (SD) sequence)
(Shine and Dalgarno 1974) was found. There was a typical sequence for the
transcriptional promoter signal (TATAAT), called the Pribnow box (Pribnow 1975) or
"-10" region, 53 bp upstream from the ATG codon. There was also a sequence
(aTTGAat) 82 bp upstream, called the "-35" region, which is thought to be an RNA
polymerase recognition site in E. coli (Takanami et al. 1976). In addition, three
possible stem-loop structures can be formed between the "-35" region and the SD-like
sequence, as indicated by underlining with arrows in Fig. 4A. Furthermore, at positions
493-566, a tRNA gene whose anticodon is CAU was identified by its ability to form the typical
secondary structure as shown in Fig. 6. This tRNA gene has 94.6% homology (70/74)
with the spinach chloroplast isoleucine tRNA gene (Francis and Dudock 1982) located in the
IR regions. This tRNA gene was therefore identified as coding for isoleucine tRNA in M.
polymorpha chloroplasts. The highly active promoter may serve tRNAIle(C*AU) and
the downstream ORFs in the LSC region. Three open reading frames were
identified downstream from the promoter region and designated ORF601, ORF602 and
ORF603. Their possible gene products were estimated to be 73, 91 and >52 amino acid
residues, respectively. A typical SD sequence (AGGAG) was found seventeen bp
upstream from the ATG codon of ORF602, and a stem structure can be formed between
the SD sequence and the ATG codon, as reported for the Chlamydomonas rbcL gene (Dron et
al. 1983). This stem structure may serve to bring the SD sequence close to the
ATG codon. ORF602 and ORF603 were identified as putative genes for proteins
corresponding to E. coli ribosomal proteins L23 and L2, respectively. Detailed data
are presented in Chapter II.
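The kind of search described above, locating open reading frames and a Shine-Dalgarno-like motif upstream of the start codon, can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the function names are hypothetical, only the forward strand is scanned, and the test sequence is an invented toy rather than any part of the sequence in Fig. 4.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return ORFs (ATG to stop) on the forward strand, in all three frames.

    Each ORF is a dict with the 0-based start index, the number of codons
    before the stop, and whether a stop codon was reached ('complete').
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            complete = j + 3 <= len(seq)  # False if the ORF runs off the end
            n_codons = (j - i) // 3
            if n_codons >= min_codons:
                orfs.append({"start": i, "codons": n_codons, "complete": complete})
            i = j + 3
    return orfs

def sd_distance(seq, atg_index, motif="AGGAG", window=25):
    """Distance (bp) from the start of the nearest upstream motif to the ATG,
    or None if the motif is absent from the search window."""
    upstream_start = max(0, atg_index - window)
    pos = seq.upper().rfind(motif, upstream_start, atg_index)
    return None if pos == -1 else atg_index - pos

# Invented toy sequence: SD-like motif, spacer, then a short ORF.
toy = "CCAGGAGTTTTTTT" + "ATGAAACCCGGGTAA" + "CC"
orfs = find_orfs(toy)
print(orfs)
print(sd_distance(toy, orfs[0]["start"]))
```

An ORF that reaches the end of the sequenced fragment without a stop codon is reported with `complete` set to False, which corresponds to a product length given only as a lower bound, as for ORF603.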
Nucleotide sequence analysis revealed that the organization of this chloroplast
promoter is similar to that of E. coli promoters. Furthermore, the results of S1
mapping using in vivo transcripts from both E. coli and chloroplasts showed that
transcription starts at almost the same position downstream from the promoter
region. The gene fusion method described here could therefore be a powerful technique to
clone and characterize the promoters on the chloroplast genome.
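To illustrate what similarity to E. coli promoters means in sequence terms, the sketch below scans a sequence for a "-35"-like hexamer and a "-10"-like hexamer separated by a suitable spacer. The consensus hexamers TTGACA and TATAAT and the 16-18 bp spacing are the textbook values for the major E. coli promoter class; the mismatch limits, the function names and the test sequence are assumptions made for this example, and the toy sequence is not taken from Fig. 4.

```python
MINUS35 = "TTGACA"  # E. coli "-35" consensus
MINUS10 = "TATAAT"  # E. coli "-10" (Pribnow box) consensus

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def scan_promoters(seq, max_mm35=2, max_mm10=1, spacers=(16, 17, 18)):
    """Return (pos35, pos10, spacer, mm35, mm10) for each candidate promoter.

    Positions are 0-based indices of the first base of each hexamer.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm35:
            continue
        for spacer in spacers:
            k = i + 6 + spacer
            if k + 6 > len(seq):
                continue
            mm10 = mismatches(seq[k:k + 6], MINUS10)
            if mm10 <= max_mm10:
                hits.append((i, k, spacer, mm35, mm10))
    return hits

# Invented test sequence: a TTGAAT "-35"-like hexamer, a 17 bp GC-rich spacer
# and a perfect TATAAT box.
toy = "GG" + "TTGAAT" + "CCGCGGCCGCGGCCGCG" + "TATAAT" + "GGCC"
hits = scan_promoters(toy)
print(hits)
```

A scan of this kind only flags candidates; in this chapter the functional evidence comes from the β-galactosidase fusions and the S1 mapping of the transcript 5' end.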
Figure 6. Secondary structure of M. polymorpha chloroplast tRNAIle(C*AU)
deduced from the DNA sequence. The nucleotides CCA at the 3' terminus are not
encoded by the chloroplast genome. C* is a putative hypermodified base.
CHAPTER II Structure and gene organization of the chloroplast genome
To understand the genetic system in the chloroplast, the nucleotide sequence of
the chloroplast DNA of the liverwort M. polymorpha was determined. The M. polymorpha
chloroplast DNA has been physically mapped previously (Ohyama et al. 1983). The
gene for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase has
been mapped on the chloroplast genome by heterologous hybridization with the tobacco
rbcL gene. The genes for ribosomal RNAs (23S, 16S, 5S and 4.5S) have also been
localized in the inverted repeat regions (Ohyama et al. 1983, Yamano et al. 1984,
Yamano et al. 1985). The overall gene organization deduced from the complete
nucleotide sequence is described by Ohyama et al. (1986). In this study, the properties
and characterization of genes in the LSC region (from psbG to the 16S rRNA gene: 30,600
bp) deduced from the nucleotide sequence are presented and discussed. The region
analyzed in this study is shown as a black box with physical maps in Fig. 1. In the
region described here, putative genes for seven tRNAs, ten photosynthetic
polypeptides, thirteen ribosomal proteins and the α subunit of RNA polymerase were
identified. In addition, an open reading frame (ORF) was found to show significant
amino acid sequence homology to a subunit of NADH dehydrogenase in human
mitochondria.
MATERIALS AND METHODS
Chloroplast DNA was isolated from a cell suspension culture of M. polymorpha as
described previously (Ohyama et al. 1982). Chloroplast DNA was cloned into E. coli
plasmid vectors: pBR322, pKC7, pUC13, pUC18 and pUC19. Recombinant plasmids used
for the nucleotide sequencing are summarized in Table 1. The locations of
chloroplast DNA fragments used in sequence determination are shown in Fig. 2. Each
plasmid was sonicated (Deininger 1983) with a TOMY handy sonicator and randomly cloned
into the SmaI or HincII sites of phage mp18 and mp19 (Perron et al. 1981). Recombinant
phages containing chloroplast DNA fragments were screened by dot-hybridization with
the chloroplast DNA (Hu and Messing 1982). The shotgun libraries obtained were used for
Figure 1. Restriction map of the M. polymorpha chloroplast DNA. Narrow lines
with arrowheads inside the circular map indicate inverted repeat regions. LS
indicates the site of the gene for the large subunit of ribulose-1,5-
bisphosphate carboxylase/oxygenase (rbcL). The region sequenced in this study
is shown as a black box.
Figure 2. Gene organization of the chloroplast genome from a liverwort, M. polymorpha, and sequenced restriction fragments. Thick lines indicate the
inverted repeats (IRA and IRB). SSC and LSC indicate the small single-copy region and large single-copy region, respectively. Genes shown outside the
map are transcribed anticlockwise, and those inside are transcribed clockwise.
The tRNA genes are identified by the one-letter amino acid code with their
anticodons given in parentheses. Asterisks indicate genes having introns in
their sequences (Ohyama et al. 1986). Restriction fragments used in sequence determination are shown outside the genetic map.
Table 1. Recombinant plasmids used for the nucleotide sequence determination.

Plasmid   Fragment a) (kb)   Position b)       Vector   Cloning site
pMP593    Bg10 (3.8)         51611 - 55380     pKC7     BglII
pMP228    B5 (10.3)          47155 - 57496     pBR322   BamHI
pMP452    Bg8 (6.2)          55380 - 61569     pKC7     BglII
pMP708    P8 (5.3)           57082 - 62368     pUC18    PstI
pMP713    Bg13 (2.9)         61569 - 64474     pUC18    BamHI
pMP727    P6 (7.0)           62368 - 69315     pBR322   PstI
pMP310    Bg5 (7.5)          65513 - 73011     pKC7     BglII
pMP710    P10 (3.7)          69804 - 73521     pUC18    PstI
pMP376    Bg3 (11.4)         73011 - 84425     pKC7     BglII
pMP?06    B6 (5.8)           76397 - 82182     pBR322   BamHI

a) Correspond to chloroplast DNA fragments generated by the restriction enzymes
BglII (Bg), BamHI (B) and PstI (P).
b) Counted from the first nucleotide of the LSC region next to IRA.
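The position ranges in Table 1 can be checked for continuous coverage of the sequenced region. The following is a minimal sketch, assuming a subset of the fragments whose coordinates are clearly legible in Table 1; the function names are hypothetical and the check is an illustration, not part of the original procedure.

```python
# Position ranges (bp) of some cloned fragments, copied from Table 1.
# Only entries with clearly legible coordinates are used (an assumption).
FRAGMENTS = {
    "B5": (47155, 57496),
    "Bg8": (55380, 61569),
    "Bg13": (61569, 64474),
    "P6": (62368, 69315),
    "Bg5": (65513, 73011),
    "P10": (69804, 73521),
    "Bg3": (73011, 84425),
}


def fragment_kb(start, end):
    """Fragment size in kb, counting both end positions."""
    return (end - start + 1) / 1000


def merge_ranges(ranges):
    """Merge overlapping or touching (start, end) ranges into contiguous spans."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Extend the current span when the next fragment overlaps or abuts it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    for name, (start, end) in FRAGMENTS.items():
        print(f"{name}: {fragment_kb(start, end):.1f} kb")
    print(merge_ranges(FRAGMENTS.values()))
```

A single merged span means the listed fragments overlap without gaps. Sizes computed from the positions agree only approximately with the kb values in the table, which were presumably estimated independently.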
DNA sequence determination by the chain termination method (Sanger et al. 1977)
using universal primers M3 and M4 (Takara Shuzo Ltd.) and buffer gradient gels
(Biggin et al. 1983). The DNA sequence data read from autoradiograms were handled
on a PC-9801 personal computer using the software DNASIS (HITACHI SK Ltd.). ORFs
deduced from the nucleotide sequences were searched for amino acid sequence
homologies against a protein database (NBRF release 6.0) using the search algorithm
described by Wilbur and Lipman (1983). Previously published amino acid sequences of
polypeptides in other plant species were also used in the homology search.
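The deduction of ORFs from a nucleotide sequence can be illustrated with a short scan of both strands in all three reading frames. This is a minimal sketch, assuming an ORF runs from the first in-frame ATG to the next in-frame stop codon (TAA, TAG or TGA); the function names and the `min_codons` cutoff are hypothetical, and this is not the DNASIS procedure used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=1):
    """Yield 0-based half-open (start, end) spans of ATG...stop ORFs in one strand.

    For each stop codon, only the first in-frame ATG after the previous
    stop is reported, i.e. the longest ORF ending at that stop.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=1):
    """Return (strand, from, to) in 1-based inclusive plus-strand coordinates.

    For the minus strand, from > to, following the convention of a gene
    table that lists the first coding nucleotide first.
    """
    seq = seq.upper()
    n = len(seq)
    found = [("+", s + 1, e) for s, e in orfs_in_strand(seq, min_codons)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_codons):
        found.append(("-", n - s, n - e + 1))
    return found
```

For example, `find_orfs("CCATGAAATAGCC")` reports one plus-strand ORF spanning positions 3 to 11, and the reverse complement of that string gives the same ORF on the minus strand with the coordinates reversed.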
RESULTS
Gene organization in the LSC region (psbG-16S rRNA gene)
The nucleotide sequence (30,600 bp) from the BglII site (position 51611) to the
BamHI site (82188), covering the junction (JLB) between the LSC and IRB regions, was
determined. A computer search of the DNA sequence led to the identification of ORFs
that begin with the translation initiation codon AUG and end at any of the
termination codons (UAA, UAG and UGA). The gene organization deduced from the
nucleotide sequence is shown schematically in Fig. 3. The LSC region sequenced in
this study was divided into seven blocks (Fig. 3 A-G) depending on the directions of
the translational orientation (shown by horizontal arrows in Fig. 3). The
Figure 3. Detailed gene organization of the region sequenced in this study.
The coding regions of genes are indicated by bold lines. Introns (intervening
sequences) are shown as hatched boxes. Genes shown above the lines are transcribed to the right, and those under the lines are transcribed to the left. The
sequence files are indicated by arrows with the names of the sequence files. JLB
indicates the junction site between the LSC region and the IRB region.
Each line represents 10 kb.
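The division of the sequenced region into blocks by translational orientation can be sketched as a grouping of consecutive protein-coding genes by strand. The example below is an illustration under stated assumptions: it uses only the named protein-coding genes of Table 2 (unnamed ORFs are omitted for brevity), takes the lower coordinate of each gene as its map position, and leaves out the tRNA genes, since trnM(CAU), trnW(CCA) and trnP(UGG) lie on the strand opposite to the protein-coding genes around them. The function name is hypothetical.

```python
from itertools import groupby
from string import ascii_uppercase

# (gene, strand, lower coordinate in bp), taken from Table 2.
GENES = [
    ("psbG", "-", 51793), ("ndh3", "-", 52515), ("atpE", "-", 53955),
    ("atpB", "-", 54368), ("rbcL", "+", 56355), ("petA", "+", 61641),
    ("psbF", "-", 63174), ("psbE", "-", 63303), ("rpl33", "+", 65273),
    ("rps18", "+", 65498), ("rpl20", "-", 65807), ("rps12 exon 1", "-", 66944),
    ("psbB", "+", 69026), ("petB", "+", 71424), ("petD", "+", 72715),
    ("rpoA", "-", 73802), ("rps11", "-", 74857), ("secX", "-", 75300),
    ("infA", "-", 75450), ("rps8", "-", 75773), ("rpl14", "-", 76253),
    ("rpl16", "-", 76719), ("rps3", "-", 77743), ("rpl22", "-", 78445),
    ("rps19", "-", 78822), ("rpl2", "-", 79137), ("rpl23", "-", 80550),
]


def orientation_blocks(genes):
    """Group genes, in map order, into runs that share the same strand."""
    ordered = sorted(genes, key=lambda g: g[2])
    blocks = []
    runs = groupby(ordered, key=lambda g: g[1])
    for label, (strand, run) in zip(ascii_uppercase, runs):
        blocks.append((label, strand, [name for name, _, _ in run]))
    return blocks


if __name__ == "__main__":
    for label, strand, names in orientation_blocks(GENES):
        print(label, strand, ", ".join(names))
```

With these data the grouping yields seven runs of alternating orientation, labelled A to G, which is consistent with the block assignment of these genes in Table 2.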
Table 2. List of identified genes and open reading frames, and their loci on the chloroplast genome.

Gene               From    To      Length  Amino acid  M.W.      Comments
                                   (bp)    residues
(A)
atpB             - 55846   54368   1479    492         53179.3   88.4% (Spi), 62.8% (Eco)
atpE             - 54362   53955   408     135         15054.3   63.0% (Spi), 22.2% (Eco)
trnM(CAU)        + 53801   53874   74                            94.6% (Tob)
*trnV(UAC)       - 53652   53051   602                           91.1% (Tob)
  (Exon 1)         53652   53616   37
  (intron)         53615   53086   530
  (Exon 2)         53085   53051   35
ndh3             - 52877   52515   363     120         14188.7   30.8% (mit)
psbG             - 52524   51793   732     243         27609.6   62.1% (Mz)
ORF169           - 51742   51233   510     169         20084.8
(B)
rbcL             + 56355   57782   1428    475         52790.0   90.5% (Spi)
trnR(CCG)        + 57877   57950   74
ORF316           + 58065   59015   951     316         35826.3
ORF36b           + 59193   59303   111     36          4017.8
ORF184           + 59525   60079   555     184         21533.1
ORF434           + 60151   61455   1305    434         51866.2
petA             + 61641   62603   963     320         33482.6   78.8% (Spi)
(C)
ORF40            - 62916   62794   123     40          4101.8    12.6% (Cya)
ORF38            - 63152   63036   117     38          4479.1
psbF             - 63293   63174   120     39          4468.3    89.1% (Spi)
psbE             - 63554   63303   252     83          9493.5    89.2% (Spi)
ORF42a           - 63684   63556   129     42          5101.9
(D)
ORF31            + 64152   64247   96      31          3465.4
ORF37            + 64370   64483   114     37          4075.9
trnW(CCA)        - 64626   64553   74                            93.4% (Spi)
trnP(UGG)        - 64788   64715   74                            93.4% (Spi)
ORF42b           + 65027   65155   129     42          4746.5
rpl33            + 65273   65470   198     65          7782.1    36.9% (Eco)
rps18            + 65498   65725   228     75          8879.5    34.7% (Eco)
(E)
rpl20            - 66157   65807   351     116         12773.0   45.6% (Eco)
*rps12A          - 67057   ?       ?       123         13797.0   70.2% (Eco), 91.9% (Tob)
  (Exon 1)         67057   66944   114     38                    trans-split
  (intron ?)       66943   ?       ?                             (Exons 2 and 3 coded on the opposite strand)
*ORF203          - 68640   67130   1511    203         22685.0   (87.0% homologous to spinach ~-gene)
  (Exon 1)         68640   68570   71
  (intron)         68569   68052   518
  (Exon 2)         68051   67760   292
  (intron)         67759   67379   381
  (Exon 3)         67378   67130   249
(F)
psbB             + 69026   70552   1527    508         56191.5   88.2% (Spi)
ORF35            + 70669   70776   108     35          3958.8
ORF27            + 70872   70955   84      27          3268.0
ORF74            + 71092   71316   225     74          7928.3
*petB            + 71424   72566   1143    215         24306.5   85.0% (Spi)
  (Exon 1)         71424   71429   6
  (intron)         71430   71924   495
  (Exon 2)         71925   72566   642
*petD            + 72715   73690   976     160         17413.5   95.6% (Spi)
  (Exon 1)         72715   72722   8
  (intron)         72723   73215   493
  (Exon 2)         73216   73690   475
(G)
rpoA             - 74824   73802   1023    340         39240.2   25.6% (Eco), 54.1% (Spi)
rps11            - 75249   74857   393     130         14172.5   51.5% (Eco), 72.3% (Spi)
secX             - 75413   75300   114     37          4521.5    62.2% (Eco), 86.5% (Spi)
infA             - 75686   75450   237     78          8978.4    56.4% (Eco), 60.3% (Spi)
rps8             - 76171   75773   399     132         14921.4   45.5% (Eco)
rpl14            - 76621   76253   369     122         13496.6   58.2% (Eco)
*rpl16           - 77685   76719   967     143         16149.8   53.8% (Eco), 72.0% (Spir)
  (Exon 1)         77685   77677   9
  (intron)         77676   77142   535
  (Exon 2)         77141   76719   423
rps3             - 78396   77743   654     217         25055.0   40.6% (Eco)
rpl22            - 78804   78445   360     119         13580.8   37.8% (Eco)
rps19            - 79100   78822   279     92          10553.3   63.0% (Eco), 83.7% (Spi)
*rpl2            - 80514   79137   1378    277         31162.9   48.4% (Eco), 58.5% (Spi)
  (Exon 1)         80514   80118   397
  (intron)         80117   79574   544
  (Exon 2)         79573   79137   437
rpl23            - 80825   80550   276     91          10768.5   29.7% (Eco)
trnI(C*AU)       - 81057   80984   74                            93.2% (Spi); C*, modified base
trnV(GAC)        + 81814   81885   72                            95.8% (Spi); in IR region

Amino acid sequence homologies are calculated as
(identical residue number) / (residue number of liverwort product).
Homology percentages with gene products of spinach (Spi), E. coli (Eco), tobacco (Tob), human mitochondria (mit), maize (Mz), Cyanella (Cya), and Spirodela oligorhiza (Spir) are shown in comments.
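The columns of Table 2 are related by simple arithmetic, which can be used to cross-check an entry. The sketch below assumes that Length counts both end positions, that the residue count excludes the termination codon, and that homology is the ratio given in the table footnote expressed as a percentage. The function names are hypothetical, and the count of 47 identical residues for infA is back-calculated for illustration, not taken from the thesis.

```python
def span_length(start, end):
    """Length in bp of a feature given its From and To positions (either strand)."""
    return abs(start - end) + 1


def residues_from_exons(exons):
    """Number of encoded amino acid residues, excluding the termination codon.

    `exons` is a list of (from, to) pairs; introns are simply left out.
    """
    coding = sum(span_length(a, b) for a, b in exons)
    if coding % 3:
        raise ValueError("coding length is not a multiple of 3")
    return coding // 3 - 1


def percent_homology(identical, liverwort_residues):
    """Identical residues as a percentage of the liverwort product length."""
    return 100.0 * identical / liverwort_residues


if __name__ == "__main__":
    # atpB: a single uninterrupted coding region on the minus strand.
    print(span_length(55846, 54368), residues_from_exons([(55846, 54368)]))
    # petD: two exons separated by an intron.
    print(residues_from_exons([(72715, 72722), (73216, 73690)]))
    # infA versus spinach, with a back-calculated identical count (assumption).
    print(round(percent_homology(47, 78), 1))
```

For split genes such as petD, the Length column spans exons and intron together, while the residue count follows from the exon lengths alone.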
Figure 4A. Nucleotide sequence of block A with the deduced amino acid sequences of atpB and atpE and the tRNA sequences of trnM(CAU) and trnV(UAC) (exon 1 and the first part of the intron).
Figure 4A and 4B. Nucleotide sequences of the rest of block A (trnV(UAC) exon 2, ndh3, psbG and ORF169, ending at the BglII site) and of the start of block B (rbcL), with the deduced amino acid sequences.
Figure 4B (continued). Nucleotide sequence of block B with the deduced amino acid sequences of rbcL (3' part), ORF316, ORF36b, ORF184 and ORF434 and the sequence of trnR(CCG).
Figure 4B and 4C. Nucleotide sequences of the rest of block B (ORF434, 3' part, and petA) and of block C (ORF42a, psbE, psbF and the start of ORF38), with the deduced amino acid sequences.
Figure 4C and 4D. Nucleotide sequences of the rest of block C (ORF38 and ORF40) and of block D (ORF31, ORF37, trnW(CCA), trnP(UGG), ORF42b, rpl33 and rps18), with the deduced amino acid sequences.
E ACCTGCCCAACCAGAAACTAMGCAGTATGCATTAMTcMCAGCGATTAAGCGACCTGGATCAmMcACAACTGTATGAACACGATACCAAGGTAMCCCATAAAAATACCccrnC 69011
G II \I G S V L A T /I H l /I V A [ L R GPO 'fI L V v T H V R Y \I P L 'C H <p;bB GGA
TCAAAGAGAA n AGACGCTA TGT AACTTTTTTGCA m AAM mATT AA IT AM TAGTTWCCCTmITACTCA TCCWAGGCAAcA T MCMMCT M TAT A mm ACCAA TA 6B891 <TMm <TGAGTA +--. <--+
+-. <--+
MCGTMGCACAAACACmACCATTTCTATCmAGGATMCAATGGAGAGATTGGTCCCATITmArTnACTTCAAnmAmATrCTATCTAGACACTAGACAMTAMTAM 66711 +- ~~-.' <----.. I -> (- ----
+--
iAAAAmn~TAmATMMnGTATTCTAmTATAGi.AAAACTATAi:ATMTMAT~TMATAGMmCAmi-rACGTnTTi:T!',m.TAGMGAcTAmfGmGTGGM 66651 TTGTAT> TATMT. GGA
~TGCCTATTGGTGTTCCGAAi.CTTCCTTrTCGTCTCCCAGGAGAAGAAGATGCTGHTGGi.TTGACGTATMTGCGCCTTATTCAATAnTrAGTTATATGcMAGAATC 68531 ORfZ03> H PIG V P K V P F R LPG E E 0 II V II [ 0 V y' gu9Y9 .............. (tntron) ••••••••••••
CGTCAITmGCAGACrAMCTCTTTrTTATrCACTTAMmCAAAMTATATCAAATTrTTAMCCGTGAATTTATATIAMAAAATTCATTATMAATrCTATGGTTMTTMAATA 6641 I ........................................................ (lntron) ........................................................ .
AATAMGTATTMAACTTCrITCAATTCrrTCATAATMcTMATMACMTAmMAATrTTATMArTCAAAMTTAITTCmATATGfACAAATAATACTCAGAGMAmnrA 6B291 ........................................................ (lntron) ....................................................... .
iGAAGTAGAACATAAACCfMCGATTTnTATTCMAACTATmATAMTMGAAATATTTATTGTTTMGAAAMAATATATATATcAATAAATAAAMATMTGTTcAATTAGCAM 68111 ........................................................ (lntron) ....................................................... ..
GTAGCAMATIGMCTACAATrfCTAAAAAiw.GCTMTTTTTACAATAACTTMGCTGTATGCCCTTAMAAGTGCTTOTACACTTTTATAAGAMAAAATMTMAArTATCTTAATC 68051 ............................ 0-.0- •• I .. I ................ 0- ...... ragce-9-bug~.)-----c;JaOla-UUC8.U91J-c99wuy ••••••••• ~ ~ •• o. ~ .......... CU{Jyy-y-ay
AATCGACTTTATCGTGAMcATTACTmrITAGCCCAACAAGTAGATGACGAAATAGCAMTCMCTTAnGGTATTATGATGTACCTTAATGGAGAAcATGAAAGTAMGATATGTAC 61931 II R l Y R E R L l FLO Q Q V ODE 1 A IIQ L [ G I H H Y L II G E U E S K 0 H Y
TrATATATWTTCTCCTGGTGGTGCfGTTITAGCTGGMTnCTGTnAfGATGCGATCWmGTTGTACCTGATGTICATACAATTTGTATGGGATIAGCTGCTTcAATGGGCTCT 61811 L Y IllS P G G A V lAG I S V Y D A H Q F V V P 0 V tl TIC H G L A ASH G S
nTATTTTMCAGGAGGAGAMTTACTAAACGTATAGCACTACefCACGCTrfCTGCCAATGAfHTTTTATGTCTGCACWAMAGGTAAAAATMCATCACATATATATTTTTATAC 67691 f J l T G· G E [ T K RIA L P tj A 9u9Y9 ................... (lot"'n) ................................... ..
AAMAT~TATAMTTArAnri:rmAAGTTTATTCTAGCGITATAAACTMTAATTAAAMTMAnmMTMTTMAi..McTrTGcMTTGCTCMTIMTATTTTC 67511 ................. ; ...................................... (lnlron) ....................................................... .
ITTAmAGcAATAGAGCTATCATAATAAAAMAGATACrTnMATTATATMTf.mATMATTTTTAMTATAAAAMAMTATATATATATATAArTAAAATAGAGCrCTAfGCAA 67451 ........................................................ (1ntron) ........................................... r.gcc9~.u9 •• -. . .. .. . .. . . .. .. .. .. CTMMATGCATGTACACTTCGmCAmATTTTTTTAATAAAAAAAATMAAAATTGMTAmATTMTAGGGTTATGATTCATCAACCTGCTAGTTCnATTATGATGGACAAGCT 67331 '"') ••• -uuC.u9U-t:99uuY ............................. , ........... cu.n-y-.yR ~ H 1 \I Q ~ ASS Y YOu \) A
GcAGAATGTATTATCGMGci.GAAGAAGnTrGAAACTTCGTGATTGTATTAeTMAGTTTATGTACAAAilMCTGGTAMCCTTTATGGGTMTTTCTGAAGATATGGAAAGAGATCrT 61211 GEe I H E A' E E V L K L ROC I T K y. Y Y Q R T G K P l W V I SED HER 0 V
mATGTCAGCAAAAGAAG~CTTTATGGTATTGTAcACTTAGTTGCTATAGAAAAcAATTCTACTAnAAAMTTAGITmMAcAAAMAAmTArnGTTATGCTTAGGm 61091 F H SA K E A K l Y C I V 0 t V A I Ell 11 S T 1 K II ~ +__---. ATCCAMCTMAAAATTTTCCATATAAGTTACAATcecTAi:TATTCMcAATTAATTMMATAAAAGAcAACCCATccMMTAGAAcMAATCACCAGCCCTTAMccATGC"TCM 66971 <-- ---~ rpsl2A> H P T I Q.Q L [ R II K R Q PIE 11 R T K SPA L K C C P Q
CCTAGAGGAGTATCTACTAGA.GTGTATGTGCGACTTGTTTAAATCAMAACGTTMMAmAAAGATCAMATTGCATAAMATTTTTTTATTTTMTAACCTMAGATATAGTATCTA 66B51 R R G V C T R V Y qu9l'11 ............... (Intran 7) +--. <--+ +-.-. TTGTTGmAGATACAATTTATAGmCCrTrGGTGCAMTCCAATCATCnAAGmAGGATAGMMCCAmCTCAAAGGGTAGCGACrGATTCTcMTCCCTTAAGCGAGMAm 66731
<--> +--- -- -- -- - ->
TATTMAAAATTTTTGCATMTATAATArTACHATATMCCGTMMACGAAACTGMCGGTCAGCTATICAGCGMCCnCAAATMcATGCCGTTAAnMTAAAAAAAACAAmT 66Gll -->
TGAMCTmi:irAGTGmCATTMTAAAAMAGCTTCAAATCAGAMTIATACAMTMCTGATATTATCMTATATAITATATATTACAAGCTTCceTATATAGAMCGACCTATTC 66491 --> +--.<-->
+---- <-- --- ----+
GCAM TOC TmGGT A TTTTITTTTT ATAci. T AAGAAAACGAAGAAA mm AT AGCAA rCT MGMAA TAAAA T AAAACmTrT A TT AT AAAAA TTGT AGA f!ATAGr AAGCAMCT 66251 ~-> <--> I- --> <--->
GCAATMAAAAATAmATTGMAATCGATGTTTTGATATAAAAAMTAci.CACACACAAAnmGAATAATTMMCcAGTATATACAGCAATGACTAGAGTTAMCCTGGTTATGTA 66131 +--> <--+ rplll). H T R V K R G Y V
GCACGMMCGcCGTMAAATATTCTTACGCnACATCTcGAmCMGcAACTCATTCcMACTTTTTAGAACTGCTAATCAACAAGGAATGAGAGCArTAGCATCATcTCATCGCGAT 66011 ARK R !I K II I L T L- T S 0 F Q G T II sx L F R r A tl Q Q G H R A LAS S H R. 0
AGAGGTAAAC~TCTTAGACGriTATGGATTAC:TCGAGTTMTGCAGCCGcMGAGATAATGGAAmCCTATAATMATTMTTGAATAmATATMAMAAMATrCTT 65891 R C. K R K Rill R R l ~ I T R V II A A A ROIl CIS Y '11 K LIE Y l Y KKK I L
TrMATAGAA.W.TTCTAGCTCA.AATAGCTATATTAGATAMTTTTGrni:rCGACAATAATTAAAMTAnATTAC~TAAMMAACCTCrCeGGAi-MTTMMAAATMATTce 65771 l II R K I l A Q [ A I l 0 K f C F S T I I K II I I T E -.
Figure 4E.
-31-
G.GGGAGGmMT~CTGrATAnAAT..w.TAA1GATAAATTMITnCGTTAnj-AAAAAAGGTMcAAAGCTAAAACACGAGcTCGmMTTGCTATAGTTMTAAACGTTG 65651 -.---t . _tIE'NIlLFPLLALVRARKIAITLLRQ
<'Ps1B
F . . . . .. . . . . . . . GATTCTTTCCATATAACTAAAATATTGAATAAGGCGCATTATACGTCMTCCAAACAGCATCTTCTTCTCCTGGGAGACGAAAAGGAACTTTCGGAACACCAATAGGCATTTrrTTTTTC 68650· ••••••••••• (1nlron) ••••• : ......... 9Y9u9 Y V 0 I II V A DEE G P L R F P V K P V G I P H <ORFZ03
TTCCACAAACAAAATACTCfrCTATMTAAAAAAACGTAMMTGMATTCTATTTATTTATTTATTATATATAGTTTTTCTATMMIAGAATACAATmATAAATATAAAAATTrrr 66770 AGG ., ',TAATAT . <TATGTT
+- --- ~- -~- --- ---> nTATrrATTTGTCTAGTGTCTAGATAaMTAMTAAAMTTGAAGTMMTAAAAAATilGGACCAATCrCTCCATTGrTATCCTAAAcATAGMATGCTAAACTGmGTGCTTACGrT 68890 __ • __ _)I . (-- '. . +----> ,-~- ---r ---fo.
TATTGGTAAMAATATATTAuTmcnArGTTGccmGGGATGAGTAAAAMGGGTTG.AACTAmMTTMTMArTnMATGcMMMcTTAcATAGCGTCTMTTmmGA 69010 +---> ,-. ATGAGT> TTTAAT.
+--> '-·-r GAAAClGGGT A nTITATGcGmACCTTGGTATCGTGTrU. TACAGTTG rGn AM TGA TCCAGGTCGCTTAA TCGCTGTrCA m AA TGCA TAC recTTT AGTTTCTGGTTGGGCAGGT 69130
AGG p$bBl- H G L P II r R Y liT V V L 110 P GR L I A V H L H II TAL V S G WAG
TCTATGGCrTrATATGAATIAGCTGnTITGATCCTTCTGATCCAGTTCTrGATCCMTinGGAGACAAGGCATGmGTTATACCmTATGACTCGTTrAGGAATMCGAAATCCTGG 69250 S II A L .y E LAY FOP S D P V LOP H U R Q G II F V I P F H T R L G 1 T K S \I
GGGGGTTGIlAGTATTACAcGAGAAACTGTTACTAACGCAGGTATCTGGAGTTATGMGcAuTAGCTGCAGTACATATTGITTTATCAGGATTACTTTrrrTGGCAGCTAmGGCATTGG 69370 G G II SIT G ( T V T II A G I W S Y E G V A A V fl I V L S G L L F L A A 1 \I H \I
GTGTATTGGGAmAGAAcTGTITCGTGATGAAcGTACAilGTAAGccncmAGAmAccTAAAATrTnGGMTTcATTTGTTTCTTTCTGMGTAcmcnTIGcnTIGGAGcA 69490 V Y \I 0 L ELF ROE, R T G K P S l D L P KIF G I tI L ,F L S G V L C f A F G A
TTTCATGTMCTGGmATTrGGTCCTGcMTATGGArrTCTGATCCTTATGGATTAACAGGAAAAGTACAACCTGTAGCrCCTGCTTGGGGTGCTGAAGGTTTTGATCCrrTTGTACCT 69610 FHVTGLFGPGIIIISDPYGLTGKYOPVAPAIIGAEGFOPFYP
GGAGGAATTGCTTCTCATcATATTGCTGcAl;GTATTTTAGGMTATTAGCTGGmGmCATCTTAGTGTTCGTCCTCCTCAAAGATTATATAAAGGATrACGTATGGGAAATGTTcM 69730 G G I A Sil 11 1 A A GIL GIL A G L Fil L S V R P P Q R L Y K G L R H G II Y E
ACAGTTTTATCCAGTAGTATrGCAGCTGTrTTTTTlGCTGCTTTTGTTGTrCCGGGAAcTATGTGGTACGGTTCTGCAGCAACTCCAArTGMTTAmGGTCCTACTCGTTACCMTGG 69853 TV L S S,S 1 A A V F F A A F V V A G T H \I Y GSA AT PIE L f G P TRY Q II
.. .. .. • • .. • • .. r • •
GATCAAGGATTTTTTCAGCMGAAATAGATCGAAGAATTCGCTCTAGTMAGCAGAAAAmMGTTTATCAGAAGCTTCCTCTAAAATTCCTGAAAMTTAGCTTlTTATGATTATATT 69970 o Q G F F 0 Q E lOR R IRS S K A E II L S L S ( A II SKI P E K L A F Y 0 Y I
.. .. . .. .. .. .. .. .. .. .. .. GGTAATAATCCTGCTMAGGAGGATTATTTAGAGCTGGAGCGATGGATAATGGAGATGGTATAGCAGTTGGTTGGTTAGGCCATGCAGTTTTTAAAGATAMGMGGAAATGAGCTTTTC 70090 GIl H P A K G G L F RAG A HOI/ G 0 G,I A Y G \I L GilA Y F K D KEG II ELF
.. .. . . .. .. .. .. .. .. .. .. GTTCGTCGTATGCCTACTTTTTTTGAMCTTTTCCAGTTGTmGGTAGATGAACMGGMTTGTTAGAGCTGATGTTCCAmAGAAGAGCAGAATCGAAGTATAGTGTTGAACMGTA 70210 VRRMPTffETFPVVLVDEQGIYRAOVPFRRAESKYSYEQV
GGTGTAACTGTTGAATTTTATcGTGGTaMCTTGATCCGGTTAGTTTTAGTGACCCTGCMCIIGTCAMAAATATGCTAGACGCGCTcAATTAGGTGAAAmTTGAATTTGATCGTGcr 70330 GYTVEFYGGELOGVSFSOPATVKKYARRAQLGEIFEFORA
ACTTTAAAA TCGGATGGTGTTTTTCGMGTAGTCCAAGAOOTTGGmACnTICCTCATGCTACATTTGCTCTTCrrfrTTTCTTTCCTCIITAmGG!:ATGGTGCTAGMCATTGTTT 70450 T L K 5 0 G V FRS 5 P R. G II F T F GilA T F ALL F F F G II I W H GAR T L f
AGAGATGTTTTTGCAGGMITGATCCTGATnAGATGCTWGTGGAAmGGAGCGmCAGAAATTAGGAGATCCMCAACMAAAGACAAGTAATATMMTATATTTTATATCm 70570 R 0 V FAG I 0 PO LOA Q V E F G A F Q K L GOP T T K R Q V 1 ~
+-- -- - ----
TAAATAMTAAMTTTTTAGTACAGnTImw.cTAAAMTATTATTTMTTAGTACGAMGTTATcTGCAMTTAmAccTMTATACAMTATATGGMGCIITTAGmATACAT 70590 ---- -----,; <--- ._- - - - - ~ ORF35> H E A L ·V Y T F
nTIGTTGGTAGGTAcmAcGAATCATTrmTTGCTATrmmWGAACCACCTMAGTACCAAGTAMGGAAim..o.TMMCGTTMTATTcAATTAGTAArTrMTAnAM 70Bl0 L L V C T L G 1 [ F F A 1 F F REP P K V P S K G K, K ~ t----- -> ,---~
TTACTAATTTTGGACTAAlGACTTTTTAGTffAAAAAGT!:ATTAGTCcMMTTAGTCrTCATGTTCTTUMTGGATcTCTMGTTCAITAGAAGGTTGTCCAMTGCGGTATAMGAG 70930 --------->, I H F F K WIS· K FIR R l S K C G I K S -----r ORFZ7.
CATMCCAGTMAGCTTATMGTAAACAAGATATGAAGATAGCCACMAAGTTGCAGnTCCATTGTTAOOTAGTTCCAMATAATGGTATATTmMTGTATAnTITrAATATMTA 71050 ITS KAY Ii: .... ~~ t(-. -t .+--:-- <_~
GTACAAAAAimAATMATCTCMCTAATCTGATMGrrTTATGGCTAcACAMTMTTGATGACACTCCrAAAACMAAGGMAMAAAGTGGTATAGGTGATATATTAAAACCATTM 71170 0I!f74. HAT Q 1 I DOT P K T K G K K S GIG D I L. K P L II
A1TCAGAGTATGGAAAAGTGGCTCCTGGTTGGGGAACrACrccrcnATGGGTATTATGi.TGGCTCnrTrGCAGTTT ITrTAGTTGTTATTTTAGMCrTTATMTTCCTCTGmTGT 71293 S E Y .G K V A P G II G T T, P L, H C 1 H HAL F A V FLY V I L E L T II S S Y L L
TAf;ATGGAGTTTCIIGtrAGnGGTAATAMTMAAATT~TGMTTGciGCTTjnTIAGCAGCAATTCATTrffirAAnTIAGGTAGmMTTGTGTMTTATTAAAnc~ 71410 o G V S V SW -- +----,--. ---. > (-- I +-
Figure 4E and 4F.
-32-
AGGATTTnWTArGGGTiiTGCGTCTTGTGTAMTAMTCTATATTTATAAiACAAAATAACTTGTTACrGATATATTAAATATT~TTrniTnGTTAMTGTTTACAAAT 71530 AGGA p<>tB> H G 9"9Y9 ••••••••••••••••••••••••••••••• (lntron) ••••••••••••••••••••••••••••••• ~ ••••••••••••••••••••••••• -)0<---+
TTGTTAGCAmAMCCAcA.w.AATGAAAA.w:TTAMCTATGATTAATrTTTATAMmATTAGTTATACTTCGTTATCAATATAAi.AAAATTMrTATATGCATTMTCAAATGTA 71650 ....................................................... ( Intron) •••••••••••••••••••••••••••••••••••• , •••••••••••••••••••••
TGAAAATGTnATAAAATATAAAAAATGATMAAAAAGATrTTCACTCAnCTATCmrTTTfAGTCATCGGAGTTTMTAAAAATCTACCCTTTMTACTAATTATTAAGATTTAMC 71770 ........................................................ (Intron) ........................................ ; •••••••••••••••••
AAGAAAA T AMAAAAAA T AMAAGA TTCCTCAAAAAAAAACA TATATAT AAACTTGAGA T AAAAACAAAM TAT AM rrTrrncm MGCTCT MeAn AT AM TM TCA TTT ACCCT 71890 ... .o .............. of' I .............................................................. (intron). 0- ..................................... ... bgcclJ-au911i:1-----g~aoCll--uucIl1l9u--cg9u
TTTTCGACGGCGAACTrrAilTAAccTATCTCAATAAAGTAiACGATTGGmGAAGAGcGTCTTGAGArTCAAGCGATTGCAGATGATATAACMGTAAATATGTTCCTCCACATGTTM 72010 "y ••.••••••.•••••••••••• C".yy-y-.y~ V Y 0 \I fEE R LEI Q A I A 0 0 ITS K Y V P P II V II
TATrTTTTATTGTTTAGGAGGrATTACrrTMmGmTrrAGTTCAAGTAGCTACTcGCTTTeCTATGACTTTnArTATCGTCCTACrGTMCTcMCCITTTTCATCTCTTCAArA 72130 I f Y C l G G I T LTC f L V Q V A T G f A II T FY Y R pry TEA F S S V Q Y
CATTATGACTGAAGTAMrTrrGGATGGcTTATTCGCTCAGTTCATCGcTGGTCAGCAAGTATGATGGrTTTMTGATcATTTTACATAmTTCGTGrTrATCTMCAGGAccTTTTAA 72250 1 H T E V II F G 1/ L 1 R S V II R \I S ASH II V L II II I L II 1 f R V Y LTG C f K
AAAACCTCcGGM TT MCrTGGGTT ACTGGTGTT A TTTT AGCAGTTTTAACTGTATcrrirCGTGTT AcAccTTA TTCm ACCTTeccA TCAM TTCGn A nGGGcAilrr AAAA TTGT 72310 K P R £ L T 1/ V T G V 1 .L A V LTV S F G V T G Y S L P 1/ 0 Q I G Y 1/ A V K 1 Y
MCTGGrG r ACCAGMGcM TTCCAA T M TrGGA TCTCcTn AGTTGAGTr A TTACGccilAAcTGT MGTGTTGGTCAA TCGACA TT MCrCGA TTTT AT AGTTTACA TACTTTTGTATT 72490 T G V PEA I PI 1 G S P L VEL L R G S V S V G QS T L T R f Y S L" T F V L
GccrCTTTTAACTGCAATATrrATGTTMTGCACTTTTTMTGATTCGTMACAAGGTAmCAGGTCcGTTATAMTTACGTAMTTTATTACAAAATAAAMAGTTTAAATACTMrT 72610 P L L T A I f H L H II F L H 1 R K Q GIS G P L - t-->
+---> <---. TCA meCA i A TTTT ATGcA TTTTTTTTTTCTA mGAAi.cTTcTTTTT AGAGAAA TGeT AAAAAAAmTTTTTMT AGATATTTTATi.AAccAAAAT,w,n ATGGGi.GTGTGTGACi: 7Z 730
<_ +-> <----;- AGGA petiJ> H G V 9"9YG •••
TTAmAATAATMTTTGAGTTATAeAGAMTTATTTAATATCTGTTAcATAAAATTTAATMGATTAT[iTATTTTTATCCCAATTArrTrrnrAGTAAAAACTTGGGTrATMTGrrT 72850 ••••••••••••••••••••••••••••••••••••••••••••••••••••••• (lntron) ......................................................... .
TTCATTCT AAM rrrrrni-cTA TGA TCA TrrrrGAA T AilTAAAGACTTCGTT AM TeeM TAM TT 1\ rTrCGAA TA rrTCAAAA TilT AT AAGAAAGA TAGTATTAAAAA TACA TTCA IT 72970 ••••••••••••••••••••••••••••••••••••••••••••••••••••••• (lntran) •••••••••••••••••••••••••••• : ••••••••••••••••• , ••••••••••
TCTGTTGTGi I iii 1111 i ; TTACTATCGGccTAAAAAAAAGATCTMTiw..w..w.AACAAAATTArTAATAAGTTmmTTTTATAAAAAAAATAAAGACAATTCMMMATcA ·73090 ••••••••••••••••••••••••••••••••••••••••••• ; ••••••••••• (lnlron) •••••••••••••••••••••••• , ••••••••••••••••••••• , ••••••••••
AAAATTAMCrrGMTTATGAACATAAAGTrrnTTTGATrMAAAArrTCATTAATGrTGGACCCGGATGATATTAAAnATCATGTCCGATTCmGGGGQGACrrrTrrMTCTAC!: 73Z10 ••••• ' ........ I •••• ~ .......... I ................. I .. (intron) ......... 0 ....... r-.l!i9CC9 ... 4ugal!l..........gai~ul-uuclugu-cgguuy ••••• I .......... ~ •• I •• CU.lI.Y)'
TTMTMCAi.AAAAACCTcATTTAAGTGATCCTATATTACGAGCTAMrTAGCAAAAGGTATGGGACATAATTATTATOOTGAGeCTGCTrCCCCAMCGATCTTTrATATATTmCc.i. 73330 -1-'1 T K K POL SOP I L R A K l II K G H G II II Y Y G EPA \I P II D L t Y I F P
GTAGTTATrTTACGTACTATrGeCTGTACTGTTccm...cCTGTTTTAGMCmCAATGATTGGTGAACCTGCAAATCCTTTTGCAAeTCCTTTAGAAATTTTACCAGAATGGTArTrT 73450 VVILGTIACTVGLAVLEPSIIIGEPAHPFATPLEILPEI/YF
mCCAGTTmCAMTACTrCGTACGGTAcCTMTAMCnTrAGGTCTACTTTTMTGGCTGCTGTACCTGCAGGArTATTMCAGrTCCTTTTTTAGAMATGTTAATAAATTTCAG 73570 fPYfQILRTVPIIKLLGVLLMAAVPAGLLTVPfLEHVNKFQ
MTceTTTTimcGTcCAGTAGCTACTAcAcTIITTmAATAGGTACTGTCGTAGCTcrTrGGTTIIGGAATTGGAeCTGCrrrAccTArTGATAAATCmGACTTTAGGTTTGTTTTAA 73690 liP f R R P V A T T V f L Ie T V V A L \I L GIG A ALP lOX S L T L G L F ~
+-
MTATATArTTTTTTTATAAAACTAGAAATMGGmGAMTTTTTTACTMAAAGTAAi.AAAmc.W.CCTTAmCTAGTTTTATAAAACGTTTTTCAATCCAATTACCTMAGATA 73810 ->(- II - l Y
<rpoA
G Buill • . . l. . . . . . . • . GTCCGCCACTGGAAACACCACTAGGATCCTTCCCGTIICGACTTGCATGTGITAAGCATGCCGCCAGCGTTCATCCTGAGCCAGGATCAMCTCTCCATCAGATTCATMTTATATTAm 82091 CACCCGGUGACCUU\JGlIGGUGAUCCUAGGMGGGCAUGCUGMCGUAtACAAUUCGUACGGCGGUCGCAAGUAGGACUCGGUCCUAGUlJUGAGACGuACUCU-S' <165 riUIA
Tr ACrr AT AGCTTCCTTTTTC6TAAACAAAGCAGA mcAA... TCGTCrrcCA TCCCAAGAGA TAGA T AACTr .nom A rTrrrCATTtACrrCA TATTAilcnGMGCTCA TiTeT AGi . 81971 . <TCA
ATACCCATACCrACCCTATTATGTCAATCCCACAAGCCTCTn-CMTMcMGMAACMCAAATCAAAATGCTTTAACTATTTTTAGGGATAATtAGGrTCGMCTGATGAcTTCCACC 81851 TAT <ACAGTT 3' -AUCCCUAtlUAG1JCCAAGCUUGACUACUGMGGUGG
ACGTCAAGGTGATACTCTACCGCTGAGTTAjAICCCTATTCr~rTAAAATCTGAi:TTCTAAAAAATATATTArrTATAGATMTATATTTTTTCTATTTTCATT 81731 UGCAGUUCCACUAUGAGAUGGCGACUCAAUAlIAGCGA-5' <V41-<W: <TTATAT <fM
CAAATTmTAMcTrAMTCTCAAAMTTTAAGAAACTTATGCAACCArTMTTATTTcATATMTTATATTACtACcrTACCAATCCAAAw..wJ.TACTATTTAGTAmMTATA 61611 GTT +--> <_ <TMTAT <TAGGTT _, <---+ <-
Figure 4F and 46.
-33-
CrrnGTrTAi:~GACAAAAi.AA TAT AAAriAAAM m AT A TTTT ATM r A.w.w.A TT AG TTGAcciciGAM TG1CTCA TGA T MM TTGTCTTCT AM TAM TACTTA TTmm 81491 ~_> (~ +---:),(--' I-
+-> <-+ +--> <--+
eel All TeTAcGA TCA ~ AMTIOOTTCAA iTmtn ATGCAAACAGrrTiil AAAMAt.GeTACACtn AAtnCAAJ>AAaTT ATATACGTTITTleCrTrm MCA TI AAAA T AGAC 81371 +----> <~ . .....---:-> <-'-, ..... --><-- ~I-
AAnTrTTAcATAmGmTATACCTATAITmAGrrrTnATACGTAGGTAMTAMAA,w.CACTATATTCMMATAGAAAAAAMGACATAAAcTCACAGMCcAAAGCAAAM 81251 j--- ~-- ----> <- -' - - -- --'---+ .....
TmrAAAAMTMAMAnCCACTAAAATi;rATMTCAcACAGATAAGcTCACGCTMci:CGTCAMnITrATGTTAAAMAATACATATA~GAAMATMTTGATTC 61131 <-'-' -I- - - +---> <-I- '
lR <- J LB -. LSC
AAm~TAAMTGnATMACTATMTCAT'cCMITMTATCTATTATIMTAMTMTATA1MAAAGCATCCATGGCTGMTGGTTAMGCACCCMCTCATMTTGGCGMT 61011 HTMG> TATMT> nD-CAU. 5'-GCAUCCAUGGCUGMUGGUUAMGCACCCMCUCAUMUUGGCGMU
TCACAGGTTCAATTeCTGnGGATGCAmAMMMTTcMTTCMGTAATATATCnCTnATATATAAATATMnTTnATATACTTmMCAMGTMAMGTTCTATACAGTG 60891 UCACAGGUUCAAUUCCUGUUGGAUGCA-3" . _. <-----+
TCTAAMMAITTATAMn;\TAGAMTAcATATCAAATTGMmMGG.i.GAMTTATAAAmATGMTCMGTTMGTACCCAGTACITACAGAAA.D.AACMTTCGTi'TATTAGAM 80771 . AGCAG rp123> Ii I) Q V K Y ? V L T E K T I R L l E J:
AMATCAGTATAGmT(J,\TGTCMTATTuATTCAMTAIW.CACAMTMAAMATGGAITGMCmTCmMTGTTAMGTTATMGTGTMATAGTCATCGTCTTCCMMAAM 80551 II Q Y S f 0 V' /I I 0'5 II K T ,Q I K K U I ELF F )) V K V I S V fI S II R L P KKK
~TAcGTACGAcAAcAGGATATACTGTTCGrrATMACGMTcATTATMMriGCMTCTGGTrArri:;ATTi:CATTAnCTi:AMTMATMAAMATTnATTACaTm 60531 K K I G T T TG' ,Y T V R Y K R M I I K L Q S G Y 5 I P L f 511 K .~
ACA TAccrAT AA TT ATATCGCCA TAcimTATATCGAGCTTATACGCcAGGCACGCGT AACCGATCTGTACCT AMmcA TGAM T AGIT AM roTCAGCCACAAAAAAAA TT MCA T A 60411 rp12> H A J R L Y RAY T P G T R I) R S V P K FOE I V K C Q P Q K K L 1 Y
TMTAMc.\TATTAAAAMGGTCGAMCAACAGAGGMTcATMCMGTcAACACCCAGcAGGTGGACAcAMAGACmATCGMAAATAGATTTTCMCGAMTAAAAMTATATMC 60Z91 II K 1I I K'K G' R " " R '0 I ITS Q II R G G G II K R L Y R' KID F Q R I) K Y. Y 1 T
.. .. .. .. .. .. .. .. .. .. .. .. TGGGAAAATTAAAACTATAGAGTATGACCCAMTCGTAIITACATATATTTGTCTAAITAATTATGMGATGGTGAMMCCATATATTTTATATCCACGTGGCATfAMTTAGATGACAC 80)7) G ~ I K T I. E Y O' P II R " T Y I C' L I II YEO G E K R Y . I L Y P R G' IX L DDT
MTTAmCTAGTGAAGAAGCACCTATmMTTGGAMTACeCTAeCrrTGAGTGCGGmGMTTATATAmACGTcGTCGGMATAACCGACTMcAMTMACTTATAMTCTAT 80051 I ISS E E It P [ l I G H T l P L Tg"9Y\h .............. (lntron) ..................................... :.
CACTM TCCAP.GMA TTGGAAAGACCTT MAAcGAMCTMMAGGAJoJo.M TAGGCMGTGAAAAAGGTTm M TAT AT AAM TMAAAAACTTCAAl\!iATA TTA TM Tis. TGGAMA TT 79931 ••••••••••••••••••••••• ~ .................... ,: ••••••••• (Intron) ••••••••••••••••••••••••••••••••••.••••••••••••••••• : ••••
TmAMGCATTAMGTMTATATMMTAGGAMCMTTTTATTCAMcAMTTTATAATMTMAMGTrACmTAITATGATTTGTAGGTCAMGACAMm~ 19B11 •••••••••••••• : .......................................... (Introo) ......................................................... .
GAMcTMTTATGCTTCCTAAGTTATATMAATAmMAGCGTAMTAAATAAAGTCATccmCTGATGCTAAAiGAATATCATAAGCCAGATGATGcAMAMcCMGGACGTAMA 79691 ..................... ,., ........................... · ...... (Intron) .......... , .............................................. .
AACCAAGGACGGTAAAAMCTAMTTTmAAMCGTCTAGAAAAcCTGTA TGCITGAM.W.GCITGTACAGmGGGA.i.GAGAITTTMTATMMM mAAMTCTAtnCMCCA 79511 ..... 111" I ..... I ............... 0- ................. 1 rI:l9Ctg-i::.,Jga~-ga.l!I.a--:--uuc:a.u9U---c99uuy • .............. 11 ................ 0.0 ............... CUOYYY-l!!lY II
ATATGCCA TTAGGTACTGCTA TTCACMTAITGAM T McACCTGGMMGGTGGACAA ITAGT MilAGcAGCCGGAACTGT AGCMMA ITA TTGCAMi.GAAGGACAGTr AGrr ACAe H P L G T AI fl HIE I T P G ~ G G Q L V R A A G T V A K I [ ~ KEG Q L V T L
TACGCTTACCTrCAGGAGMATrAGATTAATCTCTCJ!.4.AAATGmAGcMCMTAGGAci.AATTCGMATGTTGAlGTAMTMlTTMGMTAGGTAAAGCAGGGTCAAAACCTTGGT R L P S· G [ I R l J 5 Q K C L A T I G Q I G II V 0 V II )1 l RIG X A G S K Rill
TAGGTAMCcACCAMAGTTAGAGGAGT AGITATGMTCcTATAGATCACCCTCACGGTGGTGGAGAAGGTAGAGCACCcATTGGTAGAAA.w.ACCATTAACTCCTTGcGGTCATCCiG GKRPKVRGVVHIlPIOIIPlIGGGEGRAP[GRKKPLTPIIGIIPA
CAcTTGGMAi.AGMGTAGAAMMTMTAAATATAGCGATACrCTTATTCTTCGTCGTCGTAAAMTAGCTMGCTAAAAMAGAMGAAAMTAMGGATAGTTGGcMTGACACCTT L G K l! S R K I) II Y. Y SOT l r l R R R Y. " 5 ... rpsl9> AGGA H T R S
i:MTAAAAAAAGGTCCrmGTAGCTGATcAmATT~TAGAAAATCTTMCITMMAAAGMw.MAMTMTMTMCATGaTCTCGAGci.TCTACAATTGTACCTACAA I K K G P F V A 0 II L L K K I E II l )1 L K K E K K I I I T II S R A S T ,1 V P T H
TGATTGGTCATACMTAGCTGTTCATMTGGACMGMCATTTACCMrrTATATMCAGATCGTATGGITGGTCACAMTrAGGAGAAITCGCTCCTACTCGAACTTTTCGAGGACACG IGHTIAVllllnQElJlPIYlTORHVGIlKLGEFAPTRTfRGIiA
CAAA.AAATGATMAAAATCCCGTCGTTMITAGGAGArMITnrMTGci.AAcTAATAcTTCTMlAAAAMATCCGTGCrGTTGCTAAACATATACAT;'~GTCTCCAcATAMGTACG K )/ 0 K K S R R -- AGGAG rp122> H Q T· 'II T SilK K I R A V A K II I II H S P H K V R
MGAGT AGTT AGTCAAA TTCGTCGTCG7TCIT ATGAACMGCACTTATGA r A IT AGAGITT ATGCCGTATCGAGCA TGcM TeCAA fA miCAA TTACTrTCA TCTGCAGCTGCAM TCe R V V S Q r R G RS Y E Q A L Ii I l E r Ii PY R A ell P I L Q L L S 5 A A A II A
TMTCATMITTTGGATTMGTAAAACAMCTTAmATAAGTGAMTTcMGTAMTAAAGGMcnnTnAMAGAmCMCCMcAGCTCAAGGACGTGGCTATCCTATACACM II II II f G L S K T II L F I 5 E J Q V " K G T F F K R F Q P R A Q G R GYP I H K
ACCTACTTGTCATATMCrArrGTACTGAATATTmCCTAAATAAAAMAATTGAAAMi.mGTTMTATMTlT~TATATATATGGGA~TMACCCACTTGG P T CHI T I V L II I l P K """ +---><- __ rps3, H G Q K I II P l G
mT AGACTTGGT A T MCAcAAM TCACCGCrCA TA TTGGTnGCAAACAAMM TA TTci' AMGTTTTTGMGAAGA TMAAMATACGTGACTGTA TTGMTT ATATGTACAAAMCA r R L G IrQ 1/ H R S Y \I F " I) K K Y S K V FEE D K K I ROC I ELY v Q Y. II
Figure 4G (continued).
-34-
79451
79331
)921J
79091
76971
16851
78731
78611
78491
18371
78251
TATAAAAAATTCTTCMATTATeGAeGAArTeCTCGTGTTGMATTAAM!w..w.CAGATnMTTcMilTTGMATATATACAGGATTTCCTGCmATrAGTAGMAGCCGAGGTCIl 18131 I K H S 5 " ~ G G, I A R V E I ~ R K T D l I Q v E I , T C F P ~ l L V E S R G Q
AGGAA TTGMCM IT MAA rTMA TeT ACAAM Til TA TTA TCTTCAGMcA 'r AGMGACTCCGM TGACTTT M TCGMA TTGCCMACCCr IICGGAGAACCMMII ncTTGCMAMA 78011 G I E Q L K LII Y Q 'Il I l S, 5 E 0 R R L R II T LIE I. A K P Y G E P K I' L II K K
AATTGCmAAMTTAGAMGTAGGGTTGCrmAGIICcMCAATGMAAAAGCCATTcMTTAGCAMAi.MGGAMTATAMAGGMTTMMTACMATAGCAGGTAGACTTMTGG 7)891 I A L K L E S R V A F R R T H K K A J E L A K K G /I I K G I K J Q I II G R L II G
AcCTGAAATTGCTCGTGTTcAATGGGCACcAGMGGTAGAGTTCCmACMACMTMGAGCACGMTTAATTATTGCTATTACGCAGCTCAMCMTTfACGGAGTArrllGGMTCAA 77771, A E JAR V E \I ARE G R V P L Q T I R II R I II Y C Y Y II A Q T I Y G V LG 1 K
AGmGGATATTTCMGATci.AGAATMTTAmnrrcMTCMATCIICTTTMTTATr.AAmMCATMAMMMATrGCrATGCTTAGTGTGTGACrCGmAmCAMATGTT 77651 V W I F Q 0 E E ~ rp115> H LS 9ugY9 •••••• (lntron) .......
t---~- <- ------+
ACTTAAAAAAi:AMATTGMACTCTAGmATACTAGMMTMTTTAToArmATATTAGMMTArN...r.ACACrTTci:~CMATTrrcTTGTGAAGCG"""""AMCTMTC 77531' •••••••••••••••••••••••••••••••••••••••••••••.••••••••• (I.teon) ...................... .' ................................. .
CATMMATTGTAGGGmTTGTTATAGTATrMAACGcMMAMTAAGAGCTTTATTTTMTAAMACTMGAMATTAAMilGMMi.MAAGcmATTATAGAMi.MAAcCIM 77411 ....................................................... (IAtron) ••••••••••••••••• , ...................................... .
ACAMATGTATMMTCATAAAMCGMGcAATCTATAMTMTAAMACTrrmGTATfTTTAmATCAGATAGGATGGCGAAAAAAACCMAMTAAATTTGMATAACTTAMAT 71291 ....................................................... (Intron) ........................................................ .
i.cMAmMAmATTACAAATMMAATTATTMAIiAGTAMTATTCGCCCGTGGArrTrTTTATTTTATATAMTTTATTCATGAccAGCCGGATcMTCAMATTTCATGTCCGGT 77111 ........................................................ 0- ............................... (1nLron) ......... 0- ................... • rllgC:~9-4~9"4--g44a-lJu'c4U9U-C~H)U
TnGMGTAGCGATCAMTCGACTATMCCCTMMGMcAAMmCGTMACMCATTGTGGAMmAMAGGMTATCTACTCGAGGTMTGTTATATGTTTTGGcMAmcCGC 71051 "y ................. cuoyy-y-.yP K R T K F R K Q II C GilL K GIS T R G /I V I C F G r. F P L
TTcAAGCACTCGAGCCCTCTTGGATMCArCrCGACAMTAGMGCAGGTCGCAGAGCTATMCTCGCTACCCTCGTCGAGGTGGTAMrTATGGATTCGTATAmCCTGATMACCM 76931 QALEPSWITSRQIEAGRRAITRYARRGGKtWIRIFPDKPJ
TrACTATTCcACCTGCAGMACACGMTGGGATCGGGTMAGGATCTCCAGMTATTGGGTAGCTGTAGrTMACCTGG...AAMTACmATGAMTTAGTGGCGTATCTGMAATATTG 16811 T I R P A ErR H G 5 G K G 5 P f Y W V A V V K P G K I LYE I S G V SEll I A
CrAGAGCTGCGATGAMATTGCAGCATATAAMTGCCGATACGTACTCMTrrATTACMCATCTAGmAMTMMAACMGAMTATAilMMMTTACTMTTAGITMTTATATA 16691 R A A H K I A A Y K H P I R T Q f J T T S S l U K K Q E I -- +--><---+
AATTTTAMTATTMAATTGGCCCTCCCTAATCCATCCATTrrAGGGGGGGGATT~TGATTCAACcTCAMCTTArTrMATGTTGi:AGATMTAGTGGAGCTCGA 76511 t-~> <-~t l----> <----t rp1H> H I Q P Q T Y L /I V A 0' 11 S GAR .. • .. .. .. • .. • .. 0- .. •
AMCTMTGTGCATTCGAGTTATAGGMCGAGTMTCGAMATATGCAMTATTGGTGATATTATTATTGCTGTTGTTMAGMGCAGTGCCAMTATGcCTATTMMAATCCGAAATT 76451 K l II C I R V I G T S Il R K Y A II I G 0 II! A V V K E A V P 1I HPJ K r. S E I
GTMliAGCTGTMTTGTACGTACGTGTMAiiMTTTAMCGMATMTGGATcCATMTN...r.AmGAToATMTGCAGcAuTTGTTATTMTCMGAAGGMATCCAMAGGMCTCGA 16331 V R A V I V R T C KEf K R II /I G S I I K F 0 0 II A A V V I II Q E G II P K G T R
GTTTTlGGTCCAATTGCTAoAGMTTMGAGMTCTMrrTrACTMAATAGmCGTTAGCTCCAGMGTmATMATAMATAmAhnrATMATAMTMAAGACTTATMM 76211 V F G P I ARE L RES !l F T KJV S L A PE V L - +-----> <-------+ +--
TAmATTTTATATTTTTCAATTMmMGGAGTAmATGGGGMTGATACAATTGCGMTATGATMCCTCMTAAGAMTGCAMrTrAGGGAMATMMACAGrTCMGTACeT 760~1 -> <----t rp,8> AGIiAG H G 1/ 0 T I A II' HIT SIR II A /I L G K I K T V Q V P
GCTACTMTATMCTAGAMTATTGCMMi.TTcmnCMGAAGGnrTATAGATMCTrrATTGATAATAMCAMATACTAMGATATTTTMmTAMTCTAA.IIATATcAAGGG 75971 A T I( I T R 1/ J A K 1 l' F Q E G F J 0 /I F J 0 U K Q II TKO I L I L U t K" Y Q G
~TCTTATATAACAAcmAi.GACGMTTAGTMACCAGG~TTMGMTATATTCTMTcATMAGMArTCCMAAGrrTrAGGTGGMTGGGMTTGTMrremec 15851 KKKKSYI TTtRR I SKPGLR IYSIIIIKEJ PKVlGGHGI V IlS
ACGTCTCGAGGMTTATGAcAGATCGAGMGCTCGACMAAAMilATTGGGGGCGMCrTTrATGTTATGTATGGTAAmmATMAAMTmA~TAGTTACTTACTATC 15731 TS R G I H TOR EAR Q K K J G GEL L C Y V W -
-+----~<-- ..
GTTTTTATTAATGTTGGnTATTMMAGcAGATTCTTCITTTMTGGAGAMCMMAITMTTGATATGGMGGTGTTGTTATAGAATCAcnCCTMTGCMCAmCGAGTTTATT 75611 AGGAG InfA> H E K Q K L 1 () H E G V V I E S L PI/A T f R V Y l
TAGATMTGGATGTATAGTATrMCACATATATCAGGMAAATCCGACGAAATTATATTCGMTATTACCCGGAGATAGAGTAMAGTccAATTMGTCCTrATGAmAACTAMGGTC 15491 o II G C I V L Till 5 G K I R R II Y 1 R I LPG, 0 R V K VEt Spy 0 l T K C R
GTATMCTTATAGACTTCGTGCAMATCTTCAMTMTTAAAAMTTAMilMAAAMMTrAGAGATTAAATAmATcAAMTCCGCGCTTCTGrrcci.MAAmCTGMAATTCTC 75371 I T Y R L R A K S S II II - GAG .cd> II K I R A S Y R K J CEil C R
GATTMTTCGACGCCGAAGACGMTTATGGTAGTTTGTTCTMTCCAMAc...CMACAMGACMGGTT~GrirAMTMMACACATATMAATATATACATATAGTAMT 75251 l I R R R R R I H V V C S 1/ P K 11 K Q R Q G ~ - rpo,l1>
TATGCCAAMTCTGTAAAMAMTTMfTTACGTMAGcMMCGTAGCITACCTAMccAGTTATTCATATTCMGCCAGCTTTMTMTACMTTGTAACTGTTACAGATATTAGAGG 75131 H P K S V K K I II L R K G K R R L P KG V I II 1 Q A S'F II I(T I V T V TO 1 R G
GCMGTCCmCATGGTcTTCrGCTGGTGCTrGCGGAmAMGGTACMAMAAAGTACCCCAmGCcGCTCMACCGCrGCAGAMATGCTATTCGGATATTMTTGATCMGGTAT ,75011 QVVSWSSAGACGFKGTKKSTPFA"QTA"EIIAJRllIDQGH
iiAMc.w;CGGMcTTATGATrAGTGGTCcAGGACCAGGcAGAGATACGGc...TTACGAGcMTTCGTCccAGTGGTATMTACTTAGmrGTACGTGACGTMCTCCCATGCCTCATM 7~691 K Q A E V H J S G P G P G ROT A L R A, I R R S G I I l S f V R D V T P HP H II
Figure 4G (continued).
-35-
TGGATGTAGAi:CACCTAllAAAAAGACGTGTATMATAAAAAAAACTAm~TATTAATArGAnCMGATGAMTAAAAGmCTAcTCMAi:Am.CAGTGGAAGTGTATT 14111 G C R P P R K R R V ~ . ~ H I Q 0 ElK V S T Q r. L Q II. K C 1
GAATCTAAAATAGAAAGTAMcGTCnCrrrATAGTCGTTTCGCTAmcACCmTAGAAAAGGTCMGCCAATACAGnGGMTAGCTATGCGTAGAGCGTTACTTAATGMATTGAA 74651 E 5 K 1 E S K R L L Y 5 R f A I S p f R K G Q A II T V G 1 A H R R ALL II £1 E
GGAGCTICTAnACATACGCTMMTAAA.MMt;TAAMcATGAATATTeMCAATMTAGcmACMGAATCTATTCATGATATATTAATTMmAAMGAMTTGrTnAAMAGT 74531 GAS! T VA K 1 I: K V K II E Y S T I I G L Q E 51 II D III II L K E I V L I: S
GAATCrnTcMCCTCMMAGI;ATATATTiCAGm·TAGGACCTMAMAATAACTGCTCAAGATATTAAAGGGCCTTcTTGTATTAAcATTATGATAATAGCCCAATATATAGCAACT 14411 £ 5 FE P Q. X A Y I 5 V l G P X K I r A Q D J K G P SCI K 1 H I 1 A Q Y I II T
TTAAACAAAcATATmATTAcMATTGAATrMATATTci.MAAcATCGTGGATATCGTATTGAAMCTTACAMAATATCAAGAAGGnrAmcCAGTGGATCCTGTnrfATGCCA 74291 L II K 0 ! L LEI E l II I f K 0 R G V R I E II L Q K Y Q E G L f P V 0 A V F H P
ATACGAAATGOOTTATAGTGTTCATTCrrTTGMAGTuAllAMMAArTAMGAMTACrrTTTCTTuAMTCTGCACTCATGGMGITTGACrCCAAAAGMGCTCrTTATGMGCT 74171 1 !I II A II Y S V· II S F ESE K K IKE I L F LEI ~ TOG S L T P K E A LYE A
TCTCGMAmAATTGATTTATTTATTCCTTrMTTMTTCAGAAAAMAAllAMMAATTnGGMTAGAAAAAACAAATGAATCAMTATGTCTTATTTTCCnTTcMTCTGTATCA 74051 5 R /I LID ~ F I P l 1 /I S E K K E K II F G I E K TilE S Ii H S Y F P F 0 S V S
CTGGATATTci.A.AAMTGACWAGATGtTGCTTTlAAACATATATTTArTGATCAACTAGAATTACCTGCCAGAGCATATMTTGTCTTAAAMAGTAMTGTGCATAcAATAGCAGAT 73931 L 0 I E KilT K 0 V A F K II I FlO Q L E L PAR A Y II elK K V II V II T I A D
TT A TT ACACTATAGTGAAGA TCA m M n MAA n MAM rrnGGAAAAMA TCAGT AGAACAAGnrTGGAAGCA TT AMAAAACG rTrnCAA TCeM TT ACCT AAAAA T AMAA T 13811 l LilY S E 0 0 L II: ! K N F G K K S V E Q V LEA L K K R F S I Q L P K fl r. II
lATCTTTAGGTAAHGGAlTGAAAMCGTTTrATMAACTAGMATMGGnTGMATTTTTTACTTTTTAcTAAAAMrTTCAMCCTTATTTCTAGTTTTATAAMAAMTATATATT 73691 Y L -
Figure 4G (continued).
Figure 4. Nucleotide sequence of the chloroplast DNA. The nucleotide
position numbers are counted from the 5'-terminal nucleotide next to the
inverted repeat IRA. Dots mark every tenth nucleotide. Amino acid sequences
deduced from the nucleotide sequences are shown under the DNA
sequence in one-letter symbols. Stop codons are shown by double underlines.
Putative stem-loop structures are shown by broken lines with arrow heads.
Predicted promoter sequences and Shine-Dalgarno sequences are shown under the
DNA sequence. Transfer RNA sequences are shown under the DNA sequence.
Introns are shown by dots under the DNA sequence with 5' and 3' terminal
consensus sequences (gugyg and ragccg.augaa..gaaa..uucaugu.cgguuy; r represents a or g, and y represents c or u).
nucleotide sequences of each block are shown in Fig. 4 A-G. The amino acid
sequences of ORFs and the nucleotide sequences of transfer RNAs deduced from the DNA
sequence are shown below the nucleotide sequences. Introns are predicted in
one tRNA gene (valine-UAC) and in six protein-coding sequences (the petB, petD, rpl2,
rpl16, and rps12 genes and ORF203) by the presence of the 5' consensus sequence
(GUGYG; Y represents C or U) and of 3' consensus secondary structures with the common
sequences (RAGCCG.AUGAA..GAAA..UUCAUGU.CGGUUY; R represents A or G) characteristic
of group II introns found in fungal mitochondrial genes and Euglena gracilis
chloroplast genes (Michel and Dujon 1983, Keller and Michel 1985). Identified genes
and open reading frames (ORFs), and their loci on the chloroplast genome, are
summarized in Table 2. Genes are categorized into three groups: II-1 transfer RNA
genes; II-2 genes for photosynthetic polypeptides; II-3 genes for ribosomal proteins
and subunits of RNA polymerase. In section II-4, unidentified open reading
frames are discussed. Detailed characterization of these genes is described in the
following sections.
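
As an illustration of the first step in such a prediction, the 5' consensus quoted above can be searched for with a regular expression. This is only a minimal sketch: the function name and the toy sequence are hypothetical, the consensus is written in DNA letters (U replaced by T), and a match by itself does not establish an intron, since the text also requires the 3' consensus secondary structure.

```python
import re

# 5' consensus of group II introns as quoted in the text (GUGYG, Y = C or U),
# written in DNA letters. The lookahead lets overlapping matches be reported.
FIVE_PRIME_CONSENSUS = re.compile(r"(?=(GTG[CT]G))")

def find_five_prime_consensus(dna):
    """Return 0-based start positions of GTGYG on the given strand."""
    return [m.start() for m in FIVE_PRIME_CONSENSUS.finditer(dna.upper())]

# Hypothetical toy sequence, not taken from the chloroplast genome.
toy = "ATGGTGCGTTTAAAGTGTGAA"
print(find_five_prime_consensus(toy))
```

Each reported position is only a candidate; the downstream region would still have to be checked for the 3' consensus structure.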
II-1 Transfer RNA genes
As previously mentioned, chloroplasts contain genes for their own rRNAs. They
probably also contain genes for all of their tRNAs. They show high homology with
the corresponding genes of E. coli. Genes for numerous tRNAs have been sequenced
and mapped on chloroplast chromosomes (Crouse et al. 1985).
RESULTS
From the DNA sequence, tRNA genes were predicted as regions that have a higher GC
content than the spacer regions between ORFs, as shown in Fig. 5. Seven tRNA genes were
located by searching for the T-ψ loop consensus sequence (GTTCRA) and identified by
constructing the clover-leaf structures, as shown in Fig. 6.
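
The motif search described above can be sketched as follows. This is an assumed, simplified version: it scans both strands for GTTCRA (R = A or G) and reports forward-strand coordinates, whereas the clover-leaf check used for the final identification is not reproduced. The function names and the toy sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIF_LENGTH = 6
# T-psi loop consensus GTTCRA; the lookahead reports overlapping matches.
T_PSI_MOTIF = re.compile(r"(?=(GTTC[AG]A))")

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def find_motif_both_strands(dna):
    """Return (strand, position) pairs; position is the 0-based index of the
    leftmost base of the match on the forward strand."""
    dna = dna.upper()
    hits = [("+", m.start()) for m in T_PSI_MOTIF.finditer(dna)]
    rc = reverse_complement(dna)
    n = len(dna)
    hits += [("-", n - m.start() - MOTIF_LENGTH) for m in T_PSI_MOTIF.finditer(rc)]
    return hits

# Hypothetical toy sequence with one motif on each strand.
toy = "AAGTTCGATTCCTTGAACGG"
print(find_motif_both_strands(toy))
```

A six-base motif occurs frequently by chance in an AT-rich genome, so hits of this kind are only starting points for the structural check.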
Figure 5. GC content and corresponding genes. GC content was plotted as the
average over a window of 30 nucleotides. The coding sequences are shown by bold
lines with the names of the genes.
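A windowed GC average of the kind plotted in Fig. 5 can be sketched as follows. The 30-nucleotide window comes from the caption; the one-nucleotide step and the function name are assumptions, since the thesis does not state how the window was advanced.

```python
def gc_profile(dna, window=30):
    """GC fraction of every full window, advancing one nucleotide at a time."""
    dna = dna.upper()
    if len(dna) < window:
        return []
    is_gc = [base in "GC" for base in dna]
    count = sum(is_gc[:window])            # GC count of the first window
    profile = [count / window]
    for i in range(window, len(dna)):
        # Slide the window: add the entering base, drop the leaving base.
        count += is_gc[i] - is_gc[i - window]
        profile.append(count / window)
    return profile

print(gc_profile("GCGCATAT", window=4))    # [1.0, 0.75, 0.5, 0.25, 0.0]
```

Regions whose profile rises above that of the neighbouring spacers would then be inspected as tRNA gene candidates.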
Figure 6. Secondary structures of tRNAs deduced from the DNA sequences.
The 3'-terminal CCA nucleotides are not coded by the chloroplast genome. The insertion site of an intron in the tRNAVal(UAC) is shown by an arrowhead.
Valine and isoleucine tRNA genes (trnV-GAC and trnI-C*AU)
A gene for tRNAVal(GAC) was found at positions 81814 to 81885. The 5' terminus
of the 16S ribosomal RNA gene was mapped at position 82109 by comparison with the maize
and tobacco chloroplast 16S rRNA genes (Schwarz and Kossel 1980, Tohdoh and Sugiura
1982). It was shown that the primary transcript of the 16S rRNA does not include
tRNAVal(GAC) (Strittmatter et al. 1985). In the liverwort, putative promoter
sequences were individually located upstream from the 16S rRNA gene and the valine tRNA
gene. A tRNA gene (81057-80984) is located on the opposite DNA strand, 756 bp apart
from the 5' terminal end of the trnV-GAC gene. The unmodified anticodon is CAU,
complementary to the methionine codon AUG. The nucleotide sequence shows 43.2% and
54.1% homology with the spinach initiator methionine tRNA(CAU) (Calagan et al.
1980) and elongator methionine tRNA(CAU) (Pirtle et al. 1982), respectively, but
exhibits 93.2% homology with the spinach chloroplast isoleucine tRNA(C*AU). In
addition, the tRNA gene in liverwort has an extra mismatch within the anticodon stem,
as seen in the case of spinach isoleucine tRNA(C*AU) (Kashdan et al. 1982 and
Francis et al. 1982). Therefore, this sequence is likely to be an isoleucine tRNA gene
(trnI-C*AU) whose product is highly modified at the first nucleotide of the anticodon.
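The homology percentages quoted for tRNA sequences are position-by-position identities. A minimal sketch of such a calculation is given here, assuming two already aligned sequences of equal length; how gaps were scored in the thesis is not stated, so the function (a hypothetical helper) simply refuses unequal lengths.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with identical residues in two aligned,
    equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Toy example (not thesis data): 5 of 7 positions agree.
print(round(percent_identity("GATTACA", "GACTATA"), 1))   # 71.4
```

For a 74-nucleotide tRNA, for instance, 69 identical positions would give 93.2% under this definition.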
Arginine tRNA gene (trnR-CCG)
A tRNA gene (57877-57950) was found 94 bp downstream from the termination codon
of the rbcL gene. The tRNA has the anticodon CCG, which can recognize the arginine
codon CGG. A pair of mismatched nucleotides (U-U) was found in the amino acyl stem
(see Fig. 6B). The liverwort trnR-ACG gene in the IR region also has two
mismatched nucleotide pairs (U-U and U-U) in its amino acyl stem (Kohchi et al. 1986).
Tryptophan and proline tRNA genes (trnW-CCA and trnP-UGG)
Coding sequences for two tRNAs were found at positions 64626-64553 and 64788-64715
near the psbE gene in Fig. 4D. Their secondary structures showed the
anticodon triplets, CCA and UGG, pairing with the codons for tryptophan, UGG, and proline,
CCA, respectively (Fig. 6C and 6D). Therefore these tRNAs were identified as
tryptophan tRNA and proline tRNA. The tRNATrp(CCA) and tRNAPro(UGG) show 93.4% and
93.4% sequence homology with spinach chloroplast tRNATrp(CCA) and tRNAPro(UGG)
(Canaday et al. 1981 and Francis et al. 1982), respectively. The genes for these
tRNAs were designated trnW-CCA and trnP-UGG. Their coding sequences are separated
by an 88 bp spacer region. A significant promoter sequence for these genes is present
20 bp upstream from the 5' end of the proline tRNA gene, but not in the spacer region
between the two tRNA genes. The two tRNA genes must therefore be co-transcribed in a
primary transcript and processed into mature tRNA molecules, although a 42 bp stem
structure can be formed in the spacer region, as described for tRNAArg(CCG).
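The assignment of a tRNA from its anticodon rests on antiparallel Watson-Crick pairing with the codon. A short sketch of that step, with a hypothetical function name and no wobble or base modification considered:

```python
def codon_for_anticodon(anticodon):
    """Codon (5'->3') that pairs by strict Watson-Crick rules with an
    RNA anticodon written 5'->3'."""
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    # Pairing is antiparallel, so the anticodon is read in reverse.
    return "".join(pair[base] for base in reversed(anticodon))

print(codon_for_anticodon("CCA"))   # UGG (tryptophan)
print(codon_for_anticodon("UGG"))   # CCA (proline)
```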
Elongator methionine and valine tRNA genes (trnM-CAU and trnV-UAC)
In the BglII fragment Bg10, 83 bp apart from the atpE coding region, a tRNA gene
(53801-53874) was found on the opposite strand. The anticodon of the tRNA was CAU.
The tRNA showed sequence homology with spinach elongator methionine tRNA (94.8%)
and with initiator methionine tRNA (46.8%). Therefore this tRNA gene was concluded
to be the elongator methionine tRNA gene. A tRNA gene (53652-53051), whose anticodon
was UAC, was found 148 bp apart from trnM-CAU. This putative valine tRNA gene was
split by a 530 bp intron at the junction between the anticodon stem and loop, as shown in
Fig. 6F.
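The distances and lengths quoted for these tRNA genes follow from simple arithmetic on map coordinates. The sketch below assumes that coordinates are inclusive and that a feature on the opposite strand is written with its larger coordinate first; both helper names are hypothetical.

```python
def gap_bp(feature_a, feature_b):
    """Number of base pairs lying strictly between two features given as
    (start, end) pairs in either orientation; negative if they overlap."""
    lo_a, hi_a = sorted(feature_a)
    lo_b, hi_b = sorted(feature_b)
    return max(lo_a, lo_b) - min(hi_a, hi_b) - 1

def mature_length(feature, intron_bp=0):
    """Inclusive length of a feature minus any intron it contains."""
    lo, hi = sorted(feature)
    return hi - lo + 1 - intron_bp

trnM = (53801, 53874)
trnV = (53652, 53051)
print(gap_bp(trnM, trnV))                   # 148
print(mature_length(trnV, intron_bp=530))   # 72
```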
DISCUSSION
Thirty-two species of tRNA genes have been identified and mapped on the
liverwort chloroplast genome (Ohyama et al. 1986). The identified tRNAs are
listed in the codon table shown in Table 3. The tRNAs encoded by the chloroplast
genome are sufficient to read all codons, taking into account extended wobbling
and modification of the anticodons. No tRNA gene encodes the 3'-terminal CCA
nucleotides. Five species of tRNA genes (trnI-GAU, trnV-UAC, trnA-UGC, trnK-UUU and
trnG-UCC) have been found to be split by group II introns. A tRNA gene (trnL-UAA)
Table 3. Codon table and unmodified anticodons of tRNAs coded by the
Marchantia polymorpha chloroplast genome.

codon     anticodon   codon     anticodon   codon     anticodon   codon     anticodon
UUU Phe               UCU Ser               UAU Tyr               UGU Cys
UUC Phe   GAA         UCC Ser   GGA         UAC Tyr   GUA         UGC Cys   GCA
UUA Leu   UAA*        UCA Ser   UGA         UAA Ter               UGA Ter
UUG Leu   CAA         UCG Ser               UAG Ter               UGG Trp   CCA

CUU Leu               CCU Pro               CAU His               CGU Arg   ACG
CUC Leu               CCC Pro   (GGG)       CAC His   GUG         CGC Arg
CUA Leu   UAG         CCA Pro   UGG         CAA Gln   UUG         CGA Arg
CUG Leu               CCG Pro               CAG Gln               CGG Arg   CCG

AUU Ile               ACU Thr               AAU Asn               AGU Ser
AUC Ile   GAU*        ACC Thr   GGU         AAC Asn   GUU         AGC Ser   GCU
AUA Ile   CAU         ACA Thr   UGU         AAA Lys   UUU*        AGA Arg   UCU
AUG Met   CAU         ACG Thr               AAG Lys               AGG Arg
AUG fMet  CAU

GUU Val               GCU Ala               GAU Asp               GGU Gly
GUC Val   GAC         GCC Ala               GAC Asp   GUC         GGC Gly   GCC
GUA Val   UAC*        GCA Ala   UGC*        GAA Glu   UUC         GGA Gly   UCC*
GUG Val               GCG Ala               GAG Glu               GGG Gly

The AUG codon is an initiation codon. Termination codons (UAA, UAG and UGA)
are indicated by Ter. Asterisks indicate the presence of introns in the
coding sequences.
is interrupted by a group I intron. Five species of tRNA genes (trnV-GAC, trnI-GAU,
trnA-UGC, trnR-ACG, trnN-GUU) have been identified in each inverted repeat region
(see Fig. 2).
Three kinds of arginine tRNA genes have been identified on the chloroplast
genome. Duplicated trnR-ACG genes are localized in the IR regions (Kohchi et al.
1986), and a trnR-UCU gene was mapped near the 3' side of the atpA gene coding for
the α subunit of H+-ATP synthase (Umesono et al. 1986, see Fig. 2). In the liverwort
chloroplast genome, however, the tRNAArg(CCG) gene, which has not been found in the
chloroplasts of any other species of plants, was identified near the 3' end of the rbcL
gene. Arginine codons in the codon table are separated into two boxes, comprising the AGR
and CGN codons (see Table 3). The codons AGA and AGG can be read by tRNAArg(UCU)
using G/U wobbling at the third letter. The CGU and CGG codons can also be
recognized by tRNAArg(ACG) and tRNAArg(CCG), respectively. However, the CGC and CGA
codons could not be read by tRNAArg(ACG) and tRNAArg(CCG) without modification of
the first letter of the ACG and CCG anticodons. In E. coli there are also three species
of arginine tRNAs, having the anticodons ACG, CCG and UCU. Therefore these results
indicate that the mechanism of codon-anticodon recognition in chloroplasts is
similar to that in E. coli. In addition, unpaired nucleotides in the amino acyl stem
were also reported in chloroplast tRNAArg(ACG) molecules of liverwort (Kohchi et al.
1986), Euglena gracilis (Hallick et al. 1984), Spirodela oligorhiza (Keus et al.
1984), Pelargonium zonale (Hellmund et al. 1984), and tobacco (Kato et al. 1985).
These incomplete structures of the amino acyl stems may alter codon-anticodon
recognition. A promoter sequence is not detected in the 94 bp spacer region between
the rbcL and trnR coding regions. Instead, there are two stem structures, 8
and 32 bp long (dG = -44.9 kcal), that may function as intracistronic termination or
RNA processing signals (Fig. 4B). If so, the trnR gene may be co-transcribed
with the rbcL gene and processed into mature tRNA.
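The decoding argument for the six arginine codons (CGN and AGR) can be checked with a small sketch. It assumes simplified wobble rules for unmodified bases only (a first anticodon base U reads A or G, G reads C or U, C reads only G, A reads only U); real tRNAs carry modified bases, so this is an illustration and not a model of chloroplast translation. All names are hypothetical.

```python
# First anticodon base -> third codon bases it can read when unmodified.
WOBBLE = {"A": "U", "C": "G", "G": "CU", "U": "AG"}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon):
    """Set of codons read by an unmodified anticodon (written 5'->3')."""
    first, second, third = anticodon
    prefix = PAIR[third] + PAIR[second]      # first two codon positions
    return {prefix + base for base in WOBBLE[first]}

arg_codons = {"CGU", "CGC", "CGA", "CGG", "AGA", "AGG"}
read = set()
for anticodon in ("ACG", "CCG", "UCU"):
    read |= codons_read(anticodon)

print(sorted(arg_codons - read))   # ['CGA', 'CGC']
```

Under these assumptions only CGC and CGA remain unread, which is the gap that a modification of the first anticodon letter would have to close.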
II-2 Genes for photosynthetic polypeptides
The most important role of chloroplasts is photosynthesis. The liverwort
chloroplast genome is smaller (121 kb) than those of other higher plants (Palmer et
al. 1985). It is therefore interesting to know whether the liverwort chloroplast
genome has the necessary set of genes for photosynthetic polypeptides. Events in
photosynthesis take place on the thylakoid membrane as well as in the stroma of the
chloroplast. The thylakoid membrane contains reaction centers (called photosystems
I and II), the cytochrome b6/f complex and coupling factors. The cytochrome b6/f complex
in chloroplasts operates in the electron transfer chain as a plastoquinol-plastocyanin
oxidoreductase between photosystems II and I. The complex has four
major polypeptides. Three of them, cytochrome f, cytochrome b6 and the 17 kd subunit 4,
are synthesized in the chloroplasts, whereas the Rieske FeS protein is synthesized
in the cytoplasm (Hauska 1985). The genes for cytochrome b6 and the 17 kd subunit
4 are separated by 1 kb on the chloroplast genome and are transcribed as a common
precursor mRNA (Alt et al. 1983). The gene for cytochrome f is located at a distance
from the others and is translated into a preprotein larger than mature cytochrome f,
suggesting that processing occurs during insertion from the ribosomes in the stroma
into the thylakoid membrane (Alt et al. 1983, Willey et al. 1984, Alt and Herrmann
1984).
RESULTS
Genes for photosynthetic polypeptides were identified by comparing the amino acid
sequences of the ORFs with those of photosynthetic proteins reported previously in
other species of plants. The amino acid sequence alignments of the identified
photosynthetic proteins are shown in Fig. 7 A-K. On the LSC region sequenced in
this study, there are ten genes for photosynthetic proteins: the large subunit of
ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL), 51 kd P-680 chlorophyll a
apoprotein (psbB), cytochrome b-559 polypeptides (psbE and psbF), cytochrome f
preprotein (petA), cytochrome b6 (petB), cytochrome b6/f complex subunit 4 (petD)
and β and ε subunits of H+-ATP synthase (atpB and atpE). The amino acid sequences
of the photosynthetic proteins in liverwort chloroplasts exhibit high homology
(78.8%-95.6%) to those of spinach chloroplasts, except for atpE (63.0% homologous to
spinach atpE) and psbG (62.1% to maize).

[Figure 7, panels A-K: alignments for (A) rbcL, (B) psbB, (C) psbE, (D) psbF,
(E) psbG, (F) petA, (G) petB, (H) petD, (I) atpB, (J) atpE and (K) ndh3.]

Figure 7. Amino acid sequence alignments of photosynthetic polypeptides.
The amino acid sequences are shown by one-letter codes. Identical amino acid
residues are shown by colons, and deleted residues are shown by dashes. The
amino acid residue numbers and sequence homologies with liverwort gene
products are indicated at the end of the sequences.
Gene for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL)
The rbcL gene was previously mapped in the Bam5 fragment by heterologous
hybridization with the tobacco chloroplast rbcL gene (Ohyama et al. 1983). An
ORF475 (56355-57782) shows high amino acid sequence homology with rbcL of spinach
(90.5%) (Zurawski et al. 1981), tobacco (90.5%) (Shinozaki and Sugiura 1982), maize
(87.6%) (McIntosh et al. 1980) and A. nidulans (80.6%) (Shinozaki and Sugiura
1983) (Fig. 7A). The rbcL gene was localized 508 bp apart from, and on the opposite
strand to, the atpB gene coding for the β subunit of H+-ATP synthase. In the spacer
region between the atpB and rbcL genes, typical stem-loop structures can be formed,
including a 45 bp AT-rich stem structure (dG = -44.9 kcal).
Genes for photosystem II P-680 chlorophyll a apoprotein (psbB), and for cytochrome
b-559 polypeptides (psbE and psbF)
The amino acid sequence of an ORF508 (69026-70552) shows 88.2% homology to that
of the spinach 51 kd photosystem II P-680 chlorophyll a apoprotein (Morris and
Herrmann 1984). The amino acid sequence alignment with that of spinach is shown in
Fig. 7B.
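The ORF names encode the number of amino acid residues, which can be cross-checked against the quoted map coordinates. The sketch below assumes that the coordinates are inclusive and that the range includes the termination codon; this assumption reproduces the residue counts given for rbcL and psbB, but it is an inference and is not stated in the text. The function name is hypothetical.

```python
def orf_residues(start, end, includes_stop=True):
    """Number of amino acid residues encoded by an uninterrupted ORF with
    inclusive coordinates, in either orientation."""
    length = abs(end - start) + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return codons - 1 if includes_stop else codons

print(orf_residues(56355, 57782))   # 475 (rbcL, ORF475)
print(orf_residues(69026, 70552))   # 508 (psbB, ORF508)
```

A coordinate pair that fails this check (a length not divisible by three, or a residue count that disagrees with the ORF name) is a useful flag for a transcription error in a position.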
ORF83 (63554-63303) and ORF39 (63293-63174) are located close together (10 bp
apart). The ORF83 shows 89.2% amino acid sequence homology to the spinach
apocytochrome b-559 polypeptide, which is the psbE gene product (Herrmann et al. 1984).
The ORF39 shows 89.7% amino acid sequence homology to the b-559 URF39 gene product
of spinach (Herrmann et al. 1984). The psbE and psbF genes are preceded by
typical Shine-Dalgarno-like sequences (GGAGG and AGGAGG, respectively). The psbF
gene terminates with a TAG stop codon that overlaps two nucleotides of the SD
sequence for psbE. Their sequence alignments are shown in Fig. 7C and 7D.
Gene for photosystem II G-protein (psbG)
Recently Steinmetz et al. (1986) identified a new protein (248 amino acid residues
long) associated with the photosystem II complex and analyzed the fine structure of
its gene, psbG, on the maize chloroplast genome. The M. polymorpha
counterpart of the psbG gene (52524-51793; encoding 243 amino acids, 27.6 kd) was
identified, and the predicted amino acid sequences were compared as shown in Fig. 7E.
Unlike other photosystem II polypeptides, the psbG proteins are significantly
diverse in their N- and C-terminal portions; overall they are only 62.1% homologous,
whereas the central portions (maize, amino acid residues 36-182; M. polymorpha,
37-183) are 91.9% identical. In M. polymorpha, the psbG gene overlaps with the last
seven nucleotides of the preceding ndh3 gene.
Gene for cytochrome f preprotein (petA)
An ORF320 (61641-62603) shows a high degree of amino acid sequence homology to
the cytochrome f preproteins of spinach (78.8%) (Alt et al. 1984), pea
(75.9%) (Willey et al. 1984), Oenothera hookeri (78.1%) (Tyagi et al. 1986), and
wheat (79.1%) (Willey et al. 1984). The ORF320 can therefore be assigned as the
liverwort cytochrome f preprotein gene (petA), whose product has a molecular weight
of 33.4 kd. The amino acid sequence alignments with those of other plant species are
shown in Fig. 7F. The N-terminal 35 amino acid sequence of the liverwort cytochrome f
polypeptide shows relatively low homology (40.0%) to that of spinach. On the other
hand, the remaining sequence of 285 residues shows high homology (83.5% with the
spinach cytochrome f mature protein). However, the hydrophobic characteristics of the
N-terminal region are conserved in petA gene products, indicating that the N-terminal 35
amino acid residues may function as a signal peptide. The molecular weight of the
liverwort mature cytochrome f polypeptide is 31.3 kd.
Genes for cytochrome b6 (petB) and for cytochrome b6/f complex subunit 4 (petD)
ORF162 (72078-72566) can be postulated by using the initiation codon AUG (72078-
72080). The C-terminal amino acid sequence was found to be homologous to that of
the spinach cytochrome b6 polypeptide (petB) (Heinemeyer et al. 1984). However, the
N-terminal portion, including the initiation codon, is quite different from that of
the spinach petB gene. Instead, a consensus sequence for the 3' end of the intron
(RAGCCG.AUGAA..GAAA..UUCAUGU.CGGUUY) was found in the beginning (71859-71924) of the
ORF162. A consensus sequence specific to the 5' end of the intron (GUGYG) was also
located further upstream of the ORF162. This indicates that the liverwort petB gene
(71424-72566) is interrupted by a 495 bp intron (71430-71924) and codes for a polypeptide
of 215 amino acid residues (Fig. 4F and 7G).
The amino acid sequence of ORF139 (73271-73690) shows 95.6% and 93.1%
homology to cytochrome b6/f complex subunit 4 of spinach (Heinemeyer et al. 1984)
and pea (Phillips et al. 1984), respectively (Fig. 7H). However, the consensus
sequences for an intron (493 bp: 72723-73215) are also present in the petD gene, as in
the petB gene. The putative secondary structures of the introns in the petB and petD genes
are shown in Fig. 8. The initiation codon of the petD gene can be extended to the
upstream methionine codon (72715), as shown in Fig. 4F. The molecular weight of the
liverwort petD gene product can be estimated to be 17.4 kd (160 amino acid residues),
considering the presence of an intron in the coding sequence. On the other hand,
intron-specific sequences can also be identified in the DNA sequences of the spinach and
pea genes, although the authors did not describe the presence of introns in the coding
sequences of the petD gene. If this is true, the amino acid sequences of the 5'
extended region of the petD genes are highly conserved in the three species of
plants (95.6% homologous to the spinach petD protein, see Fig. 4F). A sequence motif
(AGGA) for the ribosome binding sites is present 9 bp upstream from the initiation
codon of each of the petD and petB genes.
Figure 8. Secondary structures of introns in petB and petD precursor mRNA
molecules. The Shine-Dalgarno sequences and amino acid sequences are shown
under the messenger RNA molecules. The underlined nucleotide A is the putative branch
point in the lariat form. The junction sites between exons and introns are
indicated by arrowheads.
Genes for β and ε subunits of H+-ATP synthase (atpB and atpE)
A gene cluster, atpB (55846-54368) - atpE (54362-53955), was localized 508 bp
apart from rbcL, head to head on the opposite DNA strand. The nucleotide-binding
subunit β encoded by the atpB gene contains 492 amino acids (Higgins et al. 1985).
The M. polymorpha atpB amino acid sequence showed 88.4% homology to the spinach atpB
gene product (Zurawski et al. 1982), 86.2% to maize (Krebbers et al. 1982) and 62.8% to
E. coli (Saraste et al. 1981). In contrast, the M. polymorpha ε subunit (135 amino acid
residues), the product of atpE, is less homologous to those of spinach (63.0%, Zurawski
et al. 1982), maize (51.9%, Krebbers et al. 1982) and E. coli (22.2%, Saraste et al.
1981). Although the Shine-Dalgarno sequence of the atpE gene overlaps the
atpB coding region in spinach and maize, the coding sequences of these two genes of
M. polymorpha do not overlap, as shown in Fig. 9.
DISCUSSION
The comparison of amino acid sequences of photosynthetic polypeptides between
liverwort and other higher plant chloroplasts reveals that the amino acid sequences
have been highly conserved (78.8-95.6% with spinach photosynthetic polypeptides)
during chloroplast evolution. In contrast, it is interesting to note the lower amino
acid sequence conservation in polypeptides involved in transcription and translation,
including the α subunit of RNA polymerase (54%), ribosomal proteins (46-68%) and
initiation factor 1 (65%). These results suggest that chloroplast ribosomal
proteins and RNA polymerases may have changed faster than photosynthetic
polypeptides in the course of evolution.
Previous studies showed that several plants have two rbcl mRNA transcripts:
(Erion ;985). It has been shown that the smaller mRNA transcripts are produced by
processing of the primary rbcl transcripts (Mullet et!!l. 1985). Although the
spacer region between rbcl and atpB coding regions show less sequence homology with
those of other plant species, several regions are conserved in land plant
-52-
J", ''';.: :·,i '.:,\ :i-,
" ; 1 ~ .J l 10. < •• , '. -.,
MAAGCcMiv."IIC'l"l"TAA~GCCTTMTCi;G"GGG1'GIIT;'1'GGCMTGllrnACCACcri;GIICG!CGTci.GCGCIIc;'/lGchIlCA.v.TGTIc-rer ' KIIKKL ••• GGJlGG HIIMTYlIl,OVVSAEOQI-IFS
(atpnp . (atpE»
GCT ACTGCJuUJ\GCJ\GCT Ai:Tn IICMGTGc"GIIGTT AIIA.V.'l'1'IITGCTi .. \II TCT1'CGT~ TCATGGCTCcr All TCGMm'l'T'TCGM'rTcc II T II K II II T L (I V E S }! L " L 11 I Ii " P II R I V '" ,~ 5' ("tpal> - GGAG (;;tpur;-- --- ---
'l h-
Figure 9. Comparison of nucleotide sequence of spacer region between atpB a~d atpE genes. The i denti ca 1 amirlO aci d res; dues between 11- po lyinorpha 'atpBE and ,n
~pin~ch atpBE proteins are undeilined. Shine-Dalgarno sequences are shown
u~der the ,nucleotide sequen~es by bold letters.
Figure 10. Comparison of the leader sequences of rbcL gene. The promoter
sequences referred to as "-10" and "-35", and Shine-Dalgarno sequences are double
underlined. The conserved sequences in the 5' leader regions of plant
chloroplast rbcL genes are indicated by thick underlines.
chloroplasts. The promoter sequences for rbcL and atpB referred to as "-10" and "-35"
regions, and the Shine-Dalgarno sequence (GGAGG) for rbcL were conserved. In
addition, a sequence motif (TCGAG) was found 38-45 bp apart from the SD sequence of
rbcL by comparing the DNA sequences as shown in Fig. 10. Interestingly, one of the
5' terminal sites of processed rbcL mRNAs is located 8-12 bp upstream from this
conserved sequence motif in maize and spinach. The distance from the rbcL promoter
to the SD sequence in liverwort rbcL is shorter than those in other plant chloroplast
rbcL, but the distances between promoter regions and this newly identified sequence
motif are not conserved in land plants. This sequence motif may have some role
in the translational regulation of rbcL gene expression.
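The spacing between the TCGAG motif and the Shine-Dalgarno sequence can be measured mechanically on any leader sequence. The following is a minimal sketch only: the function name and the toy leader are hypothetical, and the distance is assumed to be counted from the last base of the motif to the first base of the SD sequence, a convention the text does not specify.

```python
def motif_to_sd_distances(leader, motif="TCGAG", sd="GGAGG"):
    """Return the gaps (in bp) between each upstream motif copy and the SD sequence.

    Assumption: the gap is the number of bases lying strictly between the end
    of the motif and the start of the last SD occurrence in the leader.
    """
    leader = leader.upper()
    sd_pos = leader.rfind(sd)          # SD sequence closest to the start codon
    if sd_pos < 0:
        return []
    gaps = []
    start = leader.find(motif)
    while start != -1 and start + len(motif) <= sd_pos:
        gaps.append(sd_pos - (start + len(motif)))
        start = leader.find(motif, start + 1)
    return gaps


# Toy leader (not a real rbcL sequence): motif, 40 filler bases, SD, start codon.
toy_leader = "AA" + "TCGAG" + "A" * 40 + "GGAGG" + "TTATG"
print(motif_to_sd_distances(toy_leader))
```

Running the same function over the aligned leaders of several species would show whether the gap stays within a narrow window such as the 38-45 bp range reported here.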
In the liverwort cytochrome f preprotein (petA), the N-terminal region (amino acid
residues 1-35) shows rather low homology (40%) with the corresponding regions of
those from pea and spinach chloroplasts. The N-terminal 35 amino acid residues of
the pea and spinach cytochrome f proteins have been shown to be removed as a signal
peptide for localization into the thylakoid membrane (Willey et al. 1983).
The liverwort chloroplast petB and petD genes were found to be interrupted by
introns classified as group II introns in fungal mitochondria (Keller and Michel
1985). Introns were not reported in the spinach chloroplast petB and petD genes, and
it has been suggested that the psbB, petB and petD genes are co-transcribed in a
primary transcript followed by intercistronic splicing (Westhoff 1985). A gene
organization similar to that in the liverwort genome can be interpreted in the spinach
nucleotide sequence presented by Heinemeyer et al. (1984) by application of intron
consensus sequences (Keller and Michel 1985, Ohyama et al. 1986). The molecular
weight calculated from the deduced amino acid sequence of petD (17.~kd) agrees
reasonably with that (lj.5 kd) of the spinach subunit 4 polypeptide determined by
SDS-polyacrylamide gel electrophoresis (Alt et al. 1983). The finding of introns in
the liverwort petB and petD genes and the absence of promoter sequences and termination
signals in the spacer regions between the psbB, petB and petD genes suggest that these
three genes are transcribed in a 7.3 kb common precursor mRNA. Further study to
identify the splicing junctions between their exons is in progress.
II-3 Genes for ribosomal proteins and α subunit of RNA polymerase (rpoA)
Gene expression in chloroplasts shows many similarities with that in
prokaryotic cells such as E. coli. Ribosomal RNAs (rRNA), transfer RNAs (tRNA) and
ribosomal proteins involved in transcription and translation in chloroplasts have
prokaryotic features (Whitfeld and Bottomley 1983, Kozak 1983). Chloroplast
ribosomes play a fundamental role not only in chloroplast biogenesis but also in
photosynthesis, since several photosynthetic proteins are made on chloroplast
ribosomes (Ellis 1981). Chloroplast ribosomes share many structural and functional
similarities with prokaryotic ribosomes, including subunit sedimentation
coefficients, mode of action, and the size and sequences of rRNAs (Bartsch et al. 1982,
Schmidt et al. 1985, Bartsch 1985). The similarities in the translational systems
are so extensive that initiation and elongation factors, and even ribosomal subunits,
of E. coli and chloroplasts can be interchanged (Graves and Spremulli 1983, Schmidt et
al. 1983). Some cDNAs coding for chloroplast ribosomal proteins have been
obtained from pea (Gantt and Key 1986).
The genes for chloroplast ribosomal proteins S4 (Subramanian et al. 1983), S7
(Montandon and Stutz 1984), S11 (Muller et al. 1986), S12 (Montandon and Stutz
1984), S16 (Shinozaki et al. 1986), S19 (Sugita and Sugiura 1983), L2 (Zurawski et
al. 1984), S14 (Umesono et al. 1984, Kirsch et al. 1986), and L16 (Posno et al.
1986) have been identified by amino acid sequence homology with the corresponding
E. coli ribosomal proteins. One third of chloroplast ribosomal proteins are thought to
be encoded by the chloroplast genome (Filho et al. 1981), but some of the rest have
been shown to be encoded by the nuclear genome as precursor polypeptides (Schmidt et
al. 1985). It has not so far been clarified how many genes for ribosomal proteins are
encoded by the chloroplast genome.
Two different types of RNA polymerase activities appear to be present in
chloroplasts (Greenberg et al. 1984). One is associated with a transcriptionally
active chromosome and is preferentially active in rRNA synthesis (Briat and Mache
1980). The other is readily extracted in soluble form and is active in tRNA and mRNA
transcription (Gruissem et al. 1983). The sites of synthesis of these RNA
polymerases have not been proved to be either in the cytoplasm or in chloroplasts
(Lerbs et al. 1985, Muller et al. 1986). The soluble RNA polymerase from maize (Kidd
and Bogorad 1979) and pea (Tewari and Goel 1983) has been shown to have a subunit
structure.
RESULTS
Thirteen ORFs show significant amino acid sequence homologies with the respective
E. coli ribosomal proteins (L2, L14, L16, L20, L22, L23, L33, S3, S8, S11, S12, S18,
and S19). The amino acid sequence alignments with the corresponding E. coli ribosomal
proteins are shown in Fig. 11 A-M. Percentages of amino acid sequence homology
of the liverwort ORFs to E. coli ribosomal proteins range from 25.3% (L23) to 70.2%
(S12).
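A homology percentage of this kind can be recomputed from any pairwise alignment by counting identical positions. The following is a minimal sketch only: the function name, the toy sequences and the choice of denominator are assumptions, since the text does not state which sequence length the identical residues were divided by.

```python
def percent_identity(aligned_a, aligned_b, denominator="first"):
    """Percent identical residues between two gapped, equal-length sequences.

    denominator (assumed options, not taken from the thesis):
      "first"   - ungapped length of aligned_a
      "second"  - ungapped length of aligned_b
      "aligned" - number of columns in which neither sequence has a gap
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    identical = sum(1 for a, b in pairs if a == b and a != "-")
    if denominator == "first":
        total = sum(1 for a, _ in pairs if a != "-")
    elif denominator == "second":
        total = sum(1 for _, b in pairs if b != "-")
    elif denominator == "aligned":
        total = sum(1 for a, b in pairs if a != "-" and b != "-")
    else:
        raise ValueError("unknown denominator: " + denominator)
    return 100.0 * identical / total


# Toy alignment (hypothetical residues, one deleted position shown as a dash).
seq_a = "MPTIQQLIRN"
seq_b = "MPTVNQLVR-"
print(round(percent_identity(seq_a, seq_b), 1))
print(round(percent_identity(seq_a, seq_b, denominator="second"), 1))
```

Because the two denominators give different values for sequences of unequal length, the convention should be stated whenever such percentages are compared between proteins.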
A large cluster of ribosomal protein genes was found on the LSC region next to
the junction site (JLB) (Fig. 3G and 4G). The organization of the gene cluster was
(trnI)-L23-35bp-L2-36bp-S19-17bp-L22-48bp-S3-57bp-L16-97bp-L14-81bp-S8-86bp-(infA)-
36bp-(secX)-50bp-S11-32bp-(rpoA). The putative infA gene product exhibits 56.4% and
60.3% amino acid sequence homologies to the E. coli initiation factor 1 (Pon et al.
1979) and the spinach infA product (Muller et al. 1986), respectively (Fig. 11O).
The secX is the putative gene for an unknown polypeptide expressed in E. coli (Cerretti
et al. 1983). The amino acid sequence shows 62.2% homology to the E. coli secX
protein and also shows 86.5% homology to the spinach open reading frame
corresponding to the liverwort secX protein (Muller et al. 1986) (Fig. 11P). The
rpoA is the putative gene for the α subunit of RNA polymerase, whose amino acid
sequence shows homologies to those of E. coli (25.6%, Meek et al. 1984) and spinach
(54.1%, Muller et al. 1985) (Fig. 11N). Spacer regions between the genes in the
cluster appear to be less than 97 bp long. No significant prokaryotic promoter sequences
Figure 11. Amino acid sequence alignments of ribosomal related proteins and
α subunit of RNA polymerase. The amino acid sequences are shown by one letter
codes. Identical amino acid residues are shown by colons, and deleted residues
are shown by dashes. The amino acid residue numbers and sequence
homologies with liverwort gene products are indicated at the end of sequences.
Panels: (A) rpl2 (L2); (B) rpl14 (L14); (C) rpl16 (L16); (D) rpl20 (L20);
(E) rpl22 (L22); (F) rpl23 (L23); (G) rpl33 (L33); (H) rps3 (S3); (I) rps8 (S8);
(J) rps11 (S11); (K) rps12 (S12); (L) rps18 (S18); (M) rps19 (S19); (N) rpoA;
(O) infA; (P) secX; (Q) ORF184; (R) ORF203; (S) ORF40.
were observed between them. Another ribosomal protein gene cluster is localized
between trnP-UGG and psbB in the order trnP-124bp-L33-27bp-S18-81bp-L20-786bp-
S12(exon 1)-72bp-ORF203-385bp-psbB. The rps12 gene (exon 1) is localized 72 bp
downstream from the TAG stop codon of ORF203. The rest of the exons are located far
apart on the different DNA strand, indicating a trans-splicing mechanism for the gene
expression (Fukuzawa et al. 1986). A detailed discussion of the trans-split gene rps12
is given in Chapter III. The coding sequences for ribosomal proteins L2 and L16 are
interrupted by 545 bp and 534 bp group II introns, respectively (Fig. 4F).
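The gene-spacer notation used for the large cluster lends itself to a simple tabulation of spacer lengths. The following is a minimal sketch only: the parser and its names are hypothetical, and the cluster string is retyped here from the organization quoted in the text.

```python
import re

# Cluster organization as quoted in the text (parentheses mark putative genes).
CLUSTER = ("(trnI)-L23-35bp-L2-36bp-S19-17bp-L22-48bp-S3-57bp-L16-97bp-L14-81bp-"
           "S8-86bp-(infA)-36bp-(secX)-50bp-S11-32bp-(rpoA)")


def parse_cluster(text):
    """Split a 'gene-NNbp-gene' string into gene names and spacer lengths (bp)."""
    genes, spacers = [], []
    for token in text.split("-"):
        match = re.fullmatch(r"(\d+)bp", token)
        if match:
            spacers.append(int(match.group(1)))
        else:
            genes.append(token.strip("()"))
    return genes, spacers


genes, spacers = parse_cluster(CLUSTER)
print(genes)
print("longest spacer:", max(spacers), "bp; total spacer length:", sum(spacers), "bp")
```

This simple splitter assumes that gene names contain no hyphen, so it would need adjusting for names such as trnP-UGG in the second cluster.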
DISCUSSION
In the liverwort chloroplast genome, the nucleotide sequence revealed a large
cluster of genes coding for ribosomal and related proteins (trnI-L23-L2-S19-L22-S3-
L16-L14-S8-infA-secX-S11-rpoA) on the LSC region near the JLB. Upstream from the
gene for L23 ribosomal protein, there is an isoleucine tRNA(C*AU) gene whose
promoter functioned strongly in E. coli as well as in chloroplasts (Fukuzawa et al.
1985). The length of the gene cluster from the 5' end of the isoleucine tRNA gene to
the 3' end of the rpoA gene is approximately 7.3 kb. No promoter sequence can be found
in the short (less than 97 bp) spacers between ribosomal protein genes, indicating that
the gene cluster may be transcribed into a single precursor RNA from the trnI to
rpoA genes. Furthermore, this cluster has a similar order to the clusters reported in
the E. coli ribosomal protein operons such as the S10 operon (Zurawski and Zurawski
1985), spc operon (Cerretti et al. 1983), and alpha operon (Bedwell et al. 1985)
(S10-L3-L4-L23-L2-S19-L22-S3-L16-L29-S17, L14-L24-L5-S14-S8-L6-L18-S5-L30-secY-secX,
and S13-S11-S4-rpoA-L17, respectively). Two additional clusters of ribosomal
protein genes are seen in the orders of S12(exon 1)-L20 and S18-L33 between trnP-UGG
and psbB genes. The genes for S12 (from the 39th amino acid residue to the C-terminal
end) and S7 ribosomal protein also have been clustered as seen in the E. coli str
operon (S12-S7-EFG-EFTu; Post et al. 1978, Post and Nomura 1980). In contrast, the
chloroplast rps2, rps4, rps14, rps15 and rpl21 genes are scattered throughout the
liverwort chloroplast genome.
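The claim that the liverwort cluster follows the gene order of the E. coli operons can be checked by testing whether the shared genes appear in the same relative order. The following is a minimal sketch only: the function name is hypothetical, the gene lists are retyped from the orders quoted above, and concatenating the three E. coli operons in the order S10, spc, alpha is an assumption made for the comparison.

```python
LIVERWORT_CLUSTER = ["L23", "L2", "S19", "L22", "S3", "L16",
                     "L14", "S8", "infA", "secX", "S11", "rpoA"]

# E. coli operons as listed in the text, joined in the order S10, spc, alpha.
ECOLI_OPERONS = (
    ["S10", "L3", "L4", "L23", "L2", "S19", "L22", "S3", "L16", "L29", "S17"]
    + ["L14", "L24", "L5", "S14", "S8", "L6", "L18", "S5", "L30", "secY", "secX"]
    + ["S13", "S11", "S4", "rpoA", "L17"]
)


def shared_genes_in_order(query, reference):
    """Return the genes common to both lists and whether their order is conserved."""
    ref_index = {gene: i for i, gene in enumerate(reference)}
    shared = [gene for gene in query if gene in ref_index]
    positions = [ref_index[gene] for gene in shared]
    return shared, positions == sorted(positions)


shared, conserved = shared_genes_in_order(LIVERWORT_CLUSTER, ECOLI_OPERONS)
print(shared)
print("relative order conserved:", conserved)
```

In this comparison infA is the only gene of the liverwort cluster that is absent from the quoted E. coli operon lists.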
The rpl2 genes of spinach and Nicotiana debneyi have been located in the inverted
repeat regions. In N. debneyi, but not in spinach, rpl2 is interrupted by a 666 bp
intron (Zurawski et al. 1984). However, the liverwort rpl2 gene is located on the LSC
region just outside of the IRB region and has a 544 bp intron. Introns are also
present in the rpl16 gene of Spirodela oligorhiza (interrupted by a 1411 bp intron)
(Posno et al. 1986). The first exon of the genes for L16 ribosomal protein encodes
the first three amino acid residues (Met-Leu-Ser) of the N-terminus. It is interesting
that the products of ribosomal protein genes encoded in the chloroplast genome are
important components in the initial stage of ribosome and rRNA assembly (Dorne et
al. 1984).
A putative gene for the α subunit of RNA polymerase is located at the 3' end of the
gene cluster for ribosomal proteins. Genes for the β and β' subunits of RNA polymerase
(rpoB, rpoC1 and rpoC2) are located as a single operon 50 kb apart from the rpoA gene
(Umesono et al. 1986). An ORF homologous to the E. coli sigma subunit of RNA
polymerase is not found on the liverwort chloroplast genome.
II-4 The putative gene ndh3 and unidentified open reading frames
It is reported that the M. polymorpha chloroplast genome contains a set of
homologues of mammalian mitochondrial "URF" genes (Ohyama et al. 1986). In this
region, a gene named ndh3 corresponding to human mitochondrial URF3 was identified
by amino acid sequence comparison (Fig. 7K). The mitochondrial URF3 gene codes for
a component of the respiratory-chain NADH dehydrogenase complex (Chomyn et al. 1985).
An ORF120 (52877-52515) is located at the upstream region of psbG with an overlap of
seven nucleotides. This ORF does not contain an intron, and the product (120 amino
acid residues, 14.2 kd) is similar in size to the human mitochondrial URF3 protein
(114 amino acid residues, Anderson et al. 1981), sharing 30.8% homology. The
proteins of chloroplast ndh genes are not yet identified, but the 3' half of the ndh3
gene would also be conserved in the maize chloroplast genome; the published sequence of maize
psbG gene contains its 5' flanking region of 158 nucleotides (Steinmetz et al.
1986). If two G residues (located at -68 and -96 in their numbering) are deleted,
the region from -157 to +7 will encode a polypeptide similar to the last 54 amino
acids of the M. polymorpha ndh3 product (85.2% identical) and overlapping
the downstream psbG gene. There has been no previous report of the presence of
ndh genes in a chloroplast genome. However, an NADH-plastoquinone (PQ) oxidoreductase
activity has been detected in the chloroplasts of Chlamydomonas reinhardii (Bennoun
1982); thus it is possible that this ORF encodes one of the subunits of the NADH-PQ
oxidoreductase.
Fifteen significant ORFs, which do not show any homologies with previously
reported genes, were located on the sequenced region in this study (see Table 2).
Amino acid sequences of two unidentified open reading frames in the liverwort
chloroplast genome show significant homologies with those of unidentified frames
reported in other kinds of chloroplast genome. An ORF184 (59525-60079) shows 38.3%
local homology to the ORF149 located next to the gene for elongation factor
Tu of Euglena chloroplasts (Fig. 11Q). The Euglena ORF149 does not terminate at a
stop codon but is followed by an intron sequence (Montandon and Stutz 1983). It is
interesting to compare the C-terminal region of liverwort ORF184 (amino acid residues
128 to 184) with the corresponding Euglena ORF in the exon 4 described by them. The
first exon (68640-68570) of the ORF203 shows 20 out of 23 amino acid identity with
a reading frame in the spinach X-gene on the opposite strand of the psbB gene, as shown
in Fig. 11R. It is reasonable to believe that the open reading frames conserved in two
kinds of chloroplasts would code for polypeptides having an unknown function. In
addition, an ORF40 (62916-62794) showed 72.6% homology with the ORF40 of the
Cyanophora paradoxa cyanelle, which has not been proved to have any function (Fig. 11S,
Bryant personal communication).
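The ORF names can be cross-checked against the genome coordinates quoted in this section. The following is a minimal sketch only: the function name is hypothetical, and it assumes that the coordinates are inclusive and that each range includes the stop codon, an assumption that happens to reproduce the residue counts of ORF120, ORF184 and ORF40.

```python
def orf_length_aa(start, end, includes_stop=True):
    """Number of amino acid residues encoded between two inclusive coordinates.

    The strand is ignored: a range written high-to-low (complementary strand)
    gives the same length as one written low-to-high.
    """
    nucleotides = abs(end - start) + 1
    if nucleotides % 3 != 0:
        raise ValueError("coordinate range is not a whole number of codons")
    codons = nucleotides // 3
    return codons - 1 if includes_stop else codons


# Coordinates as quoted in the text for three uninterrupted ORFs.
for name, (start, end) in {"ORF120": (52877, 52515),
                           "ORF184": (59525, 60079),
                           "ORF40": (62916, 62794)}.items():
    print(name, orf_length_aa(start, end), "amino acid residues")
```

The check does not apply to a single exon of a split gene such as the first exon of ORF203, whose range need not be a whole number of codons.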
CHAPTER III The split gene for chloroplast ribosomal protein S12
Introns (intervening sequences) in chloroplast RNA genes have been reported:
the 23S rRNA gene of Chlamydomonas reinhardii (Rochaix and Malone 1978, Rochaix et
al. 1985); the tRNA genes, trnI-GAU and trnA-UGC, in the 16S-23S rDNA spacer region
of Zea mays (Koch et al. 1981) and Nicotiana tabacum (Takaiwa and Sugiura 1982); as
well as the chloroplast tRNA genes trnL-UAA (Steinmetz et al. 1983a, Bonnard et al.
1984), trnK-UUU (Sugita et al. 1985), trnG-UCC (Deno et al. 1984a, Quigley and Weil
1985) and trnV-UAC (Deno et al. 1982, Krebbers et al. 1984, Zurawski and Clegg
1984). Introns within chloroplast protein genes also have been reported in several
genes of Euglena gracilis: for the large subunit of ribulose-1,5-bisphosphate
carboxylase/oxygenase (rbcL) (Koller et al. 1984), the elongation factor Tu (tufA)
(Montandon and Stutz 1983), and the 32-kd protein (psbA) (Karabin et al. 1984,
Keller and Michel 1985). The gene for the 32-kd protein of C. reinhardii also has
introns (Erickson et al. 1984), as does the gene for the H+-ATP synthase subunit I
(atpF) of wheat (Bird et al. 1985). Zurawski et al. (1984) reported that the gene for
chloroplast ribosomal protein L2 (rpl2) in Nicotiana debneyi has a single intron.
Several genes in the chloroplast DNA from the liverwort, M. polymorpha, are shown to
have introns in their coding sequences (see Fig. 2 in Chapter II).
Hallick et al. (1985) and Fromm et al. (1986) reported that the reading frame of
the ribosomal protein S12 in N. tabacum is interrupted by two introns, but they
described only the second one. During nucleotide sequencing of the chloroplast DNA
from the liverwort, M. polymorpha, however, the first exon with the 5' intron
boundary sequence was found on the opposite strand of the chloroplast DNA. In this
chapter, the complex structure of the putative gene for chloroplast ribosomal
protein S12 from the liverwort, M. polymorpha, is presented, which has three exons
split into different DNA strands. The mechanisms of the expression of this
unusually organized gene will be discussed.
Figure 1. Locations of coding regions for chloroplast ribosomal protein S12
on physical maps of chloroplast DNA from a liverwort, M. polymorpha. Exon 1
(rps12A) was located on the BglII fragment (Bg5), and the direction of its
transcription was clockwise, as indicated by an arrow. By contrast, exons 2 and 3
(rps12B and C) were found on the BamHI fragment (Ball), with their
transcription in the opposite direction from that of exon 1
(counterclockwise). The abbreviations rps12, rps7 and rpl20 are for genes of
ribosomal proteins S12, S7 and L20. The site of the gene for the large
subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) is shown.
IRA and IRB indicate a set of inverted repeats, and SSC and LSC indicate the
small single copy and large single copy regions, respectively.
MATERIALS AND METHODS
Chloroplast DNA fragments, the BamHI (Ball) and BglII (Bg5) fragments, were
cloned into the respective plasmids, pBR322 and pKC7, as described by Maniatis et
al. (1982). A physical map of the chloroplast DNA for BamHI fragments has been
described previously (Ohyama et al. 1983, Umesono et al. 1984). The location of the
Bg5 fragment on that map was determined by restriction analysis and Southern
hybridization (Fig. 1). Methods for the sequence determination are described in
Chapter II.
RESULTS AND DISCUSSION
At first a coding region for ribosomal protein S12 on the BamHI fragment (Ball)
was identified using Southern hybridization with an E. gracilis probe (provided by
Drs. Montandon and Stutz). The Ball fragment was mapped at the junction (JLA)
between the inverted repeat (IRA) and the large single copy (LSC) region (Fig. 1).
DNA sequence analysis of the Ball fragment revealed the coding sequence for the
ribosomal protein S12; however, the N-terminal 38 amino acids of the
protein were missing in this coding region.
By amino acid homology search between the E. coli S12 protein (Post et al. 1978) and
open reading frames deduced from the nucleotide sequence data files, the missing
N-terminal 38 amino acid sequence was found on the BglII fragment (Bg5) approximately
60 kb away on the opposite DNA strand (Fig. 1). Complete nucleotide sequences of
the coding regions for ribosomal protein S12, rps12A and rps12B-C, including the
flanking regions, are shown in Fig. 2. Exons 1 and 2 were followed by a consensus
sequence (GTGCG) of the 5' boundary regions found in fungal mitochondrial group II
introns and E. gracilis introns (Michel and Dujon 1983). In addition, highly conserved
sequences, RAGCCG.AUGAA..GAAA..UUCAUGU.CGGUUY, were found in the introns 75
nucleotides upstream from exon 2 and 61 nucleotides upstream from exon 3. This
consensus sequence is present in all the introns found so far in chloroplast
genes of E. gracilis (Keller and Michel 1985). The secondary structures of introns
[Figure 2, panel A: nucleotide sequence and deduced amino acid sequences of the region containing ORF203 (exon 3), rps12A (exon 1) and rpl20.]
Figure 2. Complete nucleotide sequences near exon 1 (A), and exons 2 and 3
(B) of the rps12 gene. The exons of the rps12 gene are boxed, and the consensus
sequences of their 5' intron boundary regions (gugyg, y represents C or T
residues), as well as those near the 3' intron boundary regions (ragccg-
augaa-gaaa--uucaugu-cgguuy, and cuayy(-)y-ay, r represents A or G residues)
are shown under the nucleotide sequences. JLA stands for a junction of an
inverted repeat (IRA) and a large single copy region (LSC). Amino acids are
expressed as one letter symbols under the nucleotide sequences. Possible stem
structures for transcription termination are shown by bold arrows.
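Consensus sequences written with degenerate symbols, such as the intron boundary motifs described for this gene, can be located in a sequence by translating them into a regular expression. The following is a minimal sketch only: the function names and the toy sequence are hypothetical, and each dot in the consensus is assumed to stand for exactly one unspecified nucleotide, which may be narrower than the spacing the author intended.

```python
import re

# Degenerate symbols used in the consensus: R = A or G, Y = C or U (T in DNA).
SYMBOLS = {"A": "A", "C": "C", "G": "G", "U": "U",
           "R": "[AG]", "Y": "[CU]", ".": "[ACGU]"}


def consensus_to_regex(consensus):
    """Compile a consensus written in RNA letters, R, Y and dots into a regex."""
    return re.compile("".join(SYMBOLS[symbol] for symbol in consensus.upper()))


def find_consensus(sequence, consensus):
    """Return 0-based start positions of non-overlapping consensus matches."""
    rna = sequence.upper().replace("T", "U")   # accept DNA or RNA input
    return [m.start() for m in consensus_to_regex(consensus).finditer(rna)]


INTRON_3PRIME = "RAGCCG.AUGAA..GAAA..UUCAUGU.CGGUUY"

# Toy sequence built to contain one copy of the motif (not the real intron).
toy = "GG" + "AAGCCG" + "U" + "AUGAA" + "AU" + "GAAA" + "CC" + "UUCAUGU" + "A" + "CGGUUC" + "AA"
print(find_consensus(toy, INTRON_3PRIME))
print(find_consensus("GTGCGAAGTGTG", "GUGYG"))
```

A search of this kind only reports exact matches to the pattern; an intron that deviates at even one fixed position would be missed, so a scoring approach may be preferable for real data.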
-66:-
[Figure 2B: nucleotide sequence listing (positions 1 to 1560) running from the inverted repeat across the junction JLA into the large single copy region, with rps12B (exon 2), the intron, rps12C (exon 3) and the following rps7 coding region, and with the amino acid translation and the intron boundary consensus sequences printed beneath. The listing is not recoverable from the extracted text.]
Figure 2 (continued).
[Figure 3: drawings of the predicted intron stem-loop structures that bring exon 1 next to exon 2 and exon 2 next to exon 3, with the flanking exon sequences and their amino acid translations. The drawings are not recoverable from the extracted text.]
Figure 3. Possible secondary structures of the introns in the rps12 gene.
Intron-exon boundaries are shown by arrowheads. Two distinct RNA molecules
containing exon 1 and exon 2 are placed close together according to the
stem-loop structure specific to group II introns. Amino acid sequences are
shown under the RNA sequences in three-letter code.
[Figure 4: gene maps. (A) ORF203, rps12A, rpl20 and psbB on the BglII fragment Bg5. (B) JLA, rps12B-C, rps7, 16S and trnV on the BamHI fragment Ba11. Scale bar: 1 kb.]
Figure 4. Gene organization near the rps12 gene around exon 1 (A) and exons 2
and 3 (B). Arrows indicate the direction of transcription. The abbreviations
rps12, rps7 and rpl20 are the same as in Fig. 1. The symbol psbB represents
the gene for the P680 chlorophyll a apoprotein in photosystem II. JLA stands
for the junction of an inverted repeat (IRA) and the large single copy region
(LSC).
                        10        20        30        40        50        60
                         +         +         +         +         +         +
E. gracilis     MPTLEHLTRSPRKKIKRKTKSPALKGCPQKRAICMRVYTTTPKKPNSALRKVTRVRLSSG
                ***   * *  *  *   *********** *  * ****************  **** **
M. polymorpha   MPTIQQLIRNKRQPIENRTKSPALKGCPQRRGVCTRVYTTTPKKPNSALRKIARVRLTSG
                * *  ** *  *         ***  *** *********************  ***** *
E. coli         MATVNQLVRKPRARKVAKSNVPALEACPQKRGVCTRVYTTTPKKPNSALRKVCRVRLTNG

                        70        80        90       100       110       120
                         +         +         +         +         +         +
E. gracilis     LEVTAYIPGIGHNLQEHSVVLIRGGRVKDLPGVKYHVIRGCLDAASVKNRKNARSKYGVKKPKPK
                 * ****************** *********** ** *** ***  ** *   ******** *
M. polymorpha   FEITAYIPGIGHNLQEHSVVLVRGGRVKDLPGVRYHIIRGTLDAVGVKDRQQGRSKYGVKKSK
                ** * ** * ********* * **************  ** **  ***** * *******  *
E. coli         FEVTSYIGGEGHNLQEHSVILIRGGRVKDLPGVRYHYVRGALDCSGVKDRKQARSKYGVKRPKA

[The arrowheads that mark the splice junctions in the original figure are not reproduced here.]
Figure 5. Amino acid sequence of ribosomal protein S12 from M. polymorpha
compared with those from E. coli and E. gracilis. Vertical arrowheads
indicate the sites of the splice junctions in ribosomal protein S12 from
M. polymorpha. Asterisks denote amino acids that are identical between the
two proteins. Amino acids are expressed as one-letter symbols.
between exons are shown in Fig. 3. The splice junctions, indicated by arrowheads,
were deduced from the conserved sequences and secondary structures. The putative
branching nucleotide (adenine) is underlined in Fig. 3.
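
As an illustration of how a conserved boundary motif of this kind can be located in a sequence, the following Python sketch converts a consensus written with the ambiguity codes of the Figure 2 legend (y = C or T, r = A or G) into a regular expression and scans a DNA string for it. Only the motif gugyg and the meaning of y and r come from the legend; the function names and the test sequence are hypothetical and are not part of the thesis.

```python
import re

# Ambiguity codes as defined in the Figure 2 legend.
AMBIGUITY = {"y": "[CT]", "r": "[AG]"}


def consensus_to_regex(consensus):
    """Translate an RNA-style consensus such as 'gugyg' into a DNA regex."""
    parts = []
    for ch in consensus.lower():
        if ch in AMBIGUITY:
            parts.append(AMBIGUITY[ch])
        elif ch == "u":  # consensus is written as RNA, sequence is DNA
            parts.append("T")
        elif ch in "acgt":
            parts.append(ch.upper())
        else:
            raise ValueError(f"unsupported symbol: {ch!r}")
    return re.compile("".join(parts))


def find_motif(consensus, dna):
    """Return the 0-based start of every match, including overlapping ones."""
    pattern = consensus_to_regex(consensus).pattern
    lookahead = re.compile(f"(?={pattern})")  # zero-width, so overlaps are kept
    return [m.start() for m in lookahead.finditer(dna.upper())]


# Hypothetical test sequence (not taken from the thesis).
print(find_motif("gugyg", "AAGTGCGTTTGTGTGAA"))  # -> [2, 10]
```

A match only marks a candidate boundary; in the text above the junctions are assigned from the conserved sequences together with the secondary structures, which a pattern search alone does not capture.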
The gene organization deduced from the DNA sequences near the coding regions for
ribosomal protein S12 is shown in Fig. 4. ORFs corresponding to ribosomal
proteins S7 and L20 were identified by their amino acid sequence homologies, 42.6%
and 46.6%, to the respective E. coli ribosomal proteins (Post and Nomura 1980,
Wittmann and Seib 1979). In the case of ribosomal protein S14 of M. polymorpha, the
amino acid homology to that of E. coli was 45% (Umesono et al. 1984). By contrast,
the amino acid sequence of chloroplast ribosomal protein S12 from M. polymorpha
showed markedly higher homologies, 73.6% to that of E. gracilis (Montandon and Stutz
1984) and 70.2% to that of E. coli (Post et al. 1978) (Fig. 5). The amino acid
sequence of ribosomal protein S12 from M. polymorpha near the splice junctions
(arrowheads in Fig. 5) showed an even higher homology to the sequences from
E. gracilis and E. coli, both of which have no intron. This highly conserved amino
acid sequence suggests that chloroplast ribosomal protein S12 may have an essential
role in ribosomal function during protein synthesis in chloroplasts and that the
rps12 gene may have a regulatory function in the coordinated biogenesis of
chloroplast ribosomes.
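
The S12 homology values can be checked directly from the sequences of Fig. 5. The Python sketch below counts position-by-position identities without gaps and divides by the length of the longer sequence. That choice of denominator is inferred here because it reproduces the reported 73.6% and 70.2%; the thesis does not state it. The sequences are transcribed from the extracted figure, and the three threonines at positions 39 to 41 were restored where the scan was damaged, so they should be treated as an assumption. The function name is illustrative.

```python
# One-letter sequences transcribed from Fig. 5. Positions 39-41 ("TTT") were
# restored from a damaged scan and are an assumption of this sketch.
M_POLYMORPHA = (
    "MPTIQQLIRNKRQPIENRTKSPALKGCPQRRGVCTRVYTTTPKKPNSALRKIARVRLTSG"
    "FEITAYIPGIGHNLQEHSVVLVRGGRVKDLPGVRYHIIRGTLDAVGVKDRQQGRSKYGVKKSK"
)
E_GRACILIS = (
    "MPTLEHLTRSPRKKIKRKTKSPALKGCPQKRAICMRVYTTTPKKPNSALRKVTRVRLSSG"
    "LEVTAYIPGIGHNLQEHSVVLIRGGRVKDLPGVKYHVIRGCLDAASVKNRKNARSKYGVKKPKPK"
)
E_COLI = (
    "MATVNQLVRKPRARKVAKSNVPALEACPQKRGVCTRVYTTTPKKPNSALRKVCRVRLTNG"
    "FEVTSYIGGEGHNLQEHSVILIRGGRVKDLPGVRYHYVRGALDCSGVKDRKQARSKYGVKRPKA"
)


def percent_identity(a, b):
    """Ungapped identity: (identical positions, percent of the longer length)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches, round(100.0 * matches / max(len(a), len(b)), 1)


print(percent_identity(M_POLYMORPHA, E_GRACILIS))  # -> (92, 73.6)
print(percent_identity(M_POLYMORPHA, E_COLI))      # -> (87, 70.2)
```

With these transcriptions the lengths are 123 (M. polymorpha), 125 (E. gracilis) and 124 (E. coli) residues, and no gaps are needed to align them.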
ORF203, which is interrupted by two introns (518 bp and 380 bp), was
detected further upstream from exon 1 of the rps12 gene (Fig. 3A). A reading frame
from the ATG codon (67372) to the TAG stop codon (67130) in the third exon of ORF203
was previously called ORF80 because no intron was assumed. Although the exon-intron
boundary sequences of ORF203 diverge slightly from the consensus sequences, the
introns are correctly excised from the precursor messenger RNA that includes the
rps12A and rpl20 coding sequences (Kohchi and Umesono, unpublished data). In
addition, the first exon of ORF203 (68640-68570) shows identity at 20 out of 23 amino
acid residues (87.0%) with a reading frame in the spinach X-gene next to the psbB
gene, which has been shown to be transcribed in vivo.
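
The coordinates quoted for ORF203 allow a quick consistency check. The Python sketch below assumes that each coordinate pair gives the first and last nucleotide of the span, inclusive, read on the complementary strand (start greater than end). That reading is an assumption, and the helper name is illustrative.

```python
def span_length(start, end):
    """Inclusive length in nucleotides of a span given by its two end coordinates."""
    return abs(start - end) + 1


# Reading frame formerly called ORF80: ATG (67372) to TAG stop (67130).
orf_nt = span_length(67372, 67130)
codons, remainder = divmod(orf_nt, 3)
residues = codons - 1  # the stop codon does not encode a residue

# First exon of ORF203 (68640-68570).
exon1_nt = span_length(68640, 68570)
exon1_codons, exon1_extra = divmod(exon1_nt, 3)

# 20 of 23 residues identical to the spinach reading frame.
identity = round(100 * 20 / 23, 1)

print(orf_nt, codons, remainder, residues)   # -> 243 81 0 80
print(exon1_nt, exon1_codons, exon1_extra)   # -> 71 23 2
print(identity)                              # -> 87.0
```

Under this reading the 243-nucleotide frame corresponds to 80 residues plus a stop codon, which agrees with the earlier name ORF80. The 71-nucleotide first exon holds 23 complete codons with two nucleotides left over, which would be expected if the exon ends within a codon.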
Exon 1 of ribosomal protein S12, with its 5' intron boundary sequence, is
followed, 786 bp apart, by the coding sequence of ribosomal protein L20 (Fig. 2A).
Following the coding region of ribosomal protein S12 in exon 3 is a ribosomal
protein S7 coding region (Fig. 2B). Close linkage of the ribosomal protein S7 and
S12 genes also exists in E. gracilis (Montandon and Stutz 1983) and in the E. coli
str-operon (Post et al. 1978, Post and Nomura 1980).
Transcription of exons 2 and 3, as well as of the ribosomal protein S7 gene,
is initiated by a typical prokaryotic promoter sequence (-35 and -10 regions) found
upstream (Fig. 2B). S1 mapping showed that this promoter was highly active in
chloroplasts as well as in E. coli (Fukuzawa et al. 1985). Northern hybridization
also showed active transcription of exon 1. If the split gene described here
provides active mRNA that can be translated into mature S12 protein, there must be
a rejoining of exons at the RNA or DNA level. No DNA rearrangement of these coding
regions was observed in the chloroplast genome by restriction analysis. The results
of S1 mapping and Northern hybridization suggest that the transcription unit for
exons 2 and 3 is independent of the unit for exon 1. Therefore, active mRNA for
ribosomal protein S12 is thought to be formed post-transcriptionally by a mechanism
such as the trans-splicing described by Solnick (1985) and Konarska et al. (1985).
Further investigation of the transcription and splicing mechanisms of the split
gene rps12 would clarify this complex gene organization and expression in the
chloroplast.
After the publication of these results, the tobacco chloroplast gene for
ribosomal protein S12 was also reported to be trans-split, as in the case of
liverwort (Torazawa et al. 1986). This suggests that a trans-split rps12 gene on
the chloroplast genome may be a general feature of plant chloroplasts and may have
some regulatory function in chloroplast biogenesis.
REFERENCES
Aiba H, Adhya S, Crombrugghe B (1981) Evidence for two functional gal promoters
in intact Escherichia coli cells. J Biol Chem 256:11905-11910
Aldrich J, Cherney B, Merlin E, Williams C, Mets L (1985) Recombination within
the inverted repeat sequences of the Chlamydomonas reinhardii chloroplast
genome produces two orientation isomers. Curr Genet 9:233-238
Alt J, Westhoff P, Sears BB, Nelson N, Hurt E, Hauska G, Herrmann RG (1983)
Genes and transcripts for the polypeptides of the cytochrome b-f complex
from spinach thylakoid membranes. EMBO J 2:979-986
Alt J, Winter P, Sebald W, Moser JG, Schedel R, Westhoff P, Herrmann RG (1983)
Localization and nucleotide sequence of the gene for the ATP synthase
proteolipid subunit on the spinach plastid chromosome. Curr Genet 7:129-138
Alt J, Herrmann RG (1984) Nucleotide sequence of the gene for pre-apocytochrome
f in the spinach plastid chromosome. Curr Genet 8:551-557
Alt J, Morris J, Westhoff P, Herrmann RG (1984) Nucleotide sequence of the
clustered genes for the 44 kd chlorophyll a apoprotein and the "32 kd"-like
protein of the photosystem II reaction center in the spinach plastid
chromosome. Curr Genet 8:597-606
Anderson S, Bankier AT, Barrell BG, Bruijn MHL, Coulson AR, Drouin J, Eperon
IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young
IG (1981) Sequence and organization of the human mitochondrial genome.
Nature 290:457-465
Bartsch M, Kimura M, Subramanian AR (1982) Purification, primary structure, and
homology relationship of a chloroplast ribosomal protein. Proc Natl Acad Sci
USA 79:6871-6875
Bartsch M (1985) Correlation of chloroplast and bacterial ribosomal proteins by
cross-reactions of antibodies specific to purified Escherichia coli
ribosomal proteins. J Biol Chem 260:237-241
Bedbrook JR, Kolodner R, Bogorad L (1976) Endonuclease recognition sites mapped
on Zea mays chloroplast DNA. Proc Natl Acad Sci USA 73:4309-4312
Bedwell D, Davis G, Gosink M, Post L, Nomura M, Kestler H, Zengel JM, Lindahl L
(1985) Nucleotide sequence of the alpha ribosomal protein operon of
Escherichia coli. Nucl Acids Res 13:3891-3903
Bennoun P (1982) Evidence for a respiratory chain in the chloroplast. Proc Natl
Acad Sci USA 79:4352-4356
Biggin MD, Gibson TJ, Hong GF (1983) Buffer gradient gels and 35S label as an
aid to rapid DNA sequence determination. Proc Natl Acad Sci USA 80:3963-3965
Bird CR, Koller B, Auffret AD, Huttly AK, Howe CJ, Dyer TA, Gray JC (1985) The
wheat chloroplast gene for CFo subunit I of ATP synthase contains a large
intron. EMBO J 4:1381-1388
Bonnard G, Michel F, Weil JH, Steinmetz A (1984) Nucleotide sequence of the
split tRNALeu(UAA) gene from Vicia faba chloroplasts: evidence for
structural homologies of the chloroplast tRNALeu intron with the intron
from the autosplicable Tetrahymena ribosomal RNA precursor. Mol Gen Genet
194:330-336
Briat JF, Mache R (1980) Properties and characterization of a spinach chloroplast
RNA polymerase isolated from a transcriptionally active DNA-protein. Eur J
Biochem 111:503
Calagan JL, Pirtle RM, Pirtle IL, Kashdan MA, Vreman HJ, Dudock BS (1980)
Homology between chloroplast and prokaryotic initiator tRNA, nucleotide
sequence of spinach chloroplast methionine initiator tRNA. J Biol Chem
255:9981-9984
Canaday J, Guillemaut P, Gloeckler R, Weil JH (1981) The nucleotide sequence of
spinach chloroplast tryptophan transfer RNA. Nucl Acids Res 9:47-53
Casadaban MJ, Chou J, Cohen SN (1980) In vitro gene fusions that join an
enzymatically active β-galactosidase segment to amino-terminal fragments of
exogenous proteins: Escherichia coli plasmid vectors for the detection and
cloning of translational initiation signals. J Bacteriol 143:971-980
Casadaban MJ, Cohen SN (1980) Analysis of gene control signals by DNA fusion
and cloning in Escherichia coli. J Mol Biol 138:179-207
Carrillo N, Seyer P, Tyagi A, Herrmann RG (1986) Cytochrome b-559 genes from
Oenothera hookeri and Nicotiana tabacum show a remarkably high degree of
conservation as compared to spinach. Curr Genet 10:619-624
Cerretti DP, Dean D, Davis GR, Bedwell DM, Nomura M (1983) The spc ribosomal
protein operon of Escherichia coli: sequence and cotranscription of the
ribosomal protein genes and a protein export gene. Nucl Acids Res 11:2599-2616
Chomyn A, Mariottini P, Cleeter MWJ, Ragan CI, Matsuno-Yagi A, Hatefi Y,
Doolittle RF, Attardi G (1985) Six unidentified reading frames of human
mitochondrial DNA encode components of the respiratory-chain NADH
dehydrogenase. Nature 314:592-597
Cozens AL, Walker JE, Phillips AL, Huttly AK, Gray JC (1986) A sixth subunit of
ATP synthase, an Fo component, is encoded in the pea chloroplast genome.
EMBO J 5:217-222
Crouse EJ, Schmitt JM, Bohnert HJ (1985) Chloroplast and cyanobacterial
genomes, genes, and RNAs: a compilation. Plant Mol Biol Rep 3:43-89
Curtis SE, Haselkorn R (1983) Isolation and sequence of the gene for the large
subunit of ribulose 1,5-bisphosphate carboxylase from the cyanobacterium
Anabaena 7120. Proc Natl Acad Sci USA 80:1835-1839
Curtis SE, Haselkorn R (1984) Isolation, sequence and expression of two members
of the 32 Kd thylakoid membrane protein gene family from cyanobacterium
Anabaena 7120. Plant Mol Biol 3:249-258
Deininger PL (1983) Random subcloning of sonicated DNA: application to shotgun
DNA sequence analysis. Anal Biochem 129:216-223
Deno H, Kato K, Shinozaki K, Sugiura M (1982) Nucleotide sequences of tobacco
chloroplast genes for elongator tRNAMet and tRNAVal(UAC): the tRNAVal(UAC)
gene contains a long intron. Nucl Acids Res 10:7511-7520
Deno H, Shinozaki K, Sugiura M (1983) Nucleotide sequence of tobacco
chloroplast gene for the α subunit of proton-translocating ATPase. Nucl
Acids Res 11:2185-2191
Deno H, Shinozaki K, Sugiura M (1984) Structure and transcription pattern of a
tobacco chloroplast gene coding for subunit III of proton-translocating
ATPase. Gene 32:195-201
Deno H, Sugiura M (1984) Chloroplast tRNAGly gene contains a long intron in the
D stem: Nucleotide sequence of tobacco chloroplast gene for tRNAGly(UCC)
and tRNAArg(UCU). Proc Natl Acad Sci USA 81:405-408
Dorne AM, Lescure AM, Mache R (1984) Site of synthesis of spinach chloroplast
ribosomal proteins and formation of incomplete ribosomal particles in
isolated chloroplasts. Plant Mol Biol 3:83-90
Dron M, Rahire M, Rochaix JD (1982) Sequence of the chloroplast DNA region of
Chlamydomonas reinhardii containing the gene of the large subunit of
ribulose bisphosphate carboxylase and parts of its flanking genes. J Mol
Biol 162:775-793
Ellis RJ (1981) Chloroplast proteins: synthesis, transport, and assembly. Ann
Rev Plant Physiol 32:111-37
Erickson JM, Rahire M, Rochaix JD (1984) Chlamydomonas reinhardii gene for the
32000 mol. wt. protein of photosystem II contains four large introns and is
located entirely within the chloroplast inverted repeat. EMBO J 3:2753-2762
Erion JL (1985) Characterization of the mRNA transcripts of the maize
ribulose-1,5-bisphosphate carboxylase large subunit gene. Plant Mol Biol
4:169-179
Filho EJ, Hartley MR, Mache R (1981) Pea chloroplast ribosomal proteins:
characterization and site of synthesis. Mol Gen Genet 184:484-488
Fish LE, Kuck U, Bogorad L (1985) Two partially homologous adjacent
light-inducible maize chloroplast genes encoding polypeptides of the P700
chlorophyll a-protein complex of photosystem I. J Biol Chem 260:1413-1421
Francis MA, Dudock BS (1982) Nucleotide sequence of a spinach chloroplast
isoleucine tRNA. J Biol Chem 257:11195-11198
Francis M, Kashdan M, Sprouse H, Otis L, Dudock B (1982) Nucleotide sequence of
a spinach chloroplast proline tRNA. Nucl Acids Res 10:2755-2758
Fromm H, Edelman M, Koller B, Goloubinoff P, Galun E (1986) The enigma of the
gene coding for ribosomal protein S12 in the chloroplasts of Nicotiana.
Nucl Acids Res 14:883-898
Fukuzawa H, Uchida Y, Yamano Y, Ohyama K, Komano T (1985) Molecular cloning of
promoters functional in Escherichia coli from chloroplast DNA of a
liverwort, Marchantia polymorpha. Agric Biol Chem 49:2725-2731
Fukuzawa H, Kohchi T, Shirai H, Ohyama K, Umesono K, Inokuchi H, Ozeki H (1986)
Coding sequences for chloroplast ribosomal protein S12 from a liverwort,
Marchantia polymorpha, are separated far apart on the different DNA
strands. FEBS Lett 198:11-15
Gantt JS, Key JL (1986) Isolation of nuclear encoded plastid ribosomal protein
cDNAs. Mol Gen Genet 202:186-193
Gatenby AA, Castleton JA, Saul MW (1981) Expression in E. coli of maize and
wheat chloroplast genes for large subunit of ribulose bisphosphate
carboxylase. Nature 291:117-121
Gingrich JC, Hallick RB (1985) The Euglena gracilis chloroplast
ribulose-1,5-bisphosphate carboxylase gene. I. Complete DNA sequence and
analysis of the nine intervening sequences. J Biol Chem 260:16156-16161
Goloubinoff P, Edelman M, Hallick RB (1984) Chloroplast-coded atrazine
resistance in Solanum nigrum: psbA loci from susceptible and resistant
biotypes are isogenic except for a single codon change. Nucl Acids Res
12:9489-9496
Graves MC, Spremulli LL (1983) Activity of Euglena gracilis chloroplast
ribosomes with procaryotic and eucaryotic initiation factors. Arch Biochem
Biophys 222:192
Greenberg BM, Narita JO, Flaherty CD, Gruissem W, Rushlow KA, Hallick RB (1984)
Evidence for two RNA polymerase activities in Euglena gracilis chloroplasts.
J Biol Chem 259:14880
Gruissem W, Greenberg BM, Zurawski G, Prescott D, Hallick RB (1983)
Biosynthesis of chloroplast transfer RNA in a spinach chloroplast
transcription system. Cell 35:815-828
Gruissem W, Zurawski G (1985) Analysis of promoter regions for the spinach
chloroplast rbcL, atpB and psbA genes. EMBO J 4:3375-3383
Gruissem W, Zurawski G (1985) Identification and mutational analysis of the
promoter for a spinach chloroplast transfer RNA gene. EMBO J 4:1637-1644
Hallick RB, Hollingsworth MJ, Nickoloff JA (1984) Transfer RNA genes of Euglena
gracilis chloroplast DNA. A review. Plant Mol Biol 3:169-175
Hallick RB, Gingrich JC, Johanningmeier U, Passavant CW (1985) Introns in
Euglena and Nicotiana chloroplast protein genes. In: Molecular form and
function of the plant genome. Plenum, New York, pp 211-220
Hauska G (1985) Organization and function of cytochrome b6f/bc1 complexes. In:
Molecular biology of the photosynthetic apparatus. Cold Spring Harbor Lab,
New York
Heinemeyer W, Alt J, Herrmann RG (1984) Nucleotide sequence of the clustered
genes for apocytochrome b6 and subunit 4 of the cytochrome b/f complex in
the spinach plastid chromosome. Curr Genet 8:543-549
Hellmund D, Metzlaff M, Serfling E (1984) A transfer RNAArg gene of Pelargonium
chloroplasts, but not a 5S RNA gene, is efficiently transcribed after
injection into Xenopus oocyte nuclei. Nucl Acids Res 12:8253-8268
Hennig J, Herrmann RG (1986) Chloroplast ATP synthase of spinach contains nine
nonidentical subunit species, six of which are encoded by plastid
chromosomes in two operons in a phylogenetically conserved arrangement. Mol
Gen Genet 203:117-128
Herrmann RG, Alt J, Schiller B, Widger WR, Cramer WA (1984) Nucleotide sequence
of the gene for apocytochrome b-559 on the spinach plastid chromosome:
implications for the structure of the membrane protein. FEBS Lett 176:239-244
Higgins CF, Hiles ID, Whalley K, Jamieson DJ (1985) Nucleotide binding by
membrane components of bacterial periplasmic binding protein-dependent
transport systems. EMBO J 4:1033-1040
Hird SM, Willey DL, Dyer TA, Gray JC (1986) Location and nucleotide sequence of
the gene for cytochrome b-559 in wheat chloroplast DNA. Mol Gen Genet
203:95-100
Hirschberg J, McIntosh L (1983) Molecular basis of herbicide resistance in
Amaranthus hybridus. Science 222:1346-1349
Holschuh K, Bottomley W, Whitfeld PR (1984) Structure of the spinach chloroplast
genes for the D2 and 44 kd reaction-centre proteins of photosystem II and
tRNASer(UGA). Nucl Acids Res 12:8819-8834
Howe CJ, Auffret AD, Doherty A, Bowman CM, Dyer TA, Gray JC (1982) Location and
nucleotide sequence of the gene for the proton-translocating subunit of
wheat chloroplast ATP synthase. Proc Natl Acad Sci USA 79:6903-6907
Hu N, Messing J (1982) The making of strand-specific probes. Gene 17:271-277
Karabin GD, Farley M, Hallick RB (1984) Chloroplast gene for Mr 32000
polypeptide of photosystem II in Euglena gracilis is interrupted by four
introns with conserved boundary sequences. Nucl Acids Res 12:5801-5812
Kashdan MA, Dudock B (1982) The gene for a spinach chloroplast isoleucine tRNA
has a methionine anticodon. J Biol Chem 257:11191-11194
Kato A, Takaiwa F, Shinozaki K, Sugiura M (1985) Location and nucleotide
sequence of the gene for tobacco chloroplast tRNAArg(ACG) and tRNALeu(UAG).
Curr Genet 9:405-409
Keller M, Michel F (1985) The introns of the Euglena gracilis chloroplast gene
which codes for the 32-kDa protein of photosystem II. Evidence for
structural homologies with class II introns. FEBS Lett 179:69-73
Keller M, Stutz E (1984) Structure of the Euglena gracilis chloroplast gene (psbA)
coding for the 32 Kd protein of photosystem II. FEBS Lett 175:173-177
Keus RJ, Stam NJ, Zwiers T, Heij HT, Groot GSP (1984) The nucleotide sequence of
the genes for tRNAArgUCU, tRNAArgACG and tRNAAsnGUU on Spirodela oligorhiza
chloroplast DNA. Nucl Acids Res 12:5639-5646
Kidd GH, Bogorad L (1979) Peptide maps comparing subunits of maize chloroplast
and type II nuclear DNA-dependent RNA polymerases. Proc Natl Acad Sci USA
76:4890-4892
Kirsch W, Seyer P, Herrmann RG (1986) Nucleotide sequence of the clustered genes
for two P700 chlorophyll a apoproteins of the photosystem I reaction center
and the ribosomal protein S14 of the spinach plastid chromosome. Curr Genet
10:843-855
Koch W, Edwards K, Kossel H (1981) Sequencing of the 16S-23S spacer in
a ribosomal RNA operon of Zea mays chloroplast DNA reveals two split tRNA
genes. Cell 25:203-213
Kohchi T, Shirai H, Fukuzawa H, Sano T, Sano S, Ohyama K, Umesono K, Ozeki H
(1986) Structure and organization of Marchantia polymorpha chloroplast
genome. (IV) Inverted repeat and small single copy region including frx and
ndh genes. In preparation.
Koller B, Gingrich JC, Stiegler GL, Farley MA, Delius H, Hallick RB (1984) Nine
introns with conserved boundary sequences in the Euglena gracilis chloroplast
ribulose-1,5-bisphosphate carboxylase gene. Cell 36:545-553
Konarska MM, Padgett RA, Sharp PA (1985) Trans splicing of mRNA precursors in
vitro. Cell 42:165-171
Kong XF, Lovett PS, Kung SD (1984) The Nicotiana chloroplast genome. IX.
Identification of regions active as procaryotic promoters in Escherichia
coli. Gene 31:23-30
Kozak M (1983) Comparison of initiation of protein synthesis in procaryotes,
eucaryotes, and organelles. Microbiol Rev 47:1-45
Krebbers ET, Larrinua M, McIntosh L, Bogorad L (1982) The maize chloroplast
genes for the β and ε subunits of the photosynthetic coupling factor CF1 are
fused. Nucl Acids Res 10:4985-5002
Krebbers E, Steinmetz A, Bogorad L (1984) DNA sequences for the Zea mays tRNA
genes tV-UAC and tS-UGA: tV-UAC contains a large intron. Plant Mol Biol
3:13-20
Langridge P (1981) Synthesis of the large subunit of spinach ribulose
bisphosphate carboxylase may involve a precursor polypeptide. FEBS Lett
123:85-89
Lerbs S, Brautigam E, Parthier B (1985) Polypeptides of DNA-dependent RNA
polymerase of spinach chloroplasts: characterization by antibody-linked
polymerase assay and determination of sites of synthesis. EMBO J 4:1661-1666
Link G, Langridge U (1984) Structure of the chloroplast gene for the precursor
of the Mr 32,000 photosystem II protein from mustard (Sinapis alba L.). Nucl
Acids Res 12:945-958
Maniatis T, Fritsch EF, Sambrook J (1982) "Molecular cloning: A laboratory
manual". Cold Spring Harbor Lab, New York
Maxam AM, Gilbert W (1977) A new method for sequencing DNA. Proc Natl Acad Sci
USA 74:560-564
McIntosh L, Poulsen C, Bogorad L (1980) Chloroplast gene sequence for the large
subunit of ribulose bisphosphate carboxylase of maize. Nature 288:556-560
Meek DW, Hayward R (1984) Nucleotide sequence of the rpoA-rplQ DNA of
Escherichia coli: a second regulatory binding site for protein S4? Nucl
Acids Res 12:5813-5821
Messing J (1983) New M13 vectors for cloning. In: "Methods in Enzymology",
Academic Press, New York, 101: pp 20-78
Mets LJ, Geist L (1983) Linkage of a known chloroplast gene mutation to the
uniparental genome of Chlamydomonas reinhardii. Genetics 105:559-579
Michel F, Dujon B (1983) Conservation of RNA secondary structures in two intron
families including mitochondrial-, chloroplast- and nuclear-encoded
members. EMBO J 2:33-38
Montandon PE, Stutz E (1983) Nucleotide sequence of a Euglena gracilis
chloroplast genome region coding for the elongation factor Tu; evidence for
a spliced mRNA. Nucl Acids Res 11:5877-5892
Montandon PE, Stutz E (1984) The genes for the ribosomal protein S12 and S7 are
clustered with the gene for the EF-Tu protein on the chloroplast genome of
Euglena gracilis. Nucl Acids Res 12:2851-2859
Morris J, Herrmann RG (1984) Nucleotide sequence of the gene for the P680
chlorophyll a apoprotein of the photosystem II reaction center from
spinach. Nucl Acids Res 12:2837-2850
Muller GS, Hallick RB, Alt J, Westhoff P, Herrmann RG (1986) Spinach plastid
genes coding for initiation factor IF-1, ribosomal protein S11 and RNA
polymerase α-subunit. Nucl Acids Res 14:1029-1044
Mullet JE, Orozco EM, Chua NH (1985) Multiple transcripts for higher plant rbcL
and atpB genes and localization of the transcription initiation site of the
rbcL gene. Plant Mol Biol 4:39-54
Nargang F, McIntosh L, Somerville C (1984) Nucleotide sequence of the
ribulosebisphosphate carboxylase gene from Rhodospirillum rubrum. Mol Gen
Genet 193:220-224
Ohme M, Tanaka M, Chunwongse J, Shinozaki K, Sugiura M (1986) A tobacco
chloroplast DNA sequence possibly coding for a polypeptide similar to E.
coli RNA polymerase β-subunit. FEBS Lett 200:87-90
Ohyama K, Wetter LR, Yamano Y, Fukuzawa H, Komano T (1982) A simple method for
isolation of chloroplast DNA from Marchantia polymorpha L. cell suspension
cultures. Agric Biol Chem 46:237-242
Ohyama K, Yamano Y, Fukuzawa H, Komano T, Yamagishi H, Fujimoto S, Sugiura M
(1983) Physical mappings of chloroplast DNA from a liverwort Marchantia
polymorpha L. cell suspension cultures. Mol Gen Genet 189:1-9
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki S,
Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986) Chloroplast gene
organization deduced from complete sequence of liverwort Marchantia
polymorpha chloroplast DNA. Nature 322:572-574
Oishi KK, Shapiro DR, Tewari KK (1984) Sequence organization of a pea
chloroplast DNA gene coding for a 34,500-dalton protein. Mol Cell Biol
4:2556-2563
Palmer JD (1983) Chloroplast DNA exists in two orientations. Nature 301:92-93
Palmer JD (1985) Comparative organization of chloroplast genome. Ann Rev Genet
19:325-354
Perron CV, Vieira J, Messing J (1985) Improved M13 phage cloning vectors and
host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene
33:103-119
Phillips AL, Gray JC (1984) Location and nucleotide sequence of the gene for
the 15.2 kDa polypeptide of the cytochrome b-f complex from pea
chloroplast. Mol Gen Genet 194:477-484
Pirtle R, Calagan J, Pirtle I, Kashdan M, Vreman H, Dudock B (1982) The
nucleotide sequence of spinach chloroplast methionine elongator tRNA. Nucl
Acids Res 9:183-188
Platt T, Muller-Hill B, Miller JH (1972) Assays of the lac operon enzymes. In:
Experiments in molecular genetics. Cold Spring Harbor Lab, New York, pp 352
Pon CL, Wittmann LB, Gualerzi C (1979) Structure-function relationships in
Escherichia coli initiation factors. II Elucidation of the primary
structure of initiation factor IF-1. FEBS Lett 101:157-160
Posno M, Vliet A, Groot GSP (1986) The gene for Spirodela oligorhiza
chloroplast ribosomal protein homologous to E. coli ribosomal protein L16
is split by a large intron near its 5' end: structure and expression. Nucl
Acids Res 14:3181-3195
Post LE, Arfsten AE, Reusser F, Nomura M (1978) DNA sequence of promoter regions
for the str and spc ribosomal protein operons in E. coli. Cell 15:215-229
Post LE, Nomura M (1980) DNA sequences from the str-operon of Escherichia coli.
J Biol Chem 255:4660-4666
Pribnow D (1975) Bacteriophage T7 early promoters: Nucleotide sequences of two
RNA polymerase binding sites. J Mol Biol 99:419-443
Projan SJ, Carleton S, Novick RP (1983) Determination of plasmid copy number by
fluorescence densitometry. Plasmid 9:182-190
Quigley F, Weil JH (1985) Organization and sequence of five tRNA genes and of
an unidentified reading frame in the wheat chloroplast genome: evidence for
gene rearrangements during the evolution of chloroplast genomes. Curr Genet
9:495-503
Rasmussen OF, Bookjans G, Stummann BM, Henningsen KW (1984) Localization and
nucleotide sequence of the gene for the membrane polypeptide D2 from pea
chloroplast DNA. Plant Mol Biol 3:191-199
Reichelt BY, Delaney SF (1983) The nucleotide sequence for the large subunit of
ribulose 1,5-bisphosphate carboxylase from a unicellular cyanobacterium,
Synechococcus PCC6301. DNA 2:121-129
Rochaix JD, Malnoe P (1978) Anatomy of the chloroplast ribosomal DNA of
Chlamydomonas reinhardii. Cell 15:661-670
Rochaix JD, Dron M, Rahire M, Malnoe P (1985) Sequence homology between the 32K
dalton and the D2 chloroplast membrane polypeptides of Chlamydomonas
reinhardii. Plant Mol Biol 3:363-370
Rochaix JD, Rahire M, Michel F (1985) The chloroplast ribosomal intron of
Chlamydomonas reinhardii codes for a polypeptide related to mitochondrial
maturase. Nucl Acids Res 13:975-984
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain terminating
inhibitors. Proc Natl Acad Sci USA 74:5463-5467
Saraste M, Gay NJ, Eberle A, Runswick MJ, Walker J (1981) The atp operon:
nucleotide sequence of the genes for the γ, β, and ε subunits of
Escherichia coli ATP synthase. Nucl Acids Res 9:5287-5296
Schmidt RJ, Richardson CB, Gillham NW, Boynton JE (1983) Sites of synthesis of
chloroplast ribosomal proteins in Chlamydomonas. J Cell Biol 96:1451-1463
Schmidt RJ, Hosler JP, Gillham NW, Boynton JE (1985) Biogenesis and evolution
of chloroplast ribosomes: cooperation of nuclear and chloroplast genes. In:
Molecular biology of the photosynthetic apparatus. Cold Spring Harbor Lab,
New York, pp 417-427
Schwarz Zs, Kossel H (1980) The primary structure of 16S rDNA from Zea mays
chloroplast is homologous to E. coli 16S rRNA. Nature 283:739-742
Shine J, Dalgarno L (1974) The 3'-terminal sequence of Escherichia coli 16S
ribosomal RNA: complementarity to nonsense triplets and ribosome binding
sites. Proc Natl Acad Sci USA 71:1342-1346
Shinozaki K, Sugiura M (1982) The nucleotide sequence of the tobacco
chloroplast gene for the large subunit of ribulose-1,5-bisphosphate
carboxylase/oxygenase. Gene 20:91-102
Shinozaki K, Deno H, Kato A, Sugiura M (1983) Overlap and cotranscription of
the genes for the beta and epsilon subunits of tobacco chloroplast ATPase.
Gene 24:147-155
Shinozaki K, Yamada C, Takahata N, Sugiura M (1983) Molecular cloning and
sequence analysis of the cyanobacterial gene for the large subunit of
ribulose-1,5-bisphosphate carboxylase/oxygenase. Proc Natl Acad Sci USA
80:4050-4054
Shinozaki K; Deno H, Wakasugi T, Sugiura M (1986) Tobacco chloroplast gene coding fbr subunit I of proton-translocating ATPase; comparison with the wheat subunit I and ~. coli subunit b. Curr Genet 10:421-423
Shinozaki K, Deno H, Sug.itaM, Kuramotsu S, Sugiura M (1986).Intron in the·gene
for the ribosomal protein S16 of tobacco chloroplast and its conserved boundary sequences. Mol Gen.Genet 202:1-5
Solnick D (1985) Trans splicing of mRNA precursors. Cell 42:157-164
Spielmann A, Stutz E (1983) Nucleotide sequence of soybean chloroplast DNA
regions which contain the psbA and trnH genes and cover the ends of the large single copy region and one end of the inverted repeats. Nucl Acids
Res 11:7157-7167
Steinmetz AA, Gubbins EJ, Bogorad L (1983) The anticodon of the maize
chloroplast gene for tRNALeuUAA is split by a large intron. Nucl Acids Res
10:3027-3037
Steinmetz AA, Castroviejo M, Sayre RT, Bogorad L (1986) Protein PSII-G. An
additional component of photosystem II identified through its plastid gene
in maize. J Biol Chem 261:2485-2488
Strittmatter G, Gozdzicka-Jozefiak A, Kossel H (1985) Identification of an rRNA
operon promoter from Zea mays chloroplasts which excludes the proximal
tRNAValGAC from the primary transcript. EMBO J 4:599-604
Subramanian AR, Steinmetz A, Bogorad L (1983) Maize chloroplast DNA encodes a protein homologous to the bacterial ribosomal assembly protein S4. Nucl
Acids Res 11:5277-5286
Sugita M, Sugiura M (1983) A putative gene for tobacco chloroplast coding for
ribosomal protein similar to E. coli ribosomal protein S19. Nucl Acids Res
11:1913-1918
Sugita M, Sugiura M (1984) Nucleotide sequence and transcription of the gene for
the 32,000 dalton thylakoid membrane protein from Nicotiana tabacum. Mol
Gen Genet 195:308-313
Sugita M, Shinozaki K, Sugiura M (1985) Tobacco chloroplast tRNALys(UUU) gene
contains a 2.5-kilobase-pair intron: an open reading frame and a conserved
boundary sequence in the intron. Proc Natl Acad Sci USA 82:3557-3561
Takaiwa F, Sugiura M (1982) Nucleotide sequence of the 16S-23S spacer region in
an rRNA gene cluster from tobacco chloroplast DNA. Nucl Acids Res 10:2665-2676
Takanami M, Sugimoto K, Sugisaki H, Okamoto T (1976) Sequence of promoter for
coat protein gene of bacteriophage fd. Nature 260:291-302
Tewari KK, Goel A (1983) Solubilization and partial purification of RNA
polymerase from pea chloroplasts. Biochem 22:2142-2148
Tohdoh N, Sugiura M (1982) The complete nucleotide sequence of a 16S ribosomal
RNA gene from tobacco chloroplasts. Gene 17:213-218
Torazawa K, Hayashida N, Obokata J, Shinozaki K, Sugiura M (1986) The 5' part of
the gene for ribosomal protein S12 is located 30 kbp downstream from its
3' part in tobacco chloroplast genome. Nucl Acids Res 14:3143
Tyagi AK, Herrmann RG (1986) Location and nucleotide sequence of the
pre-apocytochrome f gene on the Oenothera hookeri plastid chromosome
(Euoenothera plastome I). Curr Genet 10:481-486
Umesono K, Inokuchi H, Ohyama K, Ozeki H (1984) Nucleotide sequence of
Marchantia polymorpha chloroplast DNA: a region possibly encoding three
tRNAs and three proteins including a homologue of E. coli ribosomal protein
S14. Nucl Acids Res 12:9551-9565
Umesono K, Inokuchi H, Shiki Y, Takeuchi M, Chang Z, Fukuzawa H, Kohchi T, Sano
T, Ohyama K, Ozeki H (1986) Structure and organization of Marchantia
polymorpha chloroplast genome. (II) Inverted orientation of a 25 kbp
portion in LSC region. in preparation.
Westhoff P (1985) Transcription of the gene encoding the 51 kd chlorophyll
a-apoprotein of the photosystem II reaction centre from spinach.
Mol Gen Genet 201:115-123
Whitfeld PR, Bottomley W (1983) Organization and structure of chloroplast
genes. Ann Rev Plant Physiol 34:279-310
Widger WR, Cramer WA, Herrmann RG, Trebst A (1984) Sequence homology and
structural similarity between cytochrome b of mitochondrial complex III and
the chloroplast b6-f complex: position of the cytochrome b hemes in the
membrane. Proc Natl Acad Sci USA 81:674-678
Wilbur WJ, Lipman DJ (1983) Rapid similarity searches of nucleic acid and
protein data banks. Proc Natl Acad Sci USA 80:726-730
Willey DL, Huttly AK, Phillips AL, Gray JC (1984) Localization of the gene for
cytochrome f in pea chloroplast DNA. Mol Gen Genet 189:85-89
Willey DL, Auffret AD, Gray JC (1984) Structure and topology of cytochrome f in
pea chloroplast membranes. Cell 36:555-562
Willey DL, Howe CJ, Auffret AD, Bowman CM, Dyer TA, Gray JC (1984) Location and
nucleotide sequence of the gene for cytochrome f in wheat chloroplast DNA.
Mol Gen Genet 194:416-422
Wittmann-Liebold B, Seib C (1979) The primary structure of protein L20 from the
large subunit of the Escherichia coli ribosome. FEBS Lett 103:61-65
Yamano Y, Ohyama K, Komano T (1984) Nucleotide sequence of chloroplast 5S
ribosomal RNA from cell suspension culture of the liverwort Marchantia polymorpha and Jungermannia subulata. Nucl Acids Res 12:4621-4624
Yamano Y, Kohchi T, Fukuzawa H, Ohyama K, Komano T (1985) Nucleotide sequences of chloroplast 4.5S ribosomal RNA from a leafy liverwort, Jungermannia subulata, and a thalloid liverwort, Marchantia polymorpha. FEBS Lett
185:203-207
Yanisch-Perron C, Vieira J, Messing J (1985) Improved M13 phage cloning vectors and
host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33:103-119
Zurawski G, Perrot B, Bottomley W, Whitfeld PR (1981) The structure of the gene for the large subunit of ribulose 1,5-bisphosphate carboxylase from spinach
chloroplast DNA. Nucl Acids Res 9:3251-3270
Zurawski G, Bottomley W, Whitfeld PR (1982) Structures of the genes for the β
and ε subunits of spinach chloroplast ATPase indicate a dicistronic mRNA and an overlapping translation stop/start signal. Proc Natl Acad Sci USA 79:6260-6264
Zurawski G, Bohnert HJ, Whitfeld PR, Bottomley W (1982) Nucleotide sequence of
the gene for the Mr 32,000 thylakoid membrane protein from Spinacia
oleracea and Nicotiana debneyi predicts a totally conserved primary
translation product of Mr 38,950. Proc Natl Acad Sci USA 79:7699-7703
Zurawski G, Clegg MT (1984) The barley chloroplast DNA atpBE, trnM2, and trnV1 loci. Nucl Acids Res 12:2549-2559
Zurawski G, Clegg MT, Brown AH (1984) The nature of nucleotide sequence
divergence between barley and maize chloroplast DNA. Genetics 106:735-749
Zurawski G, Bottomley W, Whitfeld PR (1984) Junctions of the large single copy
region and the inverted repeats in Spinacia oleracea and Nicotiana debneyi
chloroplast DNA: sequence of the genes for tRNAHis and the ribosomal
proteins S19 and L2. Nucl Acids Res 12:6547-6558
Zurawski G, Zurawski SM (1985) Structure of the Escherichia coli S10 ribosomal
protein operon. Nucl Acids Res 13:4521-4526
SUMMARY
CHAPTER I Molecular cloning of promoters functional in Escherichia coli
from chloroplast DNA
DNA fragments that function in E. coli as transcriptional promoters were cloned from
chloroplast DNA of a liverwort, Marchantia polymorpha, using gene fusion to the
E. coli lac'Z gene. One recombinant plasmid gave a level of β-galactosidase activity
as high as that obtained on IPTG induction in the E. coli wild type strain W3110.
The inserted chloroplast DNA fragment was sequenced and mapped at the terminus of
the inverted repeat region, upstream from the 16S ribosomal RNA gene. The direction
of transcription from this promoter was opposite to that of the 16S ribosomal RNA gene.
This highly active promoter serves the trnI-C*AU gene and clustered genes for ribosomal
proteins. S1 nuclease mapping using both chloroplast and E. coli RNAs showed that
transcription starts at almost the same position, downstream from the consensus
Pribnow-box-like region. This clone also had a higher β-galactosidase activity
in E. coli than clones containing the promoters of rbcL and the β subunit gene of H+-ATP
synthase. Two clusters of genes for ribosomal proteins were identified downstream
from this highly active promoter.
CHAPTER II Structure and gene organization of the chloroplast genome
The nucleotide sequence of the large single copy region (psbG-16S rRNA gene;
30,600 bp) of the chloroplast DNA from a liverwort, M. polymorpha, was determined.
This region encodes genes for seven tRNAs: tRNAVal(GAC), tRNAIle(C*AU),
tRNAArg(CCG), tRNAPro(UGG), tRNATrp(CCA), tRNAMet(CAU) and tRNAVal(UAC); ten
photosynthetic polypeptides: the large subunit of ribulose-1,5-bisphosphate
carboxylase/oxygenase (rbcL), 51 kd photosystem II chlorophyll a apoprotein (psbB),
apocytochrome b-559 polypeptides (psbE and psbF), cytochrome f preprotein (petA),
cytochrome b6 polypeptide (petB) and cytochrome b6/f complex subunit 4 polypeptide
(petD), β and ε subunits of H+-ATP synthase (atpB and atpE) and photosystem II G
protein (psbG); and ribosomal proteins (L2, L14, L16, L20, L22, L23, L33, S3, S8,
S11, S12, S18 and S19), initiation factor 1 (infA) and α subunit of RNA
polymerase (rpoA). Interestingly, functionally related genes are clustered as
follows: (1) A ribosomal protein gene cluster involving transcriptional and
translational machinery, trnI-L23-L2-S19-L22-S3-L16-L14-S8-infA-secX-S11-rpoA, was
found at the terminus of the large single copy region next to the inverted repeat
region (IRB). (2) A cluster of photosynthetic genes, psbB-ORF35-ORF27-ORF74-petB-petD,
is located next to the ribosomal protein gene cluster. (3) A cluster including
photosynthetic genes, rbcL-trnR-ORF316-ORF36b-ORF184-ORF434-petA, was also found in
the large single copy region. Introns (intervening sequences) were found in coding
sequences for ribosomal protein genes (rpl2, rpl16 and rps12), the tRNAVal(UAC) gene and
photosynthetic genes (petB and petD). Interestingly, an open reading frame was
found to show significant amino acid sequence homology to a subunit of NADH
dehydrogenase in human mitochondria.
CHAPTER III Split gene for chloroplast ribosomal protein S12
A coding sequence corresponding to the E. coli ribosomal protein S12 gene
(rps12) was found to be split into three exons. Strikingly, the first exon with the
5' intron boundary sequence was located on the opposite strand of the chloroplast
DNA (121 kb, circular molecule) approximately 50 kb away from the rest of the exons.
The amino acid sequence deduced from the DNA sequence was highly homologous to the
sequences of the S12 ribosomal protein of E. coli (70.2%) and Euglena gracilis
chloroplasts (73.6%). As no DNA rearrangement of these coding regions is
observed, the active messenger RNA for ribosomal protein S12 is thought to be formed
post-transcriptionally, by a process such as trans-splicing. This may be the first
identified example of in vivo trans-splicing.
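Homology percentages such as the 70.2% and 73.6% quoted above are typically obtained by counting identical residues in a pairwise alignment. The following is a minimal sketch, assuming two already aligned amino acid strings in which "-" marks a gap and gap columns are excluded from the count; the function name percent_identity and the toy sequences are hypothetical illustrations, not the thesis data, and the thesis does not state which denominator was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of non-gap alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy, made-up aligned sequences (hypothetical, for illustration only)
toy_a = "MPTIKQLIRN"
toy_b = "MPTVNQLVR-"
print(round(percent_identity(toy_a, toy_b), 1))
```

Counting gap columns in the denominator instead would lower the reported value, so the convention should be stated whenever such figures are compared.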
LIST OF PUBLICATIONS
(a) Ohyama K, Wetter LR, Yamano Y, Fukuzawa H, Komano T (1982)
"A simple method for isolation of chloroplast DNA from Marchantia
polymorpha L. cell suspension culture."
Agric Biol Chem 46:237-242
(b) Ohyama K, Yamano Y, Fukuzawa H, Komano T, Yamagishi H, Fujimoto S, Sugiura M
(1983)
"Physical mappings of chloroplast DNA from liverwort Marchantia polymorpha L. cell suspension cultures."
Mol Gen Genet 189:1-9
(c) Tanaka A, Yamano Y, Fukuzawa H, Ohyama K, Komano T (1984)
"In vitro DNA synthesis by chloroplasts isolated from Marchantia polymorpha L. cell suspension cultures."
Agric Biol Chem 48:1239-1244
(d) Yamano Y, Kohchi T, Fukuzawa H, Ohyama K, Komano T (1985)
"Nucleotide sequence of chloroplast 4.5S ribosomal RNA from a leafy liverwort, Jungermannia subulata, and a thalloid liverwort, Marchantia polymorpha."
FEBS Lett 185:203-207
(e) Fukuzawa H, Uchida Y, Yamano Y, Ohyama K, Komano T (1985)
"Molecular cloning of promoters functional in Escherichia coli from
chloroplast DNA of a liverwort Marchantia polymorpha."
Agric Biol Chem 49:2725-2731
(f) Fukuzawa H, Kohchi T, Sano T, Shirai H, Umesono K, Ohyama K, Ozeki H, Komano T
(1986)
"Structure and organization of Marchantia polymorpha chloroplast genome
(III) LSC region (rbcL-JLB) having gene clusters of ribosomal proteins
and photosynthetic polypeptides."
in preparation.
(g) Fukuzawa H, Kohchi T, Shirai H, Ohyama K, Umesono K, Inokuchi H, Ozeki H (1986)
"Coding sequences for chloroplast ribosomal protein S12 from the
liverwort, Marchantia polymorpha, are separated far apart on the different DNA strands."
FEBS Lett 198:11-15
(h) Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y,
Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986)
"Chloroplast gene organization deduced from complete sequence of
liverwort Marchantia polymorpha chloroplast DNA."
Nature 322:572-574
Chapter I is described in reference (e).
Chapter II is described in reference (f).
Chapter III is described in reference (g).
ACKNOWLEDGMENT
The author wishes to express his grateful acknowledgment to Professor Tohru
Komano, Faculty of Agriculture, Kyoto University, for his continuous encouragement
throughout this work, and for critical reading of this manuscript.
The author would like to express his great gratitude to Dr. Kanji Ohyama,
Laboratory of Plant Molecular Biology, Research Center for Cell and Tissue Culture,
Faculty of Agriculture, Kyoto University, for allowing the use of his facilities and
for his kind guidance throughout this study.
The author thanks Professor Haruo Ozeki, Faculty of Science, Kyoto
University, and also Drs. Kazuhiko Umesono, Yoshiaki Yamano and Yuko Uchida, for
their helpful suggestions and valuable discussions.
The author thanks Dr. Toshimichi Ikemura, National Institute of Genetics, and
Dr. Minoru Kanehisa, Chemical Research Institute, Kyoto University, for their kind
support of the computer analysis using databases.
The author is indebted to Mr. Takayuki Kohchi, Mr. Tohru Sano, Mr. Hiromasa
Shirai, Mr. Shigehiro Asano, Mr. Yoshinori Yamashita and all the members of the
Laboratory of Plant Molecular Biology, Research Center for Cell and Tissue Culture,
and the Laboratory of Biochemistry, Department of Agricultural Chemistry, Faculty of
Agriculture, Kyoto University, for their kind assistance in this study.
Hideya Fukuzawa