cyp1 (hap1) regulator of oxygen-dependent gene expression in yeast

14
J. Mol. Biol. (1988) 204, 263-276 CYPl (HAPl) Regulator of Oxygen-dependent Gene Expression in Yeast I. Overall Organization of the Protein Sequence Displays Several Novel Structural Domains Francine Creusot, Jacqueline Verdi&e, Mauricette Gaisne and Piotr P. Slonimski Centre de Ghne’tique Mole’culaire du C.N.R.S. Laboratoire propre associe’ ci I’Universite’ Pierre et Marie Curie 91190 Gif-sur- Yvette, France (Received 10 February 1988, and in revised form 21 June 1988) In the yeast Saccharomyces cerevisiae the CYPl gene that modulates the expression of isol- (CYCl) and iso2-cytochrome c (CYPS) structural genes gives rise to two classes of mutated alleles; one class, represented by CYPl-18, has opposite effects on CYCl and CYPS, it reduces the expression of CYCl while it stimulates that of CYP3. The other class, represented by cypl-23 or the related allele hapl-1, reduces the expression of both CYCl and CYP3 genes. Genetic data suggested that the CYPl product is a positive regulator of the cytochrome c genes. The CYPl -18 allele has been cloned. We show here that the iso overproducer function of CYPl-18 is included in a 5300 base XhoI-PstI fragment. The sequence of this fragment reveals a unique, long, uninterrupted open reading frame of 4449 nucleotides able to encode a protein of 1483 amino acid residues. The predicted product of this open reading frame contains several interesting features. The N-terminal part of the protein resembles a nucleic acid-binding domain, in which two domains can be distinguished. The first is similar to a “finger” DNA binding motif, as found in TFIIIA and other regulatory proteins. The second consists of seven tandemly repeated sequences with a KCPVDH motif. Because of its structure, it is tempting to speculate that this region may act as a “redox sensor” folded around a metal atom or heme and involved in recognition of respiratory effecters. These two domains are separated by an “opa” sequence of 13 Gln residues. Implication of these domains for the function of CYPl-18 is discussed. 1. Introduction The yeast Saccharomyces cerevisiae is a faculta- tive aerobe and the presence/absence of oxygen controls its overall cell physiology and the synthesis of many different enzymes. The apoproteins of several cytochromes and other respiratory enzymes are not synthesized under anaerobic conditions and their formation is induced by oxygen and heme (Ephrussi & Slonimski, 1950; Slonimski, 1955; Guarente & Mason, 1983). Sherman et al., 1968; Verdi&e & Petrochilo, 1975; Downie et al., 1977; Smith et al., 1979; Montgomery et al., 1980). The expression of the two genes is differently regulated in the wild-type yeast: isol- and iso2cytochrome c account for 95 y0 and 5%, respectively, of total cellular cytochrome c and the derepression of CYCl transcription but not of CYP3 is dependent on oxygen and intracellular level of heme (Slonimski et al., 1965; Fukuhara, 1966; Guarente & Mason, 1983; Laz et al., 1984). The regulation of cytochrome c synthesis has Catabolite derepression and oxygen/heme been used as a model system in this respect for activation of CYCl occurs through an “upstream several reasons. Two distinct molecular species of activation site” or UAS located in the 5’ non-coding this protein are found in a yeast cell; they are region of the gene. This regulatory region consists isofunctional in spite of differences in their primary of two subsites, UASl and UAS2, which are sequence and are encoded by independent genes, independent functional units and the transcription CYCl for isol-cytochrome c and CYP3 (also called dependent on UASl is derepressed by increasing CYC7) for iso2cytochrome c (Slonimski et al., 1965; the levels of intracellular heme during growth on 0022-2836/88/220263-14 $03.00/O 263 0 1988 Academic Press Limited

Upload: piotr-p

Post on 04-Jan-2017

213 views

Category:

Documents


1 download

TRANSCRIPT

J. Mol. Biol. (1988) 204, 263-276

CYPl (HAPl) Regulator of Oxygen-dependent Gene Expression in Yeast

I. Overall Organization of the Protein Sequence Displays Several Novel Structural Domains

Francine Creusot, Jacqueline Verdi&e, Mauricette Gaisne and Piotr P. Slonimski

Centre de Ghne’tique Mole’culaire du C.N.R.S. Laboratoire propre associe’ ci I’Universite’ Pierre et Marie Curie

91190 Gif-sur- Yvette, France

(Received 10 February 1988, and in revised form 21 June 1988)

In the yeast Saccharomyces cerevisiae the CYPl gene that modulates the expression of isol- (CYCl) and iso2-cytochrome c (CYPS) structural genes gives rise to two classes of mutated alleles; one class, represented by CYPl-18, has opposite effects on CYCl and CYPS, it reduces the expression of CYCl while it stimulates that of CYP3. The other class, represented by cypl-23 or the related allele hapl-1, reduces the expression of both CYCl and CYP3 genes. Genetic data suggested that the CYPl product is a positive regulator of the cytochrome c genes.

The CYPl -18 allele has been cloned. We show here that the iso overproducer function of CYPl-18 is included in a 5300 base XhoI-PstI fragment. The sequence of this fragment reveals a unique, long, uninterrupted open reading frame of 4449 nucleotides able to encode a protein of 1483 amino acid residues. The predicted product of this open reading frame contains several interesting features. The N-terminal part of the protein resembles a nucleic acid-binding domain, in which two domains can be distinguished. The first is similar to a “finger” DNA binding motif, as found in TFIIIA and other regulatory proteins. The second consists of seven tandemly repeated sequences with a KCPVDH motif. Because of its structure, it is tempting to speculate that this region may act as a “redox sensor” folded around a metal atom or heme and involved in recognition of respiratory effecters. These two domains are separated by an “opa” sequence of 13 Gln residues. Implication of these domains for the function of CYPl-18 is discussed.

1. Introduction

The yeast Saccharomyces cerevisiae is a faculta- tive aerobe and the presence/absence of oxygen controls its overall cell physiology and the synthesis of many different enzymes. The apoproteins of several cytochromes and other respiratory enzymes are not synthesized under anaerobic conditions and their formation is induced by oxygen and heme (Ephrussi & Slonimski, 1950; Slonimski, 1955; Guarente & Mason, 1983).

Sherman et al., 1968; Verdi&e & Petrochilo, 1975; Downie et al., 1977; Smith et al., 1979; Montgomery et al., 1980). The expression of the two genes is differently regulated in the wild-type yeast: isol- and iso2cytochrome c account for 95 y0 and 5%, respectively, of total cellular cytochrome c and the derepression of CYCl transcription but not of CYP3 is dependent on oxygen and intracellular level of heme (Slonimski et al., 1965; Fukuhara, 1966; Guarente & Mason, 1983; Laz et al., 1984).

The regulation of cytochrome c synthesis has Catabolite derepression and oxygen/heme been used as a model system in this respect for activation of CYCl occurs through an “upstream several reasons. Two distinct molecular species of activation site” or UAS located in the 5’ non-coding this protein are found in a yeast cell; they are region of the gene. This regulatory region consists isofunctional in spite of differences in their primary of two subsites, UASl and UAS2, which are sequence and are encoded by independent genes, independent functional units and the transcription CYCl for isol-cytochrome c and CYP3 (also called dependent on UASl is derepressed by increasing CYC7) for iso2cytochrome c (Slonimski et al., 1965; the levels of intracellular heme during growth on

0022-2836/88/220263-14 $03.00/O 263

0 1988 Academic Press Limited

264 I? Creusot et al.

glucose (Guarente et al., 1984). Iso2-cytochrome c also possesses a regulatory region in the upstream 5’ non-coding sequence of the gene. This region contains both negative and positive regulatory sites and derepressed expression of CYP3 would be the result of antagonistic effects of these sites (Verdi&e & P&trochilo, 1979; Wright & Zitomer, 1984, 1985).

Genetic control of the synthesis of isocyto- chromes c and other respiratory enzymes has been discovered by the analysis of several independent genes that act in trans and modulate the expression of CYCI and CYP3 (Clavilier et al., 1969, 1976; Rothstein & Sherman, 1980). Among them, the gene CYPI is of particular interest. An intriguing class of mutations that have opposite effects on CYCI and CYP3 expression have been isolated from this locus. Such alleles, as CYPI-18, activate, at the transcriptional level, the expression of CYP3 while reducing that of CYCI. These alleles are referred to as the iso2-overproducer alleles (Clavilier et al., 1969, 1976; Montgomery et al., 1982). A second class of alleles of CYPI has been obtained that reduces both isol- and iso2-cytochrome e synthesis; these are the iso2-underproducer alleles represented by cypl-23 (Verdi&e $ Petrochilo, 1979). An indepen- dently isolated hapl-l mutation has been shown by complementation and recombination to belong to the CYPI locus. Thus CYPI and HAP1 are one and the same gene (Verdi&e et al., 1986).

Genetic data led to the proposal that the CYPl+ product would be a strong activator of CYCI and a weak activator of CYPS; CYPI-18 product would efficiently activate CYP3 and be inactive on CYCI, while the cypl-23 allele would have no effect on either CYCI or CYP3 genes (Verdi&e & Petrochilo, 1979). Consistent with that interpretation are the results of Pfeifer et al. (1987), who defined precisely the sequences bound by the CYPI’ product on the CYCI and CYP3 genes and thus uncovered the fact that the upstream target sequences present on these two cytochrome c genes show no obvious similarity. An interesting question is how one protein recognizes two different sequences in such a way that the activating function can be heme and/or oxygen-dependent for one of them (CYCI) and not for the other (CYP3).

In an approach towards understanding this regulatory system, we decided to determine the sequence of the CYPI gene. We have reported the isolation from a cosmid bank of the overproducer allele CYPI-18 (Verdi&e et al., 1985). This allele was chosen by virtue of its ability to convert an isol-cytochrome c deleted (and therefore lactate-) strain to a lactate+ phenotype. Here, we have delimited further the functional CYPI-18 gene to a shorter DNA fragment, which was subsequently sequenced. We show that CYPI-18 can encode a polypeptide chain of 164,000 Da with several interesting structural features which, we believe, are directly related to its activity as a master regulator of oxido-reduction-dependent gene expression.

In the accompanying paper (Verdikre et al.,

1988), we determine the structure of the wild-type CYPl+ gene. The comparison of the two sequences suggests a molecular model responsible for the regulatory characteristics of the CYPI protein.

2. Materials and Methods

(a) Yeast strains

All the following strains were constructed in our laboratory. VP209-7B (alpha leu2 ura3 cycl-I), FJl-32A (alpha trpl-1 ura3 cycl-1 CYPl-18), FJl-2C (alpha trpl-1 ura3 cycl-1), FJl-9B (a leu2 ura3 cycl-1 CYPl-18) and FJ21-1D (alpha trpl-1 ura3 cycl-1 cypl-m). In this last strain, the allele cypl-m cannot be stated unambiguously (see Verdi&e et aZ., 1986). This segregent is issued from a cross in which one of the parents carried the underproducer allele cypl-23 and the other the underproducer allele hapl-1 from strain BWG-7a hapl-1. As the 2 alleles have the same null phenotype, do not complement and belong to the same CYPl or HAP1 gene (Verdi&e et al., 1986), it is not possible to discriminate between the 2 underproducer alleles in the progeny.

(b) Bacterial strains

Escherichia coli HBlOl (F- Zeu- pro- Zac- gal- Hrs- recA- r- m- str-) were used for DNA cloning and JMlOl (delta lae pro) SupE thi F’ traD36 proAB lacIq ZacZdeltaM15 were used for propagating phages M13.

(c) Media

Yeast were grown on complete, minimum, glycerol or lactate media as described by Verdi&e & Petrochilo (1979). Bacteria were grown on LB or LBA or on 2 x TY medium (1 o/o (w/v) yeast extract, 1.6% (w/v) Bacto- tryptone, 0.5% (w/v) NaCI) for growth of M13. JMlOl were subcloned on minimum M9 medium (Miller, 1972).

(d) Plasmids

Plasmid clone 6 carrying the yeast ura3 marker was described by Bach et al. (1979). pG63-11 is a shuttle vector containing the yeast ura3 marker and the 2 pm origin of replication (Gerbaud et al., 1981). pMHU158 is a shuttle vector construceed by Heusterspreute et al. (1985) containing the leu2 marker and the 2 pm origin of replication. In this plasmid, the leu2 promoter is partially deleted so that the expression of the marker is reduced to 5% of the normal level. This decreases the efficiency of plasmid replication in transformed yeast. To circumvent this problem, we constructed the plasmid YEpUl by inserting the 1.2 kbt Hind111 ura3 fragment of clone 6 into the Hind111 site of pMH158. The integrative plasmid YIp2 was constructed by inserting the internal 0.9 kb NruI-SphI fragment (see Fig. 1) of CYPl-18 into the homologous sites of clone 6.

(e) DNA isolation and genetic engineering techniques

Yeast DNA and bacterial plasmid isolation, DNA fragment purification, restriction digests, ligation and bacterial transformation were carried out, &s described by Verdi&e et al. (1985).

t Abbreviations used: kb, lo3 bases or base-pairs; ORF, open reading frame.

Novel Structural Domains in the CYPl Protein 265

(f) Yeast transformation (k) Analysis of RNA

Replicative or integrative transformations were performed as described (Verdi&e et al., 1985). (Ura3+) transformants were selected on minimum medium supplemented with 60 /Ig leucine/ml or 20 pg tryptophan/ ml. The glycerol or lactate phenotypes of transformants were further tested by growth on glycerol or lactate medium.

RNA was electrophoresed through either 1 “/0 or 0.8% (w/v) agarose gels containing 2.2 M-formaldehyde and was then transferred to nitrocellulose filters as described by Maniatis et al. (1982). Filters were hybridized according to Thomas (1980).

(1) Computer unalyeis

(g) Subcloniny of CYPI-18

YEpJFM2 was digested with the appropriate enzymes (Fig, l(a)), DNA fragments were purified and inserted into the homologous sites of the vectors. The 9 kb BamHTIHindIII fragment (N.6) was subcloned in pG63-11. All the other fragments were cloned into YEpUl. For cloning the 5.3 kb PstI-XhoI fragment (N.3), YEpUl was first cut with XhoI and then partially digested with PstI in order to cut at only 1 of the 2 sites. After ligation of the fragment, recombinant plasmids were selected in E. coli HBlOl and insertion of the yeast fragment at the right location into the polylinker site of the vector was checked by restriction mapping. Subcloning of the 7 kb XbaI-XhoI fragment (N.2) was carried out in 2 steps. First, the 1 kb Xbu-XhoI fragment (N.2.1.) was cloned in the XbaI-XhoI sites of YEpUl and then the 6 kb XbaI fragment (N.2.2) was inserted into the unique XbaI site of the recombinant vector. The right orientation of the 6 kb fragment was checked by restriction analysis.

Sequence data treatments were performed using the computer facilities of CIT12 in Paris on a DPSS computer with the help of the French Minis&e de la Recherche et la Technologie (Programme mobilisateur “Essor des Biotechnologies”).

3. Results

(a) Cloning the C YPl-18 functional allele

The iso2-overproducer allele CY Pl-18 had been

(h) DNA sequencing

Shotgun cloning of the 5.3 kb Z’stI-XhoI fragment (N.3 in Fig. l(b)) was carried out into M13mp18 (Yanisch- Perron et al., 1985). Recombinant clones were selected by transformation of E. coli JMlOl. Clones were sequenced by the dideoxynucleotide chain termination method (Sanger et al., 1977) with the following modifications: DNA was primed with the 17-mer (Boehringer- Mannheim), chain termination reactions were performed with [%]dATP at 37°C for 25 min, fragments were separated on 6% (w/v) polyacrylamide gels. Clones giving band compression were resolved by using deoxyinosine instead of deoxyguanosine (Mills & Kramer, 1979). The entire sequence was determined on both strands,

(i) Preparation of RNA

Cells from strains FJl-2C (CYPl+) FJl-32A (CYPl-18) were grown on media containing 1 o/o (w/v) yeast extract, 1 y. (w/v) Bactopeptone and 2% (w/v) glucose to an A,,, of 2 (1 or 2 generations before stationary phase). Total RNA was prepared by the method of Maccechini et al. (1979), except that the LiCl precipitation step was omitted. Poly(A)+ RNA was purified on oligo(dT)- cellulose as described by Fraser (1975).

cloned by taking advantage of its ability to restore a lactate+ (let+) phenotype to an isol-cytochrome c dificient, thus (let-), yeast strain. This allele was first isolated as a rearranged DNA fragment inserted in plasmid YEpJFMl and subsequently in a non-rearranged BamHI genomic fragment of 23 kb in YEpJFM2 (Verdi&e et al., 1985). This insert was subcloned further in order to delineate the functional fragment(s). Comparison of DNA inserted in YEpJFMl and YEpJFM2 showed that the sequence located between the Hind111 and the XhoI sites at positions 14.5 and 19 kb, respectively (Fig. l(a)), displayed an identical restriction pattern. This observation suggested that the CYPl-18 function should lie in this region and prompted us to investigate more closely this part of the inserted DNA. The fragments shown in Figure l(a) were cloned in pG63-11 or in YEpUl (see Materials and Methods). Recombinant plasmids were used for transformation of a (let-) yeast strain; (ura3+) transformants were then screened for the CYPl-18 function by checking comple- mentation of the (let-) phenotype. As shown in Figure l(a), the fragments N.l (XhoI, 15 kb), N.2 (XbaI-XhoI, 7.5 kb) and N.3 (PstI-XhoI, 5.3 kb) were able to restore growth on lactate medium. None of the other fragments was able to do so; in particular, fragments N.5 and N.6 lacking about 1 kb on the right and left sides of the 5.3 kb PstI- XhoI fragment, respectively, did not have the CYPl-18 function. These results indicate that the 5.3 kb PstI-XhoI fragment is the shortest DNA sequence still carrying the complete CYPl-18 function. The recombinant plasmid carrying this insert is referred to as YEpJFM3.

( j) Preparation of radioactive probes

Double-stranded probes were nick-translated to specific activities of 5 x 10’ to 1 X 10’ cts/min per pg. Single- stranded probes were radiolabelled to approx. the same specific activity by performing DNA sequencing reactions (see above), except that no dideoxynucleotides were used and that [cr-35S]dATP was replaced by [cr-32P]dATP. The labelled DNA corresponds to the transcribed strand.

(b) Inactivation by disruption of the CYPZ gene

Different alleles of CYPl display various pheno- types with respect to cytochrome c synthesis (Clavilier et al., 1976; Verdi&e & Petrochilo, 1979). It was therefore important to check whether their inactivation leads to a common phenotype corre- sponding to the loss of all functions. Inactivation

266 F. Creusot et al.

6

5

4 ., u

2.1 ~2.r

1 :

+

s x Xb PH s Xb x

t t t ttt t t a t

I I ./ 1 I \

(b)

I ORF I

5’ 3

0.5 Kb -

Figure 1. Cloning and restriction map of the CYPl-18 region. The genomic BamHI fragment of 23 kb inserted in the plasmid YEpJFM2 (thick line) was subcloned into 7 replicative plasmids ((a), numbered 1 to 7) which were used to transform a let-) strain in order to determine the CYPl-18 function. The shortest fragment (subclone 3, PstI-XhoI fragment, 5.3 kb long), which restores growth on lactate is shown enlarged in (b). The internal SphI-NruI gene fragment cloned into the integrative plasmid YIp2, is indicated above the restriction map of the 5.3 kb fragment. The dotted lines with the numbers indicate the single-stranded DNA fragments cloned in Ml3 and used in Northern blot analysis (Fig. 5). The CYPl-18 ORF is shown under the restriction map. Abbreviations: B, BarnHI; X, XhoI; S, S&I; Bg, BgZII; Xb, XbaI; P, P&I; H, HindIII; Sp, SphI; R, EcoRI; N. NruI; Sa, SacI; and K, KpnI.

by disruption was achieved by inserting in the CYPl locus the non-replicative plasmid YIp2 by integration directed by double-strand break repair (Orr-Weaver et al., 1981) after linearization in the unique Sal1 site of the yeast insert (Fig. l(b)). CYPl+, CYPl-18 and cypl-m haploid strains (see Materials and Methods) were transformed with the linearized plasmid and (ura3+) transformants were selected. Integration into yeast DNA at the right location was checked by controlling mitotic stability of the (ura3+) phenotype and by Southern blot analysis of yeast DNA probed with 32P- labelled pBR322 (not shown). Table 1 reveals that disruption of any allele of CYPl prevents in an isol -cytochrome c-deficient (cycl-1 ) strain the growth on lactate as sole carbon source, and dramatically reduces growth on glycerol medium but allows normal growth on glucose. So, CYPl does not control any function essential for cell life. Analysis of cytochromes performed by low- temperature spectroscopy as described by Verdi&e et al. (1986) confirmed that all the disrupted

transformants behaved as an iso2- underproducer cypl-23 strain and did not synthesize any detect- able amount of cytochrome c. The fact that the null

Alleles

Table 1 Disruption of the C YYl locus

Media

GlUCO~ Glycerol Lactate

CYPl+ + + -

(‘YPI :: URA3 + +/- -

CYPI-1x f + + CYPI-18::URA3 + +/- -

cypl -In + +/- - (!YPl-m :: URA3 + +/- -

cycl-1-ura3- yeast strains carrying the CYPl+ (VP209-7B), the overproducer CYPl-18 (FJI-9B) or the underproducer cypl-m (FJZl-1D) allele were transformed with the integrative plasmid YIp2 linearized at the unique Sal1 site in the yeast insert. (ura3+) transformants were selected and tested for growth on complete, glycerol and lactate media.

Novel Structural Domains in the CYPl Protein 267

disruption or in vivo mutagenesis) have the same negative phenotype agrees well with the conclusion that both CYPl+ and CYPl-18 act as positive regulatory loci of iso-cytochrome c synthesis.

alleles of CYPI (whether obtained by gene (c) Sequence of the CYPl-18 allele

Both strands of the entire 5.3 kb P&I-XhoI fragment were sequenced by the dideoxynucleotide chain termination method (Sanger et al.. 1977). The sequence is shown in Figure 2. The sequence

1

61

121

181

241 300

7 301

27 361

47 421

67 481

87 541

107 601

127 661

147 721

167 781

187 841

207 901

227 961

247 1021

267 1081

287 1141

307 1201

327 1261

347 1321

367 1381

CTGCAGTGCTCTTTTAAGGTTTGAAGATATTGCACAAAATTGTATCTGATTTCATCTCAT

CGAACCTCATTCCATCAACTTCCTTTTATCAAAGCATCTTGTTTTGTTTTCGCAGAAAAT .

TTTTCTTCCCTTTCAATTTGGCCTTTTAGTTTTTCTCCAGTCGCAGGCAAGAAGGTAAG~

AAATAGAAGAAAAAGAAAAAAAAAAAAAGGGAACAATAGGTTAGATGTCGAATACCCCTT 1MSNTPY

ATAATTCATCTGTGCCTTCCATTGCATCCATGACCCAGTCTTCGGTCTCAAGAAGTCCTA NSSVPSIASUTQSS VSRSPN

ACATGCATACAGCAACTACGCCCGGTGCCAACACCAGCTCTAACTCTCCACCCTTGCACA N HTATTPGANTSSNSPPLHM

TGTCTTCAGATTCGTCCAAGATCAAGAGGAAGCGTAACAGAATTCCGCTCAGGTGCACCA SSDSSKI KRKRNR IPLRCTI

TTTGTCGGAAAAGGAAAGTCAAATGTGACAAACTCAGACCACACTGCCAGCAGTGCACTA CRKRKVKCDKLR P H C Q Q C T K

AAACTGGGG;AGCCCATCTCTGCCACTACATGGAACAGACCTGGGCAGA~GAGGCAGAGA T G V A Ii LCHYMEQTWAEEAEK

. AAGAATTGCTGAAGGACAACGAATTAAAGAAGCTTAGGGAGCGCGTAAAATCTTTAGAAA

ELLKDNELKKLRERV K S L E K . .

AGACTCTTTCTAAGGTGCACTCTTCTCCTTCGTCTAACTCCTTGAAAAGTTACAACACTC TLSKVHSSPSSNSLKSYNTP

CCGAGAGCAGCAACCTGTTTATGGGTAGCGATGAACACACCACCCTTGTTAATGCAAATA ESSNLFMGSDEHTTLVNANT

CAGGCTCTGCTTCCTCTGCCTCGCATATGCATCAGCAACAACAGCAACAGCAGCAACAG~ GSASSASHHHQQQQQQQQQE

AACAACAACAAGACTTTTCCAGAAGTGCGAACGCCAACGCGAATTCCTC~TCCCTTTCT~ QQQDFSRSANANANSSSLSI

. . TCTCAAATAAATATGACAACGATGAGCTGGACTTAACTAAGGACTTTGATCTTTTGCATA

S NKYDNDELDLTKDFDLLHI

TCAAAAGTAACGGAACCATCCACTTAGGTGCCACCCACTGGTTGTCTATCATGAAAGGT~ KSNGTI HLGATHWLS I Y K G D

ACCCGTACC;AAAACTTTTGTGGGGTCATATCTTCGCTATGAGGGAAAAGTTAAATGAA~ P Y L K LLWGHIFAMREKLNEW

GGTACTACCAAAAAAATTCGTACTCTAAGCTGAAGTCAAGCAAATGTCCCATCAATCAC~ Y Y Q K N S Y S K L K S S K C P I N H A

. CGCAAGCGCCGCCTTCTGCCGCTGCCGCCGCTACCAGAAAATGTCCTGTTGATCACTCC~

QAPPSAAAAATRKCPVDHSA .

CGTTTT~GTCTGG~ATGGT~G~~~~AAAGGAGGAGA~T~~T~TT~~TAGGAAATGTCCA~ FSSGMVA PKEETPL PRKCPV

TTGACCACACCATGTTCTCTTCGGGAATGATTCCTCCCAGAGAGGACACTTCGTCCCAG~ DHTMFSSGMIPPREDTSSQK

AGAGGTGTCCCGTTGACCACACCATGTATTCCGCAGGAATGATGCCGCCCAAGGACGAGA R CPVDHTM Y S A G H M P P K D E T

CACCTTCCCCATTTTCCACTAAAGCTATGATAGACCATAACAAGCATACAATGAATCCG~ P S P F S T K A M I D H N K HTMNPP

Fig. 2.

60

120

180

240

360

420

480

540

600

660

720

700

840

900

960

1020

1080

1140

1200

1260

1320

1380

1440

268 F. Creusot et al.

387 1441

407 1501

427 1561

447 1621

467 1681

487 1741

507 1801

527 1861

547 1921

567 1981

587 2041

607

2101

627 2161

647 2221

667 2281

687

2341

707 2401

727 2461

747 2521

767 2581

787 2641

807 2701

827 2761

847 2821

867 2881

CTCAGTCAAAATGTCCTGTGGACCATAGAAACTATATGAAGGATTATCCCTCTGACATGG QSKCPVDHRNYMKDYPSDHA

CAAATTCTTCTTCGAACCCGGCAAGTCGT;GCCCCATTGACCATTCAAG~ATGAAAAAT~ N S S S N P A S R c P I D H S S U K N T

CAGCGGCCT;ACCAGCTTCAACGCACAAT~CCATCCCACACCACCAACC~CAGTCCGGA~ AALPAS T Ii N T I p H H Q P Q S G S

CTCATGCTCGTTCGCATCCCGCACAAAGCAOGAAACATGATTCCTACAT~ACAGAATCT~ HARSHPAQSRK HDSYMTESE

AAGTCCTCGCAACACTTTG;GAGATGTTGCCACCAAAGCGCGTCATCGC~TTATTCATC~ V LATLCEHLPP K R V I A L F I E

AGAAATTCTTCAAACATTTATACCCTGCCATTCCAATCTTAGATGAACAGAATTTCAAAA KFFKHLYPAIPI L D E Q N F K N

ATCACGTGAATCAAATGCTTTCGTTGTCTTCGATGAATCCCACAGTTAACAACTTTGGTA H V N 0 M L S L S SHNPTVNN F G W

TGAGCATGCCATCTTCATCTACACTAGAGAACCAACCCATAACACAAATCAATCTTCCAA SMPSSSTLENQPITQIN L P K

AACTTTCCGATTCTTGTAACTTAGGTATT~TGATAATAA;CTTGAGATT~ACATGGCTA~ LSDSCNLGILIII LRLTWLS

CCATACCTTCTAATTCCTGCGAAGTCGACCTGGGAGAAGAAAGTGGCTC~TTTTTAGTG~ IPSNSCEVDLGEESGSFLVP

CCAACGAATCTAGCAATATGTCTGCATCTGCATTGACCT~GATGGCTAA~GAAGAATCA~ NESSNMSASALTSMAKEESL

. TTCTGCTAAAGCATGAGACACCGGTCGAGGCACTGGAGCTATGTCAAAA~TACTTGATT~

LLKHETPVEALELCQ K Y L I K

AATTCGATGAACTTTCTAGTATTTCCAATAACAACGTTAATTTAACCAC~GTGCAGTTT~ FDELSSI SNNNVNLTTVQFA

CCATTTTTTACAACTTCTATATGAAAAGTGCCTCTAATGATTTGACTACCTTGACAAATA I FYNFYMKSASNDLTTLTHT

. CCAACAACACTGGCATGGCCAATCCTGGTCACGATTCCGAGTCTCACCAGATCCTATTGT

N N TGHANPGHDSESHQILLS .

CCAATATTACTCAAATGGCCTTTAGTTGTGGGTTACACAGAGACCCTGATAATTTTCCTC N ITQUAFSCGLHRDPDNFPQ

. AATTAAACGCTACCATTCCAGCAACCAGCCAGGACGTGTCTAACAACGGGAGCAAAAAGG

L N A T IPATSQDVSWNGSKKA

CAAACCCTAGCACCAATCCAACTTTGAATAACAACATGTCTGCTGCCACTACCAACAGCA NPSTNPTLNNNMSAATTNSS

. GTAGCAGATCTGGCAGTGCTGATTCAAGAAGTGGTTCTAACCCTGTGAACAAGAAGGAAA

SRSGSADSRSGSNPVNKKEN . . . . . .

ATCAGGTTAGTATCGAAAGATTTAAACACACTTGGAGGAAAATTTGGTATTACATTGTTA QVSIERFKHTWRKIWYYIVS

. GCATGGATG;TAACCAATC;CTTTCCCTGGGGAGCCCTCGACTACTAAGAAATCTGAGGG

MDVNQSLSLGS PRLLRN L R D . .

ATTTCAGCGATACAAAGCTACCAAGTGCGTCAAGGATTGATTATGTTCGCGATATCAAAG FSDTKLPSASRI DYVRDIKE

. AGTTAATCATTGTGAAGAATTTTACTCTTTTTTTCCAAATTGATTTGTGTATTATTGCTG

L I I v K N F T L F F Q I DLCIIAV

TATTAAATCACATTTTGAA;GTTTCTTTAGCAAGAAGCGTGAGAAAATTTGAACTGGATT L N H ILWVSLARSVRKFELDS

CATTGATTAATTTATTGAAAAATCTGACCTATGGTACTGAGAATGTCAATGATGTAGTGA L I NLLKNLTYGTENVNDVVS

Fig. 2.

1500

1560

1620

1680

1740

1800

1860

1920

1980

2040

2100

2160

2220

2280

2340

2400

2460

2520

2580

2640

2700

2760

2820

2880

2940

Novel Structural Domains in the CYPl Protein 269

887 2941

907 3001

927 3061

947 3121

967 3181

987 3241

1007 3301

1027 3361

1047 3421

1067 3481

1087 3541

1107 3601

1127 3661

1147 3721

1167 3781

1187 3841

1207 3901

1227 3961

1247 4021

1267 4081

1287 4141

1307 4201

1327 4261

1347 4321

1367 4381

GCTCCTTGATCAACAAAGGGTTATTACCAACTTCGGAAGGTGGTTCTGTAGATTCAAATA s L I NKGLLPTSEGGSVD S N N

ATGATGAAATTTACGGTCTACCGAAACTACCCGATATTCTAAACCATGGTCAACATAACC D E I Y G L P K L P D I L N H G Q Ii N 0

. . . AAAACTTGTATGCTGATGGAAGAAATACTTCTAGTAGTGATATAGATAAGAAATTGGACC

NLYADGR NTSSSD IDKKLDL .

TTCCTCACGAATCTACAACGAGAGCTCTATTCTTTTCCAAGCATATGACAATTAGAATGT PHESTTRALF FSKHMT I R w L

. . TGCTGTACTTATTGAACTACATTTTGTTTACTCATTATGAACCAATGGGCAGTGAAGATC

LYLLNY ILFTHYEPMGSEDP . .

CTGGTACTAATATCCTAGCTAAGGAGTACGCTCAAGAGGCATTAAATTTTGCCATGGATG G T N I LAKEYAQEALNFAMDG

. . GCTACAGAAACTGCATGATTTTCTTCAACAATATCAGAAACACCAATTCACTATTCGATT

YRNCMI F F N N I R N T N S L F D Y

ACATGAATGTTATCTTGTCTTACCCTTGTTTGGACATTGGACATCGTTCTTTACAATTTA I4 N V I L S Y P C L D I GHRSLQFI

. . . . TCGTTTGTTTGATCCTGAGAGCTAAATGTGGCCCATTGACTGGTATGCGTGAATCATCGA

V c L I L R A K CGPLTGMR E S S I

TCATTACTAATGGTACATCAAGTGGATTTAATAGTTCGGTAGAAGATGAGGACGTCAAAG I TNGTSSGFNSSVEDEDVKV

. . . TTAAACAAGAATCTTCTGATGAATTGAAAAAAGACGATTTCATGAAAGATGTAAATTTGG

K Q ESSDELKKDDFMKDVNLD

ATTCAGGCGATTCATTAGCAGAGATTCTAATGTCAAGAATGCTGCTATTTCAAAAACTA~ S G D S L A E I LMSRMLLFQKLT

CAAAACAACTATCAAAGAAGTACAACTACGCTATTCGTATGAACAAATCCACTGGATTCT K Q L S K K Y N Y A I R Y NKSTGFF

TTGTCTCTTTACTAGATACACCTTCAAAGAAATCAGACTCGAAATCGGGTGGTAGTTCA~ v s LLDTPSK KSDSKSGGSSF

TCATGTTGGGTAATTGGAAACATCCAAAGGTTTCAAACATGAGCGGATTTCTTGCTGGTG N L G NWKHPKVSNMSGFLAGD

ACAAAGACCAATTACAGAAATGCCCCGTGTACCAAGATGCGCTGGGGTT~GTTAGTCCA~ K DQLQKCP v Y a DALGFVSPT

. . . CCGGTGCTAATGAAGGTTCTGCTCCGATGCAAGGCATGTCCTTACAGGGCTCTACTGCTA

GANEGSAPMQGMSLOGSTAR . . .

GGATGGGAGGGACCCAGTTGCCACCAATTAGATCATACAAACCTATCACGTACACAAGTA M G G T Q L P P I RSYKPITYTSS

. . . . GTAATCTACGTCGTATGAATGAAACGGGTGAGGCAGAAGCTAAGAGAAGAAGATTTAATG

NLRRMNETGEAEAKRRRFND . .

ATGGCTATATTGATAATAATAGTAACAACGATATACCTAGAGGAATCAGCCCAAAACCT~ GYIDNNSNNDIPRGISPKPS

CAAATGGGCTATCATCGGTGCAGCCACTA;TATCGTCAT;TTCCATGAA~CAGCTAAAC~ NGLSSVOPLLSSFSWIIQLNG

. . GGGGTACCATTCCAACGGTTCCATCGTTAACCAACATTACTTCACAAATGGGAGCTTTA~

G T I PTVPSLTN ITSQMGALP . . . .

CATCTTTAGATAGGATCACCACTAATCAAATAAATTTGCCAGACCCATCTAGAGATGAAG S L D R ITTNPINLPDPSRDEA

. . . . . CATTTGACAACTCCATCAAGCAAATGACGCCTATGACAAGTGCATTCATGAATGCTAATA

F D N S I K Q M T P N T S A F I4 N A I T . . . . .

CTACAATTCCAAGTTCAACTTTAAACGGGAATATGAACATGAATGGAGCTGGAACTGCGA T I P S S T L NGNMNMN GAGTAN

.

Fig. 2.

3000

3060

3120

3180

3240

3300

3360

3420

3480

3540

3600

3660

3720

3780

3840

3900

3960

4020

4080

4140

4200

4260

4320

4380

4440

270 F. Creusot et al.

1387 4441

1407 4501

1427 4561

1447 4621

1467 4681

4741

4801

4861

4921

4981

5041

5101

5161

5221

ATACAGATACAAGTGCCAACGGCAGTGCTTTATCGACACTGACAAGCCCACAAGGCTCAG TDTSANGSALSTLTSPOGSD

. ACTTAGCATCCAATTCTGCTACACAGTATAAACCTGACTTAGAAGACTTTTTGATGCAAA

LASNSATQYKPDLEDFLMQN

ATTCTAACTTTAATGGGCTAATGATAAATCCTTCCAGTC;GGTAGAAGT~GTTGGTGGA~ SNFNGLMI NPSSLVEVVGGY

. . ACAACGATCCTAATAACCTTGGAAGAAATGACGCGGTTGATTTTCTACCCGTTGATAATG

NDPNNLGRNDAVDFLPVDNV

TTGAAATTGATGGACTAGTAGATTTTTATAGAGCAGATTTTCCAATATGGGAGTGATGCG E I DGLVDFYRADFPIWE l 1483

GTTTGTTTATGTATAAATAGAAAGGGACAIATTTTTATTATTATTATTT~AGTTATTTA~ . . .

CATGTATGTTATTTTATCGGTGAATTGTAATGATCGAATAATAATTGGAGGTTGAAGAAG . . .

TTGCAAAAAATATTTACTTGGACGAGTCCGGAATCGAACCGGAGACCTCTCCCATGCTAA . .

GGGAGCGCGCTACCGACTACGCCACACGCCCAATTTTCTTGTTGTCTGTTGTGTTTCAGA .

ATGGGGTACGTCAGTATGATAATAATCATCCTAAACGTTCTTAAATTACATATGAAACAA .

CCTTATAGCAAAACGAACAGAATGAGCAACATGAGATGAAACTCCGCCTCCTTAACTGAA .

CTTTCCAAACGTATAAACGCCTGAAAAATTAGTTTAGATCCGAGATTCCGCGCTTCCACC .

ATCAAGTATGATCCATATTTTATATAATATATAAGATAAGTAACATTCCGTAAGCTGATA .

ATCCCTTTTGGCAACTCGTTACTTCCCAAAGACTGTTTATATTAGGATTGTCAAGGGACC . 5235

CCGGTATTGCTCGAC

4500

4560

4620

4680

4740

4800

4860

4920

4980

5040

5100

5160

5220

Figure 2. Nucleotide sequence and predicted amino acid sequence of the yeast CYPl-18 gene. The complete DNA sequence of the 5.3 kb coding strand from the 5’ P&I site to the 3’ X/WI site is presented. The supposed TATA boxes, CT block and CAAG box and termination TA(T)GT - - - - -TTT sequences are underlined. The predicted amino acid sequence, in the l-letter code, is shown below the DNA sequence. Upper and lower numbers before each line refer to the nucleotide and amino acid positions, respectively. The coding strand is the non-transcribed DNA strand identical with the mRNA. On the opposite strand we have located (Creusot et al., unpublished results) the structural gene of the alanine tRNA (positions 4819 to 4881) adjacent to a sigma element (from position 4997).

displays only one large continuous open reading frame (ORF) starting with an ATG at position 225 and ending with a TGA at position 4674. The two remaining reading frames on the same strand and the three registers on the opposite strand contain 61 to 148 stop codons and could not code for a protein longer than a few dozen amino acid residues. The fact that no TACTAAC element, indicative of a possible intron (Langford & Gallwitz, 1983; Pikielny et al., 1983), is found in the entire sequence strongly suggests that the ORF is not interrupted and can encode a large protein of 1483 amino acid residues with a predicted molecular weight of about 164,220.

(i) 5’ Upstream and 3’ downstream regions

In higher eukaryotes, TATA box sequences required for proper transcription initiation are found around 30 nucleotides upstream from the mRNA start sites (Gannon et al., 1979). In yeast, most but not all genes have TATA boxes. They usually are located further upstream and can be found as far as 150 to 200 nucleotides from the mRNA start sites (Gallwitz et aZ., 1981; Hitzeman et

al., 1981). In the CYPl-18 allele, no canonical TATA box is evident in the region upstream from

the ORF. However, a related sequence, TATCAAA located at position 87, might be an adequate TATA box. This sequence appears to be homologous to that of the MAT alpha2 promoter (Astell et al.,

1981) and is followed at position 120 to 158 by a CT block and at 168 by a CAAG box, sequences frequently found in yeast promoters (Dobson et al.,

1982). Although these sequences are generally associated with highly expressed genes, they are found in poorly expressed genes, as iso2-cyto- chrome c (Montgomery et al., 1980). A remarkable feature in the promoter region is the presence from position 169 to 213 of a long, nearly perfect homopurine stretch containing about 70% adenine. Such sequences are found in a similar location in the promoter of ADRl, a yeast regulatory protein of alcohol dehydrogenase (Hartshorne et al., 1986) and ARGRII a yeast regulatory gene of arginine metabolism (Messinguy et al., 1986). Another candidate for transcription initiation could be a TATA box present at position 240, which is located within the first seven codons of the ORF.

The sequence closest to the consensus for transcription termination (Zaret & Sherman, 1982) occurs within the first 100 nucleotides downstream from the TGA codon at 4674. It consists of three

Novel Structural Domains in the CYPl Protein 271

possible TA(T)GT sequences located at positions 4688, 4730 and 4745 and an A+T-rich region followed by TTT at position 4752. As is usual in yeast genes, there is no evident polyadenylation consensus signal AATAAA in the 562 nucleotides of the 3’ untranslated side of the gene.

detected in the structure of the CYPl-18 protein. We shall analyse them successively from the N to the C terminus.

Downstream and on the opposite strand we have located, by sequence comparisons, the structural gene of the alanine tRNA and a sigma-like element (positions 4819 and 4907, respectively; Fig. 2). This will be discussed elsewhere (Creusot et al., unpublished results).

(ii) The protein coohg region

In eukaryotes, translation is usually initiated at the most 5’ upstream ATG of the mRNA (Kozak, 1981). Even though the CYPl-I8 ORF starts with a methionine residue at position 225, a more favourable initiation codon might be the third ATG at amino acid position 27, which completely fits the consensus sequence AXXATGXXT described for yeast genes (Dobson et al., 1982). Translation of the putative protein involves all the 61 codons at least once with a bias index of 0.13. According to Bennetzen et al. (1982), codon usage in mRNA translation would be correlated to the level of expression. Although the bias index of CYPl-18 is higher than that of other little-expressed regulatory proteins, such as PPRl (Kammerer et al., 1984), PH04 (Legrain et al., 1986) or GAL4 (Laughon & Gesteland, 1984), it is similar to other regulatory proteins like ARGRTT (Messinguy et al., 1986) and ADRI (Hartshorne et al., 1986), which have an index of 0.07, and it is still indicative of a low level of expression of CYPI-18. This is again consistent with the idea of CYPl being a regulatory locus and expected not to be expressed at a high level.

The first sequence consists of a stretch of 21 amino acid residues (residues 64 to 84; Fig. 2) bordered by two C-X2-C motifs. This domain displays a high level of structural homology to the repeated units of the DNA-binding “fingers” found in the TFIIIA factor involved in the 5 S RNA gene regulation of Xenopus laevis (Miller et al., 1985) and several other regulatory proteins (see Berg, 1986; Klug & Rhodes, 1987). Figure 3 shows a com- parison of the putative DNA-binding “finger-like” region of CYPl-18 protein with six other yeast regulatory proteins; GAL4 (Laughon & Gesteland, 1984), PPRl (Kammerer et al., 1984), ARGRII (Messinguy et al., 1986), LAC9 (Breunig & Kuger, 1987), PDRl (Balzi et al., 1987) and LEU3 (Friden & Schimmel, 1987). A high level of sequence homology is evident. Besides the conserved Cys in positions corresponding to 64, 67, 81 and 84 of CYPl, three other residues of the finger are highly conserved: Arg68, Lys71 and Pro79. Homology with TFIIIA and ADRI appears to be more restricted to the C-X,-C motifs. In addition, several different proteins have been shown to contain this structural domain as, for example, the gag proteins of retroviruses, the coat protein of the double- stranded DNA cauliflower mosaic virus (Covey, 1986), the developmental genes of Drosophila (Vincent, 1986), the human glucocorticoid receptors (Weinberger et al., 1985) and, more recently, human transcription factor Spl (Kadonaga et al., 1987) and the sex-determining region of the Y chromosome (Page et al., 1987). The overall lengths of the putative finger part are similar in all these proteins, and range from 11 to 16 residues.

The CYPl-18 protein is highly hydrophilic, with The next striking feature of the N-terminal a total of 12.5% and 9.5% of basic and acidic region of the protein is the presence of an residues, respectively. Charged amino acids are not uninterrupted stretch of 12 glutamine residues and dispersed randomly throughout the sequence. The one glutamic acid residue (residues 177 to 189; N-terminal region displays two basic domains. The Fig. 2). An homologous sequence, consisting of 31 first, from residues 45 to 147, contains 29 o/o basic glutamine residues in tandem, the OPA sequence and only 12% acidic residues; the second, at was first described by Wharton et al. (1985) in the position 210 to 507, consists of 18 y. basic and 10 y. Notch locus, a gene involved in Drosophila acidic residues. The C-terminal region (from 1388 to neurogenesis and has been observed since in many 1483) is acidic, with 13% acidic and 3% basic other developmental or regulatory genes of different residues. A number of interesting features can be organisms. The function of OPA sequences is

64 67 81

Figure 3. Amino acid sequence of the N-terminal, DNA-binding finger-like domain of the CYPl-18 protein. Numbers on the left side of the sequence refer to the position of the amino acids in the corresponding protein. Homologous regions are boxed. (-), Gap alignment.

272 i? Creusot et al.

unknown. They could act as a “hinge”, separating two structural domains of the protein, a plausible interpretation, since polyglutamine stretches could easily adopt a random coil configuration, but they could be viewed as nucleic acid signals, at the DNA or RNA level, involving regular repetitions of the cytosine-purine-purine motif.

The third region of interest is found between amino acid residues 280 and 438 (Fig. 2). As shown in Figure 4, it contains seven adjacent repeat units in tandem. The most conserved amino acids in these repeats form the KCPVDH motif. A few replace- ments occurring within this motif are of con- servative nature (arginine for lysine, isoleucine for valine, asparagine for aspartic acid). This motif is repeated every 16 to 26 residues. As shown in the right panel of Figure 4, the successive KCPVDH motifs could be joined by chelating a metal, thus forming a series of finger-like domains. Although the main features of this structure are reminiscent of that of the TFIIIA finger domain, differences are observed. The order of the putative metal ligands would be Cys-His/Cys-His instead of Cys-Cys/ His-His in TFIIIA or Cys-Cys/Cys-Cys found in the &-finger motif of CYPl and other regulatory yeast proteins (see Fig. 3). The distance between metal ligands would be three amino acid residues (Pro, Val, Asp) in CYPl-18, rather than a distance of four residues in TFIIIA and two in other yeast regulatory proteins (see Fig. 3). In TFIIIA and in Drosophila proteins, the potential fingers would be connected by relatively short stretches of sequences and would protrude from a linear arrangement (Vincent, 1986), while in CYPl-18 protein the looping out regions are of comparable length on both sides of a linear arrangement (Fig. 4). These

I

II

III

Is!

YlI

Sindbir

(4

1 5 10 15 20 25

looping out regions are particularly rich in alanine, proline and serine residues (44 residues out of 102 are Ala, Pro or Ser, while only 24 are expected on a random basis, the x2 value is 10 for 1 degree of freedom). It has been noted that serine/threonine- rich segments might be sites for phosphorylation (Kadonaga et al., 1987), which could be a potential means for regulation of CYPl activity. Repeats II, III and IV are very well-conserved both in length (24 to 26 residues) and in nature of amino acids, while repeats I, V, VI and VIII have less homology. Moreover, repeat V shows a noteworthy exception, as the KCPVDH motif itself is not totally conserved: alanine and methionine residues substitute the highly conserved cysteine and proline residues, suggesting that this unit might be functionally different. It should be stressed that it is the overall structure and organization of the KCPVDH repeat but not its primary sequence that is comparable to the DNA-binding domains of other regulatory proteins (there is no sequence homology between the segments shown in Figs 3 and 4). In a computer-assisted search, we have found the KCPVDH sequence motif in the non-structural protein NS3 of Sindbis virus, a single-stranded RNA alpha virus (Strauss et al., 1984). The exact function of this protein is unknown but it is believed to be involved in the expression and/or replication of the viral genome. In this protein, the exact KCPVDH motif is present only once and the adjacent amino acid sequence displays some similarity with the repeats of the CYPl-18 protein (Fig. 4).

The C-terminal part of the CYPI-18 protein is particularly acid (the last 163 residues have 9.2% excess of acid over basic amino acids, in contrast to

P

b)

Figure 4. Amino acid sequence of the internal tandemly repeating KCPVDH motif in the CYPl-18 protein. (a) The amino acid positions within the repeats are numbered above the sequence and the repeats from I to VII. Numbers on each side of the repeats refer to the positions of the residues in the CYPl-18 sequence. Homologous regions are boxed: identical amino acids are underlined. The homologous region of the non-structural protein NS3 of the Sindbis virus

(Strauss et al., 1984) is shown below. (b) A hypothetical presentation of the successive repeats, stressing the possibility of forming alternating and looping out domains by chelating a metal. The particularly abundant residues of alanine, proline and serine are circled (44 such residues are present, while 24 are expected on a random basis).

Novel Structural Domains in the GYP1 Protein 273

the overall excess of basic over acid residues in the whole ORF). This part could be involved in the transcriptional activation following the suggestion of Hope & Struhl (1986) and Legrain et al. (1986), that properly located acidic regions may be sufficient for transcription activation.

(d) RNA transcripts of the CYPl gene

Poly(A)+-containing RNAs isolated from strains carrying either the wild-type or the CYPl-18 allele were analysed by Northern blotting with five different probes spanning various regions of the CYPl gene.

The largest probe used consisted of the purified 5.3 kb PstI-XhoI fragment, which contains the entire gene (Fig. 5, lanes c and d). The second probe contained the 900 base-pair SphI-NruI fragment internal to the CYPl coding region and inserted in YIp2 (Fig. 5, lanes a and b). This probe carries also the URA3 fragment used as an internal standard. In addition to these double-stranded probes, three single-stranded probes cloned in Ml3 were used. Each contained a 32P-labelled DNA insert a few hundred bases long synthesized on a template corresponding to the non-transcribed strand (see Materials and Methods). Thus, the labelled DNA is complementary to the mRNA. The probes were derived either from the 5’ end, the middle portion or the 3’ end of the gene (Fig. 5, lanes e to i). A coherent set of results is obtained with this variety of probes.

The major RNA species transcribed from the CYPl gene is 48 kb long and can be detected with all the probes used, whether located at the 5’ end, at the middle, or at the 3’ end of the gene. The 4.8 kb value was deduced from calibrations with various RNAs revealed with specific probes: URA3, 0.9 kb (Bach et al., 1979), CYB2 1.9 kb (Guiard, 1985) and NAMZ, 3.0 kb (Labouesse et al., 1985). This length is in excellent agreement with the size of the ORF, which is 4449 base-pairs (Fig. 2) and strongly suggests that the entire ORF is transcribed. Furthermore, the transcribed regions flanking the ORF must be relatively short (4.8 kb -4.5 kb = O-3 kb). This is consistent with the presence of putative transcription initiation signals some 140 base-pairs upstream from the first AUG codon and transcription termination signals some 100 base-pairs downstream from the trans- lation termination codon (see Fig. 2). In addition to the main 4.8 kb transcript, a less abundant RNA is revealed by a subset of DNA probes (Fig. 5, lanes a, c, d and e). Its estimated length is 2.2 kb. Although the hybridization signal is weak, we believe that it does correspond to a specific transcript of the 5’ part of the CYPl gene for a number of reasons. The 2.2 kb RNA is detected both with the two double- stranded probes covering the entire gene and the middle portion of CYPl and with the single- stranded probe 28L derived from the 5’ end of the gene (Fig. 5, lane e). Importantly enough, the 2.2 kb RNA cannot be seen when either the probes

(a)

SP N Xb X

t + t t

28L 9H IIOH

WC, M Y

YIP2

(b) a b cd ef ghi

O-5 Kb -

4*8-

2*2-

ura3 -

Figure 5. Transcripts of the wild-type CYPl+ and of the overproducer CYPl-I8 mutant analysed by Northern blotting. (a) Restriction map of the CYPl gene and positions of the different probes used. Straight lines, double-stranded DNA probes; wavy lines, single-stranded Ml3 recombinant probes. (b) Pattern of hybridization of the different probes to RNA from CYPl-18 (lanes a, c, e, g and h) and CYPl+ strains (lanes b, d, f and i). Each lane contains approx. 5 pg of poly(A)+ RNA, fractionated on 1 o/0 (w/v) agarose slab gel under denaturing conditions as

described in Materials and Methods. Lanes a and b were hybridized with YIp2 and comparison between CYPl and URA3 signals gave an upper limit to the level of CYPl transcript. Lanes c to i derive from a single filter that was cut into 4 strips. The strip with lanes c and d was hybridized to the double-stranded, PatI-XhoI purified fragment. Lanes c and f were hybridized to Ml3 recombinant clone 28L, lane g to clone 9H, and lanes h and i to clone 1lOM. Numbers on the left correspond to the size (in kb) of the CYPl transcripts, numbers on the right indicate the size of the calibration standards revealed after de- and rehybridization (for details, see the text).

9H or 1lOH were used (Fig. 5, lanes g, h and i). An interpretation of these results could be that CYPl gives rise to at least two transcripts, and that both of them would be initiated similarly at the 5’ region of the gene but the shorter one would terminate prematurely. The size of the 2.2 kb transcript is in good agreement with the observation that hybrid- ization occurs with two different probes that are about 2 kb apart at the 5’ portion of CYPl (28L and YIp2) and that no signal is observed when a smaller probe that corresponds to the 3’ end of YIp2 (clone 9H) is used.

274 F. Creusot et al.

4. Discussion

The isolated 5.3 kb PstI-XhoI fragment contains the CYPl-18 function that stimulates the expression of iso2-cytochrome c structural gene CYPS. Disruption of the CYPl locus by integration of a plasmid-borne internal gene fragment, at its homologous chromosomal site, converts the CYPl’ and CYPl-18 “overproducer” alleles to a null allele that has a phenotype indistinguishable from the previously characterized “underproducer” cypl-23 allele. These results strongly support the earlier conclusion that CYPI+ and CYPI-18 are two positive regulatory alleles modulating the synthesis of iso-cytochromes c (Verdi&e & Petrochilo, 1979; Verdi&e et al., 1985).

In this paper, we present the nucleotide and predicted amino acid sequences of the entire 5.3 kb fragment and an analysis of the gene transcripts.

Northern blot analysis reveals the presence of two transcripts approximately 4.8 and 2.2 kb long. The major species is the 4.8 kb transcript. As already discussed, this size agrees well with the size of the ORF present in the CYPI-18 allele. The significance of the 2.2 kb transcript is not clear. It might reflect the existence of another gene strongly homologous to the 5’ part of CYPI or it might, result from a preferential cut in the major transcript. It might also result from premature termination due to the presence of a sequence that resembles the polyadenylation consensus signal TATG . . TTT (Zaret & Sherman, 1982) present at position 1997 and 2080 after the ATG and which would generate transcripts of about 2 kb.

The sequence displays a very large uninterrupted ORF able to encode a putative protein of 164,220 Da. Analysis of the sequence allowed us to identify several domains in the CYPl-18 sequence. An acidic domain localized in the C-terminal part of the protein is reminiscent of the short acidic sequences found in the regulatory proteins GAL4 and GCN4, which have been shown to be sufficient to confer a regulatory activity when fused to a DNA-binding region (Hope & Struhl, 1986; Ma & Ptashne, 1987). The hypothesis that the acidic C-terminal part of the CYPl-18 protein has an activating function is consistent with the observa- tion that the 6 kb XbaI fragment (fragment 5 in Fig. l), in which 145 codons at the C terminus of the ORF are missing failed to complement the lactate- phenotype of the transformed recipient cycl-I CYP3 CYPI strain. Thus, the CYPI-18 protein, which lacks only the C-terminal 10% of its total length, is unable to activate the transcription of the iso2-cytochrome c mRNA. Another argument in favour of the C-terminal region responsible for the activating function is the genetic localization (data not shown) of the null allele cypl-23 at the C-terminal part of the protein.

The most interesting features of the CYPl protein are the two conspicuous domains included in the N-terminal basic region and separated by the OPA-like sequence. The N-terminal part of the protein contains a 28-residue sequence that is

remarkably similar to the DNA binding motif that emerged from the work on TFIIIA (Miller et al., 1985). The classical structure of the repeated unit is based on the tetrahedral co-ordination of the zinc atom by either two cysteine and two histidine residues (abbreviated C-C/H-H), as in the case of TFIIIA, the yeast regulatory gene ADRl and several eukaryotic regulatory proteins (Berg, 1986) or by four cysteine residues (abbreviated C-C/C-C), as in the other known regulatory genes of yeast. Tt is assumed that this structure allows the folding of the motif into a.DNA binding finger. In yeast, the sequence of six regulatory genes PPRI, GAL4, ARGRII, PDRI, LAC9 and LEU3 has revealed the presence of one such finger (Kammerer et al., 1984; Laughon & Gesteland, 1984; Messinguy et al., 1986; Balzi et al., 1987; Breunig & Kuger, 1987; Friden & Schimmel, 1987). As shown in Figure 3, the CYPl protein conforms perfectly to a C-C/C-C consensus sequence with several additional residues (Arg, Lys, Cys, Pro) being conserved among the seven yeast regulatory proteins, and it could form an orthodox Zn finger between Cys64-Cys67 and Cys81-Cys84. However, in addition to this motif, the CYPl protein displays an additional characteristic; that is, the close presence, at positions 91 to 94, of a His-His motif. In theory, an alternative finger could be formed, either between Cys64-Cys67 and HisSl- His94, or between Cys81-Cys84 and HisSlIHis94, the two histidine residues acting as ligands for zinc instead of the four cysteine residues present in other yeast regulatory proteins. It should be emphasized that this is the first time, to our knowledge, that two pairs of cysteine and one pair of histidine residues are found adjacent in the same protein. We believe that their presence is not fortuitous but is related to the physiological role of the CYPl regulatory protein. This hypothesis is further documented and discussed in the accompanying paper (Verdiere et al., 1988).

The second domain of particular interest extends from amino acid residue 280 to 420 and consists of seven repeated units, which include the motif KCPVDH (Fig. 4). We propose two possible functions for this region. As mentioned in Results, these repeats could form a multiple finger-like structure and have potential for binding DNA and/or RNA. The motif C-H/C-H present here is different from the canonical C-C/C-C or C-C/H-H structures but has been observed in other proteins involved in nucleic acid manipulations (Berg, 1986; Herbert et aE., 1988). Another and more interesting possibility is that the KCPVDH repeated unit is not folded around a zinc atom but around a physiological “sensor”, for example heme or a metal atom, such as iron, copper, cobalt, vanadium or molybdenum, which could be either in an oxidized or in a reduced form according to the redox state of the cell. The presence in the repeated sequences of amino acid residues, particularly methionine, that could form a heme ligand reminiscent of that of some cytochrome c proteins strengthens the idea that heme could be the physiological redox sensor.

Novel Structural Domains in the C YPl Protein 275

It is thus conceivable that the CYPl protein itself would be a hemoprotein of a low redox potential.

This paper is dedicated to the memory of Marika Somlo. We thank one of the referees for underlining the presence and the interest of a conserved methionine residue in the KCVPVDH repeats. We are grateful to J. Banroques, B. Guiard, C. J. Herbert, C. Jacq, R. Labbe, M. Labouesse, E. Petrochilo for valuable suggestions; to J. Renowicki for help in the computer analysis, B. Poirer for typing the manuscript and C. Grandhamp for artwork. This work was supported by grants from the CNRS, ATP Microbiologic and Biologie Moleculaire de Gene, Ligue Nationale Francaise contre le Cancer and INSERM no. 86024.

References

Astell, C. R., Ahlstrom-Jonasson, L., Smith, M., Tatchell, K., Nasmyth, K. A. & Hall, B. D. (1981). CelE, 27, 15-24.

Bach, M. L., Lacroute, F. & Botstein, D. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 386-390.

Balzi, E., Chen, W., Vlaszewski, S., Copieaux, E. & Goffeau, A. (1987). J. Biol. Chem. 262, 16871-16879.

Bennetzen, J. L. & Hall, B. D. (1982). J. Biol. Chem. 257, 3018-3025.

Berg, ,J. M. (1986). Science, 232, 485-487.

Breunig, D. & Kuger, P. (1987). Mol. Cell. BioZ. 7, 44OC- 4406.

Clavilier, I,., P&e, G. Slonimski, P. P. (1969). MoZ. Gen.

Genet. 104, 1955218. Clavilier, L., PB&Aubert, G., Somlo, M. & Slonimski,

P. P. (1976). Biochimie, 58, 155-172. Covey, S. N. (1986). NucZ. Acids Res. 14, 623-633. Dayhoff, M. 0. (1978). Editor of Atlas of Protein Sequence

and Structure, vol. 5, suppl. 3, National Biochemical Research Foundation, Washington, DC.

Dobson, M. J., Truite, M. F., Roberts, N. A., Kingsman, A. J. & Kingsman, S. M. (1982). NucZ. Acids Res. 10, 2625-2637.

Downie, J. A., Stewart, J. W., Brockman, M., Schweingruber, A. M. & Sherman, F. (1977). J. MOE. BioZ. 133, 369-384.

Ephrussi, B. & Slonimski, P. P. (1950). Biochim. Biophys. Acta, 6, 256-267.

Fraser, R. (1975). Eur. J. Biochem. 60, 477-486.

Friden, P. & Schimmel, P. (1987). Mol. Cell. BioZ. 7, 2708-27 17.

Fukuhara, H. (1966). J. Mol. BioZ. 17, 334. Gannon, F., O’Hare, K., Perrin, F., Le Pennec, J. P.,

Benoist, C., Cachet, M., Breathnach, R., Royal, A., Garapin, A., Cami, B. & Chambon, P. (1979). Nature

(London), 278, 428-434. Gallwitz, D., Perrin, F. & Seidel, R. (1981). NucZ. Acids

Res. 9, 6339-6350. Gerbaud, C., Elmerich, C., Tandeau de Marsac, N. C.,

Chocat, P., Charpin, N., Guerineau, M. & Aubert, J. P. (1981). Curr. Genet. 3, 173-180.

Guarente, L. (1985). Current Communications in MoZecular Biology (Gluzman, Y., ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Guarente, L. & Mason, T. (1983). CeZZ, 32, 1279-1286. Guarente, L., Lalonde, B., Gifford, P. & Alani, E. (1984).

CeZZ, 36, 503-511. Guiard, B. (1985). EMBO J. 4, 3265-3272. Hartshorne, T. A., Blumberg, H. & Young, E. T. (1986).

,1%Lture (London), 320, 283-287.

Herbert, C. J., Labouesse, M., Dujardin, G. & Slonimski, P. P. (1988). EMBO J. 7, 473-483.

Hereford, L. & Rosbach, M. (1977). Cell, 10, 453-462. Heusterspreute, M., Oberto, J., Vinh Ha Thi & Davidson,

J. (1985). Gene, 34, 363-366. Hitzeman, R. A., Hagie, F. E., Levine, H. L., Goeddel,

D. V., Ammerer, G. & Hall, B. D. (1981). Nature (London), 293, 717-722.

Hope, I. A. & Struhl, K. (1986). Cell, 46, 885-894. Kadonaga, J. T., Carrier, K. R., Masiarz, F. R. & Tjian,

R. (1987). CeZZ, 51, 1079-1090. Kammerer, B., Guyonvarch, A. & Hubert, ,J. C. (1984).

J. Mol. BioZ. 180, 239-250. Keegan, L., Gill, G. & Ptashne, M. (1986). Science, 231,

699-704. Klug, A. $ Rhodes, D. (1987). Trends Biochem. Sci. 12,

464-469. Kozak, M. (1981). NucZ. Acids Ras. 9, 5233-5252. Labouesse, M., Dujardin, G. & Slonimski, P. P. (1985).

Cell, 41, 133-143. Langford, C. J. & Gallwitz, D. (1983). CeZZ, 33, 519-527. Laughon, A. & Gesteland, R. F. (1984). Mol. Cell. BioZ. 4,

26C-267. Laz, T. M., Pietras, D. F. & Sherman, F. (1984). Proc.

Nat. Acad. Sci., U.S.A. 81, 4475-4479. Legrain, M., De Wilde, M. & Hilger. F. (1986). NucZ.

Acids Res. 14, 3059-3073. Ma, J. & Ptashne, M. (1987). Cell, 48, 847-853. Maccechini, M. M., Rudin, Y., Blobel, G. & Shatz, G.

(1979). Proc. Nat. Acad. Sci., U.S.A. 76, 343-347. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982).

Editors of Molecular Cloning: A Laboratory ManuaZ,

Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Messinguy, F., Dubois, E. & Descamps, F. (1986). Eur. J.

Biochem. 157, 77-81. Miller, J. H. (1972). Editor of Experiments in Molecular

Genetics, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Miller, J., McLachlan, A. D. 6 Klug, A. (1985). EMBO J.

4, 1609-1614. Mills, D. R. & Kramer, F. R. (1979). Proc. Nat. Acad.

Sci., U.S.A. 76, 2232-2235. Montgomery, D. L., Leung, D. W., Smith, M., Shalit, P.,

Faye, G. & Hall, B. D. (1980). PTOC. Nat. Acad. Sci.,

U.S.A. 77, 541-545. Montgomery, D. L., Boss, J. M., McAndrew, S. J., Marr,

L., Walthall, D. A. & Zitomer, R. 8. (1982). J. BioZ.

Chem. 257, 7756-7761. Oka, M., Hayashi, T. & Nakajima, A. (1985). Z’olymer J.

17, 621-631. Orr-Weaver, T. L., Szostak, J. W. & Rothstein, R. J.

( 1981). Proc. Nat. Aead. Sci., U.S.A. 78, 6354-6358.

Page, D. C., Mosher, R., Simpson, E. R., Fisher, E. M., Mardon, G., Pollack, J., McGillivray, B., de la Chapelle, A. & Brown, L. (1987). Cell, 51, 1091-1104.

Pfeifer, K., Prezant, T. & Guarente, L. (1987). Cell, 49, 19-27.

Pikielny, C. W., Teem, ,J. L. & Rosbash, M. (1983). CeZZ, 34, 395-403.

Pinkham, L. & Guarente, L. (1985). Mol. Cell. Biol. 5, 3410-3416.

Rothstein, R. J. & Sherman, F. (1980). Genetics, 94, 87 l-889.

Sanger, F., Nicklen, S. & Coulson, A. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5463-5467.

Sherman, F., Stewart, J. W., Parker, J., Inhaber, E., Shipman, M. A., Putterman, G. J., Gardisky, R. L. & Margoliash, E. (1968). J. BioZ. Chem. 243, 5446-5456.

276 F. Creusot et al.

Slonimski, P. P. (1955). In Proceeding of the Third International Congress of Biochemistry, Bruxelles (LiBbecq, C., ed.), pp. 242-252, Academic Press, New York.

Slonimski, P. P., Acher, R., PBti, G., Sels, A. & Somlo, M. (1965). In International Symposium on Mechanisms of Regulation of Cellular Activities in Microorganisms, C.N.R.S., Mareeille, pp. 435-461, Gordon and Breach, New York.

Smith, M., Leung, D. W., Gillam, S., Astell, C. R., Montgomery, D. L. & Hall, B. D. (1979). Cell, 16, 753-761.

Strauss, E. G., Rice, C. M. & Strauss, J. H. (1984). Virology, 133, 92-l 10.

Thomas, P. (1980). Proc. Nat. Acad. hi., U.S.A. 77, 5201-5205.

Verdi&e, J. & Petrochilo, E. (1975). Biochem. Biophys. Res. Commun. 67, 1451-1458.

Verdi&e, J. & Petrochilo, E. (1979). Mol. Gen. Genet. 175, 209-216.

Verdi&e, J., Creusot, F. & Guerineau, M. (1985). Mol. Gen. Genet. 199, 524-533.

Verdi&e, J., Creusot, F., Guarante, L. & Slonimski, P. P. (1986). Curr. Genet. 10, 339-342.

Verdi&e, J., Gaisne, M., Guiard, B., Deframoux, P;. & Slonimski, P. P. (1988). J. Mol. Biol. 204, 277-282.

Vincent, A. (1986). Nucl. Acids Res. 14, 4385-4391. Weinberger, C., Hollenberg, S. M., Rosenfeld, M. G. &

Evans, R. M. (1985). Nature (London), 318,670-672. Wharton, K. A., Yedvobnick, B., Finnerty, V. G. &

Artavanis-Tsakonas, S. (1985). Cell, 40, 55-62. Wright, C. F. & Zitomer, R. S. (1984). Mol. Cell. Biol. 4,

2023-2030. Wright, C. F. & Zitomer, R. S. (1985). Mol. Cell. Biol. 5,

2951-2958. Yanish-Perron, C., Vieira, J. & Messing, J. (1985). Gene,

33, 103-I 19. Zaret, K. S. & Sherman, F. (1982). CeEl, 28, 563-573. Zitomer, R. S., Sellers, J. W., McCarter, D. W., Hastings,

G. A., Wick, P. & Lowry, C. (1987). Mol. Cell. Biol. 5, 2521-2526.

Edited by A. Klug