THE JOURNAL OF BIOLOGICAL CHEMISTRY

Printed in U.S.A. Vol. 256, No. 20, Issue of October 25, pp. 10524-10528, 1981

Analysis of 5’ Flanking Sequences and Intron-Exon Boundaries of the Rat Prolactin Gene*

(Received for publication, May 22, 1981, and in revised form, July 2, 1981)

Richard A. Maurer, Christopher R. Erwin‡§, and John E. Donelson‡

From the Department of Physiology and Biophysics and the ‡Department of Biochemistry, University of Iowa, Iowa City, Iowa 52242

A rat genomic DNA clone containing the 5' flanking region, three exons, two introns, and a portion of a third intron of the rat prolactin gene was isolated and characterized. Sequence determinations were used to identify the exon-intron boundaries and analyze the DNA region upstream from the initiator methionine codon. Another genomic DNA clone that we previously characterized (Gubbins, E. J., Maurer, R. A., Lagrimini, M., Erwin, C. R., and Donelson, J. E. (1980) J. Biol. Chem. 255, 8655-8662) contains two additional exons and the 3' flanking region of the prolactin gene. These five exons account for the complete coding sequence of rat preprolactin. Their relative locations in the two genomic DNA clones indicate that the five exons and four intervening sequences (introns) of the rat prolactin gene comprise approximately 10 kilobases of DNA. The sequence at the Intron A-Exon II boundary was found to contain two separate splicing sites offset by three nucleotides. These two splicing sites can explain a discrepancy of one codon in two published preprolactin cDNA sequences. The 5' terminus of prolactin mRNA was mapped using the S1 nuclease protection procedure. The results suggest that transcription is initiated 54 nucleotides upstream from the initiator methionine codon. Analysis of the sequence preceding this site reveals (i) the sequence TATAAA at positions -27 to -22 which, by analogy with other systems, is likely involved in initiation of prolactin gene transcription and (ii) the palindrome, TGATTATATATATATTCA, at positions -62 to -45 which may be a site involved in the regulation of prolactin gene transcription.

The pituitary hormone, prolactin, provides an interesting system for analysis of the hormonal regulation of gene expression. A number of studies have shown that several different hormones alter the levels of prolactin mRNA in pituitary cells. For instance, estradiol and thyrotropin releasing hormone stimulate prolactin mRNA accumulation (1-5), while dopamine treatment decreases prolactin mRNA levels (6). The regulation of prolactin mRNA levels may be due to changes in the transcription of the prolactin gene or to changes in post-transcriptional events such as processing of the initial transcript or degradation of the mature mRNA.

Prolactin is part of a gene family which includes growth hormone and chorionic somatomammotropin. Analysis of the amino acid sequence (7-11) and nucleotide coding sequence (12-17) has demonstrated considerable homology for these three hormones. This homology has led to the suggestion that they may have evolved from a common ancestral gene via gene duplication (10). Despite the similarity in the structures of the three hormones, the regulation of their synthesis is quite different. Prolactin mRNA levels are regulated by estradiol, thyrotropin releasing hormone, and dopamine (1-6), while growth hormone mRNA levels are stimulated by glucocorticoids and thyroid hormone (18-20). Chorionic somatomammotropin mRNA levels are developmentally regulated (21) and purified chemical regulators have not yet been identified.

Analysis of the prolactin chromosomal gene may provide some insight into the structures which are involved in the regulation of gene expression. Furthermore, the eventual comparison of the prolactin, growth hormone, and chorionic somatomammotropin genes may contribute to an understanding of the mechanisms which facilitate the differential expression and regulation of these genes. We recently reported the isolation of a genomic DNA clone containing prolactin sequences (16). However, the genomic prolactin clone contained only the 3' end of the gene. Chien and Thompson (22) have reported the isolation of overlapping genomic clones containing the complete prolactin gene and these clones were characterized by restriction enzyme mapping and R-loop analysis. Clearly, further analysis of the prolactin gene will require sequence analysis of the intron-exon boundaries and analysis of the sequences flanking the transcription initiation site. We report here the isolation and characterization of a genomic clone containing the 5' end of the prolactin gene. The 5' end of the gene was determined by S1 nuclease mapping. The nucleotide sequence of the 5' flanking sequences and all of the intron-exon boundaries of the prolactin gene were determined.

* This research was supported by National Institutes of Health Grant AM21803, National Science Foundation Grant PCM 76-13461, and Diabetes and Endocrine Core Center Grant AM 25295. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ Recipient of National Institutes of Health Predoctoral Training Grant GM07091.

EXPERIMENTAL PROCEDURES

Isolation of the Cloned Fragment of Prolactin Gene-A library of partial Hae III fragments of rat DNA cloned in bacteriophage λ was generously provided by L. Jagodzinski and J. Bonner (California Institute of Technology). The library was screened for the presence of prolactin sequences as described previously (16) using a fragment of the cloned prolactin cDNA which had been labeled in vitro with 32P by nick translation (23) as a hybridization probe. Individual clones from this library were maintained as described by Blattner et al. (24). Eco RI fragments from the isolated λ clone were subcloned in pBR322 as described (16) and purified recombinant plasmid DNA prepared as described by Clewell and Helinski (25).

DNA Sequence Determinations-In most cases, DNA fragments were radiolabeled at recessed 3' termini using Escherichia coli DNA polymerase I (Boehringer Mannheim) and the appropriate [α-32P]dNTP (2000 Ci/mmol) as described (26). The 5' termini were labeled using polynucleotide kinase and [γ-32P]ATP (2000 Ci/mmol) as described by Maxam and Gilbert (27). In one case, the 3' termini of an Eco RI-Hae III fragment were labeled using [α-32P]dGTP (2000 Ci/mmol) and T4 DNA polymerase (New England BioLabs) as described (28). Uniquely labeled DNA fragments were subjected to the modification and cleavage reactions of Maxam and Gilbert (27) and electrophoresed using the thin sequencing gels of Sanger and Coulson (29).

S1 Mapping of the 5' End of the Prolactin Gene-The 5' end of the prolactin gene was mapped essentially using the Weaver and Weissman modification (30) of the Berk and Sharp (31) procedure. A 700-base pair Sac I-Msp I fragment suspected of containing the 5' end of the prolactin gene was labeled at the 5' termini with 32P using polynucleotide kinase as described by Maxam and Gilbert (27). This fragment was digested with Hae III and a 242-base pair fragment containing the radiolabeled Msp I site isolated by gel electrophoresis. This was hybridized to 0.2 μg of rat pituitary poly(A)-containing RNA prepared as described (32) in a 20-μl mixture containing 80% formamide, 0.3 M NaCl, 40 mM 1,4-piperazinediethanesulfonic acid (pH 6.5), 1 mM EDTA, and 1 mg/ml of yeast tRNA. The hybridization mixture was covered with mineral oil and incubated for 4 h at 48 °C. Then, 0.2 ml of 0.1 M NaCl, 0.1 M sodium acetate (pH 5.0), and 1 mM ZnSO4 containing 40 units of S1 nuclease (Miles) were added and the sample was incubated for 1 h at 37 °C. After extraction with phenol/chloroform, the S1-digested sample was precipitated by addition of 5 μg of yeast tRNA and 3 volumes of ethanol. S1 digestion products were analyzed using thin sequencing gels (29).

Containment Conditions-In accordance with the amended National Institutes of Health Guidelines for Recombinant DNA Research (1980), experiments with bacteriophage λ employed P1 + HV2 containment (E. coli strain DP50SupF and λ Charon 4A) and experiments using plasmid DNA utilized P1 + HV1 containment (E. coli strain HB101 and plasmid pBR322).

RESULTS AND DISCUSSION

Identification of a Cloned DNA Segment Containing the 5' End of the Rat Prolactin Gene-We have previously isolated a cloned DNA fragment containing the 3' portion of the rat prolactin gene from a library containing partial Eco RI fragments cloned in bacteriophage λ Charon 4A (16). Initial attempts to screen the same library for clones containing the 5' portion of the rat prolactin gene using single-stranded [32P]cDNA as a probe were not successful for unknown reasons. So the two major components of the genome screening were changed. First, a different library containing partial Hae III fragments of rat DNA ligated into λ Charon 4A arms by Eco RI linkers was screened. This library was generously provided by L. Jagodzinski and J. Bonner (California Institute of Technology). Second, a 139-base pair Alu I fragment containing 5' sequences of the cloned prolactin cDNA described by Gubbins et al. (16) was prepared for use as a hybridization probe. The isolated 139-base pair fragment was labeled in vitro with 32P by nick translation and used to screen approximately 10^5 phage for prolactin sequences. Phage from plaques which hybridized to the probe in the first screening were rescreened twice. One plaque was found to be positive for prolactin sequences in all three screenings. The phage in this plaque was designated λPRL101 and used to prepare λPRL101 DNA.

Digestion of λPRL101 DNA with Eco RI demonstrated that this clone contains rat DNA fragments of 9.0, 1.5, and 0.2 kb¹ for a total insert size of nearly 11 kb. The region of the rat prolactin gene that is cloned in λPRL101 is shown in Fig. 1. Further restriction enzyme analysis of this clone demonstrated that it contains Sac I and Xba I fragments very similar to those mapped by Chien and Thompson (22). Subsequent hybridization of radiolabeled cDNA to restriction fragments of λPRL101 using the technique of Southern (33) confirmed the location of prolactin coding sequences on the same DNA fragments as reported by Chien and Thompson (22). Thus, although λPRL101 was isolated from a different genomic library than that of Chien and Thompson (22), it appeared to have the same general sequence organization as their clones. To facilitate further studies of the prolactin gene, the 9.0- and 1.5-kb Eco RI fragments of the λPRL101 insert were subcloned into the Eco RI site of plasmid pBR322 and the subcloned DNA used for all subsequent sequencing work.
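As a quick arithmetic check on the insert size quoted above, the three Eco RI fragment sizes can simply be summed. The sketch below is illustrative only; the function and variable names are hypothetical and are not part of the original analysis.

```python
# Eco RI fragment sizes (kb) of the rat DNA insert, as quoted in the text.
ECO_RI_FRAGMENTS_KB = [9.0, 1.5, 0.2]


def insert_size_kb(fragments_kb):
    """Sum restriction fragment sizes and round to the nearest 0.1 kb."""
    return round(sum(fragments_kb), 1)


total_kb = insert_size_kb(ECO_RI_FRAGMENTS_KB)
print(f"Total insert size: {total_kb} kb")
```

The sum, 10.7 kb, is consistent with the "nearly 11 kb" stated for the insert.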

¹ The abbreviation used is: kb, kilobase.

FIG. 1. Physical map of the rat prolactin chromosomal gene and the rat DNA segment cloned in λPRL101. The five exons of the prolactin gene are labeled I through V and are indicated by the dark boxes. The four introns are lettered A through D and are indicated by open boxes. Cleavage sites for the restriction endonuclease, Eco RI (E), were determined in this and previous studies (16, 22). The Eco RI sites at the termini of the λPRL101 insert are enclosed in parentheses as these were likely created during the preparation of the rat genomic library when synthetic DNA linkers containing Eco RI sites were ligated to partially Hae III-digested rat DNA. Chien and Thompson (22) detected an Eco RI site in two different genomic clones which maps at a position similar to the Eco RI site at the 3' terminus of the λPRL101 insert. Therefore, we have indicated that the prolactin gene contains an Eco RI site at this position, although it may not be the same Eco RI site as the site at the 3' end of λPRL101. The two small Eco RI fragments in Intron C were mapped by Chien and Thompson (22). All other restriction sites were determined in the present study and in our previous analysis of the 3' portion of the prolactin chromosomal gene (16). The horizontal arrows pointing to the right and left show sequences obtained from the upper and lower strand, respectively. Solid arrows indicate sequences determined from DNA fragments labeled at the 3' terminus; the dashed arrow indicates a DNA sequence determined from a DNA fragment labeled at the 5' terminus. An abbreviated restriction enzyme map is shown for Hae III (H), Hinf I (Hf), Msp I (M), and Taq I (T) sites.

Sequence Analysis of the Prolactin Gene-The objective of the DNA sequencing experiments was to determine the sequence of all of the exon-intron boundaries of the prolactin gene and the region preceding the transcription initiation site. The strategy was to use the R-loop and restriction map data of Chien and Thompson (22) together with the known prolactin coding sequence (16, 17) to identify restriction fragments that contained regions of interest which could be easily sequenced. For example, the data of Chien and Thompson (22) indicated that Exon I and its boundaries were within a 1.2-kilobase Sac I-Xba I fragment while Exon II was within a 2.7-kilobase Xba I-Eco RI fragment. Restriction enzyme digestion of our cloned genomic DNA demonstrated the presence of Sac I-Xba I and Xba I-Eco RI fragments of the appropriate size. Also, the nucleotide sequence of cloned prolactin cDNA indicated that the coding sequence contains three Msp I sites, two of which were near the 5' end of the coding sequence. Since the Sac I-Xba I fragment and the Xba I-Eco RI fragment were each found to contain a single Msp I site, it was suspected, and subsequently confirmed, that these Msp I sites were within Exon I and Exon II. The data of Chien and Thompson (22) indicated that Exon III should lie within the 1.5-kb Eco RI fragment. After the sequences contained in Exon I and II were determined, it was possible to predict the nucleotide sequence and restriction sites which should lie in Exon III since the sequence of Exons IV and V had been determined previously (16). Of the several restriction enzymes predicted to cut within Exon III, one enzyme, Hae III, was found to cut the 1.5-kb Eco RI fragment once. This site was used to sequence the boundaries of Exon III.
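The strategy above relies on predicting which restriction enzymes cut a known sequence, and how often. A minimal sketch of that kind of prediction follows, assuming the standard recognition sequences for the enzymes named in the text. The function names and the toy fragment are hypothetical; the fragment is not taken from the prolactin gene.

```python
import re

# Standard recognition sequences for enzymes named in the text (N = any nucleotide).
RECOGNITION_SITES = {
    "Eco RI": "GAATTC",
    "Hae III": "GGCC",
    "Msp I": "CCGG",
    "Hinf I": "GANTC",
    "Taq I": "TCGA",
}


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    pattern = site.replace("N", "[ACGT]")
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


def single_cutters(sequence, sites=RECOGNITION_SITES):
    """Names of enzymes whose recognition site occurs exactly once in the sequence."""
    return sorted(
        name for name, site in sites.items()
        if len(find_sites(sequence, site)) == 1
    )


# Hypothetical toy fragment, NOT taken from the prolactin gene.
toy_fragment = "AAGAATTCTTCCGGAAGGCCTTCCGGAA"
print(find_sites(toy_fragment, "CCGG"))
print(single_cutters(toy_fragment))
```

An enzyme that cuts a fragment exactly once, as Hae III did in the 1.5-kb Eco RI fragment, yields a unique end from which sequence can be read in both directions.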

5' CAGCATTTCTCTCATTTCCT TTTGCTGTAATTAATCAAAA TCCTTCCTTTCTGGCCACTA TGTCTTCCTGAATATGAATA

K U A T A A A A T A K A T T T G A TGTTTAAAATTATTGGGGCT ATCTTAATGACGGAAATAGA TWTTGGGA&GGAAG&GA

T ~ ~ ~ ~ ~ A T A ~ A ~ T A T ~ T G A A G G T G T C G A A G ~ T T T ATAUGTCAATGTCTGCAG~ TGAGA~E-T Exon I

~ P ~ ~ G I C T T ~ T T G G C G I I I G T G - G T A T G T G ~ T G C A Intron A

- ETCATTGAATAAGTGGCTT TCTCAGTATTCCTGATGTGG CATGGGGTTGTATACCTAAT G C A G G T T T C A A ~ T T A ~ G G

GGGATGAGGAATAACTGTTT TTCTCCAGACATGGCATGAC CAAAACTACTGACACAGACC CCM&ACAGCTGTU;TTTTG

fCCMATTTGTTCCTX4TTC ATGTCAGCACAGATATTAAA ATGATATTTGGGTTTAGAGC AAGGGTGATGGGGTTGATGC

TATTCCACAGMGAGGATGG GATCATCATGAGACACTTM CTTGCAC. .. . ..CCTC A C A C C A T A C A T C T G T G ~ ~

TATTMTATCAGATAAACAT ACCATGTAGGATGTTGACTC TTTCTTCACTACAAGTGCAA GCATACATGTAAGATGCTGG

GACUATAAGGTTTCTCTT CATCTMGTAGATCATCACA CACTGATACCTGAATTTCTT TAGCAGGGKACTCCTCCTG

Exon II

CTGATGATGTCAAACCTTCT GTTCTGCCAAMTGTGCAGA CCCTGCCAGTCTGTTCTGGT GGCGACTGCCAWCACCTCT

C C G T M

GTACTTCACTCATTTCTAGC ATTCCTTGAAAAGGGTCTCT TTGCAACCATTCTGCTGCAT GTCTGTTAAATAATGATCCG Intron B

W G T T C M A C K T A T A T C T A C A T A A G A W T M A A A T C T G G C T T A T M M A T T G G A A C A AGTAGAACAAGAGGATTTCA

TMAATCTTACAWTTCACA AAAGTGGAATGACCTTCMT ATTTTGTATTTCATATTTT.. ./. , .ATTCTAGTTATflT

ATGCTTTMAiTCTCAGACT TTGTTAATGGAATTTCTGTA TAATTTCTAGGATAAACAGT ATGTCCAAGATCGTGAGTTT

ATTGCCAAGGCCATCAATGA CTGCCCCACTTCTTCCCTAG CTACTCCTGMGACAAGGAA CAAGCCCAWAAGTCCCTGT

GAGTCCTTACCCCAITTCTT CCCAACATAATTGAGGCAGA ACATTGGTTTTGGTGTACCC TATGTTATTAGPGcAcTG ... 3' Intron C

Exon III

FIG. 2. The nucleotide sequence of the regions indicated by the horizontal arrows in Fig. 1. The sequence of the coding strand only is shown. Sequences for Exons I, II, and III are indicated by lines above and below the nucleotide sequence. The 5' terminus of Exon I is enclosed in a dotted line to indicate the possible ambiguity involved in the location of mRNA 5' termini by S1 mapping (see Fig. 4). Symmetrical sequences preceding Exon I are underlined with heavy lines. The initiator methionine codon is indicated by a second set of lines above and below the codon. The vertical arrow at the beginning of Exon II indicates the alternate splicing site discussed in the text. The slashes indicate regions in Introns A and B that were not sequenced.

The sequence of about 1400 nucleotides was obtained (Fig. 2). Comparison of the genomic DNA sequence with the prolactin cDNA sequence reveals exon² regions of 82, 173, and 108 nucleotides (Exons I, II, and III, respectively). These exons code for amino acid residues -29 to -21, residues -20 to 38, and residues 39 to 74 where negative numbers refer to positions in the NH2-terminal precursor segment. The final two exons, Exons IV and V, were sequenced previously (16) and code for residues 75 to 134 and residues 135 to 197, respectively, of preprolactin. These five exons together contain the entire coding sequence of rat preprolactin.
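The exon lengths quoted above can be checked against the residue ranges with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that are consistent with the text but not stated in this form: that all 54 nucleotides of 5' untranslated sequence lie in Exon I, and that Intron A interrupts the codon for residue -20 after its first nucleotide. The names are hypothetical.

```python
UTR_5PRIME_NT = 54  # transcription start lies 54 nt upstream of the initiator codon


def n_codons(first_residue, last_residue):
    """Number of residues from first to last inclusive (there is no residue 0)."""
    count = last_residue - first_residue + 1
    if first_residue < 0 < last_residue:
        count -= 1
    return count


# Assumption: Intron A splits the codon for residue -20 after its first
# nucleotide, so Exon I carries 1 nt of that codon and Exon II the other 2.
exon_lengths = {
    "I": UTR_5PRIME_NT + 3 * n_codons(-29, -21) + 1,
    "II": 2 + 3 * n_codons(-19, 38),
    "III": 3 * n_codons(39, 74),
}
print(exon_lengths)
```

Under these assumptions the computed lengths are 82, 173, and 108 nucleotides, matching the values reported for Exons I, II, and III.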

Analysis of the Exon-Intron Boundaries-A comparison of the exon-intron boundaries reveals several features (Fig. 3). At most of the boundaries, duplication of sequences allows more than one possible splice site. However, there is a unique site for the splice which removes Intron C. All of the boundaries follow the GU ... AG rule proposed by Breathnach et al. (34). Also, all of the exon-intron boundaries show considerable homology with a consensus splice junction (35, 36). It has been suggested that a common splice sequence may hybridize to small nuclear RNAs and that the resulting hybrid is important for splicing (35). However, as has been discussed by Sharp (36), it is clear that other mechanisms are required to allow the accurate removal of introns and splicing of the RNA.

² Exon is used here to indicate the portion of a chromosomal gene which contains sequences present in the mature mRNA. Thus, the first portion of Exon I contains sequences for the 5' untranslated region of prolactin mRNA.

Class of Intron: 0, I, II

5' G GUAUGUGCUG ... Intron A ... AUUUCUUUAG CAG 3'
   G GUGAGCAUUU ... Intron D ... UUCUUAUUAG CAG
   C GUGAGUCCUU ... Intron C ... UGUGGAUUAG CCU
   G GUAAGUACUU ... Intron B ... UAAUUUCUAG

Consensus Sequence: AG GURAG ... YNYYYNCAG

FIG. 3. Comparison of the exon-intron boundaries of the rat prolactin gene. The nucleotide sequences of the exon-intron boundaries are from Fig. 2 and our previous work (16). The sequences were aligned using the GU ... AG common sequence proposed by Breathnach et al. (34) and the vertical lines indicate possible splice sites which would be utilized if the GU and AG were the ends of the spliced out intron. The arrow indicates the differential splice junction at the end of Intron A as discussed in the text. The consensus sequence and intron classifications are as described by Sharp (36). Class 0 introns interrupt the reading frame between codons, class I introns interrupt between the first and second nucleotide of a codon, and class II introns interrupt between the second and third nucleotide of a codon. Duplication of sequences at the boundaries allows assignment to more than one class of introns (+ or -). The intron class which would be formed by splices obeying the GU ... AG rule is indicated by "*". The abbreviations used are: R, purine nucleotide; Y, pyrimidine nucleotide; N, any nucleotide.

Perhaps the most interesting feature of the intron-exon boundaries concerns the possible existence of two separate splicing sites at the 3' end of Intron A (Fig. 3). This possibility is suggested by the fact that a major difference occurs in the two published sequences of the rat prolactin cDNA (16, 17). Our cDNA sequence (16) contains the nucleotide triplet, GCA, which codes for an alanine at position -20 of the precursor segment. The nucleotide sequence of Cooke et al. (17) does not have this triplet at the same position and therefore codes for a protein containing one less amino acid (alanine at position -20) in the precursor segment. The studies described here demonstrate that this portion of the genomic coding sequence occurs precisely at the Intron A-Exon II boundary and that two possible splicing sites occur at this boundary. The two sites are three nucleotides apart and are situated so that splicing at the first site would result in an RNA containing the GCA triplet coding for alanine at position -20 and splicing at the second site would result in an RNA lacking this triplet. Both of these sites follow the GU ... AG rule and both show homology with the consensus boundary sequence. Therefore, it is possible that two mRNA structures result from the same DNA structure by differential splicing. Analysis of the amino acid sequence of the translation products directed by prolactin mRNA demonstrated that most of the preprolactin molecules have an alanine at position -20 of the precursor segment (37). Thus, if two forms of prolactin mRNA arise due to utilization of two splice sites at the 3' end of Intron A, then the first splice site would seem to be the preferred splice site. There is evidence for differential splicing in other systems. Early et al. (38) have reported that differential splicing can produce two different mRNAs from a single precursor RNA transcript of the immunoglobulin μ gene. The different mRNAs code for proteins with different COOH-terminal sequences and different functional properties; one protein is membrane-bound and the other is secreted. It seems unlikely that the presence or absence of one alanine in the NH2-terminal precursor segment of preprolactin would have any effect on the secretion of prolactin, but this possibility cannot be excluded.
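The two candidate splice sites at the Intron A-Exon II boundary can be illustrated with the boundary nucleotides themselves. The sketch below is hypothetical code, not the authors' method: it takes the last nucleotide of Exon I (G) and the 3' end of Intron A with the following three nucleotides (AUUUCUUUAG CAG) from Figs. 2 and 3, and represents the remainder of Exon II with a placeholder string of N's.

```python
EXON_I_END = "G"            # last nucleotide of Exon I (Fig. 3)
BOUNDARY = "AUUUCUUUAGCAG"  # 3' end of Intron A plus the next three nucleotides (Fig. 3)
DOWNSTREAM = "NNNNNN"       # hypothetical placeholder for the rest of Exon II


def acceptor_sites(region):
    """Indices immediately after each AG dinucleotide (candidate 3' splice sites)."""
    return [i + 2 for i in range(len(region) - 1) if region[i:i + 2] == "AG"]


def spliced(site):
    """Join the end of Exon I to everything downstream of the chosen acceptor site."""
    return EXON_I_END + BOUNDARY[site:] + DOWNSTREAM


first_site, second_site = acceptor_sites(BOUNDARY)
mrna_first = spliced(first_site)    # junction reads GCA: the extra alanine codon
mrna_second = spliced(second_site)  # the same junction without that triplet

print(second_site - first_site, mrna_first[:3])
```

Splicing at the first AG places GCA across the junction, while splicing at the second AG, three nucleotides downstream, removes exactly one triplet and leaves the reading frame unchanged.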

Identification of the Transcription Initiation Site-The 5' terminus of mature prolactin mRNA was mapped using the S1 nuclease protection procedure (30, 31). A 242-base pair fragment uniquely labeled with 32P at the 5' terminus of the Msp I site in Exon I and extending to an upstream Hae III site was prepared and hybridized to rat pituitary mRNA. After digestion with S1 nuclease and electrophoresis on a thin sequencing gel, a major band and several minor bands were found (Fig. 4). The size of the S1-protected fragments was determined by comparison to the electrophoretic mobility of the same DNA fragment subjected to the sequencing reactions. The results suggest that within 1 or 2 nucleotides, the 5' terminus of prolactin mRNA maps at a position 77 nucleotides upstream from the 5' terminus of the Msp I site or 54 nucleotides upstream from the initiator methionine codon. If the 5' terminus of the primary transcript of the prolactin gene was processed before capping, then mapping the location of the 5' terminus of the mature mRNA would not indicate the transcription initiation site. However, analysis of flanking DNA sequences (see below) suggests that the site located by S1 mapping is in fact the transcription initiation site.

[Autoradiogram. Lanes: A, G, T, C, S1. Size markers: 150, 125, 100, 90, 80, 77, 70.]

FIG. 4. Autoradiogram of a DNA sequencing gel showing the S1 mapping of the 5' terminus of prolactin mRNA. A 242-base pair Hae III-Msp I fragment, uniquely labeled with 32P at the 5' terminus of the Msp I site, was hybridized to rat pituitary mRNA and then digested with S1 nuclease. The S1 nuclease-digested sample (S1) was electrophoresed on a DNA sequencing gel with samples of the same labeled DNA which had been subjected to the adenine-specific (A), guanine-specific (G), (thymine + cytosine)-specific (T), and cytosine-specific (C) sequencing reactions (27). The numbers represent the distance from the 5'-labeled terminus. The arrow points to the shortest S1-protected fragment.
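The distances quoted for the S1 mapping experiment imply two further figures that follow by subtraction. The sketch below is an illustrative calculation with hypothetical names; the derived values are not stated in the text, and the reported positions carry an uncertainty of 1 or 2 nucleotides.

```python
PROBE_LENGTH_NT = 242       # Hae III-Msp I probe, labeled at the Msp I 5' terminus
PROTECTED_LENGTH_NT = 77    # major S1-protected fragment
START_TO_INITIATOR_NT = 54  # reported distance from mRNA 5' terminus to initiator codon


def implied_offsets(probe, protected, start_to_initiator):
    """Derive the distances implied by the mapping figures quoted in the text."""
    return {
        # Portion of the probe upstream of the mRNA 5' terminus, removed by S1 nuclease.
        "unprotected_nt": probe - protected,
        # Distance from the initiator codon to the labeled Msp I terminus.
        "initiator_to_label_nt": protected - start_to_initiator,
    }


offsets = implied_offsets(PROBE_LENGTH_NT, PROTECTED_LENGTH_NT, START_TO_INITIATOR_NT)
print(offsets)
```

On these figures, about 165 nucleotides of the probe lie upstream of the mapped 5' terminus, and the labeled Msp I terminus lies about 23 nucleotides downstream of the start of the initiator codon.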

Analysis of the 5’ flanking sequences upstream from the transcription initiation site revealed at least two interesting features. The sequence TTTATAAA was found at positions -29 to -22 from the start of transcription (Fig. 2). This sequence is similar to the TATA box which has been found about 30 nucleotides upstream from the transcription initiation site of a number of eukaryotic genes transcribed by RNA polymerase II (39, 40). The TATA box is similar to the Pribnow box (41) of the prokaryotic promoter and a number of studies have shown that the TATA box is likely involved in initiation of transcription (40, 42). Thus, this site is likely involved in transcription of the prolactin gene. Another feature of the prolactin 5’ flanking region is the presence of a 16-base pair hyphenated palindrome, TGATTATATATATATTCA, at positions -62 to -45 from the start of transcription. The presence of this large, symmetrical sequence upstream from the transcription initiation site suggests that it may be a regulatory site. Thus, the ability of estradiol and thyrotropin releasing hormone to stimulate prolactin mRNA accumulation (1-5) or the effects of dopamine on decreasing prolactin mRNA levels (6) might involve the interaction of a regulatory protein with this large symmetrical sequence. Although both the chicken ovalbumin and ovomucoid genes are induced by estradiol, they do not contain a similar, large symmetrical sequence upstream from the transcription initiation site (43, 44). This may simply reflect a species difference or it may indicate that this site is not involved in the estrogenic regulation of prolactin gene transcription. Perhaps this site is involved in mediating the effects of dopamine or thyrotropin releasing hormone rather than the effects of estradiol on prolactin gene expression. Clearly, at this time, the possible function of this site remains speculative. Further studies utilizing soluble cell-free transcription systems (45, 46) combined with specific deletions or mutations in these sequences may aid in establishing their function.
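The symmetry of the palindromic sequence can be checked by comparing it with its own reverse complement. The following Python sketch is an illustration only and is not part of the original analysis; the function names `reverse_complement` and `symmetric_positions` are hypothetical helpers introduced here. It assumes that "hyphenated" means the dyad symmetry is interrupted at a few positions, so it counts the positions at which the sequence and its reverse complement agree.

```python
# Illustrative check of the dyad symmetry of the upstream sequence.
# Helper names are hypothetical; only the sequence and its reported
# coordinates (-62 to -45) come from the text.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

def symmetric_positions(seq):
    """Count positions where seq matches its own reverse complement."""
    rc = reverse_complement(seq)
    return sum(1 for a, b in zip(seq, rc) if a == b)

site = "TGATTATATATATATTCA"

# Length implied by the reported coordinates, -62 to -45 inclusive.
span = 62 - 45 + 1

print(len(site), span, symmetric_positions(site))  # prints "18 18 16"
```

Under this assumption, the 18-nucleotide span agrees with its reverse complement at 16 positions, which is consistent with the description of a 16-base pair hyphenated palindrome; the two mismatches fall at symmetric positions within the sequence.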

Our analysis of genomic clones in the present and previous studies (16) as well as Chien and Thompson’s (22) R-loop analysis of several genomic clones suggests that the prolactin gene consists of approximately 10 kb of DNA. We have previously detected large, potential nuclear precursors for prolactin mRNA in pituitary cell nuclei (47). The largest of these precursors was approximately 7.0 kb in length. This is significantly shorter than the estimated 10-kb size of the gene and it is likely that the 7.0-kb nuclear precursor RNA is not the primary transcript of the gene. Recently, Hoffman et al. (48) have detected larger possible nuclear precursors of prolactin mRNA with the largest precursor having a size of 14 kb. This is significantly larger than our estimate of the size of the prolactin gene. These size discrepancies may at least partially reflect the difficulty in determining the size of these large nucleic acids. Alternatively, considerable allelic variations have been demonstrated for the ovalbumin gene (49, 50) and the transferrin gene (51). This raises the possibility that allelic variants of the prolactin gene might be considerably larger than 10 kilobases. Further analysis of the variation in the structure of the prolactin gene may yield some insight into this possibility.

Acknowledgments-We thank Dr. Linda Jagodzinski and Dr. James Bonner for generously providing the rat genomic library and B. Maurer for aid in preparing this manuscript.

REFERENCES

1. Stone, R. T., Maurer, R. A., and Gorski, J. (1977) Biochemistry 16, 4915-4921
2. Ryan, R., Shupnik, M. A., and Gorski, J. (1979) Biochemistry 18, 2044-2048
3. Seo, H., Refetoff, S., Martino, E., Vassart, G., and Brocas, H. (1979) Endocrinology 104, 1083-1090
4. Dannies, P. S., and Tashjian, A. H., Jr. (1976) Biochem. Biophys. Res. Commun. 70, 1180-1189
5. Evans, G. A., David, D. N., and Rosenfeld, M. G. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 1294-1298
6. Maurer, R. A. (1980) J. Biol. Chem. 255, 8092-8097
7. Catt, K. J., Moffat, B., and Niall, H. D. (1967) Science 157, 321
8. Li, C. H., Dixon, J. S., Lo, T.-B., Pankov, Y. A., and Schmidt, K. D. (1969) Nature 224, 695-696
9. Sherwood, L. M. (1967) Proc. Natl. Acad. Sci. U. S. A. 58, 2307-2314
10. Niall, H. D., Hogan, M. L., Sauer, R., Rosenblum, I. Y., and Greenwood, F. C. (1971) Proc. Natl. Acad. Sci. U. S. A. 68, 866-869
11. Bewley, T. A., Dixon, J. S., and Li, C. H. (1972) Int. J. Peptide Protein Res. 4, 281-287
12. Seeburg, P. H., Shine, J., Martial, J. A., Baxter, J. D., and Goodman, H. M. (1977) Nature 270, 486-494
13. Shine, J., Seeburg, P. H., Martial, J. A., Baxter, J. D., and Goodman, H. M. (1977) Nature 270, 494-499
14. Martial, J. A., Hallewell, R. A., Baxter, J. D., and Goodman, H. M. (1979) Science 205, 602-607
15. Gubbins, E. J., Maurer, R. A., Hartley, J. L., and Donelson, J. E. (1979) Nucleic Acids Res. 6, 915-930
16. Gubbins, E. J., Maurer, R. A., Lagrimini, M., Erwin, C. R., and Donelson, J. E. (1980) J. Biol. Chem. 255, 8655-8662
17. Cooke, N. E., Coit, D., Weiner, R. I., Baxter, J. D., and Martial, J. A. (1980) J. Biol. Chem. 255, 6502-6510
18. Martial, J. A., Baxter, J. D., Goodman, H. M., and Seeburg, P. H. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 1816-1820
19. Tushinski, R. J., Sussman, P. M., Yu, L.-Y., and Bancroft, F. C. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 2357-2361
20. Shapiro, L. E., Samuels, H. H., and Yaffe, B. M. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 45-49
21. McWilliams, D., Callahan, R. C., and Boime, I. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 1024-1027
22. Chien, Y.-H., and Thompson, E. B. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 4583-4587
23. Rigby, P. W. J., Dieckman, M., Rhodes, C., and Berg, P. (1977) J. Mol. Biol. 113, 237-251
24. Blattner, F. R., Williams, B. G., Blechl, A. E., Denniston-Thompson, K., Faber, H. E., Furlong, L.-A., Grunwald, D. J., Kiefer, D. O., Moore, D. D., Schumm, J. W., Sheldon, E. L., and Smithies, O. (1977) Science 196, 161-169
25. Clewell, D. B., and Helinski, D. R. (1969) Proc. Natl. Acad. Sci. U. S. A. 62, 1159-1166
26. Nichols, B. P., and Donelson, J. E. (1978) J. Virol. 26, 429-434
27. Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
28. Challberg, M. D., and Englund, P. T. (1980) Methods Enzymol. 65, 39-43
29. Sanger, F., and Coulson, A. R. (1978) FEBS Lett. 87, 107-110
30. Weaver, R. F., and Weissman, C. (1979) Nucleic Acids Res. 7, 1175-1193
31. Berk, A. J., and Sharp, P. A. (1977) Cell 12, 721-732
32. Maurer, R. A., Stone, R., and Gorski, J. (1976) J. Biol. Chem. 251, 2801-2807
33. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517
34. Breathnach, R., Benoist, C., O’Hare, K., Gannon, F., and Chambon, P. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 4853-4857
35. Lerner, M. R., Boyle, J. A., Mount, S. M., Wolin, S. L., and Steitz, J. A. (1980) Nature 283, 220-224
36. Sharp, P. A. (1980) Cell 23, 643-646
37. McKean, D. J., and Maurer, R. A. (1978) Biochemistry 17, 5215-5219
38. Early, P., Rogers, J., Davis, M., Calame, K., Bond, M., Wall, R., and Hood, L. (1980) Cell 20, 313-319
39. Flavell, R. A. (1980) Nature 285, 356-357
40. Corden, J., Wasylyk, B., Buchwalder, A., Sassone-Corsi, P., Kedinger, C., and Chambon, P. (1980) Science 209, 1406-1414
41. Pribnow, D. (1975) J. Mol. Biol. 99, 419-443
42. Proudfoot, N. J., Shander, M. H. M., Manley, J. L., Gefter, M. L., and Maniatis, T. (1980) Science 209, 1329-1336
43. Gannon, F., O’Hare, K., Perrin, F., LePennec, J. P., Benoist, C., Cochet, M., Breathnach, R., Royal, A., Garapin, A., Cami, B., and Chambon, P. (1979) Nature 278, 428-434
44. Cochet, M., Gannon, F., Hen, R., Maroteaux, L., Perrin, F., and Chambon, P. Nature 282, 567-574
45. Weil, P. A., Luse, D. S., Segall, J., and Roeder, R. G. (1979) Cell 18, 469-484
46. Manley, J. L., Fire, A., Cano, A., Sharp, P. A., and Gefter, M. L. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 3855-3859
47. Maurer, R. A., Gubbins, E. J., Erwin, C. R., and Donelson, J. E. (1980) J. Biol. Chem. 255, 2243-2246
48. Hoffman, L. M., Fritsch, M. K., and Gorski, J. (1981) J. Biol. Chem. 256, 2597-2600
49. Le Pennec, J. P., Baldacci, P., Perrin, F., Cami, B., Gerlinger, P., Krust, A., Kourilsky, P., and Chambon, P. (1978) Nucleic Acids Res. 5, 4547-4562
50. Lai, E. C., Woo, S. L. C., Dugaiczyk, A., and O’Malley, B. W. (1979) Cell 16, 201-211
51. Lee, D. C., McKnight, G. S., and Palmiter, R. D. (1980) J. Biol. Chem. 255, 1442-1450