

Proc. Natl. Acad. Sci. USA
Vol. 80, pp. 3193-3197, June 1983
Biochemistry

Close relationship between the long terminal repeats of avian leukosis-sarcoma virus and copia-like movable genetic elements of Drosophila

(evolution/retrovirus/regulatory sequences)

WATARU KUGIMIYA, HIROSHI IKENAGA, AND KAORU SAIGO

Department of Biochemistry, Kyushu University 60 School of Medicine, Fukuoka 812, Japan

Communicated by Arthur B. Pardee, February 16, 1983

ABSTRACT A new species of copia-like movable genetic element termed 17.6 was identified in Drosophila melanogaster, and the nucleotide sequences of its long terminal repeats (LTRs) were determined. The LTRs of 17.6 were not only homologous to those of 297, a sibling movable genetic element of 17.6, but also closely matched those of avian leukosis-sarcoma virus. This made it possible (i) to identify the nucleotide sequences in 17.6 and 297 that correspond to the crucial regulatory sequences for both transcription and reverse transcription in avian leukosis-sarcoma virus and (ii) to divide the LTRs of these two elements into three regions, U3, R, and U5, like those of retrovirus proviruses. Similarity in sequence was also found to a certain extent in other copia-like elements. From these results, we postulate that copia-like movable genetic elements in Drosophila originated from infection of a progenitor Drosophila with a retrovirus from which the present-day avian leukosis-sarcoma virus was derived.

In Drosophila melanogaster, about 5% of the genome DNA is formed of copia-like movable genetic elements scattered at numerous sites along the chromosome (1). Recent analyses have revealed that, like retrovirus proviruses in vertebrates, the copia-like elements not only are bounded by long terminal direct repeats (LTRs) but also are flanked by short direct repeats of host DNA (1-11). In addition, nucleotide sequences seemingly corresponding to a primer binding site or a purine-rich sequence (or both), both of which appear to be mandatory for the initiation of DNA synthesis in a retrovirus system (4, 12, 13), have been recognized in some copia-like elements (9-11). Circular copies of copia whose structures are similar to the counterparts of retrovirus proviruses also have been found (14). Furthermore, virus-like particles containing copia RNA sequences have been identified in D. melanogaster (15). Thus, it seems to be quite reasonable to imagine that copia-like movable genetic elements and retroviruses are related evolutionarily to each other.

In the present work, the nucleotide sequences of LTRs of 17.6, a new species of copia-like movable genetic element of Drosophila, were determined and compared with those of 297 (10, 16) and other similar elements. We found that the overall nucleotide sequences of the LTRs of 17.6 are not only remarkably similar to those of 297 but also correspond base-to-base with those of an avian retrovirus, avian leukosis-sarcoma virus (AL-SV). Nucleotide sequences corresponding to those crucial for transcription and reverse transcription in AL-SV were identified in both 297 and 17.6, permitting us to divide the LTRs of the two movable genetic elements into three portions, putative U3, R, and U5 (17). This similarity in sequence also seemed to extend to a certain extent to the copia-like movable genetic elements other than 297 and 17.6. However, at the same time, the nucleotide sequences corresponding to the crucial regulatory sequences in AL-SV were suggested to be not necessarily functional in Drosophila. From these results, the relationship between copia-like movable genetic elements and AL-SV is discussed.

MATERIALS AND METHODS

The procedures for isolation and purification of recombinant phage λhist 17.6 were as described (16). Two recombinant plasmids, pWT1 and pWT2, which contained the 3.1-kilobase (kb) EcoRI segment and 1.4-kb AvaI/HindIII segment, respectively, of λhist 17.6 (see Fig. 1a) were constructed with pBR322 as a vector. The physical containment used for the construction and preparation of the hybrid plasmids and phages was P2, as specified in the guidelines of the Ministry of Education, Science and Culture of Japan (similar to P2 in the guidelines of the National Institutes of Health). All other procedures and experimental conditions have been described elsewhere (10, 16).

RESULTS

Identification of 17.6 as a New Species of copia-like Movable Genetic Elements. From a cloned library of D. melanogaster, two recombinant phages were isolated whose DNA simultaneously hybridizes to histone genes and to a movable genetic element, 297 (16). In λhist c, an authentic 297 element was found to be inserted into the T-A-T-A box of the H3 histone gene (10). In the other clone, λhist 17.6, an unknown nucleotide sequence (which is termed 17.6) was shown to interrupt the A+T-rich spacer region between the H1 and H3 histone genes (10, 16). Heteroduplex analysis revealed the presence of a 1.7-kb homologous region between 17.6 and 297 (Fig. 1a). Further information on the nature of 17.6 was obtained by digesting Drosophila genomic DNA with various restriction enzymes and subjecting the digests to blot-hybridization that used as a probe the 2.1-kb HindIII segment (a segment labeled 2 in Fig. 1a) that is included in the region unhomologous to 297. HindIII digests gave a strong 2.1-kb band, whereas EcoRI and BamHI digests gave many bands of almost identical intensities (Fig. 2a), suggesting that nucleotide sequences identical with that of the HindIII probe are present in many sites on the chromosome. Densitometric analysis suggested that their copy number per haploid chromosome was about 40. Forty-six recombinant clones hybridizable to the 2.1-kb HindIII probe were isolated from 4 × 10^4 phages in a cloned library of D. melanogaster, and the restriction maps of some of them were determined. Two

Abbreviations: LTR, long terminal repeat; bp, base pair(s); AL-SV, avian leukosis-sarcoma virus; kb, kilobase(s).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


FIG. 1. Restriction enzyme maps of three recombinant phage DNAs containing 17.6. (a) λhist 17.6. (b) λKS605. (c) λKS634. (a) In the upper margin of the map of λhist 17.6, the structure and position of 17.6 are schematically shown with the structure of 297. -, The 1.7-kb homologous regions between 17.6 and 297; a, LTRs; ---, a histone gene repeat. Arrows labeled 1 and 2 indicate the ranges of the 1.5-kb Pvu II probe of 297 and the 2.1-kb HindIII probe in 17.6, respectively, whereas those labeled 3 and 4 show the segments of λhist 17.6 subcloned as pWT1 and pWT2, respectively. (b and c) Open bars below each map represent the segments hybridizable to the 1.5-kb Pvu II probe of 297, while hatched bars below λKS605 and λKS634 show the homology to LTRs of 17.6. E, EcoRI; B, BamHI; C, Cla I; A, Ava I; H, HindIII; S, Sal I; P, Pvu II.

typical examples (those of λKS605 and λKS634) are shown in Fig. 1 with a map of λhist 17.6. DNA from these three clones was digested with EcoRI and HindIII and examined by Southern hybridization that used as a probe a 1.5-kb Pvu II segment of 297 (a segment labeled 1 in Fig. 1a) that covers most of the 1.7-kb region common to 17.6 and 297. In all cases, only restriction segments located at the identical positions with respect to the 2.1-kb HindIII segments could hybridize to the probe (Figs. 1 and 2b). In a separate experiment, more than half of the clones positive to the 2.1-kb HindIII probe also were found to have nucleotide sequences homologous to that of the 1.5-kb Pvu II probe.

The presence of LTRs in 17.6 was suggested by a blot-hybridization experiment and confirmed by DNA sequence analysis as shown below. The blot-hybridization experiment also suggested that each of the DNA inserts in λKS605 and λKS634 contained two separate regions homologous to the 17.6 LTRs as illustrated in Fig. 1. Thus, it is concluded that, although it has partial sequence homology to 297, 17.6 is a new species of Drosophila copia-like movable genetic elements.

Determination of the Nucleotide Sequence of the 17.6 LTRs. Two restriction segments containing the junctions between the histone repeat and 17.6 were subcloned as pWT1 and pWT2 (see Fig. 1a) and their partial nucleotide sequences were determined (Fig. 3). Comparison of these with the nucleotide sequence of the A+T-rich spacer region in an authentic repeat unit of the histone gene (17) showed that the insertion of 17.6 had occurred within a segment characteristic of the longer histone repeat unit (17), yielding an apparent duplication of 4-6 base pairs (bp) of host DNA.

FIG. 2. Blot-hybridization patterns of genomic DNA (a) and cloned DNA (b). (a) Genomic DNA extracted from D. melanogaster embryo was digested with HindIII (lane 1), BamHI (lane 2), and EcoRI (lane 3) and was subjected to Southern hybridization with the 2.1-kb HindIII probe of 17.6. (b) λKS634 (lane 1), λKS605 (lane 2), and λhist 17.6 (lane 3) DNAs were digested terminally with EcoRI and HindIII and examined with the 1.5-kb Pvu II probe of 297. Molecular weights are shown in kb.

Like other similar elements, 17.6 contains almost identical LTRs. From the data shown in Fig. 3, the length of the 17.6 LTRs was estimated to be 510-512 bp, although the results of the sequence comparison described below suggest that the length is probably 510 bp and that a 6-bp host DNA segment, underlined in Fig. 3 (T-A-T-A-T-A), was duplicated upon insertion.
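The duplication of host DNA at the insertion site can be checked mechanically once the ends of the element are assigned. The following is a minimal sketch, not the authors' procedure: the function name `target_site_duplication`, its parameters, and the toy sequence (including the 8-bp placeholder standing in for the element) are assumptions made for illustration. Only the 6-bp segment T-A-T-A-T-A comes from the text.

```python
def target_site_duplication(seq: str, start: int, end: int, max_len: int = 10) -> str:
    """Return the longest host segment (at most max_len bp) found both
    immediately before seq[start:end] and immediately after it.

    seq[start:end] is the inserted element; an empty string means that no
    flanking direct repeat was found.
    """
    seq = seq.upper()
    # The repeat cannot be longer than the host DNA available on either side.
    limit = min(max_len, start, len(seq) - end)
    for k in range(limit, 0, -1):  # try the longest candidate first
        if seq[start - k:start] == seq[end:end + k]:
            return seq[start - k:start]
    return ""


# Toy example (hypothetical flanks and element placeholder).
host_left, repeat, element, host_right = "GGC", "TATATA", "ACGTTGCA", "CCG"
toy = host_left + repeat + element + repeat + host_right
start = len(host_left) + len(repeat)
end = start + len(element)
print(target_site_duplication(toy, start, end))
```

Because the result depends on where the element is taken to start and end, shifting `start` or `end` by a base or two changes the repeat that is reported. This is one way to see why the text gives a range (4-6 bp) before the comparison with 297 settles on 6 bp.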

Comparison of the Nucleotide Sequences of 17.6 LTRs and 297 LTRs. The LTRs of 17.6 are longer by about 100 bp than those of 297 (10). For examination of the homology between these LTRs, these nucleotide sequences were arranged by visual inspection as shown in Fig. 4, lines 1 and 2. For convenience, in Fig. 4, the right- and left-hand LTRs of each element are recombined so that the internal junctions of the right- and left-hand LTRs are located on both sides of one LTR. Four long blocks consisting of closely homologous nucleotide sequences are separated by three 10- to 50-bp segments in which 297 sequences are deleted. As indicated by horizontal arrows in Fig. 4 and shown in Fig. 5, these long unpaired regions and some shorter ones can be explained by duplication or triplication of neighboring DNA with or without subsequent deletion, if some mistakes are permitted. It is known that Moloney murine leukosis-sarcoma virus (20) and spleen necrosis virus (21) contain duplication/deletion within their LTRs. However, in these cases, sequence homologies between duplicated segments are much higher than in the case of 17.6/297. We interpret these findings as supporting the notion that 17.6 and 297 LTRs have evolved from a common progenitor by successive duplications with considerable base changes and deletions. Apart from the long unpaired regions, the homology between 17.6 and 297 LTRs is

H3---ATTAAAGAAGAGGTTAA TAC.AAAGAS~~GTGA'CATAT'TCACATACAAMACCACATAACATAGAGTAAACATATTGAAAAGCCGCATACGTAAACAATAAGTGACCACCATGCTATGTGGATCAAATAACAAAAATATCCACTCTGCATTTTGACACCCCCATACTGTATGCCATCTGCGCAGTATGCATTCTAATAAACAAATTCTTTGACAGCGGCACTTAGCCATTCTTGTAAACAAATCTTAAAGTCTGCCTGCTCTCTCTGAGGCTTCTCCTCCACTTAAGAATCCAAGAGCAATGCTCTCCCAAAAACACTAACATATTCTTTAAGCAAGCACAGAGGCTTCTCCTCATTTTCACTTTCATTTGATTTTCAGTCTTAAGCTGAACGTTAATCAATAAACACACAATCGATACCGAAATTTTGATTCGTTTTATTTTGGCAAAACTCAATTTTCAGCGTTGGTCTTAGTTCATATTCGGAAC GGTCCATTTAATAGACTCAAAACTATTTATTGCAACCATTTATTTGCAAT.TGGCGCAGTCGATGTGATCAGTGTTAAAGTTCCTTGATGCGGTAACCAGA-i--c 5.8 kb )------TCCCAGCCCAAGTATAGGCTTCTCTTTAAGGGAAGGGAAGTGACATATTCACATACAAAACCACATAACGTAGAGTAAACATATTGAAMGCCGCATACGTCAACAATAAGTGACCACCATGCTAATGTGGATCAAATAACAAAAATATCCACTCTGCATTTTGACACCCCCATACTGTATGCCATCTGCGCAGTATGCATTCTAATAAACAAATTCTTTGACAGCGGCACTTAGCCATTCTTGTAAACAAATCTTAAAGTCTGCCTGCTCTCTCTGAGGCTTCTCCTCCACTTAAGAATCCAAGAGCAATGCTCTCCCAAAAACACTAACATATTCTTTAAGCAAGCACAGAGGCTTCTCCTCATTTTCACTTTCATTTGATTTTCAGTCTTAAGCTGAACGTTAATCAATAAACACACAATCGATACCGAAATTTTGATTCGTTTTATTTTGGCAAAACTCAATTTTCAGCGTTGGTCtTAGTTCATATTCGGAACGGTCCATTTAATAGACTCAA ACTATTTATTGCAACCATTTATTTG TATTTATTTGTTCTT-.- H1

FIG. 3. Nucleotide sequence around the junctions between the histone gene and 17.6 in λhist 17.6. Italics show the nucleotides expected to be derived from the histone gene; the remainder show those from 17.6. Arrows indicate the regions of the LTRs of 17.6. The vertical lines associated with an arrow indicate ambiguities in assigning the exact ends of 17.6, whereas the solid underlines show the host DNA segments that are apparently duplicated. *, Residues that differ in the two LTRs.


FIG. 4. Comparison of LTR nucleotide sequences and their flanking regions. Nucleotide sequences of 297 (ref. 10; unpublished data), 17.6, and AL-SV are shown in lines 1, 2, and 3, respectively, as described in the text. In particular, in the case of AL-SV, the LTR nucleotide sequence is identical with that of the cloned LTR in λRAV 2-2 (18), whereas the flanking sequences of the LTRs are identical with those of Rous sarcoma virus provirus, which contains a LTR sequence almost identical to that of RAV 2-2 (19). The shaded areas in lines 1 and 2 show the mutual homology between 297 and 17.6, and those in line 3 show the nucleotides in AL-SV that are identical to those of either 297 or 17.6 or both. Arrows labeled L show the ranges of the LTRs of 297, 17.6, and the provirus of AL-SV. Unlabeled horizontal arrows indicate the major duplication or triplication found in 17.6 and 297, and vertical arrows show some of the end points of the major deletions in 297 and AL-SV. Solid underlines labeled a, b, c, d, e, f, and g show, respectively, a purine-rich sequence, Hogness box, polyadenylylation signal, RNA start, poly(A) site, termination signal of transcription, and primer binding site in AL-SV; U3, R, and U5 are the subdivisions of LTRs.

about 64%, although 73% of the nucleotide sequence of 297 LTRs can be derived from the 17.6 LTRs. Lines 1 and 2 in Fig. 4 also show that the nucleotide sequence homology extends to the internal regions of 17.6 and 297. The significance of this homology and the LTR homology will be discussed in detail below.

FIG. 5. Four regions of multiplicated sequences found in 17.6 (indicated by horizontal arrows in Fig. 4). Shaded areas are as in Fig. 4. The numeral at the end of each line shows the distance in bp between the left end of the LTR and the terminal base in each line. Boxes show the nucleotide sequences of 17.6 that correspond to those deleted in 297. *, line representing the corresponding sequence in 297.

Sequence Homology Between AL-SV and Drosophila Movable Genetic Elements. During inspection of LTRs from various sources, we noticed that the overall nucleotide sequence of the LTRs found in a recombinant clone of an AL-SV provirus (λRAV 2-2) (18) closely matched those of 297 and 17.6 LTRs (Fig. 4). That is, although the presence of a few long deletions was unavoidable because of the short length (344 bp) of the AL-SV LTR, the LTR of AL-SV compared in Fig. 4 with those of 297 and 17.6 was found to correspond base-to-base. The homologies in LTR sequence between AL-SV and 17.6 and between AL-SV and 297 were 59% and 51%, respectively, if the AL-SV sequence was used as a standard for the calculation of homology. Notice that 297 has a 30-bp deletion with respect to AL-SV (Fig. 4). If we neglect this segment, the homology between 297 and AL-SV increases to 59%. Vertical arrows in Fig. 4 indicate that some of the end points of deletions found in AL-SV LTR with respect to 17.6/297 sequences coincide with those of 297 and 17.6.
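The phrase "used as a standard" can be made concrete with a small calculation. The sketch below is one plausible reading and not the authors' stated counting rule: it assumes two pre-aligned rows of equal length with "." marking an unpaired position, counts only positions occupied in the standard row, and can optionally neglect positions deleted in the other row. The function name `percent_identity`, the `skip_deleted` switch, and the toy rows are hypothetical.

```python
def percent_identity(standard: str, other: str, gap: str = ".",
                     skip_deleted: bool = False) -> float:
    """Percentage of positions in `standard` matched by an identical base in `other`.

    Both arguments are rows of one alignment (equal length, `gap` marks an
    unpaired position). With skip_deleted=True, positions where `other`
    carries a gap are left out of the denominator as well.
    """
    if len(standard) != len(other):
        raise ValueError("aligned rows must have equal length")
    counted = 0
    identical = 0
    for s, o in zip(standard.upper(), other.upper()):
        if s == gap:
            continue  # position absent from the standard: never counted
        if skip_deleted and o == gap:
            continue  # position deleted in the other row: optionally neglected
        counted += 1
        if s == o:
            identical += 1
    if counted == 0:
        raise ValueError("no positions to compare")
    return 100.0 * identical / counted


# Toy rows (hypothetical): the second row lacks two bases of the standard.
print(percent_identity("ACGTACGT", "ACGA..GT"))
print(percent_identity("ACGTACGT", "ACGA..GT", skip_deleted=True))
```

Leaving the deleted positions out of the denominator raises the figure, which is the same direction of change the text reports when the 30-bp segment missing from 297 is neglected.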

In general, LTRs of retroviruses contain the signals for both initiation and termination of transcription (22). An 18-bp-long primer binding site and 11-bp-long purine-rich sequence, both of which are located at internal junctions of LTRs, are necessary for reverse transcription (4, 12, 13, 22). Therefore, the presence of the remarkably high similarity in LTR sequence between an avian retrovirus and Drosophila movable genetic elements raises the question of how the structures of such crucial regulatory sequences have changed during evolution as a retrovirus or movable genetic element.

According to Yamamoto et al. (23), the Hogness box and polyadenylylation signal in AL-SV are located in the underlined regions b and c, respectively, in Fig. 4. It is obvious that both 17.6 and 297 contain A-A-T-A-A-A sequences at the corresponding positions, whereas the nucleotide sequences corresponding to the AL-SV Hogness box in 17.6 and 297, respectively, are not T-A-T-T-T-A-A but the related sequences A-T-T-T-T-C-A and A-T-T-T-T-T-A. Because these sequences differ considerably from the consensus Hogness box [T-A-T-A-(A/T)-A-(A/T)] (24), these Hogness box-like sequences are probably nonfunctional. In AL-SV, the genomic RNA is known to start from the guanine labeled d in Fig. 4, line 3, whereas poly(A) addition occurs 21-22 bp downstream from the A-A-T-A-A-A sequence at the dinucleotide C-A labeled e, thus dividing the AL-SV LTR into three portions, U3, R, and U5 (shown in Fig. 4) (19, 22). Recently, Scherer et al. (11) reported that a Drosophila movable genetic element, B104, has not only a Hogness box-like sequence identical with the AL-SV Hogness box but also an active polyadenylylation signal, A-A-T-A-A-A, which is followed 21-22 bp later by a polyadenylylation site, C-A (Fig. 6A, line 2). As shown in Fig. 6A, the A-A-T-A-A-A in B104 (line 2) is located in the exact site expected from the positions of the counterparts in AL-SV (line 1), 297 (line 3), and 17.6 (line 4).
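The spacing rule described above (an A-A-T-A-A-A signal followed 21-22 bp later by a C-A dinucleotide) lends itself to a simple scan. The following is a rough sketch under a stated assumption: the gap is measured from the base after the last A of the signal to the C of the C-A site, which is one possible reading of "21-22 bp downstream". The function name `polya_candidates`, its parameters, and the toy sequence are hypothetical.

```python
import re


def polya_candidates(seq: str, signal: str = "AATAAA", site: str = "CA",
                     min_gap: int = 21, max_gap: int = 22) -> list[tuple[int, int]]:
    """Return (signal_start, site_start) pairs, 0-based, where `site` begins
    min_gap..max_gap bases after the end of `signal`."""
    seq = seq.upper()
    hits = []
    # A lookahead pattern reports overlapping occurrences of the signal too.
    for match in re.finditer(f"(?={re.escape(signal)})", seq):
        signal_start = match.start()
        signal_end = signal_start + len(signal)
        for gap in range(min_gap, max_gap + 1):
            site_start = signal_end + gap
            if seq[site_start:site_start + len(site)] == site:
                hits.append((signal_start, site_start))
    return hits


# Toy sequence (hypothetical): signal, 21 filler bases, then C-A.
toy = "GG" + "AATAAA" + "T" * 21 + "CA" + "GG"
print(polya_candidates(toy))
```

A hit from such a scan only shows that the spacing is compatible with the AL-SV arrangement. Whether the signal is actually used in Drosophila is a separate question, as the text itself stresses for the Hogness box-like sequences.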


FIG. 6. Comparison of the nucleotide sequences corresponding to the crucial regulatory sequences in AL-SV. (A) Regulatory sequences for transcription. Lines: 1, AL-SV; 2, B104 (11); 3, 297; and 4 and 5, 17.6. Shaded areas in lines 1 and 2 show the homology to either 17.6 or 297 or to both, whereas the mutual homology between 297 and 17.6 is shown by shaded areas in lines 3 and 4. Bars: b, Hogness box; c, polyadenylylation signal; d, RNA start site; and e, poly(A) site. U3, R, and U5 indicate the subdivisions of LTRs. Two horizontal arrows below the 17.6 sequence show the duplication found near the poly(A) site (see Fig. 4). (B) Primer binding site (region g) and purine-rich sequence (region a). Lines: 1, AL-SV; 2, 17.6; 3, 297; 4, mdg 1 (5); 5, 412 (9); 6, mdg 3 (6); 7, B104 (11); and 8, copia (1). Underlines in lines 4-8 show the homology at least to AL-SV, 297, or 17.6. LTR regions are indicated. In the case of mdg 3, the authors of the original paper (6) reported that mdg 3 was bounded with about 15-bp-long inverted repeats connected to LTRs. However, our results rather support the idea that the inverted repeats belong not to mdg 3 but to host DNA.

The nucleotide sequence upstream from the C-A in B104 is very similar to that of the counterpart in 297, which is predicted by our sequence assignment shown in Fig. 4. In the case of 17.6, the corresponding sequence appears to be less homologous, but an 8-bp-long segment, A-A-A-A-C-T-C-A, exactly identical to the nucleotide sequence upstream from the C-A in B104, can be found at the expected position in the duplicated segment (Figs. 4, 5, and 6A). We believe that these results strongly justify our correspondence of the LTR sequences between AL-SV and Drosophila elements 297 and 17.6. A putative termination signal of RNA synthesis (22), T-T-G-X, also can be found in both 297 and 17.6, which appears to correspond to T-T-G-A in the AL-SV sequence (labeled f in Fig. 4). These results strongly suggest that the LTRs of both 17.6 and 297 not only contain nucleotide sequences highly similar to the regulatory sequences for transcription in AL-SV at the corresponding positions but also can be divided into three regions, putative U3, R, and U5 as illustrated in Figs. 4 and 6A.

Like 297 (10), 17.6 seems to have nucleotide sequences corresponding to a primer binding site (region g in Fig. 4) at one of the internal junctions of the LTRs. As shown in Fig. 4, the sequence homology in the region of the primer binding site between 17.6 and 297 is about 90%, whereas the homology between similar regions of AL-SV and 17.6 and between AL-SV and 297 is 67% and 57%, respectively. In particular, all of these sequences are bounded by T-G-G and G-A-T, which appear to correspond to the tRNA conservative sequences (25), terminal C-C-A and X-U-C in a T-ψ-C loop, respectively, although so far no tRNA has been found that shows homology to the putative primer binding sites of 17.6 and 297. Extremely high sequence homology also was observed at the other internal junction of the LTRs, where the purine-rich sequence is in the underlined position labeled a in Fig. 4.
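The observation that the putative primer binding sites are 18 bp long and bounded by T-G-G and G-A-T suggests a simple search for candidate windows, followed by a position-by-position comparison. The sketch below is illustrative only. The function names are hypothetical, and the three 18-bp strings are read from the sequence rows of Fig. 6B as they survive in the extracted text of this article, so they may carry transcription errors.

```python
def flanked_windows(seq: str, left: str = "TGG", right: str = "GAT",
                    length: int = 18) -> list[tuple[int, str]]:
    """Return (start, window) for every `length`-bp window of seq that
    begins with `left` and ends with `right` (0-based start)."""
    seq = seq.upper()
    found = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if window.startswith(left) and window.endswith(right):
            found.append((i, window))
    return found


def identity(a: str, b: str) -> float:
    """Percent identical positions between two ungapped strings of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Putative primer binding sites as read from Fig. 6B (assumed transcription).
PBS_ALSV = "TGGTGACCCCGACGTGAT"
PBS_17_6 = "TGGCGCAGTCGATGTGAT"
PBS_297 = "TGGCGCAGTCGATAGGAT"

print(flanked_windows("AAA" + PBS_17_6 + "CCC"))
print(round(identity(PBS_ALSV, PBS_17_6)), round(identity(PBS_17_6, PBS_297)))
```

With these strings, the AL-SV/17.6 comparison gives 12 identical positions out of 18 (about 67%) and the 17.6/297 comparison gives 16 out of 18 (about 89%), in line with the 67% and "about 90%" quoted in the text. The window search by itself is permissive, since many unrelated 18-bp stretches start with T-G-G and end with G-A-T, so position relative to the LTR junction remains the deciding criterion.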

Sequence Homology Among copia-like Movable Genetic Elements. Further inspection of the nucleotide sequences showed that copia-like elements other than 297 and 17.6 also showed some homology to AL-SV, 17.6, and 297. However, in these cases, we occasionally had difficulty in aligning DNA segments more than 20-bp long. Lines 4-8 in Fig. 6B show only the homologies that we think are most interesting because of their locations in regions corresponding to the crucial regulatory sequences for reverse transcription in AL-SV. The purine-rich sequences (regions labeled a) with their flanking regions in mdg 1 (Fig. 6, line 4) and 412 (line 5) are identical and good matches with the counterpart of AL-SV (line 1) if two empty bases are inserted at the junctions of the LTRs and the ends of the internal bodies. In contrast, the nucleotide sequence corresponding to the purine-rich sequence in mdg 3 (line 6) appears to vary considerably, although six and seven nucleotides at the very end of its LTR are exactly the same as the counterparts in AL-SV (line 1) and 412 (line 5) [or mdg 1 (line 4)], respectively.

The degree of homology in the primer binding sites (regions g in Fig. 6B) between AL-SV (line 1) and mdg 1 (line 4) is 72% if a 2-bp-long insertion, A-A, in mdg 1 can be left out of consideration. The putative primer binding site of 412 (line 5) has 90% homology to that of mdg 1, although two nucleotides, which appear to correspond to U-C in a T-ψ-C loop of tRNA, are deleted. B104 (Fig. 6, line 7) also contains similar sequence in the expected region, whereas copia (line 8) and mdg 3 (line 6) appear to have large deletions or extensive base substitutions (or both) in the corresponding areas.

DISCUSSION

We identified a new species of Drosophila copia-like movable genetic element termed 17.6 and determined the nucleotide sequence of its LTRs. The LTRs of 17.6 were not only homologous to those of 297, a sibling movable genetic element of 17.6, but also closely matched those of the avian retrovirus AL-SV, allowing us to identify the nucleotide sequences in 17.6 and 297 that correspond to the crucial regulatory sequences for both transcription and reverse transcription in AL-SV. It is very unlikely that these remarkably high similarities at the level of DNA sequence are coincidental. Copia-like movable genetic elements other than 17.6 and 297 also were found to contain similar, putative regulatory sequences. However, at the same time, apparent deficiencies in sequence have been detected in the purine-rich sequence in mdg 3 and in the primer binding site in copia and mdg 3, suggesting that the degree of sequence similarity to AL-SV provirus varies from element to element and that both copia and mdg 3 belong to the category of copia-like elements that are much less related in sequence to AL-SV provirus than are elements such as 17.6 and 297. The presence of short sequence homologies to retrovirus proviruses in regions other than the regulatory sequences also has been reported in some copia-like elements by other investigators (1, 11, 18).

In a separate experiment, we recently succeeded in isolating retrovirus-like particles from D. melanogaster cells (15). The isolated retrovirus-like particles were found not only to contain 5-kb-long copia RNA that could be translated in vitro into polypeptides immunologically related to the particles but also to be associated with reverse-transcriptase-like activity. Therefore, although the possibility cannot necessarily be excluded that copia-like elements are precursors to RNA tumor viruses (26), it seems quite natural to imagine that copia-like movable genetic elements originated from infection of a progenitor Drosophila by a retrovirus from which the present-day avian retroviruses were derived and that the retrovirus proviruses long since entrapped in the Drosophila chromosomes have lost their original properties to various degrees in evolution. A recent finding (6) that copia-like movable genetic elements are very unstable in the course of evolution in spite of their increased copy numbers [e.g., 30 per haploid chromosome in the case of 297 (1)] also may support the above-mentioned idea that copia-like elements are viral in origin.

The degree of homology between the LTRs of 17.6 (or 297) and AL-SV was estimated to be about 60% when the AL-SV sequence was used as a standard. About 73% of the LTR nucleotide sequence of 297 also was found to be explainable by that of 17.6. On the other hand, 65% of the nucleotide sequence of the LTR of the endogenous chicken retrovirus located at a site ev 1 corresponded to that of an exogenous avian retrovirus, AL-SV (ARAV 2-2) (18). It also has been speculated that the introduction of an endogenous retrovirus such as that at ev 1 into the progenitor of the domestic chicken, Gallus gallus, probably occurred by infection with an exogenous retrovirus (AL-SV) within the last million years (27). Thus, although we are not certain of the rate of the structural changes in the LTRs of the movable genetic elements of Drosophila, it is likely that AL-SV and 297 (or 17.6) branched off before the introduction of the endogenous retroviruses into the progenitor chicken, whereas the branching of 17.6 and 297 occurred more recently than that between endogenous (ev 1) and exogenous (AL-SV) retroviruses in chicken.
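The ordering argument in the preceding paragraph can be sketched as a simple ranking. This is only an illustration under a strong assumption that the text itself flags as uncertain, namely that LTR sequences change at comparable rates in all lineages, so that higher similarity implies a more recent split. The percentages are those reported in the text, although they were measured against different standards; the dictionary keys and the function name `order_by_recency` are labels introduced here.

```python
# Percent similarities as reported in the text; pair labels are illustrative.
reported_similarity = {
    ("17.6", "297"): 73.0,            # LTR of 297 explainable by that of 17.6
    ("ev 1", "AL-SV"): 65.0,          # endogenous vs. exogenous chicken retrovirus
    ("17.6 or 297", "AL-SV"): 60.0,   # AL-SV sequence used as the standard
}


def order_by_recency(similarity):
    """Sort pairs from most similar (assumed most recent split) to least similar.

    Assumes comparable rates of sequence change across lineages, which the
    source text explicitly says is not certain.
    """
    return sorted(similarity, key=similarity.get, reverse=True)


ranking = order_by_recency(reported_similarity)
for pair in ranking:
    print(pair, reported_similarity[pair])
```

Under that assumption the ranking places the 17.6/297 split as the most recent and the split between AL-SV and the Drosophila elements as the oldest, which matches the order proposed in the text.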

We thank Prof. Y. Takagi for encouragement and discussion. We also are grateful to K. Hattori and S. Inouye for restriction enzyme mapping. This work was supported by grants from the Ministry of Education, Science and Culture of Japan to K.S. W.K. is on leave from and supported by Fuji Oil Co., Ltd., whereas H.I. is on leave from Kirin Brewery Co., Ltd.

1. Rubin, G. M., Brorein, W. J., Jr., Dunsmuir, P., Flavell, A. J., Levis, R., Strobel, E., Toole, J. J. & Young, E. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 619-628.

2. Finnegan, D. J., Rubin, G. M., Young, M. W. & Hogness, D. S. (1978) Cold Spring Harbor Symp. Quant. Biol. 42, 1053-1063.

3. Shimotohno, K. & Temin, H. M. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 719-730.

4. Shoemaker, C., Gott, S., Gilboa, E., Paskind, M., Mitra, S. W. & Baltimore, D. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 711-717.

5. Kulguskin, V. V., Ilyin, Y. V. & Georgiev, G. P. (1981) Nucleic Acids Res. 9, 3451-3463.

6. Tchurikov, N. A., Ilyin, Y. V., Skryabin, K. G., Ananiev, E. V., Bayev, A. A., Jr., Krayev, A. S., Zelentsova, E. S., Kulguskin, V. V., Lyubomirskaya, N. V. & Georgiev, G. P. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 655-665.

7. Fink, G., Farabaugh, P., Roeder, G. & Chaleff, D. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 575-580.

8. Eibel, H., Gafner, I., Stotz, A. & Philippsen, P. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 609-617.

9. Will, B. M., Bayev, A. A. & Finnegan, D. J. (1981) J. Mol. Biol. 153, 897-915.

10. Ikenaga, H. & Saigo, K. (1982) Proc. Natl. Acad. Sci. USA 79, 4143-4147.

11. Scherer, G., Tschudi, C., Perera, J., Delius, H. & Pirrotta, V. (1982) J. Mol. Biol. 157, 435-451.

12. Sutcliffe, J. G., Shinnick, T. M. & Lerner, R. A. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 707-710.

13. Majors, J. E., Swanstrom, R., DeLorbe, W. J., Payne, G. S., Hughes, S. H., Ortiz, S., Quintrell, N., Bishop, J. M. & Varmus, H. E. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 731-738.

14. Flavell, A. J. & Ish-Horowicz, D. (1981) Nature (London) 292, 591-595.

15. Shiba, T. & Saigo, K. (1983) Nature (London) 302, 119-124.

16. Saigo, K., Millstein, L. & Thomas, C. A., Jr. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 815-827.

17. Goldberg, M. L. (1979) Dissertation (Stanford Univ., Palo Alto, CA).

18. Skalka, A., Ju, G., Hishinuma, F., DeBona, P. J. & Astrin, S. (1981) Cold Spring Harbor Symp. Quant. Biol. 45, 739-746.

19. Swanstrom, R., DeLorbe, W. J., Bishop, J. M. & Varmus, H. E. (1981) Proc. Natl. Acad. Sci. USA 78, 124-128.

20. Van Beveren, C., Van Straaten, F., Galleshaw, J. A. & Verma, I. M. (1981) Cell 27, 97-108.

21. Shimotohno, K., Mizutani, S. & Temin, H. M. (1980) Nature (London) 285, 550-554.

22. Temin, H. M. (1981) Cell 27, 1-3.

23. Yamamoto, T., de Crombrugghe, B. & Pastan, I. (1980) Cell 22, 787-797.

24. Breathnach, R. & Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.

25. Sprinzl, M. & Gauss, D. H. (1982) Nucleic Acids Res. 10, r1-r56.

26. Temin, H. M. (1980) Cell 21, 599-600.

27. Tereba, A., Chittenden, L. B. & Astrin, S. M. (1981) J. Virol. 39, 282-289.
