

Volume 14 Number 5 1986 Nucleic Acids Research

Sequence heterogeneity within the human alphoid repetitive DNA family

P.Devilee, P.Slagboom, C.J.Cornelisse¹ and P.L.Pearson

Department of Human Genetics and ¹Department of Pathology, University Medical Center, Leiden, The Netherlands

Received 26 November 1985; Revised and Accepted 13 February 1986

ABSTRACT.

We have cloned and determined the base-sequence and genome organization of two human chromosome-specific alphoid DNA fragments, designated LI.26, mapping principally to chromosomes 13 and 21, and LI.84, mapping to chromosome 18. Their copy number is estimated to be approximately 2,000 per haploid genome. LI.84 has a double-dimer organization, whereas LI.26 has a much less defined higher order tandem organization. Further, we present evidence that the restriction-site spacing within the alphoid DNA family is chromosome specific. From sequence analysis, clones LI.26 and LI.84 are found to consist of 5 and 4 tandemly duplicated 170 bp monomers. Cross-homology between the various monomers is 65-85%. The analysis suggests that the evolution of tandem-arrays does not take place via a defined 340 bp unit, as was inferred by others, but via circularly permutated monomers or multimers of the 170 bp unit.

INTRODUCTION.

Restriction enzyme analysis has shown that human DNA contains many families of repeated DNAs (1-4). These differ from one another with respect to genomic organization, repeat-lengths and copy number. The AluI-family, for example, comprises approximately 300,000 copies of a short (300 bp) sequence, interspersed among stretches of unique sequences or genes (4). In contrast, the KpnI family is interspersed throughout the genome in longer fragments with a lower repetition frequency (3).

The alphoid DNA family (5), so termed because of its homology to the alpha component in the African Green Monkey (6), is an example of a different type of organization, characterized by long arrays of tandemly repeated 170 bp units. It is commonly referred to as "satellite"-DNA, although it is distinct in sequence from any of the human satellite DNA peaks obtained after isopycnic centrifugations (7). In man, the alphoid family forms a pronounced 340 bp and 680 bp band in ethidium bromide stained gels of EcoRI digested genomic DNA. When partial digests are analyzed by Southern blotting, using the 340 bp fragment as probe, a "ladder" of bands is observed, the steps of which correspond to multiples of 170 bp (8). Densitometer scanning suggests that the 340 bp band represents 0.75% of the genome, corresponding to about 55,000 copies (8).

© IRL Press Limited, Oxford, England.
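As a rough consistency check on the figures quoted from ref. 8, the sketch below converts a genome fraction into a copy number. The haploid genome size of 2.5 x 10^9 bp is an assumption chosen for illustration (the text does not state the value used), and the function name is hypothetical.

```python
# Assumed haploid genome size in bp; NOT stated in the text, illustration only.
ASSUMED_HAPLOID_GENOME_BP = 2.5e9


def copies_from_genome_fraction(fraction, genome_bp, unit_bp):
    """Number of unit_bp-long fragments that together make up `fraction` of the genome."""
    return fraction * genome_bp / unit_bp


# 0.75% of the genome in 340 bp EcoRI fragments
copies = copies_from_genome_fraction(0.0075, ASSUMED_HAPLOID_GENOME_BP, 340)
print(f"about {copies:,.0f} copies of the 340 bp fragment")
```

Under this assumed genome size the result is close to the 55,000 copies quoted; a larger assumed genome would give a proportionally larger number.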

Recently, it has become apparent that the alphoid DNA family can be divided into subfamilies, some of which may be characteristic of specific chromosomes. Thus, the EcoRI-dimer, described by Manuelidis (9), is located predominantly in the centromere regions of chromosomes 1, 3, 7, 10 and 19. A 2.0 kb BamHI-fragment is similarly specific for the X-chromosome (10), while a 5.5 kb EcoRI-fragment characterizes the Y-chromosome (11). We have isolated two alphoid sequences, designated LI.26 and LI.84, and have found them to be principally localized to the pericentric regions of chromosomes 13 and 21 (LI.26) and chromosome 18 (LI.84) (12). In this article we present the sequence analyses of LI.26 and LI.84, which show that they consist of 5 and 4 tandemly organized alphoid subunits respectively, each approximately 170 bp long. Within the 170 bp units, some regions appear conserved while others are more variable.

Comparison of chromosome specific members of the alphoid DNA family will give insight into the evolutionary constraints imposed on DNA sequences adjacent to the centromere.

MATERIALS AND METHODS.

DNA sources and preparations.

Genomic DNA was isolated from cell lines or lymphocytes as described (13). Recombinants LI.26 and LI.84 were selected from a random human recombinant DNA-library (14), containing EcoRI-inserts from DNA restricted to completion, cloned in plasmid pAT153. Plasmid-DNA was prepared according to the methods of Maniatis et al. (15).

Cell lines.

Human-rodent somatic cell hybrids were obtained as described earlier (16). A hamster hybrid cell line with the X-chromosome as the only retained human material was a kind gift of Dr. S. Goss, Dunn School of Pathology, Oxford.

Southern blotting and hybridizations.

Genomic DNA was digested with restriction enzymes as recommended by the supplier, in a final volume of 20 µl. To ensure complete restriction, a threefold excess of enzyme-units was added. Digestion was monitored in a parallel aliquot with phage lambda-DNA as internal marker. After three hours of incubation, the samples were incubated 10 minutes at 65°C, and loaded and electrophoresed on 0.8% agarose gels in Tris-acetate. The separated DNA was blotted onto nylon filters (Gene Screen, New England Nuclear) using standard procedures (17). Overnight hybridization and subsequent washing of the filters were performed at 65°C as described by Jeffreys and Flavell (18). The hybridization-mixture contained 20 mM Tris-HCl pH 7.5, 2 mM EDTA, 3x SSC (0.45 M NaCl, 0.045 M Na-citrate), 0.1 mg/ml salmon sperm DNA, 10x Denhardt's solution (0.2% ficoll, 0.2% BSA, 0.2% polyvinylpyrrolidone), 0.1% SDS, 5% dextran-sulphate and 5 ng/ml ³²P-nick-translated probe (19). Exposure was at -70°C on Sakura film backed by an intensifying screen.
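The salt concentrations behind the "x SSC" notation can be worked out from the 3x values given above. The following is a minimal sketch, assuming only that SSC scales linearly with the stated fold-strength; the dictionary and function names are illustrative, and the 0.1x case corresponds to the final filter wash mentioned in the Results.

```python
# 1x SSC as implied by the text: 3x SSC = 0.45 M NaCl, 0.045 M Na-citrate
SSC_1X_MOLAR = {"NaCl": 0.15, "Na-citrate": 0.015}


def ssc_molarity(fold):
    """Molar concentration of each SSC component at the given x-fold strength."""
    return {name: round(conc * fold, 6) for name, conc in SSC_1X_MOLAR.items()}


print("hybridization, 3x SSC:", ssc_molarity(3))
print("final wash, 0.1x SSC: ", ssc_molarity(0.1))
```

The thirtyfold drop in salt between hybridization and the final wash is what makes the wash the stringent step.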

Sequencing-strategy.

Both recombinants LI.26 and LI.84 were sequenced using the dideoxy chain termination method (20). The inserts were recloned into the EcoRI-site of M13mp8 (21), and single-strand recombinant phages cultured, containing opposite insert strands. Thus, 250 bp from each end could be sequenced. The inner 200 bp of LI.84 were sequenced from an EcoRI-RsaI fragment (fig.5) subcloned in M13mp10. Subcloning of LI.26 was as follows. The recombinant pAT153 plasmid was linearized at the HindIII site. The insert contains no HindIII sites. In the resulting linear fragment, the insert is at one extreme end. This was treated for various times with the exonuclease Bal-31 (Boehringer Mannheim), and the DNA-fragments blunt-ended with DNA polymerase I (Klenow-fragment, Boehringer). Subsequent digestion with EcoRI yielded EcoRI-blunt insert fragments progressively shortened from one EcoRI-site by approximately 150 bp. These were also subcloned in M13mp10 and single-stranded phages isolated (21).

The sequencing reactions were carried out with ³⁵S-dATP (Amersham, 600 Ci/mmol) as label (22), using the New England Nuclear protocols. A 17-base primer was the kind gift of Dr. Van Boom, University of Leiden, The Netherlands. Deoxy- and dideoxy nucleotides were supplied by Boehringer Mannheim. Sequencing-gels were dried on a BioRad slabgel-dryer and exposed overnight at room-temperature on Sakura X-ray film.

RESULTS.

LI.26 and LI.84 belong to the alphoid DNA family.

Two independent dot-blot experiments are shown in fig.1, from which the copy number of LI.26 is estimated to be approximately 2,000 per haploid genome. It is unlikely that sequences with less than 95% homology will be detected under the applied hybridization conditions (see Materials and Methods; final washing of the filter in 0.1x SSC). Densitometer scans (not shown) indicate that approximately 60% of hybridizing signal is contained in EcoRI fragments localized on chromosomes 13 and 21 (fig.4, and ref.12). The results with LI.84 (not shown) were similar. When total human genomic DNA is partially digested with EcoRI, blotted and hybridized with either LI.26 (fig.2, panel A) or LI.84 (panel B), a ladder of bands is formed early during digestion. The lengths of these bands correspond to multiples of approximately 170 bp, indicating that LI.26 and LI.84 belong to the tandemly organized alphoid DNA family. In completely digested samples, LI.26 and LI.84 hybridize to the same ladder-pattern, but with different intensities per band. The largest detectable multimer is a 16-mer in both instances; a fraction of alphoid DNA remains resistant to EcoRI-restriction. Although the 340 bp band is very pronounced in ethidium bromide stained gels, it hybridizes weakly with either probe.

Figure 1. Dot-blot experiment using LI.26, recloned in M13mp8, as probe. The equivalent of 100 (1); 250 (2); 500 (3); 1,000 (4); 2,500 (5); 5,000 (6); 7,500 (7) and 10,000 (8) copies of the insert-fragment of LI.26 per haploid genome was spotted in duplo (a,b) on nylon Gene Screen membrane. One µg of total genomic DNA (46,XX) was spotted eight times as a reference (c). Probe was labeled by primer extension (ref.21). Filter was exposed for 4 hr.

Figure 2. Southern analysis of partial digests of 5 µg/lane total genomic DNA obtained with EcoRI, using LI.26 (panel A) or LI.84 (panel B) as probe. Numbers between the panels indicate multiples of 170 bp. Extent of digestion increases progressively from lane a (1/16 x complete) to f (2 x complete) in both panels. Exposure time was two days.

Figure 3. Southern blot of total genomic DNA hybridized with LI.26. Restriction enzymes used are: TaqI (a); XbaI (b); KpnI (c); HaeIII (d); EcoRI (e); BamHI (f); HindIII (g) and BglII (h). Exposure was overnight.

Figure 4. Southern analysis of 10 µg of hybrids Cl 2D (panel A) and 34-2-3 B3 (panel B), containing chromosome X and chromosome 13 as their only retained human material respectively, using LI.26 as probe. Digestions were with EcoRI (panel A, lane a; panel B, lane c), BamHI (panel A, lane c; panel B, lane a) or with both (panels A and B, lanes b). 'M' is marker. Panel C shows partial digestion of total genomic DNA with BamHI (5 µg/lane). Extent of digestion increases progressively from lane a (1/16 x complete) to lane f (2 x complete). Exposure times: Panel A six days, Panel B three days, Panel C two days.

Apparently, LI.84 is organized predominantly as a tetramer; multiples thereof (8-, 12-, 16-, 20-mer) are detected early in the course of digestion (Panel B, lane b) and the 4-mer and 8-mer are major bands in completely digested samples (Panel B, lane f). LI.26 does not cross-hybridize significantly with the tetrameric higher order multimers of LI.84 (compare lanes b, both panels). Its organization is more complex: several longer multimers appear simultaneously (Panel A, lane c) and are converted with different kinetics into 8-, 5- and 4-mers. LI.26 contains a 5-mer, which is 0.85 kb in length, and LI.84 contains a 4-mer, 0.68 kb in length (see below). Digestion of genomic DNA with other endonucleases further supports the tandem organization: most enzymes produce ladders with the same fragment-lengths as EcoRI (fig.3).
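The assignment of a fragment to a rung of the 170 bp ladder is simple arithmetic. The sketch below, with a hypothetical helper name, finds the nearest n-mer for the two insert lengths reported in the sequence analysis (849 bp and 684 bp) and shows how far each deviates from an exact multiple of the nominal 170 bp monomer.

```python
MONOMER_BP = 170  # nominal alphoid monomer length quoted in the text


def nearest_multimer(fragment_bp, monomer_bp=MONOMER_BP):
    """Return (n, offset): the closest n-mer and fragment_bp - n * monomer_bp."""
    n = max(1, round(fragment_bp / monomer_bp))
    return n, fragment_bp - n * monomer_bp


for name, length in [("LI.26 insert", 849), ("LI.84 insert", 684)]:
    n, offset = nearest_multimer(length)
    print(f"{name}: {length} bp ~ {n}-mer ({offset:+d} bp from {n * MONOMER_BP} bp)")
```

Because real monomers range from 166 to 171 bp, the offset is expected to be small but nonzero.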

Specific repeat-lengths reside on specific chromosomes.

Although LI.26 is mainly restricted to chromosomes 13 and 21 (12), it also detects homologous alphoid sequences on the X-chromosome. The organization, however, of alphoid sequences on chromosomes X and 13 is clearly different and indicates that they have followed distinct evolutionary histories. When LI.26 is hybridized to a hybrid cell line containing a single human X-chromosome, it is found that the tandem-structures are organized in large EcoRI fragments of 2-10 kb (fig.4, panel A, lane a). In contrast, in a chromosome 13 hybrid, EcoRI-fragments are mostly 0.68 and 0.85 kb in length, with a few multimers up to 1.7 kb (panel B, lane c), consistent with the overall organization of LI.26 (fig.2). Similarly, BamHI sites are almost absent from LI.26 and its homologs on chromosome 13 (fig.4, panel B, lane a), while the homologs on the X-chromosome all show up as BamHI-multimers of 2.1, 3.0 and 4.0 kb (panel A, lane c). Subsequent digestion with EcoRI reduces these multimers somewhat and yields a 1.6 kb fragment (Panel A, lane b). Partial digests of genomic DNA with BamHI (fig.4, panel C) show that the organization of LI.26 homologs in the total genome is similar to their organization on chromosome 13 or the X-chromosome: either in very large fragments or in tandems of approximately 2 kb units.

Heterogeneity at the sequence level.

1. Organization.

The base sequence of both LI.26 and LI.84 is presented in fig.5; their respective lengths are 849 bp and 684 bp. Both sequences reflect the tandemly repeated organization of alphoid DNA. Most restriction-sites appear with a 170 bp spacing: HinfI at positions 288, 629 and 798 in LI.26; DdeI at positions 43, 213, 379 and 547 in LI.84. A comparison with two monomeric EcoRI-units reported by Wu and Manuelidis (1), termed here a-1 and a-2, shows that the homology with these monomers starts at an EcoRI-like recognition sequence at position 26 in LI.26 and position 39 in LI.84 (elaborated in fig.6B). This shift in EcoRI restriction sites results in the last 142 bp of both sequences being homologous to the first 142 bp of a-1 and a-2. Similarly, the first 25 bp of LI.26 and LI.84 are homologous to the last 25 bp in both the a-repeats. Between these regions lie several complete monomers, 4 in LI.26, 3 in LI.84. Their lengths are 171 bp, or slightly less, the shortest unit being 166 bp within LI.84 (position 209-374). However, the new phase of EcoRI sites with respect to the a-1 and a-2 units does not break up the typical 170 bp spacing of EcoRI sites in alphoid DNA. Fig.6A shows the position of average nucleotide homology relative to EcoRI restriction sites in LI.26 and LI.84 relative to a hypothetical tandem-structure of a-1 units.
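The periodicity of the restriction sites can be checked directly from the positions quoted in this section. The following sketch (function names are illustrative, not from the paper) computes the distance between consecutive sites and the number of nominal 170 bp monomers each distance spans.

```python
def site_spacings(positions):
    """Distances between consecutive restriction-site positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


def monomers_spanned(spacing_bp, monomer_bp=170):
    """Nearest whole number of monomers covered by a spacing."""
    return round(spacing_bp / monomer_bp)


# Site positions as quoted in the text
HINFI_LI26 = [288, 629, 798]
DDEI_LI84 = [43, 213, 379, 547]

for label, sites in [("HinfI in LI.26", HINFI_LI26), ("DdeI in LI.84", DDEI_LI84)]:
    gaps = site_spacings(sites)
    print(label, gaps, [monomers_spanned(g) for g in gaps])
```

For the quoted positions the HinfI sites are one and two monomers apart, while the DdeI sites recur in every monomer with spacings of 166 to 170 bp.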

An interesting feature of the sequence of LI.84 is the presence of a 14 bp direct repeat at position 16 (arrows fig.5). If a-1 and/or a-2 were tandemly repeated within LI.84, this small perfect direct repeat would be located at the border between two of the repeated units, thereby disturbing their contiguous organization. This direct repeat is the reason why LI.84 itself is somewhat longer than 4 times 170 bp.
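The direct repeat can be verified against the first 50 bases of LI.84 as they appear in fig.5. The sketch below is illustrative only: the sequence string is transcribed from the extracted figure, which may carry transcription errors, and the helper name is hypothetical.

```python
# First 50 bases of LI.84 as transcribed from fig.5 (extraction errors possible)
LI84_START = (
    "AATTCATCAA" "ATTGCAGACT" "GCAGCGTTCA" "GACTGCAGCG" "TTCTGAGAAA"
)


def tandem_copy_at(seq, pos, length):
    """Return the unit starting at 1-based `pos` if it is immediately repeated, else None."""
    start = pos - 1
    unit = seq[start:start + length]
    following = seq[start + length:start + 2 * length]
    if len(following) == length and unit == following:
        return unit
    return None


unit = tandem_copy_at(LI84_START, pos=16, length=14)
print("14 bp unit at position 16:", unit)
```

With this transcription the 14 bases at positions 16-29 are repeated exactly at positions 30-43.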


LI.26

  1  AATTCAAATA AAAGGTAGAC AGCAGCATTC TCAGAAATTG CTTTCTGATG TCTGCATTCA ACTCATAGAG TTGAAGATTC CCTTTCATAG
 91  AGCAGGTTTG AAACACTCTT TCTGGAGTAT CTGGATGTGG ACATTTGGAG CGCTTTGATG CCTACGGTGG AAAAGTAAAT ATCTTCCCAT
181  AAAAACGAGA CAGAAGGATT CTCAGAAACA AGTTTGTGAT GTGTGTACTC AGCTAACAGA GTGGAACCTT TCTTTTTACA GAGCAGCTTT
271  GAAACTCTAT TTTTGTGGAT TCTGCAAATT GATATTTAGA TTGCTTTAAC GATATCGTTG GAAAAGGGAA TATCGTCATA CAAAATCTAG
361  ACAGAAGCAT TCTCACAAAC TTCTTTGTGA TCTGTGTCCT CAACTAACAG AGTTGAACCT TTCTTTTGAT GCAGCAATTT GGAAACACCT
451  TTTGGTAGAA AATGTAAGTG GATATTTGGA TAGCTTAACG ATTTCGTTGG AAACGGGAAT ATCATCATCT AAAATCTAGA CAGAAGCACT
541  ATTAAGAAAC TACTTGGTGA TATCTGCATT CAAGTCACAG AGTTGAACAT TCCCTTACTT TGAGCACGTT TGAAACACTC TTTTGGAAGA
631  ATCTGGAAGT GGACATTTGG AGCGCTTTGA TGCCTTTGGT GAAAAGGAAA CGTCTTCCAA TAAAAGCCAG ACAGAAGCAT TCTCAGAAAC
721  TTGTTCGTGA TGTGTGTACT CAACTAAAAG AGTTGAACCT TTCTATTGAT AGAGCAGTTT TGAAACACTC TTTTTGTGGA TTCTGCAAGT
811  GGATATTTGG ATTGCTTTGA GGATTTCGTT GGAAGCGGG

LI.84

  1  AATTCATCAA ATTGCAGACT GCAGCGTTCA GACTGCAGCG TTCTGAGAAA CATCTTTGTG ATGTTTGTAT TCAGGACACC AGAGTTGAAC
 91  ATTCCCTATC ATAGAGCAGG TTTGAATCAC TCCTTTTGTA GTATCTGGAA GTGGACATTT GGAGGCTTTC AGGCCTATGT TGGAAAAGGA
181  AATATCTTCC ATAACAACTA GACAGAAGCA TTCTCAGAAC TTATTTGAGA TGTGTGTACT CACACTAAGA GAATTGAACC ACCGTTTTGA
271  AGGAGCAGTT TTGAAACACT CTTTTTCTGG AATCTGCAM GTGGATATTT GGCTAGCTTT GGGGATTTCG CTGGAACGGA ATACATATAA
361  AAAGCACACA GCAGCGTTCT GAGAAACTGC TTTCTGATCT TTGCATTCAA GTCAAAAGTT GAACACTCCC TTTCATAGAG CAGTCCTGAA
451  ACACTCTTTT GTAGTATCTG GAACTGGACT TTTGGAGCGC TTTCAGGGCT AAGGTGAAAA AGAAATATCT TCCCATAAAA ACTGGACAGA
541  ATCATTCTCA GAAACTrGTT TATGCTGTAT CTACTCAACT AACAAAGTTG AACCTTTCTT TTGATAGAGC AGTTTTGAAA TGCTCTTTTT
631  GTGGAATCTG CAAGTGGATA TTTGGTTAGT TTTGAGGATT TCGTTGGAAG CGGG

[Sequence rows are numbered by the position of their first base; the arrows, 'END' marks and restriction-site triangles of the original figure are not reproduced.]

Figure 5. Complete nucleotide sequences of LI.26 and LI.84. Numbered arrows above the sequences indicate the start of homology to the 171 bp alphoid reported consensus sequence (ref.1). 'END' marks the end of homology to the last 25 bp of the latter. Restriction-sites are indicated by filled triangles. These are: AccI (A); DdeI (D); HinfI (H); RsaI (R) and XbaI (X). A 14 bp direct repeat at position 16 in LI.84 is indicated by horizontal arrows.

2. Cross-homologies.

We have aligned the monomer units of LI.26 and LI.84 and compared them to several reported alphoid units (Fig.6B and Table I). These are a-1 and a-2 (1); a-X, the consensus sequence from the human X-chromosome (23); a-Y, a monomer originating from the Y-chromosome (11); and SPC-1, a monomer detected on small polydisperse circular DNA (24). The cross-homology between the units of our probes and a-1 and a-2 was found to vary between 68% and 82% with an average of 75% and 73% for LI.26 and LI.84 respectively (Table I). Slightly lower homologies were noted between our probes and SPC-1 (72% and 70% resp.), and a-Y (71% and 72% resp.), whereas the a-X sequence seems somewhat more related at approximately 80% homology in both instances. Between LI.26 and LI.84 there exists an overall cross-homology of 75%, although much higher homologies can be detected when smaller regions are compared (e.g. 92% between the last 100 bp of both sequences, not shown). Thus, after comparing overall homologies, it appears that no specific tandem-sequence reported so far is significantly more related to any of the others. This may be explained by the scattered distribution of base substitutions among the fourteen compared monomer sequences (fig.6B). About 70% of all positions underwent two or fewer base changes.

[Figure 6, panels A and B: the homology diagram and the base-by-base difference alignment could not be recovered from the extracted text. Monomer positions (fig.5) listed beside panel B: LI.26: 1-25, 26-196, 197-367, 368-536, 537-707, 708-849; LI.84: 1-27, 39-208, 209-374, 375-542, 543-684.]

Figure 6. A. Position of average nucleotide homology relative to restriction sites in LI.26, LI.84 and a-X relative to a hypothetical tandem-structure of a-1 units. B. Comparison of the monomer sequences of LI.26, LI.84, the human a-dimer (ref.1), a-X (ref.23), a-Y (ref.11) and SPC-1 (ref.24). Comparison is made relative to a-2. Only bases which differ from this sequence are shown. Deletions are indicated by (-), positions where more than three base changes occur by (•). For maximum alignment, bases 18, 80, 243, 310 and the 14 bp repeat were deleted from LI.84, base 14 was deleted from LI.26. Monomer numbers of LI.26 and LI.84 correspond to numbering in fig.5.

Table I. Sequence homologies between the monomers of LI.26 and LI.84, both monomers of the human a-dimer (ref.1), the consensus a-X monomer (ref.23), a monomer found on small polydisperse circular DNA (ref.24), and a-Y, a monomer derived from the human Y-chromosome (ref.11). Numbers above the diagonal represent percent identity of the two compared sequences. Numbers below the diagonal represent mean cross-homology of the sequences that fall within the boxed region. Monomer numbers of LI.26 and LI.84 correspond to numbering in fig.5.

          26/1  26/2  26/3  26/4  26/5  84/1  84/2  84/3  84/4  a-1   a-2   a-X   a-Y   SPC-1
LI.26/1   -     71    72    81    75    82    69    84    72    76    75    82    74    74
LI.26/2         -     80    65    85    72    73    67    80    73    76    77    69    70
LI.26/3               -     69    85    71    75    68    82    70    77    81    70    73
LI.26/4                     -     69    77    65    75    66    73    69    75    70    70
LI.26/5                           -     73    85    71    90    75    82    85    73    73
LI.84/1                                 -     67    82    71    77    73    80    75    72
LI.84/2                                       -     65    82    71    68    76    70    67
LI.84/3                                             -     68    75    71    79    73    71
LI.84/4                                                   -     73    77    82    70    70
a-1                                                             -     73    78    75    69
a-2                                                                   -     84    71    73
a-X                                                                         -     81    78
a-Y                                                                               -     76
SPC-1                                                                                   -

Below the diagonal (means of the boxed regions), in the LI.26 columns from top to bottom: 75 ± 10, 73 ± 3, 76 ± 6, 80 ± 5, 71 ± 2, 72 ± 2; in the LI.84 columns from top to bottom: 74 ± 3, 72 ± 5, 79 ± 3, 72 ± 2, 70 ± 3.
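Percent-identity values of the kind reported in Table I can be computed for any pair of aligned monomers along the following lines. This is a minimal sketch, not the authors' procedure: how gaps were scored in the original comparison is not stated, so counting a gap opposite a base as a mismatch is an assumption, and the short test strings are toy sequences rather than alphoid monomers.

```python
def percent_identity(a, b, gap="-"):
    """Percent of aligned columns with identical bases.

    Columns that are gaps in both sequences are skipped; a gap opposite
    a base is counted as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one substitution in ten positions
print(percent_identity("ACGT-CGT", "ACGTACGT"))      # one gap in eight positions
```

Applied to two aligned 171 bp monomers, a value of 75% would correspond to roughly 43 differing positions.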

Within LI.84, the first full length monomer is 82% homologous to the third and 67% and 71% to the second and fourth respectively. The second monomer shares 82% homology with the fourth monomer. This distribution of homologies is suggestive of a basic 340 bp homology as proposed by Wu and Manuelidis (1) for the consensus alphoid structure. In LI.26, however, the various cross-homologies do not show such a spacing pattern; the first full length monomer is 81% homologous to the fourth monomer, while the second monomer is 80% homologous to the third. These two units are also closely related to the fifth (not full length) monomer. If homologies of more than 80% are grouped, another kind of spacing, represented by 'a-b-b-a-b' may be proposed.
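The grouping step can be made explicit. The sketch below takes the pairwise percent identities between the monomers of each clone (values as read from Table I) and joins monomers whose identity exceeds 80%; joining through a shared partner (single-linkage grouping via union-find) is an assumption of this illustration, not a method stated in the paper.

```python
def group_pattern(n_monomers, identities, threshold=80):
    """Label monomers 1..n with letters, joining pairs whose identity exceeds threshold."""
    parent = list(range(n_monomers + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), pct in identities.items():
        if pct > threshold:
            parent[find(i)] = find(j)

    labels, pattern = {}, []
    for m in range(1, n_monomers + 1):
        root = find(m)
        if root not in labels:
            labels[root] = chr(ord("a") + len(labels))
        pattern.append(labels[root])
    return "-".join(pattern)


# Pairwise percent identities within each clone, read from Table I
LI26 = {(1, 2): 71, (1, 3): 72, (1, 4): 81, (1, 5): 75, (2, 3): 80,
        (2, 4): 65, (2, 5): 85, (3, 4): 69, (3, 5): 85, (4, 5): 69}
LI84 = {(1, 2): 67, (1, 3): 82, (1, 4): 71, (2, 3): 65, (2, 4): 82, (3, 4): 68}

print("LI.26:", group_pattern(5, LI26))
print("LI.84:", group_pattern(4, LI84))
```

Monomers 2 and 3 of LI.26 are exactly 80% identical and so are not joined directly under a strict threshold; they end up in the same group because both exceed 80% identity with monomer 5.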

3. Sequence-conservation.

Because of the scattered distribution of base substitutions, conserved regions are not easily defined when all alphoid sequences are compared. Only when more variable positions are first defined (dots in fig.6B), do some small relatively more conserved regions become apparent. These include positions 3-13, 17-26, 42-51, 75-89, 95-111 and 140-148 and show a clustering of positions where no base change occurred at all.

DISCUSSION.

We have examined the genomic organization of two human alpha-satellite DNA-sequences, designated LI.26 and LI.84. They were previously shown to map predominantly to chromosomes 13 and 21 (LI.26) and chromosome 18 (LI.84) (12). Several lines of evidence suggest that both sequences represent distinct subgroups of the alphoid DNA family:

(a). Under our hybridization conditions, the copy-number of LI.26 and LI.84 is about 2,000 per haploid genome each. Since the 340 bp EcoRI-fragment is estimated to be present in about 55,000 copies (8), this indicates that each probe detects a small subset of alphoid DNA sequences.

(b). Southern hybridizations to EcoRI-digested genomic DNA show that both probes hybridize to the same series of 170 bp multimers, but each produces a signal of different intensity per band. Further, LI.84 is largely organized into tetrameric units whereas LI.26 has a more complex organization.

(c). Sequence analysis of LI.26 and LI.84 shows that they each have diverged about 25% from the 340 bp EcoRI-fragment reported by Wu and Manuelidis (1), which is the reason for their poor hybridization to this band. LI.26 and LI.84 also show a 25% sequence divergence between one another.

It seems, therefore, that this family is a highly heterogeneous collection of sequences, all variations on a 170 bp motif. The members diverge in sequence composition, but remain clearly related (fig.6). Digestion with EcoRI (or several other enzymes) results in a distribution of the heterogeneous units among the bands that form the ladder. Consequently, each step in the ladder consists of several DNA fragments of similar length, but with different base sequences. According to our results, this variation may amount to 35% (Table I). Hybridization with a representative of a specific alphoid subfamily under stringent conditions lights up only the most homologous multimers. Thus, probe representatives of two different alphoid subfamilies may hybridize to the same band in the ladder, not necessarily because of cross-homology to each other, but because different genomic alphoid sequences comigrate to the same position. However, some hybridization may also occur from cross-homology since closely related monomers are scattered throughout the various tandem-structures (Table I).

Since some monomers of LI.26 are over 80% homologous to the X-chromosomal consensus unit of Waye and Willard (23), it is not surprising to find LI.26 hybridizing to DNA of a hybrid cell line containing the X-chromosome as its only retained human material. Although longer exposure times are needed (see legend fig.4), the obtained restriction pattern closely resembles the reported one obtained with a chromosome X-derived alphoid sequence (10). This suggests that virtually all X-chromosomal alphoid DNA is organized in 2.0 kb BamHI units as described, although we cannot fully exclude the presence of distinctly organized divergent sequences left undetected by both probes. Further, we showed that chromosome 13 specific alphoid DNA is distinct from the X-chromosome in its organization of restriction sites. Thus, the sequence-heterogeneity within the alphoid family is distributed in a chromosome-specific manner with a characteristic restriction site spacing for each enzyme and chromosome. The speculation that these subfamilies play a role in discriminating chromosomes from one another (25,26), is therefore attractive.

The survival of alphoid DNA in the genome during evolution has led to

suggestions that it may serve in chromosome structure (25); nucleosome arr-

angement (27); and homologous chromosome recognition (reviewed in 28). As yet,

none of these alleged functions has been confirmed. Alternatively, it may be

an evolutionary "hitchhiker" with no special function (29). Sequence analysis

reveals some aspects of alphoid DNA evolution. We have found a 14 bp direct

repeat within LI.84 at the border of two units defined by Wu and Manuelidis.

This 14 bp repeat may be a remnant of an unequal cross-over event. A recently

proposed model (30) explains how short repeats or deletions of this type may

have originated. Within a Holliday-structure, mispairing may occur as a result

of neighbouring sequence homology, or because of hairpin structures within a

single DNA strand. An imperfect 14 bp stem-structure (AGAAACATCTTTGT at 46)

downstream of the 14 bp direct repeat may, by folding back, have been the

cause of the duplication. However, both investigated sequences are approx-

imately 25 bp out of phase compared to the 170 bp unit described by Wu and

Manuelidis (1). The question arises what the borders of the amplification unit

within LI.26 and LI.84 related sequences are. It may be a sequence related to the

consensus 340 bp a-1/a-2 dimer (1). This would explain the remarkable coincidence

of the cross-over event in LI.84 with the a-1/a-2 unit-boundaries. It

would, however, not explain the tetrameric organization of LI.84 (fig.2),

which suggests that LI.84 as a whole is an amplification unit. It would also

be inconsistent with data of LI.26, which is suggestive of an a-b-b type of

suborganization, and those of Waye and Willard (23) who noticed a similar 79

bp out of phase phenomenon in their BamHI-defined 2.0 kb multimer (fig.6A),

but clearly demonstrated their sequence to be the amplification unit. In a

tandem array of 170 bp sequences, the start-point of any unit is, of course,

arbitrary. A unit-definition based on restriction sites is thus inappropriate.

Given the chromosome specific nature of the discussed sequences, it is reason-

able to propose that different chromosomes carry distinct amplification-units.

The out of phase phenomenon may be explained by the existence of extra-chromo-

somal circular satellite DNA (24). Formation and integration of these circles

through random homologous recombination events can explain both circular per-

mutations and conservation of the 170 bp unit.
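
Because the start-point of a unit in a tandem array is arbitrary, two monomers can only be compared after they have been brought into the same phase. The sketch below illustrates one simple way to estimate such a phase shift: every circular rotation of a query monomer is scored against a reference, and the best-scoring rotation is reported. It assumes equal-length sequences without insertions or deletions, which real alphoid monomers do not guarantee. The function names and the 12-base toy sequence are hypothetical.

```python
from typing import Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_rotation(reference: str, query: str) -> Tuple[int, float]:
    """Return (offset, identity) of the rotation of query closest to reference.

    Rotation by offset k means query[k:] + query[:k]. On ties, the smallest
    offset is kept.
    """
    if len(reference) != len(query) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    best_offset, best_score = 0, -1.0
    for offset in range(len(query)):
        rotated = query[offset:] + query[:offset]
        score = identity(reference, rotated)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical toy unit; the query is the same unit read 5 bases out of phase.
reference = "ACGTTGCAAGCT"
query = reference[5:] + reference[:5]
print(best_rotation(reference, query))
```

An exhaustive scan over rotations is adequate for a 170 bp unit; handling length differences between monomers would require a proper alignment step instead.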

Although many sequences within LI.26 and LI.84 resemble restriction sites

in that they contain one or two base changes relative to the true recognition

site, most restriction sites occur with an n(170) bp spacing (fig.5). Assuming

random mutation, this suggests that considerable homogenization of sequences

is taking place continuously within the array, perhaps through gene conversion

or unequal crossing over, both meiotic and mitotic (31).
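
The distinction between true recognition sites and sequences that differ from them by one or two bases can be made explicit with a short scan. The sketch below is illustrative only: it uses the EcoRI recognition sequence GAATTC, a synthetic array of three 170 bp units in which the middle unit carries a single base change in its site, and hypothetical function names. It reports exact sites, their spacing, and near-sites with up to two mismatches.

```python
from typing import List, Tuple

ECORI = "GAATTC"  # EcoRI recognition sequence


def find_sites(seq: str, site: str, max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for windows within max_mismatches of site."""
    hits = []
    k = len(site)
    for i in range(len(seq) - k + 1):
        mm = sum(1 for a, b in zip(seq[i:i + k], site) if a != b)
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


def spacings(positions: List[int]) -> List[int]:
    """Distances between consecutive site positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Synthetic array (not real alphoid sequence): each unit is 170 bp long.
unit = ECORI + "A" * 164
mutated_unit = "GAATTT" + "A" * 164  # site destroyed by one base change
array = unit + mutated_unit + unit

exact = [pos for pos, _ in find_sites(array, ECORI)]
print(exact, spacings(exact))        # exact sites and their spacing
print(find_sites(array, ECORI, 2))   # exact sites plus near-sites
```

In this toy array the loss of the middle site leaves two exact sites 340 bp apart, that is n(170) with n = 2, while the near-site remains detectable at the intermediate 170 bp position.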

Irrespective of the nature of the basic unit of alphoid DNA amplification,

the 170 bp regularity remains conserved. The existence of more highly conser-

ved regions within each monomer (fig.6) may either be a cause of the regular-

ity, or, alternatively, a consequence of it. Two of the conserved regions we

observed, notably positions 103-111 and 140-148, coincide with the binding

sites II and III of African Green Monkey alpha protein (32). It has been

suggested that "nucleosome phasing" may play a role in conservation of the 170

bp structure in alphoid DNA (27), although other data conflict with this

opinion (33). However, the observation that the alphoid sequences characterized

to date demonstrate approximately 75% homology to each other, irrespective

of their chromosomal location, including sequences from the same chromosome,

suggests a restriction in the degree of divergence permitted, which

remains unexplained at present.

Although we have shown in this study that certain alphoid sequences have

evolved in ways resulting in chromosome specific attributes, it is clear that

other chromosome specific alphoid sequences should be analyzed in order to

establish a general model of alphoid DNA evolution.

ACKNOWLEDGEMENTS.

The authors would like to thank dr. A.M. Millington Ward and dr. G.-J.B.

van Ommen for helpful discussions and reviewing the manuscript and dr. F. Baas

and dr. H. van Ormondt for technical assistance during sequencing procedures

and computer analyses. This work was supported by the Netherlands Cancer

Foundation (Koningin Wilhelmina Fonds Grant nr. A83.21).

REFERENCES.

1. Wu, J.C. and Manuelidis, L. (1980) J. Mol. Biol. 142, 363-386.

2. Shimizu, Y., Yoshida, K., Ren, Ch., Fujinaga, K., Rajagopalan and Chinnadurai, G. (1983) Nature (Lond.) 302, 587-591.

3. Shafit-Zagardo, B., Maio, J.J. and Brown, F.L. (1982) Nucl. Acids Res. 10, 3175-3193.

4. Houck, C.M., Rinehart, F.P. and Schmid, C.W. (1979) J. Mol. Biol. 132, 289-306.

5. Maio, J.J., Brown, F.L. and Musich, P.R. (1981) Chromosoma (Berl.) 83, 103-125.

6. Manuelidis, L. and Wu, J.C. (1978) Nature (Lond.) 276, 92-94.

7. Manuelidis, L. (1978) Chromosoma (Berl.) 66, 1-21.

8. Darling, S.M., Crampton, J.M. and Williamson, R. (1982) J. Mol. Biol. 154, 51-63.

9. Manuelidis, L. (1978) Chromosoma (Berl.) 66, 23-32.

10. Willard, H.F., Smith, K.D. and Sutherland, J. (1983) Nucl. Acids Res. 11, 2017.

11. Wolfe, J., Darling, S.M., Erickson, R.P., Craig, I.W., Buckle, V.J., Rigby, P.W.J., Willard, H.F. and Goodfellow, P.N. (1985) J. Mol. Biol. 182, 477-485.

12. Devilee, P., Cremer, T., Slagboom, P., Bakker, E., Scholl, H.P., Hager, H.D., Stevenson, A.F.G., Cornelisse, C.J. and Pearson, P.L. Cytogen. Cell Gen., in press.

13. Hofker, M.H., Wapenaar, M.C., Goor, N., Bakker, B., Van Ommen, G.J.B. and Pearson, P.L. (1985) Hum. Genet. 70, 148-156.

14. Pearson, P.L., Bakker, E. and Flavell, R.A. (1982) Cytogen. Cell Gen. 32, 308.

15. Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning, a laboratory manual. Cold Spring Harbor Laboratory.

16. Herbschleb-Voogt, E., Grzeschik, K.-H., Pearson, P.L. and Meera Khan, P. (1981) Hum. Genet. 59, 317-323.

17. Southern, E.M. (1975) J. Mol. Biol. 98, 503-517.

18. Jeffreys, A.J. and Flavell, R.A. (1977) Cell 12, 429-439.

19. Rigby, P.W.J. et al (1977) J. Mol. Biol. 113, 237-251.

20. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.

21. Messing, J. (1983) Meth. Enzymol. 101, 20-78.

22. Biggin, M.D., Gibson, T.J. and Hong, G.F. (1983) Proc. Natl. Acad. Sci. USA 80, 3963-3965.

23. Waye, J.S. and Willard, H.F. (1985) Nucl. Acids Res. 13, 2731-2743.

24. Jones, R.S. and Potter, S.S. (1985) Nucl. Acids Res. 13, 1027-1042.

25. Manuelidis, L. (1982) In: Genome Evolution, eds. G.A. Dover and R.B. Flavell, Academic Press, New York, p. 263-285.

26. Lee, T.N.H. and Singer, M.F. (1982) J. Mol. Biol. 161, 323-342.

27. Wu, K.C., Strauss, F. and Varshavsky, A. (1983) J. Mol. Biol. 170, 93-117.

28. Brutlag, D.L. (1980) Ann. Rev. Genet. 14, 121-144.

29. Orgel, L.E., Crick, F.H.C. and Sapienza, C. (1980) Nature (Lond.) 288, 645-646.

30. Millington Ward, A.M., Reuser, J.A.M., Scheele, J.Y., Van Lohuizen, E.J., Van Gorkum Van Diepen, I.R.M.C., Klasen, E.A. and Bresser, M. (1984) Mol. Gen. Genet. 193, 332-339.

31. Dover, G. (1982) Nature (Lond.) 299, 111-117.

32. Strauss, F. and Varshavsky, A. (1984) Cell 37, 889-901.

33. Smith, M.R. and Lieberman, M.H. (1984) Nucl. Acids Res. 12, 6493.
