
Evolutionary analysis of a cluster of ATP-binding cassette (ABC) genes

Tarmo Annilo,1* Zhang-Qun Chen,2* Sergey Shulenin,1 Michael Dean1

1 Human Genetics Section, Laboratory of Genomic Diversity, NCI-Frederick, Frederick, MD 21702, USA
2 Intramural Research and Support Program, SAIC-Frederick, Laboratory of Genomic Diversity, NCI-Frederick, Maryland 21702, USA

Received: 9 August 2002 / Accepted: 17 September 2002

Abstract

To study the evolutionary history of ATP-binding cassette (ABC) transporters in mammals, we have characterized a cluster of five ABCA-subfamily genes localized on mouse Chromosome (Chr) 11. The genes, named Abca5, Abca6, Abca8a, Abca8b, and Abca9, are arranged head-to-tail in a cluster that spans about 400 kb of genomic DNA, each gene occupying about 70 kb. The transcripts of these genes contain open reading frames ranging from 4863 (Abca8a and Abca8b) to 4929 (Abca5) nucleotides and have distinct tissue-specific expression patterns. The predicted proteins contain two transmembrane domains and two nucleotide-binding domains, arranged similarly to those of the other members of the ABCA subfamily. The similarity of both the genomic organization and the primary structure among the genes in this cluster suggests that the duplications generating the cluster occurred relatively recently compared with most of the ABC genes in present-day mammalian genomes. For instance, the Fugu rubripes genome contains an ortholog of only one gene from this cluster, Abca5. Phylogenetic and comparative sequence analyses reveal that, after the divergence of the rodent and primate lineages, at least one gene has been lost in each genome. In addition, we found that both the mouse and human clusters show evidence of a number of gene conversions, in several cases involving intron sequences.

Introduction

The ATP-binding cassette (ABC) proteins use the energy of ATP hydrolysis to transport a wide variety of molecules across different cellular membranes (Dean et al. 2001). The substrates translocated by ABC transporters include peptides, lipids, and sugars, but also chemotherapeutic drugs and hydrophobic compounds (Klein et al. 1999; Kuwano et al. 1999). A typical full-size ABC transporter contains two nucleotide-binding folds (NBF) and two transmembrane domains (TMD). The NBF of ABC transporters contains an ABC-specific signature sequence located upstream of the Walker B motif (Hyde et al. 1990). Each TMD, which determines the substrate specificity of the transporter (Loo and Clarke 2001; Stride et al. 1999), typically includes a set of six membrane-spanning helices. More than half of the human ABC genes encode full transporters. The remainder encode half transporters that form homo- or heterodimers to constitute a functional transporter.

Current annotation classifies 48 human ABC proteins into seven subfamilies (Dean et al. 2001; Klein et al. 1999). The ABCA subfamily contains 12 relatively large full transporters, several of them with a predicted size greater than 2000 amino acids. Two members of this subfamily have been linked to genetic disorders. The ABCA1 protein is located at the plasma membrane, where it participates in the transport of cholesterol onto high-density lipoprotein (HDL) particles (Orso et al. 2000; Remaley et al. 1999). Mutations in the ABCA1 gene cause very low plasma HDL levels and the buildup of cholesterol in macrophages (Brooks-Wilson et al. 1999; Orso et al. 2000; Remaley et al. 1999). The ABCA4 gene is expressed only in rod photoreceptors, and mutations in this gene are responsible for several retinal dystrophies (Allikmets 2000; Allikmets et al. 1997).

In addition, 12 other ABC proteins are associated with a wide array of human diseases such as cystic fibrosis (Riordan et al. 1989); X-linked adrenoleukodystrophy, a neurodegenerative and adrenal deficiency disorder in late childhood (Mosser et al. 1993); several forms of progressive familial intrahepatic cholestasis, a defect of bile component transport (Deleuze et al. 1996; Strautnieks et al. 1998); and sitosterolemia, a sterol transport disorder (Berge et al. 2000). Moreover, several ABC genes are crucial to the development of drug-resistant tumors (Kuwano et al. 1999; Litman et al. 2001), a serious problem for chemotherapy. While most of the ABC genes are dispersed in the mammalian genome, five ABCA genes (ABCA5, ABCA6, and ABCA8–10) form a unique cluster on human Chr 17q24 (Arnould et al. 2001). The only other known cluster of more than two genes is the cluster of three Abcb/Mdr genes on mouse Chr 5A1. Since the Abcb/Mdr genes have evolved as protection against naturally occurring toxic compounds, their amplification may widen the range of actively excreted xenobiotics. The function of the five clustered ABCA genes is unknown. Here we describe a comparative analysis of the human and mouse clusters of five ABCA subfamily genes, with the aim of better understanding the evolutionary history and biological function of ABCA subfamily transporters.

DOI: 10.1007/s00335-002-2229-9 • Volume 14, 7–20 (2003) • © Springer-Verlag New York, Inc. 2003

*These authors contributed equally to this study. Sequence data from this article have been deposited with the DDBJ/EMBL/GenBank Data Libraries under accession numbers AF491299, AF491842, and AF498360–AF498362.

Correspondence to: M. Dean, E-mail: [email protected]

Materials and Methods

cDNA cloning and rapid amplification of cDNA ends (RACE). Searches of public databases revealed several ESTs encoding possible novel mouse ABCA proteins. IMAGE clones 1890581, 1431341, and 761201 were obtained and sequenced. Liver Marathon-ready cDNA, a mouse multiple tissue cDNA panel, and Advantage 2 polymerase mix (all from Clontech, Palo Alto, Calif.) were used to clone the full-length cDNAs. Oligonucleotide primers were purchased from Life Technologies. PCR primers were designed according to EST and predicted exon sequences. Primers atgF (5′-GGCAACATGATCAAGAGAGAGAT-3′), F1 (5′-CAGGTTTTTGCTAAAATAAGAGG-3′), R1 (5′-CCTCTTATTTTAGCAAAAAGCCTG-3′), F3 (5′-GCTATGCCATGTCAGTTATTTTCA-3′), R3 (5′-CTTGCGATAAATGAATGAAATCAC-3′), and AP1 (Clontech) were used for cloning the Abca8b cDNA. Fragments were assembled with the Seqman (DNAStar) program. Primers for Abca5 were Abca5-F01 (5′-AATTTCCGAGCTCCGTCACTTAC-3′) and Abca5-R4 (5′-GGCTGGGTGAGCATAAACATTCT-3′). Primers for Abca9 were Abca9-F1 (5′-AAGTTCAGGATGAGAAGAGAGACC-3′) and Abca9-R4 (5′-CTCTTCCTGTGGGAGGAGCTTC-3′).

Rapid amplification of cDNA ends (RACE) was performed with Marathon-Ready mouse liver cDNA (Clontech) according to the manufacturer’s protocol. Primers used for 5′ RACE were Mabca5_R01 (5′-TGTTCTGGTCTGTCTCCAAACT-3′), Mabca6_R01 (5′-CTTGAGCAGAATCTTGTGCAGAAG-3′), Mabca8a_R01 (5′-TAAGTAGTATAAGTGCTGTGTACA-3′), Mabca8b_R01 (5′-GACTCCCTTTTCAGTCTCCATTTC-3′), and Mabca9_R01 (5′-ATGACTCTCACTGCATCTATCGAG-3′). The PCR products were purified with GFX columns (Amersham Pharmacia Biotech Inc.) and cloned into the pCR2.1-Topo vector (Invitrogen) for sequencing. Sequencing reactions were performed with a DNA sequencing kit (Applied Biosystems) and analyzed on an ABI 373A automated sequencer. The GCG, BLAST (http://www.ncbi.nlm.nih.gov/BLAST), and BLAT (http://genome.ucsc.edu/) programs were used for manipulating the sequences and determining the exon/intron structure.

Radiation hybrid mapping. Mouse ABC genes were mapped by using the T31 mouse/hamster radiation hybrid panel (Research Genetics). PCR was performed with Amplitaq Gold Taq polymerase (Applied Biosystems) by using primers specific for Abca6 (Abca6-F, 5′-TAGGACTGCATATGGGTCTG-3′; Abca6-R, 5′-GTTCATTATCTGCTGGGTTATG-3′) and Abca8b (Abca8b-F, 5′-TGGAGACTGAAAAGGGAGTC-3′; Abca8b-R, 5′-GTTGCAGCTACCTTCTCCAT-3′). Samples were heated at 94°C for 10 min, followed by 35 cycles of 94°C for 30 s, 65°C for 15 s, and 72°C for 30 s, and a final extension at 72°C for 5 min. Reaction products were resolved on a 1.2% TAE agarose gel, and data were submitted to The Jackson Laboratory Mouse Radiation Hybrid Database at http://www.jax.org/.

Computer analysis. Human and mouse genomic sequences were compared by using Pipmaker (http://nog.cse.psu.edu/pipmaker/) (Schwartz et al. 2000). Sequences were obtained from public (NCBI) and private (Celera) databases. From public databases, we used human BAC clones AC015844, AC005922, AC005495, and AC007763, localized at 17q24.2. For the mouse contig, we used the public sequence of BAC clones AC021574, AC023441, AC023939, AC024966, AL603792, AL662821, and AL671964. Repeat elements were masked by Repeatmasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). For phylogenetic analysis, complete amino acid sequences were aligned with CLUSTALW. Gaps in the alignment were removed with Bioedit, and 1530 amino acid sites were used to generate the neighbor-joining tree with MEGA2 (Kumar et al. 2001). The stability of branches was tested with 1000 bootstrap resamplings. The tree was visualized with Treeview (Page 1996). The proportions of synonymous and non-synonymous substitutions were calculated by the Pamilo-Bianchi-Li method with the

8 T. ANNILO ET AL.: ABC GENE CLUSTER EVOLUTION

MEGA2 program (Kumar et al. 2001). CpG islands were identified with the program CpGPlot/CpGReport (http://www.ebi.ac.uk/emboss/cpgplot/) with the following parameters: window size 400, Obs/Exp 0.6, MinPC 50, and length 200 or 500. The ratio of observed over expected frequency of CpG dinucleotides was calculated as $\mathrm{CpG}_{o/e} = f_{\mathrm{CpG}} / (f_{\mathrm{C}} \cdot f_{\mathrm{G}})$, where $f_{\mathrm{C}}$ and $f_{\mathrm{G}}$ are the frequencies of the respective nucleotides and $f_{\mathrm{CpG}}$ is the frequency of the CpG dinucleotide. Statistical tests of gene conversion events were performed with the program GENECONV version 1.81, developed by Stanley Sawyer (http://www.math.wustl.edu/~sawyer). Accession numbers for human sequences were ABCA5, NM_018672; ABCA6, NM_080284; ABCA8, NM_007168; ABCA9, NM_080283; and ABCA10, NM_080282.
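The observed/expected ratio defined above can be computed directly from a sequence. The Python sketch below is an illustration only: it assumes single-nucleotide frequencies are taken over all positions and the dinucleotide frequency over all overlapping adjacent pairs, which may differ slightly from the normalization inside CpGPlot/CpGReport. The function name `dinucleotide_obs_exp` and the toy sequence are hypothetical.

```python
from collections import Counter


def dinucleotide_obs_exp(seq: str, pair: str = "CG") -> float:
    """Observed/expected ratio f_XY / (f_X * f_Y) for the dinucleotide `pair`.

    Single-nucleotide frequencies are counted over all n positions; the
    dinucleotide frequency is counted over all n - 1 overlapping pairs.
    Returns NaN when either nucleotide is absent (the ratio is undefined).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2 or len(pair) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter pair")
    counts = Counter(seq)
    f_x = counts[pair[0]] / n
    f_y = counts[pair[1]] / n
    if f_x == 0 or f_y == 0:
        return float("nan")
    observed_pairs = sum(1 for i in range(n - 1) if seq[i:i + 2] == pair)
    f_xy = observed_pairs / (n - 1)
    return f_xy / (f_x * f_y)


# CpG, TpG, and CpA ratios for a short toy sequence (not data from this study).
toy = "ACGTTGCACGATGCA"
for dinucleotide in ("CG", "TG", "CA"):
    print(dinucleotide, round(dinucleotide_obs_exp(toy, dinucleotide), 2))
```

Changing `pair` gives the TpG and CpA ratios used later in the Results with the same definition.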

mRNA expression analysis. Real-time PCR was performed on the ABI PRISM 7700 with the SYBR Green Quantitect kit (QIAGEN Inc.) and the mouse multiple tissue cDNA panel (Clontech) according to the manufacturers’ recommendations. The primers for real-time PCR were as follows: for Abca5, 5′-TTGGCTTGTGTGGCAATCAC-3′ (forward) and 5′-CTGCAGGTAGGGCATGATTA-3′ (reverse); for Abca6, 5′-TAGGACTGCATATGGGTCTG-3′ (forward) and 5′-GTTCATTATCTGCTGGGTTATG-3′ (reverse); for Abca8a, 5′-GACTCCACCTTTATGATTGTG-3′ (forward) and 5′-GTTGGTGAATATGACCCCCA-3′ (reverse); for Abca8b, 5′-ATGGAGCTACTGACCTGTCT-3′ (forward) and 5′-GTTGCAGCTACCTTCTCCAT-3′ (reverse); for G3PDH, 5′-ATGGGTGTGAACCACGAGAA-3′ (forward) and 5′-ATGGCATGGACTGTGGTCAT-3′ (reverse). The PCR products ranged in size from 145 bp to 220 bp. All PCR products were sequenced to confirm the specificity of the amplification. cDNAs from the Clontech Multiple Tissue Panel were diluted 10 times, and 2.5 µL of diluted cDNA was used in a 50-µL reaction mixture containing primers at 0.3 µM and MgCl2 at 2.5 mM. Real-time PCR was performed at 50°C for 2 min and 95°C for 15 min, then 40 cycles of 95°C for 15 s, 56°C for 30 s, and 72°C for 30 s, followed by a 15-s hold at 50°C in order to suppress any fluorescence readings caused by the generation of primer-dimers.
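The reaction set-up above implies a few derived quantities that are easy to lose track of: the overall dilution of the panel cDNA inside each reaction and the absolute amounts of primer and MgCl2. The Python sketch below only re-derives these from the stated volumes and concentrations; the constant and function names are hypothetical.

```python
PANEL_DILUTION = 10.0   # panel cDNA diluted 10 times before use
TEMPLATE_UL = 2.5       # µL of diluted cDNA per reaction
REACTION_UL = 50.0      # µL total reaction volume
PRIMER_UM = 0.3         # final concentration of each primer (µM)
MGCL2_MM = 2.5          # final MgCl2 concentration (mM)


def overall_dilution(panel_dilution: float, template_ul: float, reaction_ul: float) -> float:
    """Fold-dilution of the original panel cDNA inside the final reaction."""
    return panel_dilution * reaction_ul / template_ul


def undiluted_equivalent_ul(panel_dilution: float, template_ul: float) -> float:
    """Volume of undiluted panel cDNA represented in one reaction (µL)."""
    return template_ul / panel_dilution


def amount_in_reaction(concentration: float, volume_ul: float) -> float:
    """Concentration x volume: µM x µL gives pmol, mM x µL gives nmol."""
    return concentration * volume_ul


print(f"overall cDNA dilution: 1:{overall_dilution(PANEL_DILUTION, TEMPLATE_UL, REACTION_UL):.0f}")
print(f"undiluted cDNA per reaction: {undiluted_equivalent_ul(PANEL_DILUTION, TEMPLATE_UL):.2f} µL")
print(f"each primer: {amount_in_reaction(PRIMER_UM, REACTION_UL):.0f} pmol")
print(f"MgCl2: {amount_in_reaction(MGCL2_MM, REACTION_UL):.0f} nmol")
```

With the stated values this gives a 1:200 overall dilution (0.25 µL of undiluted panel cDNA per reaction), 15 pmol of each primer, and 125 nmol of MgCl2.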

For Northern analysis, an Abca9-specific probe corresponding to nucleotides 320–1666 was generated by using primers Abca9-F1 (5′-AAGTTCAGGATGAGAAGAGAGACC-3′) and Abca9-R1 (5′-ACTGGTTCCAGTGAGTCATTCAG-3′). The probe was radioactively labeled with Ready-To-Go DNA Labelling Beads and [α-32P]dCTP (Amersham/Pharmacia Biotech). The mouse multiple tissue RNA blot (Clontech), containing 2 µg of poly(A)+ RNA in each lane, was hybridized according to the manufacturer’s recommended protocol.

Results and Discussion

Characterization of five novel mouse Abca genes. By performing database searches, we identified several new mouse EST sequences that were similar to known ABC transporters and appeared to represent previously uncharacterized genes. We combined the clustering of ESTs with the analysis of public and private genomic DNA sequences to predict the exon-intron structure of the new genes. Further analysis of this locus identified a total of five new mouse genes that form a head-to-tail cluster covering about 400 kb of genomic DNA. From this information, we designed primers for PCR and RACE to characterize the full-length cDNAs. The genes were named Abca5, Abca6, Abca8a, Abca8b, and Abca9 according to the closest known human homolog. For Abca6 cDNA cloning, we used the partial ORF of two clones, EST1890581 (AI265315) and EST1431241 (AA985977). The full-length Abca8b sequence was obtained based on EST761201 (AA387878). Coding regions for the three remaining genes, Abca5, Abca9, and Abca8a, were also predicted in silico in a similar way and then confirmed experimentally. The exon-intron structure was determined by aligning the cDNA to the genomic sequence. Characteristics of the novel genes are shown in Table 1. Eighteen of the 39 exons are identical in size in all five genes, and, with the exception of the untranslated part of the cDNA, only one exon, exon 27, has a different size in every gene. In addition, all differences in the sizes of coding exons are multiples of 3 bp. This indicates that splice sites are more conserved than exon sizes, and that only those nucleotide insertions and deletions that do not disrupt the open reading frame within the established splicing pattern have been fixed in the course of evolution.
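The exon-size observations above can be checked mechanically against Table 1. The Python sketch below encodes only the coding-exon sizes from Table 1 (exon 2c, exons 3–38, and exon 39c) and tests three properties: exons of identical size in all five genes, exons whose size differs in every gene, and whether all size differences are multiples of 3 bp. The dictionary layout and function names are illustrative, not part of the original analysis.

```python
# Coding-exon sizes (bp) from Table 1, in the order Abca5, Abca6, Abca9, Abca8a, Abca8b.
GENES = ("Abca5", "Abca6", "Abca9", "Abca8a", "Abca8b")
EXONS = {
    "2c": (102, 96, 96, 96, 96), "3": (205, 205, 208, 205, 205),
    "4": (162, 159, 165, 165, 162), "5": (89, 104, 104, 104, 104),
    "6": (230, 227, 227, 227, 227), "7": (142, 142, 142, 142, 142),
    "8": (189, 186, 186, 186, 186), "9": (148, 148, 148, 142, 148),
    "10": (169, 169, 169, 166, 169), "11": (59, 59, 59, 59, 59),
    "12": (111, 111, 111, 111, 111), "13": (176, 176, 176, 176, 176),
    "14": (120, 120, 120, 120, 120), "15": (139, 139, 139, 139, 139),
    "16": (91, 91, 91, 91, 91), "17": (140, 140, 140, 140, 140),
    "18": (120, 117, 120, 120, 120), "19": (202, 184, 196, 199, 196),
    "20": (170, 167, 167, 167, 167), "21": (128, 134, 134, 134, 134),
    "22": (138, 138, 138, 138, 138), "23": (114, 108, 108, 108, 108),
    "24": (171, 174, 174, 174, 174), "25": (114, 114, 114, 114, 114),
    "26": (135, 120, 120, 120, 120), "27": (75, 78, 69, 66, 63),
    "28": (92, 92, 92, 92, 92), "29": (127, 121, 121, 121, 121),
    "30": (118, 118, 118, 118, 118), "31": (92, 92, 92, 92, 92),
    "32": (176, 176, 161, 161, 161), "33": (76, 76, 76, 76, 76),
    "34": (95, 95, 95, 95, 95), "35": (120, 120, 120, 120, 120),
    "36": (150, 141, 141, 141, 141), "37": (80, 80, 80, 80, 80),
    "38": (56, 56, 56, 56, 56), "39c": (108, 102, 99, 99, 102),
}


def identical_in_all(exons):
    """Exons whose size is the same in every gene."""
    return [e for e, sizes in exons.items() if len(set(sizes)) == 1]


def different_in_every_gene(exons):
    """Exons whose size is distinct in each of the genes."""
    return [e for e, sizes in exons.items() if len(set(sizes)) == len(sizes)]


def frame_preserving(exons):
    """True if every size difference within each exon is a multiple of 3."""
    return all((s - sizes[0]) % 3 == 0 for sizes in exons.values() for s in sizes)


def coding_length(exons, gene_index):
    """Sum of coding-exon sizes for one gene (column of the table)."""
    return sum(sizes[gene_index] for sizes in exons.values())


print("identical in all five genes:", len(identical_in_all(EXONS)))
print("different in every gene:", different_in_every_gene(EXONS))
print("all differences multiples of 3:", frame_preserving(EXONS))
print("Abca5 coding length (bp):", coding_length(EXONS, GENES.index("Abca5")))
```

With these values the sketch reports 18 identical exons, exon 27 as the only exon that differs in every gene, frame-preserving differences throughout, and a 4929-bp Abca5 coding sequence, which matches the open reading frame length given in the Abstract.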

Using a mouse radiation hybrid panel, we determined the chromosomal location of Abca6 and Abca8b. Both genes were located between the markers D11Mit336 and D11Mit100 on Chr 11. The mapping data provide independent support for the clustered arrangement of these genes.

With 5′-RACE, an untranslated exon upstream of the coding region was identified for four of the five genes. The size of the first intron, which separates this non-coding exon from the first coding exon, varies greatly, from about 1.1 kb in Abca6 to more than 14 kb in Abca8b.

All five genes have 38 coding exons but encode proteins of slightly different sizes (Table 2). An alignment of the predicted amino acid sequences is


Table 1. The genomic organization of five clustered mouse Abca genes.

        Abca5          Abca6          Abca9          Abca8a         Abca8b
Exon    Exon   Intron  Exon   Intron  Exon   Intron  Exon   Intron  Exon   Intron
number  (bp)   (kb)    (bp)   (kb)    (bp)   (kb)    (bp)   (kb)    (bp)   (kb)
1       >191   8.2     >157   1.1     >200   1.9     n.d.   n.d.    >205   >14
2       120    1.2     107    1.9     146    2.5     102    1.4     102    0.6
2c      102            96             96             96             96
3       205    1.0     205    0.9     208    2.6     205    3.4     205    0.8
4       162    6.6     159    1.9     165    1.3     165    2.2     162    2.0
5       89     1.8     104    0.8     104    0.5     104    0.6     104    1.1
6       230    3.0     227    2.4     227    2.8     227    3.9     227    1.3
7       142    1.0     142    4.7     142    0.5     142    0.3     142    0.3
8       189    1.8     186    2.7     186    2.2     186    2.9     186    0.7
9       148    1.0     148    1.4     148    3.0     142    1.3     148    1.8
10      169    6.9     169    1.5     169    1.4     166    2.1     169    1.6
11      59     3.0     59     >10     59     0.2     59     0.3     59     0.6
12      111    1.2     111    0.3     111    1.2     111    0.9     111    1.3
13      176    6.4     176    0.2     176    0.3     176    0.4     176    0.4
14      120    9.6     120    0.2     120    0.2     120    0.2     120    0.2
15      139    1.2     139    0.5     139    0.4     139    0.5     139    1.6
16      91     1.8     91     1.5     91     2.5     91     0.6     91     2.2
17      140    1.4     140    2.3     140    1.9     140    2.7     140    0.7
18      120    1.6     117    1.6     120    1.2     120    1.9     120    4.5
19      202    1.4     184    0.3     196    1.9     199    2.9     196    1.9
20      170    2.5     167    1.4     167    0.5     167    1.3     167    1.4
21      128    0.4     134    1.2     134    1.0     134    1.1     134    0.9
22      138    1.0     138    6.1     138    2.3     138    0.7     138    1.8
23      114    2.1     108    5.3     108    1.0     108    1.4     108    0.4
24      171    1.6     174    0.2     174    3.2     174    1.3     174    0.1
25      114    0.3     114    4.7     114    6.3     114    1.9     114    1.2
26      135    0.7     120    1.3     120    2.5     120    8.0     120    0.7
27      75     0.9     78     1.4     69     0.8     66     0.7     63     0.4
28      92     5.9     92     0.6     92     1.5     92     1.4     92     3.3
29      127    0.2     121    0.6     121    1.1     121    2.0     121    0.5
30      118    0.1     118    2.3     118    0.7     118    1.8     118    0.6
31      92     0.7     92     0.3     92     2.7     92     4.8     92     2.2
32      176    0.5     176    1.0     161    1.4     161    0.4     161    0.5
33      76     0.1     76     0.1     76     0.1     76     0.1     76     0.1
34      95     0.8     95     2.2     95     1.0     95     0.4     95     0.8
35      120    1.0     120    0.1     120    0.2     120    0.2     120    0.2
36      150    0.8     141    1.3     141    0.5     141    1.7     141    0.5
37      80     0.9     80     1.5     80     4.0     80     1.2     80     0.9
38      56     0.7     56     0.1     56     0.4     56     0.4     56     0.1
39      >317           >251           >1900          >205           >340
39c     108            102            99             99             102

Rows with exon numbers 2c and 39c designate the coding portion of the respective exon. If the size of the exon is conserved in at least three genes, it is shown in boldface. n.d., not determined.


shown in Fig. 1. All five proteins are full transporters, containing two highly conserved NBFs and two TMDs. However, according to secondary structure prediction with the TMHMM program, the first transmembrane helix of each TMD is separated from the set of other helices by a large extracellular/exocytoplasmic loop. Analysis of the location of transmembrane helices in the other members of the ABCA subfamily indicates that they may have a similar structural arrangement. Indeed, the presence of the large extracellular/exocytoplasmic domain has been experimentally supported for ABCA1 (Tanaka et al. 2001) and ABCA4 (Bungert et al. 2001). This loop is larger in the N-terminal half of the protein; in the proteins identified in this study it is about 170 aa long, whereas in ABCA1 and ABCA4 the N-terminal loop is approximately 600 aa. Since the very N-terminal sequence (amino acids 1–50) containing the first transmembrane helix is conserved in all ABCA subfamily proteins, the length of the first large extracellular loop accounts for most of the variability in overall protein size within this subfamily.

Comparative analysis of human and mouse ABCA cluster. To compare the genomic organization of the ABCA cluster between the two mammalian species, we assembled a 408,875-bp contig of mouse genomic DNA as well as a 467,862-bp contig of human DNA from sequences in public and private databases. Comparative analysis between the mouse Abca cluster and the corresponding human syntenic locus was performed with the Pipmaker program (Schwartz et al. 2000). With the default settings, a typical result for a cluster of related genes was obtained, in which each gene aligns with every other gene in the cluster. To eliminate these multiple matches, the “chaining” or “single coverage” option was applied (Fig. 2A). The resulting dot plot shows two breaks in the alignment, suggesting lineage-specific gene loss at these positions. Otherwise, homology can be found along the whole cluster. While identity in the coding regions ranges from 76% for Abca6 to 86% for Abca5, the non-repetitive portions of the introns also show a detectable degree of identity.

The overall G+C content of the ABCA cluster region is similar in the two species (37.8% in mouse and 36.4% in human) but lower than the human genome-wide average of 41% (Lander et al. 2001). In accordance with the general under-representation of CpG dinucleotides, the average CpG percentage (0.8% for mouse and 0.7% for human) is about one-fifth of what would be expected (3.6% for mouse and 3.3% for human) from these G+C fractions.
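The expected CpG percentages quoted above follow from the G+C fractions if one assumes that C and G each make up half of the G+C content and that neighboring bases are independent, so that the expected CpG frequency is $f_{\mathrm{C}} \cdot f_{\mathrm{G}} = (\mathrm{GC}/2)^2$. The Python sketch below reproduces that arithmetic under this assumption; the function name is hypothetical.

```python
def expected_cpg_percent(gc_fraction: float) -> float:
    """Expected CpG frequency (%) assuming f_C = f_G = GC/2 and independent neighbors."""
    f_c = f_g = gc_fraction / 2.0
    return 100.0 * f_c * f_g


for species, gc, observed in (("mouse", 0.378, 0.8), ("human", 0.364, 0.7)):
    expected = expected_cpg_percent(gc)
    # prints "mouse: expected 3.6%, observed/expected 0.22", then the human line (3.3%, 0.21)
    print(f"{species}: expected {expected:.1f}%, observed/expected {observed / expected:.2f}")
```

Both observed/expected values come out near 0.2, i.e., the "about one-fifth" stated in the text.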

The 5′ ends of all housekeeping genes and many tissue-specific genes are located within CpG islands, where the occurrence of CpG dinucleotides is closer to the predicted frequency (Cross and Bird 1995). We searched for CpG islands with the CpGReport program and found only one conserved non-repeat CpG island above the 500-bp length threshold. This CpG island surrounds the first, non-coding exon of the ABCA5/Abca5 gene, starting 817 bp upstream of the first splice-donor site in human and 658 bp upstream in mouse, and extends about 500 bp into the first intron.

The length of this CpG island, as calculated with a 500-bp threshold and the parameters specified in Materials and Methods, is 1346 bp in human and 1162 bp in mouse. The G+C content of this island is 63% and 66%, and the ratio of observed over expected CpG dinucleotides is 0.96 and 0.85, in mouse and human, respectively.
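The island lengths above depend on a sliding-window criterion. The Python sketch below is modeled loosely on the CpGPlot parameters listed in Materials and Methods (window 400, observed/expected at least 0.6, G+C at least 50%, minimum length 200 or 500); it is not the EMBOSS implementation, its boundary handling is an assumption, and the function names and synthetic test sequence are hypothetical.

```python
def window_is_cpg_rich(window: str, min_obs_exp: float, min_gc_percent: float) -> bool:
    """Apply G+C and CpG observed/expected thresholds to one window."""
    n = len(window)
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_percent = 100.0 * (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0  # (cpg/n) / ((c/n) * (g/n))
    return gc_percent >= min_gc_percent and obs_exp >= min_obs_exp


def find_cpg_islands(seq, window=400, min_obs_exp=0.6, min_gc_percent=50.0, min_length=200):
    """Return half-open (start, end) intervals covered by runs of qualifying windows."""
    seq = seq.upper()
    islands = []
    for start in range(len(seq) - window + 1):
        if not window_is_cpg_rich(seq[start:start + window], min_obs_exp, min_gc_percent):
            continue
        if islands and start == islands[-1][1] - window + 1:
            islands[-1] = (islands[-1][0], start + window)  # extend the current run
        else:
            islands.append((start, start + window))  # start a new run
    return [(a, b) for a, b in islands if b - a >= min_length]


# Synthetic sequence (not ABCA data): a 600-bp CpG-dense block between AT-only flanks.
demo = "AT" * 300 + "CG" * 300 + "AT" * 300
print(find_cpg_islands(demo, min_length=500))
```

On this synthetic input the sketch reports one interval, (400, 1400), wider than the 600-bp block itself because windows that only half-overlap the block still pass the 50% G+C cutoff; dedicated tools refine island boundaries in their own ways.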

In mouse, the first exon of Abca8b is also located within a CpG-rich sequence, but this region is only about 200 bp long and, compared with the Abca5-associated CpG island, has a much lower G+C content and observed-to-expected CpG ratio (56% and 0.74, respectively). In addition, the homologous region in human, flanking the first exon of ABCA8, is CpG-depleted (observed/expected ratio 0.2) despite 68% identity between the two species.

This raised the question of whether the Abca8b-associated CpG island is in the process of decaying, and whether the homologous region in ABCA8 also contained a CpG island relatively recently. Since the

Table 2. Comparison of the structural characteristics of the clustered Abca genes with Abca1.

Gene    Gene size (kb)  Coding exons  Amino acids  Closest human homolog (identity/similarity at the amino acid level)
Abca1   124             49            2261         ABCA1: 94/96%
Abca5   65              38            1642         ABCA5: 89/93%
Abca6   79.5            38            1624         ABCA6: 66/80%
Abca8a  66              38            1620         ABCA8: 67/78%
Abca8b  61              38            1620         ABCA8: 75/84%
Abca9   67              38            1623         ABCA9: 78/87%

Accession numbers for Abca1 were: mouse gene, AF287263; mouse protein, P41233; human protein, AAK43526.
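Assuming each open reading frame runs from the ATG through the stop codon, ORF length in nucleotides is 3 × (protein length + 1). The Python sketch below applies that relation to the protein lengths in Table 2 and recovers the 4863–4929-nucleotide range quoted in the Abstract; the dictionary and function names are hypothetical.

```python
# Predicted protein lengths (amino acids) from Table 2.
PROTEIN_LENGTH_AA = {
    "Abca1": 2261, "Abca5": 1642, "Abca6": 1624,
    "Abca8a": 1620, "Abca8b": 1620, "Abca9": 1623,
}


def orf_length_nt(protein_aa: int) -> int:
    """ORF length in nucleotides, counting the stop codon: 3 * (aa + 1)."""
    return 3 * (protein_aa + 1)


# Restrict to the five clustered genes (exclude the Abca1 reference).
clustered = {gene: orf_length_nt(aa) for gene, aa in PROTEIN_LENGTH_AA.items() if gene != "Abca1"}
print(clustered)
print("range:", min(clustered.values()), "-", max(clustered.values()))
```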


Fig. 1. (Continued)


depletion of CpG dinucleotides is caused by the conversion of methylated CpG to TpG and CpA dinucleotides, we tested whether we could detect an over-representation of TpG and CpA dinucleotides in this region. Indeed, we found that in this particular region of Abca8b, TpG dinucleotides occur 1.3 times and CpA dinucleotides 1.5 times more often

than expected from the frequencies of the individual nucleotides. In the homologous region of ABCA8, the observed/expected ratio for TpG dinucleotides is even higher (1.7), and that for CpA dinucleotides is 1.3. Although an over-representation of TpG and CpA dinucleotides does not prove that they derive from the conversion of CpG dinucleotides, it supports the

Fig. 1. The amino acid alignment of five novel mouse ABC proteins with human ABCA5. Identical residues are shaded in black, similar residues in gray. Dashes represent gaps. The Walker A, Walker B, and ABC signature motifs are underlined and labeled accordingly. Transmembrane regions are indicated by helices below the alignment. Since the prediction of individual membrane-spanning helices varies between proteins, only the well-resolved first segment of each TMD is shown separately.


hypothesis that the CpG island located at the 5′ end of Abca8b is decaying and that the corresponding region in ABCA8 has experienced erosion of a formerly existing CpG island. The decay of the CpG island may reflect an ongoing change in the expression pattern of the ABCA8 and Abca8b genes.
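The reasoning above relies on the fact that deamination of a methylated cytosine in a CpG yields TpG on that strand, which reads as CpA on the complementary strand. The toy Python sketch below makes this bookkeeping concrete: it converts every CpG in a made-up sequence, alternating between the TpG and CpA outcomes, and compares observed/expected ratios before and after. It is not a model of real mutation rates or of the Abca8b/ABCA8 sequences, and the function names and sequence are hypothetical.

```python
from collections import Counter


def obs_exp(seq: str, pair: str) -> float:
    """f_XY / (f_X * f_Y) over the n - 1 adjacent pairs; 0.0 if a nucleotide is absent."""
    n = len(seq)
    counts = Counter(seq)
    expected = (counts[pair[0]] / n) * (counts[pair[1]] / n)
    if expected == 0:
        return 0.0
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == pair) / (n - 1)
    return observed / expected


def decay_cpgs(seq: str) -> str:
    """Replace every CpG, alternately by TpG (C->T on this strand) and
    CpA (C->T on the complementary strand, seen here as G->A)."""
    out = list(seq)
    to_tpg = True
    i = 0
    while i < len(out) - 1:
        if out[i] == "C" and out[i + 1] == "G":
            if to_tpg:
                out[i] = "T"
            else:
                out[i + 1] = "A"
            to_tpg = not to_tpg
            i += 2  # neither substitution can create a new CpG, so skip the pair
        else:
            i += 1
    return "".join(out)


before = "ACGTTCGAACGTCCGGAT"
after = decay_cpgs(before)
for dinucleotide in ("CG", "TG", "CA"):
    print(dinucleotide, round(obs_exp(before, dinucleotide), 2), "->", round(obs_exp(after, dinucleotide), 2))
```

In the toy sequence the CpG ratio drops to zero while the TpG and CpA ratios rise from zero, the qualitative pattern the text looks for; as noted above, such an excess supports, but does not prove, CpG decay.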

One repeat-associated CpG island, 743 bp in length (G+C content 66% and observed/expected ratio 0.63), was identified within intron 12 of the human ABCA10 gene. The repetitive nature of this sequence was confirmed by Repeatmasker analysis and also by the large number of hits obtained in similarity searches against the human genomic sequence.

Phylogenetic analysis. To examine the evolution of the ABCA subfamily, we aligned 14 ABCA subfamily proteins from human and mouse (five of them described in this study), and 1530 amino acid sites were used for phylogenetic analysis. A neighbor-joining tree (Fig. 2B) shows that the genes clustered on human Chr 17 and mouse Chr 11 form a well-defined group in the tree. Although both the human and mouse clusters contain five ABCA subfamily members, not all of them have a one-to-one orthologous relationship.
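For readers unfamiliar with the neighbor-joining method used above (run in MEGA2 on 1530 aligned sites), the Python sketch below implements the textbook Saitou–Nei steps: choose the pair minimizing the Q-criterion, estimate the two branch lengths, and update distances to the new internal node. It is not MEGA2, performs no bootstrapping, and the taxa a–e and their distances are a hypothetical toy matrix rather than ABCA data.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Textbook neighbor joining on a dict-of-dicts distance matrix (no self-entries).

    Returns (joins, final_edge): each join is (a, b, branch_a, branch_b, new_node);
    final_edge connects the last two remaining nodes.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    joins = []
    while len(d) > 2:
        nodes = list(d)
        n = len(nodes)
        r = {a: sum(d[a].values()) for a in nodes}  # row sums
        # Q-criterion: join the pair minimizing (n - 2) d(a, b) - r(a) - r(b)
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        branch_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        u = f"node{len(joins) + 1}"
        # distance from the new internal node u to every remaining node
        new_row = {c: 0.5 * (d[a][c] + d[b][c] - d[a][b]) for c in nodes if c not in (a, b)}
        del d[a], d[b]
        for c, row in d.items():
            del row[a], row[b]
            row[u] = new_row[c]
        d[u] = new_row
        joins.append((a, b, branch_a, branch_b, u))
    x, y = d
    return joins, (x, y, d[x][y])


def from_pairs(pairs):
    """Build a symmetric dict-of-dicts from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


toy = from_pairs({
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
})
joins, final_edge = neighbor_joining(toy)
for join in joins:
    print(join)
print(final_edge)
```

Because this toy matrix is additive, the sketch recovers the underlying tree exactly; for example, a and b are joined first with branch lengths 2 and 3. Ties in the Q-criterion are broken by input order, which is one of several choices real programs make differently.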

Primary structure comparison and phylogenetic analysis suggest the following scenario for the generation of this cluster. The first duplication of ABCA5, the oldest gene in the cluster, generated the ancestor of all other genes in the cluster, which by a second duplication generated the ABCA6/10 and ABCA8/9 ancestors. After the subsequent duplication that generated ABCA8 and ABCA9, another duplication of ABCA8 occurred not long before the divergence of the primate and rodent lineages, producing two ABCA8-like genes (ABCA8A and ABCA8B). After the divergence of the primate and rodent lineages, the Abca8a ortholog was lost from the primate genome, so that present-day ABCA8 is actually the ortholog of rodent Abca8b. In addition, only the human genome contains ABCA10 at present. Since phylogenetic analysis dates the origin of ABCA10 before the split of the primate and rodent lineages, the respective ortholog has most probably been lost from the rodent genome. These gene loss events have resulted in the present set of genes, in which each species has one gene (ABCA10 in human and Abca8a in mouse) whose ortholog is absent from the other species.
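The duplication-and-loss scenario above reduces to simple set bookkeeping. The Python sketch below assumes, as the text proposes, six loci present just before the primate–rodent split, with Abca8a lost in the primate lineage and the ABCA10 ortholog lost in the rodent lineage; the locus labels A5–A10 are hypothetical shorthand.

```python
# Present-day (human, mouse) gene for each ancestral locus; None marks a lineage-specific loss.
LOCI = {
    "A5": ("ABCA5", "Abca5"),
    "A6": ("ABCA6", "Abca6"),
    "A8A": (None, "Abca8a"),   # lost in the primate lineage after the split
    "A8B": ("ABCA8", "Abca8b"),
    "A9": ("ABCA9", "Abca9"),
    "A10": ("ABCA10", None),   # most probably lost in the rodent lineage
}

human_genes = [h for h, _ in LOCI.values() if h is not None]
mouse_genes = [m for _, m in LOCI.values() if m is not None]
one_to_one = [(h, m) for h, m in LOCI.values() if h is not None and m is not None]
human_only = [h for h, m in LOCI.values() if m is None]
mouse_only = [m for h, m in LOCI.values() if h is None]

print(len(human_genes), len(mouse_genes))  # both clusters keep five genes
print(one_to_one)
print(human_only, mouse_only)
```

Each cluster ends up with five genes, four one-to-one ortholog pairs (including ABCA8 with Abca8b), and one species-specific gene.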

Recently, the first Fugu rubripes genome assembly became available (http://www.jgi.doe.gov/fugu/index.html). To find potential fish orthologs of the human Chr 17 genes, we searched the Fugu genome and found that ABCA5 is the only gene from this cluster that has an ortholog in the Fugu rubripes genome, located on scaffold 2102. This supports the hypothesis that ABCA5 is the founder gene of the cluster and that the expansion of the cluster occurred after the split of the teleost lineage.

Fig. 2. The gene content of the mouse and human ABCA cluster is different. (A) Dot-plot comparison of human (x-axis) and mouse (y-axis) genomic DNA harboring the cluster of ABCA genes. Except at the positions where gene loss in one lineage has created a break in the alignment, conservation of the genomic sequence is observed across the entire locus. (B) The phylogenetic relationship of 15 human, mouse, and Fugu proteins from the ABCA subfamily. The scale bar indicates evolutionary distance in substitutions per amino acid. Bootstrap values (in percent) out of 1000 replications are indicated. The alignment was generated with CLUSTALW, edited manually by using BioEdit, and subjected to neighbor-joining analysis.

Notably, ABCA5 is also the most conserved protein within this cluster, with 89% identity between mouse and human. Comparison of proteins shows different degrees of conservation (Table 2) between different pairs of orthologs: while ABCA1/Abca1 peptide sequences are 94% identical in mouse and human, ABCA9/Abca9 and ABCA8/Abca8b are only 78% and 75% identical, respectively.

Gene conversion among ABCA genes. While recently duplicated genes may accumulate substitutions and diverge more rapidly than other genes in multigene families, gene conversion is a process that homogenizes sequences. If a region or exon of a multigene family member is more closely related to another gene in the same species than to the orthologous gene in a different species, this could be the result of a gene conversion event, in which a

Fig. 3. Phylogenetic analysis depicting different relationships of full ABCA cDNAs and nucleotide-binding domains. Neighbor-joining trees were constructed by using Kimura 2-parameter distances for full coding sequences (A), exons 11–15, corresponding to nucleotide-binding domain 1 (B), and exons 30–35, corresponding to nucleotide-binding domain 2 (C).


region of one gene has been replaced by the DNAsequence from another gene (Ohta 2000; Slightomet al. 1980). If the converted region is relativelylarge, this may result in ambiguous relationships

among gene family members (White and Crother2000). When phylogenetic analysis is performedwith full cDNAs, sequences form orthologous pairs,with ABCA8 showing the closest relationship to

Table 3. Gene conversion events between the cDNAs transcribed from the human and mouse ABCA cluster as identifiedby the program GENECONV.

Gene BC KA Sequence 1 Sequence 2

names P value Begin End Begin End Length Poly Dif

Abca6;Abca9 1.60e-07 4113 4174 4113 4174 62 37 0Abca6;Abca9 1.53e-07 4093 4193 4093 4193 101 64 5Abca8a;Abca8b 3.60e-13 1871 2036 1901 2066 166 94 1Abca8a;Abca8b 1.05e-11 1759 2116 1789 2146 358 207 24Abca8a;Abca8b 2.70e-07 4194 4280 4191 4277 87 55 0Abca8a;Abca8b 1.48e-07 4194 4304 4191 4301 111 66 1ABCA8;ABCA9 8.91e-04 4189 4375 4198 4384 187 111 5ABCA8;ABCA9 1.41e-02 4498 4621 4507 4630 124 79 3ABCA8;ABCA9 1.25e-20 3964 4867 3973 4876 904 579 66Abca8a;Abca9 7.16e-05 4540 4652 4549 4661 113 77 5

Option gscale was used to allow different number of mismatches between converted regions. Nucleotide numbering starts from the ATGstart codon. BC KA Pvalue, Karlin-Altschul P-values that are Bonferroni-corrected for multiple sequence comparisons; Sequencel andSequence2, sequences identified in the first column, ‘‘Gene names’’; Begin, first nucleotide of the converted region; End, last nucleotide ofthe converted region; Length, length of the converted region in nucleotides; Poly, number of polymorphic sites within the identifiedfragment in the multiple alignment; Dif, number of differences between the two sequences within the identified fragment.

Fig. 4. (Continued)

16 T. ANNILO ET AL.: ABC GENE CLUSTER EVOLUTION

Abca8b (Fig. 3A). But the relationship of NBD se-quences is different. Two human genes, ABCA8 andABCA9, in NBD2 (exons 30–35), and two mousegenes, Abca8a and Abca8b, in both NBD1 (exons11–15) and NBD2, show closer relationship to eachother than any of them to the sequence from thesame locus from a different species. Such disparityprompted us to systematically analyze potentialgene conversions between tandemly clusteredABCA genes. Indeed, several regions with statisti-cally significant similarity supporting a hypothesisof gene conversion were detected by analyzing thecoding sequences with the program GENECONV(http://www.math.wustl.edu/�sawyer; Table 3).With a purely statistical approach, the boundaries ofthe detected gene conversion regions vary depend-ing on the number of mismatches allowed. Directcomparison of genomic sequence was used to fur-

ther delineate the boundaries of converted regions(which in a few cases involved intronic sequences).The high identity in non-coding regions aroundsome converted exons supports the hypothesis thatgene conversion rather than selection accounts forthe homogenizalion of the ABCA genes in thiscluster.Abca6 and Abca9: Between Abca6 and Abca9, a 101-bp fragment of exon 32 containing five mismatchesshow highly significant evidence of conversion. Or-thologous sequences differ at 19 and 18 positions inthis region for Abca6 and Abca9, respectively. Whenno mismatches are allowed, the identified region isreduced to 62 nucleotides. In contrast, both ortholo-gous pairs have nine differences within this 62-bpsequence.Abca8a and Abca8b: Two different regions showingstrong evidence of gene conversion were identified

Fig. 4. Alignment of two genomic regions where paralogous genes show greater similarity than orthologous pairs, sug-gesting gene conversion between the paralogous genes. (A) Region surrounding exons 33 and 34. (B) Exon 38, intron 38 andpart of the exon 39. Coding DNA is in uppercase, non-coding DNA in lowercase. Sequences that are highly similarbetween the two genes in one organism as a result of putative conversion are shaded. Double slashes indicate the regionwhere a large and very diverged region of the genomic DNA was removed from the alignment.

T. ANNILO ET AL.: ABC GENE CLUSTER EVOLUTION 17

for these mouse paralogs. One region includes exons14 to 16, and the second region starts at exon 33.The region of highest identity surrounds exon 15and includes 8 bp upstream and 35 bp downstreamof the exon itself, for a converted region of 201 bp,with only one nucleotide difference between thetwo genes. Interestingly, this is a non-synonymoussubstitution in the codon that specifies leucinc inAbca8a and phenylalanine in Abca8b. However,since all other proteins from this cluster have one ofthese two amino acids in that position, the substi-tution can be considered conservative. Althoughidentity in exons 14 and 16 is not as high as in otherconverted regions (88.5% and 90%, respectively,compared with 99% in exon 15 and 100% in exon33), Abca8a and Abca8b are still the most con-served gene pair in exon 14 and are related moreclosely to each other than to ABCA8 in exon 16.This could reflect relatively old conversions events;the sequences have accumulated ten synonymousand four non-synonymous substitutions in exon 14,and five synonymous and non-synonymous substi-tutions in exon 16. Around exon 33, the putativeconverted region also includes 87 bp of the preced-ing intron (82% identity vs. 65% identity betweenAbca8b and ABCA8) and a short intron betweenexons 33 and 34 (90% vs. 55% identity in 91 bpbetween Abca8b and ABCA8). Alignment of therespective genomic sequence is shown in Fig. 4A.Since exon 34 does not exhibit the same degreeof identity between Abca8a and Abca8b, the con-verted region probably extends only partially intothis exonABCAS and ABCA9: Several exons at the 3¢ end ofthese genes show evidence of gene conversion. Themost obvious is the region containing exons 33 and34, but also exon 36. GENECONV identifies exons33 and 34 (Fig. 4A) as possibly converted regionswith five differences and most of exon 36 with threedifferences. 
Allowing more mismatches identifies an even larger region, starting in the middle of exon 31 and extending to the end of the coding region (7.3% differences in 904 bp). As in the case of Abca8a and Abca8b, intron 33 (73 bp in ABCA8 and ABCA9) is also involved in conversion (Fig. 4A), showing the same degree of identity as the flanking exons (96%, 98%, and 96% for exons 33 and 34 and intron 33, respectively). In exon 36, ABCA8 and ABCA9 are 96.5% identical (five differences), whereas identity with the mouse orthologs is only 87% and 83% for ABCA8 and ABCA9, respectively.
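The dependence of fragment boundaries on the mismatch allowance is visible in Table 3: 62 bp with no differences versus 101 bp with five for Abca6/Abca9, and 904 bp with 66 differences (66/904 ≈ 7.3%) for ABCA8/ABCA9. GENECONV itself scores fragments against a multiple alignment, uses polymorphic sites, and reports Karlin-Altschul P values with a mismatch penalty set by gscale; none of that is reproduced here. The sketch below is a much simpler, hypothetical illustration of only the boundary effect: a sliding-window search for the longest pairwise stretch with at most k mismatches, with no significance testing. Coordinates are alignment columns, not cDNA positions.

```python
from typing import Optional, Tuple


def longest_fragment(seq1: str, seq2: str, max_mismatches: int) -> Optional[Tuple[int, int, int, int]]:
    """Longest aligned stretch with at most `max_mismatches` differences.

    Returns (begin, end, length, differences) in 1-based inclusive alignment
    coordinates, or None if no column agrees. Not GENECONV: no statistics.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if max_mismatches < 0:
        raise ValueError("max_mismatches must be non-negative")
    best_len = best_start = 0
    left = mismatches = 0
    for right in range(len(seq1)):
        mismatches += seq1[right] != seq2[right]
        # Shrink the window from the left until it fits the mismatch budget.
        while mismatches > max_mismatches:
            mismatches -= seq1[left] != seq2[left]
            left += 1
        if right - left + 1 > best_len:
            best_len, best_start = right - left + 1, left
    start, end = best_start, best_start + best_len - 1
    # Trim mismatching columns so the fragment begins and ends on agreements.
    while start <= end and seq1[start] != seq2[start]:
        start += 1
    while end >= start and seq1[end] != seq2[end]:
        end -= 1
    if start > end:
        return None
    diffs = sum(seq1[i] != seq2[i] for i in range(start, end + 1))
    return start + 1, end + 1, end - start + 1, diffs


# Toy data (made up): 20 columns with differences at columns 6 and 13 (1-based).
s1 = "A" * 20
s2 = "".join("C" if i in (5, 12) else "A" for i in range(20))
for k in range(3):
    print(k, longest_fragment(s1, s2, k))
```

Raising the allowance from zero to two mismatches grows the reported fragment from 7 to 14 to 20 columns, which is the same qualitative behavior the text describes for the real cDNAs.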

Another region in which intronic sequences have also been involved in conversion includes the last exons (38 and 39) of ABCA8 and ABCA9 (Fig. 4B). The short intron (about 120 bp) between these exons has 85% identity. However, the converted region does not extend into the 3′ non-coding region.

In exon 32, the identity between ABCA8 and ABCA9 is not as high (90%) but is still higher than that of the orthologous pairs (78% and 83% for ABCA8 and ABCA9, respectively). In exon 35, these two human genes share more than 90% identity, but this is not much higher than the identity of the ABCA9 orthologs (87%). In exon 37, sequence comparison gives no indication of conversion.

Abca9 and Abca8a: Part of exon 36 shows apparent gene conversion between Abca9 and Abca8a; these paralogs share 94% identity over 128 bp, compared with 84% between the ABCA9/Abca9 orthologs.
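The exon-by-exon comparisons above apply the criterion stated at the start of this section: a region is suspect when two paralogs are more similar to each other than either is to its ortholog in the other species (exon 32, for example, is 90% identical between ABCA8 and ABCA9 but only 78% and 83% between the orthologous pairs). A minimal sketch of that comparison, assuming the exon has already been cut from an ungapped alignment of all four genes. The function names and the optional `margin` (a hypothetical way to demand a clear excess, relevant for borderline cases such as exon 35) are illustrative, and the sequences in the example are made-up toy data, not ABCA exons.

```python
from typing import Dict, Tuple


def identity(s1: str, s2: str) -> float:
    """Percent identity of two equal-length (aligned, ungapped) sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(a == b for a, b in zip(s1, s2)) / len(s1)


def paralogs_closer_than_orthologs(
    region: Dict[str, str],
    paralog_a: str,
    paralog_b: str,
    ortholog_of_a: str,
    ortholog_of_b: str,
    margin: float = 0.0,
) -> Tuple[bool, Tuple[float, float, float]]:
    """Flag a region where paralog identity exceeds both ortholog identities.

    Returns (flag, (paralog identity, identity a-vs-ortholog, identity b-vs-ortholog)).
    """
    para = identity(region[paralog_a], region[paralog_b])
    orth_a = identity(region[paralog_a], region[ortholog_of_a])
    orth_b = identity(region[paralog_b], region[ortholog_of_b])
    return para > max(orth_a, orth_b) + margin, (para, orth_a, orth_b)


# Toy 10-bp "exon" (made up), labeled with the gene names only for readability.
toy = {
    "ABCA8": "ACGTACGTAC",
    "ABCA9": "ACGTACGTAA",
    "Abca8b": "ACCTACGAAC",
    "Abca9": "TCGTACCTAA",
}
print(paralogs_closer_than_orthologs(toy, "ABCA8", "ABCA9", "Abca8b", "Abca9"))
```

A flag from this test alone does not prove conversion; the text combines it with phylogenetic incongruence, GENECONV statistics, and intron identity.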

A number of gene conversions have been reported in the globin gene family, a well-studied model for the evolution of a multigene family (Papadakis and Patrinos 1999; Slightom et al. 1980). Gene conversion has also been studied thoroughly in the major histocompatibility complex genes (Martinsohn et al. 1999), and a critical analysis of methods for detecting gene conversions has recently been published by Drouin et al. (1999). In the present study, we applied phylogenetic analysis to different regions of the coding sequences of the ABCA cluster genes from mouse and human and showed that the trees produced from the two NBDs differ from each other as well as from the tree obtained with full-length cDNAs (Fig. 3). Statistical analysis was then performed to find unusually long regions of identity between the paralogous genes (Table 3). Finally, by comparing the genomic sequences, we identified three short introns (two in human and one in mouse) that have probably been converted as a block along with the flanking exons (Fig. 4A and 4B).
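The Fig. 3 trees were built by neighbor-joining on Kimura 2-parameter (K2P) distances. The text does not say which program computed them, so the sketch below is only a textbook rendering of the two steps, not the software used in the study; all function names are hypothetical. The K2P distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, and columns with a gap or ambiguous base are skipped (an assumption). Neighbor-joining repeatedly joins the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where r is a row sum of distances.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

NUCLEOTIDES = set("ACGT")
PURINES = set("AG")
PYRIMIDINES = set("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """K2P distance between two aligned DNA sequences (gap/ambiguous columns skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def k2p_matrix(seqs: Dict[str, str]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise K2P distance matrix for a dict of aligned sequences."""
    labels = list(seqs)
    return labels, [[kimura_2p(seqs[a], seqs[b]) for b in labels] for a in labels]


def neighbor_joining(labels: Sequence[str], dist: Sequence[Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Unrooted NJ tree as (node, node, branch_length) edges; internal nodes are node0, node1, ..."""
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(labels)} for i, a in enumerate(labels)
    }
    active = list(labels)
    edges: List[Tuple[str, str, float]] = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best: Optional[Tuple[float, str, str]] = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = active[i], active[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to the joined pair.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"node{next_id}"
        next_id += 1
        edges += [(u, a, la), (u, b, lb)]
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for c in active:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        active = [c for c in active if c not in (a, b)] + [u]
    if len(active) == 2:
        a, b = active
        edges.append((a, b, d[a][b]))
    return edges
```

Typical use would be `labels, matrix = k2p_matrix(aligned)` followed by `neighbor_joining(labels, matrix)` for each region (full coding sequence, NBD1 exons, NBD2 exons). Bootstrap support such as that reported in Fig. 2B would require repeating this on resampled alignment columns, which the sketch does not do.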

Tissue-specific expression pattern of mouse Abca genes. Figure 5 shows the tissue-specific expression pattern of the five mouse Abca genes. The highest expression of Abca5 was found in testis. For Abca6, the highest expression was observed in liver, and moderate expression was found in heart, lung, brain, spleen, testis, and embryo. Abca8b is expressed in heart, brain, lung, liver, and skeletal muscle. Abca8a has the highest expression in lung; moderate expression was found in heart, liver, skeletal muscle, testis, and embryo. For Abca9, the highest expression, in heart, was confirmed by Northern blot. In addition, the EST database was screened for any tissue-specific representation of the characterized genes. For Abca9, 31 ESTs were identified, two-thirds of them derived from mammary gland. For Abca5, 27 ESTs with no single prevalent source tissue are deposited. For each of the other three genes, we found fewer than 20 ESTs from various sources. The observation that the five genes have different, tissue-specific expression patterns suggests that after the expansion of the cluster, the genes evolved to perform different physiological functions.

In conclusion, we describe the organization of the cluster of five mouse Abca genes and, from comparative analysis, propose the evolutionary history of this cluster. All genes exhibit a distinct tissue-specific expression pattern, and only Abca5 is associated with a CpG island that is conserved between mouse and human. Finally, we found that several regions in this cluster, in both mouse and human, have been homogenized by gene conversion events.

Acknowledgments

Portions of the data analysis were performed at the Advanced Biomedical Computing Center, NCI-Frederick. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, under Contract N01-CO-5600.

Fig. 5. mRNA expression of five mouse Abca genes. (A) Real-time PCR for relative quantitation of mRNA expression of mouse Abca5, Abca6, Abca8a, and Abca8b in a multiple tissue panel. Skeletal muscle cDNA in Clontech’s multiple tissue cDNA panel was used to construct a standard curve. All real-time PCR data were normalized to G3PDH real-time PCR data. For each gene, the tissue with the highest expression in the panel was assigned a value of 100, and the other tissues are shown relative to that sample. Each column represents the average of three amplification reactions. The standard deviation in all columns is between 2 and 10% (not shown). (B) mRNA expression for the mouse Abca9 gene.
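The caption describes a relative standard-curve quantitation: a dilution series of skeletal muscle cDNA defines the curve, each target is normalized to G3PDH, and the top tissue is set to 100. The exact arithmetic is not given, so this is a minimal sketch under stated assumptions: a log-linear curve Ct = slope × log10(quantity) + intercept fitted by least squares, and one (already averaged) Ct per tissue. Function names and the numbers in the example are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def fit_standard_curve(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    `points` holds (relative input quantity, measured Ct) for a dilution series.
    """
    if len(points) < 2:
        raise ValueError("need at least two dilution points")
    xs = [math.log10(q) for q, _ in points]
    ys = [ct for _, ct in points]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("dilution points must differ in quantity")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct: float, curve: Tuple[float, float]) -> float:
    """Invert the standard curve to get a relative input quantity."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def relative_expression(
    target_ct: Dict[str, float],
    reference_ct: Dict[str, float],
    target_curve: Tuple[float, float],
    reference_curve: Tuple[float, float],
) -> Dict[str, float]:
    """Normalize target to the reference gene (e.g. G3PDH) and scale the top tissue to 100."""
    ratios = {
        tissue: quantity_from_ct(ct, target_curve)
        / quantity_from_ct(reference_ct[tissue], reference_curve)
        for tissue, ct in target_ct.items()
    }
    top = max(ratios.values())
    return {tissue: 100.0 * r / top for tissue, r in ratios.items()}


# Hypothetical dilution series and Ct values, for illustration only.
curve = fit_standard_curve([(1, 30.0), (10, 27.0), (100, 24.0)])
print(relative_expression({"lung": 24.0, "heart": 27.0}, {"lung": 21.0, "heart": 21.0}, curve, curve))
```

Fitting a separate curve for the reference gene allows for a different amplification efficiency than the target; with a single shared curve, as in the example, the calculation reduces to comparing the two genes on the same scale.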


References

1. Allikmets R (2000) Simple and complex ABCR: genetic predisposition to retinal disease. Am J Hum Genet 67, 793–799

2. Allikmets R, Singh N, Sun H, Shroyer NF, Hutchinson A et al. (1997) A photoreceptor cell-specific ATP-binding transporter gene (ABCR) is mutated in recessive Stargardt macular dystrophy. Nat Genet 15, 236–246

3. Arnould I, Schriml L, Prades C, Lachtermacher-Triunfol M, Schneider T et al. (2001) Identification and characterization of a cluster of five new ATP-Binding Cassette transporter genes on human chromosome 17q24: a new sub-group within the ABCA sub-family. GeneScreen 1, 157–164

4. Berge KE, Tian H, Graf GA, Yu L, Grishin NV et al. (2000) Accumulation of dietary cholesterol in sitosterolemia caused by mutations in adjacent ABC transporters. Science 290, 1771–1775

5. Brooks-Wilson A, Marcil M, Clee SM, Zhang LH, Roomp K et al. (1999) Mutations in ABC1 in Tangier disease and familial high-density lipoprotein deficiency. Nat Genet 22, 336–345

6. Bungert S, Molday LL, Molday RS (2001) Membrane topology of the ATP binding cassette transporter ABCR and its relationship to ABC1 and related ABCA transporters: identification of N-linked glycosylation sites. J Biol Chem 276, 23539–23546

7. Cross SH, Bird AP (1995) CpG islands and genes. Curr Opin Genet Dev 5, 309–314

8. Dean M, Rzhetsky A, Allikmets R (2001) The human ATP-binding cassette (ABC)-transporter superfamily. Genome Res 11, 1156–1166

9. Deleuze JF, Jacquemin E, Dubuisson C, Cresteil D, Dumont M et al. (1996) Defect of multidrug-resistance 3 gene expression in a subtype of progressive familial intrahepatic cholestasis. Hepatology 23, 904–908

10. Drouin G, Prat F, Ell M, Clarke GD (1999) Detecting and characterizing gene conversions between multigene family members. Mol Biol Evol 16, 1369–1390

11. Hyde SC, Emsley P, Hartshorn MJ, Mimmack MM, Gileadi U et al. (1990) Structural model of ATP-binding proteins associated with cystic fibrosis, multidrug resistance and bacterial transport. Nature 346, 362–365

12. Klein I, Sarkadi B, Varadi A (1999) An inventory of the human ABC proteins. Biochim Biophys Acta 1461, 237–262

13. Kumar S, Tamura K, Jakobsen IB, Nei M (2001) MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17, 1244–1245

14. Kuwano M, Toh S, Uchiumi T, Takano H, Kohno K et al. (1999) Multidrug resistance-associated protein subfamily transporters and drug resistance. Anticancer Drug Des 14, 123–131

15. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC et al. (2001) Initial sequencing and analysis of the human genome. International Human Genome Sequencing Consortium. Nature 409, 860–921

16. Litman T, Druley TE, Stein WD, Bates SE (2001) From MDR to MXR: new understanding of multidrug resistance systems, their properties and clinical significance. Cell Mol Life Sci 58, 931–959

17. Loo TW, Clarke DM (2001) Defining the drug-binding site in the human multidrug resistance P-glycoprotein using a methanethiosulfonate analog of verapamil, MTS-verapamil. J Biol Chem 276, 14972–14979

18. Martinsohn JT, Sousa AB, Guethlein LA, Howard JC (1999) The gene conversion hypothesis of MHC evolution: a review. Immunogenetics 50, 168–200

19. Mosser J, Douar AM, Sarde CO, Kioschis P, Feil R et al. (1993) Putative X-linked adrenoleukodystrophy gene shares unexpected homology with ABC transporters. Nature 361, 726–730

20. Ohta T (2000) Evolution of gene families. Gene 259, 45–52

21. Orso E, Broccardo C, Kaminski WE, Bottcher A, Liebisch G et al. (2000) Transport of lipids from Golgi to plasma membrane is defective in Tangier disease patients and Abc1-deficient mice. Nat Genet 24, 192–196

22. Page RD (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12, 357–358

23. Papadakis MN, Patrinos GP (1999) Contribution of gene conversion in the evolution of the human beta-like globin gene family. Hum Genet 104, 117–125

24. Remaley AT, Rust S, Rosier M, Knapper C, Naudin L et al. (1999) Human ATP-binding cassette transporter 1 (ABC1): genomic organization and identification of the genetic defect in the original Tangier disease kindred. Proc Natl Acad Sci USA 96, 12685–12690

25. Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R (1989) Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245, 1066–1073

26. Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C et al. (2000) PipMaker—a web server for aligning two genomic DNA sequences. Genome Res 10, 577–586

27. Slightom JL, Blechl AE, Smithies O (1980) Human fetal G gamma- and A gamma-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21, 627–638

28. Strautnieks S, Bull LN, Knisely AS, Kocoshis SA, Dahl N et al. (1998) A gene encoding a liver-specific ABC transporter is mutated in progressive familial intrahepatic cholestasis. Nat Genet 20, 233–238

29. Stride BD, Cole SP, Deeley RG (1999) Localization of a substrate specificity domain in the multidrug resistance protein. J Biol Chem 274, 22877–22883

30. Tanaka AR, Ikeda Y, Abe-Dohmae S, Arakawa R, Sadanami K et al. (2001) Human ABCA1 contains a large amino-terminal extracellular domain homologous to an epitope of Sjogren’s syndrome. Biochem Biophys Res Commun 283, 1019–1025

31. White ME, Crother BI (2000) Gene conversions may obscure actin gene family relationships. J Mol Evol 50, 170–174
