
Comparative analysis of a Brassica BAC clone containing several major aliphatic glucosinolate genes with its corresponding Arabidopsis sequence

Muqiang Gao, Genyi Li, Bo Yang, W. Richard McCombie, and Carlos F. Quiros

Abstract: We compared the sequence of a 101-kb-long bacterial artificial chromosome (BAC) clone (B21H13) from Brassica oleracea with its homologous region in Arabidopsis thaliana. This clone contains a gene family involved in the synthesis of aliphatic glucosinolates. The A. thaliana homologs for this gene family are located on chromosome IV and correspond to three 2-oxoglutarate-dependent dioxygenase (AOP) genes. We found that B21H13 harbors 23 genes, whereas the equivalent region in Arabidopsis contains 37 genes. All 23 common genes have the same order and orientation in both Brassica and Arabidopsis. The 16 genes missing from the broccoli BAC clone are arranged in two major blocks of 5 and 7 contiguous genes, two singletons, and a twosome. The 118 exons comprising these 23 genes are highly conserved between the two species. The arrangement of the AOP gene family in A. thaliana is as follows: AOP3 (GS-OHP) – AOP2 (GS-ALK) – pseudogene – AOP1. In contrast, in B. oleracea (broccoli and collard), two of the genes are duplicated and the third, AOP3, is missing. The remaining genes are arranged as follows: Bo-AOP2.1 (BoGSL-ALKa) – pseudogene – AOP2.2 (BoGSL-ALKb) – AOP1.1 – AOP1.2. When the survey was expanded to other Brassica accessions, we found variation in copy number and sequence for the Brassica AOP2 homologs. This study confirms that extensive rearrangements have taken place during the evolution of the Brassicaceae at both gene and chromosomal levels.

Key words: Brassica oleracea, B. rapa, comparative genomics, glucosinolates.

Résumé: The authors compared the sequence of a BAC clone (B21H13), 101 kb long and derived from Brassica oleracea, with the sequence of the homologous region in Arabidopsis thaliana. This clone contains a family of genes involved in the synthesis of aliphatic glucosinolates. The homologs of these genes in A. thaliana are located on chromosome IV and correspond to three genes encoding 2-oxoglutarate-dependent dioxygenases (AOP genes). The authors found that clone B21H13 contains 23 genes, whereas the corresponding segment in Arabidopsis has 37. The 23 shared genes are in the same order and orientation in both species. The 16 genes absent from the broccoli BAC clone are organized into two major blocks of five and seven contiguous genes, plus one pair and two single genes. The 118 exons that make up the 23 shared genes are highly conserved between the two species. The arrangement of the AOP genes in A. thaliana is as follows: AOP3 (GS-OHP) – AOP2 (GS-ALK) – pseudogene – AOP1. In B. oleracea (broccoli and collard), two of the genes are duplicated and the third gene, AOP3, is absent. The genes are arranged as follows: Bo-AOP2.1 (BoGSL-ALKa) – pseudogene – AOP2.2 (BoGSL-ALKb) – AOP1.1 – AOP1.2. When the examination of this region was extended to other accessions of the genus Brassica, the authors observed variation in the copy number and sequence of the AOP2 homologs. This study confirms that major rearrangements occurred during the evolution of the crucifers at both the gene and chromosome levels.

Key words (Mots clés): Brassica oleracea, B. rapa, comparative genomics, glucosinolates.

[Translated by the editorial staff]


Genome 47: 666–679 (2004) doi: 10.1139/G04-021 © 2004 NRC Canada


Received 4 September 2003. Accepted 23 February 2004. Published on the NRC Research Press Web site at http://genome.nrc.ca on 24 July 2004.

Corresponding Editor: F. Belzile.

M. Gao, G. Li,1 B. Yang, and C.F. Quiros.2 Department of Vegetable Crops, University of California, Davis, CA 95616, USA.
W.R. McCombie. Cold Spring Harbor Laboratory, Genome Research Center, Woodbury, NY 11797, USA.

1 Present address: Department of Plant Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada.
2 Corresponding author (e-mail: [email protected]).


Introduction

Arabidopsis thaliana (125 Mb) and rice (Oryza sativa, 430 Mb) were targeted as the two model plants for dicots and monocots, respectively, for whole-genome sequencing mostly because of their small genome size. The genomic sequencing of A. thaliana has been completed (AGI 2000) and the sequences of all five chromosomes have already been reported (http://www.arabidopsis.org/). With these sequences in hand, an important task is to analyze their colinearity to those of crop plants. This analysis is critical not only for identifying genes of economic importance for crop improvement, but also for studying their function. Additionally, this information will shed light on how the species have diverged from a common ancestor.

The Brassicaceae encompass approximately 340 genera and more than 3300 species (Hall et al. 2002). These species probably diverged from a common ancestor approximately 40 to 50 million years ago (Koch et al. 2001). Brassica and Arabidopsis are two prominent genera in this family, of which Brassica includes many important vegetables and oilseed crops. These two genera diverged approximately 14.5 to 20.4 million years ago (Yang et al. 1999) and are being subjected to studies of comparative genomics at two levels: (i) genetic and physical mapping and (ii) DNA sequencing. Comparative mapping and sequencing have disclosed general conservation of gene content and collinearity between A. thaliana and Brassica species. However, this conservation is incomplete, owing to extensive chromosomal rearrangements expected from the different chromosome numbers contained in these species (Kowalski et al. 1994; Lagercrantz 1998; Lan et al. 2000; O’Neill and Bancroft 2000; Ryder et al. 2001; Quiros et al. 2001; Babula et al. 2003; Li et al. 2003; Lukens et al. 2003).

The B. oleracea bacterial artificial chromosome (BAC) clone selected for this study is from broccoli and contains a gene family involved in the synthesis of aliphatic glucosinolates (GSL). GSL are a prominent and diverse group of secondary metabolites in crucifers including Brassica and Arabidopsis (Halkier 1999; Rask et al. 2000). Many glucosinolate derivatives have significant effects on human health and agriculture. The GSL gene BoGSL-ALK has been cloned and characterized by Li and Quiros (2003). The A. thaliana homologs for this gene family are located on chromosome IV and correspond to three duplicated 2-oxoglutarate-dependent dioxygenase genes named AOP1, AOP2, and AOP3 (Hall et al. 2001; Kliebenstein et al. 2001). AOP2 is the homolog for BoGSL-ALK, which controls the conversion of methylsulfinylalkyl glucosinolates to the alkenyl form. Thus it is one of the major genes controlling content of glucoraphanin (4-methylsulphinylbutyl glucosinolate) in Brassica crops. The hydrolysis of this glucosinolate yields the isothiocyanate sulforaphane, which is a strong inducer of phase II detoxification enzymes known to be of major importance in cancer protection (Fahey et al. 1997; Nastruzzi et al. 1996; Zhang et al. 1996). This product accumulates in broccoli owing to the presence of a non-functional allele of BoGSL-ALK (Li and Quiros 2003). AOP3 seems to correspond to another major gene in the aliphatic glucosinolate pathway, and its probable function is the conversion of methylsulfinylalkyl glucosinolates to their hydroxyalkyl form. The function of AOP1 is unknown (Kliebenstein et al. 2001). The analysis of the sequence of BAC B21H13 sheds light on the organization of these important genes in B. oleracea and of other genes that could be useful for the genetic improvement of this economically important species.

Methods and materials

Screening of BAC clones containing the BoGSL-ALK gene

The BoGSL-ALK gene was map-based cloned in B. oleracea relying on a BAC library constructed with the broccoli doubled-haploid line Early Big-10. After screening the library for clones harboring a marker associated with the BoGSL-ALK gene, we identified BAC clone B21H13 (Li and Quiros 2003, previously reported by oversight as B13H21).

Sequencing of BAC B21H13

Brassica BAC B21H13 was sequenced and the data submitted to GenBank as accession No. AC122543 (McCombie et al. 2002). DNA from this BAC was extracted with a standard alkaline lysis preparation method (Sambrook et al. 1989) and sheared in a nebulizer with nitrogen at 41 kPa to generate a concentrated smear of fragments ~3–5 kb in length. The sheared fragments were blunt ended with mung bean nuclease and dephosphorylated. Fragments ranging from 3 to 5 kb in size were isolated and ligated into a pBluescript KS+ plasmid. The clones were sequenced from both directions using big dye terminator chemistry and run on an ABI Prism 3700 capillary sequencer (Applied Biosystems, Foster City, Calif.). Base calling and quality assessment were done using PHRED (Ewing and Green 1998); the reads were assembled with PHRAP and edited with CONSED (Gordon et al. 1998). Gaps were filled by a combination of primer walking and shotgun sequencing of subclones with extremes at both sides of the sequencing gaps. The final error rate was estimated using CONSED.
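As background to the quality assessment step, PHRED assigns each base call a quality value Q that, by its standard definition, relates to the estimated probability P of a miscall as Q = −10·log10(P). The short Python sketch below illustrates that relationship only; the quality values are hypothetical and are not taken from the B21H13 assembly, and the helper names are illustrative rather than part of PHRED or CONSED.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the estimated probability of a miscalled base."""
    return 10 ** (-q / 10)


def expected_errors(qualities) -> float:
    """Expected number of miscalled bases: the sum of the per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in qualities)


# Hypothetical per-base qualities for a short stretch of sequence (illustrative only).
example_quals = [20, 30, 40, 40, 30]
print(f"Expected errors over {len(example_quals)} bases: {expected_errors(example_quals):.4f}")
```

Summing per-base error probabilities in this way gives an expected error count for any stretch of bases; how CONSED itself derives its final estimate is not described in the text.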

Sequence analysis and gene prediction

The BAC sequence was analyzed for protein-coding genes with gene-prediction software designed for Arabidopsis: GenScan (Burge and Karlin 1997), GlimmerM (Salzberg et al. 1999), and MZEF (Zhang 1997). Sequences of B21H13 (Brassica) and T4I9 and F4C12 (Arabidopsis) were aligned using BLAST 2.2.5 (Altschul et al. 1997). The alignment result was viewed using ACT (www.sanger.ac.uk/software/act), a DNA sequence comparison viewer based on Artemis (Rutherford et al. 2000). The score used for ACT was 40. The sequence was also compared with Arabidopsis ESTs, cDNAs, and CDS using BLAST and FASTA with the NCBI and AGI databases to analyze conservation of genes between the two species. The research was done during July 2003 and the last modified dates of all the gene models were 2–7 May 2003.

The output on exon presence generated by the gene-prediction programs was combined with the alignment of segments between Brassica B21H13 and Arabidopsis. The conserved regions (exons) were run through translated BLAST (BLASTX) and were translated into proteins to adjust AG–GT exon–intron boundaries to the open reading frames. The complete predicted cDNAs were determined from the annotated structure of the gene.


Confirmation of duplication of the BoGSL-ALK gene

To confirm duplication of the BoGSL-ALK gene, we designed primers based on the putative promoter region and exon 1 of each putative duplicate to amplify broccoli BAC B21H13 and genomic DNA from collard. These two segments in the duplicate copies of BoGSL-ALK differ by 76 nucleotides in size (Fig. 1). The sequences of the primers used were as follows: odd48, 5′-TTCCATCATTTACTTTCTCAG-3′; and odd12, 5′-TTGAATATCCAGTGTAAGGTT-3′. Additionally, we expanded this survey to amplify the BoGSL-ALK duplicates from genomic DNA of other B. oleracea and B. rapa accessions and of two wild species related to B. oleracea (Table 1). After amplification, the products were fractionated by electrophoresis in a 1.5% w/v agarose gel.
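Basic properties of the two primers can be checked with a few lines of Python. The sketch below computes length, G+C fraction, and a rough melting temperature using the Wallace rule (2 °C per A/T, 4 °C per G/C), which is a generic rule of thumb for short oligonucleotides and not a method described by the authors. It assumes that the hyphen inside the odd48 sequence in the extracted text is a line-break artifact rather than part of the primer.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


PRIMERS = {
    # Assumption: the internal hyphen in the printed odd48 sequence is a line-break artifact.
    "odd48": "TTCCATCATTTACTTTCTCAG",
    "odd12": "TTGAATATCCAGTGTAAGGTT",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, Wallace Tm ~{wallace_tm(seq)} C")
```

Under these assumptions both primers are 21 nt long with the same G+C count, so the rule of thumb gives them the same approximate melting temperature.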

Phylogenetic analyses

The alignment of the genes related to glucosinolate biosynthesis in B21H13 and their AOP homologs (AOP1, AOP2, and AOP3) from different Arabidopsis species and ecotypes was made with Clustal X (version 1.81) (Thompson et al. 1997). The maize homolog for these genes, AF540907 (Frey et al. 2003), was used as the outgroup. A neighbour-joining tree was bootstrapped with 1000 trials and displayed with the Treeview program (Page 1996).

Results

Identification of protein-coding genes in B. oleracea

The analysis of the broccoli BAC clone B21H13 revealed that it is approximately 101.5 kb long and contains 23 complete protein-coding genes (Fig. 2). All of these genes have orthologs in A. thaliana, which are found in the partially overlapping BAC clones T4I9 (1.28 Mb–1.38 Mb) and F4C12 (1.36 Mb–1.50 Mb) located on chromosome IV. The

Fig. 1. Diagram showing the locations of primers odd48 and odd12 on BoGSL-ALKa and BoGSL-ALKb.

Table 1. BoGSL-ALK duplicates in different accessions of Brassica oleracea, two related wild species, and Brassica rapa.

| No. | Species | Crop | Cultivar or breeding line | BoGSL-ALK copy |
|---|---|---|---|---|
| 1. B1654 | B. oleracea var. acephala | Flowering kale | ‘Red on green’ | ALKb |
| 2. B1771 | B. oleracea var. acephala | Flowering kale | F1 ‘Red Feather’ | ALKb |
| 3. B1772 | B. oleracea var. acephala | Flowering kale | F1 ‘Rose Bouquet’ | ALKb |
| 4. A82–13 | B. oleracea var. botrytis | Cauliflower | DH line | ALKb |
| 5. A155–18 | B. oleracea var. botrytis | Cauliflower | DH line | ALKb |
| 6. A1–118 | B. oleracea var. botrytis | Cauliflower | DH line | ALKb |
| 7. B119 | B. oleracea var. botrytis | Cauliflower | 342 ‘Snowball’ | ALKb |
| 8. B208 | B. oleracea var. botrytis | Cauliflower | 343 ‘Self blanching’ | ALKb |
| 9. B1806 | B. oleracea var. botrytis | Cauliflower | ‘Minuteman’ | ALKb |
| 10. B1814 | B. oleracea var. botrytis | Cauliflower | ‘Snowball’ | ALKb |
| 11. B265 | B. oleracea var. botrytis | Purple Cauliflower | ‘Cauliflower Purple’ | ALKa,b |
| 12. B479 | B. oleracea var. alboglabra | Chinese kale | ‘White flower kale’ | ALKa,b |
| 13. B115 | B. oleracea var. acephala | Collard | ‘George’ | ALKa,b |
| 14. B10–141 | B. oleracea var. italica | Broccoli | DH line | ALKa,b |
| 15. B12–31 | B. oleracea var. italica | Broccoli | DH line | ALKa,b |
| 16. B15–200 | B. oleracea var. italica | Broccoli | DH line | ALKa,b |
| 17. B008 | B. oleracea var. italica | Broccoli | ‘Topper 43-70’ | ALKa,b |
| 18. B1815 | B. oleracea var. italica | Broccoli | ‘Legacy’ | ALKa,b |
| 19. B488 | B. rapa subsp. pekinensis | Chinese cabbage | ‘Matsushima’ | ALKa |
| 20. 02B167 | B. rapa subsp. pekinensis | Chinese cabbage | DH line: RI16 | ALKa |
| 21. 02B234 | B. rapa subsp. pekinensis | Chinese cabbage | DH line: Y216-55 | ALKa |
| 22. 2B246 | B. rapa subsp. pekinensis | Chinese cabbage | DH line: BZ26 | ALKa |
| 23. B218 | B. rapa subsp. rapifera | Turnip | ‘White Lady’ | ALKc |
| 24. B493 | B. rapa subsp. rapifera | Turnip | ‘Yori Spring’ | ALKc |
| 25. 99B153 | B. macrocarpa | Wild | | ALKa |
| 26. 99B11 | B. villosa subsp. tinei | Wild | | ALKd |
| 27. 99B13 | B. villosa subsp. drepanensis | Wild | | ALKb,d |


region corresponding to B21H13 in Arabidopsis is approximately 119.5 kb and contains 37 protein-coding genes. No additional similarity was found between the sequence of BAC B21H13 and those of Arabidopsis BACs T4I9 and F4C12 outside the 23 homologs. Since there is no exon-predicting software specially designed for Brassica, we relied on the existing programs designed for Arabidopsis sequences. Genscan predicted 19 genes containing 81 exons, GlimmerM predicted 59 genes and 186 exons, and MZEF predicted only 53 exons. After alignment and comparison with Arabidopsis gene sequences and with other plant gene sequences, 23 genes containing 118 exons were resolved in B21H13. Genscan and GlimmerM were the most successful programs, correctly predicting 59 and 61 exons out of 118, respectively. MZEF correctly predicted only 24 exons. cDNA sequences were the most effective at assigning exons when the programs failed owing to the presence of base substitutions, insertions, and deletions affecting splicing sites.
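The exon counts above can be turned into two simple accuracy measures per program: sensitivity (the share of the 118 curated exons that was recovered) and precision (the share of a program's predictions that was correct). The Python sketch below does this arithmetic; it assumes that each "correctly predicted" count is a subset of that program's total predicted exons, which the text implies but does not state, and the function names are illustrative.

```python
# Exon counts reported in the text for BAC B21H13.
CURATED_EXONS = 118  # exons in the final, manually resolved annotation

# program -> (exons predicted, exons predicted correctly)
PREDICTIONS = {
    "Genscan": (81, 59),
    "GlimmerM": (186, 61),
    "MZEF": (53, 24),
}


def sensitivity(correct: int, curated_total: int = CURATED_EXONS) -> float:
    """Fraction of the curated exons that a program recovered."""
    return correct / curated_total


def precision(correct: int, predicted: int) -> float:
    """Fraction of a program's predicted exons that were correct."""
    return correct / predicted


for program, (predicted, correct) in PREDICTIONS.items():
    print(f"{program}: sensitivity {sensitivity(correct):.1%}, "
          f"precision {precision(correct, predicted):.1%}")
```

Read this way, Genscan and GlimmerM recover about half of the curated exons, but GlimmerM does so at the cost of many more unconfirmed predictions.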

Besides the conservation of exons in the two species, there were conserved segments of various lengths in the promoters and 3′ UTR regions for some of the genes. For example, there were as many as 12 conserved fragments between Bo-20 and Bo-21, ranging from 40 to 172 bp in length, and three conserved fragments between Bo-19 and Bo-20, the longest being 311 bp. Other conserved fragments were also observed in the UTRs of Bo-2–Bo-3, Bo-9–Bo-10, Bo-10–Bo-11, and Bo-13–Bo-14 (Fig. 2).

Comparison of B21H13 and its corresponding region in Arabidopsis

Table 2 summarizes the main features of the broccoli B21H13 sequence compared with its A. thaliana counterpart and the whole genome of the latter species as a reference. The most striking difference was in gene density: the Brassica clone contained on average one gene every 4.4 kb, which approaches the average gene density of the A. thaliana genome (The Rice Chromosome 10 Sequencing Consortium 2003). However, the Arabidopsis segment corresponding to the Brassica BAC clone has a higher than average gene density, one gene for every 3.2 kb, owing in part to the presence of shorter gene spacers.
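The gene-density figures follow directly from segment length and gene count. The Python sketch below reproduces them from the rounded lengths quoted in the text (101.5 kb and 119.5 kb); the results differ by a base pair or two from the values in Table 2, presumably because the table was computed from unrounded sequence lengths.

```python
def bp_per_gene(length_bp: int, n_genes: int) -> float:
    """Average spacing of genes along a segment, in bp per gene."""
    return length_bp / n_genes


brassica = bp_per_gene(101_500, 23)            # B. oleracea BAC B21H13
arabidopsis_region = bp_per_gene(119_500, 37)  # corresponding A. thaliana segment

print(f"Brassica BAC: one gene every {brassica / 1000:.1f} kb")
print(f"Arabidopsis region: one gene every {arabidopsis_region / 1000:.1f} kb")
```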

High level of DNA sequence conservation

All 23 genes detected in B21H13 have high sequence conservation with their Arabidopsis counterparts (Tables 3 and 4; Fig. 2). The highest DNA sequence identity was 92%, observed for homologs Bo-4 and At-4 (transducin). Fifteen genes have sequence identities ranging from 80% to 89%. Only seven genes have lower identities, of the order of 72% to 79%. Considering all the 118 protein-coding exons in B21H13, 19 exons (16%) have high DNA sequence identity ranging from 90% to 100% with their corresponding Arabidopsis exons; 70 exons (59%) have sequence identity ranging from 80% to 89%; 24 (20%) have DNA sequence identity from 70% to 79%. The rest of the exons (4%) have identities lower than 70%.
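The exon tallies can be cross-checked with a short calculation. The Python sketch below derives the number of exons in the lowest class from the three counts given and recomputes the percentages; the `identity_bin` helper and its exact cut-offs for fractional identities are assumptions added for illustration, since the text only gives whole-percent ranges.

```python
TOTAL_EXONS = 118

# Exon counts per identity class, as reported in the text.
reported = {"90-100%": 19, "80-89%": 70, "70-79%": 24}
# The text gives only a percentage for the last class; its count follows by subtraction.
reported["<70%"] = TOTAL_EXONS - sum(reported.values())


def identity_bin(percent_identity: float) -> str:
    """Assign a percent identity to one of the four classes (cut-offs are an assumption)."""
    if percent_identity >= 90:
        return "90-100%"
    if percent_identity >= 80:
        return "80-89%"
    if percent_identity >= 70:
        return "70-79%"
    return "<70%"


for label, count in reported.items():
    print(f"{label}: {count} exons ({count / TOTAL_EXONS:.0%})")
```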

Gene content in the Brassica BAC and in its corresponding Arabidopsis region

All 23 protein-coding genes found in B21H13 are arranged in the same order and orientation in Brassica (Bo-1 to Bo-23) and Arabidopsis (At-1 to At-23) (Fig. 3). Twelve genes have forward and 11 have reverse orientation. The main discrepancy between the corresponding sequences of both species is the absence of 16 genes in B. oleracea. Most of the missing genes in B. oleracea are found organized into two clusters of five and seven genes in Arabidopsis (Table 3; Fig. 3). The rest are one twosome and two singletons. Only in the spacer between Bo-17 and Bo-18, where a block of five A. thaliana genes is missing, is there evidence of the previous presence of one of these genes in ancestral times. The spacer sequence between Bo-17 and Bo-18 contains the last three exons of gene At4g03205. The first two exons of this gene are deleted, as well as the complete set of four other genes found in this interval in Arabidopsis.
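The block structure of the missing genes can be recovered mechanically from an ordered list of Arabidopsis loci flagged by whether a Brassica ortholog exists. The Python sketch below does this for the loci listed in the portion of Table 3 available here (At-1 through At-17 only), under the assumption that rows carrying an At-n number have an ortholog in B21H13 and unnumbered rows do not. The block of five genes between Bo-17 and Bo-18 lies beyond that portion of the table and is therefore not included.

```python
from itertools import groupby

# Arabidopsis loci in chromosomal order (Table 3, rows up to At-17).
# True = ortholog present in B21H13 (row carries an At-n number); False = missing.
ARABIDOPSIS_ORDER = [
    ("At4g02990", True), ("At4g03000", True), ("At4g03010", True),
    ("At4g03020", True), ("At4g03030", True),
    ("At4g03040", False), ("At4g03050", False),
    ("At4g03060", True), ("At4g03070", True),
    ("At4g03080", False),
    ("At4g03090", True), ("At4g03100", True), ("At4g03110", True),
    ("At4g03115", True), ("At4g03120", True), ("At4g03130", True),
    ("At4g03135", False),
    ("At4g03140", True), ("At4g03150", True),
    ("At4g03153", False), ("At4g03156", False), ("At4g03160", False),
    ("At4g03165", False), ("At4g03170", False), ("At4g03175", False),
    ("At4g03180", False),
    ("At4g03190", True), ("At4g03200", True),
]


def missing_blocks(genes):
    """Return runs of consecutive loci that lack a Brassica ortholog."""
    blocks = []
    for present, run in groupby(genes, key=lambda gene: gene[1]):
        if not present:
            blocks.append([locus for locus, _ in run])
    return blocks


for block in missing_blocks(ARABIDOPSIS_ORDER):
    print(len(block), block)
```

On this partial list the function reports a twosome, two singletons, and the block of seven, in that order along the chromosome.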

Duplication of the glucosinolate genes

The three copies of the AOP genes in A. thaliana are arranged as follows: At-AOP3 (GS-OHP) – At-AOP2 (GS-ALK) – pseudogene – At-AOP1. In the B21H13 broccoli BAC, on the other hand, instead of the triplet present in A. thaliana there are two pairs of duplicates for two of the AOP homologs (AOP1 and AOP2), and AOP3 is absent. The first pair corresponds to the BoGSL-ALK gene and is equivalent to the Arabidopsis AOP2 gene. The translated sequences of the BoGSL-ALK duplicates are practically identical, showing only one base difference in exon 1 (T→C transition at position 34 031 bp), and therefore we named them BoGSL-ALKa (Bo-6A) and BoGSL-ALKb (Bo-6B). Further, both copies are non-functional owing to the presence of the 2-bp deletion in exon 2 (Li and Quiros 2003). The DNA segments containing Bo-6A (31 596 – 34 096 bp) and Bo-6B (20 468 – 22 968 bp) are completely duplicated in this BAC clone (Fig. 2). The distance from the putative promoter region corresponding to primer odd48 to the start codon on exon 1 is 214 bp for BoGSL-ALKa and 290 bp for BoGSL-ALKb (Fig. 1). The amplification of these segments in both the broccoli BAC clone and genomic DNA with primers odd48 and odd12 yielded two bands of 491 and 567 bp (Fig. 4). Other three-primer sets designed for this region also produced double bands in both collard and broccoli DNA (data not shown). This result confirmed that the duplicated genes were real and not the result of sequencing or cloning artifacts. We then used the same primers to amplify the same region in other Brassica accessions in which BoGSL-ALK is known to be functional, such as collard, kales, white cauliflower, Brassica macrocarpa, and Brassica rapa (Chinese cabbage and turnip), or non-functional, such as Brassica villosa (Li and Quiros 2003, and unpublished). We detected polymorphism for the BoGSL-ALK amplicons in terms of size and number regardless of functionality. Almost half of the tested accessions showed a single BoGSL-ALK band (Fig. 4). Three flowering kales and white cauliflower displayed the larger band, corresponding to BoGSL-ALKb, whereas B. macrocarpa and four accessions of Chinese cabbage had the smaller one, corresponding to BoGSL-ALKa. The two turnips and one accession of B. villosa had only one band, but of intermediate migration. The size of this band was slightly larger in B. villosa than in B. rapa (Fig. 4). Partial sequencing of the band from turnip revealed that the segment spanning from the putative promoter (primer odd48) to exon 1 (primer odd12) was 510 bp. We


Fig. 2. ACT plot demonstrating conserved sequence regions between Brassica and Arabidopsis. The numbers show the length (in bp) of each sequence. The sequence on top is B. oleracea BAC B21H13; the sequence at the bottom corresponds to A. thaliana overlapping BACs T4I9 and F4C12. The locations of the AOP homologs in both species are shown. The cross links in these genes result from their conserved regions.


called this version of the gene BoGSL-ALKc. In the wild species B. villosa, where BoGSL-ALK is non-functional, there is a fourth variant, BoGSL-ALKd, displaying a 531-bp band that includes four additional nucleotides in exon 1 not present in the other three versions of the BoGSL-ALK genes. All other accessions have the duplication observed in broccoli (Table 1; Fig. 2). The 5′ UTR region of the four BoGSL-ALK variants is remarkably similar, and it is not possible to ascertain which is ancestral.
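The four amplicon sizes reported for primers odd48 and odd12 (491, 510, 531, and 567 bp) are enough to assign a band to a variant when its size is known. The Python sketch below is a hypothetical helper for that lookup, not the authors' scoring procedure; the 5-bp tolerance is an arbitrary illustrative value, and band sizes read from a 1.5% agarose gel would in practice be far less precise than sequenced lengths.

```python
# Expected odd48/odd12 amplicon sizes (bp) for each variant, as reported in the text.
AMPLICON_SIZES_BP = {
    "BoGSL-ALKa": 491,
    "BoGSL-ALKc": 510,
    "BoGSL-ALKd": 531,
    "BoGSL-ALKb": 567,
}


def call_variants(band_sizes_bp, tolerance_bp=5):
    """Assign each band to the variant of nearest expected size, or None if too far off.

    tolerance_bp is a hypothetical cut-off chosen for illustration.
    """
    calls = []
    for size in band_sizes_bp:
        name, expected = min(AMPLICON_SIZES_BP.items(), key=lambda item: abs(item[1] - size))
        calls.append(name if abs(expected - size) <= tolerance_bp else None)
    return calls


# Broccoli shows both bands of the duplication.
print(call_variants([491, 567]))

# The two duplicates differ by the same 76 bp in amplicon size and in the
# promoter-to-start-codon distance (214 vs 290 bp).
print(AMPLICON_SIZES_BP["BoGSL-ALKb"] - AMPLICON_SIZES_BP["BoGSL-ALKa"], 290 - 214)
```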

Sequencing of BoGSL-ALKa and BoGSL-ALKb from collard, where BoGSL-ALK is functional (Li and Quiros 2003), revealed that neither copy has the 2-bp deletion in exon 2 observed in the non-functional alleles of broccoli.

The second pair of duplicates in the broccoli BAC clone, the orthologs of Arabidopsis AOP1, were named Bo-AOP1a and Bo-AOP1b (Fig. 2). The sequence identity of the exons between At-AOP1 (At-7) and Bo-AOP1a (Bo-7A) and Bo-AOP1b (Bo-7B) is 82.3% and 81.9%, respectively. The identity of the exons between Bo-AOP1a (Bo-7A) and Bo-AOP1b (Bo-7B) is 90.2%, higher than the identity between the different species. The two pseudogenes in the corresponding gene families are unrelated in the two species and are also unrelated to the AOP genes.

Structure of the AOP genes in Arabidopsis and broccoli BAC B21H13

To gain an insight into the evolution of the AOP1, AOP2, and AOP3 genes, we compared the structure of these three genes in A. thaliana ecotype Columbia-0 with the counterparts for the first two genes from broccoli: BoGSL-ALKa, BoGSL-ALKb, Bo-AOP1a, and Bo-AOP1b. A homolog for AOP3 has not yet been reported in Brassica. AOP1 and AOP2 have three exons in both Arabidopsis and Brassica, whereas AOP3 has four exons in Arabidopsis. After alignment, we found that the corresponding exons of all seven of these genes have high DNA sequence identity (Fig. 5; Table 5). Exon 1 has the highest conservation among all genes. In general, the identity of the two duplicate copies in Brassica is the highest. These have higher identity to AOP1 than this gene has to AOP2. In general, the identity levels of AOP1, AOP2, and AOP3 with each other were similar. Exon 2 in all the AOP genes and their Brassica homologs could be broken down into two or three sections according to their location in exon 2: 2a (87%–70% identity), 2b (88%–70% identity), and 2c (70%–59%) (Fig. 5). Section 2a, comprising 170 bp, was common to all homologs. Section 2b was also present in all homologs, but it displayed different arrangements. In AOP1 and its Brassica homologs, section 2b was next to section 2a. In AOP2 and its two Brassica homologs, section 2c was next to 2a, followed by 2b. Section 2b has a length of 170 bp in all homologs, except for Brassica Bo-AOP1a and Bo-AOP1b, where this section is 161 bp long. Furthermore, in AOP3, section 2b is present as an independent entity from the other two sections, forming an additional exon, exon 3, in this gene. Exon 3 conserves the 170-bp length present in most of the other homologs. Section 2c is variable in size, except for Brassica homologs BoGSL-ALKa and BoGSL-ALKb.

Phylogenetic analyses of the AOP genes

The tree generated by Clustal X included sequences from GenBank for the three AOP homologs from various A. thaliana ecotypes, A. lyrata and A. halleri, as well as B. oleracea. Clustal X resolved a tree with two main clusters, one split in two subgroups including AOP2 and their Brassica homologs and AOP3, and the other cluster including AOP1 and its homologs. In agreement with the structural data for the AOP genes, the first cluster includes all the genes containing the extra exonic section 2c, which is present in the genes of the two subgroups. Within the AOP1 cluster, the Brassica homologs form their own branch showing higher proximity to the A. lyrata homolog than to the


Table 2. Features of Brassica BAC clone B21H13 compared with its corresponding Arabidopsis thaliana segment and with the whole genome of the latter species.

| Feature | Brassica BAC clone B21H13 (GenBank acc. No. AC122543) | Arabidopsis corresponding region: 21 genes | Arabidopsis corresponding region: all 37 genes | Arabidopsis whole genome (a) |
|---|---|---|---|---|
| Sequence length | 101.5 kb | | 119.5 kb | 117.3 Mb |
| G+C content: overall | 37.1% | | 35.9% | 35.9% |
| G+C content: protein-coding DNA | 46.4% | 43.4% | 43.8% | 44.0% |
| G+C content: non-coding region | 32.8% | | 31.7% | 32.6% |
| Total no. of genes | 23 | 21 | 37 | 29 084 |
| Average gene size (bp) | 2086 | 2067 | 1802 | 1975 |
| Average gene density (bp/gene) | 4415 | | 3231 | 4008 |
| Average no. exons/gene | 5.1 | 5.3 | 4.8 | 4.9 |
| Average exon size (bp) | 264 | 269 | 237 | 275 |
| Average no. introns per gene | 4.1 | 4.3 | 3.8 | 3.9 |
| Average intron size (bp) | 176 | 137 | 167 | 163 |
| Average spacer size (bp) | 2425 | — | 1468 | — |

(a) Arabidopsis data extracted from The Rice Chromosome 10 Sequencing Consortium (2003).


A. thaliana and A. halleri homologs. A similar situation was observed for the AOP2 subgroup, where the Brassica homologs occupy a single branch closer to A. lyrata than to the A. thaliana homologs. The maize homolog AF540907 occupies a separate branch in the tree (Fig. 6).

Other genes in B21H13

The exon number of the other Brassica genes on BAC B21H13 and their corresponding Arabidopsis homologs is conserved, showing the typical high-identity values for the sequences of these two species (Table 4 and Fig. 2). The function for most of the Arabidopsis homologs on this BAC is unknown. However, there are a few other genes of interest, like At-3 (At4g03010) and At-19 (At4g03260), which belong to the leucine-rich repeat protein family. At-14 (At4g03140) is a member of a short-chain acyl-CoA dehydrogenase/reductase protein family; At-11 (At4g03115) is a member of a mitochondrial carrier protein family. At-8 (At4g03090) and At-18 (At4g03250) correspond to homeodomain proteins. At-10 (At4g03110) is an RNA-binding protein. At-4 (At4g03020) is one of the six members of the transducin/WD-40 protein family. At-12 (At4g03120) is a member of a proline-rich protein family containing proline-rich extension domains. At-16 (At4g03190) is an F-box GRR1-like protein. At-21 (At4g03280) corresponds to a Rieske FeS protein (component of cytochrome b6F complex (EC 1.10.99.1)). cDNA sequences have been obtained for the last six genes supporting their structure and function.

In addition to these homologs, we found 10 simple sequence repeat (SSR) loci near or inside some of the protein-coding genes in B21H13. Five of them are dinucleotide SSRs: (AC)8 in intron 2 of Bo-2, (AG)8 in intron 13 of Bo-8, (AG)9 105 bp upstream of Bo-9, (CT)9 upstream of Bo-14, and (AG)9 in intron 5 of Bo-17. The following four trinucleotide SSRs were also found: (CTT)6 in exon 1 of Bo-2, (AAC)7 in intron 3 of Bo-11, and (CCT)7 and (TCA)11 in exon 2 of Bo-13. One mixed SSR consisting of dinucleotide and trinucleotide repeats, (AGG)6(ATG)6(AC)1(ATG)2, was found in exon 4 of Bo-18. Only this last mixed SSR was conserved in Arabidopsis; the others were unique to the B. oleracea BAC.
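The text does not say how the SSR loci were located. As a minimal sketch of one common approach, perfect di- and trinucleotide repeats can be found with a back-referencing regular expression. The function name, the threshold of six repeat units (chosen to match the shortest locus listed above, (CTT)6), and the toy sequence are all assumptions made for illustration.

```python
import re


def find_ssrs(seq, min_repeats=6, unit_sizes=(2, 3)):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        # One k-base unit followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as (AA)n
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence (not from B21H13): an (AC)8 and a (CTT)6 repeat with spacers.
toy = "GG" + "AC" * 8 + "TTGCA" + "CTT" * 6 + "G"
ssrs = find_ssrs(toy)
print(ssrs)
```

On the toy sequence this reports an (AC)8 repeat at offset 2 and a (CTT)6 repeat at offset 23. Compound loci such as the mixed SSR in Bo-18 would need an extra step that merges adjacent hits.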

Discussion

Microcolinearity between Brassica and Arabidopsis

Our results comparing corresponding sequences of

Table 3. List and properties of orthologs in Brassica BAC B21H13 and its corresponding region in Arabidopsis thaliana.

Arabidopsis

Gene No. | Description | cDNA | Exons* | Introns* | Spacer
At-1. At4g02990 | Hypothetical | Partial | 1626/1 | 0/0 | —
At-2. At4g03000 | Hypothetical | Partial | 2595/2 | 152/1 | 818
At-3. At4g03010 | LRR protein | Not found | 1188/1 | 0/0 | 2602
At-4. At4g03020 | Transducin | Full length | 1482/9 | 1287/8 | 1028
At-5. At4g03030 | Kelch repeat containing F-box | Partial | 1329/1 | 0/0 | 890
At4g03040 | Hypothetical | Not found | 279/3 | 1779/2 | 3927
At4g03050 | AOP3 (GS-OHP) | Full length | 870/3 | 470 | 1650
At-6. At4g03060 | AOP2 (GS-ALK) | Full length | 1277/3 | 608/2 | 5717
At-7. At4g03070 | AOP1 | Full length | 969/3 | 210/2 | 4710
At4g03080 | Phospho-ser/thr phosphatase | Partial | 1646/21 | 3586/20 | 314
At-8. At4g03090 | NDX1 | Partial | 2634/14 | 1809/13 | 1546
At-9. At4g03100 | rac GTPase activating protein | Partial | 1275/4 | 260/3 | 3223
At-10. At4g03110 | RNA-binding protein | Full length | 1326/9 | 1338/8 | 534
At-11. At4g03115 | Mitochondrial carrier protein | Partial | 903/8 | 1217/7 | 4254
At-12. At4g03120 | Proline-rich protein | Full length | 624/5 | 961/5 | 262
At-13. At4g03130 | Hypothetical | Not found | 2298/7 | 777/6 | 239
At4g03135 | tRNA protein | Not found | (72/1) | (0/0) | 248
At-14. At4g03140 | Short-chain dehydrogenase | Partial | 840/3 | 222/2 | 1634
At-15. At4g03150 | Expressed protein | Full length | 558/2 | 241/1 | 43
At4g03153 | Unknown protein | Not found | (648/2) | (96/1) | 340
At4g03156 | Hypothetical | Not found | (234/2) | (549/1) | 413
At4g03160 | Hypothetical | Not found | (579/2) | (128/1) | 1194
At4g03165 | Hypothetical | Not found | (453/3) | (462/2) | 692
At4g03170 | Hypothetical | Not found | (753/1) | 0/0 | 540
At4g03175 | Kinase-related | Not found | (420/4) | (258/3) | 602
At4g03180 | Expressed | Partial | (558/2) | (817/1) | 424
At-16. At4g03190 | F-box protein | Full length | 1758/3 | 192/2 | 0
At-17. At4g03200 | Expressed | Full length | 2457/15 | 1814/14 | 1818

Note: Spacer, spacer length (bp) from the previous gene stop codon to the listed gene start codon.
*Length (bp)/number.


B. oleracea (~101 kb) and A. thaliana (~119 kb) support previous reports on the general conservation of gene structure, order, and content between Brassica and Arabidopsis based on hybridization (Cavell et al. 1998; Lan et al. 2000; O’Neill and Bancroft 2000; Bancroft 2001) and on DNA sequence analysis (Quiros et al. 2001). This conservation is mostly incomplete and segmental, reflecting numerous genomic rearrangements that have resulted in frequent gene reshuffling and duplication. Our findings are not unexpected, considering that B. oleracea has almost twice the chromosome number of A. thaliana and that the two species are divergent enough to be classified in different taxonomic tribes. The absence of 16 out of 37 genes (43%) in Brassica suggests that blocks of genes have often been moved during the evolution of the Brassicaceae. Gene absence in Brassica segments homologous to Arabidopsis has been reported before by O’Neill and Bancroft (2000). According to their findings, the most likely scenario is that these genes have not been lost but are located somewhere else in the Brassica genome. Without sequence information for this segment in other related species, it is impossible to infer the ancestral gene arrangement. However, the presence of three relic exons of gene At4g03205 between Bo-17 and Bo-18 suggests that at least some of the five missing genes in that segment existed previously in B. oleracea. The high level of interspecies conservation for the orthologs found in B21H13 in terms of exon number, size, and sequence was within the range observed for other genes previously compared (Quiros et al. 2001). However, none of the three exon-prediction programs designed for A. thaliana that were used in the analysis of the sequences was able to find all the exons in the genes. The most effective method to identify nearly all the exons was sequence comparison, including cDNAs, when available. Similar results were reported for barley BACs when using rice for comparison (Dubcovsky et al. 2001).
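The block structure of the missing genes can be recovered mechanically from Table 3. The sketch below is an illustration, not the authors' procedure: it lists the Arabidopsis loci from the Table 3 rows reproduced in this section (At4g02990 to At4g03200) in chromosomal order, marks those with no counterpart on B21H13, and groups consecutive absences into runs. The variable and function names are hypothetical.

```python
from itertools import groupby

# Locus suffixes in chromosomal order, transcribed from Table 3 (At4g02990-At4g03200).
ORDER = ["02990", "03000", "03010", "03020", "03030", "03040", "03050",
         "03060", "03070", "03080", "03090", "03100", "03110", "03115",
         "03120", "03130", "03135", "03140", "03150", "03153", "03156",
         "03160", "03165", "03170", "03175", "03180", "03190", "03200"]
# Loci without a numbered At-/Bo- ortholog ("Deleted" in the Brassica columns).
ABSENT = {"03040", "03050", "03080", "03135", "03153", "03156",
          "03160", "03165", "03170", "03175", "03180"}


def absent_runs(order, absent):
    """Group consecutive absent loci into runs, preserving chromosomal order."""
    runs = []
    for is_absent, group in groupby(order, key=lambda s: s in absent):
        if is_absent:
            runs.append(["At4g" + s for s in group])
    return runs


runs = absent_runs(ORDER, ABSENT)
run_lengths = [len(r) for r in runs]
pct_missing = 100 * 16 / 37  # 16 of 37 genes, as stated in the text
print(run_lengths, round(pct_missing))
```

For these rows the run lengths are 2, 1, 1, and 7. Together with the block of five genes between Bo-17 and Bo-18, which lies beyond the rows transcribed here, this accounts for the 16 missing genes (43%).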

Evolution of AOP genes

It is becoming evident that the genes involved in aliphatic GSL biosynthesis are organized in gene families. For example, the MAM genes coding for isopropyl malate synthases have at least four members, two repeated in tandem on chromosome V sharing a nucleotide similarity of 87%, and two others on different arms of chromosome I. At least one of these genes is involved in GSL side-chain elongation (Campos de Quiros et al. 2000; Kroymann et al. 2001). Two other key genes in the pathway, determining the short (3 and


Brassica

Gene No. | Exons | Introns | Spacer | Exon identity (%)
Bo-1 | 1590/1 | 0/0 | — | 80.3
Bo-2 | 2127/2 | 363/1 | 2278 | 75.5
Bo-3 | 1197/1 | 0/0 | 2917 | 83.9
Bo-4 | 1476/9 | 1065/8 | 1363 | 92.8
Bo-5 | 1272/1 | 0/0 | 1178 | 79.5
Deleted
Deleted
Bo-6A. Bo-ALKa | 1318/3 | 572/2 | 3535 | 72.6
Bo-6B. Bo-ALKb | 1318/3 | 572/2 | 9239 | 72.6
Bo-7A. Bo-AOP1.1 | 966/3 | 430/2 | 3644 | 82.3
Bo-7B. Bo-AOP1.2 | 966/3 | 439/2 | 775 | 81.9
Deleted
Bo-8 | 2568/14 | 1658/13 | 534 | 79.7
Bo-9 | 1257/4 | 715/3 | 1801 | 82.5
Bo-10 | 1272/9 | 1722/8 | 450 | 85.6
Bo-11 | 828/9 | 1043/7 | 1403 | 78.9
Bo-12 | 585/5 | 638/4 | 272 | 80.7
Bo-13 | 2109/7 | 765/6 | 203 | 76.5
Deleted
Bo-14 | 819/3 | 1159/2 | 1645 | 85.5
Bo-15 | 537/2 | 352/1 | 59 | 80.1
Deleted
Deleted
Deleted
Deleted
Deleted
Deleted
Deleted
Bo-16 | 1758/3 | 1004/2 | 3739 | 83.6
Bo-17 | 2433/15 | 2064/14 | 1261 | 87.1


4 carbons) versus long (more than 5 carbons) side chains, CYP79F1 and CYP79F2, are in tandem on chromosome I in A. thaliana. Both of them share the same gene structure and a similarity of 91% in their DNA sequences. These two genes have probably evolved by tandem gene duplication, sharing 89% amino acid sequence identity (Chen et al. 2003). The AOP genes follow a similar pattern, where three members are in tandem on chromosome IV in A. thaliana. This pattern is partially conserved in B. oleracea, where AOP3 is missing but BoGSL-ALK, the homolog of AOP2, is present in one or two copies, depending on the crop. It could be argued that this duplication is universal in B. oleracea and that the accessions displaying a single band resulting from DNA amplification might have two identical copies of this gene. However, this possibility is unlikely, since there is a pseudogene between the duplicates and their 5′UTRs have different sizes. The existence of at least two other versions of BoGSL-ALK in two other species indicates that the duplication of these paralogs was followed by mutation and gene conversion. Similar events can be postulated for the AOP1 genes, although their duplication is probably older than that of BoGSL-ALK (AOP2), judging by their lower sequence identity.

Table 4. Exon size (in bp) comparison of Arabidopsis thaliana and Brassica oleracea orthologs (identity (%)).

Gene | Exon 1 | Exon 2 | Exon 3 | Exon 4 | Exon 5 | Exon 6 | Exon 7
At4g02990 | 1626
Bo-1 | 1590 (72%)
At4g03000 | 2026 | 569
Bo-2 | 1623 (70%) | 539 (86%)
At4g03010 | 1188
Bo-3 | 1197 (84%)
At4g03020 | 438 | 120 | 78 | 177 | 78 | 179 | 159
Bo-4 | 432 (82%) | 120 (95%) | 78 (91%) | 177 (93%) | 78 (85%) | 179 (90%) | 159 (86%)
At4g03030 | 1329
Bo-5 | 1272 (76%)
At4g03060 | 378 | 638 | 261
Bo-6a | 368 (77%) | 689 (68%) | 261 (75%)
At4g03060 | 378 | 638 | 261
Bo-6b | 368 (77%) | 689 (68%) | 261 (75%)
At4g03070 | 386 | 340 | 249
Bo-7a | 386 (82%) | 331 (82%) | 249 (82%)
At4g03070 | 386 | 340 | 249
Bo-7b | 386 (83%) | 331 (83%) | 249 (79%)
At4g03090 | 144 | 156 | 84 | 129 | 219 | 132 | 93
Bo-8 | 150 (70%) | 159 (72%) | 84 (78%) | 129 (82%) | 219 (85%) | 132 (85%) | 87 (89%)
At4g03100 | 338 | 121 | 351 | 465
Bo-9 | 329 (78%) | 121 (88%) | 351 (88%) | 441 (76%)
At4g03110 | 178 | 83 | 52 | 126 | 80 | 180 | 334
Bo-10 | 178 (86%) | 83 (88%) | 52 (97%) | 126 (88%) | 80 (91%) | 180 (86%) | 295 (80%)
At4g03115 | 136 | 117 | 62 | 237 | 162 | 54 | 38
Bo-11 | 100 (81%) | 117 (70%) | 62 (87%) | 237 (87%) | 162 (89%) | 54 (81%) | 38 (92%)
At4g03120 | 8 | 43 | 42 | 229 | 302
Bo-12 | 8 (100%) | 43 (86%) | 42 (92%) | 217 (85%) | 254 (79%)
At4g03130 | 58 | 327 | 1352 | 267 | 81 | 112 | 101
Bo-13 | 70 (67%) | 318 (47%) | 1184 (70%) | 267 (89%) | 81 (86%) | 112 (82%) | 77 (65%)
At4g03140 | 35 | 319 | 486
Bo-14 | 14 (30%) | 319 (88%) | 483 (84%)
At4g03150 | 291 | 267
Bo-15 | 279 (73%) | 258 (85%)
At4g03190 | 455 | 493 | 810
Bo-16 | 455 (85%) | 493 (81%) | 810 (84%)
At4g03200 | 294 | 73 | 22 | 79 | 33 | 149 | 157
Bo-17 | 270 (80%) | 73 (97%) | 22 (100%) | 79 (91%) | 33 (91%) | 149 (88%) | 157 (80%)
At4g03250 | 25 | 606 | 80 | 474 | 75 | 98 | 73
Bo-18 | 25 (88%) | 585 (87%) | 80 (85%) | 456 (79%) | 78 (75%) | 77 (83%) | 70 (84%)
At4g03260 | 51 | 1243 | 147 | 593
Bo-19 | 51 (84%) | 1123 (87%) | 147 (82%) | 578 (81%)
At4g03270 | 201 | 84 | 96 | 208 | 131 | 189
Bo-20 | 210 (82%) | 84 (94%) | 96 (86%) | 208 (87%) | 131 (88%) | 177 (78%)
At4g03280 | 33 | 251 | 118 | 183 | 105
Bo-21 | 33 (91%) | 257 (83%) | 118 (89%) | 183 (94%) | 105 (87%)
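The per-exon identities in Table 4 can be related to the single exon identity value given per gene in Table 3. The source does not state how the gene-level figure was computed. One plausible reading, sketched here purely as an assumption, is a mean of the per-exon identities weighted by B. oleracea exon length. The function name is illustrative; the numbers for Bo-16 are taken from Table 4.

```python
def weighted_identity(exons):
    """Length-weighted mean identity (%) over (exon_length_bp, identity_pct) pairs."""
    total_bp = sum(length for length, _ in exons)
    return sum(length * pct for length, pct in exons) / total_bp


# Bo-16 exons from Table 4: (length in bp, identity to At4g03190 in %).
bo16 = [(455, 85), (493, 81), (810, 84)]
bo16_identity = round(weighted_identity(bo16), 1)
print(bo16_identity)
```

For Bo-16 this gives 83.4%, close to the 83.6% listed in Table 3. The small difference is expected, because the per-exon percentages in Table 4 are themselves rounded.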

Arabidopsis and Brassica diverged at least 14.5 million years ago from a common ancestor. Before the separation of the two lineages, the AOP gene underwent at least two tandem duplications, generating the triplet AOP3-AOP2-AOP1. This arrangement has been conserved in A. thaliana; in the Brassica lineage, AOP1 and AOP2 have each undergone another round of tandem duplication, whereas AOP3 has probably been moved to another location in the genome. The duplication of AOP2 in Brassica that produced BoGSL-ALKa and BoGSL-ALKb must be a recent event, judging by the high sequence similarity (99.9%) of the duplicates and the lack of fixation in all the accessions tested for these species. Upon inspection of the exonic structure of all three genes, AOP1, AOP2, and


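Table 4 (continued).

Gene | Exon 8 | Exon 9 | Exon 10 | Exon 11 | Exon 12 | Exon 13 | Exon 14 | Exon 15
At4g03020 | 61 | 192
Bo-4 | 61 (91%) | 192 (89%)
At4g03090 | 153 | 283 | 138 | 70 | 61 | 471 | 495
Bo-8 | 162 (86%) | 283 (83%) | 138 (82%) | 70 (80%) | 61 (80%) | 444 (7%) | 495 (79%)
At4g03110 | 144 | 149
Bo-10 | 144 (88%) | 134 (77%)
At4g03115 | 97
Bo-11 | 97 (94%)
At4g03200 | 241 | 242 | 264 | 111 | 225 | 204 | 177 | 186
Bo-17 | 241 (87%) | 242 (89%) | 264 (88%) | 111 (89%) | 225 (86%) | 204 (88%) | 177 (86%) | 186 (90%)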


AOP3, and their Brassica homologs, and assuming that the intronless maize AOP homolog is ancestral, one can speculate that AOP1 is closer to the ancestral gene. AOP1 might then have given rise to the other two homologs in a stepwise fashion. AOP2 may have originated by insertion of the sequence corresponding to section 2c in exon 2; AOP3 might


Fig. 3. Order and orientation of genes contained in the B. oleracea BAC B21H13 (Bo) and its corresponding region in Arabidopsis (At). The squares above the line represent genes with forward orientation, whereas those under the line represent genes with reverse orientation. Unnumbered squares in Arabidopsis indicate genes that are absent in B21H13.

Fig. 4. Polymorphism in copy number and size of BoGSL-ALK genes detected by amplification with primers odd48 and odd12. The smaller band (491 bp) corresponds to BoGSL-ALKa, the larger band (567 bp) to BoGSL-ALKb. The bands of intermediate migration correspond to BoGSL-ALKc (510 bp), found only in some accessions of B. rapa, and BoGSL-ALKd (531 bp) in B. villosa. Lane numbers correspond to accessions listed in Table 2. M, 100-bp markers.

Fig. 5. Structure of AOP1, AOP2, and AOP3 homologs in A. thaliana and B. oleracea. Conserved exons and exon sections share the same graphic pattern. Sizes are in base pairs.


Table 5. Identity (%) of corresponding exons or exon sections for A. thaliana (At) AOP genes and their homologs in B. oleracea (Bo).

Gene | Exon | At-AOP1 | Bo-AOP1a | Bo-AOP1b | At-AOP2 | BoGSL-ALKa | BoGSL-ALKb
Bo-AOP1a | 1 | 82
Bo-AOP1a | 2a | 83
Bo-AOP1a | 2b | 82
Bo-AOP1a | 3 | 82
Bo-AOP1b | 1 | 83 | 93
Bo-AOP1b | 2a | 81 | 86
Bo-AOP1b | 2b | 87 | 88
Bo-AOP1b | 3 | 80 | 90
At-AOP2 | 1 | 77 | 73 | 71
At-AOP2 | 2a | 79 | 73 | 74
At-AOP2 | 2c | — | — | —
At-AOP2 | 2b | 76 | 74 | 76
At-AOP2 | 3 | 67 | 69 | 69
Bo-AOP2a | 1 | 75 | 70 | 72 | 78
Bo-AOP2a | 2a | 78 | 73 | 71 | 83
Bo-AOP2a | 2c | — | — | — | 63
Bo-AOP2a | 2b | 75 | 74 | 72 | 81
Bo-AOP2a | 3 | 66 | 70 | 69 | 75
Bo-AOP2b | 1 | 75 | 70 | 72 | 78 | 100
Bo-AOP2b | 2a | 78 | 73 | 71 | 83 | 100
Bo-AOP2b | 2c | — | — | — | 63 | 100
Bo-AOP2b | 2b | 75 | 74 | 72 | 81 | 100
Bo-AOP2b | 3 | 66 | 70 | 69 | 75 | 100
At-AOP3 | 1 | 77 | 72 | 73 | 82 | 79 | 79
At-AOP3 | 2a | 80 | 73 | 70 | 75 | 79 | 79
At-AOP3 | 2c | — | — | — | 70 | 59 | 59
At-AOP3 | 3 | 71 | 70 | 72 | 81 | 71 | 71
At-AOP3 | 4 | 73 | 74 | 72 | 81 | 78 | 78


have originated from this gene by acquiring a second insertion, thereby splitting exon 2 by insertion of the sequence corresponding to intron 2.

These observations confirm the plasticity of the genomes in the Brassicaceae, where extensive rearrangements have taken place during the evolution of the species of this family, at both gene and chromosomal levels.

Acknowledgements

We are indebted to Dr. Lidia Nascimento from Cold Spring Harbor Laboratory for reception and sequencing coordination of BAC clone B21H13, and to Drs. Roger Chetelat and Daniel Kliebenstein for reading the manuscript. We are also indebted to Dr. Jong-Min Baek from the Genomic Facility on campus for his help running the ACT program and to Dr. Daniel Potter for his help with the Clustal X program. Research was supported by USDA–IFAFS grant No. 00-52100-9683, “Development of Genomic Tools and Resources for Brassica”. The sequencing was funded under NSF grant NSF DBI 9813578, “A Genetic Approach to Ordered Sequencing of Arabidopsis”.

References

AGI 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature (London), 408: 796–815.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Babula, D., Kaczmarek, M., Barakat, A., Delseny, M., Quiros, C.F., and Sadowski, J. 2003. Chromosomal mapping of Brassica oleracea based on ESTs from Arabidopsis thaliana: complexity of the comparative map. Mol. Genet. Genomics, 268: 656–665.

Bancroft, I. 2001. Duplicate and diverge: the evolution of plant genome microstructure. Trends Genet. 17: 89–93.

Burge, C., and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78–94.

Campos de Quiros, H., Magrath, R., McCallum, D., Kroymann, J., and Schnabelrauch, D. 2000. Keto acid elongation and glucosinolate biosynthesis in Arabidopsis thaliana. Theor. Appl. Genet. 101: 429–437.

Cavell, A.C., Lydiate, D.J., Parkin, I.A.P., Dean, C., and Trick, M. 1998. A 30 centimorgan segment of Arabidopsis thaliana chromosome 4 has six collinear homologues within the Brassica napus genome. Genome, 41: 62–69.

Chen, S., Glawischnig, E., Jorgensen, K., Naur, P., Jorgensen, B., Olsen, C.E., Hansen, C.H., Rasmussen, H., Pickett, J.A., and Halkier, B.A. 2003. CYP79F1 and CYP79F2 have distinct functions in the biosynthesis of aliphatic glucosinolates in Arabidopsis. Plant J. 33: 923–937.

Dubcovsky, J., Ramakrishna, W., SanMiguel, P.J., Busso, C.S., Yan, L., Shiloff, B.A., and Bennetzen, J.L. 2001. Comparative sequence analysis of colinear barley and rice bacterial artificial chromosomes. Plant Physiol. 125: 1342–1353.

Ewing, B., and Green, P. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8: 186–194.

Fahey, J.W., Zhang, Y., and Talalay, P. 1997. Broccoli sprouts: an exceptionally rich source of inducers of enzymes that protect against chemical carcinogens. Proc. Natl. Acad. Sci. USA, 94: 10 367 – 10 372.

Frey, M., Huber, K., Park, W.J., Sicker, D., Lindberg, P., Meeley, R.B., Simmons, C.R., Yalpani, N., and Gierl, A. 2003. A 2-oxoglutarate-dependent dioxygenase is integrated in DIMBOA-biosynthesis. Phytochem. 62: 371–376.

Gordon, D., Abajian, C., and Green, P. 1998. CONSED: a graphical tool for sequence finishing. PCR Meth. Appl. 8: 195–202.

Halkier, B.A. 1999. Catalytic reactivities and structure/function relationships of cytochrome P450 enzymes. Review 113. Phytochemistry, 43: 1–21.

Hall, A., Kozma-Bognár, L., Tóth, R., Nagy, F., and Millar, A.J. 2001. Conditional circadian regulation of PHYTOCHROME A gene expression. Plant Physiol. 127: 1808–1818.

Fig. 6. Neighbor-joining tree of AOP genes in B. oleracea and Arabidopsis generated by the Clustal X program. The Arabidopsis sequences were obtained from GenBank. NM116541.1: A. thaliana, Columbia-0, AOP1, At4g03070; AF418264: A. thaliana, Tac, AOP1; AF418266: A. thaliana, wei-0, AOP1; AF418265: A. thaliana, Tsu-1, AOP1; AF418264: A. lyrata, AOP1; AF418252: A. halleri, AOP1; NM116540.3: A. thaliana, Columbia-0, AOP2, At4g03060; AF418223: A. thaliana, Tac, AOP2; AF418222: A. thaliana, Di-1, AOP2; AF417858: A. thaliana, Cvi, AOP2; AF418240: A. thaliana, Landsberg, AOP2; AF418239: A. lyrata, AOP2; NM116539.3: A. thaliana, Columbia-0, AOP3, At4g03050; AF418282: A. thaliana, Tac, AOP3; AF418275: A. thaliana, Da(1)-12, AOP3; AF418274: A. thaliana, Cvi, AOP3; AF418281: A. thaliana, Pi-0, AOP3; AF417859: A. thaliana, Landsberg, AOP3; AF418276: A. thaliana, Ei-2, AOP3. The maize sequence AF540907, which corresponds to an AOP homolog, was used as the outgroup. The scale bar represents the number of nucleotide substitutions per site.


Hall, A.E., Fiebig, A., and Preuss, D. 2002. Beyond the Arabidopsis genome: opportunities for comparative genomics. Plant Physiol. 129: 1439–1447.

Kliebenstein, D.J., Lambrix, V.M., Reichelt, M., Gershenzon, J., and Mitchell-Olds, T. 2001. Gene duplication in the diversification of secondary metabolism: tandem 2-oxoglutarate-dependent dioxygenases control glucosinolate biosynthesis in Arabidopsis. Plant Cell, 13: 681–693.

Koch, M., Haubold, B., and Mitchell-Olds, T. 2001. Molecular systematics of the Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences. Am. J. Bot. 88: 534–544.

Kowalski, S.P., Lan, T.-H., Feldmann, K.A., and Paterson, A.H. 1994. Comparative mapping of Arabidopsis thaliana and Brassica oleracea chromosomes reveals islands of conserved organization. Genetics, 138: 499–510.

Kroymann, J., Textor, S., Tokuhisa, J.G., Falk, K.L., and Bartram, S. 2001. A gene controlling variation in Arabidopsis glucosinolate composition is part of the methionine chain elongation pathway. Plant Physiol. 127: 1077–1088.

Lagercrantz, U. 1998. Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics, 150: 1217–1228.

Lan, T.-H., DelMonte, T.A., Reischmann, K.P., Hyman, J., Kowalski, S.P., McFerson, J., Kresovich, S., and Paterson, A.H. 2000. An EST-enriched comparative map of Brassica oleracea and Arabidopsis thaliana. Genome Res. 10: 776–788.

Li, G., and Quiros, C.F. 2003. In planta side-chain glucosinolate modification in Arabidopsis by introduction of dioxygenase Brassica homolog BoGSL-ALK. Theor. Appl. Genet. 106: 1116–1121.

Li, G., Gao, M., Yang, B., and Quiros, C.F. 2003. Gene to gene alignment between the Brassica and Arabidopsis genomes by transcriptional mapping. Theor. Appl. Genet. 107: 168–180.

Lukens, L., Zou, F., Parkin, I., Lydiate, D., and Osborn, T. 2003. Comparison of the Brassica oleracea genetic map with the Arabidopsis thaliana physical map. Genetics, 164: 359–372.

McCombie, W.R., de la Bastide, M., Spiegel, L., Preston, R., Ferraro, K., et al. 2002. Genomic sequence for Brassica oleracea, clone B21H13, complete sequence. NCBI submission AC122543. Genomic sequence gi:21166171. Available from http://www.ncbi.nlm.nih.gov/

Nastruzzi, C., Cortesi, R., Esposito, E., Menegatti, E., Leoni, O., Iori, R., and Palmieri, S. 1996. In vitro cytotoxic activity of some glucosinolate-derived products generated by myrosinase hydrolysis. J. Agric. Food Chem. 44: 1014–1021.

O’Neill, C.M., and Bancroft, I. 2000. Comparative physical mapping of segments of the genome of Brassica oleracea var. alboglabra that are homoeologous to sequenced regions of chromosomes 4 and 5 of Arabidopsis thaliana. Plant J. 23: 233–243.

Page, R.D.M. 1996. TREEVIEW: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12: 357–358.

Quiros, C.F., Grellet, F., Sadowski, J., Suzuki, T., Li, G., and Wroblewski, T. 2001. Arabidopsis and Brassica comparative genomics: sequence, structure and gene content in the ABI1-Rps2-Ck1 chromosomal segment and related regions. Genetics, 157: 1321–1330.

Rask, L., Andreasson, E., Ekbom, B., Eriksson, S., Pontoppidan, B., and Meijer, J. 2000. Myrosinase: gene family evolution and herbivore defense in Brassicaceae. Plant Mol. Biol. 42: 93–113.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A., and Barrell, B. 2000. Artemis: sequence visualization and annotation. Bioinformatics, 16: 944–945.

Ryder, C.D., Smith, L.B., Teakle, G.R., and King, G.J. 2001. Contrasting genome organization: two regions of the Brassica oleracea genome compared with collinear regions of the Arabidopsis thaliana genome. Genome, 44: 808–817.

Salzberg, S.L., Pertea, M., Delcher, A.L., Gardner, M.J., and Tettelin, H. 1999. Interpolated Markov models for eukaryotic gene finding. Genomics, 59: 24–31.

The Rice Chromosome 10 Sequencing Consortium. 2003. In-depth view of structure, activity, and evolution of rice chromosome 10. Science (Washington, D.C.), 300: 1566–1569.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25: 4876–4882.

Yang, Y.W., Lai, K.N., Tai, P.Y., and Li, W.H. 1999. Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and the other angiosperm lineages. J. Mol. Evol. 48: 597–604.

Zhang, M.Q. 1997. Identification of protein coding regions in the human genome based on quadratic discriminant analysis. Proc. Natl. Acad. Sci. USA, 94: 565–568.

Zhang, Y., Wade, K.L., Prestera, T., and Talalay, P. 1996. Quantitative determination of isothiocyanates, dithiocarbamates, carbon disulfide, and related thiocarbonyl compounds by cyclocondensation with 1,2-benzenedithiol. Anal. Biochem. 239: 160–167.
