Profiling the Developing Jatropha curcas L. Seed Transcriptome by Pyrosequencing
Andrew J King & Yi Li & Ian A Graham
Published online: 28 January 2011 © Springer Science+Business Media, LLC. 2011
Abstract Jatropha curcas L. has recently received much attention as a potential oilseed crop for the production of renewable oil. Despite the interest in this crop, relatively little is known about the molecular biology of this species compared with more established oilseed crops. To gain a more detailed understanding of the processes involved in the deposition of oil and protein within Jatropha seeds, we conducted a high-throughput sequencing analysis of the transcriptome of developing J. curcas seeds using 454 sequencing. A single sequencing run yielded 195,692 sequences (46 Mbp) of raw sequence data. Assembly of these data produced 12,419 contigs and 17,333 singletons. BLASTX searches of the contigs revealed that storage proteins were the most abundant transcripts. Oleosins, ribosomal proteins, metallothioneins and late embryogenesis abundant proteins were also highly represented. Curcin, a type-I ribosome-inactivating protein, accounted for 0.7% of the transcriptome. No transcripts for type-II ribosome-inactivating proteins were found, suggesting that these are not present in the seeds of J. curcas. To test the power of 454 sequencing compared with conventional dye-terminator sequencing as a tool for gene discovery, we searched for homologues of the genes involved in the conversion of sucrose to triacylglycerol. Hits were obtained for all the known genes in this process. Pyrosequencing of the developing J. curcas seed transcriptome has provided a valuable increase in the amount of sequence data currently available for this species. The sequence data will be of great use to those engaged in J. curcas research and crop improvement.
Keywords Biodiesel · Expressed sequence tags · Jatropha curcas · Developing seeds
Introduction
In recent years, much interest has been generated in the potential of Jatropha curcas L. as a perennial oilseed crop for cultivation in tropical and sub-tropical regions. Many plantations are now being established in Asia, Africa and Latin America for the production of biodiesel, and J. curcas is therefore likely to become an increasingly important oilseed crop over the next decade [1, 2]. J. curcas also has a history of cultivation in a number of less intensive agricultural systems: it can be used as a fencing crop, as a shade crop for plants such as vanilla, or to control soil erosion [3].
Despite the interest in this species, relatively little is presently known about the molecular biology of the plant compared with other crop species [2]. As the use of J. curcas as an industrial-scale crop is only a recent development, there is considerable scope for agronomic improvement of the species through plant breeding and biotechnology.
Expressed sequence tag (EST) databases are a valuable tool for sampling the transcriptome of a particular organism or tissue and provide insight into its biological processes. The cDNA data generated by such studies can be used in further gene expression studies such as microarrays or qPCR.
Electronic supplementary material The online version of this article (doi:10.1007/s12155-011-9114-x) contains supplementary material, which is available to authorized users.
A. J. King · Y. Li · I. A. Graham (*) Centre for Novel Agricultural Products, Department of Biology, University of York, Heslington, York YO10 5DD, UK
Bioenerg. Res. (2011) 4:211–221 DOI 10.1007/s12155-011-9114-x
Additionally, cDNA sequences are useful resources for identifying coding regions in genomic DNA. A number of EST databases have been established for developing seeds, including those of the model species Arabidopsis thaliana [4], crops such as castor [5, 6] and sesame [7], and species producing unusual fatty acids such as Momordica charantia and Impatiens balsamina [8]. These EST databases were obtained using conventional dye-terminator sequencing and contain between 743 [6] and 10,522 [4] single-pass reads. Dye-terminator sequencing has also recently been used to produce 7,320 and 5,929 ESTs from developing and germinating seeds of J. curcas, respectively [9]. However, obtaining greater depth of sequence data using dye-terminator sequencing is prohibitively expensive.
In recent years, both the throughput and the speed of transcriptome sequencing projects have improved vastly through the use of new sequencing technologies such as 454 pyrosequencing [10–12]. To gain further insight into the seed biology of J. curcas, we constructed an EST database using 454 sequencing, and we compare the effectiveness of the two sequencing approaches for identifying genes involved in key metabolic processes such as lipid biosynthesis.
Methods
Lipid Analysis
Seeds were collected from manually pollinated J. curcas plants. The mass of the seeds was recorded, and the seeds were then lyophilized. The lyophilized material was ground to a fine powder, and fatty acid methyl ester (FAME) analysis was performed as described previously [13].
RNA Extraction, cDNA Synthesis and 454 Sequencing
RNA was extracted from developing seeds of J. curcas using a CTAB/lithium acetate procedure [14]. cDNA was then synthesised using the Dualsystems Biotech EasyClone cDNA library construction kit (Schlieren, Switzerland). cDNA amplification was performed using 16 cycles of long-distance PCR. Primers were removed using Invitrogen cDNA size fractionation columns (Invitrogen, Carlsbad, CA, USA). Five micrograms of cDNA from three different developmental stages were then pooled and sent to Cogenics (Meylan Cedex, France) for 454 sequencing on the GS FLX platform.
Sequence Assembly, Analysis and BLAST Searching
Primer sequences were stripped from the raw sequence reads with custom Perl scripts. High-quality sequences longer than 40 nucleotides and containing less than 3% unknown (N) residues were then selected and assembled into contiguous sequences using the CAP3 DNA sequence assembly program with default parameters [15]. The assembled contigs were annotated locally by a BLAST 2.0 search [16] of the NCBI non-redundant peptide database with the BLASTX algorithm. To identify specific sequences relating to genes involved in lipid biosynthesis, we also conducted TBLASTN searches of the J. curcas transcriptome dataset using peptide sequences corresponding to genes listed in the Arabidopsis Lipid Gene Database [17]. To estimate the proportion of sequences containing full-length coding regions, start codons and stop codons, 100 sequences were selected at random and their BLASTX alignments were compared. This analysis was performed in triplicate, and values are reported as the mean ± SEM.
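The read-selection rule above can be written as a small filter. This is a minimal sketch, not the authors' custom Perl scripts: it assumes primers have already been trimmed, that reads arrive as hypothetical (id, sequence) pairs, and that "less than 3%" means the fraction of N characters is strictly below 0.03.

```python
from typing import Iterable, Iterator, Tuple

MIN_LENGTH = 40        # reads must be longer than 40 nt
MAX_N_FRACTION = 0.03  # and contain less than 3% unknown (N) residues


def passes_filter(seq: str) -> bool:
    """Return True if a primer-trimmed read meets the length and N-content criteria."""
    seq = seq.upper()
    if len(seq) <= MIN_LENGTH:
        return False
    return seq.count("N") / len(seq) < MAX_N_FRACTION


def filter_reads(reads: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield only the (read_id, sequence) pairs that pass the filter."""
    for read_id, seq in reads:
        if passes_filter(seq):
            yield read_id, seq


# Hypothetical toy reads used only to exercise the filter.
reads = [
    ("short", "ACGT" * 10),           # 40 nt: not longer than 40, rejected
    ("clean", "ACGT" * 25),           # 100 nt, no N: kept
    ("noisy", "ACGT" * 24 + "NNNN"),  # 100 nt, 4% N: rejected
]
kept = [read_id for read_id, _ in filter_reads(reads)]
print(kept)  # ['clean']
```

Reads that pass would then go to the assembler (CAP3 in this study); the filter itself is independent of the assembly step.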
Results and Discussion
454 Sequencing of Developing J. curcas Seeds
The seeds of J. curcas are endospermic, i.e. the bulk of the storage reserves is deposited within the endosperm rather than the embryo. In castor, another endospermic seed of the Euphorbiaceae, endosperm development begins with a free-nuclear stage, which progresses to cellularization and maturation. After an initial phase in which seeds grow rapidly to full size, lipid deposition begins during the cellularization stage, when the endosperm becomes distinct [18]. To select developmental stages in which oil deposition is occurring, we analysed the lipid content of J. curcas seeds at various intervals after pollination. Total lipid content was determined and compared with that of mature seeds (Supplementary Figure 1). The analysis indicated that seed development in J. curcas follows a similar pattern to seed development in Ricinus communis, with oil deposition occurring after the seeds become full-sized. We selected stages at 58, 63 and 70 days after pollination for the 454 sequencing library. cDNA was prepared from mRNA extracted from developing seeds at these three developmental stages and pooled in equal amounts. A single half-run on the GS FLX sequencer yielded 195,692 sequences with an average length of 234 bp (46 Mbp). After trimming and removal of low-quality reads, 187,314 sequences with an average length of 220 bp (41 Mbp) were assembled using the CAP3 program to produce 12,419 contigs and 17,333 singletons (29,752 unique sequences). Most of the contigs were composed of relatively few sequences (median = 3, mean = 13.7). The mean and median contig lengths (excluding singletons) were 401 and 322 bp, respectively (Fig. 1a and b).

The short contig lengths, the relatively small number of ESTs per contig and the large number of singletons indicate that the transcriptome sampling had not reached saturation. Therefore, many genes are likely to be represented by more than one contig or singleton. A BLASTX search of the consensus sequences of the assembled contigs was performed against the NCBI non-redundant peptide database. A BLASTX hit with an E value less than 10⁻¹⁰ was obtained for 6,942 (56%) of the 12,419 contigs (Supplementary Table 1). A diagrammatic representation of the most abundantly represented classes is shown in Fig. 2, and the 50 most abundantly represented genes are summarised in Table 1. As anticipated from other developing seed EST databases [4–7], storage proteins are the most abundantly expressed genes in the developing seeds of J. curcas, accounting for 24% of the transcriptome, and the six most abundantly represented sequences were all storage proteins. Ribosomal proteins were the next most abundant transcripts, accounting for 4.3% of the transcriptome. Oleosins accounted for 2.8% of the transcriptome. Transcripts were detected for five oleosin genes, three of which have previously been deposited in GenBank (Supplementary Table 2). Other abundantly represented sequences include metallothioneins (1.7%) and late embryogenesis/seed maturation-related proteins (0.7%). Analysis of a random subset of the contigs revealed that 26.0 ± 1.0% contained a start codon and 34.7 ± 1.9% contained a stop codon. Full-length coding regions were detected in 15.0 ± 2.3% of the contigs.
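The mean ± SEM values above come from three random draws of 100 contigs. The following is a minimal sketch of that calculation. It assumes a hypothetical mapping from contig ID to whether its BLASTX alignment showed the feature (for example a start codon), and it takes SEM as the sample standard deviation divided by √n, since the exact formula is not stated. The per-replicate counts in the usage line are invented because the study does not report them.

```python
import math
import random
import statistics
from typing import Dict, List, Tuple


def mean_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def subsample_percentages(has_feature: Dict[str, bool], sample_size: int = 100,
                          replicates: int = 3, seed: int = 1) -> List[float]:
    """Percentage of contigs showing a feature in each random subsample."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    contig_ids = sorted(has_feature)
    percentages = []
    for _ in range(replicates):
        sample = rng.sample(contig_ids, sample_size)  # draw without replacement
        hits = sum(has_feature[c] for c in sample)
        percentages.append(100.0 * hits / sample_size)
    return percentages


# Hypothetical replicate counts (feature found in k of 100 contigs).
mean, sem = mean_sem([27, 25, 29])
print(f"{mean:.1f} ± {sem:.1f}%")  # 27.0 ± 1.2%
```

With real data, `subsample_percentages` would produce the three replicate percentages that are then passed to `mean_sem`.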
Ribosome-Inactivating Proteins of J. curcas
A single isoform of curcin [19], a type-I ribosome-inactivating protein (RIP), was present in the EST database as the tenth most abundantly represented sequence (GenBank TSA accession EZ417711), accounting for 0.7% of the total transcriptome. Transcripts for curcin 2 [20] were not present. Unlike ricin from castor, the type-I RIPs of J. curcas lack a lectin cell-binding domain and are therefore thought to be only mildly cytotoxic upon ingestion [21, 22]. Although curcins have historically been compared to ricin, it is not known whether the seeds of J. curcas do in fact contain type-II RIPs [2]. As there are currently plans to cultivate millions of hectares of land with J. curcas, the presence of a type-II RIP within the seeds could have serious consequences in instances of accidental ingestion or deliberate poisoning [23]. As few as two castor beans can cause death in humans if ingested orally, and around 8% of clinically reported cases of accidental castor bean ingestion are fatal [23, 24], especially where access to medical help is limited. To determine whether the seeds of J. curcas may contain a type-II RIP, we performed a TBLASTN search using the ricin precursor from R. communis. No type-II RIPs were found. Although this does not confirm the absence of type-II RIPs in J. curcas, the lack of any detectable transcripts in the seed EST database strongly suggests that seeds do not contain such proteins. The absence of type-II RIPs is therefore a positive factor for the development of this species as a crop.
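A TBLASTN search such as the ricin query above returns one row per local alignment. The following is a minimal sketch of how such results might be screened, assuming BLAST's standard 12-column tabular output (BLAST+ `-outfmt 6`, legacy `blastall -m 8`). Because type-I and type-II RIPs share the catalytic (A-chain) domain, a ricin query would be expected to hit curcin. The informative question is therefore whether any significant hit also covers the lectin (B-chain) part of the query. The source does not state how the TBLASTN hits were evaluated: the E-value cutoff (10⁻¹⁰, borrowed from the BLASTX annotation step), the lectin-region coordinates and the toy rows are all hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse default 12-column BLAST tabular output.

    Column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
                        float(f[10]), float(f[11])))
    return hits


def hits_in_region(hits: List[Hit], max_evalue: float,
                   start: int, end: int) -> List[Hit]:
    """Significant hits whose query span overlaps query positions start..end."""
    return [h for h in hits
            if h.evalue < max_evalue and h.qstart <= end and h.qend >= start]


# Hypothetical placeholder for the lectin (B-chain) region of the query;
# real coordinates would come from the ricin precursor's annotation.
LECTIN_REGION = (320, 576)

toy_output = [
    "ricin\tcontig_A\t35.2\t250\t150\t5\t40\t289\t10\t760\t1e-40\t160",
    "ricin\tcontig_B\t28.0\t50\t36\t0\t350\t399\t1\t150\t0.5\t20.1",
]
hits = parse_tabular(toy_output)
print([h.subject for h in hits if h.evalue < 1e-10])  # ['contig_A']
print(hits_in_region(hits, 1e-10, *LECTIN_REGION))    # []
```

In this toy case, the only significant hit aligns to the catalytic part of the query, which is the pattern expected for a type-I RIP such as curcin.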
Storage Lipid Biosynthesis
In oilseeds, sucrose is converted into triacylglycerol via a series of compartmentalised reactions. Sucrose is first converted into pyruvate through glycolysis. This occurs in both the cytosol and the plastids, as both glucose-6-phosphate and phosphoenolpyruvate can be imported into the plastid. Starch granules in the plastid may also be converted into pyruvate via the plastidial glycolytic pathway. Pyruvate is then converted into acetyl-CoA. De novo fatty acid biosynthesis then occurs in the plastid via an elongation system involving three different ketoacyl synthases. Saturated and monounsaturated fatty acids (i.e. palmitate (16:0), palmitoleate (16:1), stearate (18:0) and oleate (18:1)) are then exported from the plastid after being hydrolysed by acyl–acyl carrier protein (ACP) thioesterases [25] and subsequently converted to coenzyme A (CoA) esters by plastidial long-chain acyl-CoA synthetases [26]. The acyl-CoAs then serve as donors for glycerolipid biosynthesis via a series of reactions in the endoplasmic reticulum involving the Kennedy pathway [27].

Fig. 1 Distribution of contig lengths (bp) and sequences per contig. a Size ranges of the 12,419 contigs in bp (x-axis: contig length (bp); y-axis: no. of contigs). b The number of individual sequence reads per contig (x-axis: sequences per contig; y-axis: no. of contigs)

Although the processes involved in the conversion of sucrose to triacylglycerol (TAG) are not fully understood, gene sequences for many of the enzymes required have been identified [17]. We searched the J. curcas EST database for all the nuclear-encoded genes involved in the conversion of sucrose to TAG (Table 2 and Fig. 3). Homologues corresponding to all A. thaliana genes known to be involved in these processes were present. Transcripts corresponding to the cytosolic glycolytic pathway were particularly abundant, with glyceraldehyde-3-phosphate dehydrogenase, aldolase and phosphoglycerate kinase each accounting for ≥0.1% of the transcriptome. Transcripts for the plastidial glycolytic pathway were less abundant, and representatives of all enzymes in the plastidial oxidative pentose phosphate pathway were also present, suggesting that both pathways are operational in developing J. curcas seeds. ESTs corresponding to all steps involved in de novo fatty acid biosynthesis and export from the plastid were present. Transcripts for enzymes of the Kennedy pathway were much less abundant than those of glycolysis and de novo fatty acid biosynthesis. Only three ESTs were present for the previously cloned diacylglycerol acyltransferase I (DGAT1; GenBank accession DQ278448). One EST was also identified for a putative type-2 diacylglycerol acyltransferase (DGAT2) [28]. A relatively low number of ESTs for the final steps of triglyceride biosynthesis has previously been observed in other seed EST databases: only three transcripts for DGATs were found in the developing seed database of A. thaliana [4], whilst no DGAT1 or DGAT2 transcripts have been reported in the EST databases of R. communis [5, 6]. Although the relative lack of DGAT transcripts in developing seeds of J. curcas is quite surprising, it should be noted that DGAT1 protein levels are known to be post-transcriptionally regulated in castor [29]. Triacylglycerol can also be formed by transacylation of diacylglycerol with phospholipids as acyl donors [30]. Six ESTs were detected for a putative phospholipid:diacylglycerol acyltransferase (PDAT).
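The transcript shares quoted in this section (e.g. ≥0.1% of the transcriptome) are EST counts divided by a total. The text does not state the denominator explicitly. Dividing by the 187,314 reads used for assembly reproduces the tabulated values (396 aldolase ESTs give 0.211%; 9,341 legumin-like ESTs give 5.0%), so the sketch below assumes that total.

```python
# Assumed denominator: reads that passed trimming/quality filtering and went into assembly.
TOTAL_ASSEMBLED_READS = 187_314


def percent_of_transcriptome(est_count: int, total: int = TOTAL_ASSEMBLED_READS) -> float:
    """Share of all assembled reads assigned to a gene, as a percentage."""
    return 100.0 * est_count / total


# EST counts taken from Table 2 (steps 10 and 13).
for name, ests in [("aldolase", 396), ("phosphoglycerate kinase", 200)]:
    pct = percent_of_transcriptome(ests)
    print(f"{name}: {pct:.3f}% (>= 0.1%: {pct >= 0.1})")
```

Both enzymes clear the 0.1% threshold (0.211% and 0.107%), in line with the tabulated values.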
In addition to the previously identified oleate desaturase of J. curcas (GenBank accession ABA41034/EZ409947), we identified transcripts for a second oleate desaturase (EZ414061). In total, 161 ESTs were detected for oleate desaturase, but only seven corresponded to linoleate desaturase. This is consistent with the low concentrations of linolenic acid (typically <0.5%) that have been reported in J. curcas oil [2].
Phorbol Ester Biosynthesis
Although there is considerable interest in the use of J. curcas seeds as a source of renewable oil, the seed meal from this species contains phorbol esters, which limit its use as an animal feed [2]. Phorbol esters are tigliane diterpenoids, and the first committed step in the biosynthesis of the tigliane skeleton is likely to be the conversion of a 20-carbon isoprenoid diphosphate, geranylgeranyl diphosphate (GGPP), into a macrocyclic diterpenoid by a terpene synthase. The isoprenoid precursors may be provided by either the cytosolic mevalonate pathway or the plastidial methylerythritol phosphate (MEP) pathway [31]. Analysis of the J. curcas EST database revealed that transcripts were present for all steps of both pathways (Supplementary Table 3). However, relatively few transcripts were detected, with between 1 and 10 ESTs for each step in the MEP pathway. Although it is not clear whether phorbol esters are products of the mevalonate or the MEP pathway, the MEP pathway is likely to be involved, because all plant diterpene synthases known so far to be involved in the biosynthesis of secondary metabolites are located in the plastid [32–34]. In addition to being a precursor for a diverse range of secondary metabolites, GGPP is a precursor for a range of other compounds in plants, including chlorophylls, prenylated proteins and gibberellins. The GGPP precursors used for these compounds are primarily derived from the MEP pathway [35, 36]. It should be noted, however, that A. thaliana contains GGPP synthases located in various compartments of the cell, including the plastid, mitochondria and ER [37]. The J. curcas seed EST database contained only one EST corresponding to geranylgeranyl pyrophosphate synthase (Supplementary Table 3), which appeared to
Fig. 2 Diagrammatic representation of the most abundantly expressed gene sequences in the J. curcas EST database. Functional classification of EST sequences in the J. curcas seed transcriptome based on BLASTX annotations
Table 1 Fifty most abundantly represented transcripts in developing seeds of J. curcas
Rank | Annotation | Organism | E value | GenBank | ESTs | %
1 | Legumin-like protein | Ricinus communis | 1e-179 | AAF73007 | 9,341 | 5.0
2 | 11S globulin seed storage protein 2 precursor | Sesamum indicum | 1e-112 | Q9XHP0 | 8,542 | 4.6
3 | 11S globulin precursor isoform 2B | Ficus pumila | 1e-135 | ABK80753 | 8,269 | 4.4
4 | 2S albumin precursor | Ricinus communis | 2e-25 | P01089 | 7,756 | 4.1
5 | 2S albumin precursor | Ricinus communis | 3e-16 | P01089 | 4,944 | 2.6
6 | 2S albumin precursor | Ricinus communis | 9e-17 | P01089 | 3,672 | 2.0
7 | Oleosin | Corylus avellana | 2e-41 | AAO65960 | 3,247 | 1.7
8 | Metallothionein-like protein | Gossypium hirsutum | 2e-18 | AAV74186 | 2,668 | 1.4
9 | BURP | Medicago truncatula | 9e-64 | ABE82234 | 2,032 | 1.1
10 | Curcin precursor | Jatropha curcas | 1e-162 | AAL58089 | 1,336 | 0.7
11 | Protease inhibitor/seed storage/lipid transfer protein | Arabidopsis thaliana | 1e-12 | NP_194817 | 1,250 | 0.7
12 | Oleosin 1 | Camellia oleifera | 4e-43 | ABF57559 | 988 | 0.5
13 | Late embryogenesis abundant protein D-7 | Gossypium hirsutum | 1e-8 | P13939 | 982 | 0.5
14 | Oleosin | Ricinus communis | 4e-38 | AAR15171 | 904 | 0.5
15 | Protease inhibitor/seed storage/lipid transfer protein | Arabidopsis thaliana | 3e-8 | NP_194817 | 827 | 0.4
16 | Vicilin-like protein | Anacardium occidentale | 1e-138 | AAM73730 | 639 | 0.3
17 | 2S albumin precursor | Ricinus communis | 2e-15 | P01089 | 575 | 0.3
18 | Elongation factor 1α | Litchi chinensis | 0.0 | ABF00115 | 571 | 0.3
19 | 2S albumin precursor | Ricinus communis | 2e-15 | P01089 | 532 | 0.3
20 | Protease inhibitor/seed storage/lipid transfer protein | Arabidopsis thaliana | 3e-22 | NP_188456 | 514 | 0.3
21 | Glutaredoxin | Vernicia fordii | 4e-40 | O81187 | 489 | 0.3
22 | Hypothetical protein | Homo sapiens | 1e-5 | XP_001714526 | 479 | 0.3
23 | Translationally controlled tumour protein homolog | Hevea brasiliensis | 6e-73 | Q9ZSW9 | 461 | 0.2
24 | Hypothetical protein | Medicago truncatula | 2e-60 | ABE77875 | 441 | 0.2
25 | No significant homology | N/A | N/A | N/A | 439 | 0.2
26 | Metallothionein-like protein type 3 | Carica papaya | 2e-26 | Q96386 | 420 | 0.2
27 | Thiazole biosynthetic enzyme | Citrus sinensis | 1e-146 | O23787 | 397 | 0.2
28 | Fructose-bisphosphate aldolase-like protein | Solanum tuberosum | 1e-179 | ABC01905 | 393 | 0.2
29 | Cationic peroxidase | Nelumbo nucifera | 1e-162 | ABN46984 | 374 | 0.2
30 | Metallothionein-1 like protein | Oenanthe javanica | 6e-6 | AAB70560 | 371 | 0.2
31 | No significant homology | N/A | N/A | N/A | 357 | 0.2
32 | Alpha tubulin 1 | Pseudotsuga menziesii | 0.0 | AAV92352 | 352 | 0.2
33= | Cystatin-like protein | Arabidopsis thaliana | 2e-27 | AAM64661 | 348 | 0.2
33= | LEA protein in group 3 | Arabidopsis thaliana | 1e-30 | BAA11017 | 348 | 0.2
35 | Foot protein 1 variant 3 | Perna canaliculus | 2e-11 | AAY29135 | 326 | 0.2
36 | β-tonoplast intrinsic protein | Arabidopsis thaliana | 2e-99 | NP_173223 | 314 | 0.2
37 | Aquaporin | Ricinus communis | 1e-125 | CAE53881 | 307 | 0.2
38 | Calmodulin 4 | Daucus carota | 3e-79 | AAQ63461 | 305 | 0.2
39 | Gamma-thionin | Phaseolus vulgaris | 5e-21 | CAL68581 | 279 | 0.1
40 | Annexin-like protein RJ4 | Fragaria x ananassa | 1e-118 | P51074 | 274 | 0.1
41 | Glyceraldehyde 3-phosphate dehydrogenase | Daucus carota | 1e-165 | AAR84410 | 267 | 0.1
42 | S-adenosylmethionine synthetase 1 | Catharanthus roseus | 0.0 | Q96551 | 261 | 0.1
43 | Thioredoxin H-type (TRX-H) | Ricinus communis | 1e-47 | Q43636 | 257 | 0.1
44 | Foot protein-4 variant-1 | Mytilus californianus | 4e-19 | ABC84184 | 245 | 0.1
45 | BURP domain-containing protein | Bruguiera gymnorrhiza | 8e-61 | BAB60849 | 239 | 0.1
46 | Protein disulfide-isomerase precursor | Ricinus communis | 0.0 | Q43116 | 235 | 0.1
47 | Unknown protein | Arabidopsis thaliana | 2e-75 | NP_199381 | 234 | 0.1
48= | Unknown protein | Arabidopsis thaliana | 3e-75 | NP_199381 | 231 | 0.1
48= | NAD-dependent malate dehydrogenase | Prunus persica | 1e-176 | AAL11502 | 231 | 0.1
48= | Cold acclimation-induced protein | Morus mongolica | 3e-7 | AAZ82815 | 231 | 0.1
GenBank annotation obtained from a BLASTX search of the non-redundant GenBank database. In most instances, the GenBank entry with the lowest E value is shown. Where GenBank entries lack an informative annotation, a more informative entry with a similar E value is shown.
Table 2 ESTs corresponding to enzymes involved in the conversion of sucrose into triacylglycerol in developing seeds of J. curcas
Step | Protein | Arabidopsis thaliana genes | Jatropha curcas contigs (no. of ESTs) | Hits | %
Plasma membrane: sucrose transport
1 | Sucrose transporter, plasma membrane | At1g22710, At1g09960 | EZ408931 (3), 1 singleton | 4 | 0.002
Cytosol: glycolysis
2a | Sucrose synthase | At4g02280, At3g43190, At5g20830 | EZ412730 (132), EZ409118 (79), EZ418493 (14), EZ408538 (7), EZ412560 (3), EZ417361 (3), EZ411575 (3), 3 singletons | 244 | 0.130
2b | Neutral invertase | At4g34860, At1g56560, At4g09510, At5g22510 | EZ411576 (22), EZ414627 (3), EZ409268 (3), EZ411325 (2), 6 singletons | 36 | 0.019
3 | Fructokinase | At3g59480, At5g51830 | EZ409442 (45), EZ419234 (4), EZ418398 (4), EZ408118 (3), EZ417644 (2) | 58 | 0.031
4 | UDP-glucose pyrophosphorylase | At5g17310 | EZ409644 (46), EZ416162 (9), 2 singletons | 57 | 0.030
5 | Phosphoglucomutase | At1g70730 | EZ415046 (59), 1 singleton | 60 | 0.032
6 | Phosphoglucose isomerase | At5g42740 | EZ417899 (12), EZ417840 (3), 2 singletons | 17 | 0.009
7a | Phosphofructokinase, PPi dependent, α-subunit | At1g20950 | EZ414861 (20), EZ416216 (12), 1 singleton | 33 | 0.018
7b | Phosphofructokinase, PPi dependent, β-subunit | At1g12000 | EZ417466 (22), EZ408944 (9), 1 singleton | 32 | 0.017
8 | Fructose-1,6-bisphosphatase | At1g43670 | EZ409829 (7), EZ409223 (3), 1 singleton | 11 | 0.006
9 | Phosphofructokinase, ATP dependent | At4g26270, At5g47810, At4g29220 | EZ413282 (10), EZ419322 (2), EZ413809 (2), 2 singletons | 16 | 0.008
10 | Aldolase | At2g36460 | EZ417077 (393), 3 singletons | 396 | 0.211
11 | Triose phosphate isomerase | At3g55440 | EZ411669 (84), EZ419272 (8) | 92 | 0.049
12 | Glyceraldehyde-3-phosphate dehydrogenase | At3g04120, At1g13440 | EZ418803 (267), EZ414174 (166), EZ408884 (5), EZ415041 (4), 6 singletons | 448 | 0.260
13 | Phosphoglycerate kinase | At1g79550 | EZ415003 (200) | 200 | 0.107
14 | Phosphoglycerate mutase | At1g09780, At3g08590 | EZ415634 (77), EZ417060 (7) | 84 | 0.045
15 | Enolase | At2g36530, At2g29560 | EZ408879 (154), EZ414691 (16), EZ413751 (6), EZ407590 (5), 3 singletons | 184 | 0.098
16 | Pyruvate kinase | At3g52990, At5g63680, At5g56350 | EZ416804 (69), EZ415965 (66), EZ417571 (25), EZ414131 (10), EZ408315 (4), EZ412706 (3), EZ417199 (2), EZ410136 (2), 1 singleton | 182 | 0.097
Plastidial membrane: hexose and triose translocation
17 | Glucose-6-phosphate translocator | At5g54800, At1g61800 | EZ414407 (20), EZ407316 (8), EZ412437 (5), 2 singletons | 35 | 0.018
18 | Triose phosphate translocator | At5g46110 | EZ409314 (4), EZ413060 (2), 2 singletons | 8 | 0.004
19 | Phosphoenolpyruvate translocator | At5g33320, At3g01550 | EZ418777 (16), EZ416154 (4) | 20 | 0.010
20 | Pyruvate translocator | ? | N/A | ? |
Plastid: starch metabolism
21 | Phosphoglucomutase | At5g51820 | EZ411416 (11), 1 singleton | 12 | 0.006
22a | ADP-glucose pyrophosphorylase, large subunit | At4g39210, At1g27680 | 2 singletons | 2 | 0.001
22b | ADP-glucose pyrophosphorylase, small subunit | At5g48300 | EZ409731 (4) | 4 | 0.002
23a | Starch synthase, soluble | At4g18240, At5g24300 | EZ418005 (8), EZ417219 (3), 2 singletons | 13 | 0.007
23b | Starch synthase, granule bound | At1g32900 | 1 singleton | 1 | <0.001
24 | Starch branching enzyme | At2g36390, At5g03650 | EZ412460 (11), EZ409986 (5), EZ408038 (3), EZ417876 (2), 3 singletons | 24 | 0.012
25 | Alpha-glucan phosphorylase | At3g46970, At3g29320 | EZ417617 (4), EZ408368 (3), EZ413639 (2), 7 singletons | 16 | 0.009
26 | Isoamylase-type debranching enzyme | At2g39930, At4g09020, At1g03310 | EZ414064 (4), EZ410452 (2), EZ415438 (2), EZ418838 (2), EZ414406 (2), 2 singletons | 14 | 0.007
27 | Hexokinase | At4g29130, At1g47840 | EZ409840 (8), EZ411568 (3), EZ417173 (2), 2 singletons | 15 | 0.008
Plastid: glycolysis
28 | Phosphoglucose isomerase | At4g24620 | EZ418419 (14), EZ411967 (5), EZ416342 (4), 1 singleton | 24 | 0.013
29 | Phosphofructokinase, ATP dependent | At5g61580 | EZ409668 (3) | 3 | 0.002
30 | Fructose-1,6-bisphosphatase | At3g54050 | EZ407309 (3) | 3 | 0.002
31 | Aldolase | At2g01140, At4g38970 | EZ412757 (41), EZ410272 (17), EZ408925 (4), 1 singleton | 63 | 0.034
32 | Triose phosphate isomerase | At2g21170 | EZ414807 (9), EZ414069 (2), 1 singleton | 12 | 0.006
33 | Glyceraldehyde-3-phosphate dehydrogenase | At1g16300 | EZ411341 (88) | 88 | 0.047
34 | Phosphoglycerate kinase | At3g12780 | EZ412071 (4), EZ412862 (2) | 6 | 0.003
35 | Phosphoglycerate mutase | ? | N/A | ? |
36 | Enolase | At1g74030 | EZ410277 (5), EZ408005 (2), EZ417705 (2) | 9 | 0.005
37 | Pyruvate kinase | At3g22960, At5g52920, At1g32440 | EZ416570 (42), EZ411975 (10), EZ412722 (3), EZ417791 (2), EZ414623 (2), 2 singletons | 61 | 0.033
Plastid: oxidative pentose phosphate pathway
38 | Glucose-6-phosphate 1-dehydrogenase | At5g40760, At1g09420, At3g27300 | EZ413626 (3), EZ415974 (3), EZ409337 (2), EZ415084 (2), EZ418636 (2), 3 singletons | 15 | 0.008
39 | 6-Phosphogluconolactonase | At5g24400 | EZ414757 (49) | 49 | 0.026
40 | Gluconate-6-phosphate dehydrogenase | At1g64190, At3g02360 | EZ418999 (48), EZ418049 (19), EZ418763 (17), EZ417122 (4), EZ409709 (3), EZ407558 (2), 2 singletons | 95 | 0.051
41 | Ribose-5-phosphate isomerase | At3g04790 | EZ408297 (8), EZ409515 (7), EZ409093 (2) | 17 | 0.009
42 | Ribulose-5-phosphate-3-epimerase | At5g61410 | EZ417906 (5), EZ409987 (2) | 7 | 0.004
43 | Transketolase | At2g45290, At3g60750 | EZ414696 (7), EZ419056 (2), EZ411257 (2), EZ417305 (2), 1 singleton | 14 | 0.007
44 | Transaldolase | At1g12230 | EZ417076 (25), EZ407732 (7) | 32 | 0.017
Plastid: fatty acid biosynthesis
45a | Pyruvate dehydrogenase E1α | At1g01090 | EZ416172 (22), EZ418184 (14) | 36 | 0.019
45b | Pyruvate dehydrogenase E1β | At2g34590 | EZ407513 (46) | 46 | 0.025
45c | Pyruvate dehydrogenase E2 | At3g25860, At1g34430 | EZ411666 (9), EZ419364 (8), EZ409742 (6), EZ409692 (6), EZ411924 (5), EZ411067 (3), 3 singletons | 40 | 0.021
46a | Acetyl-CoA carboxylase, biotin carboxyl carrier protein subunit (CAC1a) | At5g16390 | EZ407402 (42), EZ407836 (8), 1 singleton | 51 | 0.027
46b | Acetyl-CoA carboxylase, biotin carboxylase subunit (CAC2) | At5g35360 | EZ409708 (9), EZ414205 (3), 1 singleton | 13 | 0.007
46c | Acetyl-CoA carboxylase, carboxyltransferase subunit (CAC3) | At2g38040 | EZ415913 (6), EZ411529 (4), EZ418040 (2), 1 singleton | 13 | 0.007
47 | Ketoacyl-ACP synthase III | At1g62640 | EZ408551 (5), EZ414997 (2), 3 singletons | 10 | 0.005
48 | Ketoacyl-ACP reductase | At1g24360 | EZ414660 (34) | 34 | 0.018
49 | 3-Hydroxyacyl-ACP dehydratase | At5g10160, At2g22230 | EZ415979 (31) | 31 | 0.017
50 | Enoyl-ACP reductase | At2g05990 | EZ416240 (30), EZ416721 (3) | 33 | 0.018
51 | Ketoacyl-ACP synthase I | At5g46290 | EZ414040 (30), EZ409421 (9), EZ410862 (7), EZ418649 (4), EZ410815 (4) | 54 | 0.029
52 | Ketoacyl-ACP synthase II | At1g74960 | EZ419430 (15), EZ412129 (7), EZ418491 (5), 1 singleton | 28 | 0.015
53 | Stearoyl-ACP desaturase | At3g02630, At2g43710 | EZ418273 (56), EZ412266 (44), EZ415569 (22), EZ416762 (21), EZ418900 (6), 3 singletons | 152 | 0.081
54 | Acyl-ACP thioesterase | At4g13050, At1g08510 | EZ419462 (73), EZ418948 (4), EZ414385 (2), 1 singleton | 80 | 0.043
55 | Acyl-CoA synthetase (plastid outer envelope) | At1g77590, At2g04350 | EZ417598 (4), EZ408632 (3), 3 singletons | 10 | 0.005
Endoplasmic reticulum: Kennedy pathway
56 | Glycerol-3-phosphate acyltransferase | At3g11430, At4g00400, At5g06090 | EZ417873 (2), 2 singletons | 4 | 0.002
57 | Lysophosphatidic acid acyltransferase | At3g57650, At1g75020, At3g18850 | EZ417701 (3), EZ408857 (2), EZ413603 (2), 1 singleton | 8 | 0.004
58 | Phosphatidic acid phosphatase | At1g15080 | 1 singleton | 1 | 0.001
59 | Diacylglycerol cholinephosphotransferase | At3g25585 | EZ415027 (4) | 4 | 0.002
60 | Oleate desaturase | At3g12120 | EZ409947 (88), EZ414061 (77), 2 singletons | 167 | 0.089
61 | Linoleate desaturase | At2g29980 | EZ407668 (2), EZ419293 (2), 3 singletons | 7 | 0.004
62 | Phospholipid diacylglycerol acyltransferase | At5g13640, At3g44830 | EZ407935 (2), 3 singletons | 5 | 0.003
63 | Diacylglycerol acyltransferase I | At2g19450 | EZ412185 (3) | 3 | 0.002
63 | Diacylglycerol acyltransferase II | At3g51520 | 1 singleton | 1 | <0.001
be similar to a plastidial isoform (At4g36810 [37]). Transcripts were also detected for the small subunit of geranyl(geranyl) diphosphate synthase. These proteins are catalytically inactive themselves but modulate the activity of GGPP synthases to confer geranyl diphosphate (GPP) synthase activity [38].
The conversion of GGPP into the tigliane diterpene requires a diterpene synthase. In angiosperms, apart from the diterpene synthases involved in gibberellin (and other phytohormone) biosynthesis, only a few diterpene synthases have been characterised to date, including casbene and neocembrene synthases from R. communis and other Euphorbiaceae [32, 39]. Phylogenetic analysis of plant terpene cyclases reveals that they can be divided into six subfamilies based on the chain length of the substrates used, involvement in primary or secondary metabolism, and taxonomy [40]. Angiosperm (Magnoliophyta) sesquiterpene cyclases (which use farnesyl diphosphate (FPP) as substrate) and casbene synthase all belong to family A (TpsA). The casbene synthase sequences contain a putative N-terminal plastid transit peptide, which is absent from the sesquiterpene cyclases. Analysis of the J. curcas EST database revealed a number of ESTs from the TpsA gene family (Supplementary Table 3). However, they appear to be most similar to sesquiterpene cyclases rather than to plastidial diterpene synthases.
Comparison of the Efficiency of Gene Discovery Using Dye-Terminator Sequencing and Pyrosequencing
Direct comparisons of the efficiency of gene discovery between our dataset and that provided by Costa et al. [9] are problematic, because different developmental stages were selected. At present, there are no standard descriptors of seed stages of J. curcas. However, based on the relative abundance of sequences for storage proteins, oleosins and "late embryogenesis" related proteins, the dataset provided by Costa et al. [9] is likely to include earlier developmental stages than those studied in this report. The 41 Mbp of 454 sequence data produced in this study provided 12,419 contigs and 17,333 singletons (29,752 unique sequences). The assembled sequence data obtained from both the developing and germinating seed libraries produced by Costa et al. contained a total of 7 Mbp of data and yielded 1,606 contigs and 5,677 singletons (7,283 unique sequences) [9]. The increased depth of coverage obtained from 454 sequencing permitted the discovery of a larger number of genes involved in key biological processes such as lipid biosynthesis. For example, we were able to obtain sequences corresponding to all stages of plastidial fatty acid biosynthesis, whereas sequences for ketoacyl-ACP synthase III, acyl-ACP thioesterase and 3-hydroxyacyl-ACP dehydratase were not detected in the developing seed EST library of Costa et al. [9], which was obtained from 7,320 dye-terminator sequencing reads. For stearoyl-ACP desaturase, we obtained sequence corresponding to the full coding region for orthologues of two Arabidopsis genes (At2g43710 and At3g02630). No orthologues of plastidial stearoyl-ACP desaturases were found in the study of Costa et al. [9]. Similarly, we were able to obtain sequence data for the full coding region of two oleate desaturase genes in our J. curcas library, whereas the Costa et al. study contains only partial sequence data for one of these genes [9]. In summary, the pyrosequencing data presented in this study have provided a greater depth of sequence coverage than that obtained in previous studies, and provide sequence data for a larger number of genes present in the developing seed transcriptome.

[Fig. 3 image: pathway diagram spanning the cytosol, plastid and ER, running from sucrose (and starch) through glycolysis, the oxidative pentose phosphate pathway, plastidial fatty acid synthesis and PC-linked desaturation (18:1 PC to 18:2 PC to 18:3 PC) to TAG in the nascent oil body; enzyme steps are numbered as in Table 2.]

Fig. 3 Schematic representation of the steps involved in the conversion of sucrose into triacylglycerol in J. curcas seeds. The enzymes involved in the various steps are represented by numbers and are detailed in Table 2. Genes corresponding to the enzymes involved in steps indicated with dashed arrows (8 and 20) are presently unknown. UDP-Glc uridine diphosphate-glucose, Glc-1-P glucose-1-phosphate, Glc-6-P glucose-6-phosphate, Fruc-6-P fructose-6-phosphate, Fruc-1,6-P fructose-1,6-bisphosphate, DHAP dihydroxyacetone phosphate, GA3P glyceraldehyde-3-phosphate, 1,3-BPG 1,3-bisphosphoglycerate, 3-PG 3-phosphoglycerate, 2-PG 2-phosphoglycerate, PEP phosphoenolpyruvate, α-1,4-Glc α-1,4-glucan, Glc glucose, ADP-Glc adenosine diphosphate-glucose, 6-PGL 6-phosphogluconolactone, 6-PG 6-phosphogluconate, Ru-5-P ribulose-5-phosphate, R-5-P ribose-5-phosphate, Xu-5-P xylulose-5-phosphate, E-4-P erythrose-4-phosphate, S-7-P sedoheptulose-7-phosphate, CoA coenzyme A, ACP acyl carrier protein, G3P glycerol-3-phosphate, LPA lysophosphatidic acid, PA phosphatidic acid, DAG diacylglycerol, PC phosphatidylcholine, TAG triacylglycerol

Bioenerg. Res. (2011) 4:211–221 219
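The sequence counts compared in this section can be checked directly, because the number of unique sequences is the number of contigs plus the number of singletons. The dictionary names below are illustrative only. The figures are those given in the text: 41 Mbp is the data remaining after quality filtering, and the Costa et al. totals combine their developing and germinating seed libraries.

```python
# Figures quoted in the text; the dictionary layout is illustrative only.
this_study = {"data_mbp": 41, "contigs": 12_419, "singletons": 17_333}
costa_2010 = {"data_mbp": 7, "contigs": 1_606, "singletons": 5_677}


def unique_sequences(assembly):
    """Unique sequences in an assembly = contigs + singletons."""
    return assembly["contigs"] + assembly["singletons"]


ours = unique_sequences(this_study)
theirs = unique_sequences(costa_2010)
print(f"unique sequences: {ours:,} vs {theirs:,} ({ours / theirs:.1f}-fold)")
print(f"sequence data: {this_study['data_mbp']} vs {costa_2010['data_mbp']} Mbp "
      f"({this_study['data_mbp'] / costa_2010['data_mbp']:.1f}-fold)")
# unique sequences: 29,752 vs 7,283 (4.1-fold)
# sequence data: 41 vs 7 Mbp (5.9-fold)
```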
Interestingly, Costa et al. report an exceptionally high level of transposable elements (TEs) in their developing seed transcriptome, with around 11% of the ESTs showing significant homology to TEs. Further analysis of the EST dataset deposited by Costa et al. revealed that these were Ty3-gypsy-type LTR retrotransposons. By contrast, analysis of our dataset revealed only 31 ESTs with significant homology to LTR retrotransposons. These differences in the level of TEs in the two transcriptome datasets could be due to the selection of material at different developmental stages in the two studies, or to possible stress induction caused by the removal of the testa in the study of Costa et al.
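A TE count such as the 11% of ESTs reported by Costa et al. or the 31 ESTs found here depends on how "significant homology" is defined. The following is a minimal sketch of one possible definition, not the criteria used in either study. A sequence counts as TE-derived when its best hit passes an E-value cutoff and the hit description contains a retrotransposon-related keyword. The cutoff (1e-10), the keyword list and the example annotations are all assumptions.

```python
# Assumed keyword list; real pipelines would more likely use a curated TE database.
TE_KEYWORDS = ("retrotransposon", "gypsy", "copia", "transposase")


def te_fraction(best_hit_annotations, evalue_cutoff=1e-10):
    """Return (number of TE-like sequences, fraction of all sequences).

    best_hit_annotations maps a sequence ID to (hit description, E-value),
    or to None when the sequence had no hit.
    """
    total = len(best_hit_annotations)
    te_hits = 0
    for hit in best_hit_annotations.values():
        if hit is None:
            continue  # no hit: counted in the denominator only
        description, evalue = hit
        if evalue <= evalue_cutoff and any(k in description.lower() for k in TE_KEYWORDS):
            te_hits += 1
    return te_hits, (te_hits / total if total else 0.0)


# Hypothetical best-hit annotations for four ESTs.
annotations = {
    "est1": ("Ty3-gypsy retrotransposon polyprotein", 1e-40),
    "est2": ("11S globulin seed storage protein", 1e-90),
    "est3": ("retrotransposon protein, putative", 1e-3),  # fails the E-value cutoff
    "est4": None,
}
print(te_fraction(annotations))  # (1, 0.25)
```

Because the denominator includes sequences without any hit, the fraction is sensitive to library depth as well as to the cutoff, which is one reason counts from different studies are hard to compare.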
Conclusions
Transcriptome analysis of developing J. curcas seeds using a single run of the GS FLX produced 41 Mbp of sequence data after removal of low-quality sequences and primers. Assembly of these sequences resulted in 12,419 contigs. Despite the greater depth of sequence coverage achieved with pyrosequencing, most contigs were relatively short and contained few sequences, indicating that the sampling of the developing seed transcriptome had not reached saturation. Nonetheless, a search of the ESTs in this database revealed homologues of all known sequences involved in the pathways for the conversion of sucrose into storage lipid (TAG). The sequence data therefore provide a useful resource for further transcriptome studies (qPCR, etc.) or a dataset for sequence analysis in proteomic studies. To achieve more saturated coverage of the developing seed transcriptome, further cDNA sequencing reactions could be performed after normalisation of the cDNA population. Alternatively, the relatively small size of the J. curcas genome [41], coupled with recent increases in the throughput of 454 sequencing, makes genome sequencing of J. curcas a viable proposition. The lack of type-II RIPs within the sequence database suggests that J. curcas seeds do not contain RIPs with a lectin domain. The sequence data presented in this manuscript will provide a useful resource for those engaged in J. curcas research.
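Finding homologues for every step from sucrose to TAG can be framed as a reachability question over the network in Fig. 3. The following sketch is illustrative only. It encodes a simplified version of the figure's intermediates, with compartments merged, cofactors omitted and the Table 2 step numbers not reproduced. A reaction is treated as usable once all of its substrates are available. Dropping a single step, here the hypothetical loss of the DHAP to G3P reaction, shows how one missing step cuts off TAG synthesis even though acyl-CoA is still produced.

```python
# Simplified sucrose-to-TAG network based on Fig. 3 (compartments merged, cofactors omitted).
# Each reaction is (set of substrates, set of products).
REACTIONS = [
    ({"sucrose"}, {"UDP-Glc", "fructose"}),
    ({"UDP-Glc"}, {"Glc-1-P"}),
    ({"Glc-1-P"}, {"Glc-6-P"}),
    ({"fructose"}, {"Fruc-6-P"}),
    ({"Glc-6-P"}, {"Fruc-6-P"}),
    ({"Fruc-6-P"}, {"Fruc-1,6-P"}),
    ({"Fruc-1,6-P"}, {"DHAP", "GA3P"}),
    ({"DHAP"}, {"GA3P"}),
    ({"GA3P"}, {"1,3-BPG"}),
    ({"1,3-BPG"}, {"3-PG"}),
    ({"3-PG"}, {"2-PG"}),
    ({"2-PG"}, {"PEP"}),
    ({"PEP"}, {"pyruvate"}),
    ({"pyruvate"}, {"acetyl-CoA"}),
    ({"acetyl-CoA"}, {"malonyl-CoA"}),
    ({"malonyl-CoA"}, {"3-ketoacyl-ACP"}),
    ({"3-ketoacyl-ACP"}, {"3-hydroxyacyl-ACP"}),
    ({"3-hydroxyacyl-ACP"}, {"E-2-enoyl-ACP"}),
    ({"E-2-enoyl-ACP"}, {"acyl-ACP"}),
    ({"acyl-ACP"}, {"16:0-ACP", "18:0-ACP"}),
    ({"18:0-ACP"}, {"18:1-ACP"}),
    ({"16:0-ACP"}, {"acyl-CoA"}),
    ({"18:1-ACP"}, {"acyl-CoA"}),
    ({"DHAP"}, {"G3P"}),
    ({"G3P", "acyl-CoA"}, {"LPA"}),
    ({"LPA", "acyl-CoA"}, {"PA"}),
    ({"PA"}, {"DAG"}),
    ({"DAG", "acyl-CoA"}, {"TAG"}),
    ({"DAG"}, {"PC"}),
    ({"PC", "DAG"}, {"TAG"}),
]


def reachable(start, reactions):
    """Metabolites producible from `start`; a reaction fires once all its substrates exist."""
    available = set(start)
    changed = True
    while changed:  # repeat until no reaction adds anything new (fixpoint)
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


print("TAG" in reachable({"sucrose"}, REACTIONS))  # True

without_g3p = [r for r in REACTIONS if r != ({"DHAP"}, {"G3P"})]
print("TAG" in reachable({"sucrose"}, without_g3p))  # False
```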
Acknowledgements This work was supported by funding from the Garfield Weston Foundation. The individual sequence reads have been deposited in the GenBank Short Read Archive (SRA) as accession SRR027577. The 12,419 contigs have been deposited in the GenBank Transcriptome Shotgun Assembly (TSA) archive as accessions EZ407282-EZ419700. The annotations obtained from a BLASTX search with these contigs are presented in Supplementary Table 1.
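The accession range quoted above can be checked against the contig count. The following minimal sketch assumes that the accessions in the range are consecutive and share a letter prefix. The helper name `accession_range_size` is hypothetical.

```python
def accession_range_size(first, last):
    """Number of accessions in an inclusive range such as EZ407282-EZ419700.

    Assumes both accessions share the same letter prefix and are numbered consecutively.
    """
    prefix_len = len(first.rstrip("0123456789"))  # length of the letter prefix, e.g. "EZ"
    if first[:prefix_len] != last[:prefix_len]:
        raise ValueError("accessions must share a prefix")
    return int(last[prefix_len:]) - int(first[prefix_len:]) + 1


print(accession_range_size("EZ407282", "EZ419700"))  # 12419, matching the number of contigs
```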
References
1. Fairless D (2007) Biofuel: the little shrub that could—maybe. Nature 449:652–655
2. King AJ et al (2009) Potential of Jatropha curcas as a source of renewable oil and animal feed. J Exp Bot 60(10):2897–2905
3. Heller J (1996) Physic nut. Jatropha curcas L. Promoting the conservation and use of underutilized and neglected crops. Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany and International Plant Genetic Resources Institute, Rome, p 66
4. White JA et al (2000) A new set of Arabidopsis expressed sequence tags from developing seeds. The metabolic pathway from carbohydrates to seed oil. Plant Physiol 124(4):1582–1594
5. Lu C, Wallis J, Browse J (2007) An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library. BMC Plant Biol 7(1):42
6. van de Loo FJ, Turner S, Somerville C (1995) Expressed sequence tags from developing castor seeds. Plant Physiol 108(3):1141–1150
7. Chung Suh M et al (2003) Comparative analysis of expressed sequence tags from Sesamum indicum and Arabidopsis thaliana developing seeds. Plant Mol Biol 52(6):1107–1123
8. Cahoon EB et al (1999) Biosynthetic origin of conjugated double bonds: production of fatty acid components of high-value drying oils in transgenic soybean embryos. Proc Natl Acad Sci USA 96(22):12935–12940
9. Costa G et al (2010) Transcriptome analysis of the oil-rich seed of the bioenergy crop Jatropha curcas L. BMC Genomics 11(1):462
10. Emrich SJ et al (2007) Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res 17(1):69–73
11. Cheung F et al (2006) Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics 7(1):272
12. Weber APM et al (2007) Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol 144(1):32–42
13. Larson TR, Graham IA (2001) A novel technique for the sensitive quantification of acyl CoA esters from plant tissues. Plant J 25(1):115–125
14. Gasic K, Hernandez A, Korban SS (2004) RNA extraction from different apple tissues rich in polyphenols and polysaccharides for cDNA library construction. Plant Mol Biol Rep 22(4):437a–437g
15. Huang X, Madan A (1999) CAP3: a DNA sequence assembly program. Genome Res 9(9):868–877
16. Altschul SF et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402
17. Beisson F et al (2003) Arabidopsis genes involved in acyl lipid metabolism. A 2003 census of the candidates, a study of the distribution of expressed sequence tags in organs, and a web-based database. Plant Physiol 132:681–697
18. Chen GQ et al (2007) Expression profiles of genes involved in fatty acid and triacylglycerol synthesis in castor bean (Ricinus communis L.). Lipids 42:263–274
19. Juan L et al (2003) Cloning and expression of curcin, a ribosome-inactivating protein from the seeds of Jatropha curcas. Acta Bot Sin 45(7):858–863
20. Qin W et al (2005) Expression of a ribosome inactivating protein (curcin 2) in Jatropha curcas is induced by stress. J Biosci 30(3):351–357
21. Barbieri L, Battelli MG, Stirpe F (1993) Ribosome-inactivating proteins from plants. Biochim Biophys Acta 1154:237–282
22. Hartley MR, Lord JM (2004) Genetics of ribosome inactivating proteins. Mini-Rev Med Chem 4:487–492
23. Audi J et al (2005) Ricin poisoning: a comprehensive review. J Am Med Assoc 294(18):2342–2351
24. Challoner KR, McCarron MM (1990) Castor bean intoxication. Ann Emerg Med 19(10):1177–1183
25. Salas JJ, Ohlrogge JB (2002) Characterization of substrate specificity of plant FatA and FatB acyl-ACP thioesterases. Arch Biochem Biophys 403(1):25–34
26. Schnurr JA et al (2002) Fatty acid export from the chloroplast. Molecular characterization of a major plastidial acyl-coenzyme A synthetase from Arabidopsis. Plant Physiol 129(4):1700–1709
27. Baud S, Dubreucq B, Miquel M, Rochat C, Lepiniec L (2008) Storage reserve accumulation in Arabidopsis: metabolic and developmental control of seed filling. In: The Arabidopsis Book. American Society of Plant Biologists, Rockville, MD
28. Lardizabal KD et al (2001) DGAT2 is a new diacylglycerol acyltransferase gene family: purification, cloning, and expression in insect cells of two polypeptides from Mortierella ramanniana with diacylglycerol acyltransferase activity. J Biol Chem 276(42):38862–38869
29. He X et al (2004) Regulation of diacylglycerol acyltransferase in developing seeds of castor. Lipids 39(9):865–871
30. Ståhl U et al (2004) Cloning and functional characterization of a phospholipid:diacylglycerol acyltransferase from Arabidopsis. Plant Physiol 135(3):1324–1335
31. Lichtenthaler HK (1999) The 1-deoxy-D-xylulose-5-phosphate pathway of isoprenoid biosynthesis in plants. Annu Rev Plant Physiol Plant Mol Biol 50(1):47–65
32. Mau CJ, West CA (1994) Cloning of casbene synthase cDNA: evidence for conserved structural features among terpenoid cyclases in plants. Proc Natl Acad Sci USA 91(18):8497–8501
33. Wildung MR, Croteau R (1996) A cDNA clone for taxadiene synthase, the diterpene cyclase that catalyzes the committed step of taxol biosynthesis. J Biol Chem 271(16):9201–9204
34. Martin DM, Faldt J, Bohlmann J (2004) Functional characterization of nine Norway spruce TPS genes and evolution of gymnosperm terpene synthases of the TPS-d subfamily. Plant Physiol 135(4):1908–1927
35. Gerber E et al (2009) The plastidial 2-C-methyl-D-erythritol 4-phosphate pathway provides the isoprenyl moiety for protein geranylgeranylation in tobacco BY-2 cells. Plant Cell 21(1):285–300
36. Kasahara H et al (2002) Contribution of the mevalonate and methylerythritol phosphate pathways to the biosynthesis of gibberellins in Arabidopsis. J Biol Chem 277(47):45188–45194
37. Okada K et al (2000) Five geranylgeranyl diphosphate synthases expressed in different organs are localized into three subcellular compartments in Arabidopsis. Plant Physiol 122(4):1045–1056
38. Wang G, Dixon RA (2009) Heterodimeric geranyl(geranyl)diphosphate synthase from hop (Humulus lupulus) and the evolution of monoterpene biosynthesis. Proc Natl Acad Sci USA 106(24):9914–9919
39. Kirby J et al (2010) Cloning of casbene and neocembrene synthases from Euphorbiaceae plants and expression in Saccharomyces cerevisiae. Phytochemistry 71(13):1466–1473
40. Bohlmann J, Meyer-Gauen G, Croteau R (1998) Plant terpenoid synthases: molecular biology and phylogenetic analysis. Proc Natl Acad Sci USA 95:4126–4133
41. Carvalho CR et al (2008) Genome size, base composition and karyotype of Jatropha curcas L., an important biofuel plant. Plant Sci 174(6):613–617