Research Article
Analysis of Polygala tenuifolia Transcriptome and Description of Secondary Metabolite Biosynthetic Pathways by Illumina Sequencing

Hongling Tian,1 Xiaoshuang Xu,2,3 Fusheng Zhang,2 Yaoqin Wang,1 Shuhong Guo,1 Xuemei Qin,2 and Guanhua Du2,4

1 Research Institute of Economics Crop, Shanxi Academy of Agriculture Science, Fenyang, Shanxi 032200, China
2 Modern Research Center for Traditional Chinese Medicine, Shanxi University, No. 92 Wucheng Road, Taiyuan, Shanxi 030006, China
3 College of Chemistry and Chemical Engineering, Shanxi University, No. 92 Wucheng Road, Taiyuan, Shanxi 030006, China
4 Institute of Materia Medica, Chinese Academy of Medical Sciences, Beijing 100050, China

Correspondence should be addressed to Fusheng Zhang; ample1007@sxu.edu.cn and Guanhua Du; dugh@imm.ac.cn

Received 22 June 2015; Accepted 3 August 2015

Academic Editor: Xiaohan Yang

Hindawi Publishing Corporation, International Journal of Genomics, Volume 2015, Article ID 782635, 11 pages, http://dx.doi.org/10.1155/2015/782635

Copyright © 2015 Hongling Tian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Radix polygalae, the dried roots of Polygala tenuifolia and P. sibirica, is one of the most well-known traditional Chinese medicinal plants. Radix polygalae contains various saponins, xanthones, and oligosaccharide esters, and these compounds are responsible for several pharmacological properties. To provide basic breeding information, enhance molecular biological analysis, and determine secondary metabolite biosynthetic pathways of P. tenuifolia, we applied Illumina sequencing technology and de novo assembly. We also applied this technique to gain an overview of the P. tenuifolia transcriptome from samples of different ages. Using Illumina sequencing, approximately 67.2% of unique sequences were annotated by basic local alignment search tool similarity searches against public sequence databases. We classified the annotated unigenes by using the Nr, Nt, GO, COG, and KEGG databases compared with NCBI. We also obtained many candidate CYP450s and UGTs by the analysis of genes in the secondary metabolite biosynthetic pathways, including the putative terpenoid backbone and phenylpropanoid biosynthesis pathways. With this transcriptome sequencing, future genetic and genomics studies related to the molecular mechanisms associated with the chemical composition of P. tenuifolia may be improved. Genes involved in the enrichment of secondary metabolite biosynthesis-related pathways could enhance the potential applications of P. tenuifolia in pharmaceutical industries.

1. Introduction

As one of the commonly known traditional Chinese medicinal plants, Radix polygalae (RP) can be obtained from Polygala tenuifolia Willd. and P. sibirica L., which are listed in the Chinese Pharmacopoeia (2010); RP elicits sedative, antipsychotic, expectorant, and anti-inflammatory effects. High consumption and decreased regeneration of wild resources of RP have prompted researchers to domesticate and cultivate it, and large-scale planting has been performed since the early 1980s. Thus far, RP from P. tenuifolia has been the most commonly used variety, so P. tenuifolia was used in this study. P. tenuifolia is a perennial herb mainly distributed in the northeast, north, and northwest of China. This herb contains various medicinal components, such as saponins, xanthones, and oligosaccharide esters, which exhibit tonic, sedative, antipsychotic, expectorant, and other pharmacological effects [1]. In previous studies, chemical and pharmacological properties of triterpenoid saponins, one of the most important medicinal components of P. tenuifolia, were investigated [2]. However, studies on the biosynthetic pathway of triterpenoid saponins in P. tenuifolia have been rarely conducted compared with those in licorice, ginseng, notoginseng, quinquefolius, and salvia [3–7]. For instance, only fourteen nucleotide sequences had been deposited in the NCBI GenBank database as of August 2015; among these sequences, six are related to the biosynthetic pathway of triterpenoid saponins in P. tenuifolia. The lack of sequence data has limited extensive and intensive studies on P. tenuifolia; nevertheless, genomic research related to this medicinal plant is feasible. Given the medicinal and economic importance of P. tenuifolia, genomic data sources of this plant species should be investigated to discover genes and develop further functional studies.

DNA sequencing technology was developed by Frederick Sanger in 1977. This “first-generation sequencing” was laborious and costly. After years of improvement, next-generation sequencing (NGS) has been developed, with gains in speed and accuracy and with reduced cost and manpower. Typical examples of NGS include SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche [8]. Among these techniques, Illumina HiSeq 2000 yields the largest output and entails the lowest reagent cost; this technique has been commonly used in deep sequencing of model and nonmodel organisms [9]. For example, Illumina HiSeq 2000 has been successfully applied in a wide range of species, including model plants (Arabidopsis, rice, and tomato) [10–12] and crops with large genomes, including field pea, castor, and Sorghum propinquum [13–15]. This technique could promote studies on not only the genomics of these plants but also the biosynthetic pathways of the main active ingredients.

Triterpenoid saponins, xanthones, and oligosaccharide esters are currently considered the main active ingredients of P. tenuifolia [16]. However, small amounts of other secondary metabolites, such as isoquinoline alkaloids, lignins, and flavonols, are also present in P. tenuifolia. Illumina sequencing should therefore be applied to determine the number of enzymes implicated in the biosynthetic pathways of secondary metabolites in P. tenuifolia. To date, the biosynthetic pathway of triterpenoid saponins, particularly pentacyclic triterpenoid saponins, in P. tenuifolia has been extensively studied [2]. In plant terpenoid biosynthesis, the formation of a primary cucurbitane skeleton via the isoprenoid pathway precedes the cyclization of 2,3-oxidosqualene. The mevalonic acid (MVA) pathway, located in the cytoplasm and endoplasmic reticulum, and the 2-C-methyl-D-erythritol-4-phosphate/1-deoxy-D-xylulose-5-phosphate (MEP/DOXP) pathway, located in the plastid, are likely to participate in the formation of a presenegenin skeleton in P. tenuifolia. These two pathways are present in different cellular compartments of green plants [17, 18]. In addition, phenylpropanoid biosynthetic pathways in P. tenuifolia are yet to be reported. Phenylalanine is an end product of the shikimate pathway, in which the aromatic amino acids tyrosine and tryptophan are also produced [19]. With phenylalanine as a precursor, lignin biosynthesis proceeds via a series of side-chain modifications, ring hydroxylations, and O-methylations to yield lignin monomers [20]. In this pathway, other small molecules, such as flavonoids, coumarins, hydroxycinnamic acid conjugates, and lignans, are also produced. Many of these molecules could be or have been the main focus of studies [21]. These two metabolic pathways are mainly found in P. tenuifolia. Therefore, whole transcripts for complete gene expression profiling should be identified by transcriptome sequencing because of the limited genomic sources and information regarding the biosynthetic pathways of secondary metabolites in P. tenuifolia. Such data will support further studies on these two metabolic pathways by using high-throughput Illumina deep-sequencing techniques.

In this study, a cDNA library was constructed to obtain detailed and general data from P. tenuifolia by using a high-throughput Illumina deep-sequencing technique. We could determine genes encoding the enzymes involved in the whole biosynthetic pathways of triterpenoid saponins, phenylpropanoids, and other secondary metabolites, and the results could promote further analysis. Furthermore, our results could provide direct experimental data on P. tenuifolia, not only for guiding genetic breeding but also for subsequent cloning and functional verification of key genes in triterpenoid biosynthesis. Therefore, this transcriptome sequencing may help improve future genetics and genomics studies on the molecular mechanisms associated with the chemical composition of P. tenuifolia. Eventually, we can control the quality of P. tenuifolia from germplasm sources, which is of important theoretical and practical significance.

2. Materials and Methods

2.1. Plant Materials. Fresh roots of one-, two-, and three-year-old P. tenuifolia were sampled and collected in Xinjiang County, Shanxi Province, China. Another batch of two-year-old fresh roots of P. tenuifolia was collected in Anguo City, Hebei Province. All of the samples were intact and healthy. The samples were snap-frozen in liquid nitrogen and stored at −80°C before use.

2.2. RNA Extraction, cDNA Library Construction, and Illumina Sequencing. Total RNA was extracted from fresh tissues with a plant RNA isolation kit (AutoLab) according to the manufacturer’s protocol and then purified with an RNeasy MinElute cleanup kit (Qiagen). In brief, poly-A-containing mRNA was purified with oligo-dT-containing beads and then fragmented by chemical treatment at high temperature to obtain fragments of 150 bp. Double-stranded cDNA was then synthesized, and DNA fragments with protruding termini were repaired through 3′-5′ exonuclease and polymerase activities. To ensure that DNA fragments and adapters could be joined through complementary “A” and “T” pairing, and to prevent the inserted DNA fragments from ligating to each other, a single “A” base was added to the 3′ end of the blunt DNA and a single “T” base was added to the 3′ end of the adapters. The tag-containing adapters and the DNA fragments were incubated together and joined by ligase. Free adapters and DNA fragments without adapters were removed by using AMPure XP beads. DNA fragments with adapters were selectively enriched by PCR, and a low number of PCR cycles was used to avoid introducing errors into the library.

The library was quantified with PicoGreen and a fluorescence spectrophotometer (Quantifluor-ST fluorometer, Promega, E6090; Quant-iT PicoGreen dsDNA Assay Kit, Invitrogen, P7589). Quality control of the enriched PCR fragments and validation of the size and distribution of DNA fragments in the library were conducted using an Agilent 2100 Bioanalyzer (Agilent 2100; Agilent High Sensitivity DNA Kit, Agilent, 5067-4626). The samples were mixed after homogenization to 10 nM and then gradually diluted and quantified to 4–5 pM for Illumina sequencing to construct the multiplexed DNA libraries. Illumina sequencing was conducted with an Illumina HiSeq 2000 at CapitalBio Ltd., Beijing, China.

Table 1: Functional annotation of P. tenuifolia.

Database         Number annotated   Percent (%)   E cutoff   Database version
Total unigenes   115477
Nt               69211              59.9          1.00e−05   201301
Nr               77599              67.2          1.00e−05   201301
Swissprot        54582              47.3          1.00e−10   201301
COG              30548              26.5          1.00e−10   No version
KEGG             72636              62.9          1.00e−10   Release 58
Interpro         49873              43.19                    Interproscan 4.8 v36
GO               41823              36.22

2.3. De Novo Assembly and Functional Annotation. The raw sequence data were subjected to quality analysis, and low-quality reads and subsequences were removed to obtain the sequences used for subsequent analysis. A Perl program was written to remove reads with adapters, reads with an average quality of less than Q20 from the 3′ end to the 5′ end, reads with a final length of less than 50 bp, and reads with uncertain bases. High-quality reads were used for de novo assembly by the Trinity software to construct unique consensus sequences. Trimmed Solexa transcriptome reads were mapped onto the unique consensus sequences by using Bowtie [22] (Bowtie parameters: -v 3 --all --best --strata). For functional annotation, unigenes were compared with the National Center for Biotechnology Information (NCBI) nonredundant protein database (Nr), nonredundant nucleotide database (Nt), UniProt/Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG), Gene Ontology (GO), and InterPro databases with the Basic Local Alignment Search Tool (BLAST, http://www.ncbi.nlm.nih.gov/). Unigenes were identified by comparing sequence similarity against SWISS-PROT (downloaded from the European Bioinformatics Institute, ftp://ftp.ebi.ac.uk/pub/databases/swissprot/), COG [23, 24], and the KEGG database [25] with BLAST at E values ≤ 1e−10. The KO information was retrieved from the BLAST results by a Perl script, and the pathways between databases and unigenes were established. InterProScan [26] was used to annotate the InterPro domains. The functional assignments of InterPro domains were then mapped onto GO, and the GO classification and tree were produced by WEGO [27].
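The read-filtering criteria described above can be illustrated with a short Python sketch. It is not the authors' Perl program, which is not shown in the paper. It assumes one reading of the Q20 rule (bases are trimmed from the 3′ end until a base with Phred quality of at least 20 is reached), takes quality scores as already-decoded integers, and leaves adapter removal out. The function names `trim_3prime` and `keep_read` are hypothetical.

```python
def trim_3prime(seq, quals, min_q=20):
    """Drop bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def keep_read(seq, quals, min_q=20, min_len=50):
    """Return the trimmed read, or None if it fails the length or N filter.

    Assumed thresholds follow the text: Q20 and a minimum length of 50 bp.
    """
    seq, quals = trim_3prime(seq, quals, min_q)
    if len(seq) < min_len:
        return None  # too short after trimming
    if "N" in seq.upper():
        return None  # contains an uncertain base
    return seq, quals


if __name__ == "__main__":
    read = "A" * 60 + "C" * 5
    scores = [30] * 60 + [10] * 5
    print(keep_read(read, scores))
```

A production pipeline would also need to decode the FASTQ quality string with the correct offset for the sequencing platform, which this sketch deliberately omits.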

3. Results

3.1. Sequence Assembly. The RNA extracted from all the samples was mixed for Illumina sequencing in order to capture the whole range of transcript diversity. In total, 58.88 million raw reads and 11.77 gigabase pairs were sequenced, with an average GC content of 43.9%, Q ≥ 20, and no ambiguous “N” (Table S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/782635). A total of 55432632 high-quality reads were assembled into 316703 contigs and 145857 transcripts, with an N50 value of 1636 bp. All the transcripts were subjected to cluster and assembly analyses, yielding 39625 unigenes with a mean length of 1378 bp and an N50 value of 1971 bp (Figure 1). The assembly statistics of contigs, transcripts, and unigenes are shown in Table S2.

Figure 1: Length distribution of unigenes. Plot title: “Unigene length distribution”; x-axis: length, in bins from 200–299 to 4800–4899; y-axis: number of unigenes, 0 to 4500.
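For readers unfamiliar with the statistic, N50 is the length L such that sequences of length at least L account for at least half of the total assembled bases. A minimal sketch of that standard definition follows; the function name and the toy lengths are illustrative and are not taken from the study's data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical contig lengths
    print(n50(toy_lengths))
```

Because long sequences dominate the cumulative sum, N50 is usually larger than the mean length, which is consistent with the reported unigene mean of 1378 bp and N50 of 1971 bp.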

3.2. Functional Annotation and Classification. In the present library, approximately 67.2% of the unigenes were annotated by BLASTX and BLASTN searches with a threshold of 10⁻⁵ against seven public databases, including the Nr, Nt, UniProt/Swiss-Prot, KEGG, COG, GO, and InterPro databases (Table 1).
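The annotation rates in Table 1 follow directly from the reported counts. The sketch below recomputes them, assuming the denominator is the "Total unigenes" figure given in Table 1 (115477); the dictionary and function names are illustrative.

```python
TOTAL_UNIGENES = 115477  # "Total unigenes" row of Table 1

# Number of annotated unigenes per database, as listed in Table 1.
ANNOTATED = {
    "Nt": 69211,
    "Nr": 77599,
    "Swissprot": 54582,
    "COG": 30548,
    "KEGG": 72636,
    "Interpro": 49873,
    "GO": 41823,
}


def percent_annotated(count, total=TOTAL_UNIGENES):
    """Share of unigenes with a hit in a database, in percent."""
    return 100.0 * count / total


if __name__ == "__main__":
    for database, count in ANNOTATED.items():
        print(f"{database:10s} {percent_annotated(count):6.2f}%")
```

Under that assumption the Nr count reproduces the 67.2% quoted in the text and the KEGG count reproduces 62.9%.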

3.2.1. GO Annotation. A total of 41832 unigenes with GO annotation were divided into three major categories: cellular location, molecular function, and biological process. Among the 34 functional groups with GO assignments, biochemistry, cell apoptosis, and metabolism were dominant (Figure 2).

Figure 2: Gene ontology classification assigned to the unigenes. GO terms on the x-axis are grouped as cellular location (cell, cell part, envelope, extracellular region, macromolecular complex, membrane-enclosed lumen, organelle, organelle part), molecular function (antioxidant activity, binding, catalytic activity, electron carrier activity, enzyme regulator activity, molecular transducer activity, nutrient reservoir activity, structural molecule activity, transcription regulator activity, translation regulator activity, transporter activity), and biological process (anatomical structure formation, biological regulation, cellular component biogenesis, cellular component organization, cellular process, developmental process, establishment of localization, localization, metabolic process, multi-organism process, multicellular organismal process, pigmentation, reproduction, reproductive process, response to stimulus, viral reproduction). Left y-axis: genes (%), 0.1 to 100; right y-axis: number of genes, 41 to 41820; series: GO-standard.

In cellular location, eight classifications were clustered by the matched unique sequences, and the largest subcategories were “cell” and “cell part.” In terms of molecular function, 11 classifications were categorized, with “binding” and “catalytic activity” the most represented. In terms of biological process, 16 classifications were distinguished, with “cellular process” and “metabolic process” the most represented. However, few genes were assigned to the categories “nutrient reservoir activity” and “viral reproduction,” and no genes were found in the cluster “developmental process.”

3.2.2. COG Annotation. Unigenes were annotated for the COG classifications in order to further evaluate the effectiveness of the annotation process (Figure 3). The class definition, number, and percentage of each class are shown in Figure 3. The proteins in each COG category were assumed to share the same ancestor protein and were therefore assumed to be paralogs or orthologs. The largest category was “general function prediction only” with 24.84%; the second was “posttranslational modification, protein turnover, and chaperones” with 8.97%; and the third was “translation, ribosomal structure, and biogenesis” with 7.24%. “Cell motility” (24, 0.09%) and “nuclear structure” (10, 0.04%) represented the smallest categories.

3.2.3. KEGG Annotation. The KEGG database can be used to categorize gene functions with an emphasis on biochemical pathways. It can also be used to systematically analyze intracellular metabolic pathways and the functions of gene products. Pathway-based analyses help further determine the biological functions of genes. To gain further insights into the biological pathways in P. tenuifolia, we performed a BLASTX search of the assembled unigenes against the KEGG protein database. A total of 72636 (62.9%) unigenes had BLAST hits, and 42702 unigenes were assigned to five KEGG biochemical pathway categories (Table 2): metabolism (17217 unigenes), genetic information processing (14593 unigenes), environmental information processing (2719 unigenes), cellular processes (3919 unigenes), and human diseases (4254 unigenes). The pathways with the highest unigene representation were spliceosome (ko03041, 1431 unigenes), chromosome (ko03036, 1232 unigenes), and ubiquitin system (ko04121, 1060 unigenes).
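The five category totals quoted above can be cross-checked against the per-pathway counts listed in Table 2. The following sketch sums those counts; the variable and function names are illustrative, and the numbers are copied from Table 2 in the order given there.

```python
# Per-pathway unigene counts from Table 2, grouped by KEGG category.
TABLE2_COUNTS = {
    "Metabolism": [3702, 2618, 1781, 1762, 1610, 1390,
                   1055, 879, 716, 620, 562, 522],
    "Genetic information processing": [4312, 4029, 3191, 3061],
    "Environmental information processing": [2050, 342, 327],
    "Cellular processes": [1540, 1399, 507, 473],
    "Human diseases": [1398, 1371, 1146, 236, 103],
}


def category_totals(table):
    """Sum the pathway counts within each category."""
    return {category: sum(counts) for category, counts in table.items()}


if __name__ == "__main__":
    totals = category_totals(TABLE2_COUNTS)
    for category, total in totals.items():
        print(f"{category}: {total}")
    print("All categories:", sum(totals.values()))
```

The per-category sums agree with the totals in the text, and their grand total matches the 42702 unigenes assigned to KEGG pathways.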

As a traditional Chinese medicinal plant, P. tenuifolia undergoes highly active metabolism. The high representation of metabolism is reflected in primary metabolite biosynthesis, such as “carbohydrate metabolism” (3702 unigenes), “amino acid metabolism” (2618 unigenes), and “energy metabolism” (1762 unigenes). However, most active ingredients of P. tenuifolia are secondary metabolites. Therefore, we listed the classifications of terpenoids, polyketides, and other secondary metabolites (Figure 4) with high expression levels.

Metabolic pathways were involved in the metabolism of terpenoids and polyketides (620 unigenes, Figure 4(a)), including “terpenoid backbone biosynthesis” (176 unigenes, 28%), “carotenoid biosynthesis” (78 unigenes, 13%), “zeatin biosynthesis” (44 unigenes, 7%), “limonene and pinene degradation” (46 unigenes, 7%), “diterpenoid biosynthesis” (30 unigenes, 5%), “tetracycline biosynthesis” (16 unigenes, 3%), “brassinosteroid biosynthesis” (13 unigenes, 2%), and “polyketide sugar unit biosynthesis” (5 unigenes, 1%). The large amount of transcriptomic information may enhance the study of terpenoid biosynthesis in P. tenuifolia.

Metabolic pathways were also involved in the biosynthesis of other secondary metabolites (522 unigenes, Figure 4(b)), including “phenylpropanoid biosynthesis” (202 unigenes, 39%), “flavonoid biosynthesis” (55 unigenes, 11%), “tropane, piperidine, and pyridine alkaloid biosynthesis” (53 unigenes, 9%), “streptomycin biosynthesis” (45 unigenes, 9%), “isoquinoline alkaloid biosynthesis” (44 unigenes, 8%), “stilbenoid, diarylheptanoid, and gingerol biosynthesis” (35 unigenes, 7%), “flavone and flavonol biosynthesis” (27 unigenes, 5%), and “butirosin and neomycin biosynthesis” (25 unigenes, 5%). In our sequence dataset, a total of 176 unigenes were found to be potentially related to terpenoid biosynthesis, including the MEP and MVA pathways. In addition, 202 and 55 unigenes were related to the biosynthesis of phenylpropanoids and flavonoids, respectively.

Figure 3: COG classification assigned to the unigenes (cluster of orthologous groups). Each row gives the class definition, the number of unigenes in the class, and the percentage of the class.

Class definition                                                   Number   Percent (%)
General function prediction only                                   6401     24.84
Posttranslational modification, protein turnover, chaperones       2310     8.97
Translation, ribosomal structure, and biogenesis                   1865     7.24
Carbohydrate transport and metabolism                              1712     6.64
Amino acid transport and metabolism                                1427     5.54
Replication, recombination, and repair                             1319     5.12
Energy production and conversion                                   1152     4.47
Signal transduction mechanisms                                     1110     4.31
Transcription                                                      1045     4.06
Lipid transport and metabolism                                     1021     3.96
Inorganic ion transport and metabolism                             821      3.19
Function unknown                                                   809      3.14
Coenzyme transport and metabolism                                  728      2.83
Cell wall/membrane/envelope biogenesis                             721      2.80
Secondary metabolites biosynthesis, transport, and catabolism      621      2.41
Nucleotide transport and metabolism                                588      2.28
Intracellular trafficking, secretion, and vesicular transport      530      2.06
Cytoskeleton                                                       392      1.52
Defense mechanism                                                  344      1.34
Chromatin structure and dynamics                                   331      1.28
Cell cycle control, cell division, and chromosome partitioning     273      1.06
RNA processing and modification                                    211      0.82
Cell motility                                                      24       0.09
Nuclear structure                                                  10       0.04

(1) Characterization and Expression Analysis of the Genes Involved in the Biosynthetic Pathway of Putative Terpenoid Backbone. Dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP) are recognized as precursors of the biosynthesis of terpenoid saponins in green plants and as the “active isoprene” compounds in vivo. DMAPP and IPP are isomers, which are synthesized by the MVA and MEP biosynthesis pathways mentioned above [28]. The MVA pathway starts in the cytoplasm with acetyl-CoA, which is derived from captured CO2, and the MEP pathway starts with pyruvate and glyceraldehyde-3-phosphate in the plastid [29]. In these databases, the biosynthetic pathways of MVA and MEP were identified (Figure 5(a)). In most cases, two or more unique sequences were labeled as the same enzyme, and these unique sequences could represent a single gene or different fragments of a gene.

Table 2: Pathway classification of P. tenuifolia.

Category                               Pathway                                        Count
Metabolism                             Carbohydrate metabolism                        3702
                                       Amino acid metabolism                          2618
                                       Enzyme families                                1781
                                       Energy metabolism                              1762
                                       Lipid metabolism                               1610
                                       Glycan biosynthesis and metabolism             1390
                                       Nucleotide metabolism                          1055
                                       Metabolism of cofactors and vitamins           879
                                       Metabolism of other amino acids                716
                                       Metabolism of terpenoids and polyketides       620
                                       Xenobiotics biodegradation and metabolism      562
                                       Biosynthesis of other secondary metabolites    522
Genetic information processing         Folding, sorting, and degradation              4312
                                       Translation                                    4029
                                       Transcription                                  3191
                                       Replication and repair                         3061
Environmental information processing   Signal transduction                            2050
                                       Membrane transport                             342
                                       Signaling molecules and interaction            327
Cellular processes                     Cell growth and death                          1540
                                       Transport and catabolism                       1399
                                       Cell motility                                  507
                                       Cell communication                             473
Human diseases                         Neurodegenerative diseases                     1398
                                       Infectious diseases                            1371
                                       Cancers                                        1146
                                       Immune system diseases                         236
                                       Cardiovascular diseases                        103

The MVA pathway proceeded as follows. First, two molecules of acetyl-CoA were condensed into acetoacetyl-CoA by AACT (EC 2.3.1.9, 13 unigenes), which was then converted into 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by HMGS (EC 2.3.3.10, 18 unigenes). MVA was formed by HMGR (EC 1.1.1.34, 18 unigenes), and MVK (EC 2.7.1.36, 2 unigenes) then converted MVA into MVA-5-phosphate. Next, IPP was synthesized by PMK (EC 2.7.4.2, 4 unigenes) and PMD (EC 4.1.1.33, 3 unigenes). The MEP pathway proceeded as follows. First, pyruvate and glyceraldehyde-3-phosphate were converted to 1-deoxy-D-xylulose-5-phosphate (DXOP) by DXS (EC 2.2.1.7, 17 unigenes), which was then converted into MEP by DXR (EC 1.1.1.267, 7 unigenes). Next, MCT (EC 2.7.7.60, 7 unigenes) and CMK (EC 2.7.1.148, 4 unigenes) converted MEP to 4-dicytidine triphosphate-2-methyl-D-erythritol-2-phosphate (CDP-MEP). Lastly, IPP was synthesized by MDS (EC 4.6.1.12, 1 unigene), HDS (EC 1.17.7.1, 9 unigenes), and HDR (EC 1.17.1.2, 12 unigenes). The next step was the synthesis of squalene, in which a series of enzymes were involved, such as IPPI (EC 5.3.3.2, 15 unigenes) and SQS (EC 2.5.1.21, 7 unigenes). However, GPPS (EC 2.5.1.29) and FPPS (EC 2.5.1.10), which are involved in terpenoid biosynthesis, were not found in the transcriptome results, maybe because some assembled cDNAs were not of full length. Squalene was converted into 2,3-oxidosqualene by SE (EC 1.14.99.7, 19 unigenes). In many plants, 2,3-oxidosqualene cyclization is a very important step in the biosynthesis of terpenes because it is a branch point in the synthesis of triterpenoid saponins, sterols, and other terpenes [30]. We found two types of OSC genes of P. tenuifolia in the dataset: cycloartenol synthase (16 unigenes) and beta-amyrin synthase (54 unigenes). Moreover, except for FPPS, another five nucleotide sequences (SQS, SE, CAS (2), cycloartenol synthase, and beta-amyrin) related to triterpenoid saponin biosynthesis and reported in NCBI were found in our sequencing data.
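As a compact summary of the two routes to IPP described above, the sketch below records each step as an (enzyme, EC number, unigene count) tuple and tallies the unigenes per pathway. The data layout and function name are illustrative; the values are those quoted in the text, and the tally simply adds the per-enzyme counts without checking whether any unigene is shared between enzymes.

```python
# Ordered enzyme steps of each pathway: (abbreviation, EC number, unigene count).
MVA_PATHWAY = [
    ("AACT", "2.3.1.9", 13),
    ("HMGS", "2.3.3.10", 18),
    ("HMGR", "1.1.1.34", 18),
    ("MVK", "2.7.1.36", 2),
    ("PMK", "2.7.4.2", 4),
    ("PMD", "4.1.1.33", 3),
]

MEP_PATHWAY = [
    ("DXS", "2.2.1.7", 17),
    ("DXR", "1.1.1.267", 7),
    ("MCT", "2.7.7.60", 7),
    ("CMK", "2.7.1.148", 4),
    ("MDS", "4.6.1.12", 1),
    ("HDS", "1.17.7.1", 9),
    ("HDR", "1.17.1.2", 12),
]


def unigene_tally(steps):
    """Add up the unigene counts reported for every enzyme in a pathway."""
    return sum(count for _enzyme, _ec, count in steps)


if __name__ == "__main__":
    print("MVA pathway unigenes:", unigene_tally(MVA_PATHWAY))
    print("MEP pathway unigenes:", unigene_tally(MEP_PATHWAY))
```

Keeping the steps in an ordered list preserves the reaction order, so the same structure can be extended with the downstream enzymes (IPPI, SQS, SE) if needed.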

(2) Characterization and Expression Analysis of the Genes Involved in the Putative Phenylpropanoid Biosynthesis Pathway. Approximately 814 unigenes encoding 16 enzymes were found in the phenylpropanoid biosynthesis pathway (Figure 5(b)). However, flavonoids and lignins were not the main phenylpropanoid compounds and have rarely been mentioned by researchers in P. tenuifolia. The phenylpropanoid biosynthesis pathway starts with the conversion of phenylalanine to cinnamate by PAL (EC 4.3.1.24, 11 unigenes), and cinnamate is then converted to p-coumaroyl CoA by the enzymes C4H (EC 1.14.13.11, 10 unigenes) and 4CL (EC 6.2.1.12, 25 unigenes).

The flavonoid pathway began with the synthesis of chalcone, which was catalyzed by CHS (EC 2.3.1.74, 1 unigene), and chalcone was then converted to naringenin by CHI (EC 5.5.1.6, 3 unigenes). Dihydrotricetin was synthesized by F3′5′H (EC 1.14.13.88, 11 unigenes). F3H (EC 1.14.11.9, 60 unigenes) catalyzed the conversion of these flavanones to dihydroflavonols. In the last step, dihydroflavonols were converted to flavonols by FLS (EC 1.14.11.23, 4 unigenes).

The lignin pathway began with the synthesis of p-couma-royl shikimate which was catalyzed by HCT (EC 231133 4unigenes) Then C31015840H (EC 1141336) catalyzed the conver-sion of p-coumaroyl shikimate into caffeoyl shikimateCaffeoyl shikimate was converted by hydroxycinnamoyl-transferase to produce caffeoyl CoA CCoAoMT (EC211104) could convert these caffeoyl CoA to feruloyl CoAThen CCR (EC 12144 6 unigenes) catalyzed the feruloyl

International Journal of Genomics 7

[Figure 4: two bar charts of unigene counts per KEGG category. Bar values as extracted, (a): 20, 2, 13, 5, 6, 7, 1, 26, 28, 37; (b): 52, 5, 10, 8, 5, 39, 7, 9, 10. The assignment of values to categories is not recoverable from the extracted text.]

(a) Categories: Biosynthesis of ansamycins [PATH:ko01051]; Biosynthesis of siderophore group nonribosomal peptides [PATH:ko01053]; Brassinosteroid biosynthesis [PATH:ko00905]; Carotenoid biosynthesis [PATH:ko00906]; Diterpenoid biosynthesis [PATH:ko00904]; Geraniol degradation [PATH:ko00281]; Limonene and pinene degradation [PATH:ko00903]; Polyketide sugar unit biosynthesis [PATH:ko00523]; Prenyltransferases [BR:ko01006]; Terpenoid backbone biosynthesis [PATH:ko00900]; Tetracycline biosynthesis [PATH:ko00253]; Zeatin biosynthesis [PATH:ko00908].

(b) Categories: Butirosin and neomycin biosynthesis [PATH:ko00524]; Caffeine metabolism [PATH:ko00232]; Flavone and flavonol biosynthesis [PATH:ko00944]; Flavonoid biosynthesis [PATH:ko00941]; Isoquinoline alkaloid biosynthesis [PATH:ko00950]; Novobiocin biosynthesis [PATH:ko00401]; Phenylpropanoid biosynthesis [PATH:ko00940]; Stilbenoid, diarylheptanoid, and gingerol biosynthesis [PATH:ko00945]; Streptomycin biosynthesis [PATH:ko00521]; Tropane, piperidine, and pyridine alkaloid biosynthesis [PATH:ko00960].

Figure 4: Classifications of unigenes involved in the metabolism of terpenoids and polyketides (a) and biosynthesis of other secondary metabolites (b).

Table 3: Frequency of identified SSR motifs in P. tenuifolia.

Motif length       Repeat numbers                                        Total   %
                   5      6      7      8      9      10     >10
Mononucleotide     -      -      -      -      -      1554   1890       3444    51.75
Dinucleotide       -      638    374    333    333    317    125        2120    31.85
Trinucleotide      646    218    117    10     1      2      2          996     14.97
Tetranucleotide    64     12     0      0      0      0      0          76      1.14
Pentanucleotide    2      1      0      0      0      0      0          3       0.045
Hexanucleotide     12     2      2      0      0      0      0          16      0.24
Total              724    871    493    343    334    1873   2017       6655
%                  10.87  13.09  7.41   5.15   5.02   4.79   1.91

CoA into coniferaldehyde. Finally, coniferaldehyde was converted by CAD (EC 1.1.1.195, 12 unigenes) to produce coniferyl alcohol. As with some enzymes of terpenoid backbone biosynthesis, C3′H was not found in the transcriptome results.

3.3. Simple Sequence Repeat (SSR) Marker Discovery. A total of 6655 SSRs in 5784 sequences were identified from 39625 unigenes. Among these sequences, 755 unigene sequences contained more than one SSR. Mononucleotide motifs accounted for 51.75% of the SSRs, dinucleotide motifs for 31.85%, and trinucleotide motifs for 14.97% (Table 3). AT (1539) was the most abundant repeat type, followed by ATGA (207) and GAATCA (69). However, tetranucleotide, pentanucleotide, and hexanucleotide motifs each accounted for <2% of the SSRs in the unigene sequences. In this transcriptome database, SSRs were not present in all of the unigenes, and complex SSR motifs were not detected.
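The SSR survey above can be illustrated with a minimal sketch. The paper does not name the SSR detection tool or its settings, so the minimum repeat numbers below (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are an assumption read off the first populated column of each row of Table 3, and the function names are hypothetical. Only perfect repeats are searched, in line with the statement that complex motifs were not detected.

```python
import re
from collections import Counter

# Minimum repeat number per motif length. ASSUMPTION: inferred from Table 3,
# not stated in the paper.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
CLASS_NAMES = {1: "mononucleotide", 2: "dinucleotide", 3: "trinucleotide",
               4: "tetranucleotide", 5: "pentanucleotide", 6: "hexanucleotide"}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least min_rep - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. 'AA' x 6, which is already reported as 'A' x 12.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


def motif_class_counts(sequences):
    """Count SSRs per motif-length class over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for _, motif, _ in find_ssrs(seq):
            counts[CLASS_NAMES[len(motif)]] += 1
    return counts
```

Applied to a set of unigene sequences, `motif_class_counts` yields per-class totals of the kind reported in the "Total" column of Table 3; dividing each by the overall SSR count gives the percentage column.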

4. Discussions

The advantages of high throughput, accuracy, and reproducibility have made transcriptome sequencing a powerful technology [31]. Illumina HiSeq 2000 sequencing has been widely


[Figure 5: pathway diagrams. Labels recoverable from the extracted text are listed below; numbers in parentheses are the unigene counts printed next to each enzyme label.]

(a) MVA pathway [cytosol] and MEP/DOXP pathway [plastid]. Intermediates: acetyl CoA, acetoacetyl CoA, HMG-CoA, MVA, MVA-5-phosphate, MVA-5-diphosphate, GA-3P + pyruvate, DOXP, MEP, CDP-ME, CDP-MEP, CME-PP, HMBPP, IPP, DMAPP, GPP, FPP, squalene, 2,3-oxidosqualene, β-amyrin, cycloartenol, triterpenoid saponins, steroidal saponins. Enzymes: AACT(13), HMGS(18), HMGR(18), MVK(2), PMK(4), PMD(3), DXS(17), DXR(7), MCT(7), CMK(4), MDS(1), HDS(9), HDR(12), IPPI(15), GPPS(5), FPPS, SQS(7), SE(19), b-AS(9), CAS(4), CYPs and UGTs.

(b) Intermediates: phenylalanine, cinnamate, p-coumarate, p-coumaroyl CoA, p-coumaroyl shikimate, caffeoyl shikimate, caffeoyl CoA, feruloyl CoA, coniferaldehyde, coniferyl alcohol, lignin, chalcone, naringenin, flavanones, dihydroflavonols, flavonol. Enzymes: PAL(11), C4H(10), 4CL(25), HCT(4), C3′H, CCoAoMT(13), CCR(6), CAD(12), CHS(1), CHI(3), F3′5′H(11), F3H(60), FLS(4), UGT.

Figure 5: Schematic of the putative biosynthetic pathways of two major classes of active compounds in P. tenuifolia. Saponin biosynthesis pathway (a) and phenylpropanoid biosynthesis pathway (b). Enzyme abbreviations: AACT: acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase; HMGS: 3-hydroxy-3-methylglutaryl CoA synthase; HMGR: 3-hydroxy-3-methylglutaryl CoA reductase; MVK: mevalonate kinase; PMK: phosphomevalonate kinase; PMD: mevalonate pyrophosphate decarboxylase; DXS: 1-deoxy-d-xylulose-5-phosphate synthase; DXR: 1-deoxy-d-xylulose-5-phosphate reductoisomerase; MCT: 2-C-methyl-erythritol 4-phosphate cytidylyltransferase; CMK: 4-(cytidine 5′-diphospho)-2-C-methyl-d-erythritol kinase; MDS: 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase; HDS: 1-hydroxy-2-methyl-butenyl-4-diphosphate synthase; HDR: isopentenyl pyrophosphate (IPP)/3,3-dimethylallyl pyrophosphate (DMAPP) synthase; IPPI: isopentenyl pyrophosphate isomerase; GPPS: geranyl diphosphate synthase; FPPS: farnesyl diphosphate synthase; SQS: squalene synthase; SE: squalene epoxidase; CAS: cycloartenol synthase; β-AS: β-amyrin synthase; DS: dammarenediol synthase; PAL: phenylalanine ammonia lyase; C4H: cinnamate 4-hydroxylase; 4CL: 4-coumarate CoA ligase; CHS: chalcone synthase; CHI: chalcone isomerase; F3′5′H: flavonoid 3′5′-hydroxylase; F3H: flavanone 3-hydroxylase; FLS: flavonol synthase; HCT: hydroxycinnamoyl-transferase; C3′H: p-coumaroyl shikimate 3′ hydroxylase; CCoAoMT: caffeoyl CoA 3-O-methyltransferase; CCR: cinnamoyl-CoA reductase; CAD: cinnamyl alcohol dehydrogenase; CYPs: cytochrome P450s; UGTs: glycosyltransferases.


used for deep sequencing of model and nonmodel plants recently, with the largest output and lowest reagent cost. In this study, the application of Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of the secondary metabolite biosynthetic pathways of P. tenuifolia.

Triterpenoid saponins, especially pentacyclic triterpenoid saponins, are among the main active ingredients of P. tenuifolia, and their biosynthetic pathways are becoming increasingly important. The upstream and midstream biosynthetic pathways of triterpenoid saponins have been well characterized in the past few years. Many researchers now focus mainly on the downstream biosynthetic pathways, which are also the most difficult part to study [17]. In the downstream biosynthesis of natural plant products, hydroxylation by CYP450s and glycosylation by UGTs play important roles in stabilizing the product and altering the bioactivity of triterpenoid saponins with dammarane-type and oleanane-type aglycones [32]. In this study, 466 and 157 unigenes were annotated as CYP450s and UGTs, respectively, in the transcriptome of P. tenuifolia (data not shown). However, because of the number and complexity of CYP450s and UGTs, the enzymes of the downstream biosynthetic pathway in P. tenuifolia, which determine the specific steps in the accumulation of triterpenoid saponins, are still unknown.

CYP450s are a family of enzymes involved in the biosynthesis of lignins, terpenoids, sterols, fatty acids, hormones, pigments, and defense-related phytoalexins [33]. Only two CYP450s involved in triterpene saponin biosynthesis have been functionally characterized so far: CYP88D6 from Glycyrrhiza uralensis, belonging to the CYP85 family [34], and CYP93E1 from Glycine max, belonging to the CYP71 family [35]. Recently, several CYP450s and UGTs were identified as candidate genes in many studies of medicinal and nonmedicinal plants. One CYP450 (contig00248) was selected as the candidate gene most likely to be involved in ginsenoside biosynthesis in Panax quinquefolius on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Moreover, Pn02132 and Pn00158, closely related to CYP88D6 and CYP93E1, were selected as candidate CYP450s involved in triterpene saponin biosynthesis in P. notoginseng [5]. Furthermore, ten unigene sequences corresponding to seven different genes with high homology to CYP79s and four unigenes corresponding to two CYP83 genes were identified in the core structure biosynthesis of glucosinolate metabolism in radish [36]. Our previous study suggested that metabolomics based on ultraperformance liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry, combined with gene expression analysis, can effectively elucidate the mechanism of triterpenoid saponin biosynthesis; in that study, three CYP450 genes (CYP88D6, CYP716B1, and CYP72A1) putatively expressed in the triterpenoid saponin biosynthesis pathway of P. tenuifolia were identified by quantitative real-time PCR (qRT-PCR) analysis [2]. The examples mentioned above show that most identified CYP450s belong to the CYP71 and CYP88 families and that few studies have been documented about the other CYP450 families until now.

UGTs are another large multigene family in plants and play an important role in the last step of triterpenoid saponin biosynthesis. Glycosylation, the transfer of activated saccharides to an aglycone substrate, is the predominant modification in triterpenoid saponin biosynthesis. As with CYP450 genes, some UGT genes have also been identified. The genes UGT74M1 and UGT73K1 from Saponaria vaccaria [37] and UGT71G1 from Medicago truncatula [38] have been previously identified. Flavonoid-related UGTs (UGT78D1, UGT78D2, UGT73C6, and UGT89C1) were also identified in previous studies [39, 40]. Furthermore, four UGTs (contig01001, contig14976, contig15451, and contig16321) were selected as the candidate genes most likely to be involved in ginsenoside biosynthesis in P. quinquefolius on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Pn13895, closely related to UGT71G1, was regarded as a lead candidate UGT responsible for triterpene biosynthesis in P. notoginseng [5]. Six putative flavonol glycosides were identified in Isatis indigotica [41]. Thirteen unigenes were identified as UGT74s, including UGT74B1, UGT74C1, UGT74F1, and UGT74F2, in the core structure biosynthesis of glucosinolate metabolism in radish [36]. In our previous study mentioned above [2], three UGT genes (UGT74B1, UGT73B2, and UGT73C6) putatively expressed in triterpenoid saponin biosynthetic pathways were also identified by qRT-PCR in P. tenuifolia. The examples mentioned above show that most identified UGTs belong to the UGT74 and UGT73 families and that few studies have been documented about the other UGT families until now.

In other words, the CYP450 and UGT genes identified so far add little to the knowledge of downstream triterpenoid saponin biosynthesis, and thus we will continue with in-depth studies. The large number of CYP450 and UGT candidates could provide not only a potential gene pool for identifying the specific CYP450s and UGTs involved in triterpenoid biosynthesis in P. tenuifolia but also a convenient means to characterize their roles in triterpenoid biosynthesis in future studies.

5. Conclusion

RP, one of the predominant Chinese medicinal plants, contains various medicinal ingredients that elicit tonic, sedative, antipsychotic, expectorant, and other pharmacological effects. In this study, a large-scale unigene investigation of P. tenuifolia was performed by Illumina sequencing. Our results showed that many transcripts encoded by putative genes are involved in the biosynthesis of triterpene saponins and phenylpropanoids. The data we obtained represent the most abundant genomic resource and provide comprehensive information on gene discovery, transcriptome profiling, transcriptional regulation, and molecular markers of P. tenuifolia. This study may help improve the production of active compounds through marker-assisted breeding or genetic engineering for P. tenuifolia as well as other medicinal plants in the Polygalaceae family. However, although many candidate CYP450s and UGTs are likely involved in the downstream biosynthetic pathway of triterpene saponins, these candidates should be further investigated.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors' Contribution

Hongling Tian and Xiaoshuang Xu equally contributed to this work.

Acknowledgments

This study was supported by the National Natural Sciences Foundation of China (Grant no. 31100244), the Shanxi Science and Technology Development Program (Grant no. 20140313010-1), the National Science-technology Support Plan Projects (Grant no. 2011BA107B05), the Technological Innovation Projects in Shanxi (Grant no. 2013007), and the Construction Plan for Basic Condition Platform of Shanxi (Grant no. 2014091022).

References

[1] Y. Yao, M. Jia, J.-G. Wu et al., “Anxiolytic and sedative-hypnotic activities of polygalasaponins from Polygala tenuifolia in mice,” Pharmaceutical Biology, vol. 48, no. 7, pp. 801–807, 2010.

[2] F. S. Zhang, X. W. Li, Z. Y. Li et al., “UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia,” PLoS ONE, vol. 9, no. 8, Article ID e105765, 2014.

[3] H. Seki, S. Sawai, K. Ohyama et al., “Triterpene functional genomics in licorice for identification of CYP72A154 involved in the biosynthesis of glycyrrhizin,” The Plant Cell, vol. 23, no. 11, pp. 4112–4123, 2011.

[4] S. Chen, H. Luo, Y. Li et al., “454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng,” Plant Cell Reports, vol. 30, no. 9, pp. 1593–1601, 2011.

[5] H. Luo, C. Sun, Y. Sun et al., “Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers,” BMC Genomics, vol. 12, no. 5, article S5, 2011.

[6] C. Sun, Y. Li, Q. Wu et al., “De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis,” BMC Genomics, vol. 11, article 262, 2010.

[7] W. Gao, H.-X. Sun, H. Xiao et al., “Combining metabolomics and transcriptomics to characterize tanshinone biosynthesis in Salvia miltiorrhiza,” BMC Genomics, vol. 15, article 73, 2014.

[8] L. Liu, Y. Li, S. Li et al., “Comparison of next-generation sequencing systems,” Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 251364, 11 pages, 2012.

[9] Y. Yang, M. Xu, Q. Luo, J. Wang, and H. Li, “De novo transcriptome analysis of Liriodendron chinense petals and leaves by Illumina sequencing,” Gene, vol. 534, no. 2, pp. 155–162, 2014.

[10] S. Fowler and M. F. Thomashow, “Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway,” Plant Cell, vol. 14, no. 8, pp. 1675–1690, 2002.

[11] R. Zhai, Y. Feng, H. Wang et al., “Transcriptome analysis of rice root heterosis by RNA-Seq,” BMC Genomics, vol. 14, no. 1, article 19, 2013.

[12] I. Zouari, A. Salvioli, M. Chialva et al., “From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism,” BMC Genomics, vol. 15, article 221, 2014.

[13] S. Kaur, L. W. Pembleton, N. O. I. Cogan et al., “Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers,” BMC Genomics, vol. 13, no. 1, article 104, 2012.

[14] U. Chandrasekaran, W. Xu, and A. Liu, “Transcriptome profiling identifies ABA mediated regulatory changes towards storage filling in developing seeds of castor bean (Ricinus communis L.),” Cell & Bioscience, vol. 4, no. 1, article 33, 2014.

[15] T. Zhang, X. Zhao, W. Wang et al., “Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum,” Plant Molecular Biology, vol. 84, no. 3, pp. 315–327, 2014.

[16] Y. Ikeya, S. Takeda, M. Tunakawa et al., “Cognitive improving and cerebral protective effects of acylated oligosaccharides in Polygala tenuifolia,” Biological and Pharmaceutical Bulletin, vol. 27, no. 7, pp. 1081–1085, 2004.

[17] J. M. Augustin, V. Kuzina, S. B. Andersen, and S. Bak, “Molecular activities, biosynthesis and evolution of triterpenoid saponins,” Phytochemistry, vol. 72, no. 6, pp. 435–457, 2011.

[18] Y. Wang, X. Wang, T.-H. Lee, S. Mansoor, and A. H. Paterson, “Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice),” New Phytologist, vol. 198, no. 1, pp. 274–283, 2013.

[19] V. Tzin and G. Galili, “The biosynthetic pathways for shikimate and aromatic amino acids in Arabidopsis thaliana,” The Arabidopsis Book, vol. 8, Article ID e0132, 2010.

[20] W. Boerjan, J. Ralph, and M. Baucher, “Lignin biosynthesis,” Annual Review of Plant Biology, vol. 54, pp. 519–546, 2003.

[21] J. C. D'Auria and J. Gershenzon, “The secondary metabolism of Arabidopsis thaliana: growing like a weed,” Current Opinion in Plant Biology, vol. 8, no. 3, pp. 308–316, 2005.

[22] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.

[23] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, “A genomic perspective on protein families,” Science, vol. 278, no. 5338, pp. 631–637, 1997.

[24] R. L. Tatusov, N. D. Fedorova, J. D. Jackson et al., “The COG database: an updated version includes eukaryotes,” BMC Bioinformatics, vol. 4, article 41, 2003.

[25] M. Kanehisa, S. Goto, M. Hattori et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, pp. D354–D357, 2006.

[26] E. M. Zdobnov and R. Apweiler, “InterProScan – an integration platform for the signature-recognition methods in InterPro,” Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001.

[27] M. L. Wise and R. Croteau, “Monoterpene biosynthesis,” in Comprehensive Natural Products Chemistry, p. 2, Elsevier, 1998.

[28] C. A. Schuhr, T. Radykewicz, S. Sagner et al., “Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy,” Phytochemistry Reviews, vol. 2, no. 1-2, pp. 3–16, 2003.


[29] W. Eisenreich, M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Zenk, and A. Bacher, “The deoxyxylulose phosphate pathway of terpenoid biosynthesis in plants and microorganisms,” Chemistry and Biology, vol. 5, no. 9, pp. R221–R233, 1998.

[30] J. D. Park, D. K. Rhee, and Y. H. Lee, “Biological activities and chemistry of saponins from Panax ginseng C. A. Meyer,” Phytochemistry Reviews, vol. 4, no. 2-3, pp. 159–175, 2005.

[31] M. Rohmer, “Mevalonate-independent methylerythritol phosphate pathway for isoprenoid biosynthesis: elucidation and distribution,” Pure and Applied Chemistry, vol. 75, no. 2-3, pp. 375–387, 2003.

[32] S. Kalra, B. L. Puniya, D. Kulshreshtha et al., “De novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum,” PLoS ONE, vol. 8, no. 12, Article ID e83336, 2013.

[33] A. H. Meijer, E. Souer, R. Verpoorte, and J. H. C. Hoge, “Isolation of cytochrome P450 cDNA clones from the higher plant Catharanthus roseus by a PCR strategy,” Plant Molecular Biology, vol. 22, no. 2, pp. 379–383, 1993.

[34] H. Seki, K. Ohyama, S. Sawai et al., “Licorice beta-amyrin 11-oxidase, a cytochrome P450 with a key role in the biosynthesis of the triterpene sweetener glycyrrhizin,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 37, pp. 14204–14209, 2008.

[35] M. Shibuya, M. Hoshino, Y. Katsube, H. Hayashi, T. Kushiro, and Y. Ebizuka, “Identification of β-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay,” FEBS Journal, vol. 273, no. 5, pp. 948–959, 2006.

[36] Y. Wang, Y. Pan, Z. Liu et al., “De novo transcriptome sequencing of radish (Raphanus sativus L.) and analysis of major genes involved in glucosinolate metabolism,” BMC Genomics, vol. 14, article 836, 2013.

[37] D. Meesapyodsuk, J. Balsevich, D. W. Reed, and P. S. Covello, “Saponin biosynthesis in Saponaria vaccaria: cDNAs encoding β-amyrin synthase and a triterpene carboxylic acid glucosyltransferase,” Plant Physiology, vol. 143, no. 2, pp. 959–969, 2007.

[38] L. Achnine, D. V. Huhman, M. A. Farag, L. W. Sumner, J. W. Blount, and R. A. Dixon, “Genomics-based selection and functional characterization of triterpene glycosyltransferases from the model legume Medicago truncatula,” Plant Journal, vol. 41, no. 6, pp. 875–887, 2005.

[39] K. Yonekura-Sakakibara, T. Tohge, F. Matsuda et al., “Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis,” Plant Cell, vol. 20, no. 8, pp. 2160–2176, 2008.

[40] R. Yin, B. Messner, T. Faus-Kessler et al., “Feedback inhibition of the general phenylpropanoid and flavonol biosynthetic pathways upon a compromised flavonol-3-O-glycosylation,” Journal of Experimental Botany, vol. 63, no. 7, pp. 2465–2478, 2012.

[41] J. Chen, X. Dong, Q. Li et al., “Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling,” BMC Genomics, vol. 14, article 857, 2013.


studies on P. tenuifolia; nevertheless, genomic research related to this medicinal plant is feasible. Given the medicinal and economic importance of P. tenuifolia, genomic data sources for this plant species should be investigated to discover genes and enable further functional studies.

DNA sequencing technology was developed by Frederick Sanger in 1977. This “first-generation sequencing” was laborious and costly. After years of improvement, next-generation sequencing (NGS) has been developed, offering greater speed and accuracy with reduced cost and manpower. Typical examples of NGS include SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche [8]. Among these techniques, Illumina HiSeq 2000 yields the largest output and entails the lowest reagent cost; this technique has been commonly used in deep sequencing of model and nonmodel organisms [9]. For example, Illumina HiSeq 2000 has been successfully applied in a wide range of species, including model plants (Arabidopsis, rice, tomato) [10–12] and crops with large genomes, including field pea, castor, and Sorghum propinquum [13–15]. This technique could promote studies on not only the genomics of these plants but also the biosynthetic pathways of their main active ingredients.

Triterpenoid saponins, xanthones, and oligosaccharide esters are currently considered the main active ingredients of P. tenuifolia [16]. However, small amounts of other secondary metabolites, such as isoquinoline alkaloids, lignins, and flavonols, are also present in P. tenuifolia. Illumina sequencing can therefore be applied to determine the number of enzymes implicated in the biosynthetic pathways of secondary metabolites in P. tenuifolia. To date, the biosynthetic pathway of triterpenoid saponins, particularly pentacyclic triterpenoid saponins, in P. tenuifolia has been extensively studied [2]. In plant terpenoid biosynthesis, the formation of a primary cucurbitane skeleton via the isoprenoid pathway precedes the cyclization of 2,3-oxidosqualene. The mevalonate (MVA) pathway, located in the cytoplasm and endoplasmic reticulum, and the 2-C-methyl-d-erythritol-4-phosphate/1-deoxy-d-xylulose-5-phosphate (MEP/DOXP) pathway, located in the plastid, are likely to participate in the formation of a presenegenin skeleton in P. tenuifolia. These two pathways are present in different cellular compartments of green plants [17, 18]. In addition, phenylpropanoid biosynthetic pathways in P. tenuifolia are yet to be reported. Phenylalanine is an end product of the shikimate pathway, in which the aromatic amino acids tyrosine and tryptophan are also produced [19]. With phenylalanine as a precursor, lignin biosynthesis proceeds via a series of side-chain modifications, ring hydroxylations, and O-methylations to yield lignin monomers [20]. In this pathway, other small molecules, such as flavonoids, coumarins, hydroxycinnamic acid conjugates, and lignans, are also produced. Many of these molecules could be or have been the main focus of studies [21]. These two metabolic pathways are the main ones found in P. tenuifolia. Therefore, because genomic resources and information on the biosynthetic pathways of secondary metabolites in P. tenuifolia are limited, whole transcripts should be identified by transcriptome sequencing for complete gene expression profiling. High-throughput Illumina deep sequencing will support further studies on these two metabolic pathways.

In this study, a cDNA library was constructed to obtain detailed and general data from P. tenuifolia by using a high-throughput Illumina deep-sequencing technique. This allowed us to identify genes encoding the enzymes involved in the whole biosynthetic pathways of triterpenoid saponins, phenylpropanoids, and other secondary metabolites, and the results could promote further analysis. Furthermore, our results could provide direct experimental data on P. tenuifolia, not only for guiding genetic breeding but also for the subsequent cloning and functional verification of key genes in triterpenoid biosynthesis. Therefore, this transcriptome sequencing may help improve future genetics and genomics studies on the molecular mechanisms associated with the chemical composition of P. tenuifolia. Eventually, this may make it possible to control the quality of P. tenuifolia at the level of germplasm sources, which is of theoretical and practical significance.

2. Materials and Methods

2.1. Plant Materials. Fresh roots of one-, two-, and three-year-old P. tenuifolia were sampled and collected in Xinjiang County, Shanxi Province, China. Another batch of two-year-old fresh roots of P. tenuifolia was collected in Anguo City, Hebei Province. All of the samples were intact and healthy. The samples were snap-frozen in liquid nitrogen and stored at −80°C before use.

2.2. RNA Extraction, cDNA Library Construction, and Illumina Sequencing. Total RNA was extracted from fresh tissues with a plant RNA isolation kit (AutoLab) according to the manufacturer's protocol and then purified with an RNeasy MinElute cleanup kit (Qiagen). In brief, poly-A-containing mRNA was purified with oligo-dT-containing beads and then fragmented by chemical treatment at high temperature to obtain fragments of 150 bp. Next, double-stranded cDNA was synthesized, and DNA fragments with protruding termini were repaired through 3′-5′ exonuclease and polymerase activities. To ensure that DNA fragments and adapters could be joined by “A”-“T” complementary pairing and to prevent the inserted DNA fragments from ligating to each other, a single “A” base was added to the 3′ end of the blunt DNA and a single “T” base was added to the 3′ end of the adapters. The tag-containing adapters and the DNA fragments were incubated and joined by ligase. Free adapters and DNA fragments without adapters were removed by using AMPure XP beads. DNA fragments with adapters were selectively enriched by PCR, and the PCR amplification was run with a small number of cycles in order to avoid errors in the library.

The library was quantified with PicoGreen and a fluorescence spectrophotometer (Quantifluor-ST fluorometer, Promega, E6090; Quant-iT PicoGreen dsDNA Assay Kit, Invitrogen, P7589), and the quality control of the enrichment of the PCR fragments and validation of the size and distribution of DNA fragments in the library


Table 1: Functional annotation of P. tenuifolia.

Database         Number of annotated   Percent (%)   E cutoff    Database version
Total unigenes   115477
Nt               69211                 59.9          1.00e-05    201301
Nr               77599                 67.2          1.00e-05    201301
Swissprot        54582                 47.3          1.00e-10    201301
COG              30548                 26.5          1.00e-10    No version
KEGG             72636                 62.9          1.00e-10    Release 58
Interpro         49873                 43.19                     InterProScan 4.8, v36
GO               41823                 36.22

were conducted using an Agilent 2100 Bioanalyzer (Agilent 2100 Bioanalyzer, Agilent, 2100; Agilent High Sensitivity DNA Kit, Agilent, 5067-4626). The samples were mixed after being adjusted to a uniform concentration of 10 nM and then diluted gradually and quantified to 4–5 pM for Illumina sequencing to construct the multiplexed DNA libraries. Illumina sequencing was conducted with an Illumina HiSeq 2000 at CapitalBio Ltd., Beijing, China.

2.3. De Novo Assembly and Functional Annotation. The raw sequence data were subjected to quality analysis, and low-quality reads and adapter sequences were removed to obtain sequences suitable for subsequent analysis. A Perl program was written to remove reads containing adapters, reads with an average quality below Q20 from the 3′ end to the 5′ end, reads with a final length of less than 50 bp, and reads containing uncertain bases. High-quality reads were used for de novo assembly with the Trinity software to construct unique consensus sequences. Trimmed Solexa transcriptome reads were mapped onto the unique consensus sequences by using Bowtie [22] (Bowtie parameters: -v 3 --all --best --strata). For functional annotation, unigenes were compared against the National Center for Biotechnology Information (NCBI) nonredundant protein database (Nr), the nonredundant nucleotide database (Nt), UniProt/Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG), Gene Ontology (GO), and InterPro databases with the Basic Local Alignment Search Tool (BLAST; http://www.ncbi.nlm.nih.gov/). Unigenes were identified by sequence similarity comparison against the SWISS-PROT (downloaded from the European Bioinformatics Institute, ftp://ftp.ebi.ac.uk/pub/databases/swissprot/), COG [23, 24], and KEGG [25] databases with BLAST at E values ≤ 1e−10. The KO information was retrieved from the BLAST results by a Perl script, and the associations between pathways and unigenes were established. InterProScan [26] was used to annotate the InterPro domains. Then the functional assignments of InterPro domains were mapped onto GO, and the GO classification and tree were produced with WEGO [27].
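The read-filtering step described above was implemented by the authors as a Perl program that is not shown in the paper. The following Python sketch is only one possible reading of the four stated criteria. It assumes Phred+33 quality encoding, interprets the Q20 criterion as trimming low-quality bases from the 3′ end followed by a mean-quality check, and uses an illustrative adapter prefix; the adapter sequence and all function names are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter prefix (assumption, not from the paper)
MIN_QUALITY = 20           # Q20 threshold stated in the text
MIN_LENGTH = 50            # minimum final read length (bp) stated in the text


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_quality=MIN_QUALITY):
    """Remove bases from the 3' end while their quality is below the threshold."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], quality[:end]


def filter_read(seq, quality, adapter=ADAPTER):
    """Return the (possibly trimmed) read, or None if the read is rejected."""
    if adapter in seq:                        # reads containing adapters
        return None
    seq, quality = trim_3prime(seq, quality)
    if len(seq) < MIN_LENGTH:                 # reads shorter than 50 bp after trimming
        return None
    if "N" in seq.upper():                    # reads with uncertain bases
        return None
    scores = phred_scores(quality)
    if sum(scores) / len(scores) < MIN_QUALITY:   # mean quality below Q20
        return None
    return seq, quality
```

Returning `None` for rejected reads keeps the four criteria visible as separate checks; a production filter would stream FASTQ records and handle paired reads together.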

3. Results

3.1. Sequence Assembly. The RNA extracted from all the samples was mixed for Illumina sequencing in order to capture the whole range of transcript diversity. In total, 58.88 million

[Figure 1: histogram titled "Unigene length distribution"; x-axis: length, in bins from 200–299 to 4800–4899; y-axis: number of unigenes, 0 to 4500.]

Figure 1: Length distribution of unigenes.

raw reads and 11.77 gigabase pairs were sequenced, with an average GC content of 43.9%, Q ≥ 20, and no ambiguous “N” (Table S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/782635). A total of 55432632 high-quality reads were assembled into 316703 contigs and 145857 transcripts, with an N50 value of 1636 bp. All the transcripts were subjected to cluster and assembly analyses, yielding 39625 unigenes with a mean length of 1378 bp and an N50 value of 1971 bp (Figure 1). The assembly statistics of contigs, transcripts, and unigenes are shown in Table S2.
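For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the usual definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The function names are hypothetical, and the paper does not state which tool computed its assembly statistics.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean sequence length."""
    return sum(lengths) / len(lengths)
```

Because N50 is weighted by sequence length, it is typically larger than the mean length, as it is for the unigenes reported here (N50 of 1971 bp against a mean of 1378 bp).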

3.2. Functional Annotation and Classification. In the present library, approximately 67.2% of the unigenes were annotated by BLASTX and BLASTN searches with a threshold of 10⁻⁵ against seven public databases: Nr, Nt, UniProt/Swiss-Prot, KEGG, COG, GO, and InterPro (Table 1).
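To illustrate how an E-value threshold turns raw BLAST hits into an annotation rate, the following sketch keeps the best hit per unigene at or below the cutoff. It assumes the standard 12-column tab-separated BLAST output (query ID in column 1, subject ID in column 2, E value in column 11); the paper does not state which output format was parsed, and the function names and sample identifiers are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Map each query ID to its lowest-E-value (subject, evalue) hit within the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def annotated_fraction(hits, total_unigenes):
    """Fraction of unigenes with at least one accepted hit."""
    return len(hits) / total_unigenes


if __name__ == "__main__":
    sample = (
        "unigene1\tsp|P1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
        "unigene1\tsp|P2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-12\t120\n"
        "unigene2\tsp|P3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30\n"
    )
    hits = best_hits(sample)
    print(hits)
    print(annotated_fraction(hits, 2))
```

Here `unigene2` is left unannotated because its only hit has an E value above the cutoff, so one of two unigenes counts as annotated.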

3.2.1. GO Annotation. A total of 41,832 unigenes with GO annotation were divided into three major categories: cellular location, molecular function, and biological process. Among the 34 functional groups with GO assignments, the dominant terms related to biochemistry, cell apoptosis, and metabolism (Figure 2).

4 International Journal of Genomics

Figure 2: Gene ontology classification assigned to the unigenes. Cellular location: cell, cell part, envelope, extracellular region, macromolecular complex, membrane-enclosed lumen, organelle, and organelle part. Molecular function: antioxidant activity, binding, catalytic activity, electron carrier activity, enzyme regulator activity, molecular transducer activity, nutrient reservoir activity, structural molecule activity, transcription regulator activity, translation regulator activity, and transporter activity. Biological process: anatomical structure formation, biological regulation, cellular component biogenesis, cellular component organization, cellular process, developmental process, establishment of localization, localization, metabolic process, multi-organism process, multicellular organismal process, pigmentation, reproduction, reproductive process, response to stimulus, and viral reproduction. Left y-axis: percentage of genes (logarithmic scale, 0.1 to 100); right y-axis: number of genes.

In cellular location, eight classifications were clustered by the matched unique sequences, and the largest subcategories were “cell” and “cell part.” In terms of molecular function, 11 classifications were categorized, the most represented being “binding” and “catalytic activity.” For biological process, 16 classifications were distinguished, the most represented being “cellular process” and “metabolic process.” However, few genes were assigned to the categories “nutrient reservoir activity” and “viral reproduction,” and no genes were found in the cluster “developmental process.”

3.2.2. COG Annotation. Unigenes were annotated against the COG classifications in order to further evaluate the effectiveness of the annotation process (Figure 3). The class definition, number, and percentage of each class are shown in Figure 3. The proteins in each COG category are assumed to share an ancestor protein and are therefore assumed to be paralogs or orthologs. The largest category was “general function prediction only” (24.84%), followed by “posttranslational modification, protein turnover, and chaperones” (8.97%) and “translation, ribosomal structure, and biogenesis” (7.24%). “Cell motility” (24 unigenes, 0.09%) and “nuclear structure” (10 unigenes, 0.04%) were the smallest categories.

3.2.3. KEGG Annotation. The KEGG database can be used to categorize gene functions with an emphasis on biochemical pathways and to systematically analyze intracellular metabolic pathways and the functions of gene products. Pathway-based analyses help to further determine the biological functions of genes. To gain further insights into the biological pathways in P. tenuifolia, we performed a BLASTX search of the assembled unigenes against the KEGG protein database. A total of 72,636 (62.9%) unigenes had BLAST hits, and 42,702 unigenes were assigned to five KEGG biochemical pathway categories (Table 2): metabolism (17,217 unigenes), genetic information processing (14,593 unigenes), environmental information processing (2719 unigenes), cellular processes (3919 unigenes), and human diseases (4254 unigenes). The pathways with the highest unigene representation were spliceosome (ko03041, 1431 unigenes), chromosome (ko03036, 1232 unigenes), and ubiquitin system (ko04121, 1060 unigenes).
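As a quick consistency check, the category totals quoted above can be recomputed from the per-pathway counts listed in Table 2. The sketch below simply hard-codes the Table 2 values (the amino acid metabolism count is the Table 2 figure, 2618); the variable names are illustrative.

```python
# Per-pathway unigene counts copied from Table 2, grouped by KEGG category.
table2_counts = {
    "Metabolism": [3702, 2618, 1781, 1762, 1610, 1390, 1055, 879, 716, 620, 562, 522],
    "Genetic information processing": [4312, 4029, 3191, 3061],
    "Environmental information processing": [2050, 342, 327],
    "Cellular processes": [1540, 1399, 507, 473],
    "Human diseases": [1398, 1371, 1146, 236, 103],
}

# Category totals as reported in the running text.
reported_totals = {
    "Metabolism": 17217,
    "Genetic information processing": 14593,
    "Environmental information processing": 2719,
    "Cellular processes": 3919,
    "Human diseases": 4254,
}

category_totals = {name: sum(counts) for name, counts in table2_counts.items()}
grand_total = sum(category_totals.values())

if __name__ == "__main__":
    for name, total in category_totals.items():
        status = "matches" if total == reported_totals[name] else "differs from"
        print(f"{name}: {total} ({status} reported {reported_totals[name]})")
    print("All categories:", grand_total)
```

Each recomputed category total agrees with the value in the text, and the five categories sum to the 42,702 unigenes reported as assigned to KEGG pathways.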

As a traditional Chinese medicinal plant, P. tenuifolia undergoes highly active metabolism. The high expression of metabolic genes is reflected in primary metabolite biosynthesis, such as “carbohydrate metabolism” (3702 unigenes), “amino acid metabolism” (2618 unigenes), and “energy metabolism” (1762 unigenes). However, most of the active ingredients of P. tenuifolia are secondary metabolites. Therefore, we listed the classifications of terpenoids, polyketides, and other secondary metabolites (Figure 4) with high expression levels.

Metabolic pathways were involved in the metabolism of terpenoids and polyketides (620 unigenes, Figure 4(a)), including “terpenoid backbone biosynthesis” (176 unigenes, 28%), “carotenoid biosynthesis” (78 unigenes, 13%), “zeatin biosynthesis” (44 unigenes, 7%), “limonene and pinene degradation” (46 unigenes, 7%), “diterpenoid biosynthesis” (30 unigenes, 5%), “tetracycline biosynthesis” (16 unigenes, 3%), “brassinosteroid biosynthesis” (13 unigenes, 2%), and “polyketide sugar unit biosynthesis” (5 unigenes, 1%). This large amount of transcriptomic information may enhance the study of terpenoid biosynthesis in P. tenuifolia.

Metabolic pathways were also involved in the biosynthesis of other secondary metabolites (522 unigenes, Figure 4(b)), including “phenylpropanoid biosynthesis” (202 unigenes, 39%), “flavonoid biosynthesis” (55 unigenes, 11%), “tropane, piperidine, and pyridine alkaloid biosynthesis” (53 unigenes, 9%), “streptomycin biosynthesis” (45 unigenes, 9%), “isoquinoline alkaloid biosynthesis” (44 unigenes, 8%), “stilbenoid, diarylheptanoid, and gingerol biosynthesis” (35 unigenes, 7%), “flavone and flavonol biosynthesis” (27 unigenes, 5%), and “butirosin and neomycin biosynthesis” (25 unigenes, 5%). In our sequence dataset, a total of 176 unigenes were found to be potentially related to terpenoid backbone biosynthesis, including the MEP and MVA pathways. In addition, 202 and 55 unigenes were related to the biosynthesis of phenylpropanoids and flavonoids, respectively.

Figure 3: COG classification assigned to the unigenes (class definition, number of unigenes in the class, and percentage of the class).

Class definition                                                 Number   Percent (%)
General function prediction only                                 6401     24.84
Posttranslational modification, protein turnover, chaperones     2310     8.97
Translation, ribosomal structure and biogenesis                  1865     7.24
Carbohydrate transport and metabolism                            1712     6.64
Amino acid transport and metabolism                              1427     5.54
Replication, recombination and repair                            1319     5.12
Energy production and conversion                                 1152     4.47
Signal transduction mechanisms                                   1110     4.31
Transcription                                                    1045     4.06
Lipid transport and metabolism                                   1021     3.96
Inorganic ion transport and metabolism                           821      3.19
Function unknown                                                 809      3.14
Coenzyme transport and metabolism                                728      2.83
Cell wall/membrane/envelope biogenesis                           721      2.80
Secondary metabolites biosynthesis, transport and catabolism     621      2.41
Nucleotide transport and metabolism                              588      2.28
Intracellular trafficking, secretion, and vesicular transport    530      2.06
Cytoskeleton                                                     392      1.52
Defense mechanism                                                344      1.34
Chromatin structure and dynamics                                 331      1.28
Cell cycle control, cell division, chromosome partitioning       273      1.06
RNA processing and modification                                  211      0.82
Cell motility                                                    24       0.09
Nuclear structure                                                10       0.04

(1) Characterization and Expression Analysis of the Genes Involved in the Biosynthetic Pathway of Putative Terpenoid Backbone. Dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP) are recognized as the precursors for the biosynthesis of terpenoid saponins in green plants and as the “active isoprene” compounds in vivo. DMAPP and IPP are allylic isomers, which are synthesized by the MVA and MEP biosynthesis pathways mentioned above [28]. The MVA pathway starts in the cytoplasm with acetyl-CoA, which is synthesized from derivatives of captured CO2, and the MEP pathway starts in the plastid with pyruvate and glyceraldehyde-3-phosphate [29]. In these databases, the biosynthetic pathways of MVA and MEP were identified (Figure 5(a)). In most cases, two or more unique sequences were annotated as the same enzyme; these unique sequences could represent a single gene or different fragments of a gene.

The MVA pathway proceeded as follows. First, two molecules of acetyl-CoA were converted into acetoacetyl-CoA by AACT (EC 2.3.1.9, 13 unigenes), which was then converted into 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by HMGS (EC 2.3.3.10, 18 unigenes). MVA was formed by HMGR (EC 1.1.1.34, 18 unigenes), and MVK (EC 2.7.1.36, 2 unigenes) then converted MVA into MVA-5-phosphate. Next, IPP was synthesized by PMK (EC 2.7.4.2, 4 unigenes) and PMD (EC 4.1.1.33, 3 unigenes). The MEP pathway proceeded as follows. First, pyruvate and glyceraldehyde-3-phosphate were converted to 1-deoxy-d-xylulose-5-phosphate (DOXP) by DXS (EC 2.2.1.7, 17 unigenes), which was then converted into MEP by DXR (EC 1.1.1.267, 7 unigenes). Next, MCT (EC 2.7.7.60, 7 unigenes) and CMK (EC 2.7.1.148, 4 unigenes) converted MEP to 4-diphosphocytidyl-2-C-methyl-d-erythritol-2-phosphate (CDP-MEP). Lastly, IPP was synthesized by MDS (EC 4.6.1.12, 1 unigene), HDS (EC 1.17.7.1, 9 unigenes), and HDR (EC 1.17.1.2, 12 unigenes).

The next step was the synthesis of squalene, in which a series of enzymes were involved, such as IPPI (EC 5.3.3.2, 15 unigenes) and SQS (EC 2.5.1.21, 7 unigenes). However, GPPS (EC 2.5.1.29) and FPPS (EC 2.5.1.10), which are involved in terpenoid biosynthesis, were not found in the transcriptome results, possibly because some assembled cDNAs were not full length. Squalene was converted into 2,3-oxidosqualene by SE (EC 1.14.99.7, 19 unigenes). In many plants, 2,3-oxidosqualene cyclization is a very important step in the biosynthesis of terpenes because it is a branch point for the synthesis of triterpenoid saponins, sterols, and other terpenes [30]. We found two types of OSC genes of P. tenuifolia in the dataset: cycloartenol synthase (16 unigenes) and beta-amyrin synthase (54 unigenes). Moreover, except for FPPS, another five nucleotide sequences (SQS, SE, CAS (2), cycloartenol synthase, and beta-amyrin synthase) related to triterpenoid saponin biosynthesis and reported in NCBI were found in our sequencing data.

Table 2: Pathway classification of P. tenuifolia.

Category                                Pathway                                         Count
Metabolism                              Carbohydrate metabolism                         3702
                                        Amino acid metabolism                           2618
                                        Enzyme families                                 1781
                                        Energy metabolism                               1762
                                        Lipid metabolism                                1610
                                        Glycan biosynthesis and metabolism              1390
                                        Nucleotide metabolism                           1055
                                        Metabolism of cofactors and vitamins            879
                                        Metabolism of other amino acids                 716
                                        Metabolism of terpenoids and polyketides        620
                                        Xenobiotics biodegradation and metabolism       562
                                        Biosynthesis of other secondary metabolites     522
Genetic information processing          Folding, sorting, and degradation               4312
                                        Translation                                     4029
                                        Transcription                                   3191
                                        Replication and repair                          3061
Environmental information processing    Signal transduction                             2050
                                        Membrane transport                              342
                                        Signaling molecules and interaction             327
Cellular processes                      Cell growth and death                           1540
                                        Transport and catabolism                        1399
                                        Cell motility                                   507
                                        Cell communication                              473
Human diseases                          Neurodegenerative diseases                      1398
                                        Infectious diseases                             1371
                                        Cancers                                         1146
                                        Immune system diseases                          236
                                        Cardiovascular diseases                         103
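The enzyme steps and unigene counts given in the text for the MVA, MEP, and squalene branches can be held in a small ordered data structure, which makes it easy to flag steps with no supporting unigenes. This is only an illustrative sketch: the counts are those stated in the running text (with GPPS and FPPS set to zero because the text reports them as not found), and the names `PATHWAYS`, `missing_steps`, and `supporting_unigenes` are hypothetical.

```python
# Ordered (enzyme, unigene count) steps as described in the running text.
PATHWAYS = {
    "MVA": [("AACT", 13), ("HMGS", 18), ("HMGR", 18), ("MVK", 2), ("PMK", 4), ("PMD", 3)],
    "MEP": [("DXS", 17), ("DXR", 7), ("MCT", 7), ("CMK", 4), ("MDS", 1), ("HDS", 9), ("HDR", 12)],
    # GPPS and FPPS are reported as not found, hence zero unigenes.
    "Squalene": [("IPPI", 15), ("GPPS", 0), ("FPPS", 0), ("SQS", 7), ("SE", 19)],
}


def missing_steps(steps):
    """Return the enzymes in a pathway that have no supporting unigene."""
    return [enzyme for enzyme, count in steps if count == 0]


def supporting_unigenes(steps):
    """Total number of unigenes annotated to the enzymes of a pathway."""
    return sum(count for _, count in steps)


if __name__ == "__main__":
    for name, steps in PATHWAYS.items():
        print(name, supporting_unigenes(steps), missing_steps(steps))
```

Keeping the steps as ordered lists rather than a plain dictionary of counts preserves the reaction order, so a gap in the pathway is visible at the position where it occurs.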

(2) Characterization and Expression Analysis of the Genes Involved in the Putative Phenylpropanoid Biosynthesis Pathway. Approximately 814 unigenes encoding 16 enzymes were found in the phenylpropanoid biosynthesis pathway (Figure 5(b)). However, flavonoids and lignins are not the main phenylpropanoid compounds and have rarely been mentioned by researchers in P. tenuifolia. The phenylpropanoid biosynthesis pathway starts with the conversion of phenylalanine to cinnamate by PAL (EC 4.3.1.24, 11 unigenes); cinnamate is then converted to p-coumaroyl CoA by the enzymes C4H (EC 1.14.13.11, 10 unigenes) and 4CL (EC 6.2.1.12, 25 unigenes).

The flavonoid pathway began with the synthesis of chalcone, which was catalyzed by CHS (EC 2.3.1.74, 1 unigene); chalcone was then converted to naringenin by CHI (EC 5.5.1.6, 3 unigenes). Dihydrotricetin was synthesized by F3′5′H (EC 1.14.13.88, 11 unigenes). F3H (EC 1.14.11.9, 60 unigenes) catalyzed the conversion of these flavanones to dihydroflavonols. In the last step, dihydroflavonols were converted to flavonols by FLS (EC 1.14.11.23, 4 unigenes).

The lignin pathway began with the synthesis of p-coumaroyl shikimate, which was catalyzed by HCT (EC 2.3.1.133, 4 unigenes). Then C3′H (EC 1.14.13.36) catalyzed the conversion of p-coumaroyl shikimate into caffeoyl shikimate. Caffeoyl shikimate was converted by hydroxycinnamoyl-transferase to produce caffeoyl CoA. CCoAoMT (EC 2.1.1.104) could convert caffeoyl CoA to feruloyl CoA. Then CCR (EC 1.2.1.44, 6 unigenes) catalyzed the conversion of feruloyl CoA into coniferaldehyde. Finally, coniferaldehyde was converted by CAD (EC 1.1.1.195, 12 unigenes) to produce coniferyl alcohol. As with some enzymes of terpenoid backbone biosynthesis, C3′H was not found in the transcriptome results.

Figure 4: Classifications of unigenes involved in the metabolism of terpenoids and polyketides (a) and biosynthesis of other secondary metabolites (b). Categories in (a): biosynthesis of ansamycins [PATH: ko01051]; biosynthesis of siderophore group nonribosomal peptides [PATH: ko01053]; brassinosteroid biosynthesis [PATH: ko00905]; carotenoid biosynthesis [PATH: ko00906]; diterpenoid biosynthesis [PATH: ko00904]; geraniol degradation [PATH: ko00281]; limonene and pinene degradation [PATH: ko00903]; polyketide sugar unit biosynthesis [PATH: ko00523]; prenyltransferases [BR: ko01006]; terpenoid backbone biosynthesis [PATH: ko00900]; tetracycline biosynthesis [PATH: ko00253]; zeatin biosynthesis [PATH: ko00908]. Categories in (b): butirosin and neomycin biosynthesis [PATH: ko00524]; caffeine metabolism [PATH: ko00232]; flavone and flavonol biosynthesis [PATH: ko00944]; flavonoid biosynthesis [PATH: ko00941]; isoquinoline alkaloid biosynthesis [PATH: ko00950]; novobiocin biosynthesis [PATH: ko00401]; phenylpropanoid biosynthesis [PATH: ko00940]; stilbenoid, diarylheptanoid, and gingerol biosynthesis [PATH: ko00945]; streptomycin biosynthesis [PATH: ko00521]; tropane, piperidine, and pyridine alkaloid biosynthesis [PATH: ko00960].

Table 3: Frequency of identified SSR motifs in P. tenuifolia.

Motif length        Repeat numbers                                        Total    %
                    5       6       7       8       9       10      >10
Mononucleotide      -       -       -       -       -       1554    1890    3444     51.75
Dinucleotide        -       638     374     333     333     317     125     2120     31.85
Trinucleotide       646     218     117     10      1       2       2       996      14.97
Tetranucleotide     64      12      0       0       0       0       0       76       1.14
Pentanucleotide     2       1       0       0       0       0       0       3        0.045
Hexanucleotide      12      2       2       0       0       0       0       16       0.24
Total               724     871     493     343     334     1873    2017    6655
%                   10.87   13.09   7.41    5.15    5.02    4.79    1.91

3.3. Simple Sequence Repeat (SSR) Marker Discovery. A total of 6655 SSRs were identified in 5784 of the 39,625 unigene sequences, and 755 of these sequences contained more than one SSR. Mononucleotide motifs accounted for 51.75% of the SSRs, dinucleotide motifs for 31.85%, and trinucleotide motifs for 14.97% (Table 3). AT (1539) was the most abundant repeat type, followed by ATGA (207) and GAATCA (69). However, tetranucleotide, pentanucleotide, and hexanucleotide motifs each accounted for <2% of the SSRs in the unigene sequences. In this transcriptome database, not all unigenes contained SSRs, and complex SSR motifs were not detected.
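To make the SSR survey concrete, the sketch below scans a sequence for perfect repeats of 1 to 6 bp motifs using regular expressions. The section does not name the tool or the thresholds the authors used; the minimum repeat numbers here (10 for mononucleotides, 6 for dinucleotides, and 5 for longer motifs) are an assumption inferred from the smallest repeat numbers populated in Table 3, and `find_ssrs` is a hypothetical helper, not the authors' method.

```python
import re

# Assumed minimum repeat counts per motif length (inferred from Table 3).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    size = len(motif)
    return any(
        size % k == 0 and motif == motif[:k] * (size // k)
        for k in range(1, size)
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    found = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # Group 2 is the motif; the backreference requires min_rep - 1 further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(2)
            if is_compound(motif):
                continue  # already reported under the shorter motif
            repeats = len(match.group(1)) // motif_len
            found.append((match.start(), motif, repeats))
    return sorted(found)


if __name__ == "__main__":
    toy = "GGC" + "A" * 12 + "CG" + "AT" * 7 + "C" + "GAA" * 5 + "T"
    for start, motif, repeats in find_ssrs(toy):
        print(start, motif, repeats)
```

On the toy sequence this reports a 12-fold A repeat, a 7-fold AT repeat, and a 5-fold GAA repeat; tallying such hits by motif length and repeat number would give a table with the same layout as Table 3.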

4. Discussions

The advantages of high throughput, accuracy, and reproducibility have made transcriptome sequencing a powerful technology [31]. Illumina HiSeq 2000 sequencing has recently been widely used for deep sequencing of model and nonmodel plants, with the biggest output and lowest reagent cost. In this study, the application of Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of the secondary metabolite biosynthetic pathways of P. tenuifolia.

Figure 5: Schematic of the putative biosynthetic pathways of two major classes of active compounds in P. tenuifolia. Saponin biosynthesis pathway (a) and phenylpropanoid biosynthesis pathway (b). Panel (a) shows the MVA pathway (cytosol) from acetyl CoA and the MEP/DOXP pathway (plastid) from GA-3P + pyruvate, converging through IPP/DMAPP, GPP, FPP, squalene, and 2,3-oxidosqualene on β-amyrin and cycloartenol and, via CYPs and UGTs, on triterpenoid and steroidal saponins; enzyme labels (unigene counts): AACT(13), HMGS(18), HMGR(18), MVK(2), PMK(4), PMD(3), DXS(17), DXR(7), MCT(7), CMK(4), MDS(1), HDS(9), HDR(12), IPPI(15), GPPS(5), FPPS, SQS(7), SE(19), β-AS(9), and CAS(4). Panel (b) shows the route from phenylalanine through cinnamate, p-coumarate, and p-coumaroyl CoA to the flavonoid branch (chalcone, naringenin, flavanones, dihydroflavonols, flavonol) and the lignin branch (p-coumaroyl shikimate, caffeoyl shikimate, caffeoyl CoA, feruloyl CoA, coniferaldehyde, coniferyl alcohol, lignin); enzyme labels (unigene counts): PAL(11), C4H(10), 4CL(25), CHS(1), CHI(3), F3′5′H(11), F3H(60), FLS(4), HCT(4), C3′H, CCoAoMT(13), CCR(6), CAD(12), and UGT. Enzyme abbreviations: AACT: acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase; HMGS: 3-hydroxy-3-methylglutaryl CoA synthase; HMGR: 3-hydroxy-3-methylglutaryl CoA reductase; MVK: mevalonate kinase; PMK: phosphomevalonate kinase; PMD: mevalonate pyrophosphate decarboxylase; DXS: 1-deoxy-d-xylulose-5-phosphate synthase; DXR: 1-deoxy-d-xylulose-5-phosphate reductoisomerase; MCT: 2-C-methyl-erythritol 4-phosphate cytidylyltransferase; CMK: 4-(cytidine 5′-diphospho)-2-C-methyl-d-erythritol kinase; MDS: 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase; HDS: 1-hydroxy-2-methyl-butenyl-4-diphosphate synthase; HDR: isopentenyl pyrophosphate (IPP)/3,3-dimethylallyl pyrophosphate (DMAPP) synthase; IPPI: isopentenyl pyrophosphate isomerase; GPPS: geranyl diphosphate synthase; FPPS: farnesyl diphosphate synthase; SQS: squalene synthase; SE: squalene epoxidase; CAS: cycloartenol synthase; β-AS: β-amyrin synthase; DS: dammarenediol synthase; PAL: phenylalanine ammonia lyase; C4H: cinnamate 4-hydroxylase; 4CL: 4-coumarate CoA ligase; CHS: chalcone synthase; CHI: chalcone isomerase; F3′5′H: flavonoid 3′,5′-hydroxylase; F3H: flavanone 3-hydroxylase; FLS: flavonol synthase; HCT: hydroxycinnamoyl-transferase; C3′H: p-coumaroyl shikimate 3′-hydroxylase; CCoAoMT: caffeoyl CoA 3-O-methyltransferase; CCR: cinnamoyl-CoA reductase; CAD: cinnamyl alcohol dehydrogenase; CYPs: cytochrome P450s; UGTs: glycosyltransferases.

Triterpenoid saponins, especially pentacyclic triterpenoid saponins, are among the main active ingredients of P. tenuifolia, and their biosynthetic pathways are becoming more and more important. The upstream and midstream biosynthetic pathways of triterpenoid saponins have been studied very clearly in the past few years. Many researchers now mainly focus on the downstream biosynthetic pathways, which are also the difficult part of these studies [17]. In the downstream biosynthesis of natural plant products, hydroxylation by CYP450s and glycosylation by UGTs play important roles in stabilizing the product and altering the bioactivity of triterpenoid saponins with dammarane-type and oleanane-type aglycones [32]. In this study, 466 and 157 unigenes were annotated as CYP450s and UGTs, respectively, in the transcriptome of P. tenuifolia (data not shown). However, because of the quantity and complexity of CYP450s and UGTs, the enzymes of the downstream biosynthetic pathway of P. tenuifolia, which can determine the specific steps in the accumulation of triterpenoid saponins, are still unknown.

CYP450s are a family of enzymes involved in the biosynthesis of lignins, terpenoids, sterols, fatty acids, hormones, pigments, and defense-related phytoalexins [33]. Only two CYP450s involved in triterpene saponin biosynthesis have been functionally characterized until now: CYP88D6 from Glycyrrhiza uralensis, belonging to the CYP85 family [34], and CYP93E1 from Glycine max, belonging to the CYP71 family [35]. Recently, several CYP450s and UGTs were identified as candidate genes in many studies of medicinal and nonmedicinal plants. One CYP450 (contig00248) was selected as the candidate gene most likely to be involved in ginsenoside biosynthesis in Panax quinquefolius on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Moreover, Pn02132 and Pn00158, closely related to CYP88D6 and CYP93E1, were selected as candidate CYP450s involved in triterpene saponin biosynthesis in P. notoginseng [5]. Furthermore, ten unigene sequences corresponding to seven different genes with high homology to CYP79s and four unigenes corresponding to two CYP83 genes were identified in the core structure biosynthesis of the glucosinolate metabolism of radish [36]. Our previous study suggested that metabolomics based on ultraperformance liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry, combined with gene expression analysis, can effectively elucidate the mechanism of triterpenoid saponin biosynthesis; three CYP450 genes (CYP88D6, CYP716B1, and CYP72A1) putatively expressed in the triterpenoid saponin biosynthesis pathway in P. tenuifolia were identified by quantitative real-time PCR (qRT-PCR) analysis [2]. The examples mentioned above show that most identified CYP450s belong to the CYP71 and CYP88 families and that few studies have documented the other CYP450 families until now.

UGTs are another large multigene family in plants and play an important role in the last step of triterpenoid saponin biosynthesis. Glycosylation, the transfer of activated saccharides to an aglycone substrate, is the predominant modification in triterpenoid saponin biosynthesis. As with CYP450 genes, some UGT genes have also been identified. The genes UGT74M1 and UGT73K1 from Saponaria vaccaria [37] and UGT71G1 from Medicago truncatula [38] have been previously identified. Flavonoid-related UGTs (UGT78D1, UGT78D2, UGT73C6, and UGT89C1) were also identified in previous studies [39, 40]. Furthermore, four UGTs (contig01001, contig14976, contig15451, and contig16321) were selected as the candidate genes in P. quinquefolius most likely to be involved in ginsenoside biosynthesis on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Pn13895, closely related to UGT71G1, was regarded as a lead candidate UGT responsible for triterpene biosynthesis in P. notoginseng [5]. Six putative flavonol glycosides were identified in Isatis indigotica [41]. Thirteen unigenes were identified as UGT74s, including UGT74B1, UGT74C1, UGT74F1, and UGT74F2, in the core structure biosynthesis of the glucosinolate metabolism of radish [36]. In our previous study mentioned above [2], three UGT genes (UGT74B1, UGT73B2, and UGT73C6) putatively expressed in triterpenoid saponin biosynthetic pathways were also identified by qRT-PCR in P. tenuifolia. The examples mentioned above show that most identified UGTs belong to the UGT74 and UGT73 families and that few studies have documented the other UGT families until now.

In other words, the genes identified so far among CYP450s and UGTs add little information to the knowledge of downstream triterpenoid saponin biosynthesis, and thus we will continue with our in-depth studies. The large number of CYP450 and UGT candidates could provide not only a potential gene pool for the identification of specific CYP450s and UGTs involved in triterpenoid biosynthesis in P. tenuifolia but also a convenient means to characterize their roles in triterpenoid biosynthesis in future studies.

5. Conclusion

RP, one of the predominant Chinese medicinal plants, contains various medicinal ingredients that elicit tonic, sedative, antipsychotic, expectorant, and other pharmacological effects. In this study, a large-scale unigene investigation of P. tenuifolia was performed by Illumina sequencing. Our results showed that many transcripts encoded by putative genes are involved in the biosynthesis of triterpene saponins and phenylpropanoids. The data we obtained represent the most abundant genomic resource and provide comprehensive information on gene discovery, transcriptome profiling, transcriptional regulation, and molecular markers of P. tenuifolia. This study will help improve the production of active compounds through marker-assisted breeding or genetic engineering for P. tenuifolia as well as other medicinal plants in the Polygalaceae family. However, although many candidate CYP450s and UGTs are likely involved downstream in the biosynthetic pathway of triterpene saponins, these candidates should be further investigated.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors’ Contribution

Hongling Tian and Xiaoshuang Xu equally contributed to this work.

Acknowledgments

This study was supported by the National Natural Sciences Foundation of China (Grant no. 31100244), the Shanxi Science and Technology Development Program (Grant no. 20140313010-1), the National Science-technology Support Plan Projects (Grant no. 2011BA107B05), the Technological Innovation Projects in Shanxi (Grant no. 2013007), and the Construction Plan for Basic Condition Platform of Shanxi (Grant no. 2014091022).

References

[1] Y. Yao, M. Jia, J.-G. Wu et al., “Anxiolytic and sedative-hypnotic activities of polygalasaponins from Polygala tenuifolia in mice,” Pharmaceutical Biology, vol. 48, no. 7, pp. 801–807, 2010.

[2] F. S. Zhang, X. W. Li, Z. Y. Li et al., “UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia,” PLoS ONE, vol. 9, no. 8, Article ID e105765, 2014.

[3] H. Seki, S. Sawai, K. Ohyama et al., “Triterpene functional genomics in licorice for identification of CYP72A154 involved in the biosynthesis of glycyrrhizin,” The Plant Cell, vol. 23, no. 11, pp. 4112–4123, 2011.

[4] S. Chen, H. Luo, Y. Li et al., “454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng,” Plant Cell Reports, vol. 30, no. 9, pp. 1593–1601, 2011.

[5] H. Luo, C. Sun, Y. Sun et al., “Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers,” BMC Genomics, vol. 12, no. 5, article S5, 2011.

[6] C. Sun, Y. Li, Q. Wu et al., “De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis,” BMC Genomics, vol. 11, article 262, 2010.

[7] W. Gao, H.-X. Sun, H. Xiao et al., “Combining metabolomics and transcriptomics to characterize tanshinone biosynthesis in Salvia miltiorrhiza,” BMC Genomics, vol. 15, article 73, 2014.

[8] L. Liu, Y. Li, S. Li et al., “Comparison of next-generation sequencing systems,” Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 251364, 11 pages, 2012.

[9] Y. Yang, M. Xu, Q. Luo, J. Wang, and H. Li, “De novo transcriptome analysis of Liriodendron chinense petals and leaves by Illumina sequencing,” Gene, vol. 534, no. 2, pp. 155–162, 2014.

[10] S. Fowler and M. F. Thomashow, “Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway,” Plant Cell, vol. 14, no. 8, pp. 1675–1690, 2002.

[11] R. Zhai, Y. Feng, H. Wang et al., “Transcriptome analysis of rice root heterosis by RNA-Seq,” BMC Genomics, vol. 14, no. 1, article 19, 2013.

[12] I. Zouari, A. Salvioli, M. Chialva et al., “From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism,” BMC Genomics, vol. 15, article 221, 2014.

[13] S. Kaur, L. W. Pembleton, N. O. I. Cogan et al., “Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers,” BMC Genomics, vol. 13, no. 1, article 104, 2012.

[14] U. Chandrasekaran, W. Xu, and A. Liu, “Transcriptome profiling identifies ABA mediated regulatory changes towards storage filling in developing seeds of castor bean (Ricinus communis L.),” Cell & Bioscience, vol. 4, no. 1, article 33, 2014.

[15] T. Zhang, X. Zhao, W. Wang et al., “Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum,” Plant Molecular Biology, vol. 84, no. 3, pp. 315–327, 2014.

[16] Y. Ikeya, S. Takeda, M. Tunakawa et al., “Cognitive improving and cerebral protective effects of acylated oligosaccharides in Polygala tenuifolia,” Biological and Pharmaceutical Bulletin, vol. 27, no. 7, pp. 1081–1085, 2004.

[17] J. M. Augustin, V. Kuzina, S. B. Andersen, and S. Bak, “Molecular activities, biosynthesis and evolution of triterpenoid saponins,” Phytochemistry, vol. 72, no. 6, pp. 435–457, 2011.

[18] Y. Wang, X. Wang, T.-H. Lee, S. Mansoor, and A. H. Paterson, “Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice),” New Phytologist, vol. 198, no. 1, pp. 274–283, 2013.

[19] V. Tzin and G. Galili, “The biosynthetic pathways for shikimate and aromatic amino acids in Arabidopsis thaliana,” The Arabidopsis Book, vol. 8, Article ID e0132, 2010.

[20] W. Boerjan, J. Ralph, and M. Baucher, “Lignin biosynthesis,” Annual Review of Plant Biology, vol. 54, pp. 519–546, 2003.

[21] J. C. D’Auria and J. Gershenzon, “The secondary metabolism of Arabidopsis thaliana: growing like a weed,” Current Opinion in Plant Biology, vol. 8, no. 3, pp. 308–316, 2005.

[22] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.

[23] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, “A genomic perspective on protein families,” Science, vol. 278, no. 5338, pp. 631–637, 1997.

[24] R. L. Tatusov, N. D. Fedorova, J. D. Jackson et al., “The COG database: an updated version includes eukaryotes,” BMC Bioinformatics, vol. 4, article 41, 2003.

[25] M. Kanehisa, S. Goto, M. Hattori et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, pp. D354–D357, 2006.

[26] E. M. Zdobnov and R. Apweiler, “InterProScan – an integration platform for the signature-recognition methods in InterPro,” Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001.

[27] M. L. Wise and R. Croteau, “Monoterpene biosynthesis,” in Comprehensive Natural Products Chemistry, p. 2, Elsevier, 1998.

[28] C. A. Schuhr, T. Radykewicz, S. Sagner et al., “Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy,” Phytochemistry Reviews, vol. 2, no. 1-2, pp. 3–16, 2003.

[29] W. Eisenreich, M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Zenk, and A. Bacher, “The deoxyxylulose phosphate pathway of terpenoid biosynthesis in plants and microorganisms,” Chemistry and Biology, vol. 5, no. 9, pp. R221–R233, 1998.

[30] J. D. Park, D. K. Rhee, and Y. H. Lee, “Biological activities and chemistry of saponins from Panax ginseng C. A. Meyer,” Phytochemistry Reviews, vol. 4, no. 2-3, pp. 159–175, 2005.

[31] M. Rohmer, “Mevalonate-independent methylerythritol phosphate pathway for isoprenoid biosynthesis: elucidation and distribution,” Pure and Applied Chemistry, vol. 75, no. 2-3, pp. 375–387, 2003.

[32] S. Kalra, B. L. Puniya, D. Kulshreshtha et al., “De novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum,” PLoS ONE, vol. 8, no. 12, Article ID e83336, 2013.

[33] A. H. Meijer, E. Souer, R. Verpoorte, and J. H. C. Hoge, “Isolation of cytochrome P450 cDNA clones from the higher plant Catharanthus roseus by a PCR strategy,” Plant Molecular Biology, vol. 22, no. 2, pp. 379–383, 1993.

[34] H. Seki, K. Ohyama, S. Sawai et al., “Licorice beta-amyrin 11-oxidase, a cytochrome P450 with a key role in the biosynthesis of the triterpene sweetener glycyrrhizin,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 37, pp. 14204–14209, 2008.

[35] M. Shibuya, M. Hoshino, Y. Katsube, H. Hayashi, T. Kushiro, and Y. Ebizuka, “Identification of β-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay,” FEBS Journal, vol. 273, no. 5, pp. 948–959, 2006.

[36] Y. Wang, Y. Pan, Z. Liu et al., “De novo transcriptome sequencing of radish (Raphanus sativus L.) and analysis of major genes involved in glucosinolate metabolism,” BMC Genomics, vol. 14, article 836, 2013.

[37] D. Meesapyodsuk, J. Balsevich, D. W. Reed, and P. S. Covello, “Saponin biosynthesis in Saponaria vaccaria: cDNAs encoding β-amyrin synthase and a triterpene carboxylic acid glucosyltransferase,” Plant Physiology, vol. 143, no. 2, pp. 959–969, 2007.

[38] L. Achnine, D. V. Huhman, M. A. Farag, L. W. Sumner, J. W. Blount, and R. A. Dixon, “Genomics-based selection and functional characterization of triterpene glycosyltransferases from the model legume Medicago truncatula,” Plant Journal, vol. 41, no. 6, pp. 875–887, 2005.

[39] K. Yonekura-Sakakibara, T. Tohge, F. Matsuda et al., “Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis,” Plant Cell, vol. 20, no. 8, pp. 2160–2176, 2008.

[40] R. Yin, B. Messner, T. Faus-Kessler et al., “Feedback inhibition of the general phenylpropanoid and flavonol biosynthetic pathways upon a compromised flavonol-3-O-glycosylation,” Journal of Experimental Botany, vol. 63, no. 7, pp. 2465–2478, 2012.

[41] J. Chen, X. Dong, Q. Li et al., “Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling,” BMC Genomics, vol. 14, article 857, 2013.


Table 1: Functional annotation of P. tenuifolia.

Database         Number annotated   Percent (%)   E cutoff   Database version
Total unigenes   115477
Nt               69211              59.9          1.00e-05   201301
Nr               77599              67.2          1.00e-05   201301
Swissprot        54582              47.3          1.00e-10   201301
COG              30548              26.5          1.00e-10   No version
KEGG             72636              62.9          1.00e-10   Release 58
Interpro         49873              43.19                    InterProScan 4.8, v36
GO               41823              36.22
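The percentages in Table 1 follow directly from the counts. The short Python sketch below recomputes them; the counts are transcribed from Table 1, while the function and variable names are illustrative and not part of the original analysis.

```python
# Counts transcribed from Table 1; names are illustrative, not from the study.
TOTAL_UNIGENES = 115_477
ANNOTATED = {
    "Nt": 69_211,
    "Nr": 77_599,
    "Swiss-Prot": 54_582,
    "COG": 30_548,
    "KEGG": 72_636,
    "InterPro": 49_873,
    "GO": 41_823,
}


def percent_annotated(count, total=TOTAL_UNIGENES):
    """Return the share of unigenes with a database hit, in percent."""
    return 100.0 * count / total


if __name__ == "__main__":
    for database, count in ANNOTATED.items():
        print(f"{database:<10} {count:>6} {percent_annotated(count):6.2f}%")
```

Running the script prints one line per database, so a transcription slip in either the count or the percentage column becomes easy to spot.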

were conducted using an Agilent 2100 Bioanalyzer (Agilent High Sensitivity DNA Kit, Agilent 5067-4626). After homogenization, the samples were mixed to 10 nM and then diluted gradually and quantitated to 4–5 pM for Illumina sequencing to construct the multiplexed DNA libraries. Illumina sequencing was conducted with an Illumina HiSeq 2000 at CapitalBio Ltd., Beijing, China.

2.3. De Novo Assembly and Functional Annotation. The raw sequence data were passed through quality analysis; after removal of low-quality reads and subsequences, the remaining sequences were available for subsequent analysis. A Perl program was written to remove reads with adapters, reads whose average quality from the 3′ end to the 5′ end was below Q20, reads with a final length of less than 50 bp, and reads with uncertain bases. High-quality reads were used for de novo assembly by the Trinity software to construct unique consensus sequences. Trimmed Solexa transcriptome reads were mapped onto the unique consensus sequences by using Bowtie [22] (Bowtie parameters: -v 3 --all --best --strata). For functional annotation, the unigenes were compared with the National Center for Biotechnology Information (NCBI) nonredundant protein database (Nr), the nonredundant nucleotide database (Nt), UniProt/Swiss-Prot, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG), Gene Ontology (GO), and InterPro databases with the Basic Local Alignment Search Tool (BLAST, http://www.ncbi.nlm.nih.gov). Unigenes were identified by comparing sequence similarity against SWISS-PROT (downloaded from the European Bioinformatics Institute, ftp://ftp.ebi.ac.uk/pub/databases/swissprot), COG [23, 24], and the KEGG database [25] with BLAST at E-values ≤ 1e-10. The KO information was retrieved from the BLAST results by a Perl script, and the pathways between databases and unigenes were established. InterProScan [26] was used to annotate the InterPro domains. Then the functional assignments of InterPro domains were mapped onto GO, and the GO classification and tree were generated by WEGO [27].
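The read-filtering criteria described above can be sketched in a few lines of Python. This is not the authors' Perl program: it is a minimal illustration that assumes Phred+33 quality encoding, applies the Q20 threshold to the mean quality of the whole read, and leaves adapter removal out entirely. The function names are hypothetical.

```python
def mean_quality(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 is an assumption)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(sequence, quality, min_mean_q=20, min_length=50):
    """Return True if a read passes the length, ambiguity, and quality filters."""
    if len(sequence) < min_length:
        return False  # final length below 50 bp
    if "N" in sequence.upper():
        return False  # read contains an uncertain base
    return mean_quality(quality) >= min_mean_q  # average quality of at least Q20
```

A read of 60 unambiguous bases with quality character "I" (Q40 under Phred+33) is kept, whereas the same read with quality character "+" (Q10) is discarded.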

3. Results

3.1. Sequence Assembly. The RNA extracted from all the samples was mixed for Illumina sequencing in order to capture the whole range of transcript diversity. In total, 58.88 million raw reads and 11.77 gigabase pairs were sequenced, with an average GC content of 43.9%, Q ≥ 20, and no ambiguous "N" (Table S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/782635). A total of 55,432,632 high-quality reads were assembled into 316,703 contigs and 145,857 transcripts, with an N50 value of 1636 bp. All the transcripts were subjected to cluster and assembly analyses, yielding 39,625 unigenes with a mean length of 1378 bp and an N50 value of 1971 bp (Figure 1). The assembly statistics of contigs, transcripts, and unigenes are shown in Table S2.

Figure 1: Length distribution of unigenes (x-axis: length, in bins from 200–299 to 4800–4899; y-axis: number of unigenes).
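For readers unfamiliar with the N50 statistic reported for the assembly, the following Python sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function name and the toy input are illustrative; the study's own assembly statistics came from its pipeline, not from this code.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")
```

For the toy lengths 2, 2, 2, 3, 3, 4, 8, and 8 (32 bases in total), the two longest sequences already cover 16 bases, so the N50 is 8.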

3.2. Functional Annotation and Classification. In the present library, approximately 67.2% of the unigenes were annotated by BLASTX and BLASTN searches with a threshold of 10^-5 against seven public databases, namely, the Nr, Nt, UniProt/Swiss-Prot, KEGG, COG, GO, and InterPro databases (Table 1).

3.2.1. GO Annotation. A total of 41,832 unigenes with GO annotation were divided into three major categories: cellular location, molecular function, and biological process. Among the 34 functional groups with GO assignments, terms related to biochemistry, cell apoptosis, and metabolism were dominant (Figure 2).


Figure 2: Gene ontology classification assigned to the unigenes (left axis: percentage of genes; right axis: number of genes). Cellular location: cell, cell part, envelope, extracellular region, macromolecular complex, membrane-enclosed lumen, organelle, and organelle part. Molecular function: antioxidant activity, binding, catalytic activity, electron carrier activity, enzyme regulator activity, molecular transducer activity, nutrient reservoir activity, structural molecule activity, transcription regulator activity, translation regulator activity, and transporter activity. Biological process: anatomical structure formation, biological regulation, cellular component biogenesis, cellular component organization, cellular process, developmental process, establishment of localization, localization, metabolic process, multi-organism process, multicellular organismal process, pigmentation, reproduction, reproductive process, response to stimulus, and viral reproduction.

In cellular location, the matched unique sequences were clustered into eight classifications, and the largest subcategories were "cell" and "cell part." In molecular function, 11 classifications were categorized, with "binding" and "catalytic activity" being the most represented. In biological process, 16 classifications were distinguished, with "cellular process" and "metabolic process" being the most represented. However, few genes were assigned to the categories "nutrient reservoir activity" and "viral reproduction," and no genes were found in the cluster "developmental process."

3.2.2. COG Annotation. Unigenes were also annotated with COG classifications in order to further evaluate the effectiveness of the annotation process (Figure 3). The class definition, number, and percentage of each class are shown in Figure 3. The proteins in a COG category are assumed to share the same ancestor protein and are therefore assumed to be paralogs or orthologs. The largest category was "general function prediction only" (24.84%), the second was "posttranslational modification, protein turnover, and chaperones" (8.97%), and the third was "translation, ribosomal structure, and biogenesis" (7.24%). "Cell motility" (24 unigenes, 0.09%) and "nuclear structure" (10 unigenes, 0.04%) were the smallest categories.

3.2.3. KEGG Annotation. The KEGG database can be used to categorize gene functions with an emphasis on biochemical pathways. It can also be used to systematically analyze intracellular metabolic pathways and the functions of gene products. Pathway-based analyses help further determine the biological functions of genes. To gain further insights into the biological pathways in P. tenuifolia, we performed a BLASTX search of the assembled unigenes against the KEGG protein database. A total of 72,636 (62.9%) unigenes had BLAST hits, and 42,702 unigenes were assigned to five KEGG biochemical pathway categories (Table 2): metabolism (17,217 unigenes), genetic information processing (14,593 unigenes), environmental information processing (2719 unigenes), cellular processes (3919 unigenes), and human diseases (4254 unigenes). The pathways with the highest unigene representation were spliceosome (ko03041; 1431 unigenes), chromosome (ko03036; 1232 unigenes), and ubiquitin system (ko04121; 1060 unigenes).
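The five category totals quoted above can be cross-checked against the per-pathway counts listed in Table 2. The sketch below does this in Python; the counts are transcribed from Table 2 in the order given there, and the variable names are illustrative.

```python
# Per-pathway unigene counts transcribed from Table 2, grouped by category.
TABLE_2 = {
    "Metabolism": [3702, 2618, 1781, 1762, 1610, 1390,
                   1055, 879, 716, 620, 562, 522],
    "Genetic information processing": [4312, 4029, 3191, 3061],
    "Environmental information processing": [2050, 342, 327],
    "Cellular processes": [1540, 1399, 507, 473],
    "Human diseases": [1398, 1371, 1146, 236, 103],
}

# Sum the pathway counts within each category, then across categories.
category_totals = {name: sum(counts) for name, counts in TABLE_2.items()}
grand_total = sum(category_totals.values())

if __name__ == "__main__":
    for name, total in category_totals.items():
        print(f"{name:<40} {total:>6}")
    print(f"{'All categories':<40} {grand_total:>6}")
```

The sums reproduce the totals stated in the text (17,217; 14,593; 2719; 3919; 4254) and the overall figure of 42,702 assigned unigenes.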

As a traditional Chinese medicinal plant, P. tenuifolia undergoes highly active metabolism. The high representation of metabolism reflects primary metabolite biosynthesis, such as "carbohydrate metabolism" (3702 unigenes), "amino acid metabolism" (2618 unigenes), and "energy metabolism" (1762 unigenes). However, most active ingredients of P. tenuifolia are secondary metabolites. Therefore, we listed the classifications of terpenoids, polyketides, and other secondary metabolites (Figure 4) with high expression levels.

Metabolic pathways involved in the metabolism of terpenoids and polyketides (620 unigenes; Figure 4(a)) included "terpenoid backbone biosynthesis" (176 unigenes, 28%), "carotenoid biosynthesis" (78 unigenes, 13%), "zeatin biosynthesis" (44 unigenes, 7%), "limonene and pinene degradation" (46 unigenes, 7%), "diterpenoid biosynthesis" (30 unigenes, 5%), "tetracycline biosynthesis" (16 unigenes, 3%), "brassinosteroid biosynthesis" (13 unigenes, 2%), and "polyketide sugar unit biosynthesis" (5 unigenes, 1%). The large amount of transcriptomic information may enhance the study of terpenoid biosynthesis in P. tenuifolia.

Metabolic pathways involved in the biosynthesis of other secondary metabolites (522 unigenes; Figure 4(b)) included "phenylpropanoid biosynthesis" (202 unigenes, 39%), "flavonoid biosynthesis" (55 unigenes, 11%), "tropane, piperidine, and pyridine alkaloid biosynthesis" (53 unigenes, 9%), "streptomycin biosynthesis" (45 unigenes, 9%), "isoquinoline alkaloid biosynthesis" (44 unigenes, 8%), "stilbenoid, diarylheptanoid, and gingerol biosynthesis" (35 unigenes, 7%), "flavone and flavonol biosynthesis" (27 unigenes, 5%), and "butirosin and neomycin biosynthesis" (25 unigenes, 5%). In our sequence dataset, a total of 176 unigenes were found to be potentially related to terpenoid biosynthesis, including the MEP and MVA pathways. In addition, 202 and 55 unigenes were related to the biosynthesis of phenylpropanoids and flavonoids, respectively.

Figure 3: COG classification assigned to the unigenes (class definition, number of this class, percent of this class).

Class definition                                                Number   Percent (%)
General function prediction only                                6401     24.84
Posttranslational modification, protein turnover, chaperones    2310     8.97
Translation, ribosomal structure, and biogenesis                1865     7.24
Carbohydrate transport and metabolism                           1712     6.64
Amino acid transport and metabolism                             1427     5.54
Replication, recombination, and repair                          1319     5.12
Energy production and conversion                                1152     4.47
Signal transduction mechanisms                                  1110     4.31
Transcription                                                   1045     4.06
Lipid transport and metabolism                                  1021     3.96
Inorganic ion transport and metabolism                          821      3.19
Function unknown                                                809      3.14
Coenzyme transport and metabolism                               728      2.83
Cell wall/membrane/envelope biogenesis                          721      2.80
Secondary metabolites biosynthesis, transport, and catabolism   621      2.41
Nucleotide transport and metabolism                             588      2.28
Intracellular trafficking, secretion, and vesicular transport   530      2.06
Cytoskeleton                                                    392      1.52
Defense mechanism                                               344      1.34
Chromatin structure and dynamics                                331      1.28
Cell cycle control, cell division, chromosome partitioning      273      1.06
RNA processing and modification                                 211      0.82
Cell motility                                                   24       0.09
Nuclear structure                                               10       0.04

(1) Characterization and Expression Analysis of the Genes Involved in the Biosynthetic Pathway of the Putative Terpenoid Backbone. Dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP) are recognized as the precursors for the biosynthesis of terpenoid saponins in green plants and as the "active isoprene" compounds in vivo. DMAPP and IPP are allylic isomers, which are synthesized by the MVA and MEP biosynthesis pathways mentioned above [28]. The MVA pathway starts with acetyl-CoA, which is synthesized through the triterpene derivatives of captured CO2, in the cytoplasm, and the MEP pathway starts with pyruvate and glyceraldehyde-3-phosphate in the plastid [29]. In these databases, the biosynthetic pathways of MVA and MEP were identified (Figure 5(a)). In most cases, two or more unique sequences were annotated as the same enzyme; these unique sequences could represent a single gene or different fragments of a gene.

Table 2: Pathway classification of P. tenuifolia.

Category                               Pathway                                        Count
Metabolism                             Carbohydrate metabolism                        3702
                                       Amino acid metabolism                          2618
                                       Enzyme families                                1781
                                       Energy metabolism                              1762
                                       Lipid metabolism                               1610
                                       Glycan biosynthesis and metabolism             1390
                                       Nucleotide metabolism                          1055
                                       Metabolism of cofactors and vitamins           879
                                       Metabolism of other amino acids                716
                                       Metabolism of terpenoids and polyketides       620
                                       Xenobiotics biodegradation and metabolism      562
                                       Biosynthesis of other secondary metabolites    522
Genetic information processing         Folding, sorting, and degradation              4312
                                       Translation                                    4029
                                       Transcription                                  3191
                                       Replication and repair                         3061
Environmental information processing   Signal transduction                            2050
                                       Membrane transport                             342
                                       Signaling molecules and interaction            327
Cellular processes                     Cell growth and death                          1540
                                       Transport and catabolism                       1399
                                       Cell motility                                  507
                                       Cell communication                             473
Human diseases                         Neurodegenerative diseases                     1398
                                       Infectious diseases                            1371
                                       Cancers                                        1146
                                       Immune system diseases                         236
                                       Cardiovascular diseases                        103

The MVA pathway proceeded as follows. First, two molecules of acetyl-CoA were converted into acetoacetyl-CoA by AACT (EC 2.3.1.9, 13 unigenes), which was then converted into 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by HMGS (EC 2.3.3.10, 18 unigenes). MVA was formed by HMGR (EC 1.1.1.34, 18 unigenes), and MVK (EC 2.7.1.36, 2 unigenes) then converted MVA into MVA-5-phosphate. Next, IPP was synthesized by PMK (EC 2.7.4.2, 4 unigenes) and PMD (EC 4.1.1.33, 3 unigenes). The MEP pathway proceeded as follows. First, pyruvate and glyceraldehyde-3-phosphate were converted to 1-deoxy-D-xylulose-5-phosphate (DOXP) by DXS (EC 2.2.1.7, 17 unigenes), which was then converted into MEP by DXR (EC 1.1.1.267, 7 unigenes). Next, MCT (EC 2.7.7.60, 7 unigenes) and CMK (EC 2.7.1.148, 4 unigenes) converted MEP to 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol 2-phosphate (CDP-MEP). Lastly, IPP was synthesized by MDS (EC 4.6.1.12, 1 unigene), HDS (EC 1.17.7.1, 9 unigenes), and HDR (EC 1.17.1.2, 12 unigenes). The next step was the synthesis of squalene, in which a series of enzymes were involved, such as IPPI (EC 5.3.3.2, 15 unigenes) and SQS (EC 2.5.1.21, 7 unigenes). However, GPPS (EC 2.5.1.29) and FPPS (EC 2.5.1.10), which are involved in terpenoid biosynthesis, were not found in the transcriptome results, possibly because some assembled cDNAs were not full length. Squalene was converted into 2,3-oxidosqualene by SE (EC 1.14.99.7, 19 unigenes). In many plants, 2,3-oxidosqualene cyclization is a very important step in the biosynthesis of terpenes because it is a branch point for the synthesis of triterpenoid saponins, sterols, and other terpenes [30]. We found two types of OSC genes of P. tenuifolia in the dataset: cycloartenol synthase (16 unigenes) and beta-amyrin synthase (54 unigenes). Moreover, except for FPPS, another five nucleotide sequences (SQS, SE, CAS (2), cycloartenol synthase, and beta-amyrin) related to triterpenoid saponin biosynthesis and reported in NCBI were found in our sequencing data.
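The MVA and MEP steps described above lend themselves to a small ordered data structure, which makes it easy to see which steps are thinly covered by the assembly. The Python sketch below is illustrative only: the enzyme names, EC numbers, and unigene counts are taken from the text (with the EC number punctuation restored), while the type and function names are hypothetical.

```python
from collections import namedtuple

# One pathway step: enzyme abbreviation, EC number, and annotated unigene count.
Step = namedtuple("Step", ["enzyme", "ec", "unigenes"])

MVA_PATHWAY = [
    Step("AACT", "2.3.1.9", 13),
    Step("HMGS", "2.3.3.10", 18),
    Step("HMGR", "1.1.1.34", 18),
    Step("MVK", "2.7.1.36", 2),
    Step("PMK", "2.7.4.2", 4),
    Step("PMD", "4.1.1.33", 3),
]

MEP_PATHWAY = [
    Step("DXS", "2.2.1.7", 17),
    Step("DXR", "1.1.1.267", 7),
    Step("MCT", "2.7.7.60", 7),
    Step("CMK", "2.7.1.148", 4),
    Step("MDS", "4.6.1.12", 1),
    Step("HDS", "1.17.7.1", 9),
    Step("HDR", "1.17.1.2", 12),
]


def total_unigenes(pathway):
    """Sum the unigene counts over all steps of a pathway."""
    return sum(step.unigenes for step in pathway)


def scarcest_step(pathway):
    """Return the step with the fewest annotated unigenes."""
    return min(pathway, key=lambda step: step.unigenes)
```

With these values, the steps with the fewest annotated unigenes are MVK in the MVA pathway and MDS in the MEP pathway, which may be useful starting points when checking whether a step is represented by full-length sequences.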

(2) Characterization and Expression Analysis of the Genes Involved in the Putative Phenylpropanoid Biosynthesis Pathway. Approximately 814 unigenes encoding 16 enzymes were found in the phenylpropanoid biosynthesis pathway (Figure 5(b)). However, flavonoids and lignins are not the main phenylpropanoid compounds of P. tenuifolia and have rarely been mentioned by researchers. The phenylpropanoid biosynthesis pathway starts with the conversion of phenylalanine to cinnamate by PAL (EC 4.3.1.24, 11 unigenes), and cinnamate is then converted to p-coumaroyl-CoA by the enzymes C4H (EC 1.14.13.11, 10 unigenes) and 4CL (EC 6.2.1.12, 25 unigenes).

The flavonoid pathway began with the synthesis of chalcone, which was catalyzed by CHS (EC 2.3.1.74, 1 unigene); chalcone was then converted to naringenin by CHI (EC 5.5.1.6, 3 unigenes). Dihydrotricetin was synthesized by F3′5′H (EC 1.14.13.88, 11 unigenes). F3H (EC 1.14.11.9, 60 unigenes) converted these flavanones to dihydroflavonols. In the last step, dihydroflavonols were converted to flavonols by FLS (EC 1.14.11.23, 4 unigenes).

The lignin pathway began with the synthesis of p-coumaroyl shikimate, which was catalyzed by HCT (EC 2.3.1.133, 4 unigenes). Then C3′H (EC 1.14.13.36) catalyzed the conversion of p-coumaroyl shikimate into caffeoyl shikimate. Caffeoyl shikimate was converted by hydroxycinnamoyl-transferase to produce caffeoyl-CoA. CCoAoMT (EC 2.1.1.104) could convert caffeoyl-CoA to feruloyl-CoA. Then CCR (EC 1.2.1.44, 6 unigenes) catalyzed the conversion of feruloyl-CoA into coniferaldehyde. Finally, coniferaldehyde was converted by CAD (EC 1.1.1.195, 12 unigenes) to produce coniferyl alcohol. As with GPPS and FPPS in the biosynthesis of the terpenoid backbone, C3′H was not found in the transcriptome results.

Figure 4: Classifications of unigenes involved in the metabolism of terpenoids and polyketides (a) and biosynthesis of other secondary metabolites (b). Categories in (a): biosynthesis of ansamycins [PATH: ko01051], biosynthesis of siderophore group nonribosomal peptides [PATH: ko01053], brassinosteroid biosynthesis [PATH: ko00905], carotenoid biosynthesis [PATH: ko00906], diterpenoid biosynthesis [PATH: ko00904], geraniol degradation [PATH: ko00281], limonene and pinene degradation [PATH: ko00903], polyketide sugar unit biosynthesis [PATH: ko00523], prenyltransferases [BR: ko01006], terpenoid backbone biosynthesis [PATH: ko00900], tetracycline biosynthesis [PATH: ko00253], and zeatin biosynthesis [PATH: ko00908]. Categories in (b): butirosin and neomycin biosynthesis [PATH: ko00524], caffeine metabolism [PATH: ko00232], flavone and flavonol biosynthesis [PATH: ko00944], flavonoid biosynthesis [PATH: ko00941], isoquinoline alkaloid biosynthesis [PATH: ko00950], novobiocin biosynthesis [PATH: ko00401], phenylpropanoid biosynthesis [PATH: ko00940], stilbenoid, diarylheptanoid, and gingerol biosynthesis [PATH: ko00945], streptomycin biosynthesis [PATH: ko00521], and tropane, piperidine, and pyridine alkaloid biosynthesis [PATH: ko00960].

Table 3: Frequency of identified SSR motifs in P. tenuifolia.

                   Repeat numbers
Motif length       5       6       7      8      9      10     >10    Total   %
Mononucleotide     -       -       -      -      -      1554   1890   3444    51.75
Dinucleotide       -       638     374    333    333    317    125    2120    31.85
Trinucleotide      646     218     117    10     1      2      2      996     14.97
Tetranucleotide    64      12      0      0      0      0      0      76      1.14
Pentanucleotide    2       1       0      0      0      0      0      3       0.045
Hexanucleotide     12      2       2      0      0      0      0      16      0.24
Total              724     871     493    343    334    1873   2017   6655
%                  10.87   13.09   7.41   5.15   5.02   4.79   1.91

3.3. Simple Sequence Repeat (SSR) Marker Discovery. A total of 6655 SSRs were identified in 5784 sequences from the 39,625 unigenes. Of these sequences, 755 contained more than one SSR. Mononucleotide motifs accounted for 51.75% of the SSRs, dinucleotide motifs for 31.85%, and trinucleotide motifs for 14.97% (Table 3). AT (1539) was the most abundant repeat type, followed by ATGA (207) and GAATCA (69). However, tetranucleotide, pentanucleotide, and hexanucleotide motifs each accounted for <2% of the SSRs in the unigene sequences. In this transcriptome database, not all unigenes contained SSRs, and complex SSR motifs were not detected.
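A simple way to see how perfect SSRs of this kind are located is a regular-expression scan. The Python sketch below is not the tool used in this study, which the text does not name. The minimum repeat numbers (10 for mononucleotides, 6 for dinucleotides, and 5 for tri- to hexanucleotides) are an assumption inferred from the first populated column of each row in Table 3, and compound SSRs are not handled.

```python
import re

# Minimum repeat number per motif length; an assumption inferred from Table 3.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of the given length followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT".
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

For example, scanning "GG" followed by seven copies of "AT" and then "CC" reports a single dinucleotide SSR starting at position 2 with seven repeats, and a run of twelve "A" bases is reported once as a mononucleotide SSR rather than again as an "AA" dinucleotide repeat.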

4. Discussion

The advantages of high throughput, accuracy, and reproducibility have made transcriptome sequencing a powerful technology [31]. The Illumina HiSeq 2000 has recently been widely used for deep sequencing of model and nonmodel plants, offering the biggest output and lowest reagent cost. In this study, the application of the Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of the secondary metabolite biosynthetic pathways of P. tenuifolia.

Figure 5: Schematic of the putative biosynthetic pathways of two major classes of active compounds in P. tenuifolia: saponin biosynthesis pathway (a) and phenylpropanoid biosynthesis pathway (b). Unigene counts shown beside the enzymes in the figure: (a) AACT 13, HMGS 18, HMGR 18, MVK 2, PMK 4, PMD 3, DXS 17, DXR 7, MCT 7, CMK 4, MDS 1, HDS 9, HDR 12, IPPI 15, GPPS 5, SQS 7, SE 19, β-AS 9, and CAS 4; (b) PAL 11, C4H 10, 4CL 25, HCT 4, CCoAoMT 13, CCR 6, CAD 12, CHS 1, CHI 3, F3′5′H 11, F3H 60, and FLS 4. Enzyme abbreviations: AACT, acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase; HMGS, 3-hydroxy-3-methylglutaryl CoA synthase; HMGR, 3-hydroxy-3-methylglutaryl CoA reductase; MVK, mevalonate kinase; PMK, phosphomevalonate kinase; PMD, mevalonate pyrophosphate decarboxylase; DXS, 1-deoxy-D-xylulose-5-phosphate synthase; DXR, 1-deoxy-D-xylulose-5-phosphate reductoisomerase; MCT, 2-C-methyl-erythritol 4-phosphate cytidylyltransferase; CMK, 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; MDS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, 1-hydroxy-2-methyl-butenyl-4-diphosphate synthase; HDR, isopentenyl pyrophosphate (IPP)/3,3-dimethylallyl pyrophosphate (DMAPP) synthase; IPPI, isopentenyl pyrophosphate isomerase; GPPS, geranyl diphosphate synthase; FPPS, farnesyl diphosphate synthase; SQS, squalene synthase; SE, squalene epoxidase; CAS, cycloartenol synthase; β-AS, β-amyrin synthase; DS, dammarenediol synthase; PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; F3′5′H, flavonoid 3′,5′-hydroxylase; F3H, flavanone 3-hydroxylase; FLS, flavonol synthase; HCT, hydroxycinnamoyl-transferase; C3′H, p-coumaroyl shikimate 3′-hydroxylase; CCoAoMT, caffeoyl CoA 3-O-methyltransferase; CCR, cinnamoyl-CoA reductase; CAD, cinnamyl alcohol dehydrogenase; CYPs, cytochrome P450s; UGTs, glycosyltransferases.

Triterpenoid saponins, especially pentacyclic triterpenoid saponins, are among the main active ingredients of P. tenuifolia, and their biosynthetic pathways are becoming more and more important. The upstream and midstream biosynthetic pathways of triterpenoid saponins have been well characterized in the past few years. Many researchers now mainly focus on the downstream biosynthetic pathways, which are also the difficult part of these studies [17]. In the downstream biosynthesis of natural plant products, hydroxylation by CYP450s and glycosylation by UGTs play important roles in stabilizing the product and altering the bioactivity of triterpenoid saponins with dammarane-type and oleanane-type aglycones [32]. In this study, 466 and 157 unigenes were annotated as CYP450s and UGTs, respectively, in the transcriptome of P. tenuifolia (data not shown). However, because of the quantity and complexity of CYP450s and UGTs, the enzymes of the downstream biosynthetic pathway of P. tenuifolia, which can determine the specific steps in the accumulation of triterpenoid saponins, are still unknown.

CYP450s are a family of enzymes involved in the biosynthesis of lignins, terpenoids, sterols, fatty acids, hormones, pigments, and defense-related phytoalexins [33]. Only two CYP450s involved in triterpene saponin biosynthesis have been functionally characterized until now: CYP88D6 from Glycyrrhiza uralensis, belonging to the CYP85 family [34], and CYP93E1 from Glycine max, belonging to the CYP71 family [35]. Recently, several CYP450s and UGTs were identified as candidate genes in many studies of medicinal and nonmedicinal plants. One CYP450 (contig00248) was selected, on the basis of MeJA-inducible and tissue-specific expression patterns, as the candidate gene most likely to be involved in ginsenoside biosynthesis in Panax quinquefolius [6]. Moreover, Pn02132 and Pn00158, which are closely related to CYP88D6 and CYP93E1, were selected as candidate CYP450s involved in triterpene saponin biosynthesis in Panax notoginseng [5]. Furthermore, ten unigene sequences corresponding to seven different genes with high homology to CYP79s and four unigenes corresponding to two CYP83 genes were identified in the core structure biosynthesis of the glucosinolate metabolism of radish [36]. Our previous study suggested that metabolomics based on ultraperformance liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry, combined with gene expression analysis, can effectively elucidate the mechanism of triterpenoid saponin biosynthesis; three CYP450 genes (CYP88D6, CYP716B1, and CYP72A1) putatively expressed in the triterpenoid saponin biosynthesis pathway in P. tenuifolia were identified by quantitative real-time PCR (qRT-PCR) analysis [2]. The examples mentioned above show that most identified CYP450s belong to the CYP71 and CYP88 families and that few studies have documented the other CYP450 families until now.

UGTs are another large multigene family in plants and play an important role in the last step of triterpenoid saponin biosynthesis. Glycosylation, the transfer of activated saccharides to an aglycone substrate, is the predominant modification in triterpenoid saponin biosynthesis. As with CYP450 genes, some UGT genes have also been identified. The gene UGT74M1 from Saponaria vaccaria [37] and the genes UGT73K1 and UGT71G1 from Medicago truncatula [38] have been previously identified. Flavonoid-related UGTs (UGT78D1, UGT78D2, UGT73C6, and UGT89C1) were also identified in previous studies [39, 40]. Furthermore, four UGTs (contig01001, contig14976, contig15451, and contig16321) were selected, on the basis of MeJA-inducible and tissue-specific expression patterns, as the candidate genes most likely to be involved in ginsenoside biosynthesis in P. quinquefolius [6]. Pn13895, closely related to UGT71G1, was regarded as a lead candidate UGT responsible for triterpene biosynthesis in Panax notoginseng [5]. Six putative flavonol glycosides were identified in Isatis indigotica [41]. Thirteen unigenes were identified as UGT74s, including UGT74B1, UGT74C1, UGT74F1, and UGT74F2, in the core structure biosynthesis of the glucosinolate metabolism of radish [36]. In our previous study mentioned above [2], three UGT genes (UGT74B1, UGT73B2, and UGT73C6) putatively expressed in triterpenoid saponin biosynthetic pathways were also identified by qRT-PCR in P. tenuifolia. The examples mentioned above show that most identified UGTs belong to the UGT74 and UGT73 families and that few studies have documented the other UGT families until now.

In other words, the CYP450s and UGTs identified so far add little information to the knowledge of downstream triterpenoid saponin biosynthesis, and thus we will continue with our in-depth studies. The large number of CYP450 and UGT candidates could provide not only a potential gene pool for the identification of the specific CYP450s and UGTs involved in triterpenoid biosynthesis in P. tenuifolia but also a convenient way to characterize their roles in triterpenoid biosynthesis in future studies.

5. Conclusion

Radix polygalae (RP), one of the predominant Chinese medicinal plants, contains various medicinal ingredients that elicit tonic, sedative, antipsychotic, expectorant, and other pharmacological effects. In this study, a large-scale unigene investigation of P. tenuifolia was performed by Illumina sequencing. Our results showed that many transcripts encoded by putative genes are involved in the biosynthesis of triterpene saponins and phenylpropanoids. The data we obtained represent the most abundant genomic resource and provide comprehensive information on gene discovery, transcriptome profiling, transcriptional regulation, and molecular markers of P. tenuifolia. This study may help improve the production of active compounds through marker-assisted breeding or genetic engineering for P. tenuifolia as well as other medicinal plants in the Polygalaceae family. However, although many candidate CYP450s and UGTs are likely involved downstream in the biosynthetic pathway of triterpene saponins, these candidates should be further investigated.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors' Contribution

Hongling Tian and Xiaoshuang Xu contributed equally to this work.

Acknowledgments

This study was supported by the National Natural Sciences Foundation of China (Grant no. 31100244), the Shanxi Science and Technology Development Program (Grant no. 20140313010-1), the National Science-technology Support Plan Projects (Grant no. 2011BA107B05), the Technological Innovation Projects in Shanxi (Grant no. 2013007), and the Construction Plan for Basic Condition Platform of Shanxi (Grant no. 2014091022).

References

[1] Y. Yao, M. Jia, J.-G. Wu et al., "Anxiolytic and sedative-hypnotic activities of polygalasaponins from Polygala tenuifolia in mice," Pharmaceutical Biology, vol. 48, no. 7, pp. 801–807, 2010.

[2] F. S. Zhang, X. W. Li, Z. Y. Li et al., "UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia," PLoS ONE, vol. 9, no. 8, Article ID e105765, 2014.

[3] H. Seki, S. Sawai, K. Ohyama et al., "Triterpene functional genomics in licorice for identification of CYP72A154 involved in the biosynthesis of glycyrrhizin," The Plant Cell, vol. 23, no. 11, pp. 4112–4123, 2011.

[4] S. Chen, H. Luo, Y. Li et al., "454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng," Plant Cell Reports, vol. 30, no. 9, pp. 1593–1601, 2011.

[5] H. Luo, C. Sun, Y. Sun et al., "Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers," BMC Genomics, vol. 12, no. 5, article S5, 2011.

[6] C. Sun, Y. Li, Q. Wu et al., "De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis," BMC Genomics, vol. 11, article 262, 2010.

[7] W. Gao, H.-X. Sun, H. Xiao et al., "Combining metabolomics and transcriptomics to characterize tanshinone biosynthesis in Salvia miltiorrhiza," BMC Genomics, vol. 15, article 73, 2014.

[8] L. Liu, Y. Li, S. Li et al., "Comparison of next-generation sequencing systems," Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 251364, 11 pages, 2012.

[9] Y. Yang, M. Xu, Q. Luo, J. Wang, and H. Li, "De novo transcriptome analysis of Liriodendron chinense petals and leaves by Illumina sequencing," Gene, vol. 534, no. 2, pp. 155–162, 2014.

[10] S. Fowler and M. F. Thomashow, "Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway," Plant Cell, vol. 14, no. 8, pp. 1675–1690, 2002.

[11] R. Zhai, Y. Feng, H. Wang et al., "Transcriptome analysis of rice root heterosis by RNA-Seq," BMC Genomics, vol. 14, no. 1, article 19, 2013.

[12] I. Zouari, A. Salvioli, M. Chialva et al., "From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism," BMC Genomics, vol. 15, article 221, 2014.

[13] S. Kaur, L. W. Pembleton, N. O. I. Cogan et al., "Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers," BMC Genomics, vol. 13, no. 1, article 104, 2012.

[14] U. Chandrasekaran, W. Xu, and A. Liu, "Transcriptome profiling identifies ABA mediated regulatory changes towards storage filling in developing seeds of castor bean (Ricinus communis L.)," Cell & Bioscience, vol. 4, no. 1, article 33, 2014.

[15] T. Zhang, X. Zhao, W. Wang et al., "Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum," Plant Molecular Biology, vol. 84, no. 3, pp. 315–327, 2014.

[16] Y. Ikeya, S. Takeda, M. Tunakawa et al., "Cognitive improving and cerebral protective effects of acylated oligosaccharides in Polygala tenuifolia," Biological and Pharmaceutical Bulletin, vol. 27, no. 7, pp. 1081–1085, 2004.

[17] J. M. Augustin, V. Kuzina, S. B. Andersen, and S. Bak, "Molecular activities, biosynthesis and evolution of triterpenoid saponins," Phytochemistry, vol. 72, no. 6, pp. 435–457, 2011.

[18] Y. Wang, X. Wang, T.-H. Lee, S. Mansoor, and A. H. Paterson, "Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice)," New Phytologist, vol. 198, no. 1, pp. 274–283, 2013.

[19] V. Tzin and G. Galili, "The biosynthetic pathways for shikimate and aromatic amino acids in Arabidopsis thaliana," The Arabidopsis Book, vol. 8, Article ID e0132, 2010.

[20] W. Boerjan, J. Ralph, and M. Baucher, "Lignin biosynthesis," Annual Review of Plant Biology, vol. 54, pp. 519–546, 2003.

[21] J. C. D'Auria and J. Gershenzon, "The secondary metabolism of Arabidopsis thaliana: growing like a weed," Current Opinion in Plant Biology, vol. 8, no. 3, pp. 308–316, 2005.

[22] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome," Genome Biology, vol. 10, no. 3, article R25, 2009.

[23] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, "A genomic perspective on protein families," Science, vol. 278, no. 5338, pp. 631–637, 1997.

[24] R. L. Tatusov, N. D. Fedorova, J. D. Jackson et al., "The COG database: an updated version includes eukaryotes," BMC Bioinformatics, vol. 4, article 41, 2003.

[25] M. Kanehisa, S. Goto, M. Hattori et al., "From genomics to chemical genomics: new developments in KEGG," Nucleic Acids Research, vol. 34, pp. D354–D357, 2006.

[26] E. M. Zdobnov and R. Apweiler, "InterProScan – an integration platform for the signature-recognition methods in InterPro," Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001.

[27] M. L. Wise and R. Croteau, "Monoterpene biosynthesis," in Comprehensive Natural Products Chemistry, p. 2, Elsevier, 1998.

[28] C. A. Schuhr, T. Radykewicz, S. Sagner et al., "Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy," Phytochemistry Reviews, vol. 2, no. 1-2, pp. 3–16, 2003.

[29] W. Eisenreich, M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Zenk, and A. Bacher, "The deoxyxylulose phosphate pathway of terpenoid biosynthesis in plants and microorganisms," Chemistry and Biology, vol. 5, no. 9, pp. R221–R233, 1998.

[30] J. D. Park, D. K. Rhee, and Y. H. Lee, "Biological activities and chemistry of saponins from Panax ginseng C. A. Meyer," Phytochemistry Reviews, vol. 4, no. 2-3, pp. 159–175, 2005.

[31] M. Rohmer, "Mevalonate-independent methylerythritol phosphate pathway for isoprenoid biosynthesis. Elucidation and distribution," Pure and Applied Chemistry, vol. 75, no. 2-3, pp. 375–387, 2003.

[32] S. Kalra, B. L. Puniya, D. Kulshreshtha et al., "De novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum," PLoS ONE, vol. 8, no. 12, Article ID e83336, 2013.

[33] A. H. Meijer, E. Souer, R. Verpoorte, and J. H. C. Hoge, "Isolation of cytochrome P450 cDNA clones from the higher plant Catharanthus roseus by a PCR strategy," Plant Molecular Biology, vol. 22, no. 2, pp. 379–383, 1993.

[34] H. Seki, K. Ohyama, S. Sawai et al., "Licorice beta-amyrin 11-oxidase, a cytochrome P450 with a key role in the biosynthesis of the triterpene sweetener glycyrrhizin," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 37, pp. 14204–14209, 2008.

[35] M. Shibuya, M. Hoshino, Y. Katsube, H. Hayashi, T. Kushiro, and Y. Ebizuka, "Identification of β-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay," FEBS Journal, vol. 273, no. 5, pp. 948–959, 2006.

[36] Y. Wang, Y. Pan, Z. Liu et al., "De novo transcriptome sequencing of radish (Raphanus sativus L.) and analysis of major genes involved in glucosinolate metabolism," BMC Genomics, vol. 14, article 836, 2013.

[37] D. Meesapyodsuk, J. Balsevich, D. W. Reed, and P. S. Covello, "Saponin biosynthesis in Saponaria vaccaria. cDNAs encoding β-amyrin synthase and a triterpene carboxylic acid glucosyltransferase," Plant Physiology, vol. 143, no. 2, pp. 959–969, 2007.

[38] L. Achnine, D. V. Huhman, M. A. Farag, L. W. Sumner, J. W. Blount, and R. A. Dixon, "Genomics-based selection and functional characterization of triterpene glycosyltransferases from the model legume Medicago truncatula," Plant Journal, vol. 41, no. 6, pp. 875–887, 2005.

[39] K. Yonekura-Sakakibara, T. Tohge, F. Matsuda et al., "Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis," Plant Cell, vol. 20, no. 8, pp. 2160–2176, 2008.

[40] R. Yin, B. Messner, T. Faus-Kessler et al., "Feedback inhibition of the general phenylpropanoid and flavonol biosynthetic pathways upon a compromised flavonol-3-O-glycosylation," Journal of Experimental Botany, vol. 63, no. 7, pp. 2465–2478, 2012.

[41] J. Chen, X. Dong, Q. Li et al., "Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling," BMC Genomics, vol. 14, article 857, 2013.

Figure 2: Gene ontology classification assigned to the unigenes. [Bar chart. Left axis: genes (%), logarithmic scale from 0.1 to 100; right axis: number of genes, up to 41820. Cellular location (8 classes): cell, cell part, envelope, extracellular region, macromolecular complex, membrane-enclosed lumen, organelle, organelle part. Molecular function (11 classes): antioxidant activity, binding, catalytic activity, electron carrier activity, enzyme regulator activity, molecular transducer activity, nutrient reservoir activity, structural molecule activity, transcription regulator activity, translation regulator activity, transporter activity. Biological process (16 classes): anatomical structure formation, biological regulation, cellular component biogenesis, cellular component organization, cellular process, developmental process, establishment of localization, localization, metabolic process, multi-organism process, multicellular organismal process, pigmentation, reproduction, reproductive process, response to stimulus, viral reproduction.]

In the cellular location category, the matched unique sequences were clustered into eight classifications, and the largest subcategories were "cell" and "cell part". In terms of molecular function, 11 classifications were assigned, and the most represented were "binding" and "catalytic activity". For biological processes, 16 classifications were assigned, and the most represented were "cellular process" and "metabolic process". However, few genes were assigned to the categories "nutrient reservoir activity" and "viral reproduction", and no genes were found in the cluster "developmental process".

3.2.2. COG Annotation. Unigenes were annotated against the COG classifications in order to further evaluate the effectiveness of the annotation process (Figure 3). The class definition, number, and percentage of each class are shown in Figure 3. The proteins in each COG category were assumed to share an ancestor protein and were therefore assumed to be paralogs or orthologs. The largest category was "general function prediction only" (24.84%); the second was "posttranslational modification, protein turnover, and chaperones" (8.97%); and the third was "translation, ribosomal structure, and biogenesis" (7.24%). "Cell motility" (24, 0.09%) and "nuclear structure" (10, 0.04%) were the smallest categories.
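The percentages quoted for the COG classes can be reproduced from the class counts in Figure 3. The sketch below is only an illustration: it assumes that each percentage is the class count divided by the sum of all 24 class counts (a unigene may fall into more than one class), and the variable and function names are ours, not part of the original analysis.

```python
# Class counts as listed in Figure 3 (COG classification).
cog_counts = {
    "General function prediction only": 6401,
    "Posttranslational modification, protein turnover, chaperones": 2310,
    "Translation, ribosomal structure and biogenesis": 1865,
    "Carbohydrate transport and metabolism": 1712,
    "Amino acid transport and metabolism": 1427,
    "Replication, recombination and repair": 1319,
    "Energy production and conversion": 1152,
    "Signal transduction mechanisms": 1110,
    "Transcription": 1045,
    "Lipid transport and metabolism": 1021,
    "Inorganic ion transport and metabolism": 821,
    "Function unknown": 809,
    "Coenzyme transport and metabolism": 728,
    "Cell wall/membrane/envelope biogenesis": 721,
    "Secondary metabolites biosynthesis, transport and catabolism": 621,
    "Nucleotide transport and metabolism": 588,
    "Intracellular trafficking, secretion, and vesicular transport": 530,
    "Cytoskeleton": 392,
    "Defense mechanism": 344,
    "Chromatin structure and dynamics": 331,
    "Cell cycle control, cell division, chromosome partitioning": 273,
    "RNA processing and modification": 211,
    "Cell motility": 24,
    "Nuclear structure": 10,
}

# Assumption: the denominator is the sum of all class assignments.
total_assignments = sum(cog_counts.values())


def share(count: int, total: int) -> float:
    """Return the percentage of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


cog_shares = {name: share(n, total_assignments) for name, n in cog_counts.items()}

# Print classes from largest to smallest.
for name, pct in sorted(cog_shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {cog_counts[name]} ({pct}%)")
```

Under this assumption, the computed shares agree with the values reported in the text for the largest and smallest classes.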

3.2.3. KEGG Annotation. The KEGG database can be used to categorize gene functions with an emphasis on biochemical pathways. It can also be used to systematically analyze intracellular metabolic pathways and the functions of gene products. Pathway-based analyses help further determine the biological functions of genes. To gain further insights into the biological pathways in P. tenuifolia, we performed a BLASTX search of the assembled unigenes against the KEGG protein database. A total of 72636 (62.9%) unigenes had BLAST hits, and 42702 unigenes were assigned to five KEGG pathway categories (Table 2): metabolism (17217 unigenes), genetic information processing (14593 unigenes), environmental information processing (2719 unigenes), cellular processes (3919 unigenes), and human diseases (4254 unigenes). The pathways with the highest unigene representation were spliceosome (ko03041, 1431 unigenes), chromosome (ko03036, 1232 unigenes), and ubiquitin system (ko04121, 1060 unigenes).
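As a quick consistency check, the subcategory counts in Table 2 can be summed and compared with the five category totals quoted above. The following is a minimal sketch; the counts are transcribed from Table 2 in the order in which they appear there, and the dictionary names are illustrative only.

```python
# Subcategory counts transcribed from Table 2, grouped by KEGG category.
table2_counts = {
    "Metabolism": [3702, 2618, 1781, 1762, 1610, 1390, 1055, 879, 716, 620, 562, 522],
    "Genetic information processing": [4312, 4029, 3191, 3061],
    "Environmental information processing": [2050, 342, 327],
    "Cellular processes": [1540, 1399, 507, 473],
    "Human diseases": [1398, 1371, 1146, 236, 103],
}

# Category totals as quoted in the running text.
reported_totals = {
    "Metabolism": 17217,
    "Genetic information processing": 14593,
    "Environmental information processing": 2719,
    "Cellular processes": 3919,
    "Human diseases": 4254,
}

category_totals = {name: sum(counts) for name, counts in table2_counts.items()}

# Collect any category whose summed subcategories disagree with the text.
mismatches = {
    name: (category_totals[name], reported_totals[name])
    for name in reported_totals
    if category_totals[name] != reported_totals[name]
}

grand_total = sum(category_totals.values())
print("mismatches:", mismatches)
print("grand total:", grand_total)
```

With the table values as transcribed, every category sum matches the text and the grand total equals the 42702 assigned unigenes.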

As a traditional Chinese medicinal plant, P. tenuifolia undergoes highly active metabolism. The high representation of metabolism-related unigenes reflects primary metabolite biosynthesis, such as "carbohydrate metabolism" (3702 unigenes), "amino acid metabolism" (2618 unigenes), and "energy metabolism" (1762 unigenes). However, most active ingredients of P. tenuifolia are secondary metabolites. Therefore, we listed the classifications of terpenoids, polyketides, and other secondary metabolites (Figure 4) with high expression levels.

Metabolic pathways were involved in the metabolism of terpenoids and polyketides (620 unigenes, Figure 4(a)), including "terpenoid backbone biosynthesis" (176 unigenes, 28%), "carotenoid biosynthesis" (78 unigenes, 13%), "zeatin biosynthesis" (44 unigenes, 7%), "limonene and pinene degradation" (46 unigenes, 7%), "diterpenoid biosynthesis" (30 unigenes, 5%), "tetracycline biosynthesis" (16 unigenes, 3%), "brassinosteroid biosynthesis" (13 unigenes, 2%), and "polyketide sugar unit biosynthesis" (5 unigenes, 1%). The large amount of transcriptomic information may enhance the study of terpenoid biosynthesis in P. tenuifolia.

Metabolic pathways were also involved in the biosynthesis of other secondary metabolites (522 unigenes, Figure 4(b)), including "phenylpropanoid biosynthesis" (202 unigenes, 39%), "flavonoid biosynthesis" (55 unigenes, 11%), "tropane, piperidine, and pyridine alkaloid biosynthesis" (53 unigenes, 9%), "streptomycin biosynthesis" (45 unigenes, 9%), "isoquinoline alkaloid biosynthesis" (44 unigenes, 8%), "stilbenoid, diarylheptanoid, and gingerol biosynthesis" (35 unigenes, 7%), "flavone and flavonol biosynthesis" (27 unigenes, 5%), and "butirosin and neomycin biosynthesis" (25 unigenes, 5%). In our sequence dataset, a total of 176 unigenes were found to be potentially related to terpenoid backbone biosynthesis, including the MEP and MVA pathways. In addition, 202 and 55 unigenes were related to the biosynthesis of phenylpropanoids and flavonoids, respectively.

Figure 3: COG classification assigned to the unigenes. Each entry gives the class definition, the number of unigenes in the class, and the percentage of the class.

General function prediction only: 6401 (24.84%)
Posttranslational modification, protein turnover, chaperones: 2310 (8.97%)
Translation, ribosomal structure and biogenesis: 1865 (7.24%)
Carbohydrate transport and metabolism: 1712 (6.64%)
Amino acid transport and metabolism: 1427 (5.54%)
Replication, recombination and repair: 1319 (5.12%)
Energy production and conversion: 1152 (4.47%)
Signal transduction mechanisms: 1110 (4.31%)
Transcription: 1045 (4.06%)
Lipid transport and metabolism: 1021 (3.96%)
Inorganic ion transport and metabolism: 821 (3.19%)
Function unknown: 809 (3.14%)
Coenzyme transport and metabolism: 728 (2.83%)
Cell wall/membrane/envelope biogenesis: 721 (2.80%)
Secondary metabolites biosynthesis, transport and catabolism: 621 (2.41%)
Nucleotide transport and metabolism: 588 (2.28%)
Intracellular trafficking, secretion, and vesicular transport: 530 (2.06%)
Cytoskeleton: 392 (1.52%)
Defense mechanism: 344 (1.34%)
Chromatin structure and dynamics: 331 (1.28%)
Cell cycle control, cell division, and chromosome partitioning: 273 (1.06%)
RNA processing and modification: 211 (0.82%)
Cell motility: 24 (0.09%)
Nuclear structure: 10 (0.04%)

(1) Characterization and Expression Analysis of the Genes Involved in the Biosynthetic Pathway of Putative Terpenoid Backbone. Dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP) are recognized as the precursors of terpenoid saponin biosynthesis in green plants and as the "active isoprene" units in vivo. DMAPP and IPP are allylic isomers, which are synthesized by the MVA and MEP biosynthesis pathways mentioned above [28]. The MVA pathway starts in the cytoplasm with acetyl-CoA, which is derived from captured CO2, and the MEP pathway starts in the plastid with pyruvate and glyceraldehyde-3-phosphate [29]. In our dataset, the biosynthetic pathways of MVA and MEP were identified (Figure 5(a)). In most cases, two or more unique sequences were annotated as the same enzyme; these unique sequences could represent a single gene or different fragments of a gene.

Table 2: Pathway classification of P. tenuifolia.

Category                               Pathway                                        Count
Metabolism                             Carbohydrate metabolism                        3702
                                       Amino acid metabolism                          2618
                                       Enzyme families                                1781
                                       Energy metabolism                              1762
                                       Lipid metabolism                               1610
                                       Glycan biosynthesis and metabolism             1390
                                       Nucleotide metabolism                          1055
                                       Metabolism of cofactors and vitamins           879
                                       Metabolism of other amino acids                716
                                       Metabolism of terpenoids and polyketides       620
                                       Xenobiotics biodegradation and metabolism      562
                                       Biosynthesis of other secondary metabolites    522
Genetic information processing         Folding, sorting, and degradation              4312
                                       Translation                                    4029
                                       Transcription                                  3191
                                       Replication and repair                         3061
Environmental information processing   Signal transduction                            2050
                                       Membrane transport                             342
                                       Signaling molecules and interaction            327
Cellular processes                     Cell growth and death                          1540
                                       Transport and catabolism                       1399
                                       Cell motility                                  507
                                       Cell communication                             473
Human diseases                         Neurodegenerative diseases                     1398
                                       Infectious diseases                            1371
                                       Cancers                                        1146
                                       Immune system diseases                         236
                                       Cardiovascular diseases                        103

The MVA pathway proceeded as follows. First, two molecules of acetyl-CoA were converted into acetoacetyl-CoA by AACT (EC 2.3.1.9, 13 unigenes), which was then converted into 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by HMGS (EC 2.3.3.10, 18 unigenes). MVA was formed by HMGR (EC 1.1.1.34, 18 unigenes), and MVK (EC 2.7.1.36, 2 unigenes) then converted MVA into MVA-5-phosphate. Next, IPP was synthesized by PMK (EC 2.7.4.2, 4 unigenes) and PMD (EC 4.1.1.33, 3 unigenes). The MEP pathway proceeded as follows. First, pyruvate and glyceraldehyde-3-phosphate were converted to 1-deoxy-D-xylulose-5-phosphate (DOXP) by DXS (EC 2.2.1.7, 17 unigenes), which was then converted into MEP by DXR (EC 1.1.1.267, 7 unigenes). Next, MCT (EC 2.7.7.60, 7 unigenes) and CMK (EC 2.7.1.148, 4 unigenes) converted MEP to 4-diphosphocytidyl-2-C-methyl-D-erythritol 2-phosphate (CDP-MEP). Lastly, IPP was synthesized by MDS (EC 4.6.1.12, 1 unigene), HDS (EC 1.17.7.1, 9 unigenes), and HDR (EC 1.17.1.2, 12 unigenes). The next step was the synthesis of squalene, in which a series of enzymes were involved, such as IPPI (EC 5.3.3.2, 15 unigenes) and SQS (EC 2.5.1.21, 7 unigenes). However, GPPS (EC 2.5.1.29) and FPPS (EC 2.5.1.10), which are involved in terpenoid biosynthesis, were not found in the transcriptome results, possibly because some assembled cDNAs were not of full length. Squalene was converted into 2,3-oxidosqualene by SE (EC 1.14.99.7, 19 unigenes). In many plants, 2,3-oxidosqualene cyclization is a very important step in the biosynthesis of terpenes because it is a branch point for the synthesis of triterpenoid saponins, sterols, and other terpenes [30]. We found two types of OSC genes of P. tenuifolia in the dataset: cycloartenol synthase (16 unigenes) and beta-amyrin synthase (54 unigenes). Moreover, except for FPPS, another five nucleotide sequences (SQS, SE, CAS (2), cycloartenol synthase, and beta-amyrin) related to triterpenoid saponin biosynthesis and reported in NCBI were found in our sequencing data.
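The MVA and MEP steps described in this section can be written down as ordered lists of enzymatic steps, which makes it easy to total the unigenes per pathway and to flag enzymes without a match. This is a minimal sketch, not the authors' pipeline: the `Step` record and helper names are hypothetical, and the counts follow the running text (where GPPS and FPPS are reported as not found).

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    enzyme: str              # enzyme abbreviation used in the text
    ec: str                  # EC number as given in the text
    unigenes: Optional[int]  # None means no unigene was reported


# Cytosolic MVA pathway, acetyl-CoA to IPP.
MVA_PATHWAY: List[Step] = [
    Step("AACT", "2.3.1.9", 13),
    Step("HMGS", "2.3.3.10", 18),
    Step("HMGR", "1.1.1.34", 18),
    Step("MVK", "2.7.1.36", 2),
    Step("PMK", "2.7.4.2", 4),
    Step("PMD", "4.1.1.33", 3),
]

# Plastidial MEP pathway, pyruvate + glyceraldehyde-3-phosphate to IPP.
MEP_PATHWAY: List[Step] = [
    Step("DXS", "2.2.1.7", 17),
    Step("DXR", "1.1.1.267", 7),
    Step("MCT", "2.7.7.60", 7),
    Step("CMK", "2.7.1.148", 4),
    Step("MDS", "4.6.1.12", 1),
    Step("HDS", "1.17.7.1", 9),
    Step("HDR", "1.17.1.2", 12),
]

# Steps from IPP to 2,3-oxidosqualene; GPPS and FPPS follow the text.
DOWNSTREAM: List[Step] = [
    Step("IPPI", "5.3.3.2", 15),
    Step("GPPS", "2.5.1.29", None),
    Step("FPPS", "2.5.1.10", None),
    Step("SQS", "2.5.1.21", 7),
    Step("SE", "1.14.99.7", 19),
]


def total_unigenes(steps: List[Step]) -> int:
    """Sum the unigene counts, treating unreported enzymes as zero."""
    return sum(s.unigenes or 0 for s in steps)


def missing_enzymes(steps: List[Step]) -> List[str]:
    """List the enzymes for which no unigene was reported."""
    return [s.enzyme for s in steps if s.unigenes is None]


print("MVA unigenes:", total_unigenes(MVA_PATHWAY))
print("MEP unigenes:", total_unigenes(MEP_PATHWAY))
print("Not detected downstream:", missing_enzymes(DOWNSTREAM))
```

Keeping the steps in pathway order means the same lists can also be used to print the route from precursor to product, one enzyme per line.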

(2) Characterization and Expression Analysis of the Genes Involved in the Putative Phenylpropanoid Biosynthesis Pathway. Approximately 814 unigenes encoding 16 enzymes were found in the phenylpropanoid biosynthesis pathway (Figure 5(b)). However, flavonoids and lignins are not the main phenylpropanoid compounds and have rarely been reported in P. tenuifolia. The phenylpropanoid biosynthesis pathway starts with the conversion of phenylalanine to cinnamate by PAL (EC 4.3.1.24, 11 unigenes); cinnamate is then converted to p-coumaroyl CoA by C4H (EC 1.14.13.11, 10 unigenes) and 4CL (EC 6.2.1.12, 25 unigenes).

The flavonoid pathway began with the synthesis of chalcone, which was catalyzed by CHS (EC 2.3.1.74, 1 unigene); chalcone was then converted to naringenin by CHI (EC 5.5.1.6, 3 unigenes). Dihydrotricetin was synthesized by F3′5′H (EC 1.14.13.88, 11 unigenes). F3H (EC 1.14.11.9, 60 unigenes) catalyzed the conversion of these flavanones to dihydroflavonols. In the last step, dihydroflavonols were converted to flavonols by FLS (EC 1.14.11.23, 4 unigenes).

The lignin pathway began with the synthesis of p-coumaroyl shikimate, which was catalyzed by HCT (EC 2.3.1.133, 4 unigenes). Then C3′H (EC 1.14.13.36) catalyzed the conversion of p-coumaroyl shikimate into caffeoyl shikimate. Caffeoyl shikimate was converted by hydroxycinnamoyl-transferase to produce caffeoyl CoA. CCoAoMT (EC 2.1.1.104) could convert caffeoyl CoA to feruloyl CoA. Then CCR (EC 1.2.1.44, 6 unigenes) catalyzed the conversion of feruloyl CoA into coniferaldehyde. Finally, coniferaldehyde was converted by CAD (EC 1.1.1.195, 12 unigenes) to produce coniferyl alcohol. As with GPPS and FPPS in the terpenoid backbone pathway, C3′H was not found in the transcriptome results.

Figure 4: Classifications of unigenes involved in the metabolism of terpenoids and polyketides (a) and biosynthesis of other secondary metabolites (b). [Pie charts with percentage labels. Panel (a) legend: biosynthesis of ansamycins (ko01051), biosynthesis of siderophore group nonribosomal peptides (ko01053), brassinosteroid biosynthesis (ko00905), carotenoid biosynthesis (ko00906), diterpenoid biosynthesis (ko00904), geraniol degradation (ko00281), limonene and pinene degradation (ko00903), polyketide sugar unit biosynthesis (ko00523), prenyltransferases (BR ko01006), terpenoid backbone biosynthesis (ko00900), tetracycline biosynthesis (ko00253), zeatin biosynthesis (ko00908). Panel (b) legend: butirosin and neomycin biosynthesis (ko00524), caffeine metabolism (ko00232), flavone and flavonol biosynthesis (ko00944), flavonoid biosynthesis (ko00941), isoquinoline alkaloid biosynthesis (ko00950), novobiocin biosynthesis (ko00401), phenylpropanoid biosynthesis (ko00940), stilbenoid, diarylheptanoid, and gingerol biosynthesis (ko00945), streptomycin biosynthesis (ko00521), tropane, piperidine, and pyridine alkaloid biosynthesis (ko00960).]

Table 3: Frequency of identified SSR motifs in P. tenuifolia.

Motif length       Repeat numbers                                         Total    %
                   5      6      7      8      9      10     >10
Mononucleotide     -      -      -      -      -      1554   1890   3444     51.75
Dinucleotide       -      638    374    333    333    317    125    2120     31.85
Trinucleotide      646    218    117    10     1      2      2      996      14.97
Tetranucleotide    64     12     0      0      0      0      0      76       1.14
Pentanucleotide    2      1      0      0      0      0      0      3        0.045
Hexanucleotide     12     2      2      0      0      0      0      16       0.24
Total              724    871    493    343    334    1873   2017   6655
%                  10.87  13.09  7.41   5.15   5.02   4.79   1.91

3.3. Simple Sequence Repeat (SSR) Marker Discovery. A total of 6655 SSRs in 5784 sequences were identified from 39625 unigenes. Among these sequences, 755 unigene sequences contained more than one SSR. Mononucleotide motifs accounted for 51.75% of the SSRs, dinucleotide motifs for 31.85%, and trinucleotide motifs for 14.97% (Table 3). AT (1539) was the most abundant repeat type, followed by ATGA (207) and GAATCA (69). However, tetranucleotide, pentanucleotide, and hexanucleotide motifs each accounted for <2% of the SSRs in the unigene sequences. In this transcriptome database, SSRs did not exist in all of the unigenes, and complex SSR motifs were not detected.
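To illustrate how perfect SSRs of this kind can be located in a unigene sequence, the sketch below scans a DNA string with regular expressions. It is not the tool used in this study, which is not named in this section. The minimum repeat numbers (10 for mononucleotides, 6 for dinucleotides, and 5 for tri- to hexanucleotides) are an assumption inferred from the smallest populated columns of Table 3, and the function name `find_ssrs` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed minimum repeat numbers per motif length (inferred from Table 3).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those runs belong to the shorter motif class.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Synthetic example sequence: (AT)7, (A)12, and (GAA)5 separated by spacers.
example = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "T" + "GAA" * 5 + "C"
for start, motif, repeats in find_ssrs(example):
    print(f"position {start}: ({motif}){repeats}")
```

This sketch reports only perfect, non-compound repeats; compound SSRs, which the text notes were not detected, would need an extra step that merges neighbouring hits.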

4. Discussions

The advantages of high throughput, accuracy, and reproducibility have made transcriptome sequencing a powerful technology [31]. Illumina HiSeq 2000 sequencing, which offers the largest output and the lowest reagent cost, has recently been widely used for deep sequencing of model and nonmodel plants. In this study, the application of Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of the secondary metabolite biosynthetic pathways of P. tenuifolia.

Figure 5: Schematic of the putative biosynthetic pathways of two major classes of active compounds in P. tenuifolia: saponin biosynthesis pathway (a) and phenylpropanoid biosynthesis pathway (b). [Pathway diagrams; the number in parentheses after an enzyme label is its unigene count. Panel (a), MVA pathway (cytosol) and MEP/DOXP pathway (plastid): AACT(13), HMGS(18), HMGR(18), MVK(2), PMK(4), PMD(3), DXS(17), DXR(7), MCT(7), CMK(4), MDS(1), HDS(9), HDR(12), IPPI(15), GPPS(5), FPPS, SQS(7), SE(19), β-AS(9), CAS(4), CYPs and UGTs. Panel (b): PAL(11), C4H(10), 4CL(25), HCT(4), C3′H, CCoAoMT(13), CCR(6), CAD(12), CHS(1), CHI(3), F3′5′H(11), F3H(60), FLS(4), UGT.] Enzyme abbreviations: AACT: acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase; HMGS: 3-hydroxy-3-methylglutaryl CoA synthase; HMGR: 3-hydroxy-3-methylglutaryl CoA reductase; MVK: mevalonate kinase; PMK: phosphomevalonate kinase; PMD: mevalonate pyrophosphate decarboxylase; DXS: 1-deoxy-D-xylulose-5-phosphate synthase; DXR: 1-deoxy-D-xylulose-5-phosphate reductoisomerase; MCT: 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase; CMK: 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; MDS: 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS: 1-hydroxy-2-methyl-butenyl-4-diphosphate synthase; HDR: isopentenyl pyrophosphate (IPP)/3,3-dimethylallyl pyrophosphate (DMAPP) synthase; IPPI: isopentenyl pyrophosphate isomerase; GPPS: geranyl diphosphate synthase; FPPS: farnesyl diphosphate synthase; SQS: squalene synthase; SE: squalene epoxidase; CAS: cycloartenol synthase; β-AS: β-amyrin synthase; DS: dammarenediol synthase; PAL: phenylalanine ammonia lyase; C4H: cinnamate 4-hydroxylase; 4CL: 4-coumarate CoA ligase; CHS: chalcone synthase; CHI: chalcone isomerase; F3′5′H: flavonoid 3′5′-hydroxylase; F3H: flavanone 3-hydroxylase; FLS: flavonol synthase; HCT: hydroxycinnamoyl-transferase; C3′H: p-coumaroyl shikimate 3′-hydroxylase; CCoAoMT: caffeoyl CoA 3-O-methyltransferase; CCR: cinnamoyl-CoA reductase; CAD: cinnamyl alcohol dehydrogenase; CYPs: cytochrome P450s; UGTs: glycosyltransferases.

Triterpenoid saponins, especially pentacyclic triterpenoid saponins, are among the main active ingredients of P. tenuifolia, and their biosynthetic pathways are becoming increasingly important. The upstream and midstream biosynthetic pathways of triterpenoid saponins have been well characterized in the past few years. Many researchers now focus mainly on the downstream biosynthetic pathways, which are also the most difficult part to study [17]. In the downstream biosynthesis of natural plant products, hydroxylation by CYP450s and glycosylation by UGTs play important roles in stabilizing the product and altering the bioactivity of triterpenoid saponins with dammarane-type and oleanane-type aglycones [32]. In this study, 466 and 157 unigenes were annotated as CYP450s and UGTs, respectively, in the transcriptome of P. tenuifolia (data not shown). However, because of the quantity and complexity of CYP450s and UGTs, the enzymes of the downstream biosynthetic pathway of P. tenuifolia, which determine the specific steps in the accumulation of triterpenoid saponins, are still unknown.

CYP450s are a family of enzymes involved in the biosynthesis of lignins, terpenoids, sterols, fatty acids, hormones, pigments, and defense-related phytoalexins [33]. Only two CYP450s involved in triterpene saponin biosynthesis have been functionally characterized until now: CYP88D6 from Glycyrrhiza uralensis, belonging to the CYP85 family [34], and CYP93E1 from Glycine max, belonging to the CYP71 family [35]. Recently, several CYP450s and UGTs were identified as candidate genes in many studies of medicinal and nonmedicinal plants. One CYP450 (contig00248) was selected as the candidate gene most likely to be involved in ginsenoside biosynthesis in Panax quinquefolius, on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Moreover, Pn02132 and Pn00158, closely related to CYP88D6 and CYP93E1, were selected as candidate CYP450s involved in triterpene saponin biosynthesis in P. notoginseng [5]. Furthermore, ten unigene sequences corresponding to seven different genes with high homology to CYP79s and four unigenes corresponding to two CYP83 genes were identified in the core structure biosynthesis of the glucosinolate metabolism of radish [36]. Our previous study suggested that metabolomics based on ultraperformance liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry, combined with gene expression analysis, can effectively elucidate the mechanism of triterpenoid saponin biosynthesis; in that study, three CYP450 genes (CYP88D6, CYP716B1, and CYP72A1) putatively expressed in the triterpenoid saponin biosynthesis pathway of P. tenuifolia were identified by quantitative real-time PCR (qRT-PCR) analysis [2]. The examples above indicate that most identified CYP450s belong to the CYP71 and CYP88 families and that few studies of the other CYP450 families have been documented until now.

UGTs are another large multigene family in plants andplay an important role in the last step of biosynthesis of triter-penoid saponin Glycosylation the transfer of activated sac-charides to an aglycone substrate is the predominant mod-ification in triterpenoid saponins biosynthesis RegardingCYP450s genes some UGT genes have also been identifiedThe genes UGT74M1 and UGT73K1 from Saponaria vaccaria[37] and UGT71G1 fromMedicago truncatula [38] have beenpreviously identified Flavonoids related toUGTs (UGT78D1UGT78D2 UGT73C6 and UGT89C1) were also identifiedin previous studies [39 40] Furthermore four UGTs (con-tig01001 contig14976 contig15451 and contig16321) wereselected as candidate genes in P quinquefolius which aremost likely to be involved in ginsenoside biosynthesis byMeJA-inducible and tissue-specific expression patterns [6]Pn13895 closely related to UGT71G1 was regarded as a leadcandidate UGT which is responsible for triterpene biosyn-thesis in P notoginseng [5] Six putative flavonol glycosideswere identified in Isatis indigotica [41] Thirteen unigeneswere identified as UGT74s including UGT74B1 UGT74C1UGT74F1 andUGT74F2 in the core structure biosynthesis ofthe glucosinolate metabolism of radish [36] In our previousstudy mentioned above [2] three genes of UGTs (UGT74B1UGT73B2 and UGT73C6) putatively expressed in triter-penoid saponin biosynthetic pathways were also identifiedby qRT-PCR in P tenuifolia The examples mentioned aboveclarified that most identified UGTs belonged to the UGT74and UGT73 families and few studies have been documentedabout the other UGT families until now

In other words, the CYP450 and UGT genes identified so far add little to the knowledge of downstream triterpenoid saponin biosynthesis, and thus we will continue with our in-depth studies. The large number of CYP450 and UGT candidates could provide not only a potential gene pool for the identification of specific CYP450s and UGTs involved in triterpenoid biosynthesis in P. tenuifolia but also a convenient means of characterizing their roles in triterpenoid biosynthesis in future studies.

5. Conclusion

RP, one of the predominant Chinese medicinal plants, contains various medicinal ingredients that elicit tonic, sedative, antipsychotic, expectorant, and other pharmacological effects. In this study, a large-scale unigene investigation of P. tenuifolia was performed by Illumina sequencing. Our results showed that many transcripts encoded by putative genes are involved in the biosynthesis of triterpene saponins and phenylpropanoids. The data we obtained represent the most abundant genomic resource for P. tenuifolia and provide comprehensive information on gene discovery, transcriptome profiling, transcriptional regulation, and molecular markers. This study will help improve the production of active compounds through marker-assisted breeding or genetic engineering for P. tenuifolia as well as other medicinal plants in the Polygalaceae family. However, many candidate CYP450s and UGTs are likely involved downstream in the biosynthetic pathway of triterpene saponins, and these candidates should be further investigated.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors' Contribution

Hongling Tian and Xiaoshuang Xu equally contributed to this work.

Acknowledgments

This study was supported by the National Natural Sciences Foundation of China (Grant no. 31100244), the Shanxi Science and Technology Development Program (Grant no. 20140313010-1), the National Science-technology Support Plan Projects (Grant no. 2011BA107B05), the Technological Innovation Projects in Shanxi (Grant no. 2013007), and the Construction Plan for Basic Condition Platform of Shanxi (Grant no. 2014091022).

References

[1] Y. Yao, M. Jia, J.-G. Wu et al., “Anxiolytic and sedative-hypnotic activities of polygalasaponins from Polygala tenuifolia in mice,” Pharmaceutical Biology, vol. 48, no. 7, pp. 801–807, 2010.

[2] F. S. Zhang, X. W. Li, Z. Y. Li et al., “UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia,” PLoS ONE, vol. 9, no. 8, Article ID e105765, 2014.

[3] H. Seki, S. Sawai, K. Ohyama et al., “Triterpene functional genomics in licorice for identification of CYP72A154 involved in the biosynthesis of glycyrrhizin,” The Plant Cell, vol. 23, no. 11, pp. 4112–4123, 2011.

[4] S. Chen, H. Luo, Y. Li et al., “454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng,” Plant Cell Reports, vol. 30, no. 9, pp. 1593–1601, 2011.

[5] H. Luo, C. Sun, Y. Sun et al., “Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers,” BMC Genomics, vol. 12, no. 5, article S5, 2011.

[6] C. Sun, Y. Li, Q. Wu et al., “De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis,” BMC Genomics, vol. 11, article 262, 2010.

[7] W. Gao, H.-X. Sun, H. Xiao et al., “Combining metabolomics and transcriptomics to characterize tanshinone biosynthesis in Salvia miltiorrhiza,” BMC Genomics, vol. 15, article 73, 2014.

[8] L. Liu, Y. Li, S. Li et al., “Comparison of next-generation sequencing systems,” Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 251364, 11 pages, 2012.

[9] Y. Yang, M. Xu, Q. Luo, J. Wang, and H. Li, “De novo transcriptome analysis of Liriodendron chinense petals and leaves by Illumina sequencing,” Gene, vol. 534, no. 2, pp. 155–162, 2014.

[10] S. Fowler and M. F. Thomashow, “Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway,” Plant Cell, vol. 14, no. 8, pp. 1675–1690, 2002.

[11] R. Zhai, Y. Feng, H. Wang et al., “Transcriptome analysis of rice root heterosis by RNA-Seq,” BMC Genomics, vol. 14, no. 1, article 19, 2013.

[12] I. Zouari, A. Salvioli, M. Chialva et al., “From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism,” BMC Genomics, vol. 15, article 221, 2014.

[13] S. Kaur, L. W. Pembleton, N. O. I. Cogan et al., “Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers,” BMC Genomics, vol. 13, no. 1, article 104, 2012.

[14] U. Chandrasekaran, W. Xu, and A. Liu, “Transcriptome profiling identifies ABA mediated regulatory changes towards storage filling in developing seeds of castor bean (Ricinus communis L.),” Cell & Bioscience, vol. 4, no. 1, article 33, 2014.

[15] T. Zhang, X. Zhao, W. Wang et al., “Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum,” Plant Molecular Biology, vol. 84, no. 3, pp. 315–327, 2014.

[16] Y. Ikeya, S. Takeda, M. Tunakawa et al., “Cognitive improving and cerebral protective effects of acylated oligosaccharides in Polygala tenuifolia,” Biological and Pharmaceutical Bulletin, vol. 27, no. 7, pp. 1081–1085, 2004.

[17] J. M. Augustin, V. Kuzina, S. B. Andersen, and S. Bak, “Molecular activities, biosynthesis and evolution of triterpenoid saponins,” Phytochemistry, vol. 72, no. 6, pp. 435–457, 2011.

[18] Y. Wang, X. Wang, T.-H. Lee, S. Mansoor, and A. H. Paterson, “Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice),” New Phytologist, vol. 198, no. 1, pp. 274–283, 2013.

[19] V. Tzin and G. Galili, “The biosynthetic pathways for shikimate and aromatic amino acids in Arabidopsis thaliana,” The Arabidopsis Book, vol. 8, Article ID e0132, 2010.

[20] W. Boerjan, J. Ralph, and M. Baucher, “Lignin biosynthesis,” Annual Review of Plant Biology, vol. 54, pp. 519–546, 2003.

[21] J. C. D'Auria and J. Gershenzon, “The secondary metabolism of Arabidopsis thaliana: growing like a weed,” Current Opinion in Plant Biology, vol. 8, no. 3, pp. 308–316, 2005.

[22] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.

[23] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, “A genomic perspective on protein families,” Science, vol. 278, no. 5338, pp. 631–637, 1997.

[24] R. L. Tatusov, N. D. Fedorova, J. D. Jackson et al., “The COG database: an updated version includes eukaryotes,” BMC Bioinformatics, vol. 4, article 41, 2003.

[25] M. Kanehisa, S. Goto, M. Hattori et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, pp. D354–D357, 2006.

[26] E. M. Zdobnov and R. Apweiler, “InterProScan - an integration platform for the signature-recognition methods in InterPro,” Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001.

[27] M. L. Wise and R. Croteau, “Monoterpene biosynthesis,” in Comprehensive Natural Products Chemistry, p. 2, Elsevier, 1998.

[28] C. A. Schuhr, T. Radykewicz, S. Sagner et al., “Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy,” Phytochemistry Reviews, vol. 2, no. 1-2, pp. 3–16, 2003.

[29] W. Eisenreich, M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Zenk, and A. Bacher, “The deoxyxylulose phosphate pathway of terpenoid biosynthesis in plants and microorganisms,” Chemistry and Biology, vol. 5, no. 9, pp. R221–R233, 1998.

[30] J. D. Park, D. K. Rhee, and Y. H. Lee, “Biological activities and chemistry of saponins from Panax ginseng C. A. Meyer,” Phytochemistry Reviews, vol. 4, no. 2-3, pp. 159–175, 2005.

[31] M. Rohmer, “Mevalonate-independent methylerythritol phosphate pathway for isoprenoid biosynthesis. Elucidation and distribution,” Pure and Applied Chemistry, vol. 75, no. 2-3, pp. 375–387, 2003.

[32] S. Kalra, B. L. Puniya, D. Kulshreshtha et al., “De novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant Chlorophytum borivilianum,” PLoS ONE, vol. 8, no. 12, Article ID e83336, 2013.

[33] A. H. Meijer, E. Souer, R. Verpoorte, and J. H. C. Hoge, “Isolation of cytochrome P450 cDNA clones from the higher plant Catharanthus roseus by a PCR strategy,” Plant Molecular Biology, vol. 22, no. 2, pp. 379–383, 1993.

[34] H. Seki, K. Ohyama, S. Sawai et al., “Licorice beta-amyrin 11-oxidase, a cytochrome P450 with a key role in the biosynthesis of the triterpene sweetener glycyrrhizin,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 37, pp. 14204–14209, 2008.

[35] M. Shibuya, M. Hoshino, Y. Katsube, H. Hayashi, T. Kushiro, and Y. Ebizuka, “Identification of β-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay,” FEBS Journal, vol. 273, no. 5, pp. 948–959, 2006.

[36] Y. Wang, Y. Pan, Z. Liu et al., “De novo transcriptome sequencing of radish (Raphanus sativus L.) and analysis of major genes involved in glucosinolate metabolism,” BMC Genomics, vol. 14, article 836, 2013.

[37] D. Meesapyodsuk, J. Balsevich, D. W. Reed, and P. S. Covello, “Saponin biosynthesis in Saponaria vaccaria. cDNAs encoding β-amyrin synthase and a triterpene carboxylic acid glucosyltransferase,” Plant Physiology, vol. 143, no. 2, pp. 959–969, 2007.

[38] L. Achnine, D. V. Huhman, M. A. Farag, L. W. Sumner, J. W. Blount, and R. A. Dixon, “Genomics-based selection and functional characterization of triterpene glycosyltransferases from the model legume Medicago truncatula,” Plant Journal, vol. 41, no. 6, pp. 875–887, 2005.

[39] K. Yonekura-Sakakibara, T. Tohge, F. Matsuda et al., “Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis,” Plant Cell, vol. 20, no. 8, pp. 2160–2176, 2008.

[40] R. Yin, B. Messner, T. Faus-Kessler et al., “Feedback inhibition of the general phenylpropanoid and flavonol biosynthetic pathways upon a compromised flavonol-3-O-glycosylation,” Journal of Experimental Botany, vol. 63, no. 7, pp. 2465–2478, 2012.

[41] J. Chen, X. Dong, Q. Li et al., “Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling,” BMC Genomics, vol. 14, article 857, 2013.


Class definition                                                  Number of this class   Percent of this class (%)
General function prediction only                                  6401                   24.84
Posttranslational modification, protein turnover, chaperones      2310                   8.97
Translation, ribosomal structure and biogenesis                   1865                   7.24
Carbohydrate transport and metabolism                             1712                   6.64
Amino acid transport and metabolism                               1427                   5.54
Replication, recombination and repair                             1319                   5.12
Energy production and conversion                                  1152                   4.47
Signal transduction mechanisms                                    1110                   4.31
Transcription                                                     1045                   4.06
Lipid transport and metabolism                                    1021                   3.96
Inorganic ion transport and metabolism                            821                    3.19
Function unknown                                                  809                    3.14
Coenzyme transport and metabolism                                 728                    2.83
Cell wall/membrane/envelope biogenesis                            721                    2.80
Secondary metabolites biosynthesis, transport and catabolism      621                    2.41
Nucleotide transport and metabolism                               588                    2.28
Intracellular trafficking, secretion, and vesicular transport     530                    2.06
Cytoskeleton                                                      392                    1.52
Defense mechanism                                                 344                    1.34
Chromatin structure and dynamics                                  331                    1.28
Cell cycle control, cell division, and chromosome partitioning    273                    1.06
RNA processing and modification                                   211                    0.82
Cell motility                                                     24                     0.09
Nuclear structure                                                 10                     0.04

Figure 3: COG classification assigned to the unigenes (cluster of orthologous groups).
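The percentages in Figure 3 appear to be each class count divided by the sum of all class counts. The following Python sketch reproduces that arithmetic under that assumption; the dictionary name and the helper function are illustrative and are not part of the original analysis.

```python
# COG class counts as listed in Figure 3. The assumption here is that each
# percentage is the class count divided by the sum of all class counts.
COG_COUNTS = {
    "General function prediction only": 6401,
    "Posttranslational modification, protein turnover, chaperones": 2310,
    "Translation, ribosomal structure and biogenesis": 1865,
    "Carbohydrate transport and metabolism": 1712,
    "Amino acid transport and metabolism": 1427,
    "Replication, recombination and repair": 1319,
    "Energy production and conversion": 1152,
    "Signal transduction mechanisms": 1110,
    "Transcription": 1045,
    "Lipid transport and metabolism": 1021,
    "Inorganic ion transport and metabolism": 821,
    "Function unknown": 809,
    "Coenzyme transport and metabolism": 728,
    "Cell wall/membrane/envelope biogenesis": 721,
    "Secondary metabolites biosynthesis, transport and catabolism": 621,
    "Nucleotide transport and metabolism": 588,
    "Intracellular trafficking, secretion, and vesicular transport": 530,
    "Cytoskeleton": 392,
    "Defense mechanism": 344,
    "Chromatin structure and dynamics": 331,
    "Cell cycle control, cell division, and chromosome partitioning": 273,
    "RNA processing and modification": 211,
    "Cell motility": 24,
    "Nuclear structure": 10,
}


def cog_percentages(counts):
    """Return each class's share of all COG assignments, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in cog_percentages(COG_COUNTS).items():
        print(f"{name}: {pct}%")
```

Because a unigene can be assigned to more than one COG class, the denominator here is the number of assignments, which is not necessarily the number of annotated unigenes.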

Figure 4(b)), including “phenylpropanoid biosynthesis” (202 unigenes, 39%), “flavonoid biosynthesis” (55 unigenes, 11%), “tropane, piperidine, and pyridine alkaloid biosynthesis” (53 unigenes, 9%), “streptomycin biosynthesis” (45 unigenes, 9%), “isoquinoline alkaloid biosynthesis” (44 unigenes, 8%), “stilbenoid, diarylheptanoid, and gingerol biosynthesis” (35 unigenes, 7%), “flavone and flavonol biosynthesis” (27 unigenes, 5%), and “butirosin and neomycin biosynthesis” (25 unigenes, 5%). In our sequence dataset, a total of 176 unigenes were found to be potentially related to terpenoid biosynthesis, including the MEP and MVA pathways. In addition, 202 and 55 unigenes were related to the biosynthesis of phenylpropanoids and flavonoids, respectively.

(1) Characterization and Expression Analysis of the Genes Involved in the Biosynthetic Pathway of Putative Terpenoid Backbone. Dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP) are recognized as precursors of the biosynthesis of terpenoid saponins in green plants and as the compounds of “activity of isoprene” in vivo. DMAPP and IPP are isomers, which are synthesized by the MVA and MEP biosynthesis pathways mentioned above [28]. The MVA pathway starts with acetyl-CoA, which is synthesized through the triterpene derivatives of captured CO2 in the cytoplasm, and the MEP pathway starts with pyruvate and glyceraldehyde-3-phosphate in the plastid [29]. In these databases, the biosynthetic pathways of MVA and MEP were identified (Figure 5(a)). In most cases, two or more unique sequences were labeled as the same enzyme, and these unique sequences could represent a single gene or different fragments of a gene.

Table 2: Pathway classification of P. tenuifolia.

Category                                Pathway                                         Count
Metabolism                              Carbohydrate metabolism                         3702
                                        Amino acid metabolism                           2618
                                        Enzyme families                                 1781
                                        Energy metabolism                               1762
                                        Lipid metabolism                                1610
                                        Glycan biosynthesis and metabolism              1390
                                        Nucleotide metabolism                           1055
                                        Metabolism of cofactors and vitamins            879
                                        Metabolism of other amino acids                 716
                                        Metabolism of terpenoids and polyketides        620
                                        Xenobiotics biodegradation and metabolism       562
                                        Biosynthesis of other secondary metabolites     522
Genetic information processing          Folding, sorting, and degradation               4312
                                        Translation                                     4029
                                        Transcription                                   3191
                                        Replication and repair                          3061
Environmental information processing    Signal transduction                             2050
                                        Membrane transport                              342
                                        Signaling molecules and interaction             327
Cellular processes                      Cell growth and death                           1540
                                        Transport and catabolism                        1399
                                        Cell motility                                   507
                                        Cell communication                              473
Human diseases                          Neurodegenerative diseases                      1398
                                        Infectious diseases                             1371
                                        Cancers                                         1146
                                        Immune system diseases                          236
                                        Cardiovascular diseases                         103

The process of the MVA pathway was as follows. First, two molecules of acetyl-CoA were converted into acetoacetyl-CoA by AACT (EC 2.3.1.9, 13 unigenes), which was then converted into 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by HMGS (EC 2.3.3.10, 18 unigenes). MVA was formed by HMGR (EC 1.1.1.34, 18 unigenes), and then MVK (EC 2.7.1.36, 2 unigenes) converted MVA into MVA-5-phosphate. Next, IPP was synthesized by PMK (EC 2.7.4.2, 4 unigenes) and PMD (EC 4.1.1.33, 3 unigenes). The process of the MEP pathway was as follows. First, pyruvate and glyceraldehyde-3-phosphate were converted to 1-deoxy-D-xylulose-5-phosphate (DOXP) by DXS (EC 2.2.1.7, 17 unigenes), which was then converted into MEP by DXR (EC 1.1.1.267, 7 unigenes). Next, MCT (EC 2.7.7.60, 7 unigenes) and CMK (EC 2.7.1.148, 4 unigenes) converted MEP to 4-dicytidine triphosphate-2-methyl-D-erythritol-2-phosphate (CDP-MEP). Lastly, IPP was synthesized by MDS (EC 4.6.1.12, 1 unigene), HDS (EC 1.17.7.1, 9 unigenes), and HDR (EC 1.17.1.2, 12 unigenes). The next step was the synthesis of squalene, in which a series of enzymes were involved, such as IPPI (EC 5.3.3.2, 15 unigenes) and SQS (EC 2.5.1.21, 7 unigenes). However, GPPS (EC 2.5.1.29) and FPPS (EC 2.5.1.10), which are involved in terpenoid biosynthesis, were not found in the transcriptome results, possibly because some assembled cDNAs were not of full length. Squalene was converted into 2,3-oxidosqualene by SE (EC 1.14.99.7, 19 unigenes). In many plants, 2,3-oxidosqualene cyclization is a very important step in the biosynthesis of terpenes because it is a branch point for the synthesis of triterpenoid saponins, sterols, and other terpenes [30]. We found that two types of OSC genes of P. tenuifolia are present in the dataset: cycloartenol synthase (16 unigenes) and beta-amyrin synthase (54 unigenes). Moreover, except for the FPPS, another five nucleotide sequences (SQS, SE, CAS (2), cycloartenol synthase, and beta-amyrin) related to triterpenoid saponin biosynthesis and reported in NCBI were found in our sequencing data.
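The enzyme steps and unigene counts reported above for the MVA and MEP routes to IPP can be held in a small data structure, which makes it easy to total the unigenes per pathway or to spot the least-represented step. The Python sketch below is illustrative only: the variable and function names are assumptions, the EC numbers are written in standard dotted form, and the counts are those stated in the text.

```python
# Each step is (enzyme abbreviation, EC number, unigene count from the text).
# The grouping into two lists follows the order of steps described above.
MVA_PATHWAY = [
    ("AACT", "2.3.1.9", 13),
    ("HMGS", "2.3.3.10", 18),
    ("HMGR", "1.1.1.34", 18),
    ("MVK", "2.7.1.36", 2),
    ("PMK", "2.7.4.2", 4),
    ("PMD", "4.1.1.33", 3),
]

MEP_PATHWAY = [
    ("DXS", "2.2.1.7", 17),
    ("DXR", "1.1.1.267", 7),
    ("MCT", "2.7.7.60", 7),
    ("CMK", "2.7.1.148", 4),
    ("MDS", "4.6.1.12", 1),
    ("HDS", "1.17.7.1", 9),
    ("HDR", "1.17.1.2", 12),
]


def total_unigenes(pathway):
    """Sum the unigene counts over all enzyme steps of a pathway."""
    return sum(count for _, _, count in pathway)


def least_covered_step(pathway):
    """Return the (enzyme, EC, count) step with the fewest annotated unigenes."""
    return min(pathway, key=lambda step: step[2])


if __name__ == "__main__":
    for name, pathway in (("MVA", MVA_PATHWAY), ("MEP", MEP_PATHWAY)):
        enzyme, ec, count = least_covered_step(pathway)
        print(f"{name}: {total_unigenes(pathway)} unigenes in total; "
              f"fewest at {enzyme} (EC {ec}, {count})")
```

Since several unigenes may be fragments of the same gene, these totals count annotated sequences rather than distinct genes.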

(2) Characterization and Expression Analysis of the Genes Involved in the Putative Phenylpropanoid Biosynthesis Pathway. Approximately 814 unigenes encoding 16 enzymes were found in the phenylpropanoid biosynthesis pathway (Figure 5(b)). However, flavonoids and lignins are not the main phenylpropanoid compounds and have rarely been mentioned by researchers in P. tenuifolia. The phenylpropanoid biosynthesis pathway starts with the conversion of phenylalanine to cinnamate by PAL (EC 4.3.1.24, 11 unigenes), and cinnamate is then converted to p-coumaroyl CoA by the enzymes C4H (EC 1.14.13.11, 10 unigenes) and 4CL (EC 6.2.1.12, 25 unigenes).

The flavonoid pathway began with the synthesis of chalcone, which was catalyzed by CHS (EC 2.3.1.74, 1 unigene); chalcone was then converted to naringenin by CHI (EC 5.5.1.6, 3 unigenes). Dihydrotricetin was synthesized by F3′5′H (EC 1.14.13.88, 11 unigenes). F3H (EC 1.14.11.9, 60 unigenes) converted these flavanones to dihydroflavonols. In the last step, dihydroflavonols were converted to flavonols by FLS (EC 1.14.11.23, 4 unigenes).

The lignin pathway began with the synthesis of p-coumaroyl shikimate, which was catalyzed by HCT (EC 2.3.1.133, 4 unigenes). Then C3′H (EC 1.14.13.36) catalyzed the conversion of p-coumaroyl shikimate into caffeoyl shikimate. Caffeoyl shikimate was converted by hydroxycinnamoyl-transferase to produce caffeoyl CoA. CCoAoMT (EC 2.1.1.104) could convert caffeoyl CoA to feruloyl CoA. Then CCR (EC 1.2.1.44, 6 unigenes) converted feruloyl CoA into coniferaldehyde. Finally, coniferaldehyde was converted by CAD (EC 1.1.1.195, 12 unigenes) to produce coniferyl alcohol. As with some enzymes of the terpenoid backbone pathway, C3′H was not found in the transcriptome results.

Figure 4: Classifications of unigenes involved in the metabolism of terpenoids and polyketides (a) and biosynthesis of other secondary metabolites (b). Categories in (a): biosynthesis of ansamycins [PATH: ko01051]; biosynthesis of siderophore group nonribosomal peptides [PATH: ko01053]; brassinosteroid biosynthesis [PATH: ko00905]; carotenoid biosynthesis [PATH: ko00906]; diterpenoid biosynthesis [PATH: ko00904]; geraniol degradation [PATH: ko00281]; limonene and pinene degradation [PATH: ko00903]; polyketide sugar unit biosynthesis [PATH: ko00523]; prenyltransferases [BR: ko01006]; terpenoid backbone biosynthesis [PATH: ko00900]; tetracycline biosynthesis [PATH: ko00253]; zeatin biosynthesis [PATH: ko00908]. Categories in (b): butirosin and neomycin biosynthesis [PATH: ko00524]; caffeine metabolism [PATH: ko00232]; flavone and flavonol biosynthesis [PATH: ko00944]; flavonoid biosynthesis [PATH: ko00941]; isoquinoline alkaloid biosynthesis [PATH: ko00950]; novobiocin biosynthesis [PATH: ko00401]; phenylpropanoid biosynthesis [PATH: ko00940]; stilbenoid, diarylheptanoid, and gingerol biosynthesis [PATH: ko00945]; streptomycin biosynthesis [PATH: ko00521]; tropane, piperidine, and pyridine alkaloid biosynthesis [PATH: ko00960].

Table 3: Frequency of identified SSR motifs in P. tenuifolia.

Motif length       Repeat numbers                                           Total    %
                   5       6       7       8       9       10      >10
Mononucleotide     -       -       -       -       -       1554    1890     3444     51.75
Dinucleotide       -       638     374     333     333     317     125      2120     31.85
Trinucleotide      646     218     117     10      1       2       2        996      14.97
Tetranucleotide    64      12      0       0       0       0       0        76       1.14
Pentanucleotide    2       1       0       0       0       0       0        3        0.045
Hexanucleotide     12      2       2       0       0       0       0        16       0.24
Total              724     871     493     343     334     1873    2017     6655
%                  10.87   13.09   7.41    5.15    5.02    4.79    1.91

3.3. Simple Sequence Repeat (SSR) Marker Discovery. A total of 6655 SSRs in 5784 sequences were identified from 39625 unigenes. Among these sequences, 755 unigene sequences contained more than one SSR. Mononucleotide motifs accounted for 51.75%, dinucleotide motifs for 31.85%, and trinucleotide motifs for 14.97% (Table 3). AT (1539) was the most abundant repeat type, followed by ATGA (207) and GAATCA (69). However, tetranucleotide, pentanucleotide, and hexanucleotide motifs each accounted for <2% of the SSRs in the unigene sequences. In this transcriptome database, not all unigenes contained SSRs, and complex SSR motifs were not detected.
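To make the SSR search concrete, the following Python sketch scans a sequence for perfect repeats of 1 to 6 nucleotide motifs. It is a minimal illustration and not the tool used in this study, which is not named in this section. The minimum repeat numbers (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are an assumption read off the smallest populated repeat-number columns of Table 3, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed minimum repeat numbers per motif length (inferred from Table 3,
# not stated explicitly in the text).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def count_by_motif_length(hits):
    """Tally SSRs by motif length, as in the row totals of Table 3."""
    return Counter(len(motif) for _, motif, _ in hits)


if __name__ == "__main__":
    example = "GG" + "A" * 12 + "CC" + "AT" * 7 + "GGC" + "CTG" * 5 + "TT"
    for start, motif, repeats in find_ssrs(example):
        print(start, motif, repeats)
```

The sketch reports only perfect repeats and ignores compound SSRs, which is consistent with the statement above that complex motifs were not detected but may differ in detail from the original search.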

Figure 5: Schematic of the putative biosynthetic pathways of two major classes of active compounds in P. tenuifolia. Saponin biosynthesis pathway (a) and phenylpropanoid biosynthesis pathway (b). Panel (a) shows the MVA pathway (cytosol), from acetyl CoA through acetoacetyl CoA, HMG-CoA, MVA, MVA-5-phosphate, and MVA-5-diphosphate to IPP, and the MEP/DOXP pathway (plastid), from GA-3P + pyruvate through DOXP, MEP, CDP-ME, CDP-MEP, CME-PP, and HMBPP to IPP; IPP and DMAPP then lead through GPP, FPP, squalene, and 2,3-oxidosqualene to β-amyrin and cycloartenol, which are modified by CYPs and UGTs into triterpenoid saponins and steroidal saponins. Unigene counts shown in panel (a): AACT (13), HMGS (18), HMGR (18), MVK (2), PMK (4), PMD (3), DXS (17), DXR (7), MCT (7), CMK (4), MDS (1), HDS (9), HDR (12), IPPI (15), GPPS (5), SQS (7), SE (19), b-AS (9), and CAS (4). Panel (b) shows the route from phenylalanine through cinnamate and p-coumarate to p-coumaroyl CoA, which branches to flavonol (via chalcone, naringenin, flavanones, and dihydroflavonols) and to lignin (via p-coumaroyl shikimate, caffeoyl shikimate, caffeoyl CoA, feruloyl CoA, coniferaldehyde, and coniferyl alcohol). Unigene counts shown in panel (b): PAL (11), C4H (10), 4CL (25), CHS (1), CHI (3), F3H (60), F3′5′H (11), FLS (4), HCT (4), CCoAoMT (13), CCR (6), and CAD (12). Enzyme abbreviations: AACT, acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase; HMGS, 3-hydroxy-3-methylglutaryl CoA synthase; HMGR, 3-hydroxy-3-methylglutaryl CoA reductase; MVK, mevalonate kinase; PMK, phosphomevalonate kinase; PMD, mevalonate pyrophosphate decarboxylase; DXS, 1-deoxy-D-xylulose-5-phosphate synthase; DXR, 1-deoxy-D-xylulose-5-phosphate reductoisomerase; MCT, 2-C-methyl-erythritol 4-phosphate cytidylyltransferase; CMK, 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; MDS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, 1-hydroxy-2-methyl-butenyl-4-diphosphate synthase; HDR, isopentenyl pyrophosphate (IPP)/3,3-dimethylallyl pyrophosphate (DMAPP) synthase; IPPI, isopentenyl pyrophosphate isomerase; GPPS, geranyl diphosphate synthase; FPPS, farnesyl diphosphate synthase; SQS, squalene synthase; SE, squalene epoxidase; CAS, cycloartenol synthase; β-AS, β-amyrin synthase; DS, dammarenediol synthase; PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; F3′5′H, flavonoid 3′,5′-hydroxylase; F3H, flavanone 3-hydroxylase; FLS, flavonol synthase; HCT, hydroxycinnamoyl-transferase; C3′H, p-coumaroyl shikimate 3′-hydroxylase; CCoAoMT, caffeoyl CoA 3-O-methyltransferase; CCR, cinnamoyl-CoA reductase; CAD, cinnamyl alcohol dehydrogenase; CYPs, cytochrome P450s; UGTs, glycosyltransferases.

4. Discussion

The advantages of high throughput, accuracy, and reproducibility have made transcriptome sequencing a powerful technology [31]. Illumina HiSeq 2000 sequencing has recently been widely used for deep sequencing of model and nonmodel plants, with the biggest output and lowest reagent cost. In this study, the application of Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of the secondary metabolite biosynthetic pathways of P. tenuifolia.

Triterpenoid saponins, especially pentacyclic triterpenoid saponins, are among the main active ingredients of P. tenuifolia, and their biosynthetic pathways are becoming more and more important. The upstream and midstream biosynthetic pathways of triterpenoid saponins have been studied thoroughly in the past few years. Many researchers now mainly focus on the downstream biosynthetic pathways, which are also the difficult part of the studies [17]. In the downstream biosynthesis of natural plant products, hydroxylation by CYP450s and glycosylation by UGTs play important roles in stabilizing the product and altering the bioactivity of triterpenoid saponins with dammarane-type and oleanane-type aglycones [32]. In this study, 466 and 157 unigenes were annotated as CYP450s and UGTs, respectively, in the transcriptome of P. tenuifolia (data not shown). However, because of the quantity and complexity of CYP450s and UGTs, the enzymes of the downstream biosynthetic pathway of P. tenuifolia, which can determine the specific steps in the accumulation of triterpenoid saponins, are still unknown.

CYP450s is a family of enzymes involved in the biosyn-thesis of lignins terpenoids sterols fatty acids hormonespigments and defense-related phytoalexins [33] Only twoCYP450s involved in triterpene saponin biosynthesis arefunctionally characterized until now including CYP88D6from Glycyrrhiza uralensis belonging to the CYP85 family[34] and CYP93E1 from Glycine max belonging to theCYP71 family [35] Recently several CYP450s andUGTswereidentified as candidate genes in many studies of medicinaland nonmedicinal plants One CYP450 (contig00248) wasselected as a candidate gene which is most likely to beinvolved in ginsenoside biosynthesis in Panax quinquefoliusby MeJA-inducible and tissue-specific expression patterns[6] Moreover Pn02132 and Pn00158 closely related toCYP88D6 andCYP93E1 were selected as candidate CYP450sinvolved in triterpene saponin biosynthesis in P notoginseng[5] Furthermore ten unigene sequences were identified cor-responding to the seven different genes with a high homologyto CYP79s and four unigenes were identified correspondingto the two CYP83 genes in core structure biosynthesis ofthe glucosinolate metabolism of radish [36] Our previousstudy once suggested that a combination of ultraperformanceliquid chromatography coupled with electrospray ioniza-tion quadrupole time-of-flight mass spectrometry based onmetabolomics and gene expression analysis can effectivelyelucidate the mechanism of biosynthesis of triterpenoidsaponin and three genes of CYP450s (CYP88D6 CYP716B1and CYP72A1) putatively expressed in biosynthesis pathwayof triterpenoid saponin in P tenuifolia were identified byquantitative real-time PCR (qRT-PCR) analysis [2] Theexamples mentioned above clarified that most identifiedCYP450s belonging to the CYP71 and CYP 88 familiesand few studies were documented about the other CYP450families until now

UGTs are another large multigene family in plants andplay an important role in the last step of biosynthesis of triter-penoid saponin Glycosylation the transfer of activated sac-charides to an aglycone substrate is the predominant mod-ification in triterpenoid saponins biosynthesis RegardingCYP450s genes some UGT genes have also been identifiedThe genes UGT74M1 and UGT73K1 from Saponaria vaccaria[37] and UGT71G1 fromMedicago truncatula [38] have beenpreviously identified Flavonoids related toUGTs (UGT78D1UGT78D2 UGT73C6 and UGT89C1) were also identifiedin previous studies [39 40] Furthermore four UGTs (con-tig01001 contig14976 contig15451 and contig16321) wereselected as candidate genes in P quinquefolius which aremost likely to be involved in ginsenoside biosynthesis byMeJA-inducible and tissue-specific expression patterns [6]Pn13895 closely related to UGT71G1 was regarded as a leadcandidate UGT which is responsible for triterpene biosyn-thesis in P notoginseng [5] Six putative flavonol glycosideswere identified in Isatis indigotica [41] Thirteen unigeneswere identified as UGT74s including UGT74B1 UGT74C1UGT74F1 andUGT74F2 in the core structure biosynthesis ofthe glucosinolate metabolism of radish [36] In our previousstudy mentioned above [2] three genes of UGTs (UGT74B1UGT73B2 and UGT73C6) putatively expressed in triter-penoid saponin biosynthetic pathways were also identifiedby qRT-PCR in P tenuifolia The examples mentioned aboveclarified that most identified UGTs belonged to the UGT74and UGT73 families and few studies have been documentedabout the other UGT families until now

In other words the genes identified in CYP450s andUGTs add little information to the knowledge of downstreamtriterpenoid saponin biosynthesis and thus we will continuewith our in-depth studies The large number of CYP450 andUGTs candidates could provide not only a potential genepool for the identification of special CYP450s and UGTsinvolved in triterpenoid biosynthesis in P tenuifolia but alsoa convenient method to characterize the roles in triterpenoidbiosynthesis in future studies

5 Conclusion

RP one of the predominant Chinese medical plants containsvarious medicinal ingredients with the elicits tonic seda-tive antipsychotic expectorant and other pharmacologicaleffects In this study a large-scale unigene investigation of Ptenuifoliawas performed by Illumina sequencing Our resultsshowed that many transcripts encoded by putative genesare involved in the biosynthesis of triterpene saponins andphenylpropanoid The data we obtained presented the mostabundant genomic resource and provided comprehensiveinformation on gene discovery transcriptome profiling tran-scriptional regulation andmolecular markers of P tenuifoliaThis study will improve the production of active compoundsthrough marker-assisted breeding or genetic engineeringfor P tenuifolia as well as other medicinal plants in thePolygalaceae family However many candidate CYP450s andUGTs are likely involved downstream of the biosyntheticpathway of triterpene saponins but these candidates shouldbe further investigated

10 International Journal of Genomics

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

HonglingTian andXiaoshuangXu equally contributed to thiswork

Acknowledgments

This study was supported by the National Natural SciencesFoundation of China (Grant no 31100244) the ShanxiScience and Technology Development Program (Grant no20140313010-1) the National Science-technology SupportPlan Projects (Grant no 2011BA107B05) the TechnologicalInnovation Projects in Shanxi (Grant no 2013007) and theConstruction Plan for Basic Condition Platform of Shanxi(Grant no 2014091022)

References

[1] Y. Yao, M. Jia, J.-G. Wu et al., “Anxiolytic and sedative-hypnotic activities of polygalasaponins from Polygala tenuifolia in mice,” Pharmaceutical Biology, vol. 48, no. 7, pp. 801–807, 2010.

[2] F. S. Zhang, X. W. Li, Z. Y. Li et al., “UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia,” PLoS ONE, vol. 9, no. 8, Article ID e105765, 2014.

[3] H. Seki, S. Sawai, K. Ohyama et al., “Triterpene functional genomics in licorice for identification of CYP72A154 involved in the biosynthesis of glycyrrhizin,” The Plant Cell, vol. 23, no. 11, pp. 4112–4123, 2011.

[4] S. Chen, H. Luo, Y. Li et al., “454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng,” Plant Cell Reports, vol. 30, no. 9, pp. 1593–1601, 2011.

[5] H. Luo, C. Sun, Y. Sun et al., “Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers,” BMC Genomics, vol. 12, no. 5, article S5, 2011.

[6] C. Sun, Y. Li, Q. Wu et al., “De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis,” BMC Genomics, vol. 11, article 262, 2010.

[7] W. Gao, H.-X. Sun, H. Xiao et al., “Combining metabolomics and transcriptomics to characterize tanshinone biosynthesis in Salvia miltiorrhiza,” BMC Genomics, vol. 15, article 73, 2014.

[8] L. Liu, Y. Li, S. Li et al., “Comparison of next-generation sequencing systems,” Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 251364, 11 pages, 2012.

[9] Y. Yang, M. Xu, Q. Luo, J. Wang, and H. Li, “De novo transcriptome analysis of Liriodendron chinense petals and leaves by Illumina sequencing,” Gene, vol. 534, no. 2, pp. 155–162, 2014.

[10] S. Fowler and M. F. Thomashow, “Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway,” Plant Cell, vol. 14, no. 8, pp. 1675–1690, 2002.

[11] R. Zhai, Y. Feng, H. Wang et al., “Transcriptome analysis of rice root heterosis by RNA-Seq,” BMC Genomics, vol. 14, no. 1, article 19, 2013.

[12] I. Zouari, A. Salvioli, M. Chialva et al., “From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism,” BMC Genomics, vol. 15, article 221, 2014.

[13] S. Kaur, L. W. Pembleton, N. O. I. Cogan et al., “Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers,” BMC Genomics, vol. 13, no. 1, article 104, 2012.

[14] U. Chandrasekaran, W. Xu, and A. Liu, “Transcriptome profiling identifies ABA mediated regulatory changes towards storage filling in developing seeds of castor bean (Ricinus communis L.),” Cell & Bioscience, vol. 4, no. 1, article 33, 2014.

[15] T. Zhang, X. Zhao, W. Wang et al., “Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum,” Plant Molecular Biology, vol. 84, no. 3, pp. 315–327, 2014.

[16] Y. Ikeya, S. Takeda, M. Tunakawa et al., “Cognitive improving and cerebral protective effects of acylated oligosaccharides in Polygala tenuifolia,” Biological and Pharmaceutical Bulletin, vol. 27, no. 7, pp. 1081–1085, 2004.

[17] J. M. Augustin, V. Kuzina, S. B. Andersen, and S. Bak, “Molecular activities, biosynthesis and evolution of triterpenoid saponins,” Phytochemistry, vol. 72, no. 6, pp. 435–457, 2011.

[18] Y. Wang, X. Wang, T.-H. Lee, S. Mansoor, and A. H. Paterson, “Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice),” New Phytologist, vol. 198, no. 1, pp. 274–283, 2013.

[19] V. Tzin and G. Galili, “The biosynthetic pathways for shikimate and aromatic amino acids in Arabidopsis thaliana,” The Arabidopsis Book, vol. 8, Article ID e0132, 2010.

[20] W. Boerjan, J. Ralph, and M. Baucher, “Lignin biosynthesis,” Annual Review of Plant Biology, vol. 54, pp. 519–546, 2003.

[21] J. C. D'Auria and J. Gershenzon, “The secondary metabolism of Arabidopsis thaliana: growing like a weed,” Current Opinion in Plant Biology, vol. 8, no. 3, pp. 308–316, 2005.

[22] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.

[23] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, “A genomic perspective on protein families,” Science, vol. 278, no. 5338, pp. 631–637, 1997.

[24] R. L. Tatusov, N. D. Fedorova, J. D. Jackson et al., “The COG database: an updated version includes eukaryotes,” BMC Bioinformatics, vol. 4, article 41, 2003.

[25] M. Kanehisa, S. Goto, M. Hattori et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, pp. D354–D357, 2006.

[26] E. M. Zdobnov and R. Apweiler, “InterProScan – an integration platform for the signature-recognition methods in InterPro,” Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001.

[27] M. L. Wise and R. Croteau, “Monoterpene biosynthesis,” in Comprehensive Natural Products Chemistry, p. 2, Elsevier, 1998.

[28] C. A. Schuhr, T. Radykewicz, S. Sagner et al., “Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy,” Phytochemistry Reviews, vol. 2, no. 1-2, pp. 3–16, 2003.


[29] W. Eisenreich, M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Zenk, and A. Bacher, “The deoxyxylulose phosphate pathway of terpenoid biosynthesis in plants and microorganisms,” Chemistry and Biology, vol. 5, no. 9, pp. R221–R233, 1998.

[30] J. D. Park, D. K. Rhee, and Y. H. Lee, “Biological activities and chemistry of saponins from Panax ginseng C. A. Meyer,” Phytochemistry Reviews, vol. 4, no. 2-3, pp. 159–175, 2005.

[31] M. Rohmer, “Mevalonate-independent methylerythritol phosphate pathway for isoprenoid biosynthesis: elucidation and distribution,” Pure and Applied Chemistry, vol. 75, no. 2-3, pp. 375–387, 2003.

[32] S. Kalra, B. L. Puniya, D. Kulshreshtha et al., “De novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant Chlorophytum borivilianum,” PLoS ONE, vol. 8, no. 12, Article ID e83336, 2013.

[33] A. H. Meijer, E. Souer, R. Verpoorte, and J. H. C. Hoge, “Isolation of cytochrome P450 cDNA clones from the higher plant Catharanthus roseus by a PCR strategy,” Plant Molecular Biology, vol. 22, no. 2, pp. 379–383, 1993.

[34] H. Seki, K. Ohyama, S. Sawai et al., “Licorice beta-amyrin 11-oxidase, a cytochrome P450 with a key role in the biosynthesis of the triterpene sweetener glycyrrhizin,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 37, pp. 14204–14209, 2008.

[35] M. Shibuya, M. Hoshino, Y. Katsube, H. Hayashi, T. Kushiro, and Y. Ebizuka, “Identification of β-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay,” FEBS Journal, vol. 273, no. 5, pp. 948–959, 2006.

[36] Y. Wang, Y. Pan, Z. Liu et al., “De novo transcriptome sequencing of radish (Raphanus sativus L.) and analysis of major genes involved in glucosinolate metabolism,” BMC Genomics, vol. 14, article 836, 2013.

[37] D. Meesapyodsuk, J. Balsevich, D. W. Reed, and P. S. Covello, “Saponin biosynthesis in Saponaria vaccaria. cDNAs encoding β-amyrin synthase and a triterpene carboxylic acid glucosyltransferase,” Plant Physiology, vol. 143, no. 2, pp. 959–969, 2007.

[38] L. Achnine, D. V. Huhman, M. A. Farag, L. W. Sumner, J. W. Blount, and R. A. Dixon, “Genomics-based selection and functional characterization of triterpene glycosyltransferases from the model legume Medicago truncatula,” Plant Journal, vol. 41, no. 6, pp. 875–887, 2005.

[39] K. Yonekura-Sakakibara, T. Tohge, F. Matsuda et al., “Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis,” Plant Cell, vol. 20, no. 8, pp. 2160–2176, 2008.

[40] R. Yin, B. Messner, T. Faus-Kessler et al., “Feedback inhibition of the general phenylpropanoid and flavonol biosynthetic pathways upon a compromised flavonol-3-O-glycosylation,” Journal of Experimental Botany, vol. 63, no. 7, pp. 2465–2478, 2012.

[41] J. Chen, X. Dong, Q. Li et al., “Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling,” BMC Genomics, vol. 14, article 857, 2013.


Table 2: Pathway classification of P. tenuifolia.

Category                               Pathway                                        Count
Metabolism                             Carbohydrate metabolism                        3702
                                       Amino acid metabolism                          2618
                                       Enzyme families                                1781
                                       Energy metabolism                              1762
                                       Lipid metabolism                               1610
                                       Glycan biosynthesis and metabolism             1390
                                       Nucleotide metabolism                          1055
                                       Metabolism of cofactors and vitamins           879
                                       Metabolism of other amino acids                716
                                       Metabolism of terpenoids and polyketides       620
                                       Xenobiotics biodegradation and metabolism      562
                                       Biosynthesis of other secondary metabolites    522
Genetic information processing         Folding, sorting, and degradation              4312
                                       Translation                                    4029
                                       Transcription                                  3191
                                       Replication and repair                         3061
Environmental information processing   Signal transduction                            2050
                                       Membrane transport                             342
                                       Signaling molecules and interaction            327
Cellular processes                     Cell growth and death                          1540
                                       Transport and catabolism                       1399
                                       Cell motility                                  507
                                       Cell communication                             473
Human diseases                         Neurodegenerative diseases                     1398
                                       Infectious diseases                            1371
                                       Cancers                                        1146
                                       Immune system diseases                         236
                                       Cardiovascular diseases                        103
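The per-pathway counts in Table 2 can be rolled up to the five top-level categories. The following is a minimal sketch, assuming the counts are annotation hits, so a unigene mapped to several pathways is counted more than once and the totals are not numbers of distinct unigenes. The names `TABLE2`, `category_totals`, and `category_shares` are illustrative and do not come from the paper.

```python
# Annotation counts copied from Table 2, grouped by top-level category.
# Order within each list follows the table rows.
TABLE2 = {
    "Metabolism": [3702, 2618, 1781, 1762, 1610, 1390, 1055, 879, 716, 620, 562, 522],
    "Genetic information processing": [4312, 4029, 3191, 3061],
    "Environmental information processing": [2050, 342, 327],
    "Cellular processes": [1540, 1399, 507, 473],
    "Human diseases": [1398, 1371, 1146, 236, 103],
}


def category_totals(table):
    """Sum the pathway counts within each category."""
    return {category: sum(counts) for category, counts in table.items()}


def category_shares(table):
    """Express each category total as a percentage of all annotation hits."""
    totals = category_totals(table)
    grand_total = sum(totals.values())
    return {c: round(100.0 * t / grand_total, 2) for c, t in totals.items()}


if __name__ == "__main__":
    for category, total in category_totals(TABLE2).items():
        print(f"{category}: {total}")
```

Reporting shares rather than raw totals makes it easier to compare categories of very different size, such as metabolism against environmental information processing.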

synthesis of squalene, a series of enzymes were involved, such as IPPI (EC 5.3.3.2, 15 unigenes) and SQS (EC 2.5.1.21, 7 unigenes). However, GPPS (EC 2.5.1.29) and FPPS (EC 2.5.1.10), which are involved in terpenoid biosynthesis, were not found in the transcriptome results, possibly because some assembled cDNAs were not full length. Squalene was converted into 2,3-oxidosqualene by SE (EC 1.14.99.7, 19 unigenes). In many plants, 2,3-oxidosqualene cyclization is a very important step in the biosynthesis of terpenes because it is the branch point for the synthesis of triterpenoid saponins, sterols, and other terpenes [30]. Two types of OSC genes of P. tenuifolia are present in the dataset: cycloartenol synthase (16 unigenes) and beta-amyrin synthase (54 unigenes). Moreover, except for FPPS, another five nucleotide sequences (SQS, SE, CAS (2), cycloartenol synthase, and beta-amyrin) related to triterpenoid saponin biosynthesis and reported in NCBI were found in our sequencing data.

(2) Characterization and Expression Analysis of the Genes Involved in the Putative Phenylpropanoid Biosynthesis Pathway. Approximately 814 unigenes encoding 16 enzymes were found in the phenylpropanoid biosynthesis pathway (Figure 5(b)). However, flavonoids and lignins are not the main phenylpropanoid compounds in P. tenuifolia and have rarely been mentioned by researchers. The phenylpropanoid biosynthesis pathway starts with the conversion of phenylalanine to cinnamate by PAL (EC 4.3.1.24, 11 unigenes), and cinnamate is then converted to p-coumaroyl CoA by the enzymes C4H (EC 1.14.13.11, 10 unigenes) and 4CL (EC 6.2.1.12, 25 unigenes).

The flavonoid pathway began with the synthesis of chalcone, which was catalyzed by CHS (EC 2.3.1.74, 1 unigene); chalcone was then converted to naringenin by CHI (EC 5.5.1.6, 3 unigenes). Dihydrotricetin was synthesized by F3′5′H (EC 1.14.13.88, 11 unigenes). F3H (EC 1.14.11.9, 60 unigenes) catalyzed the conversion of these flavanones to dihydroflavonols. In the last step, dihydroflavonols were converted to flavonols by FLS (EC 1.14.11.23, 4 unigenes).

The lignin pathway began with the synthesis of p-coumaroyl shikimate, which was catalyzed by HCT (EC 2.3.1.133, 4 unigenes). Then C3′H (EC 1.14.13.36) catalyzed the conversion of p-coumaroyl shikimate into caffeoyl shikimate. Caffeoyl shikimate was converted by hydroxycinnamoyl-transferase to produce caffeoyl CoA. CCoAoMT (EC 2.1.1.104) could convert caffeoyl CoA to feruloyl CoA. Then CCR (EC 1.2.1.44, 6 unigenes) catalyzed the conversion of feruloyl


Figure 4: Classifications of unigenes involved in the metabolism of terpenoids and polyketides (a) and biosynthesis of other secondary metabolites (b).

Panel (a) categories: biosynthesis of ansamycins [PATH: ko01051]; biosynthesis of siderophore group nonribosomal peptides [PATH: ko01053]; brassinosteroid biosynthesis [PATH: ko00905]; carotenoid biosynthesis [PATH: ko00906]; diterpenoid biosynthesis [PATH: ko00904]; geraniol degradation [PATH: ko00281]; limonene and pinene degradation [PATH: ko00903]; polyketide sugar unit biosynthesis [PATH: ko00523]; prenyltransferases [BR: ko01006]; terpenoid backbone biosynthesis [PATH: ko00900]; tetracycline biosynthesis [PATH: ko00253]; zeatin biosynthesis [PATH: ko00908].

Panel (b) categories: butirosin and neomycin biosynthesis [PATH: ko00524]; caffeine metabolism [PATH: ko00232]; flavone and flavonol biosynthesis [PATH: ko00944]; flavonoid biosynthesis [PATH: ko00941]; isoquinoline alkaloid biosynthesis [PATH: ko00950]; novobiocin biosynthesis [PATH: ko00401]; phenylpropanoid biosynthesis [PATH: ko00940]; stilbenoid, diarylheptanoid, and gingerol biosynthesis [PATH: ko00945]; streptomycin biosynthesis [PATH: ko00521]; tropane, piperidine, and pyridine alkaloid biosynthesis [PATH: ko00960].

Table 3: Frequency of identified SSR motifs in P. tenuifolia.

                   Repeat numbers
Motif length       5       6       7       8       9       10      >10     Total   %
Mononucleotide     -       -       -       -       -       1554    1890    3444    51.75
Dinucleotide       -       638     374     333     333     317     125     2120    31.85
Trinucleotide      646     218     117     10      1       2       2       996     14.97
Tetranucleotide    64      12      0       0       0       0       0       76      1.14
Pentanucleotide    2       1       0       0       0       0       0       3       0.045
Hexanucleotide     12      2       2       0       0       0       0       16      0.24
Total              724     871     493     343     334     1873    2017    6655
%                  10.87   13.09   7.41    5.15    5.02    4.79    1.91

CoA into coniferaldehyde. Finally, coniferaldehyde was converted by CAD (EC 1.1.1.195, 12 unigenes) to produce coniferyl alcohol. As with some enzymes of the terpenoid backbone pathway, C3′H was not found in the transcriptome results.
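The lignin branch described above can be written down as an ordered list of enzyme steps, which makes it easy to flag steps with no supporting unigene. This is a minimal sketch and not the authors' analysis: the unigene counts are the ones quoted in the text and in the Figure 5(b) labels, the starting substrate of the first HCT step (p-coumaroyl CoA) is an assumption taken from the figure layout, and the names `LIGNIN_STEPS`, `missing_steps`, and `is_connected` are hypothetical.

```python
# Each step: (enzyme, substrate, product, unigene count).
# None marks an enzyme reported as not found in the transcriptome.
LIGNIN_STEPS = [
    ("HCT", "p-coumaroyl CoA", "p-coumaroyl shikimate", 4),   # substrate assumed
    ("C3'H", "p-coumaroyl shikimate", "caffeoyl shikimate", None),
    ("HCT", "caffeoyl shikimate", "caffeoyl CoA", 4),
    ("CCoAoMT", "caffeoyl CoA", "feruloyl CoA", 13),          # count from figure label
    ("CCR", "feruloyl CoA", "coniferaldehyde", 6),
    ("CAD", "coniferaldehyde", "coniferyl alcohol", 12),
]


def missing_steps(steps):
    """Return enzymes that have no unigene assigned."""
    return [enzyme for enzyme, _, _, count in steps if not count]


def is_connected(steps):
    """Check that each step's product is the next step's substrate."""
    return all(steps[i][2] == steps[i + 1][1] for i in range(len(steps) - 1))


if __name__ == "__main__":
    print("Steps without unigenes:", missing_steps(LIGNIN_STEPS))
    print("Pathway is contiguous:", is_connected(LIGNIN_STEPS))
```

Keeping substrate and product on every step lets the same structure check for gaps in the chain as well as for missing enzymes.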

3.3. Simple Sequence Repeat (SSR) Marker Discovery. A total of 6655 SSRs in 5784 sequences were identified from 39625 unigenes. Among these sequences, 755 unigene sequences contained more than one SSR. Mononucleotide motifs accounted for 51.75%, dinucleotide motifs for 31.85%, and trinucleotide motifs for 14.97% of the SSRs (Table 3). AT (1539) was the most abundant repeat type, followed by ATGA (207) and GAATCA (69). However, tetranucleotide, pentanucleotide, and hexanucleotide motifs each accounted for <2% of the SSRs in the unigene sequences. In this transcriptome database, not all unigenes contained SSRs, and complex SSR motifs were not detected.
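The kind of perfect-repeat search summarized above can be illustrated with a short regular-expression scan. This is a minimal sketch, not the tool used in the study, which is not named in this section. The minimum repeat numbers in `MIN_REPEATS` are an assumption read off the smallest populated columns of Table 3 (10 for mononucleotides, 6 for dinucleotides, 5 for longer motifs), and `find_ssrs` and `is_redundant_motif` are hypothetical names. Compound SSRs are not handled.

```python
import re

# Assumed minimum number of repeats for each motif length (inferred from Table 3).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_redundant_motif(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_redundant_motif(motif):
                continue  # already reported under the shorter motif length
            repeats = (match.end() - match.start()) // motif_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    example = "GC" + "AT" * 7 + "GC" + "A" * 12 + "GC" + "CTT" * 5 + "G"
    print(find_ssrs(example))
    # prints [(2, 'AT', 7), (18, 'A', 12), (32, 'CTT', 5)]
```

Tallying the returned tuples by motif length and repeat count would give a frequency table with the same layout as Table 3.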

4. Discussions

The advantages of high throughput, accuracy, and reproducibility have made transcriptome sequencing a powerful technology [31]. Illumina HiSeq 2000 sequencing has been widely


Panel (a), MVA pathway [cytosol] and MEP/DOXP pathway [plastid], enzyme labels with unigene counts: AACT (13), HMGS (18), HMGR (18), MVK (2), PMK (4), PMD (3), DXS (17), DXR (7), MCT (7), CMK (4), MDS (1), HDS (9), HDR (12), IPPI (15), GPPS (5), FPPS, SQS (7), SE (19), β-AS (9), CAS (4), CYPs and UGTs.

Panel (b), enzyme labels with unigene counts: PAL (11), C4H (10), 4CL (25), HCT (4), C3′H, CCoAoMT (13), CCR (6), CAD (12), CHS (1), CHI (3), F3′5′H (11), F3H (60), FLS (4), UGT.

Figure 5: Schematic of the putative biosynthetic pathways of two major classes of active compounds in P. tenuifolia. Saponin biosynthesis pathway (a) and phenylpropanoid biosynthesis pathway (b). Enzyme abbreviations: AACT: acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase; HMGS: 3-hydroxy-3-methylglutaryl CoA synthase; HMGR: 3-hydroxy-3-methylglutaryl CoA reductase; MVK: mevalonate kinase; PMK: phosphomevalonate kinase; PMD: mevalonate pyrophosphate decarboxylase; DXS: 1-deoxy-D-xylulose-5-phosphate synthase; DXR: 1-deoxy-D-xylulose-5-phosphate reductoisomerase; MCT: 2-C-methyl-erythritol 4-phosphate cytidylyltransferase; CMK: 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; MDS: 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS: 1-hydroxy-2-methyl-butenyl-4-diphosphate synthase; HDR: isopentenyl pyrophosphate (IPP)/3,3-dimethylallyl pyrophosphate (DMAPP) synthase; IPPI: isopentenyl pyrophosphate isomerase; GPPS: geranyl diphosphate synthase; FPPS: farnesyl diphosphate synthase; SQS: squalene synthase; SE: squalene epoxidase; CAS: cycloartenol synthase; β-AS: β-amyrin synthase; DS: dammarenediol synthase; PAL: phenylalanine ammonia lyase; C4H: cinnamate 4-hydroxylase; 4CL: 4-coumarate CoA ligase; CHS: chalcone synthase; CHI: chalcone isomerase; F3′5′H: flavonoid 3′,5′-hydroxylase; F3H: flavanone 3-hydroxylase; FLS: flavonol synthase; HCT: hydroxycinnamoyl-transferase; C3′H: p-coumaroyl shikimate 3′-hydroxylase; CCoAoMT: caffeoyl CoA 3-O-methyltransferase; CCR: cinnamoyl-CoA reductase; CAD: cinnamyl alcohol dehydrogenase; CYPs: cytochrome P450s; UGTs: glycosyltransferases.


used for deep sequencing of model and nonmodel plants recently, with the biggest output and lowest reagent cost. In this study, the application of Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of secondary metabolite biosynthetic pathways of P. tenuifolia.

Triterpenoid saponins, especially pentacyclic triterpenoid saponins, are one of the main active ingredients of P. tenuifolia, and their biosynthetic pathways are becoming increasingly important. The upstream and midstream biosynthetic pathways of triterpenoid saponins have been clearly elucidated in the past few years. Many researchers now mainly focus on the downstream biosynthetic pathways, which are also the most difficult part to study [17]. In the downstream biosynthesis of natural plant products, hydroxylation by CYP450s and glycosylation by UGTs play important roles in stabilizing the product and altering the bioactivity of triterpenoid saponins with dammarane-type and oleanane-type aglycones [32]. In this study, 466 and 157 unigenes were annotated as CYP450s and UGTs, respectively, in the transcriptome of P. tenuifolia (data not shown). However, because of the quantity and complexity of CYP450s and UGTs, the enzymes of the downstream biosynthetic pathway of P. tenuifolia, which determine the specific steps in the accumulation of triterpenoid saponins, are still unknown.
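Counting unigenes by annotated enzyme family, as in the CYP450 and UGT tallies above, can be sketched as a keyword match over annotation descriptions. This is an illustrative assumption only: the paper does not state how annotations were matched, the patterns in `FAMILY_PATTERNS` are guesses at typical description text, and the unigene identifiers and descriptions in `example` are invented for the demonstration.

```python
import re
from collections import Counter

# Hypothetical keyword patterns for the two enzyme families.
FAMILY_PATTERNS = {
    "CYP450": re.compile(r"cytochrome\s+P450|\bCYP\d+", re.IGNORECASE),
    "UGT": re.compile(r"glycosyltransferase|glucosyltransferase|\bUGT\d+", re.IGNORECASE),
}


def count_families(annotations):
    """Count unigenes whose annotation description matches each family pattern."""
    counts = Counter()
    for description in annotations.values():
        for family, pattern in FAMILY_PATTERNS.items():
            if pattern.search(description):
                counts[family] += 1
    return counts


if __name__ == "__main__":
    # Invented identifiers and descriptions, for illustration only.
    example = {
        "Unigene0001": "cytochrome P450 CYP88D6 [Glycyrrhiza uralensis]",
        "Unigene0002": "UDP-glycosyltransferase 73C6",
        "Unigene0003": "beta-amyrin synthase",
        "Unigene0004": "CYP716B1-like protein",
    }
    print(dict(count_families(example)))
```

A keyword filter of this kind only shortlists candidates; assigning a unigene to a specific downstream step still needs expression or functional evidence, as the discussion notes.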

CYP450s are a family of enzymes involved in the biosynthesis of lignins, terpenoids, sterols, fatty acids, hormones, pigments, and defense-related phytoalexins [33]. Only two CYP450s involved in triterpene saponin biosynthesis have been functionally characterized until now: CYP88D6 from Glycyrrhiza uralensis, belonging to the CYP85 family [34], and CYP93E1 from Glycine max, belonging to the CYP71 family [35]. Recently, several CYP450s and UGTs were identified as candidate genes in many studies of medicinal and nonmedicinal plants. One CYP450 (contig00248) was selected as the candidate gene most likely to be involved in ginsenoside biosynthesis in Panax quinquefolius on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Moreover, Pn02132 and Pn00158, closely related to CYP88D6 and CYP93E1, were selected as candidate CYP450s involved in triterpene saponin biosynthesis in P. notoginseng [5]. Furthermore, ten unigene sequences corresponding to seven different genes with high homology to CYP79s and four unigenes corresponding to two CYP83 genes were identified in the core structure biosynthesis of glucosinolate metabolism in radish [36]. Our previous study suggested that a combination of metabolomics based on ultraperformance liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry and gene expression analysis can effectively elucidate the mechanism of triterpenoid saponin biosynthesis; three CYP450 genes (CYP88D6, CYP716B1, and CYP72A1) putatively expressed in the triterpenoid saponin biosynthesis pathway of P. tenuifolia were identified by quantitative real-time PCR (qRT-PCR) analysis [2]. The examples mentioned above show that most identified CYP450s belong to the CYP71 and CYP88 families, and few studies have documented the other CYP450 families until now.

UGTs are another large multigene family in plants and play an important role in the last step of triterpenoid saponin biosynthesis. Glycosylation, the transfer of activated saccharides to an aglycone substrate, is the predominant modification in triterpenoid saponin biosynthesis. As with CYP450 genes, some UGT genes have also been identified. The genes UGT74M1 and UGT73K1 from Saponaria vaccaria [37] and UGT71G1 from Medicago truncatula [38] have been previously identified. Flavonoid-related UGTs (UGT78D1, UGT78D2, UGT73C6, and UGT89C1) were also identified in previous studies [39, 40]. Furthermore, four UGTs (contig01001, contig14976, contig15451, and contig16321) were selected as the candidate genes in P. quinquefolius most likely to be involved in ginsenoside biosynthesis on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Pn13895, closely related to UGT71G1, was regarded as a lead candidate UGT responsible for triterpene biosynthesis in P. notoginseng [5]. Six putative flavonol glycosides were identified in Isatis indigotica [41]. Thirteen unigenes were identified as UGT74s, including UGT74B1, UGT74C1, UGT74F1, and UGT74F2, in the core structure biosynthesis of glucosinolate metabolism in radish [36]. In our previous study mentioned above [2], three UGT genes (UGT74B1, UGT73B2, and UGT73C6) putatively expressed in triterpenoid saponin biosynthetic pathways were also identified by qRT-PCR in P. tenuifolia. The examples mentioned above show that most identified UGTs belong to the UGT74 and UGT73 families, and few studies have documented the other UGT families until now.


Figure 4: Classifications of unigenes involved in the metabolism of terpenoids and polyketides (a) and biosynthesis of other secondary metabolites (b).

Categories in panel (a): Biosynthesis of ansamycins [PATH: ko01051]; Biosynthesis of siderophore group nonribosomal peptides [PATH: ko01053]; Brassinosteroid biosynthesis [PATH: ko00905]; Carotenoid biosynthesis [PATH: ko00906]; Diterpenoid biosynthesis [PATH: ko00904]; Geraniol degradation [PATH: ko00281]; Limonene and pinene degradation [PATH: ko00903]; Polyketide sugar unit biosynthesis [PATH: ko00523]; Prenyltransferases [BR: ko01006]; Terpenoid backbone biosynthesis [PATH: ko00900]; Tetracycline biosynthesis [PATH: ko00253]; Zeatin biosynthesis [PATH: ko00908].

Categories in panel (b): Butirosin and neomycin biosynthesis [PATH: ko00524]; Caffeine metabolism [PATH: ko00232]; Flavone and flavonol biosynthesis [PATH: ko00944]; Flavonoid biosynthesis [PATH: ko00941]; Isoquinoline alkaloid biosynthesis [PATH: ko00950]; Novobiocin biosynthesis [PATH: ko00401]; Phenylpropanoid biosynthesis [PATH: ko00940]; Stilbenoid, diarylheptanoid, and gingerol biosynthesis [PATH: ko00945]; Streptomycin biosynthesis [PATH: ko00521]; Tropane, piperidine, and pyridine alkaloid biosynthesis [PATH: ko00960].

Table 3: Frequency of identified SSR motifs in P. tenuifolia.

Motif length       Repeat numbers                                        Total   %
                   5      6      7      8      9      10     >10
Mononucleotide     -      -      -      -      -      1554   1890    3444    51.75
Dinucleotide       -      638    374    333    333    317    125     2120    31.85
Trinucleotide      646    218    117    10     1      2      2       996     14.97
Tetranucleotide    64     12     0      0      0      0      0       76      1.14
Pentanucleotide    2      1      0      0      0      0      0       3       0.045
Hexanucleotide     12     2      2      0      0      0      0       16      0.24
Total              724    871    493    343    334    1873   2017    6655
%                  10.87  13.09  7.41   5.15   5.02   4.79   1.91
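As a quick arithmetic check on Table 3, the following Python sketch recomputes the row totals and the percentage share of each motif class from the per-repeat-number counts. The dictionary layout and function names are illustrative assumptions, not part of the original analysis; cells shown as dashes in the table are treated as zero.

```python
# Counts per motif class from Table 3, ordered by repeat number 5, 6, 7, 8, 9, 10, >10.
# None marks the cells shown as dashes in the table (treated as zero here).
TABLE3 = {
    "Mononucleotide":  [None, None, None, None, None, 1554, 1890],
    "Dinucleotide":    [None, 638, 374, 333, 333, 317, 125],
    "Trinucleotide":   [646, 218, 117, 10, 1, 2, 2],
    "Tetranucleotide": [64, 12, 0, 0, 0, 0, 0],
    "Pentanucleotide": [2, 1, 0, 0, 0, 0, 0],
    "Hexanucleotide":  [12, 2, 2, 0, 0, 0, 0],
}


def row_totals(table):
    """Sum the counts of each motif class, treating missing cells as zero."""
    return {motif: sum(c or 0 for c in counts) for motif, counts in table.items()}


def percentages(totals):
    """Express each motif-class total as a percentage of all SSRs."""
    grand_total = sum(totals.values())
    return {motif: 100.0 * n / grand_total for motif, n in totals.items()}


totals = row_totals(TABLE3)
shares = percentages(totals)
grand_total = sum(totals.values())

for motif in TABLE3:
    print(f"{motif:<16}{totals[motif]:>6}{shares[motif]:>8.2f}%")
print(f"{'Total':<16}{grand_total:>6}")
```

The recomputed totals should match the Total column, and the shares should agree with the percentage column to within rounding in the last digit.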

CoA into coniferaldehyde. Finally, coniferaldehyde was converted by CAD (EC 1.1.1.195; 12 unigenes) to produce coniferyl alcohol. Similar to the situation in terpenoid backbone biosynthesis, C3′H was not found in the transcriptome results.

3.3. Simple Sequence Repeat (SSR) Marker Discovery. A total of 6655 SSRs were identified in 5784 sequences out of 39625 unigenes. Of these sequences, 755 unigene sequences contained more than one SSR. Mononucleotide motifs accounted for 51.75% of the SSRs, dinucleotide motifs for 31.85%, and trinucleotide motifs for 14.97% (Table 3). AT (1539) was the most abundant repeat type, followed by AT/GA (207) and GAA/TCA (69). However, tetranucleotide, pentanucleotide, and hexanucleotide motifs each accounted for <2% of the SSRs in the unigene sequences. In this transcriptome database, not all unigenes contained SSRs, and no complex SSR motifs were detected.
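To illustrate how perfect SSRs of the kind counted in Table 3 can be located in a unigene sequence, here is a minimal Python sketch. It is not the authors' pipeline: the SSR-detection tool is not named in this section, and the minimum repeat thresholds below are an assumption inferred from the smallest populated repeat-number column for each motif class in Table 3 (10 for mononucleotides, 6 for dinucleotides, 5 for longer motifs). The function names and the demo sequence are hypothetical.

```python
from collections import Counter

# Minimum number of repeats per motif length (assumed from Table 3, see lead-in).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return a list of (start, motif, repeat_count) for perfect SSRs in seq.

    The scan moves left to right, tries the shortest motif first at each
    position, and skips past every SSR it reports, so hits never overlap.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        hit = None
        for k in sorted(MIN_REPEATS):
            motif = seq[i:i + k]
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            # Count how many times the motif is repeated back to back.
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= MIN_REPEATS[k]:
                hit = (i, motif, reps)
                break
        if hit is None:
            i += 1
        else:
            hits.append(hit)
            i += len(hit[1]) * hit[2]
    return hits


# Hypothetical demo sequence containing one mono-, one di-, and one trinucleotide SSR.
demo = "GGC" + "A" * 12 + "CGTC" + "AG" * 7 + "TTGC" + "GAA" * 5 + "CT"
ssrs = find_ssrs(demo)
by_motif_length = Counter(len(motif) for _, motif, _ in ssrs)
for start, motif, reps in ssrs:
    print(f"start={start} motif={motif} repeats={reps}")
```

Running `find_ssrs` over every unigene and tallying the hits by motif length and repeat number would give a table with the same shape as Table 3, although the exact counts depend on the thresholds and on how overlapping or compound repeats are handled.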

4. Discussion

The advantages of high throughput, accuracy, and reproducibility have made transcriptome sequencing a powerful technology [31]. The Illumina HiSeq 2000 has recently been widely used for deep sequencing of model and nonmodel plants, offering the largest output and the lowest reagent cost. In this study, the application of the Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of the secondary metabolite biosynthetic pathways of P. tenuifolia.

Figure 5: Schematic of the putative biosynthetic pathways of two major classes of active compounds in P. tenuifolia. Saponin biosynthesis pathway (a) and phenylpropanoid biosynthesis pathway (b). Enzyme abbreviations: AACT, acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase; HMGS, 3-hydroxy-3-methylglutaryl CoA synthase; HMGR, 3-hydroxy-3-methylglutaryl CoA reductase; MVK, mevalonate kinase; PMK, phosphomevalonate kinase; PMD, mevalonate pyrophosphate decarboxylase; DXS, 1-deoxy-D-xylulose-5-phosphate synthase; DXR, 1-deoxy-D-xylulose-5-phosphate reductoisomerase; MCT, 2-C-methyl-erythritol 4-phosphate cytidylyltransferase; CMK, 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; MDS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, 1-hydroxy-2-methyl-butenyl-4-diphosphate synthase; HDR, isopentenyl pyrophosphate (IPP)/3,3-dimethylallyl pyrophosphate (DMAPP) synthase; IPPI, isopentenyl pyrophosphate isomerase; GPPS, geranyl diphosphate synthase; FPPS, farnesyl diphosphate synthase; SQS, squalene synthase; SE, squalene epoxidase; CAS, cycloartenol synthase; β-AS, β-amyrin synthase; DS, dammarenediol synthase; PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; F3′5′H, flavonoid 3′,5′-hydroxylase; F3H, flavanone 3-hydroxylase; FLS, flavonol synthase; HCT, hydroxycinnamoyl-transferase; C3′H, p-coumaroyl shikimate 3′-hydroxylase; CCoAoMT, caffeoyl CoA 3-O-methyltransferase; CCR, cinnamoyl-CoA reductase; CAD, cinnamyl alcohol dehydrogenase; CYPs, cytochrome P450s; UGTs, glycosyltransferases.

Panel (a), intermediates shown. MVA pathway [cytosol]: acetyl CoA, acetoacetyl CoA, HMG-CoA, MVA, MVA-5-phosphate, MVA-5-diphosphate, IPP. MEP/DOXP pathway [plastid]: GA-3P + pyruvate, DOXP, MEP, CDP-ME, CDP-MEP, CME-PP, HMBPP, IPP, DMAPP. Common steps: GPP, FPP, squalene, 2,3-oxidosqualene, then β-amyrin (leading to triterpenoid saponins via CYPs and UGTs) and cycloartenol (leading to steroidal saponins via CYPs and UGTs). Enzyme labels, with the number of unigenes in parentheses: AACT (13), HMGS (18), HMGR (18), MVK (2), PMK (4), PMD (3), DXS (17), DXR (7), MCT (7), CMK (4), MDS (1), HDS (9), HDR (12), IPPI (15), GPPS (5), SQS (7), SE (19), β-AS (9), CAS (4); FPPS is shown without a count.

Panel (b), intermediates shown: phenylalanine, cinnamate, p-coumarate, p-coumaroyl CoA; lignin branch: p-coumaroyl shikimate, caffeoyl shikimate, caffeoyl CoA, feruloyl CoA, coniferaldehyde, coniferyl alcohol, lignin; flavonoid branch: chalcone, naringenin, flavanones, dihydroflavonols, flavonol. Enzyme labels, with the number of unigenes in parentheses: PAL (11), C4H (10), 4CL (25), HCT (4), CCoAoMT (13), CCR (6), CAD (12), CHS (1), CHI (3), F3H (60), F3′5′H (11), FLS (4), UGT; C3′H is shown without a count.

Triterpenoid saponins, especially pentacyclic triterpenoid saponins, are among the main active ingredients of P. tenuifolia, and their biosynthetic pathways are attracting increasing attention. The upstream and midstream biosynthetic pathways of triterpenoid saponins have been well characterized in the past few years. Many researchers now focus mainly on the downstream biosynthetic pathways, which are also the most difficult part to study [17]. In the downstream steps of natural plant product biosynthesis, hydroxylation by CYP450s and glycosylation by UGTs play important roles in stabilizing the product and altering the bioactivity of triterpenoid saponins with dammarane-type and oleanane-type aglycones [32]. In this study, 466 and 157 unigenes were annotated as CYP450s and UGTs, respectively, in the transcriptome of P. tenuifolia (data not shown). However, because of the number and complexity of CYP450s and UGTs, the enzymes of the downstream biosynthetic pathway in P. tenuifolia, which may determine the specific steps in the accumulation of triterpenoid saponins, are still unknown.

CYP450s are a family of enzymes involved in the biosynthesis of lignins, terpenoids, sterols, fatty acids, hormones, pigments, and defense-related phytoalexins [33]. Only two CYP450s involved in triterpene saponin biosynthesis have been functionally characterized to date: CYP88D6 from Glycyrrhiza uralensis, belonging to the CYP85 clan [34], and CYP93E1 from Glycine max, belonging to the CYP71 clan [35]. Recently, several CYP450s and UGTs were identified as candidate genes in studies of medicinal and nonmedicinal plants. One CYP450 (contig00248) was selected as the candidate gene most likely to be involved in ginsenoside biosynthesis in Panax quinquefolius on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Moreover, Pn02132 and Pn00158, closely related to CYP88D6 and CYP93E1, were selected as candidate CYP450s involved in triterpene saponin biosynthesis in P. notoginseng [5]. Furthermore, ten unigene sequences corresponding to seven different genes with high homology to CYP79s, and four unigenes corresponding to two CYP83 genes, were identified in the core structure biosynthesis of glucosinolate metabolism in radish [36]. Our previous study suggested that combining metabolomics based on ultraperformance liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry with gene expression analysis can effectively elucidate the mechanism of triterpenoid saponin biosynthesis; in that study, three CYP450 genes (CYP88D6, CYP716B1, and CYP72A1) putatively expressed in the triterpenoid saponin biosynthetic pathway of P. tenuifolia were identified by quantitative real-time PCR (qRT-PCR) analysis [2]. These examples show that most identified CYP450s belong to the CYP71 and CYP88 families and that few studies of the other CYP450 families have been documented to date.

UGTs are another large multigene family in plants and play an important role in the last step of triterpenoid saponin biosynthesis. Glycosylation, the transfer of activated saccharides to an aglycone substrate, is the predominant modification in triterpenoid saponin biosynthesis. As with CYP450 genes, some UGT genes have also been identified. The genes UGT74M1 and UGT73K1 from Saponaria vaccaria [37] and UGT71G1 from Medicago truncatula [38] have been previously identified. Flavonoid-related UGTs (UGT78D1, UGT78D2, UGT73C6, and UGT89C1) were also identified in previous studies [39, 40]. Furthermore, four UGTs (contig01001, contig14976, contig15451, and contig16321) were selected in P. quinquefolius as the candidate genes most likely to be involved in ginsenoside biosynthesis on the basis of MeJA-inducible and tissue-specific expression patterns [6]. Pn13895, closely related to UGT71G1, was regarded as a lead candidate UGT responsible for triterpene biosynthesis in P. notoginseng [5]. Six putative flavonol glycosides were identified in Isatis indigotica [41]. Thirteen unigenes were identified as UGT74s, including UGT74B1, UGT74C1, UGT74F1, and UGT74F2, in the core structure biosynthesis of glucosinolate metabolism in radish [36]. In our previous study mentioned above [2], three UGT genes (UGT74B1, UGT73B2, and UGT73C6) putatively expressed in triterpenoid saponin biosynthetic pathways were also identified by qRT-PCR in P. tenuifolia. These examples show that most identified UGTs belong to the UGT74 and UGT73 families and that few studies of the other UGT families have been documented to date.
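The family-level summary at the end of this paragraph can be reproduced with a simple tally of the UGT names listed above. The Python sketch below is only an illustration: the name list is transcribed from this paragraph, the parsing rule assumes the usual naming pattern of prefix, family number, subfamily letter, and member number (for example, UGT74B1 is family UGT74, subfamily B, member 1), and the function names are hypothetical.

```python
import re
from collections import Counter

# UGT names mentioned in the paragraph above, grouped by the study that reported them.
UGT_NAMES = [
    "UGT74M1", "UGT73K1",                         # Saponaria vaccaria [37]
    "UGT71G1",                                    # Medicago truncatula [38]
    "UGT78D1", "UGT78D2", "UGT73C6", "UGT89C1",   # flavonoid-related UGTs [39, 40]
    "UGT74B1", "UGT74C1", "UGT74F1", "UGT74F2",   # radish glucosinolate metabolism [36]
    "UGT74B1", "UGT73B2", "UGT73C6",              # P. tenuifolia qRT-PCR study [2]
]

# Prefix, family number, subfamily letter(s), member number.
NAME_PATTERN = re.compile(r"^(UGT|CYP)(\d+)([A-Z]+)(\d+)$")


def family_of(name):
    """Return the family part of a gene name, e.g. 'UGT74B1' -> 'UGT74'."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized gene name: {name!r}")
    prefix, family, _subfamily, _member = match.groups()
    return f"{prefix}{family}"


def tally_families(names):
    """Count distinct gene names per family (duplicate names are counted once)."""
    return Counter(family_of(name) for name in sorted(set(names)))


ugt_families = tally_families(UGT_NAMES)
for family, count in ugt_families.most_common():
    print(f"{family}: {count}")
```

The same function accepts CYP450 names such as CYP88D6 or CYP716B1, so the corresponding tally for the preceding paragraph can be produced in the same way.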

In other words, the CYP450 and UGT genes identified so far add little to the knowledge of downstream triterpenoid saponin biosynthesis, and thus we will continue with our in-depth studies. The large number of CYP450 and UGT candidates could provide not only a potential gene pool for identifying the specific CYP450s and UGTs involved in triterpenoid biosynthesis in P. tenuifolia but also a convenient means of characterizing their roles in triterpenoid biosynthesis in future studies.

5. Conclusion

RP, one of the predominant Chinese medicinal plants, contains various medicinal ingredients that elicit tonic, sedative, antipsychotic, expectorant, and other pharmacological effects. In this study, a large-scale unigene investigation of P. tenuifolia was performed by Illumina sequencing. Our results showed that many transcripts encoded by putative genes are involved in the biosynthesis of triterpene saponins and phenylpropanoids. The data we obtained represent the most abundant genomic resource for P. tenuifolia and provide comprehensive information on gene discovery, transcriptome profiling, transcriptional regulation, and molecular markers. This study may help improve the production of active compounds through marker-assisted breeding or genetic engineering for P. tenuifolia as well as other medicinal plants in the Polygalaceae family. However, many candidate CYP450s and UGTs are likely involved in the downstream biosynthetic pathway of triterpene saponins, and these candidates should be further investigated.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors’ Contribution

Hongling Tian and Xiaoshuang Xu contributed equally to this work.

Acknowledgments

This study was supported by the National Natural Sciences Foundation of China (Grant no. 31100244), the Shanxi Science and Technology Development Program (Grant no. 20140313010-1), the National Science-technology Support Plan Projects (Grant no. 2011BA107B05), the Technological Innovation Projects in Shanxi (Grant no. 2013007), and the Construction Plan for Basic Condition Platform of Shanxi (Grant no. 2014091022).

References

[1] Y. Yao, M. Jia, J.-G. Wu et al., “Anxiolytic and sedative-hypnotic activities of polygalasaponins from Polygala tenuifolia in mice,” Pharmaceutical Biology, vol. 48, no. 7, pp. 801–807, 2010.

[2] F. S. Zhang, X. W. Li, Z. Y. Li et al., “UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia,” PLoS ONE, vol. 9, no. 8, Article ID e105765, 2014.

[3] H. Seki, S. Sawai, K. Ohyama et al., “Triterpene functional genomics in licorice for identification of CYP72A154 involved in the biosynthesis of glycyrrhizin,” The Plant Cell, vol. 23, no. 11, pp. 4112–4123, 2011.

[4] S. Chen, H. Luo, Y. Li et al., “454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng,” Plant Cell Reports, vol. 30, no. 9, pp. 1593–1601, 2011.

[5] H. Luo, C. Sun, Y. Sun et al., “Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers,” BMC Genomics, vol. 12, no. 5, article S5, 2011.

[6] C. Sun, Y. Li, Q. Wu et al., “De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis,” BMC Genomics, vol. 11, article 262, 2010.

[7] W. Gao, H.-X. Sun, H. Xiao et al., “Combining metabolomics and transcriptomics to characterize tanshinone biosynthesis in Salvia miltiorrhiza,” BMC Genomics, vol. 15, article 73, 2014.

[8] L. Liu, Y. Li, S. Li et al., “Comparison of next-generation sequencing systems,” Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 251364, 11 pages, 2012.

[9] Y. Yang, M. Xu, Q. Luo, J. Wang, and H. Li, “De novo transcriptome analysis of Liriodendron chinense petals and leaves by Illumina sequencing,” Gene, vol. 534, no. 2, pp. 155–162, 2014.

[10] S. Fowler and M. F. Thomashow, “Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway,” Plant Cell, vol. 14, no. 8, pp. 1675–1690, 2002.

[11] R. Zhai, Y. Feng, H. Wang et al., “Transcriptome analysis of rice root heterosis by RNA-Seq,” BMC Genomics, vol. 14, no. 1, article 19, 2013.

[12] I. Zouari, A. Salvioli, M. Chialva et al., “From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism,” BMC Genomics, vol. 15, article 221, 2014.

[13] S. Kaur, L. W. Pembleton, N. O. I. Cogan et al., “Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers,” BMC Genomics, vol. 13, no. 1, article 104, 2012.

[14] U. Chandrasekaran, W. Xu, and A. Liu, “Transcriptome profiling identifies ABA mediated regulatory changes towards storage filling in developing seeds of castor bean (Ricinus communis L.),” Cell & Bioscience, vol. 4, no. 1, article 33, 2014.

[15] T. Zhang, X. Zhao, W. Wang et al., “Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum,” Plant Molecular Biology, vol. 84, no. 3, pp. 315–327, 2014.

[16] Y. Ikeya, S. Takeda, M. Tunakawa et al., “Cognitive improving and cerebral protective effects of acylated oligosaccharides in Polygala tenuifolia,” Biological and Pharmaceutical Bulletin, vol. 27, no. 7, pp. 1081–1085, 2004.

[17] J. M. Augustin, V. Kuzina, S. B. Andersen, and S. Bak, “Molecular activities, biosynthesis and evolution of triterpenoid saponins,” Phytochemistry, vol. 72, no. 6, pp. 435–457, 2011.

[18] Y. Wang, X. Wang, T.-H. Lee, S. Mansoor, and A. H. Paterson, “Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice),” New Phytologist, vol. 198, no. 1, pp. 274–283, 2013.

[19] V. Tzin and G. Galili, “The biosynthetic pathways for shikimate and aromatic amino acids in Arabidopsis thaliana,” The Arabidopsis Book, vol. 8, Article ID e0132, 2010.

[20] W. Boerjan, J. Ralph, and M. Baucher, “Lignin biosynthesis,” Annual Review of Plant Biology, vol. 54, pp. 519–546, 2003.

[21] J. C. D’Auria and J. Gershenzon, “The secondary metabolism of Arabidopsis thaliana: growing like a weed,” Current Opinion in Plant Biology, vol. 8, no. 3, pp. 308–316, 2005.

[22] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.

[23] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, “A genomic perspective on protein families,” Science, vol. 278, no. 5338, pp. 631–637, 1997.

[24] R. L. Tatusov, N. D. Fedorova, J. D. Jackson et al., “The COG database: an updated version includes eukaryotes,” BMC Bioinformatics, vol. 4, article 41, 2003.

[25] M. Kanehisa, S. Goto, M. Hattori et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, pp. D354–D357, 2006.

[26] E. M. Zdobnov and R. Apweiler, “InterProScan - an integration platform for the signature-recognition methods in InterPro,” Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001.

[27] M. L. Wise and R. Croteau, “Monoterpene biosynthesis,” in Comprehensive Natural Products Chemistry, p. 2, Elsevier, 1998.

[28] C. A. Schuhr, T. Radykewicz, S. Sagner et al., “Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy,” Phytochemistry Reviews, vol. 2, no. 1-2, pp. 3–16, 2003.

[29] W. Eisenreich, M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Zenk, and A. Bacher, “The deoxyxylulose phosphate pathway of terpenoid biosynthesis in plants and microorganisms,” Chemistry and Biology, vol. 5, no. 9, pp. R221–R233, 1998.

[30] J. D. Park, D. K. Rhee, and Y. H. Lee, “Biological activities and chemistry of saponins from Panax ginseng C. A. Meyer,” Phytochemistry Reviews, vol. 4, no. 2-3, pp. 159–175, 2005.

[31] M. Rohmer, “Mevalonate-independent methylerythritol phosphate pathway for isoprenoid biosynthesis. Elucidation and distribution,” Pure and Applied Chemistry, vol. 75, no. 2-3, pp. 375–387, 2003.

[32] S. Kalra, B. L. Puniya, D. Kulshreshtha et al., “De novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant Chlorophytum borivilianum,” PLoS ONE, vol. 8, no. 12, Article ID e83336, 2013.

[33] A. H. Meijer, E. Souer, R. Verpoorte, and J. H. C. Hoge, “Isolation of cytochrome P450 cDNA clones from the higher plant Catharanthus roseus by a PCR strategy,” Plant Molecular Biology, vol. 22, no. 2, pp. 379–383, 1993.

[34] H. Seki, K. Ohyama, S. Sawai et al., “Licorice beta-amyrin 11-oxidase, a cytochrome P450 with a key role in the biosynthesis of the triterpene sweetener glycyrrhizin,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 37, pp. 14204–14209, 2008.

[35] M. Shibuya, M. Hoshino, Y. Katsube, H. Hayashi, T. Kushiro, and Y. Ebizuka, “Identification of β-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay,” FEBS Journal, vol. 273, no. 5, pp. 948–959, 2006.

[36] Y. Wang, Y. Pan, Z. Liu et al., “De novo transcriptome sequencing of radish (Raphanus sativus L.) and analysis of major genes involved in glucosinolate metabolism,” BMC Genomics, vol. 14, article 836, 2013.

[37] D. Meesapyodsuk, J. Balsevich, D. W. Reed, and P. S. Covello, “Saponin biosynthesis in Saponaria vaccaria. cDNAs encoding β-amyrin synthase and a triterpene carboxylic acid glucosyltransferase,” Plant Physiology, vol. 143, no. 2, pp. 959–969, 2007.

[38] L. Achnine, D. V. Huhman, M. A. Farag, L. W. Sumner, J. W. Blount, and R. A. Dixon, “Genomics-based selection and functional characterization of triterpene glycosyltransferases from the model legume Medicago truncatula,” Plant Journal, vol. 41, no. 6, pp. 875–887, 2005.

[39] K. Yonekura-Sakakibara, T. Tohge, F. Matsuda et al., “Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis,” Plant Cell, vol. 20, no. 8, pp. 2160–2176, 2008.

[40] R. Yin, B. Messner, T. Faus-Kessler et al., “Feedback inhibition of the general phenylpropanoid and flavonol biosynthetic pathways upon a compromised flavonol-3-O-glycosylation,” Journal of Experimental Botany, vol. 63, no. 7, pp. 2465–2478, 2012.

[41] J. Chen, X. Dong, Q. Li et al., “Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling,” BMC Genomics, vol. 14, article 857, 2013.

8 International Journal of Genomics

Acetyl CoA

DOXP

HMG-CoA

MVA

MVA-5-phosphate

MEP

Acetoacetyl CoA

CDP-ME

CDP-MEP

HMBPP

IPP

CME-PP

IPP

GPP

FPP

23-Oxidosqualene

Cycloartenol

Triterpenoid saponins

Steroidal saponins

MVA-5-diphosphate

Squalene

DMAPPDMAPP

AACT(13)

HMGS(18)

HMGR(18)

MVK(2)

PMK(4)

PMD(3)

DXS(17)

DXR(7)

MCT(7)

CMK(4)

MDS(1)

HDS(9)

HDR(12)

IPPI IPPI(15)

GPPS(5) GPPS

FPPS

SQS(7)

SE(19)b-AS(9)

CAS(4)CYPs and UGTs

CYPs and UGTs

MV

A p

ath

way

[cy

toso

l]

ME

PD

OX

P p

ath

way

[p

last

id]

GA-3P + pyruvate

120573-amyrin

(a)

Phenylalanine

Cinnamate

p-Coumarate

p-Coumaroyl CoA

PAL(11)

C4H(10)

4CL(25)

Coniferaldehyde

p-Coumaroyl shikimate

Caffeoyl shikimate

Caffeoyl CoA

Feruloyl CoA

HCT(4)

HCT(4)

CCoAoMT(13)

CCR(6)

Coniferyl alcohol

CAD(12)Flavonol

Chalcone

Naringenin

Flavanones

Dihydroflavonols

CHS(1)

CHI(3)

F3H(60)

FLS(4)

Lignin

UGT

F39984005998400H(11)

C3998400(H)

(b)

Figure 5 Schematic of the putative biosynthetic pathways of two major classes of active compounds in P tenuifolia Saponin biosynthesispathway (a) and phenylpropanoid biosynthesis pathway (b) Enzyme abbreviations AACT acetyl CoA C-acetyltransferase or acetoacetyl-CoA thiolase HMGS 3-hydroxy-3-methylglutaryl CoA synthase HMGR 3-hydroxy-3-methylglutaryl CoA reductase MVK mevalonatekinase PMK phosphomevalonate kinase PMD mevalonate pyrophosphate decarboxylase DXS 1-deoxy-d-xylulose-5-phosphate synthaseDXR 1-deoxy-d-xylulose-5-phosphate reductoisomerase MCT 2-C-methyl-erythritol 4-phosphate cytidylytransferase CMK 4-(Cytidine51015840-diphospho)-2-C-methyl-d-erythritol kinase MDS 2-C-methyl-d-erythritol 24-cyclodiphosphate synthase HDS 1-hydroxy-2-methyl-butenyl-4-diphosphatesynthase HDR isopentenyl pyrophosphate (IPP)3 3-dimethylallyl pyrophosphate (DMAPP) synthase IPPIisopentenyl pyrophosphate isomerase GPPS geranyl diphosphate synthase FPPS farnesyl diphosphate synthase SQS squalene synthaseSE squalene epoxidase CAS cycloartenol synthase 120573-AS 120573-amyrin synthase DS dammarenediol synthase PAL phenylalanine ammonialyase C4H cinnamate 4-hydroxylase 4CL 4-coumarate CoA ligase CHS chalcone synthase CHI chalcone isomerase F3101584051015840H flavonoid3101584051015840-hydroxylase F3H flavanone 3-hydroxylase FLS flavonol synthase HCT hydroxycinnamoyl-transferase C31015840H p-coumaroyl shikimate31015840 hydroxylase CCoAoMT caffeoyl CoA 3-O-methyltransferase CCR cinnanoyl-CoA reductase CAD cinnamyl alcohol dehydrogenaseCYPs cytochrome P450s UGTs glycosyltransferases

International Journal of Genomics 9

used for deep sequencing for model and nonmodel plantsrecently with the biggest output and lowest reagent cost Inthis study the application of Illumina HiSeq 2000 resultedin a more comprehensive understanding of genomic infor-mation and promoted the study of secondary metabolitesbiosynthetic pathways of P tenuifolia

Triterpenoid saponins especially pentacyclic triterpe-noid saponins are one of the main active ingredients ofP tenuifolia whose biosynthetic pathways are becomingmore and more important The upstream and midstreambiosynthetic pathways of triterpenoid saponins have beenstudied very clearly in the past few years Many researchersnowmainly focus on the downstream biosynthetic pathwayswhich is also the difficult part of the studies [17] In the down-stream of biosynthesis of natural plant products hydroxyla-tion of CYP450s and glycosylation of UGTs play importantroles in stabilizing the product and altering triterpenoidsaponin bioactivity of dammarane-type and oleanane-typeaglycones [32] In this study 466 and 157 unigenes areannotated as CYP450s and UGTs in the transcriptome of Ptenuifolia (data not shown) respectively However becauseof the quantity and complexity of CYP450s and UGTsthe enzymes of the downstream biosynthetic pathway of Ptenuifolia are still unknown which can determine the specificsteps in the accumulation of triterpenoid saponins

CYP450s is a family of enzymes involved in the biosyn-thesis of lignins terpenoids sterols fatty acids hormonespigments and defense-related phytoalexins [33] Only twoCYP450s involved in triterpene saponin biosynthesis arefunctionally characterized until now including CYP88D6from Glycyrrhiza uralensis belonging to the CYP85 family[34] and CYP93E1 from Glycine max belonging to theCYP71 family [35] Recently several CYP450s andUGTswereidentified as candidate genes in many studies of medicinaland nonmedicinal plants One CYP450 (contig00248) wasselected as a candidate gene which is most likely to beinvolved in ginsenoside biosynthesis in Panax quinquefoliusby MeJA-inducible and tissue-specific expression patterns[6] Moreover Pn02132 and Pn00158 closely related toCYP88D6 andCYP93E1 were selected as candidate CYP450sinvolved in triterpene saponin biosynthesis in P notoginseng[5] Furthermore ten unigene sequences were identified cor-responding to the seven different genes with a high homologyto CYP79s and four unigenes were identified correspondingto the two CYP83 genes in core structure biosynthesis ofthe glucosinolate metabolism of radish [36] Our previousstudy once suggested that a combination of ultraperformanceliquid chromatography coupled with electrospray ioniza-tion quadrupole time-of-flight mass spectrometry based onmetabolomics and gene expression analysis can effectivelyelucidate the mechanism of biosynthesis of triterpenoidsaponin and three genes of CYP450s (CYP88D6 CYP716B1and CYP72A1) putatively expressed in biosynthesis pathwayof triterpenoid saponin in P tenuifolia were identified byquantitative real-time PCR (qRT-PCR) analysis [2] Theexamples mentioned above clarified that most identifiedCYP450s belonging to the CYP71 and CYP 88 familiesand few studies were documented about the other CYP450families until now

UGTs are another large multigene family in plants and play an important role in the last step of triterpenoid saponin biosynthesis. Glycosylation, the transfer of activated saccharides to an aglycone substrate, is the predominant modification in triterpenoid saponin biosynthesis. As with CYP450 genes, some UGT genes have also been identified. The genes UGT74M1 and UGT73K1 from Saponaria vaccaria [37] and UGT71G1 from Medicago truncatula [38] have been previously identified. Flavonoid-related UGTs (UGT78D1, UGT78D2, UGT73C6, and UGT89C1) were also identified in previous studies [39, 40]. Furthermore, four UGTs (contig01001, contig14976, contig15451, and contig16321) were selected in P. quinquefolius, on the basis of MeJA-inducible and tissue-specific expression patterns, as the candidate genes most likely to be involved in ginsenoside biosynthesis [6]. Pn13895, closely related to UGT71G1, was regarded as a lead candidate UGT responsible for triterpene biosynthesis in P. notoginseng [5]. Six putative flavonol glycosides were identified in Isatis indigotica [41]. Thirteen unigenes were identified as UGT74s, including UGT74B1, UGT74C1, UGT74F1, and UGT74F2, in the core structure biosynthesis of the glucosinolate metabolism of radish [36]. In our previous study mentioned above [2], three UGT genes (UGT74B1, UGT73B2, and UGT73C6) putatively expressed in triterpenoid saponin biosynthetic pathways were also identified by qRT-PCR in P. tenuifolia. The examples mentioned above show that most identified UGTs belong to the UGT74 and UGT73 families and that few studies have documented the other UGT families until now.
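The family-level summary can be checked with a short tally of the named UGTs cited in this paragraph. The sketch below assumes the usual naming pattern of a prefix, a family number, a subfamily letter, and a gene number (for example, UGT74B1 belongs to family UGT74); the helper `family_of` and the regular expression are illustrative, not part of any published tool.

```python
import re
from collections import Counter

# Assumed naming pattern: prefix + family number + subfamily letter(s) + gene number.
NAME = re.compile(r"^(UGT|CYP)(\d+)([A-Z]+)(\d+)$")


def family_of(gene_name):
    """Return the family label, e.g. 'UGT74' for 'UGT74B1'."""
    match = NAME.match(gene_name)
    if match is None:
        raise ValueError(f"unrecognized gene name: {gene_name}")
    return match.group(1) + match.group(2)


# Named UGTs mentioned in the paragraph above (duplicates collapsed via set()).
named_ugts = [
    "UGT74M1", "UGT73K1", "UGT71G1",
    "UGT78D1", "UGT78D2", "UGT73C6", "UGT89C1",
    "UGT74B1", "UGT74C1", "UGT74F1", "UGT74F2",
    "UGT74B1", "UGT73B2", "UGT73C6",
]

family_counts = Counter(family_of(name) for name in set(named_ugts))
for family, n in family_counts.most_common():
    print(family, n)
```

The same helper accepts CYP450 names, so `family_of("CYP716B1")` gives `CYP716`. Among the twelve distinct UGT names listed, UGT74 and UGT73 are the two largest groups, which is consistent with the statement in the text.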

In other words, the CYP450 and UGT genes identified so far add little to the knowledge of downstream triterpenoid saponin biosynthesis, and thus we will continue with our in-depth studies. The large number of CYP450 and UGT candidates could provide not only a potential gene pool for the identification of specific CYP450s and UGTs involved in triterpenoid biosynthesis in P. tenuifolia but also a convenient means of characterizing their roles in triterpenoid biosynthesis in future studies.

5. Conclusion

RP, one of the predominant Chinese medicinal plants, contains various medicinal ingredients that elicit tonic, sedative, antipsychotic, expectorant, and other pharmacological effects. In this study, a large-scale unigene investigation of P. tenuifolia was performed by Illumina sequencing. Our results showed that many transcripts encoded by putative genes are involved in the biosynthesis of triterpene saponins and phenylpropanoids. The data we obtained represent the most abundant genomic resource for P. tenuifolia and provide comprehensive information on gene discovery, transcriptome profiling, transcriptional regulation, and molecular markers. This study will help improve the production of active compounds through marker-assisted breeding or genetic engineering for P. tenuifolia as well as other medicinal plants in the Polygalaceae family. However, although many candidate CYP450s and UGTs are likely involved in the downstream part of the triterpene saponin biosynthetic pathway, these candidates need further investigation.

10 International Journal of Genomics

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors’ Contribution

Hongling Tian and Xiaoshuang Xu equally contributed to this work.

Acknowledgments

This study was supported by the National Natural Sciences Foundation of China (Grant no. 31100244), the Shanxi Science and Technology Development Program (Grant no. 20140313010-1), the National Science-technology Support Plan Projects (Grant no. 2011BA107B05), the Technological Innovation Projects in Shanxi (Grant no. 2013007), and the Construction Plan for Basic Condition Platform of Shanxi (Grant no. 2014091022).

References

[1] Y. Yao, M. Jia, J.-G. Wu et al., “Anxiolytic and sedative-hypnotic activities of polygalasaponins from Polygala tenuifolia in mice,” Pharmaceutical Biology, vol. 48, no. 7, pp. 801–807, 2010.

[2] F. S. Zhang, X. W. Li, Z. Y. Li et al., “UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia,” PLoS ONE, vol. 9, no. 8, Article ID e105765, 2014.

[3] H. Seki, S. Sawai, K. Ohyama et al., “Triterpene functional genomics in licorice for identification of CYP72A154 involved in the biosynthesis of glycyrrhizin,” The Plant Cell, vol. 23, no. 11, pp. 4112–4123, 2011.

[4] S. Chen, H. Luo, Y. Li et al., “454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng,” Plant Cell Reports, vol. 30, no. 9, pp. 1593–1601, 2011.

[5] H. Luo, C. Sun, Y. Sun et al., “Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers,” BMC Genomics, vol. 12, no. 5, article S5, 2011.

[6] C. Sun, Y. Li, Q. Wu et al., “De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis,” BMC Genomics, vol. 11, article 262, 2010.

[7] W. Gao, H.-X. Sun, H. Xiao et al., “Combining metabolomics and transcriptomics to characterize tanshinone biosynthesis in Salvia miltiorrhiza,” BMC Genomics, vol. 15, article 73, 2014.

[8] L. Liu, Y. Li, S. Li et al., “Comparison of next-generation sequencing systems,” Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 251364, 11 pages, 2012.

[9] Y. Yang, M. Xu, Q. Luo, J. Wang, and H. Li, “De novo transcriptome analysis of Liriodendron chinense petals and leaves by Illumina sequencing,” Gene, vol. 534, no. 2, pp. 155–162, 2014.

[10] S. Fowler and M. F. Thomashow, “Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway,” Plant Cell, vol. 14, no. 8, pp. 1675–1690, 2002.

[11] R. Zhai, Y. Feng, H. Wang et al., “Transcriptome analysis of rice root heterosis by RNA-Seq,” BMC Genomics, vol. 14, no. 1, article 19, 2013.

[12] I. Zouari, A. Salvioli, M. Chialva et al., “From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism,” BMC Genomics, vol. 15, article 221, 2014.

[13] S. Kaur, L. W. Pembleton, N. O. I. Cogan et al., “Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers,” BMC Genomics, vol. 13, no. 1, article 104, 2012.

[14] U. Chandrasekaran, W. Xu, and A. Liu, “Transcriptome profiling identifies ABA mediated regulatory changes towards storage filling in developing seeds of castor bean (Ricinus communis L.),” Cell & Bioscience, vol. 4, no. 1, article 33, 2014.

[15] T. Zhang, X. Zhao, W. Wang et al., “Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum,” Plant Molecular Biology, vol. 84, no. 3, pp. 315–327, 2014.

[16] Y. Ikeya, S. Takeda, M. Tunakawa et al., “Cognitive improving and cerebral protective effects of acylated oligosaccharides in Polygala tenuifolia,” Biological and Pharmaceutical Bulletin, vol. 27, no. 7, pp. 1081–1085, 2004.

[17] J. M. Augustin, V. Kuzina, S. B. Andersen, and S. Bak, “Molecular activities, biosynthesis and evolution of triterpenoid saponins,” Phytochemistry, vol. 72, no. 6, pp. 435–457, 2011.

[18] Y. Wang, X. Wang, T.-H. Lee, S. Mansoor, and A. H. Paterson, “Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice),” New Phytologist, vol. 198, no. 1, pp. 274–283, 2013.

[19] V. Tzin and G. Galili, “The biosynthetic pathways for shikimate and aromatic amino acids in Arabidopsis thaliana,” The Arabidopsis Book, vol. 8, Article ID e0132, 2010.

[20] W. Boerjan, J. Ralph, and M. Baucher, “Lignin biosynthesis,” Annual Review of Plant Biology, vol. 54, pp. 519–546, 2003.

[21] J. C. D’Auria and J. Gershenzon, “The secondary metabolism of Arabidopsis thaliana: growing like a weed,” Current Opinion in Plant Biology, vol. 8, no. 3, pp. 308–316, 2005.

[22] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.

[23] R. L. Tatusov, E. V. Koonin, and D. J. Lipman, “A genomic perspective on protein families,” Science, vol. 278, no. 5338, pp. 631–637, 1997.

[24] R. L. Tatusov, N. D. Fedorova, J. D. Jackson et al., “The COG database: an updated version includes eukaryotes,” BMC Bioinformatics, vol. 4, article 41, 2003.

[25] M. Kanehisa, S. Goto, M. Hattori et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, pp. D354–D357, 2006.

[26] E. M. Zdobnov and R. Apweiler, “InterProScan – an integration platform for the signature-recognition methods in InterPro,” Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001.

[27] M. L. Wise and R. Croteau, “Monoterpene biosynthesis,” in Comprehensive Natural Products Chemistry, p. 2, Elsevier, 1998.

[28] C. A. Schuhr, T. Radykewicz, S. Sagner et al., “Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy,” Phytochemistry Reviews, vol. 2, no. 1-2, pp. 3–16, 2003.

[29] W. Eisenreich, M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Zenk, and A. Bacher, “The deoxyxylulose phosphate pathway of terpenoid biosynthesis in plants and microorganisms,” Chemistry and Biology, vol. 5, no. 9, pp. R221–R233, 1998.

[30] J. D. Park, D. K. Rhee, and Y. H. Lee, “Biological activities and chemistry of saponins from Panax ginseng C. A. Meyer,” Phytochemistry Reviews, vol. 4, no. 2-3, pp. 159–175, 2005.

[31] M. Rohmer, “Mevalonate-independent methylerythritol phosphate pathway for isoprenoid biosynthesis. Elucidation and distribution,” Pure and Applied Chemistry, vol. 75, no. 2-3, pp. 375–387, 2003.

[32] S. Kalra, B. L. Puniya, D. Kulshreshtha et al., “De novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant Chlorophytum borivilianum,” PLoS ONE, vol. 8, no. 12, Article ID e83336, 2013.

[33] A. H. Meijer, E. Souer, R. Verpoorte, and J. H. C. Hoge, “Isolation of cytochrome P450 cDNA clones from the higher plant Catharanthus roseus by a PCR strategy,” Plant Molecular Biology, vol. 22, no. 2, pp. 379–383, 1993.

[34] H. Seki, K. Ohyama, S. Sawai et al., “Licorice beta-amyrin 11-oxidase, a cytochrome P450 with a key role in the biosynthesis of the triterpene sweetener glycyrrhizin,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 37, pp. 14204–14209, 2008.

[35] M. Shibuya, M. Hoshino, Y. Katsube, H. Hayashi, T. Kushiro, and Y. Ebizuka, “Identification of β-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay,” FEBS Journal, vol. 273, no. 5, pp. 948–959, 2006.

[36] Y. Wang, Y. Pan, Z. Liu et al., “De novo transcriptome sequencing of radish (Raphanus sativus L.) and analysis of major genes involved in glucosinolate metabolism,” BMC Genomics, vol. 14, article 836, 2013.

[37] D. Meesapyodsuk, J. Balsevich, D. W. Reed, and P. S. Covello, “Saponin biosynthesis in Saponaria vaccaria. cDNAs encoding β-amyrin synthase and a triterpene carboxylic acid glucosyltransferase,” Plant Physiology, vol. 143, no. 2, pp. 959–969, 2007.

[38] L. Achnine, D. V. Huhman, M. A. Farag, L. W. Sumner, J. W. Blount, and R. A. Dixon, “Genomics-based selection and functional characterization of triterpene glycosyltransferases from the model legume Medicago truncatula,” Plant Journal, vol. 41, no. 6, pp. 875–887, 2005.

[39] K. Yonekura-Sakakibara, T. Tohge, F. Matsuda et al., “Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis,” Plant Cell, vol. 20, no. 8, pp. 2160–2176, 2008.

[40] R. Yin, B. Messner, T. Faus-Kessler et al., “Feedback inhibition of the general phenylpropanoid and flavonol biosynthetic pathways upon a compromised flavonol-3-O-glycosylation,” Journal of Experimental Botany, vol. 63, no. 7, pp. 2465–2478, 2012.

[41] J. Chen, X. Dong, Q. Li et al., “Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling,” BMC Genomics, vol. 14, article 857, 2013.

used for deep sequencing for model and nonmodel plants recently, with the biggest output and lowest reagent cost. In this study, the application of Illumina HiSeq 2000 resulted in a more comprehensive understanding of genomic information and promoted the study of secondary metabolite biosynthetic pathways of P. tenuifolia.

10 International Journal of Genomics

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

HonglingTian andXiaoshuangXu equally contributed to thiswork

Acknowledgments

This study was supported by the National Natural SciencesFoundation of China (Grant no 31100244) the ShanxiScience and Technology Development Program (Grant no20140313010-1) the National Science-technology SupportPlan Projects (Grant no 2011BA107B05) the TechnologicalInnovation Projects in Shanxi (Grant no 2013007) and theConstruction Plan for Basic Condition Platform of Shanxi(Grant no 2014091022)

References

[1] Y Yao M Jia J-G Wu et al ldquoAnxiolytic and sedative-hypnoticactivities of polygalasaponins from Polygala tenuifolia in micerdquoPharmaceutical Biology vol 48 no 7 pp 801ndash807 2010

[2] F S Zhang X W Li Z Y Li et al ldquoUPLCQ-TOF MS-basedmetabolomics and qRT-PCR in enzyme gene screeningwith keyrole in triterpenoid saponin biosynthesis of Polygala tenuifoliardquoPLoS ONE vol 9 no 8 Article ID e105765 2014

[3] H Seki S Sawai K Ohyama et al ldquoTriterpene functionalgenomics in licorice for identification of CYP72A154 involvedin the biosynthesis of glycyrrhizinrdquo The Plant Cell vol 23 no11 pp 4112ndash4123 2011

[4] S Chen H Luo Y Li et al ldquo454 EST analysis detectsgenes putatively involved in ginsenoside biosynthesis in Panaxginsengrdquo Plant Cell Reports vol 30 no 9 pp 1593ndash1601 2011

[5] H Luo C Sun Y Sun et al ldquoAnalysis of the transcriptome ofPanax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markersrdquo BMC Genomics vol12 no 5 article S5 2011

[6] C Sun Y Li Q Wu et al ldquoDe novo sequencing and analysisof the American ginseng root transcriptome using a GS FLXTitanium platform to discover putative genes involved inginsenoside biosynthesisrdquo BMC Genomics vol 11 article 2622010

[7] W Gao H-X Sun H Xiao et al ldquoCombining metabolomicsand transcriptomics to characterize tanshinone biosynthesis inSalvia miltiorrhizardquo BMC Genomics vol 15 article 73 2014

[8] L Liu Y Li S Li et al ldquoComparison of next-generationsequencing systemsrdquo Journal of Biomedicine and Biotechnologyvol 2012 Article ID 251364 11 pages 2012

[9] Y Yang M Xu Q Luo J Wang and H Li ldquoDe novo tran-scriptome analysis of Liriodendron chinense petals and leaves byIllumina sequencingrdquo Gene vol 534 no 2 pp 155ndash162 2014

[10] S Fowler and M F Thomashow ldquoArabidopsis transcriptomeprofiling indicates that multiple regulatory pathways are acti-vated during cold acclimation in addition to the CBF coldresponse pathwayrdquo Plant Cell vol 14 no 8 pp 1675ndash1690 2002

[11] R Zhai Y Feng H Wang et al ldquoTranscriptome analysis of riceroot heterosis by RNA-SeqrdquoBMCGenomics vol 14 no 1 article19 2013

[12] I Zouari A SalvioliM Chialva et al ldquoFrom root to fruit RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis mayaffect tomato fruit metabolismrdquo BMC Genomics vol 15 article221 2014

[13] S Kaur L W Pembleton N O I Cogan et al ldquoTranscriptomesequencing of field pea and faba bean for discovery andvalidation of SSR genetic markersrdquo BMC Genomics vol 13 no1 article 104 2012

[14] U Chandrasekaran W Xu and A Liu ldquoTranscriptome profil-ing identifiesABAmediated regulatory changes towards storagefilling in developing seeds of castor bean (Ricinus communisL)rdquoCell amp Bioscience vol 4 no 1 article 33 2014

[15] T Zhang X Zhao W Wang et al ldquoDeep transcriptomesequencing of rhizome and aerial-shoot in Sorghum propin-quumrdquo Plant Molecular Biology vol 84 no 3 pp 315ndash327 2014

[16] Y Ikeya S Takeda M Tunakawa et al ldquoCognitive improvingand cerebral protective effects of acylated oligosaccharides inPolygala tenuifoliardquo Biological and Pharmaceutical Bulletin vol27 no 7 pp 1081ndash1085 2004

[17] J M Augustin V Kuzina S B Andersen and S BakldquoMolecular activities biosynthesis and evolution of triterpenoidsaponinsrdquo Phytochemistry vol 72 no 6 pp 435ndash457 2011

[18] Y Wang X Wang T-H Lee S Mansoor and A H PatersonldquoGene body methylation shows distinct patterns associatedwith different gene origins and duplication modes and hasa heterogeneous relationship with gene expression in Oryzasativa (rice)rdquo New Phytologist vol 198 no 1 pp 274ndash283 2013

[19] V Tzin and G Galili ldquoThe biosynthetic pathways for shikimateand aromatic amino acids in Arabidopsis thalianardquo The Ara-bidopsis Book vol 8 Article ID e0132 2010

[20] W Boerjan J Ralph and M Baucher ldquoLignin biosynthesisrdquoAnnual Review of Plant Biology vol 54 pp 519ndash546 2003

[21] J C DrsquoAuria and J Gershenzon ldquoThe secondary metabolism ofArabidopsis thaliana growing like a weedrdquo Current Opinion inPlant Biology vol 8 no 3 pp 308ndash316 2005

[22] B Langmead C Trapnell M Pop and S L Salzberg ldquoUltrafastand memory-efficient alignment of short DNA sequences tothe human genomerdquo Genome Biology vol 10 no 3 article R252009

[23] R L Tatusov E V Koonin and D J Lipman ldquoA genomicperspective on protein familiesrdquo Science vol 278 no 5338 pp631ndash637 1997

[24] R L Tatusov N D Fedorova J D Jackson et al ldquoTheCOG database an updated vesion includes eukaryotesrdquo BMCBioinformatics vol 4 article 41 2003

[25] M Kanehisa S Goto M Hattori et al ldquoFrom genomics tochemical genomics new developments in KEGGrdquoNucleic AcidsResearch vol 34 pp D354ndashD357 2006

[26] E M Zdobnov and R Apweiler ldquoInterProScanmdashan integrationplatform for the signature-recognition methods in InterPrordquoBioinformatics vol 17 no 9 pp 847ndash848 2001

[27] M L Wise and R Croteau ldquoMonoterpene biosynthesisrdquo inComprehensive Natural Products Chemistry p 2 Elsevier 1998

[28] C A Schuhr T Radykewicz S Sagner et al ldquoQuantitativeassessment of crosstalk between the two isoprenoid biosynthe-sis pathways in plants by NMR spectroscopyrdquo PhytochemistryReviews vol 2 no 1-2 pp 3ndash16 2003

International Journal of Genomics 11

[29] W Eisenreich M Schwarz A Cartayrade D Arigoni M HZenk andA Bacher ldquoThe deoxyxylulose phosphate pathway ofterpenoid biosynthesis in plants and microorganismsrdquo Chem-istry and Biology vol 5 no 9 pp R221ndashR233 1998

[30] J D Park D K Rhee and Y H Lee ldquoBiological activitiesand chemistry of saponins from Panax ginseng C A MeyerrdquoPhytochemistry Reviews vol 4 no 2-3 pp 159ndash175 2005

[31] M Rohmer ldquoMevalonate-independent methylerythritol phos-phate pathway for isoprenoid biosynthesis elucidation anddistributionrdquo Pure and Applied Chemistry vol 75 no 2-3 pp375ndash387 2003

[32] S Kalra B L PuniyaD Kulshreshtha et al ldquoDe novo transcrip-tome sequencing reveals important molecular networks andmetabolic pathways of the plant chlorophytum borivilianumrdquoPLoS ONE vol 8 no 12 Article ID e83336 2013

[33] A H Meijer E Souer R Verpoorte and J H C HogeldquoIsolation of cytochrome P450 cDNA clones from the higherplant Catharanthus roseus by a PCR strategyrdquo Plant MolecularBiology vol 22 no 2 pp 379ndash383 1993

[34] H Seki K Ohyama S Sawai et al ldquoLicorice beta-amyrin 11-oxidase a cytochrome P450 with a key role in the biosynthesisof the triterpene sweetener glycyrrhizinrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 105 no 37 pp 14204ndash14209 2008

[35] M Shibuya M Hoshino Y Katsube H Hayashi T KushiroandY Ebizuka ldquoIdentification of120573-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functionalexpression assayrdquo FEBS Journal vol 273 no 5 pp 948ndash9592006

[36] Y Wang Y Pan Z Liu et al ldquoDe novo transcriptome sequenc-ing of radish (Raphanus sativus L) and analysis of major genesinvolved in glucosinolate metabolismrdquo BMC Genomics vol 14article 836 2013

[37] D Meesapyodsuk J Balsevich D W Reed and P S CovelloldquoSaponin biosynthesis in Saponaria vaccaria cDNAs encoding120573-amyrin synthase and a triterpene carboxylic acid glucosyl-transferaserdquo Plant Physiology vol 143 no 2 pp 959ndash969 2007

[38] L Achnine D V Huhman M A Farag L W Sumner JW Blount and R A Dixon ldquoGenomics-based selection andfunctional characterization of triterpene glycosyltransferasesfrom themodel legumeMedicago truncatulardquoPlant Journal vol41 no 6 pp 875ndash887 2005

[39] K Yonekura-Sakakibara T Tohge F Matsuda et al ldquoCom-prehensive flavonol profiling and transcriptome coexpressionanalysis leading to decoding gene-metabolite correlations inArabidopsisrdquo Plant Cell vol 20 no 8 pp 2160ndash2176 2008

[40] R Yin B Messner T Faus-Kessler et al ldquoFeedback inhibitionof the general phenylpropanoid and flavonol biosynthetic path-ways upon a compromised flavonol-3-O-glycosylationrdquo Journalof Experimental Botany vol 63 no 7 pp 2465ndash2478 2012

[41] J Chen X Dong Q Li et al ldquoBiosynthesis of the active com-pounds of Isatis indigotica based on transcriptome sequencingand metabolites profilingrdquo BMC Genomics vol 14 article 8572013
