Additional_file_3_Ashrafi_et_al_2012_Pepper_Annotation_Supp_05072012.docx
A Microsoft Word 2007 file with 16 figures comparing the Blast2GO results for the GeneChip (Sanger-EST) and transcriptome assemblies of pepper, together with a flow chart of the IGA transcriptome assembly procedure.
De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes
AUTHORS:
Hamid Ashrafi, Theresa Hill, Kevin Stoffel, Alexander Kozik, Jiqiang Yao, Sebastian Reyes Chin-Wo and Allen Van Deynze
Supplement Figure 1. Distribution of BLASTX E-values for a) the Sanger-EST unigenes b) the IGA transcriptome contigs.
Supplement Figure 2. Percent similarity of assembly sequences to sequences in GenBank a) Sanger-EST unigenes b) IGA transcriptome contigs. Similarity is computed for each query-hit pair as the sum of the similarity values of all matching HSPs.
Supplement Figure 3. Length vs number of sequences in a) Sanger-EST unigenes b) IGA transcriptome contigs.
Supplement Figure 4. High-scoring segment pairs (HSP) per sequence coverage a) Sanger-EST unigenes b) IGA transcriptome contigs.
Supplement Figure 5. Evidence code distribution¹ for sequences, which depicts the inference behind each annotation. For instance, IEA is inferred from electronic annotation and IDA is inferred from direct assay. a) Sanger-EST unigenes b) IGA transcriptome contigs.
¹ Once mapping has been completed, the user can check the distribution of evidence codes in the recovered GO terms and the original database sources of annotations. These charts give an indication of suitable values for B2G annotation parameters. For example, when a good overall level of sequence similarity is obtained for the dataset, the default annotation cutoff value could be raised to improve annotation accuracy. Similarly, if evidence code charts indicate a low representation of experimentally derived GOs, the user might choose to increase the weight given to electronic annotations. After the final annotation step, new charts show the distribution of annotated sequences, the number of GOs per sequence, the number of sequences per GO, and the distribution of annotations per GO level, which jointly provide a general overview of the performance of the annotation procedure.
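The evidence code check described in the footnote can be illustrated with a short Python sketch. This is not Blast2GO's own implementation: the record layout, the sequence IDs and the helper names `evidence_code_distribution` and `electronic_fraction` are assumptions made for illustration, and the records are invented rather than taken from the pepper assemblies.

```python
from collections import Counter

# Hypothetical (sequence ID, GO term, evidence code) records.
# These are illustrative values, not data from the study.
records = [
    ("contig_1", "GO:0008152", "IEA"),
    ("contig_1", "GO:0005515", "IDA"),
    ("contig_2", "GO:0009987", "IEA"),
    ("contig_3", "GO:0016740", "ISS"),
]


def evidence_code_distribution(records):
    """Count how often each evidence code occurs among the GO annotations."""
    return Counter(code for _seq_id, _go_term, code in records)


def electronic_fraction(distribution):
    """Return the share of annotations carrying the electronic code IEA."""
    total = sum(distribution.values())
    return distribution.get("IEA", 0) / total if total else 0.0


dist = evidence_code_distribution(records)
print(dist.most_common())
print(electronic_fraction(dist))
```

A high electronic fraction would correspond to the situation the footnote describes, where experimentally derived GO terms are poorly represented and the weighting of annotation sources may need adjusting.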
Supplement Figure 6. Evidence code distribution for BLAST hits, which depicts the inference behind each annotation. For instance, IEA is inferred from electronic annotation and IDA is inferred from direct assay. a) Sanger-EST unigenes b) IGA transcriptome contigs.
Supplement Figure 7. Number of high-scoring segment pairs (HSPs) per BLAST hit a) Sanger-EST unigenes b) IGA transcriptome contigs.
Supplement Figure 8. Database resources used for the mapping step of Blast2GO a) Sanger-EST unigenes b) IGA transcriptome contigs.
[Panels a and b: plots of Number of Contigs against Number of GO terms.]
Supplement Figure 9. Number of GO terms per contig. a) On average (weighted average), 5 GO terms were mapped to 19,966 (64%) contigs of the Sanger-EST assembly. b) On average (weighted average), 5 GO terms were mapped to 37,000 (30%) contigs of the IGA transcriptome assembly.
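As a minimal sketch of how a weighted average of this kind can be computed from a histogram of GO terms per contig, the Python example below weights each GO-term count by the number of contigs that carry it. The function name `weighted_mean_go_terms` and the histogram values are hypothetical and chosen only for illustration; they are not the counts behind the figure.

```python
def weighted_mean_go_terms(histogram):
    """Weighted average number of GO terms per annotated contig.

    `histogram` maps a number of GO terms to the number of contigs
    that were assigned exactly that many terms.
    """
    total_contigs = sum(histogram.values())
    if total_contigs == 0:
        raise ValueError("histogram contains no contigs")
    weighted_sum = sum(n_terms * n_contigs
                       for n_terms, n_contigs in histogram.items())
    return weighted_sum / total_contigs


# Hypothetical histogram: 10 contigs with 1 term, 20 with 5, 10 with 9.
example = {1: 10, 5: 20, 9: 10}
print(weighted_mean_go_terms(example))
```

Weighting by contig count keeps a few contigs with very many GO terms from dominating the average, which a plain mean over the distinct x-axis values would not do.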
Supplement Figure 10. Number of annotations at each GO level. P stands for Biological Process, F for Molecular Function and C for Cellular Component. a) Sanger-EST unigenes b) IGA transcriptome contigs.
[Panel: Direct GO Counts of Biological Processes. Axis: Number of Sequences. Categories: Pigmentation; Nitrogen utilization; Biological adhesion; Locomotion; Cell killing; Viral reproduction; Cell proliferation; Rhythmic process; Carbon utilization; Immune system process; Death; Growth; Cell wall organization or biogenesis; Multi-organism process; Reproduction; Cellular component biogenesis; Signaling; Cellular component organization; Multicellular organismal process; Developmental process; Localization; Biological regulation; Response to stimulus; Metabolic process; Cellular process.]
[Panel: Direct GO Counts of Molecular Functions. Axis: Number of Sequences. Categories: Amine binding; Enzyme activator activity; Nucleoside-triphosphatase regulator activity; Carboxylic acid binding; Peroxidase activity; Metal cluster binding; Enzyme inhibitor activity; Lipid binding; Carbohydrate binding; Vitamin binding; Isomerase activity; Tetrapyrrole binding; Signal transducer activity; Structural constituent of ribosome; Lyase activity; Sequence-specific DNA binding TF activity; Ligase activity; Cofactor binding; Substrate-specific transporter activity; Transmembrane transporter activity; Oxidoreductase activity; Nucleic acid binding; Protein binding; Hydrolase activity; Ion binding; Nucleotide binding; Transferase activity.]
[Panel: Direct GO Counts of Cellular Components. Axis: Number of Sequences. Categories: Receptor complex; Periplasmic space; Apical part of cell; Extracellular space; External encapsulating structure part; Extracellular matrix; Cell surface; Beta-galactosidase complex; Protein histidine kinase complex; Cell fraction; Intrinsic to organelle membrane; Vesicle membrane; Golgi membrane; Serine/threonine phosphatase complex; Network of nuclear outer & ER membranes; Organelle subcompartment; External encapsulating structure; Endomembrane system; Organelle envelope; Envelope; Membrane-bounded vesicle; Organelle lumen; Organelle membrane; Membrane part; Intracellular organelle part; Membrane; Intracellular part; Intracellular.]
[Panel: Direct GO Counts of Biological Processes. Axis: Number of Sequences. Categories: Sulfur utilization; Nitrogen utilization; Pigmentation; Biological adhesion; Cell killing; Viral reproduction; Locomotion; Cell proliferation; Carbon utilization; Rhythmic process; Immune system process; Cell wall organization or biogenesis; Growth; Death; Multi-organism process; Cellular component biogenesis; Reproduction; Signaling; Cellular component organization; Developmental process; Multicellular organismal process; Localization; Biological regulation; Response to stimulus; Metabolic process; Cellular process.]
Supplement Figure 11. Direct GO count graphs depicting a) biological processes, b) cellular components and c) molecular functions in the Sanger-EST assembly.
[Panel: Direct GO Counts of Cellular Components. Axis: Number of Sequences. Categories: Proton-transporting ATP synthase complex; NADH dehydrogenase complex; Respiratory chain complex I; Coated membrane; Cytoplasmic vesicle part; Organelle outer membrane; Membrane coat; Outer membrane; Proton-transporting two-sector ATPase complex; Proteasome complex; Nuclear envelope; Extrinsic to membrane; Photosystem; Respiratory chain; Endoplasmic reticulum membrane; Mitochondrial membrane part; Ubiquitin ligase complex; Organelle inner membrane; Photosynthetic membrane; Thylakoid part; Thylakoid; Cell wall; Plasma membrane part; Ribonucleoprotein complex; Intrinsic to membrane; Plasma membrane; Cytoplasmic part; Cytoplasm; Intracellular organelle.]
Supplement Figure 12. Direct GO count graphs depicting a) biological processes, b) cellular components and c) molecular functions in the IGA transcriptome assembly.
Supplement Figure 13. The relationship between the number of GO terms and the length of sequences. a) Sanger-EST unigenes b) IGA transcriptome contigs.
Supplement Figure 14. Distribution of annotation score vs. number of sequences a) Sanger-EST unigenes b) IGA transcriptome contigs.
Supplement Figure 15. The relationship between length of sequence and annotation a) Sanger-EST unigenes b) IGA transcriptome contigs.
Supplement Figure 16. A flow chart of the steps taken to assemble the pepper IGA reads. A Super assembly comprises the combined assembly of Velvet k-mers or CLC workbench iterations (each square box contains two Super assemblies). The assembly of each Super assembly is shown in a different color to indicate the Mega assemblies (immediately below each box). The Mega assemblies were combined to make the Meta assembly (navy blue box marked as reference sequence).