Comparative analyses of the potato and tomato transcriptomes
David Francis, Allen Van Deynze, John Hamilton, Walter De Jong, David Douches, Sanwen Huang, and C. Robin Buell
Supported by the AFRI Plant Breeding, Genetics, and Genomics Program of USDA’s National Institute of Food and Agriculture
Questions
International Sol Project: How can a common set of genes/proteins give rise to such a wide range of morphologically and ecologically distinct organisms?
SolCAP: How can variation be harnessed to improve varieties that benefit the consumer, processors, and the environment?
Sequence data available to address these questions: a draft genome of the doubled monoploid DM1-3 516R44 (S. tuberosum L. Phureja group), plus GAII transcriptomes of S. tuberosum, S. lycopersicum, and S. pimpinellifolium
Technology
Next Generation Sequencing
SNP genotyping
What comparisons do we want to make?
How well do S. tuberosum expressed sequences align to DM1-3 516R44 genomic sequences?
How well do S. lycopersicum expressed sequences align to DM1-3 516R44 genomic sequences?
How is variation distributed within a species? Within a market class? Within a variety? Within a gene?
Which sequence variation is important to phenotypic variation?
Workflow: library creation/QC → GAII sequencing (single and paired end) → data collection → assembly → analysis (transcriptome complexity, SNP calling/validation, identification of genes under selection)
Illumina GA II Output for Potato

| Sample | Total Clusters | Total PE Reads | PF Passed Clusters | % PF Passed Clusters | Total PE PF Reads | Actual PE Reads (per variety) |
|---|---|---|---|---|---|---|
| Atlantic 1 | 7,601,277 | 15,202,554 | 6,382,748 | 83.97 | 12,765,496 | |
| Atlantic 2 | 10,544,542 | 21,089,084 | 9,252,168 | 87.74 | 18,504,336 | 30,185,186 |
| Premier 1 | 7,812,394 | 15,624,788 | 6,652,121 | 85.15 | 13,304,242 | |
| Premier 2 | 11,678,379 | 23,356,758 | 9,999,926 | 85.63 | 19,999,852 | 31,949,096 |
| Snowden 1 | 7,996,418 | 15,992,836 | 6,837,553 | 85.51 | 13,675,106 | |
| Snowden 2 | 11,781,671 | 23,563,342 | 10,393,322 | 88.22 | 20,786,644 | 33,288,120 |
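The derived columns in the potato GA II table are simple arithmetic on cluster counts: each paired-end cluster yields two reads, and % PF is PF clusters divided by total clusters. The Python sketch below reproduces those columns from the lane counts. The helper name `lane_summary` is hypothetical, and the two-reads-per-cluster rule is an assumption that the table's numbers happen to be consistent with. It does not reproduce the "Actual PE Reads" column, whose derivation the slide does not give.

```python
# Lane-level Illumina GA II output for potato, copied from the table:
# (sample, total clusters, PF passed clusters)
LANES = [
    ("Atlantic 1", 7_601_277, 6_382_748),
    ("Atlantic 2", 10_544_542, 9_252_168),
    ("Premier 1", 7_812_394, 6_652_121),
    ("Premier 2", 11_678_379, 9_999_926),
    ("Snowden 1", 7_996_418, 6_837_553),
    ("Snowden 2", 11_781_671, 10_393_322),
]


def lane_summary(total_clusters: int, pf_clusters: int) -> dict:
    """Derive read-level columns from cluster counts (hypothetical helper).

    Assumes every paired-end cluster contributes exactly two reads.
    """
    return {
        "total_pe_reads": 2 * total_clusters,
        "pct_pf": round(100 * pf_clusters / total_clusters, 2),
        "total_pe_pf_reads": 2 * pf_clusters,
    }


if __name__ == "__main__":
    for name, total, pf in LANES:
        s = lane_summary(total, pf)
        print(f"{name}: {s['total_pe_reads']:,} PE reads, "
              f"{s['pct_pf']}% PF, {s['total_pe_pf_reads']:,} PE PF reads")
```

Running it prints one line per lane; the Atlantic 1 line reports 15,202,554 PE reads and 83.97% PF, matching the table.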
Velvet Assemblies of Potato Illumina Sequences
• With a k-mer length of 31 and a minimum contig length of 150 bp:

| Variety | Total Gb | Transcriptome Size (Mb) | No. Contigs | N50 (bp) | Maximum Contig (Kb) |
|---|---|---|---|---|---|
| Atlantic | 1.8 | 38.4 | 45215 | 666 | 11.2 |
| Premier | 1.9 | 38.2 | 54917 | 408 | 6.6 |
| Snowden | 2.0 | 38.2 | 58754 | 358 | 6.9 |
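N50, transcriptome size, and maximum contig length are standard assembly summaries. A minimal Python sketch of how they are conventionally computed from a list of contig lengths, assuming "Transcriptome Size" is the summed length of the contigs that pass the 150 bp cutoff. The helpers `n50` and `assembly_stats` are hypothetical and are not part of Velvet.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


def assembly_stats(lengths: Iterable[int], min_len: int = 150) -> dict:
    """Summarize an assembly after dropping contigs shorter than min_len (bp).

    Hypothetical helper; size is reported in Mb and the maximum in Kb,
    mirroring the table's units.
    """
    kept = [n for n in lengths if n >= min_len]
    return {
        "contigs": len(kept),
        "size_mb": sum(kept) / 1e6,
        "n50_bp": n50(kept),
        "max_kb": max(kept) / 1e3,
    }
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the contigs of length 10, 9, and 8 already hold 27 of the 54 bases.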
Alignment of S. tuberosum GAII-transcriptome contigs to the PGSC draft genome sequence from DM1-3 516R44 (GMAP):
• Atlantic: 45214 contigs; 32520 align at 95% identity, 50% coverage; 27106 align at 95% identity, 90% coverage
• Premier: 54917 contigs; 41497 align at 95% identity, 50% coverage; 37297 align at 95% identity, 90% coverage
• Snowden: 58754 contigs; 44479 align at 95% identity, 50% coverage; 40708 align at 95% identity, 90% coverage
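The two GMAP cutoffs (95% identity with either 50% or 90% coverage) act as simple filters on each contig's alignments. The sketch below assumes the alignments have already been parsed into records carrying percent identity and percent coverage; the `Hit` layout and the `count_aligned` and `pct` helpers are hypothetical and do not reflect GMAP's own output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One contig-to-genome alignment (hypothetical record layout)."""
    contig: str
    identity: float  # percent identity of the aligned region
    coverage: float  # percent of the contig covered by the alignment


def count_aligned(hits: list[Hit], min_id: float = 95.0, min_cov: float = 50.0) -> int:
    """Count distinct contigs with at least one hit passing both thresholds."""
    return len({h.contig for h in hits
                if h.identity >= min_id and h.coverage >= min_cov})


def pct(part: int, whole: int) -> float:
    """Percentage of contigs that align, e.g. 32520 of 45214 for Atlantic."""
    return 100 * part / whole
```

With the Atlantic counts, `pct(32_520, 45_214)` gives roughly 71.9, so about 72% of Atlantic contigs align at the 50% coverage threshold.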
Tomato Illumina GA II Output

| Variety | Insert Size (bp) | Read Length (bp) | Total Reads | PF Reads | % PF | Total PF Reads (per variety) |
|---|---|---|---|---|---|---|
| FL7600 | 300 | 61/47 | 22,491,304 | 20,685,342 | 92.0 | |
| FL7600 | 300 | 60 | 16,025,976 | 14,382,577 | 89.8 | |
| FL7600 | 300 | 60 | 15,645,164 | 13,985,875 | 89.4 | 49,053,794 |
| NC84173 | 350 | 61/61 | 27,079,946 | 22,687,626 | 83.8 | |
| NC84173 | 350 | 60 | 11,058,431 | 10,366,811 | 93.8 | |
| NC84173 | 350 | 60 | 14,401,240 | 12,687,134 | 88.1 | 52,539,617 |
| OH9242 | 350 | 61/47 | 26,960,898 | 24,874,218 | 92.3 | |
| OH9242 | 350 | 60 | 10,316,775 | 9,671,753 | 93.8 | |
| OH9242 | 350 | 60 | 14,676,814 | 12,879,812 | 87.8 | 51,954,487 |
| T5 | 350 | 61/47 | 26,799,944 | 24,677,302 | 92.1 | |
| T5 | 350 | 60 | 16,822,639 | 14,738,351 | 87.6 | |
| T5 | 350 | 60 | 15,726,257 | 13,744,511 | 87.4 | 59,348,840 |
| PI114490 | 350 | 61/47 | 17,721,226 | 16,422,842 | 92.7 | |
| PI114490 | 350 | 60 | 17,115,349 | 14,902,672 | 87.1 | |
| PI114490 | 350 | 60 | 17,890,649 | 15,248,587 | 85.2 | 52,727,224 |
| PI212816 | 350 | 61/47 | 17,631,906 | 16,450,422 | 93.3 | |
| PI212816 | 350 | 60 | 18,238,179 | 15,354,882 | 84.2 | |
| PI212816 | 350 | 84 | 21,829,622 | 18,500,235 | 84.8 | 57,699,707 |
Velvet Assemblies of Tomato Illumina Sequences
• With a k-mer length of 31 and a minimum contig length of 150 bp:

| Variety | Total Gb | Transcriptome Size (Mb) | No. Contigs | N50 (bp) | Maximum Contig (Kb) |
|---|---|---|---|---|---|
| FL7600 | 2.82 | 39.8 | 59,581 | 424 | 12.1 |
| NC84173 | 2.77 | 39.2 | 60,534 | 496 | 13.3 |
| OH9242 | 2.70 | 39.1 | 59,051 | 476 | 11.6 |
| T5 | 3.04 | 40.6 | 60,031 | 632 | 14 |
| PI114490 | 2.70 | 41 | 61,310 | 690 | 11.7 |
| PI212816 | 3.00 | 41.1 | 66,118 | 471 | 14 |
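Velvet assembles reads through a de Bruijn graph built from overlapping k-mers, so the choice of k = 31 interacts with read length: a read of length L contributes L - k + 1 k-mers. The sketch below shows k-mer decomposition and the usual conversion from nucleotide coverage C to k-mer coverage Ck = C(L - k + 1)/L. The function names are hypothetical and this is not Velvet's internal code. With the ~60 bp reads listed in the GA II tables, each read yields 30 31-mers, so k-mer coverage is about half the base coverage.

```python
def kmers(seq: str, k: int = 31) -> list[str]:
    """All overlapping substrings of length k (hypothetical helper).

    A read of length L yields L - k + 1 k-mers, or none if L < k.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_coverage(base_coverage: float, read_len: int, k: int = 31) -> float:
    """Convert nucleotide coverage C to k-mer coverage Ck = C * (L - k + 1) / L."""
    if k > read_len:
        raise ValueError("k must not exceed the read length")
    return base_coverage * (read_len - k + 1) / read_len
```

For a 60 bp read at 10x base coverage, `kmer_coverage(10, 60)` returns 5.0.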
Sequence quality: Viewing an Atlantic potato contig from the Velvet assembly
Alignment of contigs relative to DM1-3 516R44
• FL7600 (tomato): 93.7% identity; 94.4% coverage
• Snowden (potato): 97.9% identity; 94.7% coverage
Identify intra-varietal SNPs

| Assembly | Query SNPs | Filtered SNPs |
|---|---|---|
| Atlantic | 224748 | 150669 |
| Premier | 265673 | 181800 |
| Snowden | 258872 | 166253 |

Figure: an A/C SNP
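Cultivated potato varieties such as Atlantic, Premier, and Snowden are highly heterozygous, so an intra-varietal SNP appears as two alleles among the reads from a single variety at one contig position (the A/C SNP in the figure is one example). The Python sketch below flags such a position from the bases piled up there. The function `call_intravarietal_snp` and all of its thresholds are hypothetical placeholders, not the criteria used in this study.

```python
from collections import Counter


def call_intravarietal_snp(bases: str, min_depth: int = 10,
                           min_alt_count: int = 2, min_alt_freq: float = 0.1):
    """Return (major, minor) alleles if a pileup column looks biallelic, else None.

    `bases` holds every read base observed at one contig position; depth
    counts all of them. Thresholds are illustrative placeholders only.
    """
    bases = bases.upper()
    depth = len(bases)
    if depth < min_depth:
        return None
    # Rank the A/C/G/T alleles by read support.
    counts = Counter(b for b in bases if b in "ACGT").most_common()
    if len(counts) < 2:
        return None
    (major, _), (minor, n_minor) = counts[0], counts[1]
    if n_minor >= min_alt_count and n_minor / depth >= min_alt_freq:
        return major, minor
    return None
```

A column of seven A reads and three C reads, `call_intravarietal_snp("AAAAAAACCC")`, returns `("A", "C")`.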
Filtered SNP counts (filtering on SNP quality and 1 SNP/150 bp window)

| Ref | Query | d 10 | d 20 | d 30 | d 40 | d 50 | d 60 | d 100 |
|---|---|---|---|---|---|---|---|---|
| Atlantic | Atlantic | 21336 | 17509 | 14493 | 12150 | 10277 | 8673 | 4435 |
| Atlantic | Premier | 21789 | 18050 | 15084 | 12477 | 10584 | 8919 | 4620 |
| Atlantic | Snowden | 19997 | 16518 | 13694 | 11378 | 9689 | 8048 | 4173 |
| Premier | Atlantic | 21117 | 17096 | 14106 | 11785 | 9790 | 8222 | 4228 |
| Premier | Premier | 22951 | 18431 | 15016 | 12377 | 10300 | 8703 | 4371 |
| Premier | Snowden | 20972 | 16846 | 13709 | 11357 | 9479 | 7873 | 4113 |
| Snowden | Atlantic | 20777 | 16998 | 13984 | 11619 | 9647 | 8131 | 4186 |
| Snowden | Premier | 22101 | 17888 | 14701 | 12068 | 10124 | 8650 | 4223 |
| Snowden | Snowden | 21083 | 16963 | 13792 | 11218 | 9359 | 7735 | 3896 |
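The filtered counts fall steadily as the depth threshold rises, and each count keeps at most one SNP per 150 bp window. The sketch below assumes the "d" columns are minimum read-depth thresholds (the accompanying plot's x-axis is depth of coverage) and that windows are fixed, non-overlapping 150 bp bins on each contig, keeping the highest-quality SNP per bin. The `Snp` record, the quality cutoff of 20, and `filter_and_thin` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """One candidate SNP (hypothetical record layout)."""
    contig: str
    pos: int        # 1-based position on the contig
    depth: int      # read depth at the site
    quality: float  # caller-assigned SNP quality


def filter_and_thin(snps: list[Snp], min_depth: int = 10,
                    min_qual: float = 20.0, window: int = 150) -> list[Snp]:
    """Keep SNPs passing depth and quality cutoffs, then at most one per
    window-bp bin per contig (the highest-quality one in that bin)."""
    best: dict[tuple[str, int], Snp] = {}
    for s in snps:
        if s.depth < min_depth or s.quality < min_qual:
            continue
        key = (s.contig, (s.pos - 1) // window)  # fixed, non-overlapping bins
        if key not in best or s.quality > best[key].quality:
            best[key] = s
    return sorted(best.values(), key=lambda s: (s.contig, s.pos))
```

Re-running the function with `min_depth` set to 10, 20, ..., 100 would produce one row of a table shaped like the one above.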
Figure: Filtered SNP counts (filtering on SNP quality and 1 SNP/150 bp window) by depth of coverage; left axis: No. SNPs (0 to 25,000); right axis: validation rate (40 to 90).
Genotyping platforms…
Comments on quality control…
Data:
• direct comparison of sequence
• analysis of SNPs across populations
Comparison of two genes on a tomato chromosome 9 BAC: a COS gene and an R-gene
COSII

| Comparison | Identities | Gaps |
|---|---|---|
| Fresh market vs. fresh market | 573/573 (100%) | 0/573 (0%) |
| Fresh market vs. processing | 569/569 (100%) | 0/569 (0%) |
| S. lycopersicum vs. S. pimpinellifolium | 339/341 (99%) | 0/341 (0%) |
| Potato vs. potato | 606/612 (99%) | 0/612 (0%) |
| Tomato vs. potato | 914/948 (96%) | 6/948 (0%) |
DIVERGED SEQUENCE

| Comparison | Identities | Gaps |
|---|---|---|
| Fresh market vs. fresh market | 959/959 (100%) | 0/959 (0%) |
| Fresh market vs. processing | 1560/1560 (100%) | 0/1560 (0%) |
| S. lycopersicum vs. S. pimpinellifolium | 612/613 (99%) | 0/613 (0%) |
| Tomato vs. potato | 223/280 (79%) | 11/280 (3%) |
| Potato vs. potato | 246/278 (88%) | 7/278 (2%) |
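The pairwise comparisons use the BLAST-style "Identities = x/n (p%), Gaps = y/n (q%)" format. The sketch below computes the same two figures from a pair of pre-aligned sequences, with "-" marking gaps; `alignment_stats` is a hypothetical helper. It truncates percentages to whole numbers, which is consistent with the values on these slides (223/280 = 79.6% is shown as 79%, and 11/280 = 3.9% as 3%).

```python
def alignment_stats(a: str, b: str) -> tuple[str, str]:
    """Summarize a pairwise alignment given as two equal-length gapped strings.

    Hypothetical helper; percentages are truncated toward zero.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    length = len(a)
    # Identical, non-gap columns.
    ident = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    # Columns with a gap in either sequence.
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return (f"Identities = {ident}/{length} ({100 * ident // length}%)",
            f"Gaps = {gaps}/{length} ({100 * gaps // length}%)")
```

For example, `alignment_stats("ACGT-A", "ACGTTA")` reports 5 identities and 1 gap over 6 aligned columns.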
What patterns do we expect to see for genes “under selection”?
• Low variation (fixed)
• High Ka/Ks (mutations affect the protein; possible diversifying selection)
• Mutations (loss of function)
• High FST (genes that distinguish populations)
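Ka/Ks compares the rate of nonsynonymous substitutions per nonsynonymous site (Ka) with the rate of synonymous substitutions per synonymous site (Ks); values above 1 suggest diversifying selection, and values well below 1 suggest purifying selection. The sketch below performs only the final step, assuming site and difference counts were already obtained with a codon-aware method such as Nei-Gojobori, and applies an optional Jukes-Cantor correction. The function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float,
          correct: bool = True) -> float:
    """Ka/Ks from nonsynonymous (N) and synonymous (S) site and difference counts.

    Hypothetical helper: the counts must come from a codon-aware method;
    this function only forms the (optionally corrected) ratio.
    """
    pn, ps = n_diffs / n_sites, s_diffs / s_sites
    if correct:
        pn, ps = jukes_cantor(pn), jukes_cantor(ps)
    if ps == 0:
        raise ValueError("no synonymous divergence: Ka/Ks is undefined")
    return pn / ps
```

For instance, 5 nonsynonymous differences over 100 nonsynonymous sites and 10 synonymous differences over 50 synonymous sites give an uncorrected ratio of 0.05/0.2 = 0.25.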
Population structure: coding vs. non-coding
• All 173 markers (K = 6)
• 89 coding markers (K = 5)
• 84 non-coding markers (K = 6)
Groups: Processing, Fresh-market, Vintage, Landrace
500K burn-in / 750K MCMC reps; 20 runs for each K from 3 to 8
Figure labels: CA & OH, OH, CA, OH, OH, CN, CN
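The slide does not say how K was chosen from the 20 replicate runs per K. One widely used heuristic is Evanno's ΔK: the second-order change in mean log-likelihood, ln P(D), scaled by the spread across runs. A minimal sketch, assuming the ln P(D) value of every run is available; `evanno_delta_k` is a hypothetical helper, and ΔK is defined only for interior K values (here 4 to 7).

```python
import statistics


def evanno_delta_k(ln_pd: dict[int, list[float]]) -> dict[int, float]:
    """Evanno-style Delta K from replicate ln P(D) values per K.

    Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)),
    computed for interior, consecutive K values only. Each K needs at
    least two runs so that the standard deviation is defined.
    """
    ks = sorted(ln_pd)
    means = {k: statistics.mean(v) for k, v in ln_pd.items()}
    out = {}
    for k in ks[1:-1]:
        if k - 1 not in means or k + 1 not in means:
            continue  # requires consecutive K values
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        out[k] = second_diff / statistics.stdev(ln_pd[k])
    return out
```

The K with the largest ΔK is the usual candidate, though the heuristic cannot return the smallest or largest K tested.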
Distribution of FST for genes

| Panel | ovate | fw2.2 | sp6 |
|---|---|---|---|
| 1 | 0 | 0 | 0.14 |
| 2 | 0.26 | 0 | 0.73 |
| 3 | 0.31 | 0 | 0.47 |
| 4 | 0 | 0.5 | 1 |
| 5 | 0 | 0.42 | 0.74 |
| 6 | 0.14 | 0.46 | 0.05 |
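FST measures how much of the total allelic variation lies between populations: 0 means identical allele frequencies, and 1 means fixed differences (as for sp6 in one panel). The slide does not name its estimator, so the sketch below uses the simple heterozygosity-based form (H_T - H_S)/H_T for one biallelic locus, weighting populations equally and ignoring sample size; `fst` is a hypothetical helper.

```python
def fst(freqs: list[float]) -> float:
    """Heterozygosity-based F_ST = (H_T - H_S) / H_T for one biallelic locus.

    `freqs` holds the frequency of one allele in each population.
    Hypothetical helper with no sample-size correction.
    """
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

Two populations fixed for opposite alleles, `fst([1.0, 0.0])`, give 1.0; identical frequencies give 0.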
Examples of highly polymorphic genes within S. lycopersicum

| Marker | Chrom | Source | No. of SNPs | Annotation |
|---|---|---|---|---|
| LEOH14 | unknown | EST542533 | 2 | Pathogenesis-related leaf protein |
| LEOH16 | 5 | EST301659 | 4 | drought-induced protein |
| LEOH17 | multiple | EST551464 | 5 | putative alcohol dehydrogenase |
| LEOH20 | unknown | EST327354 | 2 | ultraviolet-B-repressible protein |
| LEOH23 | 2 | EST546919 | 3 | Vesicle-associated membrane |
| LEOH25 | 9 | EST511738 | 3 | gamma hydroxybutyrate |
| LEOH29 | unknown | EST243853 | 2 | A96510 protein F27F5.25 |
| LEOH31 | 9 | EST583372 | 10 | chlorophyll synthetase G4 |
| LEOH32 | 9 | EST358606 | 2 | acetyl-CoA C-acyltransferase |
| LEOH33 | 9 | EST471439 | 5 | endo-1,3-1,4-beta-D-glucanase |
| LEOH34 | 9 | EST435427 | 6 | tuberisation-related protein |
| LEOH35 | 9 | EST549543 | 8 | photosystem II protein W |
| CosOH7 | 2 | TOVBN23 | 5 | putative MYB transcription factor |
| LEVCOH11 | 1 | AJ785180 | 2 | late embryogenesis (Lea)-like |

Functional categories: associated with introgression 0.304; duplicated genes 0.043; ethylene induced 0.087; pathogen induced 0.174; abiotic stress induced 0.130
Note: I am working on a replacement that compares Ka/Ks for selected tomato and potato genes
Breakdown by class: non-synonymous 0.53; synonymous 0.37; non-coding 0.09
Figure: r² values by chromosome (1 to 12) for the Processing, Fresh Market, Vintage, and Wild populations, with a Combined panel; P-value classes >0.05, <0.05, <0.01, <0.001, <0.0001.
Distribution of PM genes across populations is not random
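The figure reports r² values with P-value classes but does not define how r² was computed. Where r² measures linkage disequilibrium between two markers, a common approximation for unphased biallelic genotypes is the squared Pearson correlation of allele dosages coded 0, 1, 2. A minimal sketch under that assumption; `r_squared` is a hypothetical helper.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between allele dosages (0, 1, 2) at two loci.

    For unphased genotypes this approximates the LD statistic r^2.
    Hypothetical helper.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least two samples")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic locus has undefined r^2")
    return sxy * sxy / (sxx * syy)
```

Perfectly coupled or perfectly repulsed dosages both give r² = 1, while independent patterns give values near 0.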
Visit us at http://solcap.msu.edu/
Tools, Downloads
Conclusions
~5.7 Gb PF potato transcriptome sequence (3 varieties)
~14.3 Gb PF tomato transcriptome sequence (6 varieties)
The DM1-3 516R44 draft genome is an excellent scaffold for potato and tomato GAII transcriptome alignments.
SNPs are not evenly distributed in genes/genomes.
Genes with signatures of selection (Ka/Ks; high FST) tend to be genes associated with response to abiotic and biotic stress.
Co-adapted complexes result from selection during plant breeding.
Lessons learned: GAII sequence of DM1-3 516R44 as a control would permit bioinformatic optimization of pipelines rather than relying on empirical validation.
Acknowledgments
Collaborators, OSU: Matt Robbins, Sung-Chur Sim, Troy Aldrich
Collaborators, CAU: Wencai Yang
Collaborators, CAAS: Sanwen Huang
Collaborators, MSU: David Douches, C. Robin Buell, John Hamilton, Kelly Zarka
Collaborators, Cornell: Walter De Jong, Lucas Mueller, Joyce Van Eck
Collaborators, UCD: Allen Van Deynze, Kevin Stoffel, Alex Kozik
Funding: USDA/AFRI. This project is supported by the Agriculture and Food Research Initiative of USDA’s National Institute of Food and Agriculture.