Page 1: The tomato genome re-seq project - University of Florida - Flinkers.pdf · Ignores differences GC-AT ratio ... PowerPoint-presentatie Author: Martin Brinkman Created Date: 3/4/2013

The tomato genome re-seq project

http://www.tomatogenome.net

5 February 2013, Richard Finkers & Sjaak van Heusden

Rationale

Genetic diversity in commercial tomato germplasm relatively narrow

Unexploited genetic diversity available in land races and old varieties?

Cultivated tomato has lost valuable traits during domestication

Wild species - source of genetic diversity

● Diverse habitat
● Variation in flowers and fruits
● Variation in mating systems

Most wild species can be crossed with cultivated tomato (introgression breeding)

Rationale

Tomato Genome (Re-)Sequencing Project

● Identify alleles underpinning phenotypic diversity across the entire genome and the entire tomato clade

Acknowledgement: Sjaak van Heusden, Paris market

Tomato fruit shape variation

Rodríguez et al. (2011) Plant Physiology 156: 275-285

EU-SOL core collection

https://www.eu-sol.wur.nl

Information:
● Marker data
● Phenotype data
● Passport data

Selection of landraces for (re-)sequencing, by number of markers used:
● 20 markers: > 7000 landraces -> 1000 landraces
● 384 markers: 1000 landraces -> 200 landraces
● 7500 markers: 200 landraces -> 34 landraces

Acknowledgement: Dani Zamir et al. & Keygene N.V.

Landraces & old cultivar collection

Fruit phenotypes EU-SOL collection

Improving with exotic genetic libraries

Wild tomato species are valuable candidates for novel alleles

Dani Zamir, Nature Reviews Genetics 2, 983-989 (December 2001)

Improving with exotic genetic libraries

Moyle 2008

Phylogenetic relationships in the Solanum clade

(re-)sequencing collection

[Figure: phylogenetic tree of the tomato clade, divided into the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups. Numbers shown on the figure: 51; 2, 6, 4; 3, 2, 2, 1, 3, 2, 7, 2.]

Tree according to Anderson et al. (2010), redrawn from Moyle 2008

Genome Alignment

Read mapping to cv. Heinz

Genome structure of wild tomato relatives?

Reference genomes: De novo assembly selection

● Lycopersicon group: Heinz1706
● Arcanum group: LA 2157
● Eriopersicon group: LYC 4
● Neolycopersicon group: LA 716

Presentation notes

Rationale:
● (Nearly) homozygous accessions
● Inbred over a few generations
● Representative for re-seq read mapping

Data production

84 resequenced genomes
● 500 bp insert size, 2x100 bp paired-end Illumina
● Average coverage 41x

3 de novo genomes (S. arcanum, S. habrochaites, S. pennellii)
● 170 bp insert size, 2x100 bp paired-end Illumina
● 2 kb, 2x100 bp mate-pair Illumina
● 8 kb mate-pair (454)
● 20 kb mate-pair (454)
● Average coverage 205x
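
The coverage figures can be put in perspective with the usual relation coverage = sequenced bases / genome size. The sketch below is only a back-of-the-envelope check: it assumes a genome size of 760 Mb (the Heinz draft size quoted elsewhere in this deck) for every accession, and the function names are illustrative rather than part of the project's pipeline.

```python
def bases_needed(coverage: float, genome_size_bp: float) -> float:
    """Total sequenced bases required to reach a given average coverage."""
    return coverage * genome_size_bp


def read_pairs_needed(coverage: float, genome_size_bp: float, read_len: int = 100) -> float:
    """Number of 2 x read_len paired-end fragments needed for that coverage."""
    return bases_needed(coverage, genome_size_bp) / (2 * read_len)


GENOME_BP = 760e6  # ASSUMPTION: Heinz draft size (760 Mb) used for all accessions

reseq_bases = bases_needed(41, GENOME_BP)        # resequenced genomes, 41x
reseq_pairs = read_pairs_needed(41, GENOME_BP)   # 2x100 bp read pairs
denovo_bases = bases_needed(205, GENOME_BP)      # de novo genomes, 205x

print(f"41x  -> {reseq_bases / 1e9:.1f} Gb, about {reseq_pairs / 1e6:.0f} M read pairs")
print(f"205x -> {denovo_bases / 1e9:.1f} Gb")
```

Under that assumption, 41x corresponds to roughly 31 Gb of sequence per resequenced accession; wild species with larger genomes would need proportionally more.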

Genomic sequencing libraries

K-mer graph

[Figure: 31-mer histogram. x-axis: 31-mer frequency (0 to 100); y-axis: 31-mer volume in millions (0 to 1000). One observed curve and one fitted curve (FIT) for each of the samples '001', '045', '046', '053', '054', '058', '072' and '074'.]

Data: 500 bp, 2x100 bp Paired-end Illumina

Acknowledgement: Theo Borm
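
A k-mer histogram of this kind is built by counting how often each k-mer occurs in the reads and then tallying how many distinct k-mers share each multiplicity. The following is a minimal in-memory sketch, not the tool used for the slide: it counts forward-strand k-mers only, whereas dedicated k-mer counters usually merge a k-mer with its reverse complement, and it returns the plain number of distinct k-mers per multiplicity because the slide does not define "volume" further. All names are illustrative.

```python
from collections import Counter


def kmer_counts(reads, k=31):
    """Count every k-mer (forward strand only) across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Toy input with k=3 so the result can be checked by hand.
toy_reads = ["ACGTAC", "CGTACG"]
toy_counts = kmer_counts(toy_reads, k=3)
toy_spectrum = kmer_spectrum(toy_counts)
print(dict(toy_spectrum))
```

For real data at 41x coverage the counting step needs a disk- or hash-based counter; the dictionary approach here is only meant to show the logic.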

K-mer exploration

Fitted modes

● Homozygous
● Heterozygous
● Duplicated (2x)

Conclusions

● % heterozygosity is negligible
● Duplicated portion is not negligible

[Figure: 31-mer histogram, zoomed in. x-axis: 31-mer frequency (30 to 90); y-axis: 31-mer volume in millions (0 to 300). Observed and fitted (FIT) curves for the samples '001', '045', '046', '053', '054', '058', '072' and '074'.]
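
The three modes correspond to k-mers seen at about half the main coverage peak (heterozygous), at the peak (homozygous) and at twice the peak (duplicated). The slides do not say how the fit was done, so the sketch below swaps in a much cruder technique: each multiplicity is simply assigned to the nearest of the three expected positions. The function name, the error cutoff and the toy spectrum are all assumptions for illustration.

```python
def mode_fractions(spectrum, homozygous_peak, min_freq=5):
    """Share of distinct k-mers nearest to each expected mode.

    spectrum maps multiplicity -> number of distinct k-mers.
    Multiplicities below min_freq are treated as sequencing errors and ignored.
    """
    modes = {
        "heterozygous": homozygous_peak / 2,
        "homozygous": homozygous_peak,
        "duplicated": homozygous_peak * 2,
    }
    kmers_per_mode = {name: 0 for name in modes}
    for freq, n_kmers in spectrum.items():
        if freq < min_freq:
            continue
        # nearest-mode binning: a crude stand-in for a proper mixture fit
        nearest = min(modes, key=lambda name: abs(freq - modes[name]))
        kmers_per_mode[nearest] += n_kmers
    total = sum(kmers_per_mode.values())
    if total == 0:
        return kmers_per_mode
    return {name: n / total for name, n in kmers_per_mode.items()}


# HYPOTHETICAL spectrum: main peak near 40x, small shoulders at 20x and 80x.
toy = {1: 900, 20: 10, 38: 300, 40: 400, 42: 300, 80: 90}
fractions = mode_fractions(toy, homozygous_peak=40)
print(fractions)
```

In this toy case the duplicated share exceeds the heterozygous share, which is the qualitative pattern the conclusions above describe; a real analysis would fit overlapping distributions instead of hard bins.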

Genome size estimates

Genomic k-mer based estimate

● Ignores differences in GC-AT ratio
● Underestimation

Nr  Species  Est. Size (Mb)  Draft Size (Mb)  %CP
01  SL       723             760 (Heinz)      1.9
45  SP       749             -                1.9
46  SP       775             739 (LA1589)     6.3
53  SG       728             -                4.4
54  SC       760             -                6.2
58  SA       830             -                3.0
72  SH       779             -                7.1
74  SP       962             -                8.6

Acknowledgement: Theo Borm

The Tomato Genome Consortium Nature 485, 635–641 (2012)
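
A common way to turn a k-mer histogram into a genome size is to divide the total number of k-mer occurrences by the multiplicity of the main peak. The sketch below shows that calculation only; it is not necessarily the procedure behind the table, and the error cutoff and the toy spectrum are assumptions chosen so that the arithmetic is easy to follow.

```python
def estimate_genome_size(spectrum, min_freq=5):
    """Estimate genome size (bp) as total k-mer occurrences / main-peak multiplicity.

    spectrum maps multiplicity -> number of distinct k-mers.
    Multiplicities below min_freq are treated as sequencing errors and dropped.
    """
    solid = {f: n for f, n in spectrum.items() if f >= min_freq}
    if not solid:
        raise ValueError("no k-mers above the error cutoff")
    peak = max(solid, key=solid.get)  # multiplicity shared by the most distinct k-mers
    total_occurrences = sum(f * n for f, n in solid.items())
    return total_occurrences / peak


# HYPOTHETICAL spectrum with its main peak at 40x.
toy = {
    1: 5_000_000,      # error k-mers, ignored
    39: 150_000_000,
    40: 400_000_000,
    41: 150_000_000,
    80: 10_000_000,    # duplicated sequence, counted twice via its multiplicity
}
size_bp = estimate_genome_size(toy)
print(f"{size_bp / 1e6:.0f} Mb")
```

Because this kind of estimate treats every region as equally well sequenced, coverage differences related to GC-AT content are not corrected for, consistent with the caveat on the slide.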

Optimizing assembly strategy

Checking assembly integrity

Average completeness per 10 contigs: ALL-PATHS (96.62%) CLC-BIO (74.62%)

Heinz dot plot

SL2.40 ch11 – region (1 Mbp)

Status de novo assembly genomes

Assembly                  N50         N90        Longest     Shortest  Mean     Median  N contigs  Total length
Heinz 1706 reference      16,467,796  3,041,128  42,121,211  2,000     242,428  2,847   3,223      781,345,411
S. habrochaites_allpaths  90,424      12,290     990,035     902       43,409   20,461  16,935     735,128,396
S. habrochaites_scaf      515,730     104,925    3,252,897   902       130,475  9,758   5,873      766,277,628
S. pennellii_allpaths     64,671      7,460      627,722     887       27,680   11,008  26,589     735,990,792
S. pennellii_scaf         206,135     38,969     1,269,801   887       49,209   5,932   15,886     781,730,072
S. arcanum_clc            18,651      2,524      241,690     200       2,869    428     290,145    832,461,203
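
N50 and N90 are read as follows: sort the sequences from longest to shortest and walk down the list until 50% (or 90%) of the total assembly length is covered; the length of the sequence at which that happens is the N50 (or N90). A minimal sketch, with an illustrative function name and toy lengths rather than project data:

```python
def nx(lengths, fraction):
    """Return the length L such that sequences of length >= L cover `fraction` of the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


toy_lengths = [100, 80, 60, 40, 20]  # total length 300
n50 = nx(toy_lengths, 0.5)  # cumulative 100, 180 -> reaches 150 at the 80 bp sequence
n90 = nx(toy_lengths, 0.9)  # cumulative 100, 180, 240, 280 -> reaches 270 at the 40 bp sequence
print(n50, n90)
```

The other columns are simpler summaries of the same length list; for example, the mean is the total length divided by the number of sequences.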

Conclusions

Sequencing completed

Quality and coverage threshold satisfied

Cleaning resequencing data completed

De novo assembly of S. habrochaites and S. pennellii comparable with tomato reference

De novo assembly of S. arcanum in progress

Read mapping and SNP analysis finished

And now the fun begins...

Average SNP rate/kb (vs. SL2.40)
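
The SNP rate per kb is the number of SNPs called against the reference divided by the number of reference bases compared, scaled to 1,000 bp. A minimal sketch of that arithmetic follows; the function name and the example numbers are hypothetical and are not results from the project.

```python
def snp_rate_per_kb(n_snps: int, compared_bp: int) -> float:
    """SNPs per 1,000 bp of reference sequence compared."""
    if compared_bp <= 0:
        raise ValueError("compared_bp must be positive")
    return n_snps / compared_bp * 1000


# HYPOTHETICAL accession: 1.52 million SNPs over 760 Mb of reference sequence.
rate = snp_rate_per_kb(1_520_000, 760_000_000)
print(f"{rate:.2f} SNPs/kb")
```

Which bases count as "compared" (the whole reference, or only positions with sufficient read coverage) changes the denominator and should be stated alongside any reported rate.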

Homozygous vs Heterozygous feature rate

Exploring the FW9-2-5 locus (Lin5)

Apoplastic invertase gene (Lin5), cloned from S. pennellii

Amino acid substitutions:
● 2878 (Asp in LP (S. pennellii) to Glu in LE (S. lycopersicum))
● 2932 (Asp to Asn)
● 2953 (Val to Leu)

Fridman et al. Proc Natl Acad Sci U S A. 2000 Apr 25;97(9):4718-23.

FW9-2-5 variation (Lin5)

S. galapagense

Needs

Whole genome variant catalogue

Annotation for the three wild species genomes

Pan genome reconstruction

How good is our sampling?

Perspectives

Direct application for reverse genetics studies
● Use identified allelic variation
● Calculate distance based on all genes?

Better understanding of genome organization
● Improve introgression breeding
● Homozygous vs. heterozygous features
● Scan for inversions

Diamond jewelry?

150 tomato genome consortium

Questions

Project site:

● http://www.tomatogenome.net

Phenotype data & Images:

● https://www.eu-sol.wur.nl

SOL100:

● http://solgenomics.net or http://solgenomics.wur.nl

Acknowledgments

Data production
● Elio Schijlen
● Bas te Lintel Hekkert

Quality control
● Saulo Aflitos

Data management and assembly
● Sandra Smit
● Jan van Haarst
● Henri van de Geest
● Lars Smits

Project management
● Sander Peters
● Richard Finkers
● Andries Koops

● Huanwen Zhu
● Minling Xiao
● Tao Ma
● Xiaoli Wang
● Jiumeng Min
● Jie Chen
● Xiaoli Wang
● Jianbo Jian
● Yadan Luo
● Li Liao
● Tina(Na) Xu