Solanum lycopersicum Chromosome 4 Sequencing Update UK-SOL– Dec 2008 Wellcome Trust Medical Photographic Library

Upload: brianne-thomas

Post on 03-Jan-2016


Solanum lycopersicum Chromosome 4

Sequencing Update

UK-SOL– Dec 2008

Wellcome Trust Medical Photographic Library


Overview

• Summary of project from WTSI
• BAC selection, contig building and QC checking have transferred to Imperial College London
• Call for your help in annotation


Clone by Clone Sequencing Strategy

1) BAC library
2) Mapped markers
3) Overlapping clones anchored by mapped markers
4) Minimal tiling path
5) Subcloning & shotgun sequencing
6) Computer assembly (paired plasmid reads)
7) "Finishing": order contigs, gap closure, sequence quality
8) Contiguous sequence, < 1 error in 10,000: ……..TAGCTGTGTACGATGATC……….
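The finishing target of fewer than 1 error in 10,000 bases can be restated on the Phred scale, where Q = -10 log10(p) for error probability p. The sketch below only illustrates that conversion; the function names are illustrative and do not come from the slides.

```python
import math


def phred_from_error_rate(p_error: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(p_error)


def error_rate_from_phred(q: float) -> float:
    """Inverse conversion: error probability for a given Phred quality."""
    return 10.0 ** (-q / 10.0)


# Finished-sequence target quoted on the slide: 1 error in 10,000 bases.
finished_target = 1 / 10_000
print(round(phred_from_error_rate(finished_target)))  # Phred-scaled value
```

Under this convention the 1-in-10,000 target is equivalent to a consensus quality of about Q40.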


Overview of Clone Pipeline

Mapping
• Clone selection and verification
• Clones entered into pipeline
• BACs assigned to chr4 sequencing project on SGN BAC registry

Subcloning
• Clone DNA prep
• Digest confirmation
• Library construction (plasmid)

Shotgun Sequencing
• Plasmid prep
• Sequencing & processing
• Sequence contigs >2 kb available on Sanger FTP site and public databases: "Sequencing in Progress" (HTGS Phase 1, HTGS Phase 2)

Finishing
• Sequence improvement
• Contig orientation and gap closure
• Confirmation of assembly (QC)

Finished Sequence
• Final EMBL submission: "Complete Sequence" (HTGS Phase 3)
• Sequences uploaded to SGN
• BAC registry updated
• Ready for annotation


BAC Library & Map Resources

BAC Libraries

Library    No. of clones   Average insert   Genome equivalents   Fingerprints         End sequenced?
LE_HBa     129,024         117 kb           15 X                 10x (88,000 AGI)     Yes (188,130)
SL_MboI    52,992          135 kb           7 X                  5x (43,000 WTSI)     Yes (112,507)
SL_EcoRI   72,264          95-100 kb        7 X                  -                    Yes (101,375)

Tomato EXPEN-2000 map
• 2585 mapped markers across genome
• 242 Chr4 mapped markers
• Overgo analysis at Cornell

FPC Map Construction
• 1st FPC map build of HindIII library by Arizona Genomics Institute
• 2nd FPC map build incorporated MboI library mid 2006 at WTSI


Fosmid Library End Sequencing

150 plates (1-150) end sequenced at WTSI, December 2007
Approx. 57,600 fosmids (115,200 FES)

Fosmid Analysis (January 2008)

107,681 reads with a total of 70,900,576 bp, giving an average length of 658.4 bp (after quality and vector clipping); 60.3% of bases repeat masked
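The fosmid figures above can be cross-checked with a few lines of arithmetic. This is a sketch only: the 384-well plate format is an assumption (it is the value that makes 150 plates equal 57,600 fosmids), and reading the 107,681 reads as the subset of the 115,200 expected end reads that survived clipping is an interpretation, not something the slide states.

```python
PLATES = 150
WELLS_PER_PLATE = 384  # assumption: 384-well plates (57,600 / 150 = 384)

fosmids = PLATES * WELLS_PER_PLATE
expected_end_reads = 2 * fosmids  # one read from each end of every fosmid

reads_after_clipping = 107_681
total_bp = 70_900_576

mean_read_length = total_bp / reads_after_clipping
# Interpretation: fraction of expected end reads retained after clipping.
retained_fraction = reads_after_clipping / expected_end_reads

print(fosmids, expected_end_reads)
print(f"{mean_read_length:.1f} bp")
print(f"{retained_fraction:.1%}")
```

The computed mean read length agrees with the 658.4 bp quoted on the slide.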

Hits within existing chromosome 4 BACs:
• 380 fosmids with good read pair alignments (within expected size range)
• 31 fosmids with bad read pair alignments

Hits to single end:
• 48 single ends from fosmids with only 1 end sequenced
• 1027 single ends from fosmids where only 1 end is found


Selection of Minimum Tile Path

• Fingerprinted BACs and framework markers in FPC
• Seed BAC anchored by marker (Cornell)
• Overlaps identified by FPC and BES alignment
• Verify overlaps by colony PCR
• Anchor further BACs by hybridisation to marker sequences and FISH
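Once overlaps are known, choosing a minimum tile path amounts to picking as few clones as possible to span a region. A common way to illustrate this is a greedy walk outward from the seed BAC, always taking the overlapping clone that reaches farthest. The sketch below is a generic illustration of that idea, not the selection software used in the project; the clone names, coordinates and the `min_overlap` parameter are hypothetical, and it extends only to the right of the seed.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # approximate map position in kb (hypothetical)
    end: int


def minimal_tiling_path(clones: List[Clone], seed: Clone,
                        min_overlap: int = 5) -> List[Clone]:
    """Greedily extend rightwards from the seed clone.

    At each step, among clones that overlap the current clone by at
    least `min_overlap` and extend beyond it, take the one reaching
    farthest. Stops when no clone extends the path.
    """
    path = [seed]
    while True:
        current = path[-1]
        candidates = [
            c for c in clones
            if c.start <= current.end - min_overlap and c.end > current.end
        ]
        if not candidates:
            return path
        path.append(max(candidates, key=lambda c: c.end))


clones = [
    Clone("A", 0, 100), Clone("B", 40, 150), Clone("C", 90, 210),
    Clone("D", 120, 200), Clone("E", 205, 320), Clone("F", 400, 500),
]
path = minimal_tiling_path(clones, seed=clones[0])
print([c.name for c in path])
```

In this toy data set clone F is never reached because nothing overlaps it, which is the situation where further anchoring by hybridisation or FISH would be needed.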


Increasing Map Coverage using PseudoGoldenPath (PGP) Analysis

Bridging clones that span a map gap between sequenced clones are identified from BES alignments to sequence.
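A bridging clone can be recognised when its two BAC end sequences (BES) align to two different sequenced contigs. The following is a minimal sketch of that test, assuming the alignments have already been reduced to one best-hit contig per clone end; the input layout and all clone and contig names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical input: clone name -> (contig hit by one end, contig hit by
# the other end); None means that end has no alignment.
EndHits = Dict[str, Tuple[Optional[str], Optional[str]]]


def find_bridging_clones(bes_hits: EndHits) -> Dict[Tuple[str, str], List[str]]:
    """Group clones whose two ends hit different contigs, keyed by contig pair."""
    bridges: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for clone, (first, second) in bes_hits.items():
        if first and second and first != second:
            pair = tuple(sorted((first, second)))
            bridges[pair].append(clone)
    return dict(bridges)


hits = {
    "bac1": ("ctg12", "ctg13"),   # bridges ctg12 and ctg13
    "bac2": ("ctg12", "ctg12"),   # both ends inside one contig
    "bac3": ("ctg13", "ctg12"),   # same bridge, opposite orientation
    "bac4": ("ctg20", None),      # only one end aligned
}
bridges = find_bridging_clones(hits)
print(bridges)
```

Several independent clones supporting the same contig pair give more confidence that the gap is real and bridgeable; a real analysis would also check alignment orientation and the implied insert size.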


FISH Map for Chr 4 on SGN

FISH is used:

• to confirm BAC assignment to chr 4

• to confirm contig order along chr 4

Steve Stack

Dora Szinay, Hans de Jong



WTSI Tomato Clone Pipeline 2006-2008

Number of BACs

Pipeline Stage       Dec 2006   Dec 2007   Dec 2008
Subcloning           35         17         -
Shotgun              14         3          -
Assembly Start       1          8          -
Auto-prefinishing    16         8          -
Finishing            9          10         8
QC Checking          1          2          -
Finished             18         86         174
Total                94         134        182

HTGS: Phase 1, Phase 2, Phase 3
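The pipeline counts can be checked against the stated totals. Reading the Finishing row as 9, 10 and 8 across the three dates is an assumption about the extracted slide text, as is treating blank cells as zero; the sketch below shows that under those assumptions every column adds up to its total.

```python
# Stage -> (Dec 2006, Dec 2007, Dec 2008); blank cells treated as 0 (assumption).
pipeline = {
    "Subcloning":        (35, 17, 0),
    "Shotgun":           (14, 3, 0),
    "Assembly Start":    (1, 8, 0),
    "Auto-prefinishing": (16, 8, 0),
    "Finishing":         (9, 10, 8),   # assumption: row read as 9, 10, 8
    "QC Checking":       (1, 2, 0),
    "Finished":          (18, 86, 174),
}
stated_totals = (94, 134, 182)

# Sum each date column across all stages.
column_sums = tuple(sum(column) for column in zip(*pipeline.values()))
print(column_sums, column_sums == stated_totals)
```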


Chr 4 Map and Sequence Update

Chromosome 4 estimate: 19 Mb of euchromatin

80 contigs with sequence

                        December 2006   December 2007   December 2008
Total sequence          5,007,106 bp    12,590,598 bp   19,018,752 bp
Unique sequence         4,860,935 bp    11,789,635 bp   18,778,752 bp
Total finished length   1,963,352 bp    9,211,278 bp    18,056,067 bp


Distribution of Contigs

[Figure: chromosome 4 schematic marking the centromere, euchromatin and heterochromatin, divided into three regions]

227 markers mapped to Chr_4

• {62 markers}: 29 contigs, 55 BACs (27 markers)
• {41 markers}: 11 contigs, 59 BACs (16 markers)
• {124 markers}: 36 contigs, 63 BACs (61 markers)
• Unordered: 4 contigs (5 BACs)

Average contig length = 250 kb
Average BACs/contig = 2.3
Largest contigs = ~450-500 kb
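The per-region figures are consistent with the totals quoted elsewhere in the talk, which a short tally makes explicit. How the numbers are grouped into three regions is an assumption based on their order on the slide; the sums do not depend on that grouping.

```python
# Three regions in slide order (grouping of values per region is assumed).
regions = [
    {"markers": 62, "contigs": 29, "bacs": 55},
    {"markers": 41, "contigs": 11, "bacs": 59},
    {"markers": 124, "contigs": 36, "bacs": 63},
]
unordered = {"contigs": 4, "bacs": 5}

total_markers = sum(r["markers"] for r in regions)
total_contigs = sum(r["contigs"] for r in regions) + unordered["contigs"]
total_bacs = sum(r["bacs"] for r in regions) + unordered["bacs"]

print(total_markers, total_contigs, total_bacs)
print(round(total_bacs / total_contigs, 1))  # average BACs per contig
```

The tallies reproduce the 227 mapped markers, the 80 contigs with sequence, the 182 BACs in the December 2008 pipeline total, and the average of 2.3 BACs per contig.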


Some facts and figures

~81 contigs (80 contigs with sequence available).

Average contig length is just under 250 kb.

The average number of BACs per contig is 2.3.

The largest sequence contigs are in the range of 450kb-500kb with 5 or 6 BACs.


Summary of Progress on Chromosome 4

81 map contigs have been built

119 BACs in 44 contigs are confirmed on chr4 by FISH or introgression line (IL) mapping.

57 BACs are awaiting confirmation of chr4 location (28 on SGN; 29 to be placed once their location is confirmed).

~60 markers for which BACs have not been identified.

~13 BACs sequenced to HTGS Phase 3 are definitely not on chr4 and have been placed on chr0 (others in the same contigs were initiated but stopped in the pipeline).

22 missing markers: is the sequence missing?


Summary of what we will do next

1) Use IL mapping to confirm the chr4 location of BACs that lack chr4 marker sequence and/or have a conflicting map location.

2) Use missing marker sequences to identify further BACs (3D pools) and confirm chr4 location using IL mapping.

3) Use 3D BAC pools to identify BACs to extend current contigs.

4) Analyse output from GS-FLX and GA-Illumina sequencing runs on cDNA from chr4 IL and parental lines to identify SNPs and further chr4 markers.

5) Use any markers from (4) to isolate further BACs for sequencing.


Analysis of cDNA sequence to identify Chr4 specific sequences

Sample            454 (GS-FLX)   Illumina (GA)
Chr4 IL lines     22.11 MB       0.45 GB
LA716 pennellii   17.25 MB       1.9 GB
Chr4 sub line     8.47 MB        1.5 GB
Heinz cDNA        4.02 MB        1 GB
IL4-1             -              0.5 GB
IL4-2             -              0.6 GB
IL4-3             -              0.19 GB
IL4-4             -              0.44 GB
Total             51.85 MB       6.9 GB


Call for your help

Need your help in checking and verifying the automated annotation.

Please respond to e-mails in 2009 calling for help in annotating your favourite genes.


Acknowledgements

Wellcome Trust Sanger Institute:
• Carol Churcher
• Jane Rogers
• Sean Humphray
• Clare Riddle and Mapping Core Group
• Karen McLaren and Finishing Team 46
• Stuart McLaren and Pre-finishing Team 58
• Christine Lloyd and QC Team 57
• Karen Oliver
• Matt Jones
• Carol Scott

Imperial College London:
• Gerard Bishop
• Daniel Buchan
• James Abbott
• Sarah Butcher
• Rosa Lopez-Cobollo

University of Nottingham:
• Graham Seymour

Scottish Crop Research Institute:
• Glenn Bryan

Cornell University:
• Lukas Mueller
• Jim Giovannoni

MIPS/IBI Institute for Bioinformatics:
• Klaus Mayer
• Remy Bruggmann

FISH Resources:
• Stephen Stack Group (Colorado)
• Hans de Jong (Wageningen)
• Dora Szinay (Wageningen)

FUNDING