Solanum lycopersicum Chromosome 4
Sequencing Update
UK-SOL – Dec 2008
Wellcome Trust Medical Photographic Library
Summary of Project from WTSI
BAC selection, contig building and QC checking have been transferred to Imperial College London
Call for your help in annotation
Overview
Clone by Clone Sequencing Strategy
Subcloning & Shotgun sequencing
Overlapping clones anchored by mapped markers
Minimal tiling path
“Finishing”: order contigs, gap closure, sequence quality
Contiguous sequence: < 1 error in 10,000 bases
……..TAGCTGTGTACGATGATC……….
Mapped Markers
Computer assembly – paired plasmid reads
BAC Library
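The finishing target of fewer than 1 error in 10,000 bases can be restated as a Phred-style quality value, using the standard definition Q = -10 log10(P). A minimal sketch; the function name is illustrative and not part of any pipeline described in these slides:

```python
import math


def phred_quality(error_probability: float) -> float:
    """Convert a per-base error probability into a Phred-scaled quality value."""
    return -10.0 * math.log10(error_probability)


# Finishing standard quoted on the slide: fewer than 1 error in 10,000 bases.
finishing_target = 1 / 10_000
print(round(phred_quality(finishing_target)))  # 40
```

In other words, the finished-sequence standard is roughly equivalent to an average consensus quality of Q40.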
• Sequences uploaded to SGN
• BAC Registry updated
• Ready for annotation
• Plasmid prep
• Sequencing & processing
• Clone DNA prep
• Digest confirmation
• Library construction (plasmid)
• Clone selection and verification
• Clones entered into pipeline
Overview of Clone Pipeline
Shotgun Sequencing
Finished sequence: final EMBL submission, “Complete Sequence”
HTGS Phase 3
Mapping
Subcloning
Sequence contigs >2 kb available on Sanger FTP site and public databases, “Sequencing in Progress”
BACs assigned to chr4 sequencing project on SGN BAC registry
• Sequence improvement
• Contig orientation and gap closure
• Confirmation of assembly (QC)
Finishing
HTGS Phase 1
HTGS Phase 2
BAC Library & Map Resources
Library  | No. of clones | Average insert | Genome equivalents | Fingerprints      | End sequenced?
LE_HBa   | 129,024       | 117 kb         | 15X                | 10x (88,000 AGI)  | Yes (188,130)
SL_MboI  | 52,992        | 135 kb         | 7X                 | 5x (43,000 WTSI)  | Yes (112,507)
SL_EcoRI | 72,264        | 95-100 kb      | 7X                 | -                 | Yes (101,375)
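The genome-equivalent figures follow from clone count times average insert size divided by genome size. A rough sketch: the genome size of 950 Mb is an assumption (it is not stated on the slides), and the SL_EcoRI insert uses the midpoint of the quoted 95-100 kb range.

```python
# ASSUMPTION: tomato genome size of ~950 Mb; not given in the slides.
ASSUMED_GENOME_MB = 950

# Library name -> (number of clones, average insert in kb), from the table above.
libraries = {
    "LE_HBa": (129_024, 117.0),
    "SL_MboI": (52_992, 135.0),
    "SL_EcoRI": (72_264, 97.5),  # midpoint of the 95-100 kb range
}


def genome_equivalents(n_clones: int, insert_kb: float,
                       genome_mb: float = ASSUMED_GENOME_MB) -> float:
    """Library coverage = total cloned DNA / genome size."""
    return n_clones * insert_kb / (genome_mb * 1000.0)


for name, (n_clones, insert_kb) in libraries.items():
    print(f"{name}: {genome_equivalents(n_clones, insert_kb):.1f}x")
```

Under that assumption the results land close to the 15X, 7X and 7X quoted for the three libraries.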
Tomato EXPEN-2000 map
- 2585 mapped markers across genome
- 242 Chr4 mapped markers
- Overgo analysis at Cornell
BAC Libraries
FPC Map Construction
- 1st FPC map build of HindIII library by Arizona Genomics Institute
- 2nd FPC map build incorporated MboI library mid 2006 at WTSI
Fosmid Library End Sequencing
150 plates (1-150) end sequenced at WTSI, December 2007
Approx. 57,600 fosmids (115,200 FES)
Fosmid Analysis (January 2008)
107681 reads with total bp count of 70900576 giving average length = 658.4bp (after quality and vector clipping) (60.3% bases repeat masked)
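The fosmid numbers can be cross-checked with simple arithmetic. A small sketch; the 384-well plate format is an assumption, chosen because 150 plates x 384 wells gives the 57,600 fosmids quoted:

```python
# ASSUMPTION: 384-well plates (150 x 384 = 57,600 matches the slide).
plates = 150
wells_per_plate = 384

fosmids = plates * wells_per_plate   # clones picked
attempted_reads = fosmids * 2        # one read from each end (FES)

passed_reads = 107_681               # reads left after quality and vector clipping
total_bp = 70_900_576

average_length = total_bp / passed_reads
pass_rate = passed_reads / attempted_reads

print(fosmids, attempted_reads)
print(f"average read length: {average_length:.1f} bp")
print(f"reads passing clipping: {pass_rate:.1%}")
```

The average comes out at 658.4 bp as reported, and about 93% of attempted end reads survive clipping.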
Hits within existing chromosome 4 BACs:
• 380 fosmids with good read pair alignments (within expected size range)
• 31 fosmids with bad read pair alignments
Hits to single end:
• 48 single ends from fosmids with only 1 end sequenced
• 1027 single ends from fosmids where only 1 end is found
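One way the good / bad / single-end categories could be assigned is sketched below. This is illustrative only: the `EndHit` type, the 30-50 kb span window and the orientation rule are assumptions, not the criteria actually used at WTSI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndHit:
    """Hypothetical record for one fosmid end aligned to a BAC sequence."""
    pos: int      # leftmost aligned base on the BAC sequence
    strand: str   # "+" or "-"


# ASSUMPTION: expected fosmid insert span; not taken from the slides.
MIN_SPAN, MAX_SPAN = 30_000, 50_000


def classify_fosmid(fwd: Optional[EndHit], rev: Optional[EndHit]) -> str:
    """Classify a fosmid by how its end reads align to one BAC sequence."""
    if fwd is None and rev is None:
        return "no hit"
    if fwd is None or rev is None:
        return "single end"
    left, right = sorted((fwd, rev), key=lambda hit: hit.pos)
    span = right.pos - left.pos
    # A consistent pair points inwards: "+" on the left, "-" on the right.
    inward = left.strand == "+" and right.strand == "-"
    if inward and MIN_SPAN <= span <= MAX_SPAN:
        return "good pair"
    return "bad pair"


print(classify_fosmid(EndHit(1_000, "+"), EndHit(39_000, "-")))
print(classify_fosmid(EndHit(1_000, "+"), EndHit(5_000, "-")))
print(classify_fosmid(EndHit(1_000, "+"), None))
```

Bad pairs are worth a second look, since they can point to misassembly or to a fosmid spanning a repeat.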
Selection of Minimum Tile Path
Fingerprinted BACs
Markers
Overlaps identified by FPC and BES alignment
Seed BAC anchored by marker (Cornell)
Framework markers in FPC
Verify overlaps by colony PCR
Anchor further BACs by hybridisation to marker sequences and FISH
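Choosing a minimum tile path from overlapping clones can be pictured as an interval-cover problem. The sketch below uses a standard greedy approach on hypothetical clone names and map coordinates; it is not the FPC-based procedure itself, and it ignores practical constraints such as a minimum required overlap.

```python
def minimal_tiling_path(clones, start, end):
    """Pick the fewest clones covering [start, end] on a contig map.

    clones: list of (name, left, right) tuples in map coordinates.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Among clones starting within the covered region, take the one
        # that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"map gap at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clones (name, left, right) in kb along one contig.
example = [
    ("BAC_A", 0, 120),
    ("BAC_B", 40, 150),
    ("BAC_C", 100, 230),
    ("BAC_D", 140, 260),
    ("BAC_E", 220, 340),
]
print(minimal_tiling_path(example, 0, 340))  # ['BAC_A', 'BAC_C', 'BAC_E']
```

Where the function reports a gap, that is the point at which further BACs would need to be anchored, for example by hybridisation or FISH as listed above.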
Increasing Map Coverage using PseudoGoldenPath (PGP) Analysis
MAP GAP
Bridging clones identified from BES alignments to sequence
Sequenced clones
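Bridging clones are those whose two BAC end sequences (BES) align to different sequenced contigs. A minimal sketch of that lookup, with made-up clone and contig names; the input format is an assumption and does not reflect the real PGP output:

```python
def find_bridging_clones(bes_hits):
    """Group clones by the pair of sequenced contigs their two ends hit.

    bes_hits: dict mapping clone name -> (contig hit by one end,
    contig hit by the other end); None means that end has no hit.
    """
    bridges = {}
    for clone, (first, second) in bes_hits.items():
        if first and second and first != second:
            pair = tuple(sorted((first, second)))
            bridges.setdefault(pair, []).append(clone)
    return bridges


# Hypothetical BES alignment results.
hits = {
    "clone_001": ("ctg12", "ctg13"),   # spans the gap between ctg12 and ctg13
    "clone_002": ("ctg13", "ctg12"),   # same gap, opposite orientation
    "clone_003": ("ctg12", "ctg12"),   # both ends inside one contig
    "clone_004": ("ctg20", None),      # only one end aligns
}
print(find_bridging_clones(hits))
```

Gaps supported by more than one clone are the safer candidates for sequencing.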
FISH Map for Chr 4 on SGN
FISH is used:
• to confirm BAC assignment to chr 4
• to confirm contig order along chr 4
Steve Stack
Dora Szinay, Hans de Jong
WTSI Tomato Clone Pipeline 2006-2008
Number of BACs
Pipeline Stage    | Dec 2006 | Dec 2007 | Dec 2008
Subcloning        | 35       | 17       | -
Shotgun           | 14       | 3        | -
Assembly Start    | 1        | 8        | -
Auto-prefinishing | 16       | 8        | -
Finishing         | 9        | 10       | 8
QC Checking       | 1        | 2        | -
Finished          | 18       | 86       | 174
Total             | 94       | 134      | 182
HTGS phases marked on the slide: Phase 1, Phase 2, Phase 3
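The pipeline table can be checked against its own totals. The sketch assumes that blank cells are zero and that the Finishing row reads 9, 10 and 8 (the transcript runs the last two together as "108"); with those assumptions every column adds up to the reported total.

```python
# Stage -> (Dec 2006, Dec 2007, Dec 2008); blank cells treated as 0.
stages = {
    "Subcloning": (35, 17, 0),
    "Shotgun": (14, 3, 0),
    "Assembly Start": (1, 8, 0),
    "Auto-prefinishing": (16, 8, 0),
    "Finishing": (9, 10, 8),   # ASSUMPTION: "108" in the transcript is 10 and 8
    "QC Checking": (1, 2, 0),
    "Finished": (18, 86, 174),
}
reported_totals = (94, 134, 182)

column_sums = tuple(sum(column) for column in zip(*stages.values()))
print(column_sums)  # (94, 134, 182)
print(column_sums == reported_totals)
```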
Chr 4 Map and Sequence Update
Chromosome 4 estimate : 19 Mb of euchromatin
80 contigs with sequence
                      | December 2006 | December 2007 | December 2008
Total sequence        | 5,007,106 bp  | 12,590,598 bp | 19,018,752 bp
Unique sequence       | 4,860,935 bp  | 11,789,635 bp | 18,778,752 bp
Total finished length | 1,963,352 bp  | 9,211,278 bp  | 18,056,067 bp
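To put these figures in context, the finished length can be expressed as a share of the 19 Mb euchromatin estimate, and the overlap between clones as total minus unique sequence. A short sketch using only the numbers above (variable names are illustrative):

```python
EUCHROMATIN_ESTIMATE_BP = 19_000_000  # chromosome 4 estimate from the slide

progress = {
    "Dec 2006": {"total": 5_007_106, "unique": 4_860_935, "finished": 1_963_352},
    "Dec 2007": {"total": 12_590_598, "unique": 11_789_635, "finished": 9_211_278},
    "Dec 2008": {"total": 19_018_752, "unique": 18_778_752, "finished": 18_056_067},
}

summary = {}
for date, bp in progress.items():
    summary[date] = {
        # Sequence counted twice because neighbouring clones overlap.
        "redundant_bp": bp["total"] - bp["unique"],
        "finished_pct": 100.0 * bp["finished"] / EUCHROMATIN_ESTIMATE_BP,
    }
    print(date, summary[date]["redundant_bp"],
          f"{summary[date]['finished_pct']:.1f}%")
```

By December 2008 the finished length is about 95% of the estimate; this is a ratio of lengths, not a statement that 95% of the euchromatin is covered.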
Distribution of Contigs
[Figure: contigs along chromosome 4; legend: centromere, euchromatin, heterochromatin]
Markers: 62, 41, 124 (227 markers mapped to Chr_4)
Contigs: 29, 11, 36
BACs: 55 (27 markers), 59 (16 markers), 63 (61 markers)
Unordered: 4 contigs (5 BACs)
Average contig length = 250 kb
Average BACs/contig = 2.3
Largest contigs = ~450-500 kb
Some facts and figures
~81 contigs (80 contigs with sequence available).
Average contig length is just under 250 kb.
The average number of BACs per contig is 2.3.
The largest sequence contigs are in the range of 450kb-500kb with 5 or 6 BACs.
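These figures are consistent with the counts shown in the contig distribution figure. A quick check, assuming the three groups plus the unordered contigs account for everything, and using the December 2008 total sequence for the average length:

```python
contigs_by_group = [29, 11, 36]
unordered_contigs = 4
bacs_by_group = [55, 59, 63]
unordered_bacs = 5
markers_by_group = [62, 41, 124]
total_sequence_bp = 19_018_752  # December 2008 total sequence

total_contigs = sum(contigs_by_group) + unordered_contigs
total_bacs = sum(bacs_by_group) + unordered_bacs
total_markers = sum(markers_by_group)

bacs_per_contig = total_bacs / total_contigs
mean_contig_kb = total_sequence_bp / total_contigs / 1000

print(total_contigs, total_bacs, total_markers)
print(f"{bacs_per_contig:.1f} BACs per contig")
print(f"{mean_contig_kb:.0f} kb mean contig length")
```

This gives 80 contigs, 182 BACs and 227 markers, with about 2.3 BACs per contig and a mean length just under 250 kb.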
Summary of Progress on Chromosome 4
81 map contigs have been built
119 BACs in 44 contigs are definitely on chr4 (FISH or IL mapped)
57 BACs are under confirmation of chr4 location (28 on SGN, 29 to be placed once location is confirmed)
~60 markers for which BACs have not been identified
~13 BACs have been sequenced to HTGS3 and placed on chr0 as they are definitely not on chr4 (others, e.g. in the same contig, were initiated but stopped in the pipeline)
22 missing markers: missing sequence?
Summary of what we will do next
1) Confirm chr4 location of BACs that lack chr4 marker sequence and/or have conflicting map locations, using IL mapping.
2) Use missing marker sequences to identify further BACs (3D pools) and confirm chr4 location using IL mapping.
3) Use 3D BAC pools to identify BACs to extend current contigs.
4) Analyse output from GS-FLX and GA-Illumina sequencing runs on cDNA from chr4 IL and parental lines to identify SNPs and further chr4 markers.
5) Use any markers from (4) to isolate further BACs for sequencing.
Analysis of cDNA sequence to identify Chr4 specific sequences
                | 454 (GS-FLX) | Illumina (GA)
Chr4 IL lines   | 22.11 MB     | 0.45 GB
LA716 pennellii | 17.25 MB     | 1.9 GB
Chr4 sub line   | 8.47 MB      | 1.5 GB
Heinz cDNA      | 4.02 MB      | 1 GB
IL4-1           | -            | 0.5 GB
IL4-2           | -            | 0.6 GB
IL4-3           | -            | 0.19 GB
IL4-4           | -            | 0.44 GB
Total           | 51.85 MB     | 6.9 GB
Call for your help
Need your help in checking and verifying the automated annotation.
Please respond to e-mails in 2009 calling for help in annotating your favourite genes.
Acknowledgements
Wellcome Trust Sanger Institute: Carol Churcher, Jane Rogers, Sean Humphray, Clare Riddle and Mapping Core Group, Karen McLaren and Finishing Team 46, Stuart McLaren and Pre-finishing Team 58, Christine Lloyd and QC Team 57, Karen Oliver, Matt Jones, Carol Scott
Imperial College London: Gerard Bishop, Daniel Buchan, James Abbott, Sarah Butcher, Rosa Lopez-Cobollo
University of Nottingham: Graham Seymour
Scottish Crop Research Institute: Glenn Bryan
Cornell University: Lukas Mueller, Jim Giovannoni
MIPS/IBI Institute for Bioinformatics: Klaus Mayer, Remy Bruggmann
FISH Resources: Stephen Stack Group (Colorado), Hans de Jong (Wageningen), Dora Szinay (Wageningen)
FUNDING