Classic and Next-Gen Sequencing Technologies
6th July 2009
David Miller
Australian Genome Research Facility
Overview of Presentation
Milestones
1953 Discovery of the structure of the DNA double helix.
1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.
1977 The first complete DNA genome to be sequenced is that of bacteriophage φX174.
1977 Allan Maxam and Walter Gilbert publish "DNA sequencing by chemical degradation". Frederick Sanger, independently, publishes "DNA sequencing by enzymatic synthesis".
1980 Frederick Sanger and Walter Gilbert receive the Nobel Prize in Chemistry.
1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.
1986 Leroy E. Hood's laboratory at the California Institute of Technology and Lloyd M. Smith announce the first semi-automated DNA sequencing machine.
1987 Applied Biosystems markets the first automated sequencing machine, the model ABI 370.
1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at 75 cents (US)/base).
1995 Richard Mathies et al. publish dye-based sequencing.
1998 Phil Green and Brent Ewing of the University of Washington publish “phred” for sequencer data analysis.
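As a brief aside on what phred reports: each base call gets a quality score Q tied to the estimated error probability P by Q = -10 log10(P), so Q20 means roughly 1 error in 100 calls and Q30 roughly 1 in 1,000. The sketch below only illustrates that relationship; the function names are illustrative and are not part of phred itself.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Return the Phred-scaled quality Q = -10 * log10(P)."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Invert the Phred scale: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(f"P = {p}: Q = {phred_quality(p):.0f}")
```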
History of DNA Sequencing
Frederick Sanger
Only person to receive two Nobel Prizes in Chemistry.
One of only four people ever to receive two Nobel prizes.
Only living person to receive two Nobel prizes.
• Brief Overview
• 15-year project, begun in 1990
• US$3 billion project
• 2000: first draft genome announced
• 2003: complete genome announced
• 2006: final chromosome published in Nature
Human Genome Project
Project Goals
• identify all the approximately 20,000-25,000 genes in human DNA,
• determine the sequences of the 3 billion chemical base pairs that make up human DNA,
• store this information in databases,
• improve tools for data analysis,
• transfer related technologies to the private sector, and
• address the ethical, legal, and social issues (ELSI) that may arise from the project.
Human Genome Project Goals and Completion Dates
Area | HGP Goal | Standard Achieved | Date Achieved
Genetic Map | 2- to 5-cM resolution map (600-1,500 markers) | 1-cM resolution map (3,000 markers) | September 1994
Physical Map | 30,000 STSs | 52,000 STSs | October 1998
DNA Sequence | 95% of gene-containing part of human sequence finished to 99.99% accuracy | 99% of gene-containing part of human sequence finished to 99.99% accuracy | April 2003
Capacity and Cost of Finished Sequence | Sequence 500 Mb/year at <$0.25 per finished base | Sequence >1,400 Mb/year at <$0.09 per finished base | November 2002
Human Sequence Variation | 100,000 mapped human SNPs | 3.7 million mapped human SNPs | February 2003
Gene Identification | Full-length human cDNAs | 15,000 full-length human cDNAs | March 2003
Model Organisms | Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster | Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat | April 2003
Shotgun Sequencing Approach
Classic Sequencing Technologies
1977
Allan Maxam and Walter Gilbert publish "DNA sequencing by chemical degradation". Frederick Sanger, independently, publishes "DNA sequencing by enzymatic synthesis".
Both methods generate fragment pools which are resolved via electrophoresis with a resolution of 1 bp.
1986: 4 Reactions to 1 Lane
Sequencing Reaction Products Progression of Sequencing Reaction
Classic Sequencing Technologies
ABI377
Capillary Based Electrophoresis
Raw Data
Electropherograms
ABI3700
MegaBACE 1000
ABI3730xl
Sanger Sequencing
Maximum yield per day: <3,000,000 bp, i.e. <0.1% of the human genome; ~1,000 days of sequencing for 1-fold coverage.
Next-Gen Sequencing Technologies
• Three platforms, three technologies
• All massively parallel sequencing
• Read lengths vary from ~36 bp to >400 bp
• Read numbers vary from ~1 million to >150 million per run
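To make these ranges concrete: raw yield per run is roughly read count times read length. The sketch below multiplies the two, using pairings that are illustrative assumptions only (the slide gives ranges, not a read count and length for each platform).

```python
def raw_yield_bp(read_count: int, read_length_bp: int) -> int:
    """Approximate raw yield of one run in base pairs."""
    return read_count * read_length_bp


# Hypothetical pairings taken from the ends of the quoted ranges.
long_read_run = raw_yield_bp(read_count=1_000_000, read_length_bp=400)
short_read_run = raw_yield_bp(read_count=150_000_000, read_length_bp=36)

if __name__ == "__main__":
    print(f"~1 million reads x 400 bp  = {long_read_run:,} bp")
    print(f"~150 million reads x 36 bp = {short_read_run:,} bp")
```

Even at a tenth of the read length, the much larger read count gives the short-read run more than ten times the raw output.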
[Chart. Axis labels: "Instrument Run Type" and "Raw Data Storage (Mb)"; value axis from 0.00E+00 to 4.50E+06. Categories: 3730-96, 3730-384, GS-20, GS-FLX, GS-FLX Ti, GAII 1x18, GAII 1x26, GAII 1x35, GAII 2x35, GAIIx 2x35, GAII 2x50, GAIIx 2x50, GAII 2x75, GAIIx 2x75]
Next-Gen Sequencing Technologies
Roche GS-FLX, Illumina GAII, ABI SOLiD
Roche GS-FLX
Workflow
• Sample Fragmentation
• Library Preparation
• emPCR Setup
• emPCR Amplification
• Pyrosequencing
• Data Analysis
Sample Fragmentation and Library Generation
emPCR
Emulsion PCR is a method of clonal amplification which allows for millions of unique PCRs to be performed at once through the generation of micro-reactors.
emPCR
The Water-in-Oil-Emulsion
Pyrosequencing
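In pyrosequencing, nucleotides are flowed over the beads one at a time in a fixed cycle, and the light signal from each flow is roughly proportional to the number of bases incorporated. The sketch below turns such a flowgram into a sequence. It assumes a T, A, C, G flow order and clean signals that round to the right integer; real base callers also have to handle noise and long homopolymers, and the function name is illustrative.

```python
def flowgram_to_sequence(signals, flow_order="TACG"):
    """Convert per-flow signal intensities into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated
    bases; the nucleotide for flow i is flow_order[i % len(flow_order)].
    """
    bases = []
    for i, signal in enumerate(signals):
        count = max(0, int(signal + 0.5))  # nearest integer, never negative
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)


if __name__ == "__main__":
    # Flows: T (1 base), A (none), C (2 bases), G (1 base)
    print(flowgram_to_sequence([1.02, 0.05, 2.10, 0.90]))
```

A signal near 2 in the C flow is read as "CC", which is why homopolymer length is the hard part of this chemistry.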
Data Analysis
Raw Image Files
Platform Updates
Maximum yield per day: 1,000,000,000 bp, i.e. ~30% of the human genome; ~3 days of sequencing for 1-fold coverage.
Illumina GAII
Illumina Library Preparation
Stylised graphic of the Paired-End library preparation taken from the Paired-End Protocol
[Figure: Illumina sequencing workflow. Labels: DNA (0.1-1.0 µg), sample preparation, cluster growth, sequencing, image acquisition over cycles 1-9, base calling (T G C T A C G A T …)]
Illumina Sequencing Technology
Robust Reversible Terminator Chemistry Foundation
Illumina GAII Development
2009 Roadmap
SCS 2.4 and Pipeline 1.4
• Two recent software updates from Illumina
• Huge impact on data yields by not only increasing the number of clusters detectable, but also the number of clusters that pass quality filtering
• Originally aimed for ~50 Million reads / run
• Now aiming for ~150 Million reads / run
Platform Updates
Maximum yield per day: 2,000,000,000 bp, i.e. ~60% of the human genome; ~1.5 days of sequencing for 1-fold coverage.
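The "days for 1-fold coverage" figures on the three yield slides are simple division: genome size over daily yield. The sketch below reproduces them, assuming a 3 billion bp human genome (the figure quoted in the project goals) and reading the three yield slides, in deck order, as Sanger, GS-FLX and GAII. The Sanger yield is an upper bound on the slide, so its result is a lower bound on the days needed.

```python
HUMAN_GENOME_BP = 3_000_000_000  # ~3 billion bp, as quoted in the project goals

# Maximum daily yields as quoted on the slides (platform labels are assumed
# from the order in which the slides appear).
DAILY_YIELD_BP = {
    "Sanger (capillary)": 3_000_000,
    "Roche GS-FLX": 1_000_000_000,
    "Illumina GAII": 2_000_000_000,
}


def days_for_coverage(daily_yield_bp, fold=1.0, genome_bp=HUMAN_GENOME_BP):
    """Days of sequencing needed to reach `fold` coverage of the genome."""
    return fold * genome_bp / daily_yield_bp


if __name__ == "__main__":
    for platform, daily_yield in DAILY_YIELD_BP.items():
        days = days_for_coverage(daily_yield)
        print(f"{platform}: {days:,.1f} days for 1-fold coverage")
```

Passing `fold=30` gives the time for a 30-fold genome at the same daily yield.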
ABI SOLiD
ABI ColorSpace
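In color space, SOLiD reports one of four colors for each pair of adjacent bases rather than the bases themselves, so a read is decoded from a known leading base plus a string of colors. The sketch below shows the standard two-base encoding, assuming the usual 0-3 color numbering (0 for AA, CC, GG, TT; 1 for AC, CA, GT, TG; 2 for AG, GA, CT, TC; 3 for AT, TA, CG, GC). With bases numbered A=0, C=1, G=2, T=3, the color is the XOR of the two base codes. Function names are illustrative.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colorspace(sequence: str) -> str:
    """Encode bases as the first base followed by one color per base pair."""
    if not sequence:
        return ""
    codes = [BASE_TO_CODE[base] for base in sequence]
    colors = [str(a ^ b) for a, b in zip(codes, codes[1:])]
    return sequence[0] + "".join(colors)


def decode_colorspace(encoded: str) -> str:
    """Decode a leading base plus color digits back into bases."""
    if not encoded:
        return ""
    current = BASE_TO_CODE[encoded[0]]
    bases = [encoded[0]]
    for color in encoded[1:]:
        current ^= int(color)  # each color gives the step to the next base
        bases.append(CODE_TO_BASE[current])
    return "".join(bases)


if __name__ == "__main__":
    encoded = encode_colorspace("ATGGC")
    print(encoded, decode_colorspace(encoded))
```

Because every base is covered by two colors, a true SNP changes two adjacent colors while a single miscalled color does not, which is the basis of the error checking this encoding allows.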
ABI SOLiD
emPCR and Enrichment
Bead Deposition
3’ Modification allows covalent bonding to the slide surface
Sequencing by Ligation
Base Interrogations
Genome Web
Summary
• Both SOLiD and GS-FLX use emPCR
• Both GS-FLX and GAII sequence by synthesis
• GAII uses “cluster generation” similar to the polony approach
• SOLiD sequences by ligation
• GS-FLX provides a ~100x decrease in costs compared to Sanger sequencing
• GAII and SOLiD provide a ~10x decrease in costs over GS-FLX (though this margin is probably increasing given the huge increase in output)
Applications
Targeted Enrichment
• Costs have shifted significantly from the sequencing itself to up-front sample preparation, libraries, amplification…
• Recent applications from Roche NimbleGen and Agilent Technologies have allowed for sequence capture through the use of microarrays
• Resulted in two products, NimbleGen Sequence Capture Arrays and Agilent SureSelect
SureSelect In-Solution Capture
NimbleGen Sequence Capture Arrays
PET-SEQ
[Figure: mate-pair library construct. Labels: P1, Tag1, Internal Adapter, Tag2, P2]
[Figure: tag reads aligned against the reference, illustrating deletions, insertions and inversions]
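The idea behind the figures above is that the two tags of a pair should map to the reference at a known distance and relative orientation, and departures from that hint at structural variants. The sketch below shows that classification under stated assumptions: concordant tags map to the same strand (this convention differs between library types), and `expected_span` and `tolerance` are hypothetical parameters describing the library's insert size. It is not any particular tool's algorithm.

```python
from dataclasses import dataclass


@dataclass
class TagPair:
    tag1_start: int   # reference position of the first tag
    tag2_start: int   # reference position of the second tag
    tag1_strand: str  # "+" or "-"
    tag2_strand: str  # "+" or "-"


def classify_pair(pair: TagPair, expected_span: int, tolerance: int) -> str:
    """Label a mapped tag pair by the structural variant it suggests."""
    if pair.tag1_strand != pair.tag2_strand:
        return "inversion"   # unexpected relative orientation
    span = abs(pair.tag2_start - pair.tag1_start)
    if span > expected_span + tolerance:
        return "deletion"    # tags further apart on the reference than in the sample
    if span < expected_span - tolerance:
        return "insertion"   # tags closer on the reference than in the sample
    return "concordant"


if __name__ == "__main__":
    pairs = [
        TagPair(1_000, 4_000, "+", "+"),
        TagPair(1_000, 9_000, "+", "+"),
        TagPair(1_000, 2_000, "+", "+"),
        TagPair(1_000, 4_000, "+", "-"),
    ]
    for pair in pairs:
        print(pair, classify_pair(pair, expected_span=3_000, tolerance=500))
```

In practice a call would rest on many pairs supporting the same event rather than on a single discordant pair.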
Next-Next-Gen Sequencing Technologies
Websites of Note
http://www.illumina.com/pages.ilmn?ID=203
http://www3.appliedbiosystems.com/AB_Home/applicationstechnologies/SOLiDSystemSequencing/index.htm
http://www.454.com/index.asp
http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
http://www.helicosbio.com/
http://www.pacificbiosciences.com/
http://www.nanoporetech.com/sequences/
THANKS
6th July 2009
David Miller
Australian Genome Research Facility