Detecting single molecules and sequencing DNA
Rohan T. Ranasinghe
University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW
Locations and timeline
Map of the locations (scale: 1 mile)
• 1953: Discovery of the structure of DNA (Old Cavendish Laboratory)
• 1977: Sanger method for sequencing invented (LMB)
• 1993: Work on the Human Genome Project starts at the Sanger Institute
• 1997: Work on the Solexa method for sequencing started (Chemistry department)
Image credits: http://www.cambridge2000.com; http://www2.mrc-lmb.cam.ac.uk; Genome Research Ltd.; http://www.flickr.com/photos/shai-bl/5584629687/sizes/m/in/photostream/
Structure of DNA
Solved in Cambridge in 1953 by James Watson and Francis Crick, using data collected by Rosalind Franklin and Maurice Wilkins at King’s College London
The key to the structure was base pairing
The fidelity of Watson-Crick base pairing and the double-helix structure are the cornerstones of DNA sequencing and modern forensic science
Image credits: http://www.themicrobiologist.com; http://www.flickr.com/photos/major_clanger/5881631482/sizes/o/in/photostream/; http://www.flickr.com/photos/grahams__flickr/504365411/sizes/l/in/photostream/
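Base pairing is what makes copying, and therefore sequencing, possible: A always pairs with T and C with G, so one strand fixes the letters of its partner. A minimal Python sketch of that rule follows. The names `PAIRS` and `complement` are our own, and the partner strand is written position by position in the same left-to-right layout the slides use (reading the partner in its own 5'→3' direction would also require reversing it).

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-paired partner of each letter, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("CAGTCAGTCA"))  # GTCAGTCAGT
```

Applying the rule twice returns the original strand, which is why either strand of the double helix carries the full information.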
DNA Sequencing
Why would you want to sequence DNA?
A genome contains the information required to build an organism. It’s a long book...
~3,000,000,000 (3 × 10⁹) letters in each of the ~10¹⁴ cells in a human
Distance between base pairs = 0.34 nm (0.34 × 10⁻⁹ m)
Each cell holds two copies of the genome, one from each parent, so the DNA in one of your cells would be about 2 m long in the B-form structure
Image credits: http://www.sikeston.k12.mo.us; © Invitrogen; Wikipedia
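The 2 m figure can be checked from the numbers on this slide. A short Python calculation, assuming (as holds for most human body cells) that each cell carries two copies of the ~3 × 10⁹-letter genome, one from each parent:

```python
# Rough length of the DNA in one human cell, from the slide's figures
RISE_PER_BASE_PAIR_M = 0.34e-9   # B-form DNA: 0.34 nm between base pairs
GENOME_LETTERS = 3e9             # one copy of the human genome
COPIES_PER_CELL = 2              # assumption: two copies per cell (diploid)

one_copy_m = GENOME_LETTERS * RISE_PER_BASE_PAIR_M
per_cell_m = COPIES_PER_CELL * one_copy_m

print(f"one copy: {one_copy_m:.2f} m, whole cell: {per_cell_m:.2f} m")
# one copy: 1.02 m, whole cell: 2.04 m
```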
Sanger sequencing
Based on copying of DNA. Figure (animation): a complementary copy of the template strand CAGTCAGTCA is built up one letter at a time.
Incorporation of a fluorescent nucleotide terminates the copying process.
Repeat ~1030 times.
Image: Genome Research Ltd.
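Why does repeating the copying reveal the sequence? Each copy stops at a different point, so together the copies include a fragment ending at every position; sorting the fragments by length and reading the fluorescent last letter of each spells out the copied sequence. The Python simulation below is a simplified sketch of that idea, not the real chemistry: the 10% chance of termination per letter, the 10,000 copies and the function names are illustrative assumptions, and the physical length-sorting step is reduced to `sorted()`.

```python
import random
from typing import Optional

# Watson-Crick pairing rules used to build each copy
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def terminated_copy(template: str, p_stop: float, rng: random.Random) -> Optional[str]:
    """Copy the template one letter at a time. At each position a fluorescent,
    chain-terminating letter is used with probability p_stop (assumed value),
    which ends the copy. Returns None if the copy reaches the end unlabelled."""
    fragment = []
    for base in template:
        fragment.append(PAIRS[base])
        if rng.random() < p_stop:
            return "".join(fragment)  # the last letter carries the fluorescent label
    return None


def sanger_read(template: str, copies: int = 10_000, p_stop: float = 0.1, seed: int = 1) -> str:
    """Repeat the copying many times, then read the fluorescent last letter
    of the fragments in order of increasing length."""
    rng = random.Random(seed)
    label_by_length = {}
    for _ in range(copies):
        fragment = terminated_copy(template, p_stop, rng)
        if fragment is not None:
            # every fragment of a given length ends in the same letter
            label_by_length[len(fragment)] = fragment[-1]
    # shortest fragment first, standing in for a length-sorting separation
    return "".join(label_by_length[n] for n in sorted(label_by_length))


print(sanger_read("CAGTCAGTCA"))  # GTCAGTCAGT
```

With 10,000 copies, fragments of every length from 1 to 10 are almost certain to occur, so no position is missed; with far fewer copies, gaps would appear in the read.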
Sanger sequencing
Figure: the copied sequence GTCAGTCAGT, base-paired letter by letter with the original sequence CAGTCAGTCA.
Repeat 3 × 10⁸ times to read the genome (this would take another 190 years at this speed!*)
*Note: the original animation took ~20 seconds.
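The “190 years” estimate follows from the slide’s own numbers. A quick Python check, assuming a genome of 3 × 10⁹ letters and 10 letters copied per 20-second animation:

```python
# Check the slide's "another 190 years" estimate
LETTERS_PER_ANIMATION = 10      # the animation copies 10 letters (CAGTCAGTCA)
GENOME_LETTERS = 3e9            # ~3 × 10^9 letters in the human genome
SECONDS_PER_ANIMATION = 20      # "original animation took ~20 seconds"
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

repeats = GENOME_LETTERS / LETTERS_PER_ANIMATION          # 3 × 10^8 repeats
years = repeats * SECONDS_PER_ANIMATION / SECONDS_PER_YEAR

print(f"{repeats:.0e} repeats, about {years:.0f} years")
# 3e+08 repeats, about 190 years
```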
The human genome project
Started: 1989 (in the USA)
First draft completed: 2000; ‘Finished’: 2003
Cost: $3,000,000,000
The UK effort on the Human Genome Project was largely carried out in this building at the Sanger Centre
9 chromosomes were sequenced here (about a third of the genome)
Sources: http://www.c-spanvideo.org/program/157909-1; http://www.flickr.com/photos/93425126@N00/4394834217/in/set-72157623515077498/
What does it mean to detect a single molecule?
Looking for a needle in a haystack?
How many blades of grass are there on a football pitch? About 200,000,000, or 2 × 10⁸
How many molecules are there in a vial of water? 18 mL (1 mole) of water contains Avogadro’s number of molecules: 6.02 × 10²³
So 1 mole of grass blades would cover (6.02 × 10²³) ÷ (2 × 10⁸) = 3 × 10¹⁵ football pitches
That’s a lot of haystacks...
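The grass comparison is a single division: a mole of anything is Avogadro’s number of items, so dividing by the number of blades on one pitch gives the number of pitches. In Python, using the talk’s estimate of 2 × 10⁸ blades per pitch:

```python
# 1 mole of anything = Avogadro's number of items
AVOGADRO = 6.02e23            # molecules in 18 mL (1 mole) of water
BLADES_PER_PITCH = 2e8        # the talk's estimate for one football pitch

pitches = AVOGADRO / BLADES_PER_PITCH
print(f"1 mole of grass blades = {pitches:.1e} football pitches")
# 1 mole of grass blades = 3.0e+15 football pitches
```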
What does it mean to detect a single molecule?
1 mole of grass blades = 3 × 10¹⁵ football pitches = 15 × 10¹² km²
Surface area of Earth = 5 × 10⁸ km² (10¹¹ football pitches!)
Surface area of Jupiter = 6 × 10¹⁰ km²
*Lab demonstration: 180 µL of water (0.01 mole) is equivalent to 15 × 10¹⁰ km² of grass blades
Surface area of the Sun = 6 × 10¹² km²
1 mole of grass blades would cover the surface area of about 2.5 Suns!
All images: nasa.gov
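The comparisons with Earth, Jupiter and the Sun are ratios of areas. A Python sketch using the rounded surface areas from the slides; the pitch size of 0.005 km² (5,000 m²) is not stated in the talk but is implied by its numbers (15 × 10¹² km² ÷ 3 × 10¹⁵ pitches), and the lab-demonstration line assumes 180 µL is one hundredth of the 18 mL mole of water.

```python
# Compare the area covered by 1 mole of grass blades with real surfaces
GRASS_AREA_KM2 = 15e12                  # 1 mole of grass blades (slide figure)
PITCHES_PER_MOLE = 3e15                 # from the previous slide
PITCH_AREA_KM2 = GRASS_AREA_KM2 / PITCHES_PER_MOLE   # implied: 0.005 km^2 per pitch

SURFACE_AREA_KM2 = {"Earth": 5e8, "Jupiter": 6e10, "Sun": 6e12}

for body, area in SURFACE_AREA_KM2.items():
    print(f"{body}: covered {GRASS_AREA_KM2 / area:,.1f} times over")

# Lab demonstration: 180 uL is 1/100 of 18 mL, i.e. 0.01 mole of water
demo_area_km2 = GRASS_AREA_KM2 * (180e-6 / 18e-3)
print(f"180 uL of water ~ {demo_area_km2:.1e} km^2 of grass, "
      f"about {demo_area_km2 / SURFACE_AREA_KM2['Jupiter']:.1f} Jupiters")
print(f"Earth ~ {SURFACE_AREA_KM2['Earth'] / PITCH_AREA_KM2:.0e} football pitches")
```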
Sanger sequencing
Sanger sequencing uses about 2 × 10¹⁰ molecules per 100 letters
Solexa sequencing
Invented in 1997 in this department
Developed by a spin-out company in Saffron Walden
Sold for $650,000,000 in 2006
Solexa sequencing uses about 10³ molecules to read 100 letters: about as many blades of grass as on the penalty spot
Imaging technology: lab demonstration
Image: http://thesportboys.wordpress.com/category/international/page/2/
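Putting the two molecule counts side by side shows the size of the improvement. A short Python comparison using the figures from the Sanger and Solexa slides; the conversion back into football pitches reuses the talk’s estimate of 2 × 10⁸ blades per pitch and is our own arithmetic.

```python
# Molecules needed per 100 letters, from the Sanger and Solexa slides
SANGER_MOLECULES = 2e10
SOLEXA_MOLECULES = 1e3
BLADES_PER_PITCH = 2e8   # the talk's grass analogy

fold = SANGER_MOLECULES / SOLEXA_MOLECULES
sanger_pitches = SANGER_MOLECULES / BLADES_PER_PITCH

print(f"Solexa uses about {fold:.0e} times fewer molecules")
print(f"Sanger: ~{sanger_pitches:.0f} football pitches of grass; "
      f"Solexa: ~{SOLEXA_MOLECULES:.0f} blades")
```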
Solexa sequencing
Figure: densely packed DNA template strands (e.g. CAGTCAGTCA, GCATATGTTC, AACGTGCTTG), each with a complementary copy that grows by one letter per cycle (e.g. GTCAG → GTCAGT). Image © Royal Society of Chemistry publishing
Densely packed microscopic “islands” of DNA generate information very quickly
Solexa sequencing
• “Recycled” template molecules are ready for incorporation of the next fluorescent letter
• It is possible to read about 100 letters from each DNA strand, rather than 1 as in Sanger sequencing, where each copy stops at its first fluorescent letter
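The cycle described above can be modelled in a few lines: in every cycle each island’s copy gains one complementary letter, the surface is “imaged”, and the same template is reused, so each strand yields a multi-letter read. A minimal Python sketch under loose assumptions: imaging is reduced to reading the newly added letter, and the function name `sequence_by_synthesis` is hypothetical. In the real chemistry the fluorescent label and a blocking group are removed before the next letter can be added; the sketch only notes this in a comment.

```python
# Watson-Crick pairing rules
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(islands: list[str], cycles: int) -> list[str]:
    """Read every island in parallel, one fluorescent letter per cycle.

    Each island is represented by its template strand. In every cycle the
    growing copy on each island gains one complementary letter, and
    'imaging' is modelled as reading that new letter. The template is then
    reused (in real chemistry, after the dye and blocking group are removed).
    """
    reads = ["" for _ in islands]
    for cycle in range(cycles):
        for i, template in enumerate(islands):
            if cycle < len(template):          # a read cannot run past its template
                reads[i] += PAIRS[template[cycle]]
    return reads


islands = ["CAGTCAGTCA", "GCATATGTTC", "AACGTGCTTG"]
print(sequence_by_synthesis(islands, cycles=6))
# ['GTCAGT', 'CGTATA', 'TTGCAC']
```

All islands are read in the same cycle, which is where the speed comes from: the number of imaging steps depends on the read length, not on the number of islands.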
Solexa sequencing
• Cost to sequence a human genome: around $10,000
• Time to sequence a human genome: less than a week
• First African, Asian and giant panda genomes sequenced
• Sanger Institute owns 37 instruments
Images © Royal Society of Chemistry publishing; Genome Research Ltd.
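Taking the talk’s figures at face value, the fall in cost and time can be put as two ratios. A Python sketch, treating “less than a week” as an upper bound of 7 days (an assumption), so the speed-up it prints is a lower bound:

```python
# Human Genome Project (Sanger method) vs. Solexa, using the talk's figures
HGP_COST_USD = 3e9           # $3,000,000,000
NEW_COST_USD = 1e4           # "around $10,000" per genome
HGP_DAYS = 14 * 365.25       # the project ran for 14 years (1989-2003)
NEW_DAYS = 7                 # "less than a week", taken as 7 days at most

cost_fold = HGP_COST_USD / NEW_COST_USD
time_fold = HGP_DAYS / NEW_DAYS   # a lower bound, since NEW_DAYS is an upper bound

print(f"about {cost_fold:,.0f} times cheaper, at least {time_fold:,.0f} times faster")
```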
Summary
The structure of DNA, discovered in 1953, has been crucial to sequencing the human genome
The first human genome was sequenced using Fred Sanger’s method, invented in 1977. The project ran for 14 years and cost $3 billion
New sequencing methods use single-molecule detection to dramatically accelerate the decoding process
One such single-molecule approach, invented by Shankar Balasubramanian and David Klenerman in our department in 1997, is now widely used for sequencing worldwide
The cost of sequencing a human genome has fallen to about $10,000, and it takes less than a week