next generation dna sequencing: why should we care€¦ · next generation dna sequencing: why...

57
Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class, Nov 23, 2010

Upload: others

Post on 18-Jun-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Next Generation DNA sequencing:

why should we careShoudan Liang

(Keith Baggerly & Bradley Broom)Microarray Analysis Class, Nov 23, 2010

Page 2: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

sequencing capacity is growing exponentially

• first human genome sequenced over ten years at $3 billion.

• 2007, Watson’s genome was sequenced in two months by 454 at $2 million.

• Last year, the cost (list price of reagent) of human genome re-sequencing using Solexa is$250,000.

• ABI SOLiD claim to be able to re-sequence at $10,000 this year.

Page 3: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

The cost of sequencing DNA has dropped by more than a million folds in the last ten years

Page 4: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,
Page 5: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

sequencing is expected to follow Moore’s-like law

• Moore’s law: computing power a dollar can buy doubles every 18 months

• rate limiting step in NEXT GEN sequencing is imaging. CCD camera in sequencer will increase in capacity following Moore’s law.

• DNA sequencing with semiconductor : merging of two technologies?

Page 6: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Pixels per dollar of Kodak digital cameras

en.wikipedia.org

Page 7: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

• a DNA sequence is a bar-code, and therefore an addressing system of a genome

• share similarities with microarray in measuring amount of DNA by genome locations

Applications in genome research

Page 8: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,
Page 9: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Digital gene expressionSolexa vs Gold Standard

Wang et al Nature. 2008 Nov 27;456(7221):470-6.

787 RefSeq human transcripts in brain and UHR

TaqMan is considered a gold standard

Page 10: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Ingolia NT, et al Science. 2009 Feb 12.

Page 11: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Diversity of The Human Genome

{15 million SNPs; 1 million short indels; 20,000 structural variants were identified from total sequence data}Amount of diversity observed per individual genome:• 250-300 loss-of-function variants• 50-100 variants implicated in inherited disorders• ~10,000 non-synonymous cSNP differences compared to a published reference genome

Whole-genome low-coverage: 179 from 4 populationsWhole-genome high-coverage: 2 Mom-Dad-Child triosExon-targeted sequencing: 697 from 7 populations

Page 12: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

technology looking for problems

• currently, USA has 600 next-generation sequencers. The rest of the world another 500 or so.

• Number of human genomes to be sequenced by the end of next year is about 30,000.

Page 13: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Modified from Shendure & Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

platformFeature generation

Cost per megabase

Cost per instrument

most common error

Read-length

Roche GS-FLX (454) Emulsion PCR $20 $500,000 Indel 400 bp

Illumina GA (Solexa) Bridge PCR $2 $430,000 Subst. 36 bp

ABI SOLiD Emulsion PCR $2 $591,000 Subst. 35 bp

HeliScope Single molecule $1 $1,350,000 Del 30 bp

Pacific Biosciences Single molecule - - Del/Subst.

long

Complete Genomics Nanoball $0.01 NA Subst. 35 bp

Page 14: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

• Sanger

• 454

• ABI SOLiD

• Illumina Solexa

• Complete Genomics

• Pacific Biosciences

next gen sequencing: hardware

Page 15: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Sanger

Page 16: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

elongation after primer

http://web.utk.edu/~khughes/SEQ

Page 17: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

enlongation stops when didexoy base is encountered

http://web.utk.edu/~khughes/SEQ

Page 18: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

producing a ladder

http://web.utk.edu/~khughes/SEQ

Page 19: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

that can be read on gel

http://web.utk.edu/~khughes/SEQ

Page 20: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

tracing of the ladder

http://web.utk.edu/~khughes/SEQ

Page 21: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Sanger Next Gen

Nature Biotech. (2008) 26: 1135

Page 22: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

two methods of single molecule PCR

a: emulsion PCR (454 & SOLiD)b: bridge PCR (Solexa)

Nature Biotech. (2008) 26: 1135

Page 23: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

base extensionligation

sequencing by synthesis

Page 24: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

454

Page 25: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Schematic representation of the progress of the enzyme reaction in solid-phase pyrosequencing

Ronaghi M. Genome Res. 2001;11:3-11

©2001 by Cold Spring Harbor Laboratory Press

Page 26: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,
Page 27: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,
Page 28: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Flowgram

Page 29: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

ABI SOLiD

Page 30: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

prepare library of single stranded DNA

Page 31: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

single molecule PCR

Page 32: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Beads on surface

Page 33: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

ABI SOLiDSequencing

Page 34: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

16 di-nucleotides probes in 4 steps

Page 35: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

ABI SOLiDcycling

15

Page 36: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Illumina Solexa

14

Page 37: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

bridge PCR

13

Page 38: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,
Page 39: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

11

• 8 lanes per run

• 200 pictures per lane

• 4X36 pictures for 36-mer

• 1/4 million pictures per 3-day run -> 0.5TB of data

Page 40: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

new HiSeq 2000

• sequence up to 100bp

• 1 billion tags per experiment

• 25Gbase per day

• reagent cost is about 10 times cheaper than the current product

Page 41: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Complete Genomics

Page 42: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

construct a circular DNA

10

Page 43: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

rolling circle amp

9

Page 44: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

high density packing

8

Page 45: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

• each DNA nano-ball is 80 bp genomic DNA (plus 4 adaptors) repeated 200 times.

• reads are equivalent to two 35-bp on paired ends of 500bp DNA.

• rolling circle amplification replaces emulsion or bridge PCR

• 1’’X3’’ silicon slide holds one billion DNA nano-balls

7

Page 46: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

• reagent cost is 1/1000 of Solexa

• demonstrated 8.8 Gb per machine run per day.

• a completed genome sequence on company’s web site

• June 2009, launch of commercial run: 200Gb per machine run lasting 8 days.

• data center: 60,000 processors and 30 petabytes storage.

6

Page 47: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

• according to Dr Drmanac, CSO of Complete Genomics, Inc

• next generation of machine will have

• $10 per genome reagent cost

• $20 per genome of instrument cost

Page 48: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Pacific Biosciences

5

Page 49: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

4

Page 50: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

3

Page 51: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Unlike Sanger sequencing, which average over many molecules, in next gen sequencing PCR errors do not average away

Page 52: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Application: 3D genome

Page 53: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,
Page 54: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

LETTERS

A three-dimensional model of the yeast genomeZhijun Duan1,2*, Mirela Andronescu3*, Kevin Schutz4, SeanMcIlwain3, Yoo Jung Kim1,2, Choli Lee3, Jay Shendure3,Stanley Fields2,3,5, C. Anthony Blau1,2,3 & William S. Noble3

Layered on top of information conveyed by DNA sequence andchromatin are higher order structures that encompass portions ofchromosomes, entire chromosomes, and even whole genomes1–3.Interphase chromosomes are not positioned randomly within thenucleus, but instead adopt preferred conformations4–7. DisparateDNA elements co-localize into functionally defined aggregates or‘factories’ for transcription8 and DNA replication9. In buddingyeast,Drosophila andmany other eukaryotes, chromosomes adopta Rabl configuration, with arms extending from centromeres adja-cent to the spindle pole body to telomeres that abut the nuclearenvelope10–12. Nonetheless, the topologies and spatial relationshipsof chromosomes remain poorly understood. Here we developed amethod to globally capture intra- and inter-chromosomal inter-actions, and applied it to generate a map at kilobase resolution ofthe haploid genome of Saccharomyces cerevisiae. The map recapi-tulates known features of genome organization, thereby validatingthe method, and identifies new features. Extensive regional andhigher order folding of individual chromosomes is observed.Chromosome XII exhibits a striking conformation that implicatesthe nucleolus as a formidable barrier to interaction between DNAsequences at either end. Inter-chromosomal contacts are anchoredby centromeres and include interactions among transfer RNAgenes, among origins of early DNA replication and among siteswhere chromosomal breakpoints occur. Finally, we constructed athree-dimensional model of the yeast genome. Our findings pro-vide a glimpse of the interface between the form and function of aeukaryotic genome.

Chromosome conformation capture (3C) and its derivatives havebeen used to detect long-range interactions within and between chro-mosomes13–20. We developed a method for identifying chromosomalinteractions genome-wide by coupling chromosome conformationcapture-on-chip (4C)14 andmassively parallel sequencing (Fig. 1 andSupplementary Methods). Because all 3C-based technologies areencumbered by low signal-to-noise ratios18,21, we established themethod’s reliability by assessing: (1) random intermolecular ligationsfrom each of five control libraries (Fig. 2a, Supplementary Tables 1and 2 and Supplementary Methods); (2) restriction site-based biases(Fig. 2b, Supplementary Figs 1 and 2 and Supplementary Table 3); (3)reproducibility between independent sets of experimental librariesthat differed in DNA concentration at the 3C step, which criticallyinfluences signal-to-noise ratios (Supplementary Table 1, Fig. 2b andc and Supplementary Fig. 2); (4) consistency between theHindIII andEcoRI libraries (Supplementary Figs 3–5 and Supplementary Tables4–8), and (5) a set of 24 chromosomal interactions using conven-tional 3C (Fig. 2d, Supplementary Fig. 6). These results show that ourmethod is reliable and robust (detailed in Supplementary Methods).We established yeast genome architecture features using interactionsfrom the HindIII libraries at a false discovery rate (FDR) of 1%, and

confirmed them with interactions from the EcoRI libraries at thesame threshold.

From our HindIII libraries, we identified 2,179,977 total interac-tions at an FDR of 1%, corresponding to 65,683 interactions betweendistinct pairs of HindIII fragments. We used these data to generateconformational maps of all 16 yeast chromosomes. The overall pro-pensity of HindIII fragments to engage in intra-chromosomal inter-actions varied little between chromosomes, ranging from 436interactions per HindIII fragment on chromosome XI to 620 inter-actions per HindIII fragment on chromosome IV (SupplementaryTable 9). These results indicate broadly similar densities of self-interaction (intra-chromosomal interaction) between chromosomesand indicate that the density of self-interaction does not vary withchromosome size (Supplementary Fig. 7).

Some large segments of chromosomes showed a striking propen-sity to interact with similarly sized regions of the same chromosome.For example, two regions on chromosome III (positions 30–90 kilo-bases (kb), and 105–185 kb) showed an excess of interactions (Fig. 3

*These authors contributed equally to this work.

1Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington 98195-8056, USA. 2Department of Medicine, University of Washington Seattle,Washington 98195-8056, USA. 3Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, USA. 4Graduate Program in Molecular and CellularBiology, University of Washington, Seattle, Washington 98195-5065, USA. 5Howard Hughes Medical Institute.

Nucleus

1. Crosslinking

2. RE1 cutting

5. RE2 cutting7. RE1 cutting

6. Circularization

8. EcoP15l adaptor

9. Biotinylated adaptor & circularization

10. EcoP15l cutting

11. Biotin isolation

2. Deproteinization

3. Intra-molecular ligation

RE1RE1

RE1

RE1

RE1

RE1

RE1

RE1 RE1

EcoP15l EcoP15l

RE2

RE2

RE1

RE1

Biotinylated adaptor

RE1

EcoP15l EcoP15l

Biotinylatedadaptor

RE1

N25–27N25–27

RE1RE2

Figure 1 | Schematic depiction of the method. Our method relies on the 4Cprocedure by using cross-linking, two rounds of alternating restrictionenzyme (RE) digestion (6-bp-cutter RE1 for the 3C-step digestion and 4-bp-cutter RE2 for the 4C-step digestion) and intra-molecular ligation. At step 7,each circle contains the 6-bp restriction enzyme recognition site originallyused to link the two interacting partner sequences (RE1).Diverging from4C,we relinearize the circles using RE1, then sequentially insert two sets ofadaptors, one of which permits digestion with a type IIS or type IIIrestriction enzyme (such as EcoP15I). Following EcoP15I digestion,fragments are produced that incorporate interacting partner sequence ateither end, which can be rendered suitable for deep sequencing (seeSupplementary Methods).

Vol 465 |20 May 2010 |doi:10.1038/nature08973

363Macmillan Publishers Limited. All rights reserved©2010

Nature 465 363 (2010)

Page 55: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

Application: tumorigenesis

Page 56: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

S Yachida et al. Nature 467, 1114-1117 (2010) doi:10.1038/

nature09515

Geographic mapping of metastatic clones within the primary carcinoma

and proposed clonal evolution of Pa08.

Page 57: Next Generation DNA sequencing: why should we care€¦ · Next Generation DNA sequencing: why should we care Shoudan Liang (Keith Baggerly & Bradley Broom) Microarray Analysis Class,

• happy thanksgiving