ge thenome institute - genome.gov | national human genome ... · nhgri current topics in genome...

24
NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine Mardis, Ph.D. 1 the Genome at Washington University Institute © Elaine R. Mardis Elaine R. Mardis Professor in Genetics Co-director, The Genome Institute Next-Generation Sequencing Technologies NHGRI Current Topics in Genome Analysis © Elaine R. Mardis Current Topics in Genome Analysis 2012 Elaine Mardis, Ph.D. No Relevant Financial Relationships with Commercial Interests

Upload: others

Post on 12-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

1  

the

Genomeat Washington University

Institute

© Elaine R. Mardis

Elaine R. Mardis Professor in Genetics Co-director, The Genome Institute

Next-Generation Sequencing Technologies

NHGRI Current Topics in Genome Analysis

© Elaine R. Mardis

Current Topics in Genome Analysis 2012

Elaine Mardis, Ph.D.

No Relevant Financial Relationships with ���Commercial Interests

Page 2: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

2  

© Elaine R. Mardis

Overview

•  Next-Generation Sequencing (NGS) Instruments •  Roche/454 •  Illumina •  Life Technologies •  Pacific Biosciences •  Ion Torrent •  Oxford Nanopore

•  NGS Applications across the spectrum of genomics •  Examples from our work •  Future Directions

© Elaine R. Mardis

© Elaine R. Mardis

The Trajectory of Throughput: 10 years

E.R. Mardis, Nature (2011) 470: 198-203

Page 3: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

3  

© Elaine R. Mardis

Capillary technology

Applied Biosystems 3730xl (2004)

$15,000,000

Next-gen technology

Illumina HiSeq (2011)

$10,000

Comparative costs: sequencing a human genome

© Elaine R. Mardis

Next-generation Sequencer basics Platforms and their attributes

Page 4: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

4  

© Elaine R. Mardis

Next-generation DNA sequencing instruments

•  All commercially-available sequencers have the following shared attributes: •  Random fragmentation of starting DNA, ligation with custom linkers

= “a library” •  Library amplification on a solid surface (either bead or glass) •  Direct step-by-step detection of each nucleotide base incorporated

during the sequencing reaction •  Hundreds of thousands to hundreds of millions of reactions imaged

per instrument run = “massively parallel sequencing” •  Shorter read lengths than capillary sequencers •  A “digital” read type that enables direct quantitative comparisons •  A sequencing mechanism that samples both ends of every fragment

sequenced (“paired end” reads)

© Elaine R. Mardis

Paired-end reads

•  All next-gen platforms now offer paired end read capability, e.g. sequences can be derived from both ends of the library fragments.

•  Differences exist in the _distance_ between read pairs, based on the approach/platform. •  “paired ends” : linear fragment sequenced at both ends in

two separate reactions •  “mate pairs” : circularized fragment of >1kb, sequenced by

a single reaction read or by two separate end reads (platform dependent)

•  In general, paired end reads offer advantages for sequencing large and complex genomes because they can be more accurately placed (“mapped”) than can single ended short reads.

Page 5: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

5  

© Elaine R. Mardis

Roche/454 Library Construction and emPCR

Single Stranded Adapter Ligated Library • Random Fragmentation • Adapter Ligation

•  emPCR for clonal amplification on bead

© Elaine R. Mardis

Roche/454 Pyrosequencing

DNA Capture Bead Containing Millions of

Copies of a Single Clonal Fragment

A A T C G G C A T G C T A A A A G T C A

Anneal Primer T

PP i

ATP

Light + oxy luciferin

Sulfurylase

Luciferase

APS

luciferin

Centrifugation

Load Enzyme Beads Load beads into

PicoTiter™Plate

Sequencing by Synthesis

Page 6: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

6  

© Elaine R. Mardis

454 Instrumentation

Instrument Run Time (hr)

Read Length

(bp)

Yield (Mb/run)

Error Type

Error Rate (%)

Purchase Cost

(x1000)

454 FLX+ 18-20 700 900 Indel 1 $30 A

454 FLX Titanium 10 400 500 Indel 1 $500

454 GS Jr. Titanium 10 400 50 Indel 1 $108

A – Requires the 454 FLX Titanium. This is the upgrade cost.

Notable:

•  Mate pair paired end reads of 3kb, 8kb and 20 kb separation without an increase in run time.

•  Cost per run makes sequencing an entire human genome cost-prohibitive relative to other technologies (~ $20/Mbp)

•  Great platform for targeted validation

© Elaine R. Mardis

Illumina Sequencing: Library Preparation

Page 7: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

7  

© Elaine R. Mardis

Illumina Sequencing by Synthesis

Excitation

Emission Incorporate

Detect De-block

Cleave fluor

© Elaine R. Mardis

Illumina Instrumentation

Instrument Run Time (days)

Read Length

(bp)

Yield (Gb/run)

Error Type

Error Rate (%)

Purchase Cost

(x1000)

GAIIx 14 150 x 150 96 Sub >0.1 $525

HiSeq 2000 8 100 x 100 200 x 2 Sub >0.1 $700

HiSeq 2000 v3 10 100 x 100 <600 Sub >0.1 $700

MiSeq 1 150 x 150 2 Sub >0.1 $125

•  2010: HiSeq 2000 •  Two flow cells per run

•  100 Gbp/FC or two genome equivalents per run •  New scanning mechanics - scans both surfaces of FC lanes

•  2011: HiSeq 2000

•  Improved chemistry (v. 3): increased yield and accuracy

•  2011: MiSeq

Page 8: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

8  

© Elaine R. Mardis

Life Technologies: sequencing by ligation

•  custom adapter library •  emPCR on magnetic beads •  sequencing by ligation using fluorescent probes from a common primer •  sequential rounds of ligation from a series of primers •  fixed/known nucleotides for each probeset identify two bases each cycle, or “two base encoding”

© Elaine R. Mardis

SOLiD Instrumentation

Instrument Run Time (days)

Read Length

(bp)

Yield (Gb/run)

Error Type

Error Rate (%)

Purchase Cost

(x1000)

SOLiD 4 12 50 x 35 PE 71 A-T Bias >0.06 $475

SOLiD 5500 xl 8 75 x 35 PE 60 x 60 MP

155 A-T Bias >0.01 $595

5500 xl •  Front-end automation addresses bottlenecks at emPCR, breaking, and enrichment

of beads •  6-lane Flow Chip with independent lanes/2 per run •  Cost per whole genome data set is predicted to be $6K by 2011 •  Very high accuracy data due to two-base encoding •  ECC Module – An optional 6th primer that increases accuracy to 99.999% •  Direct conversion of color space to base space •  True paired-end chemistry enabled – Ligation reaction can be used in either

direction

Page 9: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

9  

© Elaine R. Mardis

Third generation sequencers??

•  Recently, new sequencing platforms were introduced. •  The Pacific Biosciences sequencer is a single molecule

detection system that marries nanotechnology with molecular biology.

•  The Ion Torrent uses pH rather than light to detect nucleotide incorporations.

•  The MiSeq is a scaled down version of the HiSeq, with faster chemistry and scanning.

•  All offer a faster run time, lower cost per run, reduced amount of data generated relative to 2nd Gen platforms, and the potential to address genetic questions in the clinical setting.

© Elaine R. Mardis

Comparisons to Third-Generation Sequencers

Company Platform Name Sequencing Amplification Run Time

Roche 454 Ti DNA Polymerase “Pyrosequencing”

emPCR 10 hours

Illumina Hi-Seq/MiSeq

DNA Polymerase Bridge amplification

10 days/24 hours

Life SOLiD/5500

DNA Ligase emPCR 12 days

Ion Torrent PGM Synthesis H+ detection

emPCR 2 hours

Pacific Biosciences

RS Synthesis NONE 45 min

Page 10: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

10  

© Elaine R. Mardis

Pacific Biosciences RS

Shearing (Covaris/Hydroshear)

Polish ends

SMRTbell™ ligation

Sequencing primer annealing

DNA polymerase binding

Load library/polymerase complex onto SMRT cell

Zero Mode Waveguides (ZMWs) Ver. 1.1 = 1 x 45,000 per SMRT cell Ver. 1.2 = 2 x 75,000 per SMRT cell

Sample Prep Library/Polymerase Complex

Movie 1 (v1.1 & v1.2)

Raw reads

Post-filter reads

Mapped reads

Movie 2 (only v1.2)

Raw reads

Post-filter reads

Mapped reads

Sequencing

© Elaine R. Mardis

The SMRTbell™ Library Types

Circular Consensus Small ~250bp Single Movie Multiple passes Many sub-reads

Sub Reads

VLR (“very long read”) Larger ~6kbp Single 45 minute Movie Single long read provides linking information

Standard Large ~2kbp Single Movie Few sub-reads

Sub Reads

Reads

Consensus Read

Read

Page 11: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

11  

© Elaine R. Mardis

PacBio RS Instrumentation

Instrument Run Time (Hours)

Read Length

(bp)

Yield (Mb)

Error Type

Error Rate (%)

Purchase Cost

(x1000)

RS 14 (~8

SMRTCells) 2500 45 per

SMRTCell Insertions 15 $695

mean mapped sub-read accuracy: 86.2% mean mapped sub-read length: 3,416 bp maximum mapped read length: 8,580 bp maximum 95th percentile mapped read length: 5,807 bp

1x45 min movie 8 SMRT cells

Strobe polymerase/strobe reagent/strobe protocol (45 min movie)

2x45 min movies 8 SMRT cells

Strobe polymerase/standard reagent/standard protocol

© Elaine R. Mardis

ION Torrent Personal Genome Machine (PGM)

Page 12: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

12  

© Elaine R. Mardis

Ion Torrent Yield Trajectory

© Elaine R. Mardis

Ion Torrent Yield with Automation Modules

Page 13: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

13  

© Elaine R. Mardis

Oxford Nanopore Sequencing

Exonuclease-aided sequencing

Pore translocation sequencing

© Elaine R. Mardis

Applying Next Generation Sequencing

•  Genomes: re-sequencing or de novo •  point mutation/indel/structural variation

discovery •  Protein:DNA binding

- Chromatin IP/histone binding - Nucleosome/transcription factor binding, etc.

•  ncRNA discovery/sequencing/variants •  Transcriptome sequencing (RNA-seq) •  Genome-wide methylation of DNA (Methyl-seq) •  Clinical sequencing for therapeutic decisions

E.R. Mardis, Annual Reviews in Genetics & Genomics (2008) E.R. Mardis, Nature (2011) 470: 198-203

Page 14: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

14  

© Elaine R. Mardis

•  Prepare paired end libraries as whole genome fragment/shotgun by random shearing of genomic DNA, adapter ligation, size selection.

•  Produce paired end data from each end of billions of library fragments, over-sampling about 30-fold to cover at a depth sufficient to find all types of genome alterations.

•  Computer programs align the read pair sequences onto the reference genome and several algorithms are used to discover variants genome-wide.

Whole Genome Sequencing: Data Production and Alignment

© Elaine R. Mardis

Cancer Genomics

© R.K.Wilson 2011

Whole Genome Tumor:Normal Comparison

Ley et al., Nature 2008

• Caucasian female, mid-50s at diagnosis

• De novo M1 AML • Family history of

AML and lymphoma • 100% blasts in

initial BM sample • Relapsed and died

at 23 months • Normal

cytogenetics • Informed consent

for whole genome sequencing

• Solexa sequencer, 32 bp unpaired reads

• 10 somatic mutations detected

Page 15: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

15  

© Elaine R. Mardis

Disease progression model for Patient AML1

Diagnosis: Multiple leukemic clones

present

Clinical remission: loss of most

leukemic clones

Relapse: Acquisition of new mutations in a pre-existing clone

© Elaine R. Mardis

BreakDancer: detecting somatic structural variation

•  Read pair analysis with BreakDancer identifies putative SVs based on sets of read pairs that don’t map as expected, with respect to distance apart and/or orientation.

•  The identified reads are used to produce an assembly, then validated.

K. Chen et al., Nature Methods 6: 677-81 (2009)

Page 16: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

16  

© Elaine R. Mardis

SV Ref:

Var:

TIGRA_SV: assembling SVs to nucleotide resolution

TIGRA_SV targeted local Assembly

Integration

Features: • de Bruijin graphic approach • Multiple Kmer, read threading • Decode all alleles

A tandem duplication

An indel

AGCTGT---CA!AGCTGTTGTCA!

AGCTGTCA!AGC---CA!

Contig2.1.-5.1.3

•  Split reads •  Read pairs •  Read depth

© Elaine R. Mardis

Clinical case: “AML52”

37 y.o. female with de novo AML; M3 morphology

Complex cytogenetics, persistent leukemia

First remission, referred to WU for SCT. rBM: normal morphology, cytogenetics; negative

for PML/RARA.

Allogeneic SCT

Consolidation + ATRA

Chemo + ATRA

Chemo only

???

Page 17: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

17  

© Elaine R. Mardis

An insertion-derived fusion of PML and RARA

Tumor

Chromosome 17:

Chromosome 15:

PML-RARA: PML exons 1-3 fused to RARA exons 3-9 (bcr3 variant in frame)

RARA-LoxL1: fusion out of frame

LoxL1-PML: truncated protein - premature stop in novel LoxL1 exon 5a

© Elaine R. Mardis Welch et al., JAMA April 20, 2011

Page 18: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

18  

© Elaine R. Mardis

Combining Platforms: de novo Assembly

Two VLR PacBio reads contiguate an Illumina assembly gap

© Elaine R. Mardis

Rapid Genotyping by Ion Torrent

Total Number of Bases [Mbp] 20.35 ‣ Number of Q17 Bases [Mbp] 12.49 ‣ Number of Q20 Bases [Mbp] 9.66 Total Number of Reads 218,782 Mean Length [bp] 93 Longest Read [bp] 202

Amplicon Target Sizes

Page 19: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

19  

© Elaine R. Mardis

Hybrid Capture

•  Hybrid capture - fragments from a whole genome library are selected by combining with probes that correspond to most (not all) human exons or gene targets.

•  The probe DNAs are biotinylated, making selection from solution with streptavidin magnetic beads an effective means of purification.

•  An “exome” by definition, is the exons of all genes annotated in the species’ reference genome.

•  Custom capture reagents can be synthesized to target specific loci that may be of interest in a clinical context.

© Elaine R. Mardis

Merkel Cell Polyoma Virus Capture

•  Merkel Cell Polyoma virus •  MCPyV shows frequent genomic deletions and sequence

mutations that make it difficult to amplify the virus from cases of MCC by PCR

•  The circular genome does not contain a defined linearization sequence

•  Only FFPE material available for majority of cases

•  For proof-of-principle experiments: •  Biotinylated PCR amplicons designed to target entire 5Kb viral

genome •  Hybrid capture and sequencing •  Analysis to identify insertion points in human genomes

Page 20: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

20  

© Elaine R. Mardis

Viral Coverage Plots (4 FFPE Samples)

Duncavage et al., JMD 2011

© Elaine R. Mardis

Finding the Junction (Integration Site) using SLOPE

Page 21: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

21  

© Elaine R. Mardis

RNA Sequencing

RNA isolate Size selection for nc RNA classes

Fragment, RT w/randoms

polyA priming, RT, ds DNA SAGE tags

Adapter-ligated fragments for next-gen sequencing

Alignment to reference database & discovery - Expression levels

- Novel splice isoforms - Allelic bias in transcription

- Fusion transcripts

© Elaine R. Mardis

TNRC6B splice site mutation -> alt. splicing

Tumor WGS

Normal WGS

Tumor RNA-seq

Metastatic breast cancer (to brain)

Acceptor site mutation

Page 22: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

22  

© Elaine R. Mardis

Samples

Microbial Communities

Reference Sequences Metagenomics

16S rRNA WGS

Data Coordination, Analysis and

Concordance to Clinical Data

Subjects

Sources of strains

Goals: - Defining Core Microbiomes in “healthy” subjects - Catalog changes in Microbiomes with disease

Human Microbiome Project

Community Phenotypes

© Elaine R. Mardis

Stability of virome over time

Alp

hapa

pillo

mav

irus

Pol

yom

aviru

s

Ros

eolo

viru

s

Lym

phoc

rypt

oviru

s

Mas

tade

novi

rus

Alp

hapa

pillo

mav

irus

Gam

map

apill

omav

irus

Hum

an

papi

llom

aviru

s

Pol

yom

avir

us

Ros

eolo

viru

s

Lym

phoc

rypt

ovir

us

Mas

tade

novi

rus

Gam

map

apill

omav

irus

Hum

an p

apill

omav

irus

Kristine Wylie, WUGI

GI

Oral

Vaginal

Nasal

Page 23: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

23  

© Elaine R. Mardis

400 MRSA Genomes Legend:

CD

MRSA.6326.326W87

MRSA.6330.330N40

MRSA.6347.347W106

MRSA.6365.G96

MRSA.6433.W211

MRSA.6433.N82

MRSA.6433.G99

MRSA.R103G

MRSA.6421.421N69

MRSA.6254.254W53

MRSA.6382.382W140

MRSA.6415.2790W210

MRSA.6415.4563W208

MRSA.6222.222N8

MRSA.6266.266.1G39

MRSA.6432.A46

MRSA.6398.N80

MRSA.6436.W212

MRSA.R103 714N

MRSA.R402 6488W

MRSA.R402N

MRSA.6370.370W132

MRSA.6342.342W101

MRSA.6341.341A26

MRSA.6331.331W92

MRSA.6412.4124774W170

MRSA.6224.224W20

MRSA.6329.329N39

MRSA.6329.329W90

MRSA.6245.245W45

MRSA.6245.245.1N21

MRSA.6245.245A12

MRSA.6245.245.1A13

MRSA.6245.245.1G25

MRSA.6216.216G6

MRSA.6216.216W14

MRSA.6246.246G26

MRSA.6246.246W46

MRSA.6326.326.1N37

MRSA.6231.231W27

MRSA.6422.4227588W179MRSA.6422.422N70

MRSA.6331.331.1G55MRSA.6332.332W95MRSA.6209.209W5MRSA.6386.N78

MRSA.6386.386A39MRSA.6331.331.1G54

MRSA.6331.3314371W93

MRSA.6500.3300G82MRSA.6501.3301A40

MRSA.6500.3300W155

MRSA.6348.348W107MRSA.6374.374W134

MRSA.6335.335G56

MRSA.6344.344W102MRSA.6345.345N44

MRSA.6345.345W103 MRSA

.630

7.30

7A16

MRSA.6307.307N28

MRSA.6307.307G42

MRSA.6307.307W72MRSA.6261.261G35

MRSA.6341.341.1N43MRSA.6215.2157462W12

MRSA.6407.4076690W166MRSA.6407.4074440W167

MRSA.6407.407N65

MRSA.6255.255G31MRSA.6255.255W54

MRSA.6415.7010W209MRSA.6415.N81

MRSA.6430.430G89

MRSA.6430.430W186

MRSA.6430.430N72

MRSA.6361.361W121

MRSA.6395.395W150

MRSA.6346.3463077W105

MRSA.6437.W213

MRSA.6437.N83

MRSA.6359.359W119

MRSA.6366.3667367W128

MRSA.6366.366W127

MRSA.6366.366G72

MRSA.6366.366N51

MRSA.6384.384A37

MRSA.6387.387G78

MRSA.6387.387N59

MRSA.6386.386W144

MRSA.6402.402N63

MRSA.6386.386G77

MRSA.6389.W206

MRSA.6389.A45

MRSA.6389.N79

MRSA.6401.4016575W160

MRSA.6349.349G62

MRSA.6258.258N25

MRSA.6258.258G33

MRSA.6258.258W56

MRSA.6266.2661353W66

MRSA.6266.2667011W67

MRSA.6266.266W65

MRSA.6266.2665921W68

MRSA.6266.266A15

MRSA.6266.266G38

MRSA.6427.427W184

MRSA.6427.427N71

MRSA.6243.W196

MRSA.6242.242W42

MRSA.6252.2527094W52

MRSA.6252.252G29

MRSA.6252.252W51

MRSA.6502.3302W157

MRSA.6501.3301W156

MRSA.6501.3301G83

MRSA.6501.3301N61

MRSA.6260.260G34

MRSA.6260.260W58

MRSA.6340.340A25

MRSA.6340.340W100

MRSA.6340.340G57

MRSA.6341.341N42

MRSA.6394.394G79

MRSA.6391.391W148

MRSA.6421.4217599W177

MRSA.6303.G93

MRSA.6303.W199

MRSA.6267.267.1N27

MRSA.6267.267W69

MRSA.6403.4036488W163

MRSA.6399.399W154

MRSA.6204.204W4

MRSA.6305.305W71

MRSA.6335.3352985W190

MRSA.6335.335W96

MRSA.6250.250W49

MRSA.6339.339W99

MRSA.6

328.32

8W89

MRSA.6

384.38

4W142

MRSA.6

406.40

6A41

MRSA.6

351.35

1W113

MRSA.6

319.31

9W81

MRSA.6

377.37

7W136

MRSA.6

376.37

6G74

MRSA.6

398.39

8W153

MRSA.631

1.311.1A

20

MRSA.631

1.311.1N

32

MRSA.631

1.311N31

MRSA.6311.

311A19

MRSA.6311.

311W75

MRSA.643

8.W214

MRSA.624

1.241N18

MRSA.624

1.241W41

MRSA.6

416.41

63680W

172

MRSA.6

416.41

6W171

MRSA.6

416.41

6.1G87

MRSA.6

346.34

6W104

MRSA.6

336.33

6W97

MRSA.6

337.33

7W98

MRSA.6265.265G37

MRSA.6265.265W64

MRSA.6350.350W109

MRSA.6349.349.1G63

MRSA.6

349.34

9W108

MRSA.6321.W200

MRSA.6214.214N4

MRSA.6214.214W10

MRSA.6380.380W138

MRSA.6268.268.1G41

MRSA.6268.268G40

MRSA.6268.268W70

MRSA.6308.308W73

MRSA.6385.385N58

MRSA.6385.385G76

MRSA.6385.385W143

MRSA.6385.385A38

MRSA.6385.3855370W192

MRSA.6365.365N50

MRSA.6365.365W126

MRSA.6327.327G50

MRSA.6327.327W88

MRSA.6369.369W131

MRSA.6320.320A21

MRSA.6320.320W82

MRSA.6320.320N33

MRSA.6320.320.1N34

MRSA.6320.320G46

MRSA.6396.3961643W152

MRSA.6396.396G80

MRSA.6396.396W151

MRSA.6355.355A32

MRSA.6355.355W117

MRSA.6355.355G67

MRSA.6355.355N47

MRSA.6309.309W74

MRSA.6309.309.1A18

MRSA.6309.309.1N30

MRSA.6263.2636238W63

MRSA.6263.263W62

MRSA.6328.328A23

MRSA.6327.327.1G51

MRSA.6328.328G52

MRSA.6324.324G48

MRSA.6323.323W83

MRSA.6219.W195

MRSA.6219.G90

MRSA.6219.N74

MRSA.6219.G91

MRSA.6226.226W21

MRSA.6226.226A6

MRSA.6226.226N9

MRSA.6261.261A14

MRSA.6261.261W59

MRSA.6377.377A36MRSA.6377.377N56

MRSA.6229.229W24

MRSA.6223.223G12

MRSA.6223.223W19

MRSA.6363.W203

MRSA.6232.2326105W29

MRSA.6232.232G16

MRSA.6232.232W28

MRSA.6388.388.1N60

MRSA.6390.3905178W147

MRSA.6390.390W146

MRSA.6236.2366121W33

MRSA.6236.236.1G20

MRSA.6236.2366527W35

MRSA.6236.2366195W34

MRSA.6236.236G19

MRSA.6236.236W32

MRSA.6215.2151332W13

MRSA.6215.215A2

MRSA.6215.215G5

MRSA.6215.215W11

MRSA.6221.221G11

MRSA.6221.221W18

MRSA.6388.W205

MRSA.6218.218G9

MRSA.6218.218.1G10

MRSA.6218.218W16

MRSA.6419.419W175

MRSA.6218.218.1A4

MRSA.6218.218N6

MRSA.6360.360N49

MRSA.6360.360W120

MRSA.6210.210G3

MRSA.6210.210W6

MRSA.6227.227N10

MRSA.6227.227.1N11

MRSA.6227.227G13

MRSA.6227.227W22

MRSA.6401.401N62

MRSA.6356.356N48

MRSA.6356.356G68

MRSA.6356.356A33

MRSA.6356.356W118

MRSA.6397.W207

MRSA.6233.233A7

MRSA.6233.233N14

MRSA.6234.234W31

MRSA.6411.4113370W168

MRSA.6412.412W169

MRSA.6412.412N68

MRSA.6431.G98

MRSA.6334.N76

MRSA.6334.W201

MRSA.6203.203G2

MRSA.6203.203W3

MRSA.6203.203A1

MRSA.6308.308A17

MRSA.6362.362G69

MRSA.6362.362W122

MRSA.6431.431A42

MRSA.6431.4313236W187

MRSA.6431.431N73

MRSA.6352.352G64

MRSA.6352.352W114

MRSA.6401.4016336W161

MRSA.6353.353G65MR

SA.6247.G92

MRSA.6354.G94

MRSA.6354.W202

MRSA.6247.N75

MRSA.6247.A43

MRSA.6247.W197

MRSA.6

247.29

47W198

MRSA.6331.3314694W94

MRSA.6368.368W130

MRSA.6217.217.1A3

MRSA.6217.2173205W18

9MRSA.6344.

344A27MRSA.6344.3

44G58MRSA.6396.396

.1G81MRSA.6217.217.1N5MRSA.6217.217.1G8MRSA.6217.217G7

MRSA.6217.217W15MRSA.6383.383W141

MRSA.6342.342342807W191

MRSA.6230.2302975W26

MRSA.6230.230W25

MRSA.6230.230N12

MRSA.6230.230.1G15

MRSA.6230.230.1N13

MRSA.6376.376.1A35

MRSA.6345.345G59

MRSA.6376.376.1N55

MRSA.6376.376A34

MRSA.6376.376W135

MRSA.6376.376.1G75

MRSA.6376.376N54

MRSA.6418.418W174

MRSA.6254.254G30

MRSA.6248.248W47

MRSA.6387.387W145

MRSA.6211.211W7

MRSA.6504.3304W158

MRSA.6506.3306W159

MRSA.6251.251N22

MRSA.6251.251W50

MRSA.6370.370N52

MRSA.6324.324.1N36

MRSA.6325.325W86

MRSA.6

257.25

7W55

MRSA.6

257.25

7G32

MRSA.6

257.25

7N24MRS

A.6262

.26211

99W61

MRSA.6

262.26

2G36

MRSA.6

262.26

2N26

MRSA.6

262.26

2W60

MRSA.6

379.37

9N57

MRSA.6

379.37

9W137

MRSA.6

421.42

17968W

178

MRSA.6

429.42

9W185

MRSA.6

249.24

9.1G28

MRSA.6

249.24

9G27

MRSA.6

249.24

9W48

MRSA.631

4.314G44

MRSA.631

4.314354

1W77MRSA

.6314.31

4.1G45

MRSA.631

4.314W76

MRSA.6375.

N77MRSA.6375.

G97MRSA.6375.W2

04MRSA.6238.23

8A8MRSA.6238.23

8W37MRSA.6238.238N

16MRSA.6238.2938W3

8

MRSA.6350.

3500593W11

2MRSA.

6350.350.1

A28MRSA.

6350.35013

96W110

MRSA.642

6.426W18

3MRSA

.6345.34

5.1G60

MRSA.6402.4026402W162

MRSA.6424.4247207W182

MRSA.6266.A44

MRSA.6381.381W139

MRSA.6420.420W176MRSA.6244.244.1A11MRSA.6244.2444815W44

MRSA.6244.244W43MRSA.6241.241G22MRSA.6244.244G24

MRSA.6244.244A10MRSA.6244.244N20

MRSA.6222.222A5MRSA.6353.353.1N46MRSA.6353.353.1G66MRSA.6353.353N45MRSA.6353.3535901W116MRSA.6353.353.1A31MRSA.6330.330.1N41MRSA.6353.353A30MRSA.6353.353W115MRSA.6417.417G88MRSA.6213.213N3

MRSA.6213.213G4MRSA.6213.213W9

MRSA.6242.242N19MRSA.6239.239G21

MRSA.6239.239W39

MRSA.6239.239A9MRSA.6239.239N17

MRSA.6328.328N38

MRSA.6234.234G18

MRSA.6233.233W30

MRSA.6240.240W40

MRSA.6364.364G70

MRSA.6364.364083W124

MRSA.6364.3647947W125

MRSA.6364.364.1G71

MRSA.6364.364W123

MRSA.6201.2011355W2

MRSA.R201G

MRSA.R201W

MRSA.6201.201N1

MRSA.6201.201.1N2

MRSA.8869.201G1

MRSA.6351.351A29

MRSA.6330.330.1G53

MRSA.6330.330W91

MRSA.6410.410N66

MRSA.6410.4102889W194

MRSA.6410.4106835W193

MRSA.6410.410G85

MRSA.6410.410.1G86

MRSA.6410.410.1N67

MRSA.6212.212W8

MRSA.6220.220W17

MRSA.6394.394W149

MRSA.6348.348G61

MRSA.6323.323G47

MRSA.6324.324W84

MRSA.6324.324.1G49

MRSA.6324.324.1A22

MRSA.6324.3247192W85

MRSA.6228.228.1G14

MRSA.6228.228W23

MRSA.6259.259W57

MRSA.R406 2601W

MRSA.R406N

MRSA.6406.406G84MRSA.6406.406N64MRSA.6406.4062601W165MRSA.6406.4063760W164

MRSA.6367.367W129

MRSA.6237.237.1N15

MRSA.6237.237W36

MRSA.6218.218.1N7

MRSA.6423.4232663W181

MRSA.6423.4237501W180

MRSA.6439.

W215

MRSA.6363.

G95

MRSA.6317.

3176744W79

MRSA.6317.31

7W78

MRSA.6435.43

5W188

MRSA.6336.33

6A24

MRSA.6371.371N

53

MRSA.6371.371G73

MRSA.6371.371W133

© Elaine R. Mardis

Conclusions

•  2nd and 3rd generation sequencing instruments are revolutionizing biological research.

•  Earliest impacts have been on cancer genomics and metagenomics.

•  The extreme need for bioinformatics-based analytical approaches to interpret these large data sets has revitalized the field and introduced statistical and mathematical rigor.

•  Integration across data sets from DNA, RNA, methylation, proteomics, etc. presents the next challenge but provides comprehensive analytical power to inform biology.

•  With newer instruments, clinical applications have potential for implementation, with appropriate interpretive algorithms.

Page 24: Ge thenome Institute - Genome.gov | National Human Genome ... · NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies February 22, 2012 Elaine

NHGRI Current Topics in Genome Analysis 2012 Week 6: Next-Generation Sequencing Technologies

February 22, 2012 Elaine Mardis, Ph.D.

24  

© Elaine R. Mardis

Acknowledgements

•  The Genome Institute Vince Magrini Todd Wylie Sean McGrath Amy Ly Jason Walker Jasreet Hundal Lisa Cook Li Ding George Weinstock Richard K. Wilson

•  Clinical Collaborators (WUSM) Tim Ley Phil Tarr John Pfeifer