
Comparison of Commercially Available Target Enrichment Methods for Next-Generation Sequencing with the Illumina Platform

March 20-23, 2010

Sacramento, CA

Anoja Perera, Scottie Adams, David Bintzler, Kip Bodi, Ken Dewar, Deborah Grove, Jan Kieleczawa, Robert Lyons, Aaron Noll, Sushmita Singh, Robert Steen, Michael Zianni

Why do capture?

Using Illumina HiSeq 2000:

1 run = 10 days = 200 Gb => 1 Human Genome at a cost of ~$10,000

- GWAS

- Exome sequencing

- Candidate gene sequencing
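To see why capture pays off, the run yield above can be turned into a rough mean depth. This is a sketch under stated assumptions. It assumes a haploid human genome of ~3.1 Gb, which is not stated on the slide. It also assumes, unrealistically, that every sequenced base lands on target. The 3.5 Mb figure is the target size used in the DSRG study described later.

```python
# Rough raw fold-coverage arithmetic for the "Why do capture?" slide.
# Assumptions (not from the slides): a ~3.1 Gb haploid human genome, and that
# every sequenced base lands on the intended target (100% specificity).

RUN_YIELD_BP = 200e9       # 200 Gb per HiSeq 2000 run (from the slide)
HUMAN_GENOME_BP = 3.1e9    # assumed haploid genome size
STUDY_TARGET_BP = 3.5e6    # ~3.5 Mb target used in the DSRG study


def raw_fold_coverage(yield_bp: float, target_bp: float) -> float:
    """Mean depth if yield_bp bases were spread evenly over target_bp bases."""
    if target_bp <= 0:
        raise ValueError("target size must be positive")
    return yield_bp / target_bp


if __name__ == "__main__":
    print(f"Whole genome:  {raw_fold_coverage(RUN_YIELD_BP, HUMAN_GENOME_BP):.0f}x")
    print(f"3.5 Mb target: {raw_fold_coverage(RUN_YIELD_BP, STUDY_TARGET_BP):,.0f}x")
```

Real depth will be much lower once duplicates, off-target reads and uneven capture are accounted for. The point is only the orders of magnitude between the two cases.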

Capture Methods

Stanford.edu

Many different methods…

1. Multiplex PCR

Some key features:

•High specificity and coverage

•200-20,000 primer pairs target 10Kb to 10Mb

•Genomic DNA input: 2 μg

•Requires capital equipment

2. Hybridization approaches: in-solution or array-based

Agilent SureSelect

Some key features:

•eArray: free-of-charge custom array design tool

•RNA probes: 100 bases

•One array can capture up to 3.3Mb after masking

•Genomic DNA input: 3 μg or less

•No capital equipment!

•Sample cost: ~$1000

•Automation friendly and scalable

•If done manually, many hands-on steps

http://www.chem.agilent.com/Library/brochures/5990-3532en_lo%20CMS.pdf

Febit HybSelect

Some key features:

•"Microfluidic Biochip" contains 8 separate microchannels for independent capture

•Requires capital equipment or use of a service provider

•30 min hands-on time

•Genomic DNA: 1-5 μg

•Oligo length: 60-mer

•One array can capture up to 125Kb after masking

•Sample cost: over $2000

http://www.febit.com/microarray-sequencing/services/hybselect/

NimbleGen Array Capture

Some key features:

•Requires hybridization and elution systems or use of a service provider

•Array design is charged (fee waived for more than 5 arrays)

•Genomic DNA input: 3 μg or less

•One array captures 5-30 Mb after masking

•Validated arrays for human studies

•Sample cost: ~$1000

•2010: In-solution capture is available

http://www.nimblegen.com/products/seqcap/

2009/10 DSRG study

•Coriell DNA: ‘The Human Reference Genetic Material Repository DNA Sample’ (catalog ID: NS12911) http://huref.jcvi.org/

•Two types of regions selected (total ~3.5Mb):

1. 2Mb continuous region

2. 31 individual genes*

*The genes selected ranged widely in regard to size (2kb to 400kb), exon number, GC content, number of transcripts and repetitiveness of the sequences. All companies were provided with Ensembl gene IDs and genomic locations.

2009/10 DSRG study

DSRG: Select DNA and regions *** Illumina paired-end sample prep kit provided to all participants!

Agilent: Illumina library prep + in-solution capture

Febit: Illumina library prep + array-based capture

NimbleGen: Illumina library prep + array-based capture

DSRG: Sequence samples on Illumina GAII at 2 different centers

Data analysis at 2 different centers

QC and Illumina Run Statistics

•Samples were run on the Agilent High Sensitivity chip to assess their quantity and quality

•Samples were loaded at equal nM concentrations for each technology on two Illumina paired-end flowcells

•Two lanes were dedicated to each technology, and the flowcells were run at different centers
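Loading at equal nM means converting each library's mass concentration and fragment size into molarity. The High Sensitivity chip reports both. The following is a minimal sketch under two assumptions: double-stranded DNA at ~660 g/mol per base pair, and one mean fragment length per library. The function names and the example libraries are hypothetical. They are not part of any Agilent or Illumina software.

```python
# Average mass of one double-stranded base pair, in g/mol (approximate).
AVG_BP_MASS_G_PER_MOL = 660.0


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Molarity (nM) of a dsDNA library from mass concentration and mean fragment size."""
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    # ng/µl equals mg/l; dividing by g/mol gives mmol/l, and x1e6 gives nmol/l
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def stock_volume_ul(stock_nm: float, target_nm: float, final_volume_ul: float) -> float:
    """Volume of stock (µl) needed to reach target_nm in final_volume_ul (C1*V1 = C2*V2)."""
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    return target_nm * final_volume_ul / stock_nm


if __name__ == "__main__":
    # Hypothetical libraries: (concentration in ng/µl, mean fragment size in bp)
    libraries = {"lib_A": (12.0, 350), "lib_B": (8.5, 420)}
    for name, (conc, size) in libraries.items():
        nm = ng_per_ul_to_nm(conc, size)
        vol = stock_volume_ul(nm, 10.0, 20.0)
        print(f"{name}: {nm:.1f} nM; {vol:.2f} µl stock for 20 µl at 10 nM")
```

Broad or multi-peaked size distributions make a single mean length a rough approximation. qPCR-based quantification is a common alternative when that matters.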

Illumina Primary Analysis

Elizabeth Ketterer, Kendra Young

Data Analysis

1. Combined the datasets from both sequencing centers for each replicate

2. Filtered each data set so that sequences have a quality score > 10 for 100% of the bases

3. Mapped reads against the hg19/GRCh37 genome using bowtie 0.12.0

4. Normalized the data sets to equal sizes

5. A series of 'perl' scripts was then used to calculate coverage per position in every targeted region, creating a coverage map

6. Coverage maps were imported into the R statistical computing environment (2.1.0) to find the sensitivity, specificity, and reproducibility for each sample

7. Plots and figures were generated using the "ggplot2" library and MS Excel
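Step 2's filter keeps only reads in which every base scores above Q10. The sketch below shows one way to do that in Python. It is not the group's actual script. It assumes an uncompressed 4-line FASTQ file and the Phred+64 quality encoding commonly emitted by Illumina GAII pipelines of that period. Check the encoding against the actual files and pass `offset=33` for Phred+33 data.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def passes_filter(qual: str, min_q: int = 10, offset: int = 64) -> bool:
    """True only if every base has a Phred score strictly greater than min_q."""
    return all(ord(ch) - offset > min_q for ch in qual)


def filter_fastq(handle: TextIO, min_q: int = 10, offset: int = 64) -> List[Record]:
    """Keep the reads whose bases all exceed min_q."""
    return [rec for rec in read_fastq(handle) if passes_filter(rec[2], min_q, offset)]


if __name__ == "__main__":
    demo = io.StringIO(
        "@read1\nACGT\n+\nhhhh\n"  # all bases Q40 in Phred+64
        "@read2\nACGT\n+\nhhJh\n"  # one base at exactly Q10, so the read is dropped
    )
    print([header for header, _, _ in filter_fastq(demo)])  # prints ['@read1']
```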

Kip Bodi, Aaron Noll
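Step 5 builds a per-position coverage map for each targeted region. One way to do this is a difference array: add 1 where each aligned read starts inside the region, subtract 1 where it ends, then take a running sum. The sketch uses Python rather than the original Perl. It assumes alignments have already been reduced to 0-based, half-open (start, end) intervals on the region's chromosome. The function name is hypothetical.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def coverage_map(region_start: int, region_end: int,
                 alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Per-position read depth over [region_start, region_end) (0-based, half-open).

    Reads that extend past the region are clipped to it; reads that do not
    overlap the region at all are ignored.
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, end in alignments:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # only reads that overlap the region contribute
            diff[s] += 1
            diff[e] -= 1
    # The running sum of the difference array gives the depth at each position
    return list(accumulate(diff[:-1]))


if __name__ == "__main__":
    reads = [(95, 103), (101, 105), (108, 120), (200, 250)]
    print(coverage_map(100, 110, reads))
```

The difference array keeps the cost proportional to the number of reads plus the region length, rather than read length times read count.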

Sensitivity: How much was captured?

•Converted hg18 genome build co-ordinates to hg19 co-ordinates

•Determine the overlapping regions between the two different targeted regions

•Reanalyze all the data for only the overlapping regions.

[Figure: % targeted bases with at least 1X coverage by technology (Agilent: Replicates 1-2; Febit: Lanes 1-2; NimbleGen: Replicates 1-2), shown for the 3.5Mb and 2.1Mb regions]
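Restricting the analysis to the overlap of the two targeted regions is an interval-intersection problem. Here is a minimal sketch. It assumes both target sets are already in hg19 coordinates (the hg18 to hg19 conversion itself is not shown) and are given as 0-based, half-open (chrom, start, end) tuples. `merge`, `intersect` and `total_bp` are hypothetical helper names.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals within one set."""
    out: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if out and out[-1][0] == chrom and start <= out[-1][2]:
            out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
        else:
            out.append((chrom, start, end))
    return out


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both sets, found with a two-pointer sweep."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if (ca, ea) <= (cb, sb):    # a[i] lies entirely before b[j]
            i += 1
        elif (cb, eb) <= (ca, sa):  # b[j] lies entirely before a[i]
            j += 1
        else:                       # same chromosome and overlapping
            out.append((ca, max(sa, sb), min(ea, eb)))
            if ea < eb:
                i += 1
            else:
                j += 1
    return out


def total_bp(intervals: List[Interval]) -> int:
    """Total length of a list of non-overlapping intervals."""
    return sum(end - start for _, start, end in intervals)
```

Chromosome names are compared as strings ("chr10" sorts before "chr2"). That is harmless here because both sets are sorted the same way.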


Sensitivity: How much of the 2.1Mb was captured by at least 1 read?

[Figure: % targeted bases with at least 1X coverage by capture method (Agilent: Replicates 1-2; Febit: Lanes 1-2; NimbleGen: Replicates 1-2)]


Specificity: How many of the reads mapped to the intended targets?

[Figure: % sequences that are on target by capture method (Agilent: Replicates 1-2; Febit: Lanes 1-2; NimbleGen: Replicates 1-2)]

Kip Bodi
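The specificity plot reports the share of sequences that fall on target. The slides state neither the counting rule nor the denominator, so this sketch picks both as assumptions. A read counts as on target if it overlaps any targeted base by at least 1 bp. The denominator is every read passed in, with unmapped reads given as `None`. Intervals are 0-based, half-open, and the helper names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Optional, Tuple

Read = Optional[Tuple[str, int, int]]  # (chrom, start, end), or None if unmapped
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(targets: Iterable[Tuple[str, int, int]]) -> Index:
    """Merge targets per chromosome into sorted, non-overlapping start/end lists."""
    merged: Dict[str, List[List[int]]] = {}
    for chrom, start, end in sorted(targets):
        ivs = merged.setdefault(chrom, [])
        if ivs and start <= ivs[-1][1]:
            ivs[-1][1] = max(ivs[-1][1], end)
        else:
            ivs.append([start, end])
    return {c: ([s for s, _ in ivs], [e for _, e in ivs]) for c, ivs in merged.items()}


def is_on_target(read: Read, index: Index) -> bool:
    """True if the read overlaps at least one targeted base."""
    if read is None or read[0] not in index:
        return False
    chrom, start, end = read
    starts, ends = index[chrom]
    k = bisect_left(starts, end) - 1  # last target that starts before the read ends
    return k >= 0 and ends[k] > start


def specificity(reads: Iterable[Read], targets: Iterable[Tuple[str, int, int]]) -> float:
    """Fraction of all reads (mapped or not) that overlap the targets."""
    index = build_index(targets)
    reads = list(reads)
    if not reads:
        raise ValueError("no reads")
    return sum(is_on_target(r, index) for r in reads) / len(reads)
```

Because the merged targets are non-overlapping and sorted, checking only the last target that starts before the read ends is enough. Every earlier target ends even sooner.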

% Coverage vs. Read Depth

[Figure: % coverage (y-axis) vs. read depth from 1X to 100X (x-axis) for Agilent, Febit and NimbleGen]
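Each point on the coverage-versus-depth curve is the percentage of targeted positions at or above a given depth. The 1X point is the sensitivity measure used on the earlier slides. The sketch below sorts the depths once and bisects for each threshold. It assumes the per-position depths from the coverage maps have been pooled into one list per technology. The thresholds mirror the plot's x-axis.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Sequence

# Depth thresholds matching the x-axis of the "% Coverage vs. Read Depth" plot.
THRESHOLDS = (1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)


def coverage_curve(depths: Iterable[int],
                   thresholds: Sequence[int] = THRESHOLDS) -> Dict[int, float]:
    """Percent of targeted positions with depth >= each threshold."""
    ordered = sorted(depths)
    n = len(ordered)
    if n == 0:
        raise ValueError("no targeted positions")
    # Positions with depth >= t are everything from bisect_left(ordered, t) onward
    return {t: 100.0 * (n - bisect_left(ordered, t)) / n for t in thresholds}
```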


Coverage of Overlapping Genes

Aaron Noll

Coverage of the Continuous Region

Kip Bodi

In Summary…

Description | Agilent (In-solution) | Febit (Array) | NimbleGen (Array)

Cost per sample

Sample input requirement

√

√ √ √

Quality of captured sample

Reproducibility

√

√ √

Sensitivity

Specificity

√

√

√

√

Coverage

Scalability

√

√

√ √

Future Directions…

•Repeat Agilent capture with probes designed to the 3.5Mb region of hg19

•Febit capture: What went wrong? Repeat?

•Determine reasons for low-coverage areas that are common to all three technologies

•Perform SNP analysis: check how well each technology can detect SNPs in a given region, and look for false positives and negatives
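For the planned SNP analysis, one way to count false positives and negatives is to compare each technology's calls with a reference call set for the same sample. Both sets should first be restricted to the targeted region. This is a sketch under assumptions. Variants are represented as hypothetical (chrom, pos, ref, alt) tuples that must match exactly. Which reference set to use is left open; one option would be a set derived from the HuRef data.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # hypothetical (chrom, pos, ref, alt) representation


def compare_calls(calls: Iterable[Variant],
                  truth: Iterable[Variant]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) by exact match."""
    called: Set[Variant] = set(calls)
    known: Set[Variant] = set(truth)
    tp = len(called & known)
    fp = len(called - known)  # called, but absent from the reference set
    fn = len(known - called)  # in the reference set, but missed
    return tp, fp, fn


def recall_and_precision(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Recall = TP/(TP+FN); precision = TP/(TP+FP); 0.0 when undefined."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

Exact matching counts a call with the right position but the wrong alternate allele as both a false positive and a false negative. A looser rule would be a separate design choice.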

Many Thanks!!!

Agilent/Vector Biotech

DSRG

Illumina

Alexander Wong

Fred Ernani, Garrick Peters, Katie Weaver

DSRG

David Bintzler, Deborah Grove, Jan Kieleczawa

Jonathan Pinter

Data Analysis:

Aaron Noll, Katie Weaver

Ken Olinger, Nick Mapara

Ken Dewar, Michael Zianni, Robert Lyons

Aaron Noll, Kip Bodi

Febit

Han Lee, Jack Leonard

Robert Steen, Scottie Adams, Sushmita Singh

Others:

Christine Brennan, Constance Esposito, Elizabeth Ketterer

Natalie Oliveira

NimbleGen

Elizabeth Ketterer, Karen Staehling, Kendra Young

Daniel Burgess, Lance Brown, Michael Frawley, Xinmin Zhang

