Comparison of Commercially Available Target Enrichment Methods for Next Generation Sequencing with the Illumina Platform
March 20-23, 2010
Sacramento, CA
Anoja Perera, Scottie Adams, David Bintzler, Kip Bodi, Ken Dewar, Deborah Grove, Jan Kieleczawa, Robert Lyons, Aaron Noll, Sushmita Singh, Robert Steen, Michael Zianni
Why do capture?
Using Illumina HiSeq 2000:
1 run = 10 days = 200 Gb => 1 Human Genome at a cost of ~$10,000
- GWAS
- Exome sequencing
- Candidate gene sequencing
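The arithmetic behind this slide can be sketched as follows. Only the 200 Gb run yield comes from the slide; the genome size, the desired depth and the on-target fraction are illustrative assumptions, and the function names are hypothetical.

```python
# Only RUN_YIELD_BASES is taken from the slide; everything else is assumed.
RUN_YIELD_BASES = 200e9      # one HiSeq 2000 run, per the slide
GENOME_SIZE_BASES = 3.1e9    # assumption: approximate human genome size


def mean_depth(sequenced_bases, region_size_bases):
    """Average number of times each base of the region is read."""
    return sequenced_bases / region_size_bases


def samples_per_run(run_yield, target_size, desired_depth, on_target_fraction):
    """Number of captured samples that fit in one run at the desired mean depth."""
    # Off-target reads are wasted, so more raw bases are needed per sample.
    bases_needed = target_size * desired_depth / on_target_fraction
    return int(run_yield // bases_needed)


whole_genome_depth = mean_depth(RUN_YIELD_BASES, GENOME_SIZE_BASES)
# Hypothetical capture: 3.5 Mb target, 100X mean depth, half the reads on target.
captured_samples = samples_per_run(RUN_YIELD_BASES, 3.5e6, 100, 0.5)
```

Under these assumptions one run gives a single genome at roughly 65X, while the same yield would cover a few hundred captured samples, which is the motivation for enrichment.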
Capture Methods
Stanford.edu
Many different methods…
1. Multiplex PCR
Some key features:
•High specificity and coverage
•200-20,000 primer pairs target 10Kb to 10Mb
•Genomic DNA input: 2 μg
•Requires capital equipment
2. Hybridization approaches; in-solution or array based
Agilent SureSelect
Some key features:
•eArray: free of charge custom array design tool
•RNA probes: 100 bases
•One array can capture up to 3.3Mb after masking
•Genomic DNA input: 3μg or less
•No capital equipment!
•Sample cost: ~$1000
•Automation friendly and scalable
•If done manually, many hands-on steps
http://www.chem.agilent.com/Library/brochures/5990-3532en_lo%20CMS.pdf
Febit HybSelect
Some key features:
•“Microfluidic Biochip” contains 8 separate micro channels for independent capture
•Requires capital equipment or use of a service provider
•30 min hands-on time
•Genomic DNA: 1-5 μg
•Oligo length: 60-mer
•One array can capture up to 125Kb after masking
•Sample cost: over $2000
http://www.febit.com/microarray-sequencing/services/hybselect/
NimbleGen Array Capture
Some key features:
•Requires a hybridization and elution system or use of a service provider
•Charged array design (waived if more than 5 arrays)
•Genomic DNA input: 3μg or less
•One array captures 5-30 Mb after masking
•Validated arrays for human studies
•Sample cost: ~$1000
•2010: In-solution capture is available
http://www.nimblegen.com/products/seqcap/
2009/10 DSRG study
•Coriell DNA: ‘The Human Reference Genetic Material Repository DNA Sample’ (catalog ID: NS12911) http://huref.jcvi.org/
•Two types of regions selected (total ~3.5Mb):
1. 2Mb continuous region
2. 31 individual genes*
* The genes selected ranged widely in size (2kb to 400kb), exon number, GC content, number of transcripts and repetitive nature of the sequences. All companies were provided with Ensembl gene IDs and genomic locations.
2009/10 DSRG study
DSRG: Select DNA and regions
*** Illumina paired-end sample prep kit provided to all participants!
Agilent: Illumina library prep + In-solution capture
Febit: Illumina library prep + Array-based capture
NimbleGen: Illumina library prep + Array-based capture
DSRG: Sequence samples on Illumina GAII at 2 different centers
Data Analysis at 2 different centers
QC and Illumina Run Statistics
•Samples were run on the Agilent High Sensitivity chip to assess the quantity and quality
•Samples were loaded in equal nM concentrations for each technology on two Illumina paired-end flowcells
•Two lanes were dedicated for each technology and the flowcells were run at different centers
Illumina Primary Analysis
Elizabeth Ketterer, Kendra Young
Data Analysis
1. Combined the datasets from both sequencing centers for each replicate
2. Filtered each data set so that sequences have quality score > 10 for 100% of the bases
3. Mapped reads against the hg19/GRCh37 genome using bowtie 0.12.0
4. Normalized the data sets to equal sizes
5. A series of ‘perl’ scripts were then used to calculate coverage per position in every targeted region, creating a coverage map
6. Coverage maps were imported into the R statistical computing environment (2.1.0), to find the sensitivity, specificity, and reproducibility for each sample
7. Plots and figures were generated using the "ggplot2" library and MS Excel
Kip Bodi, Aaron Noll
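Steps 2, 4 and 5 above can be sketched in a few lines. This is a minimal Python illustration, not the perl scripts used in the study: the function names are hypothetical, random downsampling is an assumed way of normalizing to equal sizes (the slides do not say how it was done), and alignments are assumed to be 0-based intervals with an exclusive end.

```python
import random
from collections import Counter


def passes_quality(phred_scores, threshold=10):
    """Step 2: keep a read only if every base scores above the threshold."""
    return all(q > threshold for q in phred_scores)


def normalize_size(reads, target_size, seed=0):
    """Step 4 (assumed method): randomly downsample to target_size reads."""
    if len(reads) <= target_size:
        return list(reads)
    return random.Random(seed).sample(reads, target_size)


def coverage_map(alignments, region_start, region_end):
    """Step 5: count the reads covering each position of one targeted region.

    alignments: iterable of (start, end) pairs on the region's chromosome,
    0-based with end exclusive.
    """
    depth = Counter()
    for start, end in alignments:
        # Clip each alignment to the region before counting.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos] += 1
    return [depth[pos] for pos in range(region_start, region_end)]
```

For example, `coverage_map([(0, 5), (3, 8)], 2, 7)` gives one depth value per base of the region from position 2 up to, but not including, position 7.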
Sensitivity: How much was captured?
•Converted hg18 genome build co-ordinates to hg19 co-ordinates
•Determine the overlapping regions between the two different targeted regions
•Reanalyze all the data for only the overlapping regions.
[Figure: % targeted bases with at least 1X coverage by technology (Agilent: Replicate 1, Agilent: Replicate 2, Febit: Lane 1, Febit: Lane 2, NimbleGen: Replicate 1, NimbleGen: Replicate 2); region labels: 3.5Mb, 2.1Mb]
Kip Bodi, Aaron Noll
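Finding the overlap between two sets of targeted regions amounts to an interval intersection. The sketch below is an illustration only: it assumes both sets are already on the same genome build (the hg18 to hg19 conversion itself would be done with a separate coordinate conversion tool, which the slides do not name), that regions are `(chrom, start, end)` tuples with an exclusive end, and that regions within one set do not overlap each other. The function name is hypothetical.

```python
def intersect_regions(regions_a, regions_b):
    """Return the parts of the genome present in both region sets.

    Each region is (chrom, start, end), 0-based with end exclusive.
    Regions within one set are assumed not to overlap each other.
    """
    a = sorted(regions_a)
    b = sorted(regions_b)
    overlaps = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome sort order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            overlaps.append((chrom_a, start, end))
        # Advance the interval that finishes first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return overlaps


def total_size(regions):
    """Total number of bases spanned by a list of regions."""
    return sum(end - start for _, start, end in regions)
```

Summing the sizes of the returned intervals with `total_size` gives the size of the shared target, which is the kind of figure the 2.1Mb label refers to.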
Sensitivity: How much of the 2.1Mb was captured by at least 1 read?
[Figure: bar chart of % targeted bases with at least 1X coverage by capture method (Agilent: Replicate1, Agilent: Replicate2, Febit: Lane1, Febit: Lane2, NimbleGen: Replicate1, NimbleGen: Replicate2)]
Kip Bodi, Aaron Noll
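The sensitivity measure plotted here, the share of targeted bases seen by at least one read, can be computed from per-base coverage maps. A minimal sketch, assuming the coverage maps are held as a dictionary of region name to list of depths (a hypothetical layout, not the one used in the study's R analysis):

```python
def sensitivity(coverage_maps, min_depth=1):
    """Fraction of all targeted bases covered by at least min_depth reads.

    coverage_maps: dict mapping a region name to its list of per-base depths.
    """
    total = sum(len(depths) for depths in coverage_maps.values())
    if total == 0:
        raise ValueError("no targeted bases supplied")
    covered = sum(
        1 for depths in coverage_maps.values() for d in depths if d >= min_depth
    )
    return covered / total
```

Pooling all regions before dividing weights each targeted base equally, so a large continuous region counts for more than a small gene.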
Specificity: How many of the reads mapped to the intended targets?
[Figure: bar chart of % sequences that are on target by capture method (Agilent: Replicate1, Agilent: Replicate2, Febit: Lane1, Febit: Lane2, NimbleGen: Replicate1, NimbleGen: Replicate2)]
Kip Bodi
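Specificity, as plotted above, is the share of mapped reads that fall on the intended targets. The sketch below assumes a read counts as on target when it overlaps a targeted region by at least one base; the slides do not state the exact rule, and the function names are hypothetical.

```python
def overlaps_target(read, targets):
    """True if the read overlaps any targeted region by at least one base.

    read and targets use (chrom, start, end), 0-based with end exclusive.
    """
    chrom, start, end = read
    return any(
        chrom == t_chrom and start < t_end and end > t_start
        for t_chrom, t_start, t_end in targets
    )


def specificity(reads, targets):
    """Fraction of mapped reads that are on target."""
    if not reads:
        raise ValueError("no mapped reads supplied")
    on_target = sum(1 for read in reads if overlaps_target(read, targets))
    return on_target / len(reads)
```

The linear scan over targets keeps the example short; for millions of reads an indexed interval lookup would be the more practical choice.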
% Coverage vs. Read Depth
[Figure: line plot of % coverage against read depth (1X, 5X, 10X, 20X, 30X, 40X, 50X, 60X, 70X, 80X, 90X, 100X) for Agilent, Febit and NimbleGen]
Kip Bodi, Aaron Noll
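A curve like this one can be derived from a per-base coverage map by asking, for each depth threshold on the x-axis, what fraction of targeted bases reach it. A minimal sketch with a hypothetical function name; the thresholds are the ones shown on the plot's axis:

```python
# Depth thresholds taken from the x-axis of the plot.
DEPTH_THRESHOLDS = [1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def coverage_curve(coverage, thresholds=DEPTH_THRESHOLDS):
    """Map each depth threshold to the fraction of bases at or above it.

    coverage: list of per-base read depths for the targeted bases.
    """
    total = len(coverage)
    if total == 0:
        raise ValueError("coverage map is empty")
    return {t: sum(1 for d in coverage if d >= t) / total for t in thresholds}
```

The value at 1X is the sensitivity figure from the earlier slides, and the curve can only stay flat or fall as the threshold increases.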
Coverage of Overlapping Genes
Aaron Noll
Coverage of the Continuous Region
Kip Bodi
In Summary…
Description    Agilent (In-solution)    Febit (Array)    NimbleGen (Array)
Cost per sample
Sample input requirement
Quality of captured sample
Reproducibility
Sensitivity
Specificity
Coverage
Scalability
[Table: the √ marks assigned to each technology per criterion could not be recovered from the extracted text]
Future Directions…
•Repeat Agilent capture with probes designed to the 3.5Mb region of hg19
•Febit capture: What went wrong? Repeat?
•Determine reasons for low coverage areas that are common to all three technologies
•Perform SNP analysis
- check how well each technology can detect SNPs in a given region
- look for false positives and negatives
Many Thanks!!!
Agilent/Vector Biotech: Alexander Wong, Fred Ernani, Garrick Peters, Katie Weaver, Ken Olinger, Nick Mapara
Illumina: Jonathan Pinter
Febit: Han Lee, Jack Leonard, Natalie Oliveira
NimbleGen: Daniel Burgess, Lance Brown, Michael Frawley, Xinmin Zhang
DSRG: David Bintzler, Deborah Grove, Jan Kieleczawa, Ken Dewar, Michael Zianni, Robert Lyons, Robert Steen, Scottie Adams, Sushmita Singh
Data Analysis: Aaron Noll, Kip Bodi
Others: Christine Brennan, Constance Esposito, Elizabeth Ketterer, Karen Staehling, Kendra Young