Probe Designs Are Very Different
There are four common filaggrin mutations in Scottish/Irish populations. Only NimbleGen is able to capture them all reliably. Nextera is misleading: Illumina do not publish their probesets, so it is not possible to know whether these mutations are well covered.
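Deciding whether a mutation is covered is, at its simplest, an interval lookup against the vendor's probe definitions, which is why an unpublished probeset leaves the question open. A minimal sketch, assuming probes are given as 0-based, half-open (BED-style) intervals; all coordinates and names below are hypothetical placeholders, not the real filaggrin positions or any vendor's design.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end): 0-based, half-open intervals, as in a BED file.
Interval = Tuple[str, int, int]


def is_covered(probes: List[Interval], chrom: str, pos: int) -> bool:
    """True if the 0-based position `pos` on `chrom` lies inside any probe."""
    return any(c == chrom and start <= pos < end for c, start, end in probes)


def coverage_report(probes: List[Interval],
                    mutations: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Map each mutation name to whether its position falls inside a probe."""
    return {name: is_covered(probes, chrom, pos)
            for name, (chrom, pos) in mutations.items()}


if __name__ == "__main__":
    # Hypothetical probes and mutation positions, for illustration only.
    probes = [("1", 1000, 1120), ("1", 2000, 2120)]
    mutations = {"mut_a": ("1", 1050), "mut_b": ("1", 1500), "mut_c": ("1", 2119)}
    print(coverage_report(probes, mutations))
```

Without a probe file, the `probes` list simply cannot be filled in, which is the situation with the Illumina kit.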
Methodology
Four patient samples had individual libraries made for each of the kits, run in duplicate across two lanes per kit. The 24 datasets were aligned to the human genome (Ensembl r71) with bowtie2 (v2.1.0) and had PCR duplicates removed with Picard (v1.89). Variant calling was performed with GATK (v2.2-8-g99996f2) using vendor-provided probeset definitions, and variants were annotated with Variant Effect Predictor (v2.7).
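The 24 datasets follow from four samples × three kits × two lanes. A minimal sketch enumerating them, assuming the `kit_smplN_laneM` naming seen in the clustering figure labels (with "agilent" standing for the SureSelect v5 kit); the stage list only restates the tools and versions given above and is not a runnable workflow.

```python
from itertools import product
from typing import List

# Kit prefixes as they appear in the clustering figure labels (assumed mapping).
KITS = ["agilent", "nextera", "nimblegen"]
SAMPLES = [1, 2, 3, 4]
LANES = [1, 2]

# Processing stages and tool versions as stated in the Methodology.
PIPELINE = [
    ("align", "bowtie2 v2.1.0 against Ensembl r71 human genome"),
    ("dedup", "Picard v1.89: remove PCR duplicates"),
    ("call", "GATK v2.2-8-g99996f2 with vendor probeset definitions"),
    ("annotate", "Variant Effect Predictor v2.7"),
]


def dataset_names() -> List[str]:
    """One dataset per kit x sample x lane, e.g. 'nextera_smpl1_lane2'."""
    return [f"{kit}_smpl{sample}_lane{lane}"
            for kit, sample, lane in product(KITS, SAMPLES, LANES)]


if __name__ == "__main__":
    names = dataset_names()
    print(len(names))
    # Show the stages each dataset passes through, for the first two datasets.
    for name in names[:2]:
        for stage, tool in PIPELINE:
            print(f"{name}\t{stage}\t{tool}")
```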
Which is Best?
Samples were genotyped with the Illumina OmniExpress Exome array and the results compared to the three WES platforms. There were 15,068 positions common to the genotype array and WES. Globally, the Agilent and NimbleGen kits performed similarly, with the Illumina kit being significantly worse. The Epidermal Differentiation Complex (EDC) region includes 63 genes required for the normal development of the stratum corneum in skin. Within the EDC, NimbleGen has the best probe coverage, the best 20x coverage and the lowest disagreement with the genotyping array results (Table 1).
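Comparing WES calls with the array amounts to restricting both call sets to their shared positions and counting agreeing and disagreeing genotypes. A minimal sketch, assuming genotypes are stored as allele pairs keyed by (chromosome, position); the data are hypothetical, and the exact concordance rules used for Table 1 are not stated on this poster.

```python
from typing import Dict, Tuple

Position = Tuple[str, int]   # (chromosome, position)
Genotype = Tuple[str, str]   # two alleles, e.g. ("A", "G")


def normalise(gt: Genotype) -> Tuple[str, ...]:
    """Sort alleles so that A/G and G/A compare as the same genotype."""
    return tuple(sorted(gt))


def concordance(array: Dict[Position, Genotype],
                wes: Dict[Position, Genotype]) -> Dict[str, float]:
    """Count agreement only at positions present in both call sets."""
    shared = array.keys() & wes.keys()
    disagree = sum(normalise(array[p]) != normalise(wes[p]) for p in shared)
    n = len(shared)
    return {
        "shared": n,
        "agree": n - disagree,
        "disagree": disagree,
        "disagreement_pct": 100.0 * disagree / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical calls: one match, one zygosity mismatch, one unshared site each.
    array = {("1", 100): ("A", "G"), ("1", 200): ("C", "C"), ("1", 300): ("T", "T")}
    wes = {("1", 100): ("G", "A"), ("1", 200): ("C", "T"), ("1", 400): ("A", "A")}
    print(concordance(array, wes))
```

A heterozygous call on one platform against a homozygous call on the other, as at position 200 here, is the kind of zygosity disagreement counted in the comparison.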
Acknowledgements
We thank the patients for providing the samples used in this study.
Exome Sequencing
Sequencing libraries are prepared as for a normal genomic DNA sample, but the DNA fragments are hybridised in solution to probes designed to enrich for coding regions in the genome. Non-coding DNA is washed away and the captured fragments are eluted, ready for sequencing on an Illumina HiSeq2000.
Quality Assessment
Agilent v4 shows the best base coverage: 30x (largest circle). Agilent v5 shows the worst duplicate rate: ~17%. Nextera and Agilent v5 have poor on-target rates: <50%. NimbleGen shows a good on-target rate (~74%) and low duplicate reads (~4%).
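The three quality metrics above are simple ratios over read and base counts. A minimal sketch, assuming hypothetical per-dataset counts; dedicated QC tools apply their own exact definitions (for example, whether duplicates are excluded before counting on-target reads), so this is illustrative rather than the calculation used here.

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Hypothetical per-dataset counts used to derive the QC metrics."""
    total_reads: int
    duplicate_reads: int
    on_target_reads: int
    on_target_bases: int   # aligned bases falling inside the kit's target regions
    target_size_bp: int    # total length of the kit's target regions


def duplicate_pct(c: ReadCounts) -> float:
    """Duplicate reads as a percentage of all reads."""
    return 100.0 * c.duplicate_reads / c.total_reads


def on_target_pct(c: ReadCounts) -> float:
    """Reads overlapping a target region as a percentage of all reads."""
    return 100.0 * c.on_target_reads / c.total_reads


def mean_target_coverage(c: ReadCounts) -> float:
    """Mean depth (x) across the targets: on-target bases / target length."""
    return c.on_target_bases / c.target_size_bp


if __name__ == "__main__":
    counts = ReadCounts(total_reads=1000, duplicate_reads=40, on_target_reads=740,
                        on_target_bases=3_000_000, target_size_bp=100_000)
    print(duplicate_pct(counts), on_target_pct(counts), mean_target_coverage(counts))
```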
Variant Calling Reproducibility
Technical reproducibility is high with Agilent and NimbleGen (~91%), but Nextera is worse: ~85%. Sample reproducibility is quite variable: 35-50%. Background noise is high.
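The poster does not define % agreement. One plausible definition, assumed here, is the number of variants called in both datasets as a percentage of all variants called in either (shared / union). A minimal sketch with variants keyed by (chromosome, position, ref, alt); the calls are hypothetical.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def percent_agreement(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared variants as a percentage of all variants seen in either set."""
    union = a | b
    if not union:
        return 100.0  # two empty call sets trivially agree
    return 100.0 * len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical calls from the two lanes of one sample.
    lane1 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
    lane2 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("3", 10, "T", "C")}
    print(percent_agreement(lane1, lane2))
```

Applied to the two lanes of one sample, this gives a technical-reproducibility figure; applied across different samples, it would measure how much of the call set is shared between individuals.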
Sample Clustering
Samples cluster primarily by kit: the protocol has more impact on results than biology. Clusters are robust, as determined by bootstrap scores (au > 95%).
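The clustering parameters reported for the dendrogram (Euclidean distance, complete linkage) describe standard agglomerative clustering. A minimal pure-Python sketch on hypothetical per-dataset feature vectors (for example, genotype calls encoded numerically; the actual input features are not stated here). It does not reproduce the 1000-replicate bootstrap behind the au scores.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points: Dict[str, Sequence[float]], k: int) -> List[List[str]]:
    """Merge the closest clusters until k remain.

    Complete linkage: the distance between two clusters is the largest
    distance between any member of one and any member of the other.
    """
    clusters: List[List[str]] = [[name] for name in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


if __name__ == "__main__":
    # Hypothetical datasets from two kits, two samples each.
    points = {"kitA_s1": [1.0, 0.0], "kitA_s2": [1.1, 0.1],
              "kitB_s1": [5.0, 5.0], "kitB_s2": [5.2, 4.9]}
    print(complete_linkage(points, k=2))
```

If the datasets group by kit rather than by sample, as here, the kit dominates the distance, which is the pattern the poster reports.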
Christian Cole1,2, J Ward1,2,3, M Lee2,3, D Ross2,3, N Wilson3, FJD Smith2, SJ Brown2, A Irvine4, WHI McLean2, GJ Barton1, and M Febrer3
1Computational Biology, College of Life Sciences, University of Dundee, UK; 2Centre for Dermatology and Genetic Medicine, College of Life Sciences and College of Medicine, Dentistry and Nursing, University of Dundee, UK; 3Genomic Sequencing Unit, College of Life Sciences, University of Dundee, UK; 4Department of Dermatology, Our Lady’s Children’s Hospital, Dublin, Ireland
NOT ALL EXOMES ARE EQUAL: A COMPARISON OF THREE KITS
Aim
Whole exome sequencing (WES) as a protocol is highly dependent on the probe design of the kit manufacturers. Here we present results from four human patient samples run against Illumina’s Nextera, Agilent’s SureSelect v5 and NimbleGen’s SeqCap v3 library preparation kits, sequenced across four lanes of a HiSeq2000. The data were processed through a variant calling pipeline based around the Genome Analysis Toolkit (GATK). A comparison is made with Illumina OmniExpress Exome genotyping array data for validation. The significant differences found are of particular relevance to dermatology-related studies, which are an important focus for DGEM in Dundee, but are also more generally applicable to exome sequencing.
Figure: Common Mutations in Atopic Eczema. Filaggrin (chromosome 1) mutations p.R501X, c.2285del4, p.R2447X and p.S3247X, shown against Illumina Nextera, NimbleGen SeqCap v3, Agilent SureSelect v4 and Agilent SureSelect v5.
Figure: % agreement (scale 20-100%) between datasets smpl1-smpl4 (lane1 and lane2) for Agilent v4, Agilent v5, Nextera and NimbleGen.
Figure: On-target reads (%) against duplicate reads (%) for Agilent v4, Agilent v5, Nextera and NimbleGen; circle area = base coverage.
Figure: Hierarchical clustering of the Nextera, NimbleGen, Agilent and Agilent v4 datasets (smpl1-smpl4, lane1 and lane2), with au bootstrap scores at the nodes. Bootstrap: 1000; distance: euclidean; clustering: complete.
Figure: No. Agreeing Variants and No. Disagreeing Zygosity, as count (+/- SE) per kit (Agilent, Illumina, NimbleGen), annotated with p-values between kits (p = 0.048, 0.033, 0.0012, 6.7 × 10⁻⁴ and 0.028).
Table 1. WES kit comparison within the EDC
WES Kit     EDC Coverage   WES Variants   20x Coverage   Array Genotypes   WES Disagreement
Agilent     37%            376            82%            105               1.9%
Illumina    -              1011           46%            191               5.2%
NimbleGen   69%            669            92%            138               1.4%