GIAB SV Data Jamboree @ NIST
PBHoney Spots Update
Will Salerno
9.15.2016
PBSuite
http://sourceforge.net/projects/pb-jelly/
Honey Spots
● Honey Spots is the “indel” caller for Long-Read SV detection
  ○ Tails is “split read”
● Designed for smaller SVs
  ○ 50 bp to 2 Kbp
● Two components
  ○ SpotCaller: Discover putative SVs
  ○ ConsensusCaller: Evaluate SVs
Align Reads & Create Error Channels
Process Signal & Create “Spots”
Identify Alt Reads & Create Consensus Sequence
Remap Consensus & Call SV
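The "Process Signal & Create Spots" step can be pictured with a small sketch. This is not PBHoney's actual code: it assumes an "error channel" is a per-base count of reads showing an insertion or deletion error, and that a "spot" is a run of bases where the smoothed error rate stays above a threshold. The function name `find_spots` and the parameters `min_rate`, `window`, and `min_len` are hypothetical.

```python
from typing import List, Tuple


def find_spots(errors: List[int], coverage: List[int],
               min_rate: float = 0.3, window: int = 5,
               min_len: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where the smoothed
    per-base error rate is at least min_rate (illustrative only)."""
    n = len(errors)
    # Per-base error rate; positions with no coverage get a rate of 0.
    rates = [e / c if c else 0.0 for e, c in zip(errors, coverage)]

    # Smooth with a simple moving average centered on each base.
    half = window // 2
    smooth = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smooth.append(sum(rates[lo:hi]) / (hi - lo))

    # Collect runs of bases at or above the threshold.
    spots, start = [], None
    for i, rate in enumerate(smooth):
        if rate >= min_rate and start is None:
            start = i
        elif rate < min_rate and start is not None:
            if i - start >= min_len:
                spots.append((start, i))
            start = None
    if start is not None and n - start >= min_len:
        spots.append((start, n))
    return spots


# Toy deletion-error channel: 8 of 10 reads show an error at bases 10-15.
errors = [0] * 10 + [8] * 6 + [0] * 10
coverage = [10] * 26
print(find_spots(errors, coverage))
```

Each reported interval would then be handed to the consensus step, which collects the reads supporting the alternate allele.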
Honey Spots Update
● Performance Optimizations
  ○ NA12878 (41x) 2 days to 6 hours
● Less restrictive filtering
  ○ More sensitive calling
● “Tails” can contribute to Spots signal

Align Reads & Create Error Channels
Process Signal & Create “Spots”
Identify Alt Reads & Create Consensus Sequence
Remap Consensus & Report All Spots
Tails Calls <10 Kbp
Existing Data Sets
           AJ Proband  AJ Mother  AJ Father  NA12878  HS1011
Coverage   45x         19x        21x        41x      23x

● Eight short-read SV detection methods
● PBHoney (old version)
● 10x PacBio, 48x Short-Read, BioNano, aCGH
Honey Spots Performance
Sample   SVType  SizeDist      Count  TruthSet Calls  TruthSet Recovered  Recovery Rate
HS1011   INS     (50, 100)     6,459  113             74                  65.49%
HS1011   INS     (101, 500)    6,277  2,850           2,383               83.61%
HS1011   INS     (501, 1000)   673    305             241                 79.02%
HS1011   INS     (1000, 2000)  103    259             192                 74.13%
HS1011   DEL     (50, 100)     5,405  25              19                  76.00%
HS1011   DEL     (101, 500)    4,067  3,159           2,582               81.73%
HS1011   DEL     (501, 1000)   600    226             170                 75.22%
HS1011   DEL     (1000, 2000)  536    46              15                  32.61%
NA12878  INS     (50, 100)     8,833  .               .                   .
NA12878  INS     (101, 500)    8,460  .               .                   .
NA12878  INS     (501, 1000)   676    .               .                   .
NA12878  INS     (1000, 2000)  66     .               .                   .
NA12878  DEL     (50, 100)     5,010  2               2                   100.00%
NA12878  DEL     (101, 500)    3,930  1,484           1,446               97.44%
NA12878  DEL     (501, 1000)   509    201             182                 90.55%
NA12878  DEL     (1000, 2000)  466    197             185                 93.91%
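The Recovery Rate column appears to be TruthSet Recovered divided by TruthSet Calls. A minimal sketch that reproduces a few rows from the table follows; the function name `recovery_rate` is hypothetical and only the counts come from the slide.

```python
def recovery_rate(truth_calls: int, recovered: int) -> float:
    """Percentage of truth-set calls that were recovered."""
    if truth_calls <= 0:
        raise ValueError("truth_calls must be positive")
    return 100.0 * recovered / truth_calls


# (sample, SV type, size bin, truth-set calls, truth-set recovered)
rows = [
    ("HS1011", "INS", "(50, 100)", 113, 74),
    ("HS1011", "DEL", "(1000, 2000)", 46, 15),
    ("NA12878", "DEL", "(101, 500)", 1484, 1446),
]
for sample, sv_type, size_bin, truth, recovered in rows:
    print(f"{sample} {sv_type} {size_bin}: {recovery_rate(truth, recovered):.2f}%")
```

Bins with no truth-set calls (the NA12878 insertions, shown as "." in the table) have no defined rate, which is why the sketch rejects a zero denominator.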
AJ Trio Deletions: Trio Discovery
Do discovery in Trio
Filter Proband to altZMWs >= 10
Merge Trio With 50bp Bookends Distance
Remove loci with any sample represented more than once
Force Call Missing in Parents

          Discovery  Filter Proband  50bp    Single Sample  Present in Proband  Missing in Parents    Total Proband
                     altZMWs >=10    Merge   Filter         and Parent(s)       Discovery but Forced  with Parent Support
Proband   10,753     8,137           7,785   7,305          6,175               886                   7,061
Father    7,994      .               7,727   7,300          4,784               663                   5,447
Mother    7,448      .               7,217   6,813          4,636               651                   5,287
Total     26,195     23,579          11,896  11,369         6,175               886                   7,061
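The merge and single-sample filter steps can be sketched as follows. This is an illustration, not the code used for the slide: it assumes "50bp Bookends Distance" means both breakpoints of two calls lie within 50 bp of each other, and the names `Call`, `merge_bookends`, and `single_sample_filter` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Call:
    sample: str
    start: int
    end: int


def merge_bookends(calls: List[Call], max_dist: int = 50) -> List[List[Call]]:
    """Group calls into loci when both start and end lie within
    max_dist of the first call placed in that locus."""
    loci: List[List[Call]] = []
    for call in sorted(calls, key=lambda c: (c.start, c.end)):
        for locus in loci:
            anchor = locus[0]
            if (abs(call.start - anchor.start) <= max_dist
                    and abs(call.end - anchor.end) <= max_dist):
                locus.append(call)
                break
        else:
            loci.append([call])  # no existing locus matched
    return loci


def single_sample_filter(loci: List[List[Call]]) -> List[List[Call]]:
    """Drop loci in which any sample is represented more than once."""
    return [locus for locus in loci
            if max(Counter(c.sample for c in locus).values()) == 1]


calls = [
    Call("proband", 1000, 1300), Call("father", 1020, 1310),
    Call("mother", 5000, 5200),
    Call("proband", 9000, 9400), Call("proband", 9030, 9420),
]
loci = merge_bookends(calls)
kept = single_sample_filter(loci)
print(len(loci), len(kept))
```

The quadratic scan over loci is kept for readability; a production merge over tens of thousands of calls would more likely use a sorted sweep or an interval index.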
Honey Force Calling
Candidate Regions
Identify Matching Spots Reads Near Region
Identify Matching Tails Reads Near Region
Identify ‘Reference’ Supporting Reads Spanning Region
Output Evidence

● A Candidate Region is an SV’s location, type, size.
● Reads are fetched within Region ±BUFFER.
● Matching Reads are those having variant of the same type within ±SIZE and ±DISTANCE.
● Reference supporting Reads span Region and show no variant evidence.
● Looking for a minimum of one read.
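The force-calling rules above can be sketched in a few lines. This is a simplified illustration, not Honey's implementation: reads are reduced to a span plus a list of variants already parsed from the alignment, the default values for `buffer`, `max_size_diff`, and `max_dist` are invented placeholders for BUFFER, SIZE, and DISTANCE, and "no variant evidence" is assumed to mean the read carries no variants at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    start: int
    end: int
    sv_type: str  # e.g. "INS" or "DEL"
    size: int


@dataclass
class ReadVariant:
    sv_type: str
    pos: int
    size: int


@dataclass
class Read:
    start: int
    end: int
    variants: List[ReadVariant] = field(default_factory=list)


def force_call(region: Region, reads: List[Read], buffer: int = 500,
               max_size_diff: int = 50, max_dist: int = 100,
               min_reads: int = 1) -> Dict[str, object]:
    """Count alt-supporting and reference-supporting reads for a region."""
    alt = ref = 0
    for read in reads:
        # Only consider reads overlapping Region +/- BUFFER.
        if read.end < region.start - buffer or read.start > region.end + buffer:
            continue
        matching = any(
            v.sv_type == region.sv_type
            and abs(v.size - region.size) <= max_size_diff
            and region.start - max_dist <= v.pos <= region.end + max_dist
            for v in read.variants
        )
        if matching:
            alt += 1
        elif (read.start <= region.start and read.end >= region.end
              and not read.variants):
            # Spans the whole region and shows no variant evidence.
            ref += 1
    return {"alt": alt, "ref": ref, "present": alt >= min_reads}


region = Region(1000, 1300, "DEL", 300)
reads = [
    Read(500, 2000, [ReadVariant("DEL", 1010, 290)]),  # matching read
    Read(400, 1800),                                   # reference read
    Read(900, 1100),                                   # does not span region
]
print(force_call(region, reads))
```

With `min_reads=1`, a single matching read is enough to report the SV as present in the sample, which mirrors the "minimum of one read" rule on the slide.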
AJ Trio Dels: Proband Discovery, Parent Force Calling
Do discovery in Proband
Filter Proband to altZMWs >= 10
Force in Parents

          Discovery  Filter Proband  Forced in  Forced in  Forced in
                     altZMWs >= 10   Father     Mother     Parent(s)
Proband   10,753     8,137           6,268      6,206      7,565
AJ Trio Insertions: Trio Discovery
Do discovery in Trio
Filter Proband to altZMWs >= 10
Merge Trio With 50bp Bookends Distance
Remove loci with any sample represented more than once
Force Call Missing in Parents

          Discovery  Filter Proband  50bp    Single Sample  Present in Proband  Missing in Parents    Total Proband
                     altZMWs >=10    Merge   Filter         and Parent(s)       Discovery but Forced  with Parent Support
Proband   24,585     13,134          12,324  11,317         7,266               2,986                 10,252
Father    11,758     .               11,236  10,303         5,632               2,322                 7,954
Mother    10,633     .               10,146  9,253          5,344               2,308                 7,652
Total     46,976     35,525          20,227  19,051         7,266               2,986                 10,252
AJ Trio Ins: Proband Discovery, Parents Force Calling
Do discovery in Proband
Filter Proband to altZMWs >= 10
Force in Parents

          Discovery  Filter Proband  Forced in  Forced in  Forced in
                     altZMWs >= 10   Father     Mother     Parent(s)
Proband   24,585     13,134          10,245     10,139     11,839
Next-Gen Sequencing Informatics Group @ HGSC
● Bioinformatics Core for the Human Genome Sequencing Center
● Primary and Secondary Analysis for Production Pipelines
  ○ Illumina Fleet (X Ten, 2000/2500), PacBio (RS II and Sequel)
  ○ Research and CAP/CLIA
  ○ WGS, WES, Custom Capture, Clinical Panels
● Structural Variation
● Annotation
● Hadoop Data Warehouse
● EMR/EHR Integration
● 11 Members and Growing!
CHARGE