aug2014 spiral genetics anchored assembly

Post on 24-Jun-2015

Category:

Health & Medicine

TRANSCRIPT

  • 1. SV Detection via Anchored Assembly
      How can we best call structural variants?
      Becky Drees, Jeremy Bruestle, Cheinan Marks

  • 2. SV Detection via Anchored Assembly
      - Brief Description of Anchored Assembly Method
      - Testing vs GIAB Variant Set & Validated SV Sets
      - How Do We Describe SVs from Detected Breakpoints?
      Please do not distribute without permission.

  • 3. Input data
      - Any species with a draft genome
      - Existing NGS data
      - No special library prep
      - ~20x per ploidy

  • 4. Step 1: Read Correction
      - A* error correction
      - Similar to Euler or Quake
      - Corrects the read without using reference information
      - Reduces error from 1% to 0.01%
      [Figure: K-mer Quality Score Distribution; axes: Total K-mer Quality Score, K-mer Count]

  • 5. Step 2: Remove Reference Matches
      - Remove reads that are an exact match to reference
      - Significantly reduces the complexity of the graph
      - Reduces required memory usage (40GB for whole human genome)

  • 6. Step 3: Read Overlap Graph
      - Construct a read overlap graph with the remaining reads
      - Provides more context than a k-mer-based de Bruijn graph
      [Figure: Read overlap assembly; reads R1, R2, R3, R5, R6, R7, R8, R9 joined by edges with numeric labels]

  • 7. Step 4: Anchoring
      - Anchor assemblies to reference coordinates
      - Provide breakpoint information while keeping reference bias low
      [Figure: Anchoring]

  • 8. Step 5: Variant Validation
      - Assemble variant sequence from read overlap graph
      - Computes minimal cost variation (similar to Smith-Waterman)
      - Calls variants and QC to remove likely false positives
      [Figure: Variant validation; reads R2 to R6 aligned against the Reference and Assembled sequences]

  • 9. NA12878 SNP Detection vs GIAB
      - Anchored Assembly only: 13,307
      - Genome in a Bottle only: 144,463
      - Both: 2,596,897
      - Sensitivity: 95%
      - Precision: 99.5%

  • 10. NA12878 Indel Detection vs GIAB
      [Figure only]

  • 11. NA12878 SV Insertions
      Columns: Chr. | Mills | marks for Pindel 50x, AA 50x, AA 200x
      1    247579917
      2    2576951      n n
      2    78558069     n n n
      2    187143096    n
      2    191002548    n n n
      3    43972635     n n n
      3    100737223    n n n
      3    100868475    n n n
      3    195823764    n n n
      5    78035993     n n n
      7    1528948      n n n
      7    2089876
      8    22717662     n n n
      9    97387403     n
      9    137361862    n
      12   103954170    n n
      13   76345722     n n n
      13   113760939
      13   114103496    n n
      15   26060663     n n
      15   92686723     n
      17   39240782
      17   77134774     n
      18   74794821     n n
      18   76182038     n n n
      19   1278240      n n n
      19   2247173      n n n
      20   55992535     n n
      21   39080014     n n
      X    94894756     n n
      Mills et al. Eichler Lab, U. Washington, Sanger validated

  • 12. NA12878 SV Deletions
      [Figure only]

  • 13. How to describe SVs from breakpoints?
      As breakend records:
      #CHROM  POS      ID     REF  ALT           QUAL  FILTER  INFO                                               FORMAT    SAMPLE
      1       1500000  bnd_A  T    T[1:1501108[  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_B;AID=1234  DP:ED:OV  26:72:89
      1       1501108  bnd_B  G    ]1:1500000]G  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_A;AID=1234  DP:ED:OV  26:72:89
      As SV events:

  • 14. How to describe SVs from breakpoints?
      Assembled breakpoints can reveal variation that is hard to categorize
      - Different events can produce similar breakpoints
      - Multiple breakpoints can represent a single rearrangement event
      [Figure: CHR 1 with breakends bnd_K, bnd_L, bnd_M, bnd_N; coordinates 200000, 190000, 197000, 200231]

  • 15. How to describe SVs from breakpoints?
      A single breakpoint can contain multiple sequence changes:
      - Inserted sequence at deletion breakpoints
      - Deleted or duplicated sequence at insert breakpoints
      - Deleted or duplicated sequence at inversion breakpoints
      [Figure: CHR 1; coordinates 1700000, 1704100, 1700100, 1704250; labels: deleted sequence, duplicated sequence, inverted sequence]

  • 16. How to describe SVs from breakpoints?
      Many assemblies anchor to multiple genome locations
      - Variation in duplicated genome regions
      - Variation in repetitive elements
      - Transposons
      [Figure: CHR 1 with an Alu element; labels: anchors to multiple places, unique anchor]

  • 17. Contact
      - More information
      - Trial on own data!
      becky@spiralgenetics.com
      niranjan@spiralgenetics.com
      info@spiralgenetics.com

  • 18. Questions?

  • 19. Anchored Assembly SNP Distribution

  • 20. Anchored Assembly SV Distribution
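The breakend records on slide 13 can be read mechanically. The sketch below parses the ALT field of a `SVTYPE=BND` record using the four bracket forms defined in the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`) and checks whether a mate pair looks like a simple deletion. The names `Breakend`, `parse_breakend`, and `simple_deletion_length` are hypothetical and are not part of Anchored Assembly. Reading the `bnd_A`/`bnd_B` pair as a deletion is an assumption based on the ALT notation alone; the slide does not name the event.

```python
import re
from typing import NamedTuple, Optional

# Breakend ALT forms from the VCF specification: t[p[  t]p]  ]p]t  [p[t
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)"
    r"(?P<bracket>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=bracket)"
    r"(?P<trail>[ACGTNacgtn]*)$"
)


class Breakend(NamedTuple):
    """Hypothetical container for one parsed breakend record."""
    chrom: str
    pos: int
    mate_chrom: str
    mate_pos: int
    bracket: str        # '[' : mate piece extends right of p; ']' : extends left of p
    joined_after: bool  # True when the local base comes before the junction (t[p[ or t]p])
    inserted: str       # extra bases at the junction beyond the REF base


def parse_breakend(chrom: str, pos: int, alt: str) -> Breakend:
    """Parse the ALT string of a BND record into its components."""
    m = BND_ALT.match(alt)
    if m is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    lead, trail = m["lead"], m["trail"]
    if bool(lead) == bool(trail):
        raise ValueError(f"expected bases on exactly one side: {alt!r}")
    joined_after = bool(lead)
    # The base adjacent to the record position replaces REF; the rest is inserted.
    inserted = lead[1:] if joined_after else trail[:-1]
    return Breakend(chrom, pos, m["chrom"], int(m["pos"]),
                    m["bracket"], joined_after, inserted)


def simple_deletion_length(a: Breakend, b: Breakend) -> Optional[int]:
    """Return the deleted length if the pair looks like a simple deletion, else None."""
    left, right = sorted((a, b), key=lambda r: r.pos)
    same_chrom = left.chrom == left.mate_chrom == right.chrom == right.mate_chrom
    reciprocal = left.mate_pos == right.pos and right.mate_pos == left.pos
    # Assumed deletion signature: left is t[p[ and right is ]p]t.
    deletion_like = (left.joined_after and left.bracket == "["
                     and not right.joined_after and right.bracket == "]")
    if same_chrom and reciprocal and deletion_like and right.pos > left.pos:
        return right.pos - left.pos - 1
    return None


# The two records shown on slide 13.
bnd_a = parse_breakend("1", 1500000, "T[1:1501108[")
bnd_b = parse_breakend("1", 1501108, "]1:1500000]G")
print(simple_deletion_length(bnd_a, bnd_b))  # 1107
```

Under that reading, the pair joins position 1500000 directly to 1501108, so the bases in between are missing from the sample. A pair with other bracket orientations returns `None` here, which matches the point on slides 14 to 16 that many assembled breakpoints do not reduce to a single named event.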