genome variation graphs with the vg toolkit · genome variation graphs with the vg toolkit jean...

17
Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Upload: others

Post on 03-Jul-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Genome variation graphs with the vg toolkit

Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Page 2: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Variation Graphs

An approach to incorporating information on human diversity into the genomic reference.

2

Page 3: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Sequencing reads map better on variation graphs3

Page 4: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

is a complete, open source solution

for graph construction, read mapping,

and variant calling.

https://github.com/vgteam/vg

4

Page 5: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Garrison et al (2018). Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology.

Better read mapping in regions with variants5

Page 6: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Better allele balances at heterozygous sites

Deletion Size Insertion SizeSNP

Alt

Alle

le F

ract

ion

Garrison et al (2018). Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology.

6

Page 7: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Structural Variants (SVs) Genomic variants >50bp. E.g. insertions, deletions, inversions.

7

Page 8: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Graph-based SV genotypers: vg, Paragraph, BayesTyper

Traditional SV genotypers: Delly, SVTyper

Genotyping SVs from short-read data

High-quality SV catalogs from long-read sequencing studies(HGSVC 2019, GIAB 2019, SVPOP 2019).

Hickey et al (2019). Genotyping structural variants in pangenome graphs using the vg toolkit. bioRxiv.

8

Page 9: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Graph from de novo assemblies

Experiment with 12 yeast strains.

- better read mapping.- SV better supported by reads.

Hickey et al (2019). Genotyping structural variants in pangenome graphs using the vg toolkit. bioRxiv.

de no

vo

asse

mblies

VCF

9

Page 10: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Variant normalization

constructgraph

extract

sequences

realign

Robin Rounthwaite

10

Page 11: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Transcriptome analysis with variation graphs

Jonas Sibbesen

https://github.com/jonassibbesen/rpvg

11

Page 12: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

New graph formats optimized for memory and speed

VG: Legacy

PG (PackedGraph): Small

HG (HashGraph): Fast

OG (Optimized Dynamic Graph Index): Balanced

XG: Small and fast, but static

Jordan Eizenga, Emily Fujimoto, Erik Garrison

https://github.com/vgteam/libbdsg

12

Page 13: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

13

Graph Burrows-Wheeler Transform

Stores 2,504 haplotypes from 1K Genomes Project in 14.6 GiB

Can scale past tens of thousands of haplotypes

Sirén et al. (2018). Haplotype-aware graph indexes. In 18th International Workshop on Algorithms in Bioinformatics (WABI)

https://github.com/jltsiren/gbwtgraph

Page 14: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Faster short read mapping with giraffe

Xian Chang, Jouni Siren, Adam Novak

Assuming that most indels are in the graph and a large haplotype database.

- Minimizer-based indexing.- Restrict to known haplotypes.- Gapless extension.- Faster clustering using a snarl tree.

14

Page 15: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

https://github.com/vgteam/sequenceTubeMap

Visualization

Beyer et al. (2019). Sequence tube maps: making graph genomes intuitive to commuters. Bioinformatics (Oxford, England).

15

Page 16: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Pipelines in Toil and WDL

https://github.com/vgteam/toil-vg

https://github.com/vgteam/vg_wdl

toil-vg WDL

Graph construction x

Read mapping x x

Variant calling( SNV/indels & SVs) x x

Charles Markello, Glenn Hickey, Mike Lin, Eric T. Dawson

16

Page 17: Genome variation graphs with the vg toolkit · Genome variation graphs with the vg toolkit Jean Monlong Updates from the GRC & GIAB Oct 15, 2019

Acknowledgements

Benedict PatenAdam NovakErik GarrisonJordan EizengaGlenn HickeyJouni SirenDavid Heller

Check out Charles Markello’ Platform Talk (PgmNr 15) on applying vg for rare variant discovery!

https://github.com/vgteam/vg

17

Jonas SibbesenXian ChangCharles MarkelloYohei RosenRobin RounthwaiteSusanna MorinEmily Fujimoto

Eric DawsonMike LinWolfgang Beyer

Richard DurbinDaniel Zerbino