Victor Guryev European Research Institute for the Biology of Ageing
September 29, 2014
Genomic Resequencing in Medical Diagnostics course, Erasmus MC, Rotterdam
Low coverage whole genome and deep exome sequencing of 2,500 individuals to discover 95% of variants at 1% frequency
Fragment (insert) size: 500 bp; read length: 90-100 bp; median base coverage: 12x
[Francioli et al, 2014] [Boomsma et al, 2013]
Paired-end sequencing gives: 1) twice as many bases per slide; 2) structural information.
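The design numbers above can be turned into a quick estimate of sequencing effort. This is a minimal sketch under stated assumptions: a 3.1 Gb haploid genome, 100 bp reads, and the 12x median treated as a mean (duplicates and unmapped reads ignored). It also computes physical (fragment) coverage, i.e. how often each base is spanned by a whole 500 bp fragment, which is what gives paired ends their structural information.

```python
# Back-of-the-envelope arithmetic for a paired-end design.
# Assumptions (not from the slides): 3.1 Gb genome, 100 bp reads,
# median base coverage treated as the mean.

GENOME_SIZE = 3.1e9  # bp, assumed
READ_LEN = 100       # bp per read, assumed (upper end of 90-100 bp)
INSERT_SIZE = 500    # bp, fragment size from the slide


def read_pairs_needed(coverage: float, genome_size: float, read_len: int) -> float:
    """Sequence coverage C = N * 2L / G, solved for the number of pairs N."""
    return coverage * genome_size / (2 * read_len)


def physical_coverage(n_pairs: float, insert_size: int, genome_size: float) -> float:
    """Physical coverage: how many sequenced fragments span an average base."""
    return n_pairs * insert_size / genome_size


pairs = read_pairs_needed(12, GENOME_SIZE, READ_LEN)
print(f"read pairs for 12x: {pairs:.3g}")
print(f"physical coverage: {physical_coverage(pairs, INSERT_SIZE, GENOME_SIZE):.1f}x")
```

This prints `read pairs for 12x: 1.86e+08` and `physical coverage: 30.0x`: under these assumptions, 12x of sequence corresponds to about 30x of fragment coverage, because each pair spans 500 bp but only 200 bp of it is read.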
Applications of paired-end reads:
• Molecular haplotyping (phasing)
• Structural variants
• Genome assembly (linking contigs)
• Better repeat coverage (e.g. SINEs)
• Profiling of transcript isoforms
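Phasing with paired ends works because both mates come from one DNA molecule, so alleles seen on the same pair lie on the same haplotype. A minimal sketch for two nearby heterozygous SNPs; the alleles and read counts are hypothetical, and real phasing tools handle many sites and sequencing errors jointly.

```python
from collections import Counter


def phase_two_snps(snp1, snp2, fragments):
    """Choose the haplotype configuration supported by most read pairs.

    snp1, snp2: (allele_a, allele_b) at each heterozygous site
    fragments:  list of (allele at site 1, allele at site 2), one per read pair
    Returns (haplotypes, support for the best configuration, support for the other).
    """
    (a1, b1), (a2, b2) = snp1, snp2
    config_1 = {(a1, a2), (b1, b2)}  # a1 on the same molecule as a2
    config_2 = {(a1, b2), (b1, a2)}  # a1 on the same molecule as b2
    counts = Counter(fragments)
    support_1 = sum(counts[h] for h in config_1)
    support_2 = sum(counts[h] for h in config_2)
    if support_1 >= support_2:
        return sorted(config_1), support_1, support_2
    return sorted(config_2), support_2, support_1


# Hypothetical pairs covering both sites; the single ("G", "T") is likely an error.
fragments = [("G", "A")] * 4 + [("A", "T")] * 3 + [("G", "T")]
haplotypes, best, other = phase_two_snps(("G", "A"), ("A", "T"), fragments)
print(haplotypes, best, other)
```

Here 7 of 8 pairs support the haplotypes G-A and A-T, so the two SNPs are phased even though neither single read covers both.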
+ Fosmid libraries: 40 kb tags
Protocol for Illumina [Williams et al, 2012]
[van Heesch et al, 2013]
Average coverage: 5 WGS/site (figure: sites with 5 or 10 WGS/site)
Per individual genome (compared to the reference genome):
• 3.7M SNPs
• 360k short indels (1-20 bp)
• 5.2k medium deletions (20-100 bp)
• 3.3k large deletions (100+ bp)
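Assigning calls to these size classes is straightforward, but the ranges touch at 20 and 100 bp. This sketch assumes 20 bp counts as short and 100 bp as medium; the `kind` labels and the call list are hypothetical.

```python
from collections import Counter


def size_class(kind: str, length: int) -> str:
    """Bin a call into the size classes listed above.

    kind:   "SNP", "INS" or "DEL" (hypothetical labels)
    length: event length in bp
    Assumed edges: <= 20 bp short, 21-100 bp medium, > 100 bp large.
    """
    if kind == "SNP":
        return "SNP"
    if length <= 20:
        return "short indel (1-20 bp)"
    if kind == "DEL":
        if length <= 100:
            return "medium deletion (20-100 bp)"
        return "large deletion (100+ bp)"
    # Only deletions are broken down above 20 bp.
    return "other (insertion > 20 bp)"


calls = [("SNP", 1), ("INS", 3), ("DEL", 15), ("DEL", 45), ("DEL", 2_500), ("INS", 300)]
print(Counter(size_class(kind, length) for kind, length in calls))
```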
Q: Can de novo variants be called from medium-coverage data? A: Yes, but extensive validation is required because of false positives (FP) caused by under-covered positions in the parents: 917 candidates -> 284 de novo indels;
601 candidates -> 41 de novo SVs.
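Why under-covered parents inflate the candidate list: if each read samples either parental allele with probability 0.5 (an idealisation that ignores allelic bias), a true heterozygote at depth d shows no alternative read with probability 0.5^d. The sketch below uses that idea in a hypothetical trio filter. The thresholds (at least 3 alt reads in the child, depth 10 in each parent) are assumptions. It also computes the validation yield from the numbers above.

```python
from dataclasses import dataclass


@dataclass
class TrioSite:
    """Read counts at one candidate site (hypothetical structure)."""
    child_alt: int
    father_alt: int
    father_depth: int
    mother_alt: int
    mother_depth: int


def p_missed_het(depth: int) -> float:
    """Chance that none of `depth` reads shows the alt allele of a true heterozygote."""
    return 0.5 ** depth


def is_de_novo_candidate(site: TrioSite, min_child_alt: int = 3,
                         min_parent_depth: int = 10) -> bool:
    """Child carries the allele; both parents are well covered and show no alt reads."""
    if site.child_alt < min_child_alt:
        return False
    # At depth 4, a parental heterozygote shows no alt read 1 time in 16,
    # so low-depth parental sites are rejected rather than trusted.
    if min(site.father_depth, site.mother_depth) < min_parent_depth:
        return False
    return site.father_alt == 0 and site.mother_alt == 0


# Validation yield from the slide.
for label, validated, candidates in [("indels", 284, 917), ("SVs", 41, 601)]:
    print(f"de novo {label}: {validated}/{candidates} = {validated / candidates:.1%}")
```

The loop prints a yield of 31.0% for indels and 6.8% for SVs.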
Repeat content in the human genome: L1, AluYa5, AluYb8, SVA, simple repeats
Mechanism: polymerase errors. Prevalence: up to 10% of all indels are non-simple (complex).
Tool example: GATK HaplotypeCaller
Figure: father, mother and child tracks
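A non-simple (complex) indel replaces some reference bases with a different number of other bases, rather than purely inserting or deleting. One way to label a VCF-style REF/ALT pair, assuming the shared prefix is trimmed first and then the shared suffix (a common normalisation choice, not necessarily the one behind the 10% estimate):

```python
def classify(ref: str, alt: str) -> str:
    """Label a REF/ALT pair after trimming the shared prefix, then the shared suffix."""
    # Trim bases shared at the start (e.g. the VCF anchor base).
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    ref, alt = ref[i:], alt[i:]
    # Trim bases shared at the end.
    j = 0
    while j < min(len(ref), len(alt)) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    ref, alt = ref[:len(ref) - j], alt[:len(alt) - j]

    if not ref and not alt:
        return "no change"
    if not ref:
        return "simple insertion"
    if not alt:
        return "simple deletion"
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "complex (non-simple) indel"


for ref, alt in [("C", "T"), ("A", "ATTG"), ("ACGT", "A"), ("ACGT", "ATT")]:
    print(ref, alt, classify(ref, alt))
```

In the last case, CG is replaced by T after trimming, so the event is neither a pure insertion nor a pure deletion.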
Mechanism: gene conversion. Prevalence: currently only a few cases.
Tool example: assembly, discordant pairs
Tool example: Mobster [Thung et al, 2014. Genome Biology 15:488]
Mechanism: (retro)transposition
Prevalence: very high (>13,000 MEIs in GoNL)
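Discordant-pair MEI callers look for read pairs in which one mate (the anchor) maps uniquely and the other maps to a mobile-element sequence. They then group anchors that pile up near the same position. A minimal sketch of that grouping step only, with hypothetical anchor positions and assumed parameters (maximum gap 500 bp, at least 3 supporting anchors). It is not Mobster's implementation, which handles more signals and details.

```python
def cluster_anchors(anchors, max_gap=500, min_support=3):
    """Group anchor-read positions into candidate insertion sites.

    anchors: (chrom, pos) of uniquely mapped mates whose partner maps to a
             mobile-element sequence
    Returns (chrom, first_pos, last_pos, n_anchors) for each group in which
    consecutive anchors are at most max_gap apart.
    """
    clusters = []
    current = []

    def flush():
        # Keep a group only if enough anchors support it.
        if len(current) >= min_support:
            clusters.append((current[0][0], current[0][1], current[-1][1], len(current)))

    for chrom, pos in sorted(anchors):
        if current and (chrom != current[-1][0] or pos - current[-1][1] > max_gap):
            flush()
            current.clear()
        current.append((chrom, pos))
    flush()
    return clusters


anchors = [("chr1", 1_000), ("chr1", 1_100), ("chr1", 1_350), ("chr1", 9_000),
           ("chr2", 500), ("chr2", 700), ("chr2", 900)]
print(cluster_anchors(anchors))
```

The lone anchor at chr1:9,000 is dropped, leaving two candidate sites with three anchors each.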
Figure: Chr15: 40.85 Mb linked to Chr7: 26.24 Mb by read pairs (mates mapping to chr7 and to chr15); deletion marked
Mechanism: (retro)transposition
Prevalence: GoNL, about 40 cases
Tool example: discordant pairs (123SV)
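Discordant read pairs are pairs whose mapping disagrees with the library: mates on different chromosomes, in the wrong orientation, or at an unexpected distance. A minimal classifier, assuming a forward-reverse Illumina paired-end library with a 500 bp mean fragment size, an assumed standard deviation of 50 bp and a 4 SD cut-off. This is an illustration only, not 123SV's logic.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    reverse1: bool
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(p: ReadPair, mean_insert: float = 500, sd_insert: float = 50,
                  k: float = 4) -> str:
    """Label a mapped read pair by chromosome, orientation and distance.

    Expected orientation: left mate forward, right mate reverse. Distances use
    mapped start positions, a rough stand-in for the true insert size.
    """
    if p.chrom1 != p.chrom2:
        return "interchromosomal"
    # Order mates by position so "left" and "right" are well defined.
    (left_pos, left_rev), (right_pos, right_rev) = sorted(
        [(p.pos1, p.reverse1), (p.pos2, p.reverse2)])
    if left_rev == right_rev:
        return "same orientation (inversion-like)"
    if left_rev and not right_rev:
        return "outward-facing (tandem duplication-like)"
    distance = right_pos - left_pos
    if distance > mean_insert + k * sd_insert:
        return "too far apart (deletion-like)"
    if distance < mean_insert - k * sd_insert:
        return "too close (insertion-like)"
    return "concordant"


print(classify_pair(ReadPair("chr15", 40_850_000, False, "chr7", 26_240_000, True)))
```

The example pair mirrors the chr15/chr7 event and is reported as `interchromosomal`. A real caller would then cluster many such pairs before calling an event.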
Figure: new ZF gene among its zinc-finger paralogs (ZF85, ZF92, ZF93, ZF98, ZF100, ZF253, ZF257, ZF273, ZF429, ZF430, ZF431, ZF492, ZF675, ZF681, ZF708, ZF724, ZF726, ZF728, ZF730, ZF734, ZF737)
2 genes
New ZF gene expressed => 13 genes, including FGFR1, EVC, FKBP10, PLOD2
GO-term analysis: abnormality of the limb diaphyses and bowing of the long bones
37kb, AF=28%
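Term enrichment for a short gene list is commonly tested with a one-sided hypergeometric test: how surprising is it to see k annotated genes among the 13 if genes were drawn at random? A sketch with hypothetical background and annotation counts (20,000 genes, 50 annotated to the term, 4 hits). In practice many terms are tested, so the p-values need multiple-testing correction.

```python
from math import comb


def enrichment_p(k: int, background: int, annotated: int, drawn: int) -> float:
    """One-sided hypergeometric P(X >= k).

    background: genes in the background set
    annotated:  background genes carrying the term
    drawn:      genes in the list being tested
    k:          genes in the list carrying the term
    """
    upper = min(annotated, drawn)
    hits = sum(comb(annotated, i) * comb(background - annotated, drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(background, drawn)


# Hypothetical numbers; only the list size (13) comes from the slide.
p = enrichment_p(k=4, background=20_000, annotated=50, drawn=13)
print(f"P(X >= 4) = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids the underflow that a naive floating-point product of binomial coefficients would hit at these sizes.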
Figure: WGS and WES tracks for father, mother and child
• The same methodologies are applicable to WES
• Read-depth (RD) analysis needs additional correction for variation in enrichment efficiency
• Sensitivity is very limited if the SV breakpoint lies outside the enriched region
Tool examples: GATK HaplotypeCaller, CoNIFER, ExomeCNV
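The read-depth correction for enrichment can be illustrated by comparing each target with the same target in other exomes, so that targets that capture poorly everywhere are not mistaken for deletions. A simplified sketch with hypothetical counts. It uses total-count scaling and a reference-panel median, whereas CoNIFER, for example, removes capture biases with singular value decomposition.

```python
import math
from statistics import median


def log2_ratios(test, reference_panel, pseudo=1e-9):
    """Per-target log2(observed / expected) read depth for one exome.

    test:            read counts per target for the sample of interest
    reference_panel: read-count lists for the same targets from other exomes
    Each sample is scaled by its total count; the expected value of a target
    is its median scaled depth across the panel, which absorbs
    target-specific enrichment efficiency.
    """
    def scale(counts):
        total = sum(counts)
        return [c / total for c in counts]

    observed = scale(test)
    panel = [scale(r) for r in reference_panel]
    ratios = []
    for i, value in enumerate(observed):
        expected = median(sample[i] for sample in panel)
        ratios.append(math.log2((value + pseudo) / (expected + pseudo)))
    return ratios


# Hypothetical data: target 1 captures poorly in every exome; the test sample
# has half the expected depth at target 3 (e.g. a heterozygous deletion).
panel = [[100, 20, 100, 100, 100], [110, 22, 95, 105, 98], [90, 18, 105, 97, 102]]
sample = [100, 20, 100, 50, 100]
print([round(r, 2) for r in log2_ratios(sample, panel)])
```

This prints `[0.18, 0.18, 0.18, -0.82, 0.18]`. The poorly captured target 1 looks normal, and only target 3 drops towards the -1 expected for a heterozygous loss. The uniform +0.18 shift is a side effect of total-count scaling (the deletion itself lowers the total), one reason real tools use more robust normalisation.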
Figures (two examples): WGS and WES tracks for father, mother and child, with gene annotation
1. Next to the ‘SNP world’, there is a whole new world of structural alterations in our genomes.
2. Study design: number of libraries, type (PE, MP), insert sizes, read length
3. Combine methods for SV discovery: read depth, read pairs, split reads, de novo assembly
4. Validate calls (PCR/Sanger sequencing, aCGH, enrichment -> NGS)
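Combining methods requires deciding when calls from different approaches describe the same event. A common convention is 50% reciprocal overlap. The sketch below assumes that threshold, half-open (start, end) coordinates and hypothetical deletion calls.

```python
def reciprocal_overlap(a, b):
    """Overlap of half-open intervals a and b as a fraction of the longer one.

    A value >= f means the overlap covers at least a fraction f of each interval.
    """
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / max(a[1] - a[0], b[1] - b[0])


def methods_supporting(calls, min_ro=0.5):
    """For each call (method, chrom, start, end), list methods with a matching call."""
    support = []
    for method, chrom, start, end in calls:
        agreeing = sorted({m for m, c, s, e in calls
                           if c == chrom and reciprocal_overlap((start, end), (s, e)) >= min_ro})
        support.append(((method, chrom, start, end), agreeing))
    return support


calls = [
    ("read depth", "chr1", 10_000, 12_000),
    ("read pairs", "chr1", 10_100, 11_900),
    ("split reads", "chr1", 10_050, 11_950),
    ("assembly", "chr2", 50_000, 50_400),
]
for call, methods in methods_supporting(calls):
    print(call, methods)
```

The chr1 deletion is supported by three independent signals, making it a stronger candidate for validation than the single-method chr2 call.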
GoNL SV Team Victor Guryev UMCG Wigard Kloosterman UMCU Laurent C. Francioli UMCU Jayne Y. Hehir-Kwa UMCN Tobias Marschall CWI/MPI Alexander Schoenhuth CWI Matthijs Moed LUMC Eric-Wubbo Lameijer LUMC Abdel Abdellaoui VU Slavik Koval EMC/LUMC Joep de Ligt UMCN Najaf Amin EMC Freerk van Dijk UMCG Lennart Karssen EM/Polyomica Leon Mei LUMC Kai Ye LUMC/WASHU
University of Washington Fereydoun Hormozdiari Evan E. Eichler
GoNL steering committee Paul de Bakker UMCU Dorret Boomsma VU Cornelia van Duin EMC Gert-Jan van Ommen LUMC Eline Slagboom LUMC Morris Swertz UMCG Cisca Wimenga UMCG
BGI Shenzhen Jun Wang
ERIBA, RuG, UMC Groningen Diana Spierings Marianna Bevova Rene Wardenaar Tristan de Jong Peter Lansdorp