VAAST: Deciphering Genetic Disease with Next-Generation Sequencing
Barry Moore, M.S.
Research Scientist
Department of Human Genetics
Department of Biomedical Informatics
Outline
The VAAST Analysis Pipeline
Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause
The Future of VAAST Development
$10,000,000: Venter Genome
$1,000,000: Watson
$5,000: You?
[Figure labels: geneA, geneB, geneX, geneY, geneZ; Disease; Healthy]
Next Generation Sequencing
Variant Annotation
Variant Selection
Variant Analysis
Variant Annotation Tool
Variant Selection Tool
Variant Annotation, Analysis, Search Tool
VAAST Pipeline
[Diagram: GVF (3.5 Million Variants) + Reference Genome (Fasta) + Reference Genes (GFF3) -> VAT (Variant Annotation Tool) -> Annotated Variants (GVF) -> VST (Variant Selection Tool) -> Merged Variant Sets (CDR)]
Variant Type
• sequence_alteration
• deletion
• insertion
• duplication
• inversion
• substitution
• SNV
• MNP
• complex substitution
• translocation
Variant Effect
• sequence_variant
• gene_variant
• five_prime_UTR_variant
• three_prime_UTR_variant
• exon_variant
• splice_region_variant
• splice_donor_variant
• splice_acceptor_variant
• intron_variant
• coding_sequence_variant
• stop_retained
• stop_lost
• stop_gained
• synonymous_codon
• non_synonymous_codon
• amino_acid_substitution
• frameshift_variant
• inframe_variant
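The variant types and effects above are the ontology terms carried in a GVF file. As a rough illustration only, the sketch below parses one tab-delimited GVF feature line in Python. It assumes the usual nine GFF3-style columns and the attribute tags Reference_seq, Variant_seq and Variant_effect; the example line, the caller name and the transcript ID `tx1` are hypothetical, and this is not VAT's own parser.

```python
from urllib.parse import unquote

# The first eight GFF3-style columns; the ninth holds the attributes.
GVF_COLUMNS = ["seqid", "source", "type", "start", "end", "score", "strand", "phase"]

def parse_gvf_line(line):
    """Parse one tab-delimited GVF feature line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GVF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in fields[8].strip(";").split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multiple values for one tag are comma-separated.
        attributes[key] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record

# Hypothetical example line (not taken from a real data set).
example = "\t".join([
    "chr16", "example_caller", "SNV", "1000", "1000", ".", "+", ".",
    "ID=var1;Reference_seq=G;Variant_seq=A,G;"
    "Variant_effect=non_synonymous_codon 0 mRNA tx1",
])
rec = parse_gvf_line(example)
print(rec["type"], rec["attributes"]["Variant_effect"])
```

The variant type sits in the third column and the variant effect in the attributes column, so both ontology lists can be read from the same record.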
[Diagram: Background Genomes (CDR) + Target Genomes (CDR) -> VAAST -> VAAST Report: Prioritized Candidate Genes]
Key Features of VAAST
• Probabilistic
• Feature Based
• Both Allele and AAS Frequencies
• Considers Inheritance Model
• Fast
• Standardized Ontology Based Format
• Modular and Flexible in Design
VAAST Uses Variant Frequencies in a Probabilistic Fashion
Likelihood Ratio Test
Maximum Likelihood of the Null Model (No Difference)
Maximum Likelihood of the Alternate Model (There is a Difference)
• VAAST gives us the likelihood of the composite genotype at GENE X in the target given the background.
• Do allele frequencies differ between Background and Target genomes within a given gene or feature?
• Composite likelihood calculation assumes independence across sites. To control for LD, statistical significance is estimated by permutation test.
• Multiple test correction for number of features (~20,000) is two orders of magnitude better than for the number of variants (~3,500,000).
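The points above can be sketched in Python. This is a simplified illustration, not VAAST's implementation: it scores a gene with a binomial allele-frequency likelihood ratio summed over sites (the independence assumption), estimates significance by permuting target/background labels of whole individuals so that each individual's genotypes stay together, and compares Bonferroni thresholds for ~20,000 features and ~3,500,000 variants. AAS frequencies and inheritance models are left out, and the function names and toy genotypes are hypothetical.

```python
import math
import random

def binom_loglik(k, n, p):
    """Binomial log-likelihood of k variant alleles among n (constant term dropped)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def gene_score(target, background):
    """Composite log-likelihood ratio for one gene, summed over sites.

    Each individual is a list of variant-allele counts (0, 1 or 2), one per site.
    """
    n_t, n_b = 2 * len(target), 2 * len(background)
    score = 0.0
    for site in range(len(target[0])):
        k_t = sum(ind[site] for ind in target)
        k_b = sum(ind[site] for ind in background)
        p_null = (k_t + k_b) / (n_t + n_b)  # null model: one shared frequency
        ll_null = binom_loglik(k_t, n_t, p_null) + binom_loglik(k_b, n_b, p_null)
        # Alternate model: separate maximum-likelihood frequencies.
        ll_alt = binom_loglik(k_t, n_t, k_t / n_t) + binom_loglik(k_b, n_b, k_b / n_b)
        score += 2.0 * (ll_alt - ll_null)
    return score

def permutation_p_value(target, background, n_perm=1000, seed=0):
    """Estimate significance by shuffling target/background labels of individuals."""
    rng = random.Random(seed)
    observed = gene_score(target, background)
    pooled = target + background
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_t, perm_b = pooled[:len(target)], pooled[len(target):]
        if gene_score(perm_t, perm_b) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 2 target genomes, 20 background genomes, 2 sites in one gene.
target = [[2, 1], [2, 0]]
background = [[0, 1] if i % 2 == 0 else [0, 0] for i in range(20)]
observed, p_value = permutation_p_value(target, background)
print(f"score={observed:.2f} permutation p={p_value:.4f}")

# Bonferroni thresholds: per feature versus per variant.
alpha = 0.05
per_gene = alpha / 20_000
per_variant = alpha / 3_500_000
print(per_gene, per_variant, per_gene / per_variant)
```

With only 1,000 permutations the smallest attainable p-value is about 0.001, so reaching a genome-wide threshold for ~20,000 features would require many more permutations than this sketch runs.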
Noise Decreases Dramatically with Increasing Number of Genomes
1 genome target, 1 genome background
1 genome target, 10 genome background
1 genome target, 250 genome background
Trio Data
[Pedigree figure: Mom, Dad; genotype labels G:R, G:A, R:Q, R:*]
CHR 16: DHODH
CHR 5: DNAH5
• Ng et al., Nature Genetics 42, 30–35 (2010), doi:10.1038/ng.499
• Roach et al., Science 328, 636 (2010)
Alleles Responsible for Miller Syndrome in Utah Kindred
[Pedigree figure: Mom, Dad, Son, Daughter; genes DNAH5 and DHODH; genotype label R:*]
Schematic of VAAST Analysis of Utah Miller Kindred Using a Single Quartet
Average Rank for 100 Dominant and Recessive Diseases
[Bar chart: average genome-wide rank (y-axis) by size of case cohort (series: 2 allele copies, 4 allele copies, 6 allele copies), for DOMINANT and RECESSIVE models; 443 genomes in background. Bar labels as extracted, with spacing lost: 156, 132, 2189, 3]
Impact of Missing Data
[Bar chart: average genome-wide rank (y-axis), for DOMINANT and RECESSIVE models (series: 2 of 6 allele copies, 4 of 6 allele copies, 6 of 6 allele copies); 443 genomes in background. Bar labels as extracted, with spacing lost: 639, 373, 61, 219, 3]
Outline
The VAAST Analysis Pipeline
Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause
The Future of VAAST Development
A Rare X-linked Mendelian Disorder
• A Utah family coming to the University Hospital for 20+ years
• About half of the male offspring die around 1 year of age
• Aged appearance
• Craniofacial anomalies
• Hypotonia
• Global developmental delays
• Cardiac arrhythmias
Four Affected Boys over Two Generations
[Pedigree figure: generations I, II, III]
Exome Sequencing
• Agilent SureSelect In-Solution X Chromosome Capture
• Covaris S series Sonication (150-200 bp)
• 76 bp single-end reads on one lane each of the Illumina GAIIx
Variant Calling
• Sequence alignment with bwa
• Remove duplicate reads with PICARD
• Realign indel regions with GATK
• Variant calling with Samtools, GATK
VAAST Identifies NAA10 as Candidate Gene
• About 20 minutes of run time
• 3 candidate genes (NAA10 ranked 2nd) using the proband only
• 1 candidate gene (NAA10) using the pedigree
Identifying Candidate Genes
Additional Analyses
• Microarray-based CNV analysis
  • No likely causal variants found
• Sanger sequencing confirmation
  • Variant segregates perfectly with disease in 13 family members
• Haplotype sharing (STR genotyping)
  • ~11 Mb shared between two affected boys
• A second family discovered – same mutation
  • IBD relatedness analysis – independent mutational events
N(alpha)-acetyltransferase
• N-alpha-acetylation is one of the most common protein modifications that occurs during protein synthesis.
• NatA (catalytic subunit NAA10, also called hARD1)
• Eight exons, Crick strand, highly conserved
• A:G transition causes p.Ser37Pro
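To see how an A-to-G change on the reference strand can produce p.Ser37Pro in a gene on the opposite strand, here is a small Python sketch. The slides do not give the codon itself, so the code assumes a TCN serine codon with the substitution at its first position, and it reads "Crick strand" as the reverse strand relative to the reference; both are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the codons needed here, taken from the standard genetic code.
CODON_TO_AA = {
    "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser",
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
}

def reverse_complement(seq):
    """Return the sequence as read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]

def substitute(codon, position, base):
    """Return the codon with one base replaced at a 0-based position."""
    return codon[:position] + base + codon[position + 1:]

results = []
for ref_codon in ("TCT", "TCC", "TCA", "TCG"):
    alt_codon = substitute(ref_codon, 0, "C")  # T>C on the coding strand
    ref_strand_ref = reverse_complement(ref_codon)
    ref_strand_alt = reverse_complement(alt_codon)
    # Bases that differ when the same change is viewed on the reference strand.
    changed = [(r, a) for r, a in zip(ref_strand_ref, ref_strand_alt) if r != a]
    results.append((CODON_TO_AA[ref_codon], CODON_TO_AA[alt_codon], changed))
    print(ref_codon, "->", alt_codon, CODON_TO_AA[ref_codon], "->",
          CODON_TO_AA[alt_codon], "reference strand:", changed)
```

Under these assumptions every TCN serine codon becomes a CCN proline codon, and the same event appears as a single A-to-G change on the opposite strand, a purine-to-purine transition.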
Functional Analyses
• Quantitative in vitro N-terminal acetylation assay (RP-HPLC).
• Four peptide substrates previously shown to be acetylated by NatA (NAA10)
• Assays indicate loss-of-function allele.
VAAST in Summary
• Probabilistic Disease Gene Finder
• Feature Based, not Variant Based
• Both Allele and AAS Frequencies
• Considers Inheritance Model
• As few as two target genomes can be sufficient to identify causative gene.
• Background Genomes are “Reusable”
• Not Limited to Human Analyses
VAAST: Future Directions
• Indel support
• Splice-site
• No-call support
• Pedigree support
• Phylogenetic conservation
Acknowledgements
VAAST Development
Chad Huff, Hao Hu, Lynn Jorde, Barry Moore, Martin Reese, Marc Singleton, Jinchuan Xing, Mark Yandell
Ogden Syndrome
John Carey, Steven Chin, Heidi Deborah Fain, Gholson Lyon, John Opitz, Theodore J. Pysher, Alan Rope, Reid Robison, Sarah T. South
Chad Huff, Evan Johnson, Barry Moore, Christa Schank, Kai Wang, Jinchuan Xing
Yandell Lab
Michael Campbell, Daniel Ence, Guozhen Fan, Steven Flygare, Hao Hu, Zev Kronenberg, Barry Moore, Marc Singleton, Robert Ross, Mark Yandell
Thomas Arnesen, Rune Evjenth, Johan R. Lillehaug
Leslie G. Biesecker, Jennifer J. Johnston, Cathy A. Stevens
Brian Dalley, Tao Jiang, Jeffrey Swensen
Hakon Hakonarson, Lynn B. Jorde, Mark Yandell