Functionally annotate genomic variants

The Queensland Brain Institute | Functionally annotate variants The answer is not always 42 ! 5/30/22 [by Swamibu]

Upload: denis-bauer

Post on 10-May-2015

DESCRIPTION

This seminar aims to answer the question of what to make of the identified variants, specifically how to evaluate their quality, prioritize them and functionally annotate them.

TRANSCRIPT

Page 1: Functionally annotate genomic variants

The Queensland Brain Institute |

Functionally annotate variants

The answer is not always 42!

April 11, 2023

[by Swamibu]

Page 2: Functionally annotate genomic variants

Quick recap: DNA sequence read mapping

• Alignment -> Improving -> Variant calling -> Filtering

• Resulting file type: vcf

• “What are the differences to the reference genome?”

[by Darwin Bell]

Searching the haystack: 3.5 million SNPs
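To make the vcf file type concrete, here is a minimal sketch of reading the eight fixed, tab-separated VCF columns in plain Python. The function names and the two example records are made up for illustration; a real project would more likely rely on an established VCF library.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),                 # 1-based position on the reference
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),           # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),  # "." means missing
        "filter": flt,
        "info": info,
    }


def read_vcf(lines):
    """Yield parsed records, skipping header ("#...") and empty lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=30",
    "chr2\t67890\t.\tC\tT,G\t.\tq10\tDP=8",
]
records = list(read_vcf(example))
for record in records:
    print(record["chrom"], record["pos"], record["ref"], record["alt"])
```

Each record answers the question on the slide directly: at this position the sample carries ALT where the reference genome has REF.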

Page 3: Functionally annotate genomic variants

Finding the causal variant in ideal situations*

• Spot the variant that is shared by all affected individuals but absent in all unaffected individuals

• This variant is in a gene with known function and causes the protein to be disrupted

* e.g. some rare autosomal diseases
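The ideal case above boils down to set operations. A minimal sketch, assuming each sample's variants are available as a set of (chrom, pos, ref, alt) tuples; the sample names and variants are hypothetical:

```python
def candidate_variants(affected, unaffected):
    """Variants carried by every affected sample and by no unaffected sample."""
    if not affected:
        return set()
    shared = set.intersection(*affected.values())
    seen_in_unaffected = set().union(*unaffected.values())
    return shared - seen_in_unaffected


# Hypothetical variants, for illustration only.
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr2", 2000, "C", "T")
v3 = ("chr3", 3000, "G", "A")
v4 = ("chr4", 4000, "T", "C")

affected = {"A1": {v1, v2, v3}, "A2": {v1, v3}}
unaffected = {"U1": {v3, v4}}

candidates = candidate_variants(affected, unaffected)
print(candidates)
```

Here v3 is dropped because an unaffected sample also carries it, and v2 is dropped because not every affected sample carries it.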

Page 4: Functionally annotate genomic variants

In reality

• You can’t spot the difference
– You deal with ~3.5 million SNPs
– You need to employ methods that systematically identify variants that stand out: GWAS

• GWAS taught us that it is unlikely to find a causal common variant for complex diseases
– Rare variant?
– A bunch of rare and common variants?
– An even more complex model?

1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
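As a rough illustration of what "systematically identify variants that stand out" can mean for a single SNP, here is a sketch of an allelic chi-square test on a 2x2 table of allele counts in cases and controls. This is one simple association test chosen for illustration, not necessarily what any particular GWAS tool does, and the counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts, for illustration only.
chi2, p_value = allelic_chi_square(case_alt=60, case_ref=140, ctrl_alt=40, ctrl_ref=160)

# Testing ~3.5 million SNPs needs a much stricter cutoff than 0.05,
# e.g. a Bonferroni correction.
bonferroni = 0.05 / 3_500_000
print(chi2, p_value, p_value < bonferroni)
```

A SNP that looks interesting on its own (p around 0.02 here) is far from significant once millions of tests are accounted for.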

Page 5: Functionally annotate genomic variants

Production Informatics and Bioinformatics

Per one-flowcell project:

• Basic Production Informatics: produce raw sequence reads
– Product: fastq
– Time: 5 days

• Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs)
– Product: bam, vcf,…
– Time: 3 weeks

• Bioinformatics Research; Statistical genetics: analyze the data; uncover the biological meaning
– Product: paper
– Time: >6 months

Page 6: Functionally annotate genomic variants

Discount erroneous SNPs?

• Maybe most of my SNPs are not real and by excluding them I can find the causal variant?

• Biological verification
– Re-sequencing with a *different* method (e.g. Sanger)
   • “Yes, the individual has a variant at location X”
– But you can’t do that for > 3 million SNPs

• Bioinformatics verification
– All quality measures are just proxies because we do not know which variants are real

Page 7: Functionally annotate genomic variants

Quality control for variants

• Transition (A<->G; C<->T) to transversion (purine<->pyrimidine) ratio (Ts/Tv)

• Concordance with known variants: dbSNP, HapMap, 1000genomes

• Mendelian Errors
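The first quality measure can be computed directly from the REF and ALT alleles of the called SNPs. A minimal sketch, assuming the SNPs are given as (ref, alt) pairs; the example calls are hypothetical:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Transition/transversion ratio over (ref, alt) pairs; None if no transversions."""
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, non-ACGT symbols and non-variant sites.
        if ref == alt or ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else None


# Hypothetical SNP calls, for illustration only.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
ratio = ts_tv_ratio(calls)
print(ratio)
```

For human whole-genome data a ratio of roughly 2 is often quoted; a call set that drifts toward 0.5, the value expected if substitutions were random, hints at many false positives.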

“… of de novo germline base substitution mutations to be approx. 10^-8 per base pair per generation”

[Sources: 1000 Genomes Project; Illumina]
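Because true de novo mutations are so rare, most sites where a child's genotype cannot be explained by its parents point to a calling error. A minimal sketch of such a Mendelian check for autosomal, diploid genotypes follows; the trio genotypes are hypothetical and sex chromosomes are not handled.

```python
def is_mendelian_consistent(child, mother, father):
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def count_mendelian_errors(trio_sites):
    """Count sites whose child genotype is incompatible with both parents."""
    return sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_sites
    )


# Hypothetical genotypes as (allele, allele) tuples: child, mother, father.
trio_sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("A", "G"), ("A", "A"), ("A", "A")),  # G in neither parent: error or de novo
    (("C", "C"), ("C", "T"), ("C", "T")),  # consistent
    (("T", "T"), ("C", "C"), ("C", "T")),  # mother cannot pass on T: error
]
errors = count_mendelian_errors(trio_sites)
print(errors, "of", len(trio_sites), "sites are Mendelian errors")
```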

Page 8: Functionally annotate genomic variants

Just look at exons?

• We know that there is a reduction of genetic variation in the neighborhood of genes, due to selection at linked sites (1000 Genomes Project).

• We could focus on them to get started
– A variant in a protein coding region is likely to be functional
– We are more likely to find the meaning of a variant in a protein coding region

1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
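Restricting the analysis to exons is an interval lookup. A minimal sketch, assuming exon coordinates are available per chromosome as 1-based, inclusive (start, end) pairs; the coordinates are hypothetical, and for genome-scale data an indexed structure such as an interval tree would be more appropriate than this linear scan.

```python
def in_exon(chrom, pos, exons):
    """True if the position falls inside any exon interval on that chromosome."""
    return any(start <= pos <= end for start, end in exons.get(chrom, []))


# Hypothetical exon coordinates and variant positions, for illustration only.
exons = {"chr1": [(100, 200), (500, 650)]}
variants = [("chr1", 150), ("chr1", 300), ("chr1", 650), ("chr2", 150)]

exonic = [v for v in variants if in_exon(v[0], v[1], exons)]
print(exonic)
```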

Page 9: Functionally annotate genomic variants

Influence of a variant in protein coding region

• Nonsynonymous SNPs
– Introduce stop codon
– Disrupt structure
   • Disrupt domain

• Indels
– Cause frame shift

• Synonymous SNPs
– Alter translation efficiency

• But, on average, each “normal” person is found to carry
– 250 to 300 loss-of-function variants in annotated genes
– 50 to 100 variants previously implicated in inherited disorders.

1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
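The categories above can be derived from the standard genetic code. The sketch below classifies a single-codon substitution and checks whether an indel shifts the reading frame. The function names are made up for illustration, and real annotation tools additionally handle transcripts, strand and splice sites.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Label a substitution within one codon by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


def is_frameshift(ref_allele, alt_allele):
    """An indel shifts the frame unless its length change is a multiple of 3."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
print(classify_codon_change("TAC", "TAA"))  # Tyr -> stop
print(is_frameshift("A", "AT"), is_frameshift("A", "ATGC"))
```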

Page 10: Functionally annotate genomic variants

Intergenic variants are also important

• Disrupt regulatory elements
– Transcription factor binding sites
– Splice sites
– ncRNA transcripts
– mRNA editing

• Cause changes in the expression of proteins that have a downstream effect on their regulatory targets

[Figure: two genes (Gene Blue, Gene Green) with their exons, and the regulatory elements promoter, enhancer, silencer, ncRNA and splicing]

Page 11: Functionally annotate genomic variants

Catching a villain does not bring down the mob

• An autosomal translocation disrupting the function of the DISC1 gene causes schizophrenia (SZ) in a family

• However, this is a rare event and cannot explain the heritability of SZ in the larger population.

Millar JK, Wilson-Annan JC, Anderson S, Christie S, Taylor MS, Semple CA, Devon RS, St Clair DM, Muir WJ, Blackwood DH, Porteous DJ (May 2000). "Disruption of two novel genes by a translocation co-segregating with schizophrenia". Hum. Mol. Genet. 9 (9): 1415–23. doi:10.1093/hmg/9.9.1415. PMID 10814723.

[Figure: translocation between chr1 and chr11 disrupting DISC]

Page 12: Functionally annotate genomic variants

Isolating SNPs that collectively explain liability

• Different populations may have their own “version” of a change that has the same downstream effects.
– Unlikely a “one-variant one-phenotype” case for many diseases

• Prioritize variants or sets of variants to focus analysis on
– Variants likely to be functional
– Involved in the same pathway

• Model disease liability on this “subset” -> Statistical genetics: find variants with relatively large effect sizes that are able to explain a proportion of disease heritability in the population.

1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
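One simple way to look at sets of variants rather than single variants is to group them by the pathway of the gene they fall in. The sketch below only illustrates the grouping step. The gene names, pathway names and gene-to-pathway mapping are hypothetical; in practice the mapping would come from a pathway database.

```python
from collections import defaultdict


def variants_per_pathway(variants, gene_to_pathways):
    """Group variant IDs by every pathway their gene is annotated with."""
    grouped = defaultdict(list)
    for variant in variants:
        for pathway in gene_to_pathways.get(variant["gene"], []):
            grouped[pathway].append(variant["id"])
    return dict(grouped)


# Hypothetical annotation, for illustration only.
variants = [
    {"id": "v1", "gene": "GENE_A"},
    {"id": "v2", "gene": "GENE_B"},
    {"id": "v3", "gene": "GENE_C"},
    {"id": "v4", "gene": "GENE_D"},  # no pathway annotation
]
gene_to_pathways = {
    "GENE_A": ["pathway_1"],
    "GENE_B": ["pathway_1", "pathway_2"],
    "GENE_C": ["pathway_1"],
}

grouped = variants_per_pathway(variants, gene_to_pathways)
# Pathways hit by the most variants come first.
ranked = sorted(grouped.items(), key=lambda item: len(item[1]), reverse=True)
print(ranked)
```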

Page 13: Functionally annotate genomic variants

Functional variants

• SIFT
– Assigns a pre-computed score that says how likely it is that this substitution is tolerated, given the sequences of homologous proteins.

• PolyPhen
– Machine learning method predicting the impact of an amino acid substitution on the protein’s structure and function.

• ANNOVAR
– Annotates SNPs if they overlap functional elements, e.g. domains, transcription factor binding sites, splice variants,…

Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics. 2011 Jul 1;27(13):1741-8. PMID: 21596790
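Scores like the one from SIFT are usually turned into a call with a cutoff. A minimal sketch, assuming SIFT scores in the range 0 to 1 where low scores mean "not tolerated", and treating the commonly quoted cutoff of 0.05 as an adjustable assumption rather than a fixed rule:

```python
def sift_prediction(score, cutoff=0.05):
    """Turn a SIFT score (0 = not tolerated, 1 = tolerated) into a label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores are expected to lie between 0 and 1")
    return "deleterious" if score < cutoff else "tolerated"


# Hypothetical scores, for illustration only.
for score in (0.01, 0.30, 1.00):
    print(score, sift_prediction(score))
```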

Page 14: Functionally annotate genomic variants

Custom filter approach with Excel

• Filter annotated variants by your requirements using Excel to quickly identify a manageable list of “interesting” variants

• Approach taken by the Diamantina Institute (Paul Leo)

Example filter criteria:

• exonic
• Carried by 90% of affected
• Carried by 10% of unaffected
• Loss of function
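The same kind of filter can be scripted once the annotated variants are in a table. The sketch below mirrors the four criteria listed above. The field names and rows are hypothetical, and reading "90% of affected" as "at least 90%" and "10% of unaffected" as "at most 10%" is an assumption.

```python
def is_interesting(variant, min_affected=0.9, max_unaffected=0.1):
    """Apply the four filter criteria to one annotated variant."""
    return (
        variant["region"] == "exonic"
        and variant["loss_of_function"]
        and variant["frac_affected"] >= min_affected
        and variant["frac_unaffected"] <= max_unaffected
    )


# Hypothetical annotated variants, for illustration only.
annotated = [
    {"id": "v1", "region": "exonic", "loss_of_function": True,
     "frac_affected": 0.95, "frac_unaffected": 0.05},
    {"id": "v2", "region": "intronic", "loss_of_function": False,
     "frac_affected": 1.00, "frac_unaffected": 0.00},
    {"id": "v3", "region": "exonic", "loss_of_function": True,
     "frac_affected": 0.95, "frac_unaffected": 0.40},
    {"id": "v4", "region": "exonic", "loss_of_function": False,
     "frac_affected": 0.92, "frac_unaffected": 0.02},
]

shortlist = [v["id"] for v in annotated if is_interesting(v)]
print(shortlist)
```

Keeping the thresholds as parameters makes it easy to relax them when the strict filter returns nothing.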

Page 15: Functionally annotate genomic variants

Three things to remember

1. A “one-variant one-phenotype” model is rather unlikely

2. Variants in non-protein-coding regions are also important

3. New methods (bioinformatics and statistical genetics) need to be developed to address this problem

Addressed in upcoming discussion session run by Dr. Jake Gratten

Page 16: Functionally annotate genomic variants

Next week:

Abstract: The focus in this session will be put on the differences between standard DNA mapping and RNAseq-specific transcript mapping: identifying splice variants and isoforms. The issue of transcript quantification and genomic variants that can be identified from RNAseq data will be discussed.