Genome variation – part 2
Prince of Wales Clinical School
Dr Jason Wong
Introductory bioinformatics for human genomics workshop, UNSW
Day 2 – Friday 21st January 2016
Aims of the session
• Introduce ExAC and gnomAD
• Learn how to use the unique features of ExAC.
• Use the ExAC browser interface.
Limitations of 1000 genomes
• 1000 Genomes is able to address some of the shortcomings of dbSNP.
• BUT - ~2,000 genomes only have the power to detect “rare” variants that are present in about 1 in 2,000 people.
• After dividing into different ethnicities, power as a control dataset becomes limited.
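The power argument above can be made concrete with a small calculation. This is a minimal sketch, assuming independent sampling of chromosomes and that carriers are heterozygous; the function name `prob_observed` is illustrative and not from the slides.

```python
def prob_observed(allele_freq: float, n_individuals: int) -> float:
    """Probability of seeing at least one copy of an allele
    among n diploid individuals (2n chromosomes sampled independently)."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# A variant carried by 1 in 2,000 people; assuming heterozygous carriers,
# the allele frequency is half the carrier frequency.
carrier_freq = 1 / 2000
allele_freq = carrier_freq / 2

for cohort_size in (2000, 60706):
    p = prob_observed(allele_freq, cohort_size)
    print(f"{cohort_size} individuals: P(observed at least once) = {p:.3f}")
```

Splitting the cohort by ethnicity shrinks `n_individuals` for each group, which lowers this probability further.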
Exome Aggregation Consortium (ExAC)
• More than 1 million exomes/genomes have already been sequenced worldwide
• Why not just use all of them?
Lek et al. 2016 Nature 536, 285-291
Exome Aggregation Consortium (ExAC)
Source: www.genome.gov
Largest collection of protein-coding variants
• Over 10 million variants – one every 6 base pairs. Most are unique/novel.
Source: www.genome.gov
Source: www.genome.gov
Lots more diversity
What are some things that ExAC allows you to do?
• A much better population frequency filter.
• Estimate penetrance of specific variants.
• Great for checking exome mappability.
• Identify genes with constrained evolution (i.e. negative selection).
• Identify copy number variation.
Filtering potentially pathogenic variants
• Severe disease-causing variants should not have “high” allele frequencies (>0.1%) across the ExAC population.
– Note that 0.1% of ExAC is already at least 61 people.
• A review of 197 pathogenic variants with an allele frequency >1% in ExAC found virtually all to be spurious claims.
M Lek et al. Nature 536, 285–291 (2016) doi:10.1038/nature19057
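A frequency filter of this kind can be sketched in a few lines. This is only an illustration: the function names and the default threshold argument are assumptions, and the allele number used in the example assumes every one of the 60,706 individuals was genotyped at the site.

```python
EXAC_INDIVIDUALS = 60706  # number of exomes in ExAC, as stated in these slides


def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele frequency = alternate allele count / total genotyped alleles."""
    if allele_number == 0:
        return 0.0
    return allele_count / allele_number


def passes_frequency_filter(allele_count: int, allele_number: int,
                            max_af: float = 0.001) -> bool:
    """Keep a candidate severe-disease variant only if it is rare enough."""
    return allele_frequency(allele_count, allele_number) <= max_af


# 0.1% of the ExAC cohort, rounded to whole people.
print(round(0.001 * EXAC_INDIVIDUALS))

# 185 alleles among 2 * 60,706 chromosomes (assumed full coverage).
print(passes_frequency_filter(185, 2 * EXAC_INDIVIDUALS))
```

In practice the threshold should depend on the disease model; 0.1% is simply the value quoted above.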
Assessing penetrance
• Penetrance refers to the likelihood that a disease variant actually ends up causing disease.
• Example in cystic fibrosis
– CFTR ΔF508 is very high penetrance
(ExAC: 0.0%)
– CFTR R117H is low (incomplete) penetrance
(ExAC: 0.15% - 185 alleles)
Prion disease example
Minikel et al. Science Translational Medicine 8:322ra9 (2016)
http://www.cureffi.org/2016/10/19/estimation-of-penetrance/
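One way to frame a penetrance estimate from population data is Bayes' theorem: P(disease | genotype) = P(disease) × P(genotype | disease) / P(genotype). The sketch below assumes that framing; the function name and all input values are hypothetical, and it is not claimed to be the exact procedure of the work cited above.

```python
def estimate_penetrance(lifetime_risk: float,
                        genotype_freq_in_cases: float,
                        genotype_freq_in_population: float) -> float:
    """Bayes' theorem estimate of P(disease | genotype), capped at 1.0."""
    if genotype_freq_in_population <= 0:
        raise ValueError("genotype must be observed in the population")
    estimate = (lifetime_risk * genotype_freq_in_cases
                / genotype_freq_in_population)
    return min(estimate, 1.0)


# Hypothetical numbers: disease lifetime risk of 1 in 1,000, variant seen in
# 1% of cases and in 0.01% of a population dataset such as ExAC.
print(estimate_penetrance(0.001, 0.01, 0.0001))
```

The more common a variant is in the general population relative to cases, the lower the estimated penetrance.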
Evaluating coverage
• ExAC used the same pipeline to analyse 60,706 exomes.
• Regions of poor exome capture/mappability are evident from low coverage in ExAC (Guilin plots).
Functional gene constraint
• Functionally important genes (or loci) should be depleted of loss-of-function mutations.
Samocha et al 2014 Nat Genet 46:944-950
Constraint score
Generally genes can be categorised into:
(1) Completely tolerant of loss-of-function variation (observed = expected)
(2) Intolerant to two loss-of-function variants (i.e. recessive genes, observed ≈ 0.5 x expected)
(3) Intolerant of single loss-of-function variants (i.e. dominant genes, observed ≈ 0.1 x expected)
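The three categories above can be illustrated with a simple nearest-ratio rule. This is a deliberately simplified sketch: the category labels and the nearest-ratio rule are assumptions for illustration, and the published constraint score is estimated with a statistical model rather than a hard cut-off like this.

```python
# Approximate observed/expected ratios for each category, from the list above.
CATEGORY_RATIOS = {
    "tolerant": 1.0,        # observed = expected
    "recessive-like": 0.5,  # observed ~ 0.5 x expected
    "dominant-like": 0.1,   # observed ~ 0.1 x expected
}


def classify_gene(observed_lof: int, expected_lof: float) -> str:
    """Assign a gene to the category whose ratio is closest to obs/exp."""
    if expected_lof <= 0:
        raise ValueError("expected LoF count must be positive")
    ratio = observed_lof / expected_lof
    return min(CATEGORY_RATIOS,
               key=lambda name: abs(ratio - CATEGORY_RATIOS[name]))


# Hypothetical counts for three genes.
for obs, exp in [(20, 20.0), (10, 20.0), (2, 20.0)]:
    print(obs, exp, classify_gene(obs, exp))
```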
Example
• Huntington’s disease is caused by autosomal dominant inheritance of loss of function in HTT
Copy number variation (CNV)
• The human genome is diploid, so there are two copies of most genes.
• High depth exome sequencing allows the use of deviations of sequencing depth across samples to measure CNV.
[Figure: average sample vs. CNV sample; potential copy gain]
Ruderfer et al. 2016 Nature Genetics 48, 1107-1111
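The depth-based idea can be sketched as a ratio of one sample's depth to the cohort average. This is a minimal illustration, assuming depths are already normalised for library size; the function names and the gain/loss cut-offs are hypothetical and much cruder than a real CNV caller.

```python
def copy_number_estimate(sample_depth: float, cohort_depths: list,
                         ploidy: int = 2) -> float:
    """Scale the sample/cohort depth ratio by the expected ploidy."""
    mean_depth = sum(cohort_depths) / len(cohort_depths)
    return ploidy * sample_depth / mean_depth


def call_cnv(sample_depth: float, cohort_depths: list,
             gain_cutoff: float = 2.5, loss_cutoff: float = 1.5) -> str:
    """Label a region as 'gain', 'loss' or 'normal' (illustrative cut-offs)."""
    copies = copy_number_estimate(sample_depth, cohort_depths)
    if copies >= gain_cutoff:
        return "gain"
    if copies <= loss_cutoff:
        return "loss"
    return "normal"


# Hypothetical normalised depths for one exon across a small cohort.
cohort = [100.0, 98.0, 102.0]
print(call_cnv(150.0, cohort))  # about 1.5x the cohort depth
print(call_cnv(50.0, cohort))   # about 0.5x the cohort depth
```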
Example: ASTN2
• ASTN2 deletion is associated with autism spectrum disorder (Lionel et al. 2014 Hum Mol Genet, 23:2752)
• ExAC shows deletions in up to 16 “healthy” people
• May reflect the relatively large number of individuals with psychological disorders in the cohort?
ExAC browser
http://exac.broadinstitute.org/
Type “APC” for the Adenomatous polyposis coli tumour suppressor gene responsible for Familial Adenomatous Polyposis (FAP)
Functional gene constraint
Sequencing coverage
Copy number variation
Examining variants
FAP is autosomal dominant (i.e. one mutant allele is sufficient for disease). All mutations have very low VAF. Therefore up to 18 FAP patients in ExAC?
Focus on loss of function (LoF) mutations as they are easier to interpret.
Deletion does not actually affect splicing!
Mutation occurs in phase with adjacent variant resulting in GGA>TTA!
Accessing ExAC raw data
• All data underlying the browser can be accessed via downloads
The FTP site has the latest and older versions of the data: ftp://ftp.broadinstitute.org/pub/ExAC_release
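Once a VCF file has been downloaded, allele frequencies can be recomputed from the INFO column. The sketch below assumes the standard VCF `AC` (allele count) and `AN` (allele number) INFO keys; the example line is made up for illustration and is not a real ExAC record, and the function names are hypothetical.

```python
def parse_info(info_field: str) -> dict:
    """Split a VCF INFO column into a dict; flag-only keys map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def allele_frequencies(vcf_line: str) -> dict:
    """Return {alt_allele: AC / AN} for one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")      # column 5: ALT alleles
    info = parse_info(fields[7])     # column 8: INFO
    allele_number = int(info["AN"])
    counts = [int(c) for c in info["AC"].split(",")]
    return {alt: (count / allele_number if allele_number else 0.0)
            for alt, count in zip(alts, counts)}


# Made-up record with two alternate alleles.
example = "5\t112000000\t.\tG\tA,T\t100\tPASS\tAC=3,1;AN=120000;DB"
print(allele_frequencies(example))
```

Real release files are compressed and large, so reading them line by line (for example with `gzip.open`) is preferable to loading them into memory.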
Accessing ExAC from UCSC
To load session, user: jasewong session name: bioinf_workshop_SNP_2016
How about whole genomes?
• With increasing numbers of exomes and whole genomes ExAC will eventually be superseded by gnomAD.
Source: https://macarthurlab.org/blog/
Interface is very similar to ExAC
http://gnomad.broadinstitute.org/
Extra coverage information for genome (green line).
But no functional constraint and CNV information (yet!).
Even more LoF variants…
Non-coding region
Paste in chr19:45408461-45408628 (promoter of APOE)
Downloading gnomAD files
But files are so big…
Further reading
• ExAC paper (https://www.ncbi.nlm.nih.gov/pubmed/27535533)
• ExAC guide (https://macarthurlab.org/2014/11/18/a-guide-to-the-exome-aggregation-consortium-exac-data-set/)
• gnomAD guide (https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/)