genome variation part 2 - university of new south wales · 2017. 4. 20. · genome variation...

Genome variation – part 2

Prince of Wales Clinical School

Dr Jason Wong

Introductory bioinformatics for human genomics workshop, UNSW, Day 2 – Friday 21st January 2016

Aims of the session

• Introduce ExAC and gnomAD

• Learn how to use the unique features of ExAC.

• Use the ExAC browser interface.

Limitations of 1000 Genomes

• The 1000 Genomes Project is able to address some of the shortcomings of dbSNP.

• BUT with only ~2,000 genomes, it only has the power to detect “rare” variants down to about 1 in 2,000 people.

• After dividing the samples into different ethnic groups, its power as a control dataset becomes limited.
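To make the power argument concrete, here is a minimal Python sketch of a simple binomial model (an assumption for illustration, not a method from the slides): if chromosomes are sampled independently, a variant with allele frequency p is seen at least once among n diploid individuals with probability 1 - (1 - p)^(2n). The sample sizes are the ~2,000 genomes above and the 60,706 ExAC exomes; the allele frequencies are illustrative.

```python
def prob_seen_at_least_once(allele_freq: float, n_individuals: int) -> float:
    """P(at least one copy) when each diploid person contributes two independent chromosomes."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# ~2,000 genomes (1000 Genomes, per the slide) vs 60,706 ExAC exomes
for n in (2_000, 60_706):
    for af in (1 / 2_000, 1 / 20_000, 1 / 100_000):
        print(f"n={n:>6}  AF=1/{round(1 / af):>6}  P(seen)={prob_seen_at_least_once(af, n):.3f}")
```

Under this model a variant with allele frequency 1/2,000 is very likely to appear in 2,000 people, while one ten times rarer is usually missed; splitting the sample by ancestry shrinks n and the power with it.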

Exome Aggregation Consortium (ExAC)

• More than 1 million exomes/genomes have already been sequenced worldwide

• Why not just use all of them?

Lek et al. 2016 Nature 536, 285-291

Exome Aggregation Consortium (ExAC)

Source: www.genome.gov

Largest collection of protein-coding variants

• Over 10 million variants – one every 6 base pairs. Most are unique/novel.

Source: www.genome.gov

Source: www.genome.gov

Lots more diversity

What are some things that ExAC allows you to do?

• A much better population frequency filter.

• Estimate penetrance of specific variants.

• Great for checking exome mappability.

• Identify genes with constrained evolution (i.e. negative selection).

• Identify copy number variation.

Filtering potentially pathogenic variants

• Variants that cause severe disease should not have “high” allele frequencies (>0.1%) across the ExAC population.

– Note that 0.1% of ExAC is already at least 61 people.

• A review of 197 reported pathogenic variants with >1% allele frequency in ExAC found virtually all of them to be spurious claims.
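The “at least 61 people” figure follows from counting alleles: 60,706 diploid individuals carry 121,412 alleles, so 0.1% is about 121 alleles, and even if every carrier were homozygous that is still 61 people. A minimal sketch of that arithmetic plus a frequency filter using the slide's 0.1% threshold; the `Variant` class and the example variants are hypothetical.

```python
import math
from dataclasses import dataclass

EXAC_INDIVIDUALS = 60_706             # from the slides
EXAC_ALLELES = 2 * EXAC_INDIVIDUALS   # diploid: 121,412 alleles


def min_carriers(allele_freq: float, n_alleles: int = EXAC_ALLELES) -> int:
    """Fewest people who could carry this many alleles (i.e. if all were homozygous)."""
    allele_count = round(allele_freq * n_alleles)
    return math.ceil(allele_count / 2)


@dataclass
class Variant:
    name: str           # hypothetical identifier
    allele_count: int   # AC: copies of the alternate allele observed
    allele_number: int  # AN: alleles successfully called at this site

    @property
    def allele_freq(self) -> float:
        return self.allele_count / self.allele_number if self.allele_number else 0.0


def passes_frequency_filter(v: Variant, max_af: float = 0.001) -> bool:
    """Keep a candidate for severe disease only if its AF is at or below the threshold."""
    return v.allele_freq <= max_af


print(min_carriers(0.001))  # 61
candidates = [
    Variant("var_A", allele_count=3, allele_number=121_000),
    Variant("var_B", allele_count=450, allele_number=120_500),
]
print([v.name for v in candidates if passes_frequency_filter(v)])  # ['var_A']
```

Using the per-site AN rather than a fixed 121,412 matters because fewer alleles are called at poorly covered sites.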

M Lek et al. Nature 536, 285–291 (2016) doi:10.1038/nature19057

Assessing penetrance

• Penetrance refers to the likelihood that a disease variant actually ends up causing disease.

• Example: cystic fibrosis

– CFTR Δ504F has very high penetrance (ExAC: 0.0%)

– CFTR R117H has low (incomplete) penetrance (ExAC: 0.15%, 185 alleles)

Prion disease example

Minikel et al. Science Translational Medicine 8:322ra9 (2016)

http://www.cureffi.org/2016/10/19/estimation-of-penetrance/
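As we read it, the penetrance estimate in the post above rests on Bayes' rule: P(disease | variant) = P(variant | disease) × P(disease) / P(variant), with the variant's frequency in cases, the lifetime risk of the disease and its frequency in a population sample such as ExAC as inputs. A minimal sketch; the function name and all numbers are hypothetical.

```python
def estimate_penetrance(af_cases: float, af_population: float, lifetime_risk: float) -> float:
    """Bayes' rule: P(D | V) = P(V | D) * P(D) / P(V).

    af_cases      -- frequency of the variant among people with the disease, P(V | D)
    af_population -- frequency in a population sample such as ExAC, P(V)
    lifetime_risk -- baseline lifetime risk of the disease, P(D)

    Both frequencies must be measured the same way (both allele or both carrier
    frequencies). A result above 1 signals inconsistent or noisy inputs
    (e.g. very few population alleles), not a real penetrance.
    """
    if af_population <= 0:
        raise ValueError("variant absent from the population sample; cannot estimate this way")
    return af_cases * lifetime_risk / af_population


# Hypothetical inputs: 1% of cases carry the variant, lifetime risk 1 in 5,000,
# population allele frequency 1 in 10,000.
print(f"{estimate_penetrance(0.01, 1e-4, 1 / 5_000):.1%}")  # about 2%
```

A variant seen often in ExAC but rarely in cases therefore gets a low penetrance estimate, which is the pattern R117H shows above.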

Evaluating coverage

• ExAC used a single, uniform pipeline to analyse all 60,706 exomes.

• Regions of poor exome capture/mappability are evident from low coverage in ExAC (Guilin plots).
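Low-coverage regions can also be listed programmatically rather than read off a plot. A minimal sketch, assuming you already have mean depth per position (e.g. parsed from ExAC's coverage downloads; check their actual column layout) and using an arbitrary 10x cut-off.

```python
from typing import Iterable, List, Optional, Tuple


def low_coverage_intervals(
    depths: Iterable[Tuple[str, int, float]], min_depth: float = 10.0
) -> List[Tuple[str, int, int]]:
    """Merge consecutive positions whose mean depth is below min_depth.

    depths: (chrom, 1-based position, mean depth), sorted by chrom then position.
    Returns inclusive (chrom, start, end) intervals.
    """
    intervals: List[Tuple[str, int, int]] = []
    current: Optional[list] = None  # [chrom, start, end] of the interval being built
    for chrom, pos, depth in depths:
        if depth < min_depth:
            if current and current[0] == chrom and pos == current[2] + 1:
                current[2] = pos  # extend the open interval
            else:
                if current:
                    intervals.append(tuple(current))
                current = [chrom, pos, pos]
        elif current:
            intervals.append(tuple(current))  # coverage recovered: close the interval
            current = None
    if current:
        intervals.append(tuple(current))
    return intervals


# Hypothetical per-base mean depths
data = [("1", 100, 35.2), ("1", 101, 8.0), ("1", 102, 4.5), ("1", 103, 30.1), ("1", 104, 2.0)]
print(low_coverage_intervals(data))  # [('1', 101, 102), ('1', 104, 104)]
```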

Functional gene constraint

• Functionally important genes (or loci) should be depleted of loss-of-function mutations.

Samocha et al. 2014 Nat Genet 46:944-950

Constraint score

Generally, genes can be categorised as:

(1) Completely tolerant of loss-of-function variation (observed = expected)

(2) Intolerant of two loss-of-function variants (i.e. recessive genes, observed ≈ 0.5 x expected)

(3) Intolerant of a single loss-of-function variant (i.e. dominant genes, observed ≈ 0.1 x expected)
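These three categories can be turned into probabilities by treating the observed LoF count as Poisson with mean ratio × expected and applying Bayes' rule. This is in the spirit of ExAC's pLI score (the posterior probability of the most intolerant class), but it is only a sketch: ExAC derives expected counts from a mutational model and fits the class proportions from the data, whereas here the ratios are the slide's rounded values and the flat prior and example genes are hypothetical.

```python
import math
from typing import Dict, Optional

# Observed/expected LoF ratio for each class, using the rounded values on the slide.
CLASS_RATIOS = {"tolerant": 1.0, "recessive": 0.5, "haploinsufficient": 0.1}


def poisson_logpmf(k: int, mu: float) -> float:
    """log P(K = k) for K ~ Poisson(mu), mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def class_posteriors(observed: int, expected: float,
                     priors: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Posterior probability of each constraint class given observed/expected LoF counts."""
    if priors is None:
        # Hypothetical flat prior; priors must be positive.
        priors = {c: 1 / len(CLASS_RATIOS) for c in CLASS_RATIOS}
    log_w = {c: math.log(priors[c]) + poisson_logpmf(observed, ratio * expected)
             for c, ratio in CLASS_RATIOS.items()}
    top = max(log_w.values())  # subtract the max before exponentiating, for numerical stability
    weights = {c: math.exp(v - top) for c, v in log_w.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical genes, each with 40 expected LoF variants
for observed in (38, 19, 2):
    post = class_posteriors(observed, expected=40.0)
    print(observed, {c: round(p, 3) for c, p in post.items()})
```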

Example

• Huntington’s disease is inherited in an autosomal dominant manner and is caused by a CAG repeat expansion in HTT, which acts mainly as a gain rather than a loss of function

Copy number variation (CNV)

• The human genome is diploid, so there are two copies of most genes.

• High-depth exome sequencing allows CNVs to be measured from deviations in sequencing depth across samples.

[Figure: sequencing depth of an average sample vs a CNV sample, with a potential copy gain highlighted]

Ruderfer et al. 2016 Nature Genetics 48, 1107-1111
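The depth-deviation idea can be sketched in a few lines. This is a deliberately simple ratio approach, far cruder than the dedicated CNV caller used for the ExAC analysis: normalise each sample by its median depth, compare each target with its typical depth across samples, and scale by 2 because most targets are diploid. The sample names, depths and call thresholds are hypothetical.

```python
from statistics import median
from typing import Dict, List


def copy_number_estimates(depth: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Estimate copy number for each exome target from read depth across samples.

    depth maps sample name -> mean read depth per target (same target order for every sample).
    """
    # 1. Normalise each sample by its median target depth (removes overall depth differences).
    normalised = {s: [d / median(v) for d in v] for s, v in depth.items()}
    n_targets = len(next(iter(normalised.values())))
    # 2. Typical normalised depth of each target across samples (per-target capture efficiency).
    target_median = [median(normalised[s][i] for s in normalised) for i in range(n_targets)]
    # 3. A sample at the typical depth is assumed diploid, i.e. copy number 2.
    return {s: [2 * v[i] / target_median[i] for i in range(n_targets)]
            for s, v in normalised.items()}


def call_cnv(copy_number: float, loss_below: float = 1.5, gain_above: float = 2.5) -> str:
    """Crude per-target call; the thresholds are arbitrary placeholders."""
    if copy_number < loss_below:
        return "deletion"
    if copy_number > gain_above:
        return "duplication"
    return "normal"


# Hypothetical depths for 4 samples x 5 targets; s4 has half the usual depth at target 3.
depth = {
    "s1": [100, 120, 80, 90, 110],
    "s2": [100, 120, 80, 90, 110],
    "s3": [50, 60, 40, 45, 55],
    "s4": [100, 120, 40, 90, 110],
}
cn = copy_number_estimates(depth)
print([call_cnv(c) for c in cn["s4"]])  # ['normal', 'normal', 'deletion', 'normal', 'normal']
```

Comparing against other samples is what makes this work: raw depth varies hugely between targets, so only the deviation from that target's usual depth is informative.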

Example: ASTN2

• ASTN2 deletion associated with autism spectrum disorder (Lionel et al. 2014 Hum Mol Genet, 23:2752)

• ExAC shows deletions in up to 16 “healthy people”

• May reflect the relatively large number of individuals with psychiatric disorders in the cohort?

ExAC browser

http://exac.broadinstitute.org/

Type “APC” to view the adenomatous polyposis coli tumour suppressor gene, which is responsible for familial adenomatous polyposis (FAP)

Functional gene constraint

Sequencing coverage

Copy number variation

Examining variants

FAP is autosomal dominant (i.e. one mutant allele is sufficient to cause disease). All of these mutations have a very low variant allele frequency (VAF). Could there therefore be up to 18 FAP patients in ExAC?

Focus on loss-of-function (LoF) mutations, as they are easier to interpret.
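An “up to N patients” figure like this can be obtained by treating every LoF allele as a separate heterozygous carrier. A minimal sketch of that upper bound, assuming variants annotated with Sequence Ontology consequence terms as produced by VEP; the example variants and counts are hypothetical.

```python
from typing import Iterable, Tuple

# Sequence Ontology consequence terms (as used by VEP) treated as LoF here.
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


def max_lof_carriers(variants: Iterable[Tuple[str, int]]) -> int:
    """Upper bound on LoF carriers from (consequence, allele count) pairs.

    Assumes every LoF allele sits in a different heterozygous person; homozygotes or
    people carrying two LoF variants would make the true number smaller.
    """
    return sum(ac for consequence, ac in variants if consequence in LOF_CONSEQUENCES)


# Hypothetical variants in one gene: (consequence, allele count in the reference panel)
variants = [
    ("stop_gained", 5),
    ("frameshift_variant", 3),
    ("missense_variant", 40),
    ("splice_donor_variant", 1),
    ("synonymous_variant", 12),
]
print(max_lof_carriers(variants))  # 9
```

The bound is only as good as the LoF annotation, so each annotated LoF variant still needs checking to confirm it really disrupts the protein.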

Deletion does not actually affect splicing!

Mutation occurs in phase with an adjacent variant, resulting in GGA>TTA!
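Why phase matters can be seen by translating the codon under each scenario. A minimal sketch, assuming (the slide only shows the combined change) that the two adjacent variants are G>T substitutions at the first and second positions of a GGA glycine codon: the first on its own would create a TGA stop codon, but on the same chromosome the two together give TTA, a leucine codon, so the result is missense rather than LoF.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def mutate(codon: str, changes: dict) -> str:
    """Apply {0-based codon position: new base} substitutions to a codon."""
    bases = list(codon)
    for position, base in changes.items():
        bases[position] = base
    return "".join(bases)


reference = "GGA"
scenarios = {
    "reference": {},
    "first G>T alone": {0: "T"},
    "second G>T alone": {1: "T"},
    "both, in phase": {0: "T", 1: "T"},
}
for label, changes in scenarios.items():
    codon = mutate(reference, changes)
    print(f"{label:<17} {codon} -> {CODON_TABLE[codon]}")
# reference GGA -> G, first alone TGA -> * (stop), second alone GTA -> V, both TTA -> L
```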

Accessing ExAC raw data

• All data underlying the browser can be downloaded.

The FTP site has the latest and older versions of the data: ftp://ftp.broadinstitute.org/pub/ExAC_release
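The downloaded sites VCF stores allele counts (AC) and allele numbers (AN) in its INFO column, including per-population fields such as AC_NFE / AN_NFE (check the VCF header of the release you download for the exact field names). A minimal sketch of parsing INFO and computing a population-specific allele frequency; the INFO string below is hypothetical.

```python
from typing import Dict, List, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO column into a dict; flag fields (no '=') map to True."""
    fields: Dict[str, InfoValue] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def population_af(info: Dict[str, InfoValue], population: str) -> List[float]:
    """Per-ALT allele frequency in one population from AC_<pop> and AN_<pop>.

    AC fields hold one comma-separated value per ALT allele at multi-allelic sites.
    """
    allele_counts = [int(x) for x in str(info[f"AC_{population}"]).split(",")]
    allele_number = int(info[f"AN_{population}"])
    if allele_number == 0:
        return [0.0] * len(allele_counts)
    return [ac / allele_number for ac in allele_counts]


# Hypothetical INFO column for a site with two ALT alleles
info = parse_info("AC=3,1;AN=1000;AC_NFE=2,1;AN_NFE=600;DB")
print(population_af(info, "NFE"))
```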

Accessing ExAC from UCSC

To load the session, use user: jasewong, session name: bioinf_workshop_SNP_2016

How about whole genomes?

• With increasing numbers of exomes and whole genomes, ExAC will eventually be superseded by gnomAD.

Source: https://macarthurlab.org/blog/

Interface is very similar to ExAC

http://gnomad.broadinstitute.org/

Extra coverage information for genomes (green line), but no functional constraint or CNV information (yet!).

Even more LoF variants…

Non-coding region

Paste in chr19:45408461-45408628 (promoter of APOE)
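The browser takes 1-based, inclusive coordinates, whereas BED files and many programming interfaces use 0-based, half-open intervals, and GRCh37-based files may name the contig “19” rather than “chr19”. A small hypothetical helper for moving between these conventions, a sketch rather than part of any browser API.

```python
import re
from typing import NamedTuple, Tuple

# e.g. "chr19:45408461-45408628", optionally with thousands separators
REGION_RE = re.compile(r"^(?:chr)?(\w+):([\d,]+)-([\d,]+)$")


class Region(NamedTuple):
    chrom: str  # without the "chr" prefix
    start: int  # 1-based, inclusive (as typed into the browser)
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def zero_based_half_open(self) -> Tuple[str, int, int]:
        """Coordinates in the 0-based, end-exclusive convention used by BED files."""
        return self.chrom, self.start - 1, self.end


def parse_region(text: str) -> Region:
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    start, end = (int(g.replace(",", "")) for g in match.group(2, 3))
    if start > end:
        raise ValueError("start must not exceed end")
    return Region(match.group(1), start, end)


region = parse_region("chr19:45408461-45408628")
print(region, region.length)  # the APOE promoter region spans 168 bp
```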

Downloading gnomAD files

But files are so big…
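One way to cope with very large files without loading them into memory is to stream them line by line and keep only the region of interest. A minimal sketch: BGZF-compressed VCFs are valid gzip, so Python's gzip module can read them. Streaming still decompresses the whole file, though; a tabix index lets dedicated tools jump straight to a region instead. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def records_in_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield VCF data lines on `chrom` whose POS lies in [start, end] (1-based, inclusive)."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.split("\t", 2)  # only CHROM and POS are needed
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line.rstrip("\n")


def stream_region(path: str, chrom: str, start: int, end: int) -> Iterator[str]:
    """Read a (b)gzipped VCF line by line, so memory use stays small even for huge files."""
    with gzip.open(path, "rt") as handle:
        yield from records_in_region(handle, chrom, start, end)
```

Check the file header for contig naming (“19” vs “chr19”) before filtering, or the region will silently come back empty.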

Further reading

• ExAC paper (https://www.ncbi.nlm.nih.gov/pubmed/27535533)

• ExAC guide (https://macarthurlab.org/2014/11/18/a-guide-to-the-exome-aggregation-consortium-exac-data-set/)

• gnomAD guide (https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/)