potential of human genome sequencing · potential of human genome sequencing paul pharoah reader in...
TRANSCRIPT
![Page 1: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/1.jpg)
Potential of human genome sequencing
Paul Pharoah
Reader in Cancer Epidemiology
University of Cambridge
![Page 2: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/2.jpg)
Exposure Outcome
Strength of association
Genetic model
Prevalence
Measurement
Key considerations
Quantitative trait
Binary
![Page 3: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/3.jpg)
Timeline
1866 1990 2000 20101980
Understanding of (inherited) genetic basis of complex disease
Understanding of architecture of human genome
Developments in technology
![Page 4: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/4.jpg)
What is meant by the genetic model?
• Allele frequency– Common >5%– Uncommon 1-5%– Rare 0.1 – 1%– Very rare <0.1%
• Mode of inheritance– Dominant/recessive– Co-dominant
• Effect size
![Page 5: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/5.jpg)
What is likely genetic model?
• Common alleles conferring relative risk >2 not been identified for any complex phenotype
• Uncommon and rare alleles conferring modest risks (RR = 2-4) have been identified
![Page 6: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/6.jpg)
Finding common variation for complex phenotypes
• Association study
• Compare allele frequencies in subjects with (cases) and without (controls) phenotype of interest
• Statistical methods generally straightforward
![Page 7: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/7.jpg)
Candidate gene studies: results
• Many putative associations published
• None replicated robustly
• Inappropriate reliance on P<0.05 as significant
![Page 8: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/8.jpg)
An aside on P-values
• The P-value is the probability of data IF null hypothesis is true
• It is NOT the probability that the null hypothesis is true (true negative probability)
• TNN depends on prior and power
• If prior and power are small, even a very small P is more likely to represent a false positive
• Number of tests irrelevant
![Page 9: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/9.jpg)
Prior• The probability of association for locus under study
• Given that a locus detectable by association must have a detectable effect size– explain >0.1% variance
• Thus 1,000 loci at most
• 10 million common variants
• Prior 1:10,000 at bestLow signal to noise ratio
• May improve this by 10x with good candidate selection
![Page 10: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/10.jpg)
Genome-wide association studies
• Possible to select a set of SNPs that tag all the common variation in genome
• Genotyping technology enable this to be done on large sample sizes
• Still too expensive to genotype everything
• Phased study design
![Page 11: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/11.jpg)
Data analysis
• Simple test of association SNP by SNP
• Genome-wide significance– Based on the principle of adjusting for multiple testing
– 10M SNPs are correlated, so not 10M independent tests
– 5 x 10-8 is approx equal to 0.05 corrected
• However, small biases can have disproportionate effects at the extremes of the distribution of test statistics
• Large samples sizes are needed
![Page 12: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/12.jpg)
Power to detect association
N = 4000
Prevalence of radiosensitivity phenotype = 10 percentP < 5 x 10-8
N = 8000
![Page 13: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/13.jpg)
Phased GWAS designs
Genome-wide array 200k – 1M 2000 cases and 2000 controls
Test for association
Test for association
Custom array 4000 cases and 4000 controls
P<0.05
Top “hits”10,000 cases and 10,000 controls
P<0.00001
P>0.05
P>0.00001
SNPs with P<10-8 declared significant
![Page 14: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/14.jpg)
![Page 15: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/15.jpg)
GWAS results
• Over 1,400 loci for 250 complex phenotypes
• Relative risks modest
• Proportion of genetic component of phenotypic variance explained <20%
• “Missing heritability”– Range of underlying genetic models possible
• Large scale GWAS for radiosensitivity phenotypes just being established– Radiogenomics Consortium
![Page 16: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/16.jpg)
Searching for uncommon and rare variants
• Understanding of human genomic architecture continues to increase
• 1000 Genomes Project– Resequencing different populations
• Many studies now sequencing exomes of subjects with selected phenotypes
• ~40 million different variants detected by 1000GP
![Page 17: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/17.jpg)
Next generation association studies
• High throughput genotyping arrays now being designed to capture rarer variation– Illumina Omni 5M
• >1% variation across the genome
– Illumina and Affymetrix exome arrays• ~270K variants
• Selected from an analysis of 12,000 exomes
• Variant present in at least 3 exomes
• Testing for association as with common variants
![Page 18: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/18.jpg)
Problems
• Population substructure may cause substantial bias that is difficult to correct for using standard methods developed for analysis of common variants
• Small errors in genotype calling may result in bias
• Sample size remains a major problem
![Page 19: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/19.jpg)
Power to detect rare variants
N = 4000
Prevalence of radiosensitivity phenotype = 10 percentP < 5 x 10-8
N = 8000
![Page 20: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/20.jpg)
Resequencing
• Alternative to genotyping
• Whole genome sequencing now ~$5,000
• Exome sequencing $500
• Targeted sequencing 10 genes $50
![Page 21: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/21.jpg)
Problems
• Data management– Very large data files
– Exome 1-5 Gbytes
• Data analysis– Variant calling
– Each genome will include very large number of variants including “private” variants of unknown function
– Methods for statistical analysis not yet developed
• Sample size
![Page 22: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/22.jpg)
Conclusions
• Only fraction of the inter-individual variation in radiation sensitivity explained by well known sensitivity syndromes
• Technological advances have made it possible to genotype/sequence thousands of subjects making large scale genetic epidemiological studies feasible
• Considerable obstacles still to be overcome
![Page 23: Potential of human genome sequencing · Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge. ... Data analysis • Simple test](https://reader030.vdocuments.mx/reader030/viewer/2022040606/5eb54c3726cfaa487f23866e/html5/thumbnails/23.jpg)
Questions?