Genetics for Epidemiologists
Lecture 4: Genetic Association Studies
National Human Genome Research
Institute
National Institutes of
Health
U.S. Department of Health and
Human Services
U.S. Department of Health and Human Services
National Institutes of HealthNational Human Genome Research
InstituteTeri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics andSenior Advisor to the Director, NHGRI,
for Population Genomics
Topics to be Covered
• Case-control and cohort studies in genomic research
• Candidate gene studies
• Genome-wide association studies
• Randomized/experimental designs
Collins FS, Nature 2004; 429:475-77.
Desirable Characteristics of Large US Cohort Study
• Large sample size• Full representation of US minority groups• Broad range of ages• Broad range of genetic backgrounds and
environmental exposures• Family-based recruitment for at least part of
cohort to control for population stratification• Broad array of clinical and laboratory data,
regular follow up for events, additional exposure assessment
After Collins FS, Nature 2004; 429:475-77.
Desirable Characteristics of Large US Cohort Study
(continued)• Technologically advanced dietary, lifestyle,
and environmental exposure data• Collection and storage of biological
specimens• Sophisticated data management system• Broad access to materials and data• Goals should not be “hypothesis-limited”• Comprehensive community engagement
from the outset• State of the art (?dynamic) consent to allow
multiple uses of data and regular feedbackAfter Collins FS, Nature 2004; 429:475-77.
Larson, G. The Complete Far Side. 2003.
Manolio TA et al. Nature 2006; 7:812-820.
Willett WC et al. Nature 2007; 445:257-258.
Collins FS et al. Nature 2007; 445:259.
Pros and Cons of Case-Control Studies
Advantages• May be the only way to study rare
diseases or those of long latency• Existing records can occasionally be used
if risk factor data collected independent of disease status
• Can study multiple etiologic factors simultaneously
• May be less time-consuming and expensive
• If assumptions met, inferences are reliable
Pros and Cons of Case-Control Studies
Disadvantages• Relies on recall or records for
information on past exposures; validation can be difficult or impossible
• Selection of appropriate comparison group may be difficult
• Multiple biases may give spurious evidence of association between risk factor and disease
• Usually cannot study rare exposures• Temporal relationship between exposure
and disease can be difficult to determine
“But,” they say, “This Is Genetics!”(you dumb epidemiologist)
“This Is Different!”
• Genes are measured the same way in cases and controls
• Information on key exposure is easy to validate• No recall or reporting involved• Temporal relationship between genes and disease
is piece of cake“BUT,” I SAY,• Bias-free ascertainment of cases and controls is
still major concern; cases in most clinical series unlikely to be representative
• Assessment of risk modifiers or gene-environment interactions is likely to be incomplete or flawed
Appreciation of Weaknesses of Case-Control Studies
http://www.mainlesson.com/display.php3?author=treadwell&book=primer&story=chickenlittle
Larson, G. The Complete Far Side. 2003.
Candidate Genes
Genetic Studies in Unrelated Individuals
(pre-2005): Candidate Gene Studies• Goal: characterize candidate genes and
variants related to disease• Not typically intended to “find genes,”
generally begun after disease-related variants identified
• Assess generalizability of family-based observations (genetic heterogeneity)
• Assess importance of allelic variation at population level (PAR, penetrance)
• Identify modification of genetic association by environmental factors (GxE interaction)
Population Studies of Genetic Variants: Angiotensin I-Converting
Enzyme (ACE)
Larsen: Williams Textbook of Endocrinology, 10th ed., 2003
ACE Gene Identification
• cDNA sequence determined for human testicular ACE, identical from residue 27 to C terminus to C-terminal domain of endothelial ACE (Ehlers et al, PNAS 1989)
• Assigned to chromosome 17q23 by in situ hybridization (Mattei et al, Cytogenet Cell Genet 1989)
• Linked to elevated blood pressure in rat model of hypertension (Jacob et al, Cell 1991)
• Mapped to human chromosome 17q22-q24 (Jeunemaitre et al, Nat Genet 1992)
ACE Gene Polymorphisms
• Insertion/deletion polymorphism identified through restriction fragment length polymorphism (RFLP) analysis
• Two alleles results from 250-bp insertion in intron 16; allele frequencies = 0.41 for I allele and 0.59 for D allele
• Accounted for 47% variance in serum ACE in 80 subjects
ACE Genotype
II ID DD
ACE (µg/L) 299 (49) 392 (67) 494 (88)
Ln-ACE (µg/L)
5.7 (0.2) 6.0 (0.2) 6.2 (0.2)
Rigat et al, J Clin Invest 1990; 86:1343-46.
Nature 1992; 359:641-44.
Cambien et al, Nature 1992; 359:641-44.
Frequency of ACE Genotypes in 1,304 MI Cases and Controls
0
10
20
30
40
50
60
DD ID II
Gen
otyp
e Fr
eque
ncy
(%)
Cases Controls
OR = 1.34, p = 0.007
197 200 309 390 104 104
Cambien et al, Nature 1992; 359:641-44.
Frequency of ACE Genotypes in 1,304 MI Cases and Controls
0
10
20
30
40
50
60
70
80
DD ID/II DD DD/ID
Gen
otyp
e Fr
eque
ncy
(%)
Cases Controls
OR = 3.2 [1.7,5.9]
38 46 41143 159 154 372 390
OR = 1.1 [0.9,1.5]Low Risk High Risk
Age-Adjusted Odds on Hypertension by ACE ID/DD
Genotype and Sex
DD ID II P-value
Men: % HTN 53.1 45.8 44.4
Men: OR 1.67 1.19 1.00 0.004
Women: % HTN 43.3 41.8 44.4
Women: OR 1.01 0.80 1.00 0.15
after O’Donnell C et al, Circulation 1998; 97:1766-72.
Number of New, Significant Gene-Disease Associations by Year,
1984 - 2000
Hirschhorn J et al, Genet Med 2002; 4:45-61.
Of 600 Gene-Disease Associations, Only 6 Significant in > 75% of
Identified Studies
Disease/Trait GenePolymorphism
Frequency
DVT F5 Arg506Gln 0.015
Graves’ Disease
CTLA4 Thr17Ala 0.62
Type 1 DM INS 5’ VNTR 0.67
HIV/AIDS CCR5 32 bp Ins/Del 0.05-0.07
Alzheimer’s APOE Epsilon 2/3/4 0.16-0.24
Creutzfeldt-Jakob Disease
PRNP Met129Val 0.37
Hirschhorn J et al, Genet Med 2002; 4:45-61.Hirschhorn J et al, Genet Med 2002; 4:45-61.
Polymorphism Present Absent Summary
ACE I/D13 with D; 1 with I
18 favors none
APOE 8 with ε4, 2 with ε2 9 equivocal
AGT M235T 0 8 none
AGTR1 A1166C 0 7 none
MTHFR 7 with T, 1 with non-T
8equivocal
PON1 Q192R 3 with R 10 none
PON1 L55M 5 with L (subgroups)
1weak
NOS3 G894T 1 with T 4 none
MMP3 -1516 5A/6A
4 with 6A0 associatio
n
IL6 G-174C 1 with G 3 none
Reports For and Against Associations of Variants with
Carotid Atherosclerosis
Manolio et al, ATVB 2004; 24:1567-1577.
Summary Points: Candidate Gene Studies
• Initial enthusiasm markedly damped by failure to replicate
• Can probably find study or story that will fit almost any candidate to any disease/trait
• Understanding of genome structure and function, and of pathophysiologic mechanisms, just too preliminary to project more than a handful of plausible candidates at present
Larson, G. The Complete Far Side. 2003.
200520062007 first quarter2007 second quarter2007 third quarter2007 fourth quarter2008 first quarter
Manolio et al., J Clin Invest 2008; 118:1590-605.
Pennisi E, Science 2007; 318:1842-43.
2007: The Year of GWA Studies
Diseases and Traits with Published GWA Studies (n = 53,
5/9/08)•Macular Degeneration•Exfoliation Glaucoma
•Lung Cancer•Prostate Cancer•Breast Cancer•Colorectal Cancer•Neuroblastoma
•Crohn’s Disease•Celiac Disease•Gallstones• Irritable Bowel
Syndrome
•QT Prolongation •Coronary Disease•Stroke•Hypertension•Atrial Fibrillation/Flutter•Coronary Spasm•Lipids and Lipoproteins
•HIV Viral Setpoint•Childhood Asthma
•Type 1 Diabetes •Type 2 Diabetes•Diabetic
Nephropathy •End-Stage Renal
Disease•Obesity, BMI, Waist,
IR•Height•Osteoporosis
•F-Cell Distribution•Fetal Hgb Levels•C-Reactive Protein•18 groups of
Framingham Traits•Pigmentation•Uric Acid Levels•Recombination Rate
•Parkinson Disease•Amyotrophic Lat.
Sclerosis•Multiple Sclerosis•Prog. Supranuclear
Palsy•MS Interferon-β
Response •Alzheimer’s Disease•Cognitive Ability•Memory•Restless Legs Syndrome •Nicotine Dependence•Methamphetamine
Depend.•Neuroticism•Schizophrenia•Bipolar Disorder•Family Chaos
•Rheumatoid Arthritis•Systemic Lupus
Erythematosus•Psoriasis
NHGRI Catalog of GWA Studies: http://www.genome.gov/gwastudies/
NHGRI Catalog of GWA Studies: http://www.genome.gov/gwastudies/
• First author/Data/Journal/Study• Disease/Trait• Initial Sample Size• Replication Sample Size• Region• Gene• Strongest SNP – Risk Allele• Risk Allele Frequency in Controls• P-value• OR per copy [95% CI]• Platform and SNPs passing QC
NHGRI Catalog of GWA Studies: http://www.genome.gov/gwastudies/
• First author/Data/Journal/Study• Disease/Trait• Initial Sample Size• Replication Sample Size• Region• Gene• Strongest SNP – Risk Allele• Risk Allele Frequency in Controls• P-value• OR per copy [95% CI]• Platform and SNPs passing QC
NHGRI Catalog of GWA Studies: http://www.genome.gov/gwastudies/
What is a Genome-Wide Association Study?
• Method for interrogating all 10 million variable points across human genome
• Variation inherited in groups, or blocks, so not all 10 million points have to be tested
• Blocks are shorter (so need to test more points) the less closely people are related
• Technology now allows studies in unrelated persons, assuming 5,000 – 10,000 base pair lengths in common (300,000 – 1,000,000 markers)
Association of Alleles and Genotypes of rs1333049 (‘3049) with Myocardial
Infarction C
N (%)G
N (%)2
(1df)P-value
Cases2,132 (55.4)
1,716 (44.6)
55.11.2 x 10-
13Controls
2,783 (47.4)
3,089 (52.6)
Allelic Odds Ratio = 1.38CC
N (%)CG
N (%)GG
N (%)2
(2df) P-value
Cases586
(30.5) 960 (49.9)
378 (19.6)59.7
1.1 x 10-
14Controls
676 (23.0)
1,431 (48.7)
829 (28.2)
Heterozygote Odds Ratio = 1.47
Homozygote Odds Ratio = 1.90
Samani N et al, N Engl J Med 2007; 357:443-453.
Bierut LJ et al, Hum Molec Genet 2007; 16:24-35.
Nicotine Dependence among Smokers
Klein et al, Science 2005; 308:385-389.
P Values of GWA Scan for Age-Related Macular Degeneration
Arking D et al, Nat Genet 2006; 38:644-651.
Genome-Wide Scan for QTc Interval
http://www.broad.mit.edu/diabetes/scandinavs/type2.html
Genome-Wide Scan for Type 2 Diabetes in a Scandinavian Cohort
WTCCC, Nature 2007; 447:661-678.
Wellcome Trust Genome-Wide Association Study of Seven Common
Diseases
Hunter DJ and Kraft P, N Engl J Med 2007; 357:436-439.
“There have been few, if any, similar bursts of discovery in the history of medical research…”
Lessons Learned from Initial GWA Studies
Signals in Previously Unsuspected GenesMacular Degeneration CFHCoronary Disease CDKN2A/2BChildhood Asthma ORMDL3Type II Diabetes CDKAL1QT interval prolongation NOS1AP
Signals in Gene “Deserts”Prostate Cancer 8q24Crohn’s Disease 5p13.1, 1q31.2, 10p21
Signals in CommonDiabetes, CHD, Melanoma, Frailty CDKN2A/2BProstate, Breast, Colorectal Cancer 8q24 regionCrohn’s Disease, Psoriasis IL23RCrohn’s Disease, T1DM PTPN2Rheumatoid Arthritis, T1DM PTPN22
Unique Aspects of GWA Studies
• Permit examination of inherited genetic variability at unprecedented level of resolution
• Permit "agnostic" genome-wide evaluation • Once genome measured, can be related to any trait• Most robust associations in GWA studies have not
been with genes previously suspected of association with the disease
• Some associations in regions not even known to harbor genes
“The chief strength of the new approach also contains its chief problem: with more than 500,000 comparisons per study, the potential for false positive results is unprecedented.”
Hunter DJ and Kraft P, N Engl J Med 2007; 357:436-439.
Larson, G. The Complete Far Side. 2003.
Ways of Dealing with Multiple Testing
• Bonferroni correction: most common, typically p < 10-7 or 10-8
• False discovery rate: proportion of significant associations that are actually false positives
• False positive report probability: probability that the null hypothesis is true, given a statistically significant finding
• Replication, replication, replication
Chanock S, Manolio T, et al, Nature 2007; 447:655-660.
Replication Study #1 3,000 cases / 3,000 controls
Replication Study #22,400 cases / 2,400 controls
Replication Study #3 2,500 cases / 2,500 controls
Initial Study1,150 cases / 1,150 controls
~24,000 SNPs
~1,500 SNPs
200+ New ht-SNPs
>500,000 Tag SNPs
25-50 Loci
Replication Strategy for Prostate Cancer Study in CGEMS
Hoover R, Epidemiology 2007; 18:13-17.
Replication, Replication, ReplicationInitial study: Sufficient description to permit replication
• Sources of cases and controls• Participation rates and flow chart of selection• Methods for assessing affected status• Standard “Table 1” including rates of missing data• Assessment of population heterogeneity• Genotyping methods and QC metricsReplication study: • Similar population, similar phenotype• Same genetic model, same SNP, same direction• Adequately powered to detect postulated effect
Chanock S, Manolio T, et al, Nature 2007; 447:655-660.
Replication Strategy in Easton Breast Cancer Study
• ABCFS• BCST• COPS • GENICA• HBCS• HBCP
Stage Cases Controls SNPs
1 408 400 266,722
2 3,990 3,916 13,023
3 23,734 23,639 31
Final 6
• MEC-W• MEC-J• NHS• PBCS• RBCS• SASBAC
Easton et al, Nature 2007; 447:1087-93.
• SEARCH2
• SEARCH3
• SBCP• SBCS• CNIOBCS• USRT
• TBCS• KConFab/
AOCS• KBCP• LUMCBCS• MCBCS• MCCS
Larson, G. The Complete Far Side. 2003.
Replication Strategy in CGEMS Prostate Cancer Study
Stage Cases Controls SNPs
1 1,172 1,157 527,869
2 3,941 3,964 26,958*
* Selected for p < 0.068
SNP GeneStage 1+2
P-value
Initial Rank
Initial P-
value
rs4962416 MSMB 7 x 10-13 24,223 0.042
rs10896449 11q13 2 x 10-9 2,439 0.004
rs10993994 CTBP2 2 x 10-7 3194 x 10-
4
rs10486567 JAZF1 2 x 10-6 24,407 0.042Thomas et al, Nat Genet 2008; 40:310-15.
Cupples et al, BMC Med Genet 2007; 8(Suppl 1):S1
Genome-wide Association and Cohort Studies
Ridker et al, Clin Chem 2008; 54:249-55.
http://www.ncbi.nlm.nih.gov/sites/entrez
Genetic Association and Clinical Trials
7/2000
6/2006
1/2008
Genome-wide Association and Clinical Trials
5/2007
HLA DRB1*0701 and Transaminase Elevations Following Ximelegatran
Treatment
Kindmark et al, Pharmacogen J 2007; May 15 (on-line)
Genome-wide Association and Clinical Trials
1/2008
5/2007
Summary Points: Genetic Association Studies
• Candidate gene studies enormously prone to spurious associations
• GWA presents new paradigm, is unconstrained by current imperfect understanding of genome structure and function
• Initial findings astoundingly positive• Most are skimming surface of what could be
learned• GWA beginning to be applied to cohort studies• Very little work in genetic association in
clinical trials and treatment response
Larson, G. The Complete Far Side. 2003.