Principles of genetic epidemiology
April 2008 course
The post-genomic era
• Now that the full human genome sequence has been published, we have access to genetic information in an unprecedented manner:
– 3 billion base pairs in the human genome
– c. 22,000 genes
– Tens of thousands of RNAs
– Hundreds of thousands of proteins
• Thus, developments in molecular genetic analysis now make it possible to attempt to identify liability genes in complex, multifactorial traits, and to dissect with new precision the role of genetic predisposition and environment/lifestyle factors in these disorders.
• New technologies and statistical tools are continuously being introduced
• Nonetheless, there is often much hype and little real progress
Genes, developmental history and environment as determinants of health
In complex disease, a person's susceptibility genotype and environmental history combine to establish present health status, and the genotype's norm of reaction determines the future health trajectory.
Characteristics of complex traits
• Trait values are determined by complex interactions among numerous metabolic and physiological systems, as well as demographic and lifestyle factors
• Variation in a large number of genes can potentially influence interindividual variation in trait values
• The impact of any one gene is likely to be small to moderate
• For diseases: monogenic forms that mimic complex diseases typically account for a small fraction of cases (examples in breast cancer, obesity, hypertension, osteoarthritis)
– Example: Ala-Kokko L et al. Single base mutation in the type II procollagen gene (COL2A1) as a cause of primary osteoarthritis associated with a mild chondrodysplasia. PNAS 1990;87:6565-8. The mutation was found in one large family and not elsewhere.
Steps in gene discovery, tracing and evaluation
• Phenotype: clinical definition; define genetic component; identify data sets and data sources
• Study design: case-control (association); family-based (multiplex families/sibpairs/trio-based); whole genome analysis
• Analysis: genotyping; statistical analysis; bioinformatics; variation detection
• Follow-up (gene tracing & evaluation): replication; functional studies; interactions (gene-gene & gene-environment)
Adapted from Fig 16.2 in Genetic Analysis of Complex Disease, Haines & Pericak-Vance, 2nd edition, 2006
Strategies for family studies:
• Does disease or behavior aggregate in families?
• What are the causes of familial aggregation?
• What is the model of genetic inheritance and which genes are responsible?
• How do genes interact with the environment?
Families are the basic unit
How to detect genetic effects and find genes?
• Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases
• Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. 'knockouts')
[Figure: map of microsatellite markers along chromosome 1, from D1S1597 to ATA29C07]
What is heritability?
Heritability is the proportion of the total variance of a trait, or of the liability to a disease, that is accounted for by genetic variance, i.e. interindividual genetic differences.
Genetic variance may arise from additive effects of different alleles at a locus, from dominance (interactions between alleles at the same locus), or from epistasis (interactions between loci).
Heritability is a characteristic of populations, not of individuals or families, and it is affected by both genetic and environmental effects.
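Conceptual model of an individual's phenotype
$$Y = \mu + G + Env, \qquad Env = C + E$$
Hence, assuming G, C and E are uncorrelated and do not interact, the variance can be decomposed:
$$\sigma^2 = \sigma^2_G + \sigma^2_C + \sigma^2_E$$
Heritability is $\sigma^2_G/\sigma^2$, and the genetic variance has several components:
$$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I$$
where A, D and I are additive, dominance and epistatic (gene-gene interaction) effects, and C and E are shared and unique environmental effects.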
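As a minimal sketch of this decomposition, the function below turns variance components into heritability. The component values are invented for illustration. It reports both the broad-sense ratio $\sigma^2_G/\sigma^2$ used on the slide and the narrow-sense ratio $\sigma^2_A/\sigma^2$, a standard term that the slide does not use.

```python
def heritability(var_a, var_d, var_i, var_c, var_e):
    """Broad- and narrow-sense heritability from variance components.

    var_a, var_d, var_i: additive, dominance and epistatic genetic variance
    var_c, var_e: shared and unique environmental variance
    Assumes the components simply add up (no G-E correlation or interaction).
    """
    var_g = var_a + var_d + var_i        # total genetic variance
    var_total = var_g + var_c + var_e    # total phenotypic variance
    if var_total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "broad": var_g / var_total,      # sigma2_G / sigma2
        "narrow": var_a / var_total,     # sigma2_A / sigma2
    }


if __name__ == "__main__":
    # Illustrative (made-up) components: A=4, D=1, I=0, C=2, E=3
    print(heritability(4.0, 1.0, 0.0, 2.0, 3.0))
    # {'broad': 0.5, 'narrow': 0.4}
```

The gap between the two numbers is the non-additive genetic variance, which is one reason twin models separate A from D.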
FAMILY STUDY
• Provides estimates of the degree of family aggregation
• Risks to siblings, parents and offspring, as well as to other relatives, can be estimated
• Similarity of different types of relatives can permit modelling of genetic versus non-genetic familial influences
• To disentangle genes and experience, we study special family groups:
– family members sharing experiences but differing in shared genes, e.g. twin studies, or
– family members sharing genes but differing in their shared experience, e.g. adoption studies
ADOPTION DESIGN
• Test for association between trait in adoptees and trait in biological parents (genetic correlation), and
• Test for association between trait in adoptees and trait in adoptive parents
STRENGTHS: relatively powerful
WEAKNESSES:
(1) poor generalizability
(2) adoptive parents are likely to provide 'good homes'
(3) biological parents of adopted children may have had multiple forms of psychopathology (selection)
(4) poor characterization of phenotypes of biological parents
The Classical Twin Study
• Monozygotic (MZ) pairs are genetically alike
• Dizygotic (DZ) pairs, like siblings, share on average half of their segregating genes
• DZ pairs can be same-sex or opposite-sex (male-female)
• Increased similarity of twin pairs compared with unrelated subjects suggests familial factors
• Increased similarity of MZ pairs compared with DZ pairs provides evidence for genetic factors
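A classical back-of-the-envelope way to turn the MZ/DZ contrast into numbers is Falconer's formulas. The sketch below assumes a simple ACE model (no dominance, equal environments for MZ and DZ pairs); it is a rough shortcut, not the model fitting used in the studies reported here. The example plugs in the male BMI-level correlations from the Finnish Twin Cohort table in this section.

```python
def falconer(r_mz, r_dz):
    """Rough ACE variance shares from MZ and DZ twin correlations.

    Assumes r_MZ = a2 + c2 and r_DZ = a2/2 + c2 (no dominance, equal
    environments for MZ and DZ pairs, random mating).
    """
    a2 = 2.0 * (r_mz - r_dz)  # additive genetic share
    c2 = 2.0 * r_dz - r_mz    # shared environmental share
    e2 = 1.0 - r_mz           # unique environment (plus measurement error)
    return {"a2": a2, "c2": c2, "e2": e2}


if __name__ == "__main__":
    # BMI-level twin correlations for men in the Finnish Twin Cohort table
    shares = falconer(0.79, 0.44)
    print({k: round(v, 2) for k, v in shares.items()})
    # {'a2': 0.7, 'c2': 0.09, 'e2': 0.21}
```

For women (r_MZ = 0.83, r_DZ = 0.39) the same formulas give a slightly negative c2 (about -0.05), because r_DZ is below half of r_MZ; that pattern points towards dominance rather than shared environment, and it is one reason formal model fitting is preferred over these shortcuts.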
The classical twin study modelling
• Model the contributions of additive (A) and non-additive (D) genetic effects, environmental effects shared by family members (C) and unshared effects (E), i.e. effects unique to each family member
• Competing models, e.g. E, AE and ACE, can be statistically compared and tested against the observed data
• Mx, a statistical program created by Mike Neale, is the most commonly used program for genetic modelling: http://views.vcu.edu/mx/
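To show what such models fit, here is a minimal sketch (not Mx) of the expected covariance of a twin pair under an ACE or ADE model: MZ pairs share all of A and D, DZ pairs share on average half of A and a quarter of D, and C is shared by both. A likelihood-ratio test then compares nested models such as ACE vs AE. The log-likelihood values in the example are hypothetical. When the dropped parameter is a variance fixed at zero (a boundary), the plain 1-df chi-square p-value is conservative.

```python
import math


def expected_twin_cov(a2, c2, e2, d2=0.0, zygosity="MZ"):
    """Expected 2x2 covariance matrix of a twin pair's phenotypes.

    a2, c2, e2, d2: additive, shared-environment, unique-environment and
    dominance variances. With MZ/DZ pairs alone, C and D cannot both be
    estimated, so an ACE or an ADE model is fitted, not ACDE.
    """
    variance = a2 + c2 + e2 + d2
    if zygosity == "MZ":
        covariance = a2 + d2 + c2               # all genes shared
    elif zygosity == "DZ":
        covariance = 0.5 * a2 + 0.25 * d2 + c2  # half of A, quarter of D
    else:
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    return [[variance, covariance], [covariance, variance]]


def lrt_1df(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic and 1-df chi-square p-value for nested models."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    print(expected_twin_cov(0.5, 0.25, 0.25, zygosity="MZ"))  # [[1.0, 0.75], [0.75, 1.0]]
    print(expected_twin_cov(0.5, 0.25, 0.25, zygosity="DZ"))  # [[1.0, 0.5], [0.5, 1.0]]
    # Hypothetical log-likelihoods for an ACE model and the nested AE model
    print(lrt_1df(-1520.3, -1521.1))
```

A program such as Mx estimates the variance components by maximizing the likelihood of the observed MZ and DZ data under these expected matrices.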
Different phenotypes, different effects of genes
Phenotype | Genetic effects | Non-genetic family effects
Experimentation (age 12) | 11% | 73%
Initiation/ever smoker (adolescents) | 20-36% | 18-59%
Initiation/ever smoker (adults) | 28-80% | 4-50%
Persistence/cessation | 58-71% | None
Nicotine dependence (FTND or DSM-IV) | 60-75% | None
Extensions of the classical twin study I
• Effect modification by age, sex and environmental factors, e.g. smoking or obesity
• Assess genetic covariance over time through longitudinal models
• Assess sex effects by comparing same-sex and opposite-sex DZ pairs
• Assess social interaction effects
Genetic Influences on Change in BMI: A longitudinal study of Finnish twins
J.v.B. Hjelmborg, C. Fagnani, K. Silventoinen, M. McGue, M. Korkeila, K. Christensen, A. Rissanen, J. Kaprio
Finnish Twin Cohort
• Twins born 1930-1955 participating in three surveys in 1975, 1981 and 1990
• Weight and height asked in each questionnaire
• 10556 twins answered all questionnaires
• Same-sex pairs
• Age at baseline 20-45 y
Latent growth model for weight change in adults, 1975-1990
Measure | Males (95% CI) | Females (95% CI)
N of pairs | 499 MZ; 1013 DZ | 735 MZ; 1265 DZ
MZ correlation of BMI level | 0.79 (0.79, 0.80) | 0.83 (0.82, 0.83)
DZ correlation of BMI level | 0.44 (0.44, 0.45) | 0.39 (0.38, 0.39)
MZ correlation of weight gain | 0.60 (0.56, 0.68) | 0.65 (0.61, 0.71)
DZ correlation of weight gain | 0.26 (0.24, 0.32) | 0.30 (0.28, 0.32)
Heritability of BMI level | 0.80 (0.79, 0.80) | 0.82 (0.81, 0.84)
Heritability of rate of weight gain | 0.58 (0.50, 0.69) | 0.64 (0.58, 0.69)
Additive genetic correlation of BMI level with rate of weight gain | -0.070 (-0.13, -0.068) | 0.041 (0.00, 0.076)
Unique environmental correlation of BMI level with rate of weight gain | 0.0094 (-0.020, 0.091) | 0.24 (0.14, 0.34)
Genetic modeling results for the latent growth curve model of BMI, Finnish Twin Cohort 1975-1990
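In the study, the rate of weight gain was a latent slope estimated inside a growth-curve model fitted jointly to twin pairs. A much simpler two-stage approximation, sketched below, computes BMI from the self-reported weight and height and then fits one least-squares slope per person across the 1975, 1981 and 1990 surveys; those per-person slopes could then be correlated within MZ and DZ pairs. This is not the authors' method, and the weights in the example are invented.

```python
SURVEY_YEARS = [1975, 1981, 1990]  # survey waves of the Finnish Twin Cohort


def bmi(weight_kg, height_m):
    """Body mass index from (self-reported) weight and height."""
    return weight_kg / height_m ** 2


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical person, 1.75 m tall, weighing 70, 74 and 80 kg at the three surveys
    bmis = [bmi(w, 1.75) for w in (70.0, 74.0, 80.0)]
    print([round(b, 1) for b in bmis])
    print(round(ols_slope(SURVEY_YEARS, bmis), 3), "BMI units per year")
```

A two-stage approach ignores the uncertainty in each individual slope, which is one advantage of estimating level and slope jointly in a latent growth model.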
Summary of findings
• A longitudinal growth curve model provides better estimates of heritability:
– c. 80% for adult BMI
– c. 60% for rate of weight gain over a 15-year period in young to middle-aged adults
• Genetic influences on baseline BMI and on rate of weight gain are weakly, if at all, correlated
• Genes regulating weight gain and loss are likely to be different from those affecting BMI
• Environmental effects on weight change appear to be larger than on BMI
Extensions of the classical twin study II
• Define phenotypes by assessing the combination of signs and symptoms with the highest heritability
– for example, broad vs. narrow definitions of low back pain (LBP)
• Define the natural history of disease by assessing the genetic communality of different stages
– for example, initiation, persistence and dependence in smoking
• Common genetic pathways across phenotypes
– for example, hip, knee and hand osteoarthritis (OA); bone density in weight-bearing and non-weight-bearing bones
How to detect genetic effects and genes?
• Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. 'knockouts', 'knock-ins')
• Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases
Increasing the genetic signal in the data...
• Ascertain pedigree units that are likely to segregate genes of relevance
– Ex: pedigrees with quasi-Mendelian disease transmission
– affected sib-pair approach of linkage analysis
• Ascertain families on the basis of individuals with extreme or remarkable phenotypes
– Ex: extremely discordant sibpairs
– ascertain young individuals with the disease
• Ascertain individuals from isolated populations:
– more homogeneous genetically, and culturally as well
• Ascertain intermediate phenotypes
– a physiological phenotype is "closer" to sequence variants
... At the cost of representativeness and ability to evaluate population risk
ISOLATED POPULATION
• Wonderfully isolated Finnish population
– Small number of founders
– Subsequent isolation
– Rapid expansion
– Major bottlenecks
→ Genetic drift has moulded the gene pool
• Genetic homogeneity, longer linkage disequilibrium (LD) blocks
• Valuable for genetic studies, especially of monogenic diseases
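Linkage disequilibrium is the non-random association of alleles at nearby loci; "longer LD blocks" means that such association extends over longer stretches of the genome. The sketch below computes the standard pairwise measures D, |D'| and r² from haplotype and allele frequencies. It assumes those frequencies are already known; in practice haplotype frequencies are usually estimated from unphased genotypes (e.g. by an EM algorithm). The example frequencies are invented.

```python
def ld_measures(p_ab, p_a, p_b):
    """D, |D'| and r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of alleles A and B (must be strictly between 0 and 1)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from independence of the two loci
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # D scaled to its maximum possible value
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))  # squared allelic correlation
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_measures(0.2, 0.3, 0.4))  # partial LD
    print(ld_measures(0.3, 0.3, 0.4))  # |D'| = 1: allele A only occurs with B
```

|D'| reflects the recombination history between two loci, whereas r² governs how well one marker can stand in for another in an association study.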
Two basic analysis strategies
1. Candidate gene analysis (motto: study a few good genes)
2. Whole-genome searches (genome scans) (motto: cast out a net that catches all the big fish)
Candidate Gene Studies
• Statistically straightforward: test the association between genotypes and phenotype with contingency tables, chi-square tests or regression
• Principle: if an allele is more frequent in affected than in unaffected individuals, the gene may be, or be close to, a disease gene
• Candidacy of a gene can come from a number of different sources:
– biological insights (e.g. gene expressed in a certain tissue)
– homology to other genes
– functional studies in model organisms
– member of a relevant gene family
• Challenge: greater biological understanding of the genes is needed
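The contingency-table approach can be sketched as an allelic 2x2 chi-square test with an odds ratio, counting two alleles per person. This is a minimal illustration with invented counts; the allele-count test assumes Hardy-Weinberg equilibrium, and genotype-based analyses (e.g. a Cochran-Armitage trend test or logistic regression) are common alternatives.

```python
import math


def allelic_association(case_alleles, control_alleles):
    """Allelic 2x2 chi-square test (1 df) and odds ratio.

    case_alleles, control_alleles: (count of allele 1, count of allele 2),
    counting two alleles per person. All four counts must be non-zero.
    """
    table = [list(case_alleles), list(control_alleles)]
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # under no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)  # odds of allele 1 in cases vs controls
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical counts: 100 cases and 100 controls (200 alleles each)
    chi2, p, or_ = allelic_association((60, 140), (40, 160))
    print(round(chi2, 2), round(p, 3), round(or_, 2))  # 5.33 0.021 1.71
```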
Allelic association studies test whether alleles are associated with the trait
• Two types of association tests:
– population-based association tests: cases and controls are unrelated; cross-classify by genotype; use χ² test, ANOVA or logistic regression
– family-based association tests (e.g. the transmission/disequilibrium test, TDT): cases and controls are related (parents, sibs, etc.); often based on allele transmission rates
• Multivariate/data reduction approaches
– multiple regression of all SNPs in a gene
– haplotype analyses
– false discovery rate and replication rather than p-values
• Pathway analyses
– combination of individual SNPs/genes and pathway constraints
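The TDT mentioned above can be sketched in a few lines: among heterozygous parents of affected children, it compares how often a given allele is transmitted versus not transmitted. Because the untransmitted alleles act as controls drawn from the same parents, differences in ancestry between families cannot create a spurious result. The counts in the example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission/disequilibrium test for one allele.

    transmitted: times heterozygous parents transmitted the allele of
        interest to an affected child
    untransmitted: times they transmitted the other allele
    Returns the McNemar-type statistic (b - c)^2 / (b + c) and its 1-df p-value.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical trio data: 60 transmissions vs 40 non-transmissions
    stat, p = tdt(60, 40)
    print(stat, round(p, 4))  # 4.0 0.0455
```

Only heterozygous parents are informative, which is part of why family-based designs are usually less efficient than case-control designs of the same size.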
The 3 possible causes for association
• Best: the allele increases disease susceptibility
– candidate gene studies
• Good: some subjects share a common ancestor
– linkage disequilibrium studies
• Bad: association due to population stratification
– family-based designs offer protection
[Figure: alleles A1 and d at loci M and K]
Slide by Steven Horwath, 2003
POPULATION STRATIFICATION: hypothetical example (by Andrew Heath)
Northern European ancestry (N=200): no association
Diet | NOT A1 allele | A1 allele
Non-Med diet | 162 (90%) | 18 (90%)
Med diet | 18 (10%) | 2 (10%)
Share of group | 90% | 10%
Southern European ancestry (N=200): no association
Diet | NOT A1 allele | A1 allele
Non-Med diet | 35 (25%) | 15 (25%)
Med diet | 105 (75%) | 45 (75%)
Share of group | 70% | 30%
Mingled in Australian population (N=400)
Diet | NOT A1 allele | A1 allele
Non-Med diet | 197 | 33
Med diet | 123 | 47
In the mingled population we would falsely infer that the A1 allele is a risk factor for following a traditional Mediterranean diet: OR = 2.28, 95% CI 1.39-3.73.
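The arithmetic of this hypothetical example can be checked directly. The sketch below reads the counts off the stratified tables (exposure = carrying the A1 allele, outcome = Mediterranean diet). It shows that the crude odds ratio in the pooled sample is about 2.28, while the odds ratio is exactly 1 within each ancestry group and the Mantel-Haenszel odds ratio stratified on ancestry is also 1. In real studies ancestry is often not recorded, which is what motivates family-based designs.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a: A1 carriers on a Mediterranean diet, b: A1 carriers not on it,
    c: non-A1 on the diet, d: non-A1 not on it.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over strata of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Counts from the hypothetical example, as (a, b, c, d)
NORTHERN = (2, 18, 18, 162)
SOUTHERN = (45, 15, 105, 35)
POOLED = tuple(x + y for x, y in zip(NORTHERN, SOUTHERN))  # (47, 33, 123, 197)

if __name__ == "__main__":
    print(round(odds_ratio(*POOLED), 2))                      # 2.28 (crude, confounded)
    print(odds_ratio(*NORTHERN), odds_ratio(*SOUTHERN))       # 1.0 1.0
    print(round(mantel_haenszel_or([NORTHERN, SOUTHERN]), 2))  # 1.0
```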
Population-based versus family-based association tests
• Family-based association tests avoid confounding due to ethnic stratification
– These designs automatically match "controls" to cases on ethnic ancestry
• Conventional wisdom:
– family-based designs are generally less efficient than designs based on unrelated control subjects
– population admixture effects are negligible
• Non-conventional wisdom:
– family controls are better matched for environmental exposures
– cryptic relatedness may be an important issue in population isolates
Pathway approach
Hung et al., Cancer Epidemiol Biomarkers Prev 2004; Conti et al., Human Heredity 2003
Family-based Genome Scans
• Involve anonymous markers, no candidate genes
• Hundreds of evenly spaced genetic markers in the genome
• Often hundreds of related individuals in small to large families
• Linkage analysis is a statistical method to draw inferences about the co-transmission of marker locus alleles and trait-influencing alleles
• Identifies chromosomal regions harboring the genes predisposing to a trait (such as nicotine dependence)
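The core quantity in parametric linkage analysis is the LOD score: the log10 odds that a marker and a trait locus are linked with recombination fraction θ, versus unlinked (θ = 0.5). The sketch below handles only the textbook case of phase-known, fully informative meioses counted directly; real genome scans of complex traits use multipoint and/or model-free (e.g. affected sib-pair) methods in dedicated software. The counts are hypothetical. A LOD of 3 or more is the classical threshold for significant linkage, while the table of smoking genome scans in this section lists LOD scores of 2 or more.

```python
import math


def lod(recombinants, nonrecombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ]
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    r, nr = recombinants, nonrecombinants
    log_linked = r * math.log10(theta) + nr * math.log10(1.0 - theta)
    log_unlinked = (r + nr) * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants, nonrecombinants):
    """Recombination fraction maximizing the LOD score, and the maximum LOD."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("no informative meioses")
    theta_hat = min(recombinants / n, 0.5)
    if theta_hat == 0.0:
        # No recombinants: the LOD is maximized at theta = 0 and equals n*log10(2)
        return 0.0, n * math.log10(2.0)
    return theta_hat, lod(recombinants, nonrecombinants, theta_hat)


if __name__ == "__main__":
    print(round(lod(1, 9, 0.1), 2))         # 1.6
    print(round(max_lod(0, 10)[1], 2))      # 3.01
```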
[Figure: co-transmission of disease and alleles in a pedigree; family members carry genotypes Aa or aa]
Chromosome | Phenotype | LOD ≥ 2 / p-value | Author and year | Country | Number of families and individuals
2 | FTQ | 2.61 | Straub et al. 1999 | New Zealand | 130 families, 343 individuals
2 | FTQ | 2.53 | Sullivan et al. 2004 | New Zealand | 129 families
5 | FTND | 3.04 | Gelernter et al. 2007 | US | 634 small nuclear families
6 | FTND | 2.70 | Swan et al. 2006 | US | 158 nuclear families, 607 individuals
7 | FTND | 2.70 | Swan et al. 2006 | US | 158 nuclear families, 607 individuals
7 | FTND | 2.73 | Gelernter et al. 2007 | US | 634 small nuclear families
7 | FTND | 2.50 | Loukola et al. 2007 | Finland | 153 families, 505 individuals
8 | FTND | 2.7 | Swan et al. 2006 | US | 158 nuclear families, 607 individuals
10 | HSI | 4.17 | Li et al. 2006 | US (AA) | 402 nuclear families, 1261 individuals
10 | FTQ | 2.43 | Straub et al. 1999 | New Zealand | 130 families, 343 individuals
10 | FTQ | 2.02 | Sullivan et al. 2004 | New Zealand | 129 families
11 | FTND | 2.31 | Li et al. 2006 | US (AA) | 402 nuclear families, 1261 individuals
11 | HSI | 2.15 | Li et al. 2006 | US (AA) | 402 nuclear families, 1261 individuals
17 | FTND | p = 0.009* | Lou et al. 2007 | US (EA) | 200 families, 671 individuals
AA = African-American sample; EA = European-American sample; * Lou et al. 2007 reported a p-value. FTQ = Fagerström Tolerance Questionnaire; FTND = Fagerström Test for Nicotine Dependence; HSI = Heaviness of Smoking Index.
SAMPLE COLLECTION
• Index cases are twins from pairs concordant for heavy smoking, based on earlier questionnaires from the Finnish Twin Cohorts
• 1293 families (twin pairs) invited
• 762 families recruited, with 2412 family members (1278 men, 1134 women)
• Data collection complete for 2143 persons: interview, blood sample, informed consent
STUDY SAMPLE
• Identified Finnish families with DZ smoking twins
– Also invited siblings and parents to participate
• 153 affected twin-pair families, 505 individuals
• On average 3 individuals per family (range 2-9)
Phenotype definitions
1. Smoker (smoked ≥100 cigarettes during lifetime)
2. Nicotine dependent (Fagerström Test for Nicotine Dependence, FTND)
3. Nicotine dependent (DSM-IV)
4. Alcohol use (drinking with the aim of intoxication)
5. Comorbid phenotype of FTND and alcohol use
[Figure: Chromosome 11, nicotine withdrawal: LOD score by cM position in Finnish and Australian families]
Chromosome 11: candidate genes for nicotine withdrawal in Finnish and Australian families
1. DRD4
2. TH
3. CHRNA10
4. TPH1
5. ANKK1/DRD2, HTR3A, HTR3B
Genome-wide Case-Control Analyses
• Involve anonymous markers, no candidate genes
• Chips of 300,000 to 1,000,000 SNPs on a single array (Illumina, Affymetrix)
• Hundreds to thousands of cases and unrelated controls
• High-throughput genotyping of common SNPs, such as those identified by the HapMap project
• Over the past two years, many new genes for common diseases have been identified
• Two recent GWAs on nicotine dependence (Uhl et al, 2007; Bierut et al, 2007)
• A new GWA on smoking cessation (Uhl G, et al, Arch Gen Psychiatry, in press) finds genes with very little overlap with earlier GWAs on nicotine dependence
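With 300,000 to 1,000,000 SNPs tested, a per-test threshold of 0.05 would produce thousands of false positives. The sketch below shows two standard corrections: the Bonferroni per-SNP threshold (close to the widely used genome-wide threshold of 5 × 10⁻⁸ for one million tests) and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate mentioned on the earlier slide on association tests. The p-values in the example are hypothetical. Bonferroni is conservative here because SNPs in LD are correlated tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level controlling the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value meets its threshold
    return sorted(order[:k_max])


if __name__ == "__main__":
    for n_snps in (300_000, 1_000_000):
        print(n_snps, bonferroni_threshold(0.05, n_snps))
    print(benjamini_hochberg([0.04, 0.001, 0.3, 0.02], q=0.05))  # [1, 3]
```

Because thresholds this strict still leave false positives among the top hits, independent replication remains the main safeguard.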
Bioinformatics processing of existing information to discover biological pathways (Li C-Y et al, PLoS Comput Biol 2008)
Integration of information at different levels
Developments in molecular genetics now make it possible to attempt to identify liability genes in complex, multifactorial traits, and to dissect with new precision the role of genetic predisposition and environment/lifestyle factors in these disorders. However, an integrative framework is needed.
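Complex picture (Gottesman I, Science 1997)
[Figure: genes (G1 to G4) and environments (E1 to E4), including gene-environment interplay (GE), act through endophenotypes (P1 to P5) on the outcome phenotype (P); measured genotypes (G') and measured environments (E') are indicated, with time as an open question. Eaves et al., 2005]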
The information available from genetic data will increase (slide from Steven Horwath)
Past:
• microsatellites
• incomplete knowledge of variants
• function barely known
• linkage analysis
Connecting past and future: genetic map, candidate genes, new technology, statistical methods
Present and future:
• millions of bi-allelic SNPs
• all common genetic variants known
• common function known
• fast genotyping, sequencing, mutation detection
• linkage disequilibrium tests
To summarize
• Complex disease gene mapping is starting to fulfill its promise
• The distinction between candidate gene studies and whole-genome scans diminishes as genotyping costs decrease
• When collecting pedigrees enriched for affected individuals, always collect DNA from good controls as well
• Put effort into high-quality and detailed phenotyping:
– multiple, longitudinal measures
– use intermediate, physiological phenotypes as traits
– imaging, metabolomics
– gene expression and protein array measurements
Useful reading
• JL Haines, MA Pericak-Vance. Genetic Analysis of Complex Disease. Wiley, 2006
• DC Thomas. Statistical Methods in Genetic Epidemiology. Oxford, 2004
• MJ Khoury. Human Genome Epidemiology. Oxford, 2003