principles of genetic epidemiology april 2008 course

Principles of genetic epidemiology

April 2008 course

The post-genomic era

• Now that the full human genome sequence has been published, we have access to genetic information in an unprecedented manner:
– 3 billion base pairs in the human genome
– c. 22 000 genes
– Tens of thousands of RNAs
– Hundreds of thousands of proteins

• Thus, developments in molecular genetic analysis render it now possible to attempt identification of liability genes in complex, multifactorial traits, and to dissect out with new precision the role of genetic predisposition and environment/life style factors in these disorders.

• New technologies and statistical tools are continuously introduced

• Nonetheless, often much hype and little real progress

Genes, developmental history and environment as determinants of health

In complex disease, a person's susceptibility genotype and environmental history combine to establish present health status, and the genotype's norm of reaction determines future health trajectory.

Characteristics of complex traits

Trait values are determined by complex interactions among numerous metabolic and physiological systems, as well as demographic and lifestyle factors

Variation in a large number of genes can potentially influence interindividual variation of trait values

The impact of any one gene is likely to be small to moderate in size

For diseases: Monogenic diseases that mimic complex diseases typically account for a small fraction of disease cases (examples in breast cancer, obesity, hypertension, osteoarthritis)

Example: Ala-Kokko L et al. Single base mutation in the type II procollagen gene (COL2A1) as a cause of primary osteoarthritis associated with a mild chondrodysplasia. PNAS 1990;87:6565-8. One large family, mutation not found otherwise.

Steps in gene discovery, tracing and evaluation

Phenotype:
– Clinical definition
– Define genetic component
– Identify data sets and data sources

Study design:
– Case-control (association)
– Family-based (multiplex families / sibpairs / trio-based)
– Whole genome analysis

Analysis:
– Genotyping
– Statistical analysis
– Bioinformatics
– Variation detection

Follow-up (gene tracing & evaluation):
– Replication
– Functional studies
– Interactions (gene-gene & gene-environment)

Adapted from Fig 16.2 in Genetic Analysis of Complex Disease, Haines & Pericak-Vance, 2nd edition, 2006

Strategies for family studies:

• Does disease or behavior aggregate in families?

• What are the causes of familial aggregation?

• What is the model of genetic inheritance and which genes are responsible?

• How do genes interact with the environment?

Families are the basic unit

How to detect genetic effects and find genes?

Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases

Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. ‘knockouts’)

[Figure: list of chromosome 1 markers, D1S1597 to ATA29C07]

What is heritability

Heritability is the estimated proportion of the total variance of a trait, or of liability to a disease, that is accounted for by genetic variance, i.e. interindividual genetic differences.

Genetic variance may arise from additive effects of different alleles at a locus, or from dominance, i.e. interactions between alleles at the same locus.

Heritability is a characteristic of populations, not of individuals or families, and it is affected by both genetic and environmental effects.

Conceptual model of individual’s phenotype

$Y = \mu + G + Env$

where $Env = C + E$

Hence, variance can be decomposed:

$\sigma^2 = \sigma^2_G + \sigma^2_C + \sigma^2_E$

Heritability is $\sigma^2_G / \sigma^2$, and genetic variance has several components:

$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I$
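The variance decomposition above can be turned into a small calculation. The sketch below is only an illustration: the function name and the variance values are hypothetical, and the split into broad-sense heritability (all genetic variance over total variance, as defined on the slide) and narrow-sense heritability (additive variance only over total variance) is the standard textbook convention rather than something stated on the slide.

```python
def heritability(var_a, var_d, var_i, var_c, var_e):
    """Return (broad-sense, narrow-sense) heritability from variance components.

    var_a, var_d, var_i: additive, dominance and interaction genetic variance
    var_c, var_e: shared and unique environmental variance
    """
    var_g = var_a + var_d + var_i        # total genetic variance
    var_total = var_g + var_c + var_e    # total phenotypic variance
    if var_total <= 0:
        raise ValueError("total variance must be positive")
    return var_g / var_total, var_a / var_total


# Hypothetical variance components, chosen only for illustration
broad, narrow = heritability(var_a=0.40, var_d=0.10, var_i=0.0,
                             var_c=0.20, var_e=0.30)
print(f"broad-sense = {broad:.2f}, narrow-sense = {narrow:.2f}")
```

Because the result is a ratio, the same genetic variance gives a different heritability when the environmental variance changes, which is why heritability describes a population rather than an individual.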

FAMILY STUDY

• Provides estimates of the degree of family aggregation

• Risks to siblings, parents, offspring as well as to other relatives can be estimated

• Similarity of different types of relatives can permit modelling of genetic versus non-genetic familial influences

• To disentangle genes and experience, we study special family groups:
– either family members sharing experiences but differing in shared genes, e.g. twin studies, or
– family members sharing genes, but differing in their shared experience, e.g. adoption studies

ADOPTION DESIGN

Test for association between trait in adoptees and trait in biological parents (genetic correlation) &
test for association between trait in adoptees and trait in adoptive parents.

STRENGTHS: relatively powerful

WEAKNESSES:
(1) poor generalizability
(2) adoptive parents likely to provide ‘good homes’
(3) biological parents of adopted children may have had multiple forms of psychopathology - selection
(4) poor characterization of phenotypes of biological parents

The Classical Twin Study

• Monozygotic (MZ) pairs are genetically alike
• Dizygotic (DZ) pairs, like siblings, share on average half of their segregating genes
• DZ pairs can be same-sexed or opposite-sex (male-female)
• Increased similarity of twin pairs compared to unrelated subjects suggests familial factors
• Increased similarity of MZ pairs compared to DZ pairs provides evidence for genetic factors

The classical twin study modelling

• Model contribution of additive (A) and non-additive (D) genetic effects, environmental effects shared by family members (C) and unshared effects (E) (i.e. unique to each family member)
• Competing models, e.g. E, AE, ACE can be statistically compared and tested against actual data
• Mx – statistical program created by Mike Neale most commonly used in genetic modelling: http://views.vcu.edu/mx/
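To make the A, D, C and E components concrete, the following sketch computes the twin correlations that such a model predicts. It is not Mx code and not the fitting procedure itself; the function name and the input values are hypothetical. It assumes standardized components summing to 1 and the usual expectations that MZ pairs share all genetic effects while DZ pairs share on average half of the additive and a quarter of the dominance effects.

```python
def expected_twin_correlations(a2, c2, e2, d2=0.0):
    """Expected MZ and DZ correlations for standardized variance components.

    a2: additive genetic, d2: dominance genetic,
    c2: shared environment, e2: unique environment.
    """
    total = a2 + d2 + c2 + e2
    if abs(total - 1.0) > 1e-9:
        raise ValueError("standardized components must sum to 1")
    # MZ pairs share all genetic effects and the shared environment
    r_mz = a2 + d2 + c2
    # DZ pairs share half of A, a quarter of D, and the shared environment
    r_dz = 0.5 * a2 + 0.25 * d2 + c2
    return r_mz, r_dz


# Hypothetical ACE model: 50% A, 20% C, 30% E
r_mz, r_dz = expected_twin_correlations(a2=0.5, c2=0.2, e2=0.3)
print(f"ACE: r_MZ = {r_mz:.3f}, r_DZ = {r_dz:.3f}")

# Hypothetical ADE model: 40% A, 30% D, 30% E
r_mz, r_dz = expected_twin_correlations(a2=0.4, c2=0.0, e2=0.3, d2=0.3)
print(f"ADE: r_MZ = {r_mz:.3f}, r_DZ = {r_dz:.3f}")
```

Model fitting works in the opposite direction: it searches for the component values whose predicted correlations (or covariances) best match the observed ones, and compares nested models such as AE against ACE.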

Different phenotypes, different effects of genes

Phenotype                               Genetic effects   Non-genetic family effects
Experimentation (age 12)                11%               73%
Initiation/ever smoker (adolescents)    20-36%            18-59%
Initiation/ever smoker (adults)         28-80%            4-50%
Persistence/cessation                   58-71%            None
Nicotine dependence (FTND or DSM-IV)    60-75%            None

Extensions of the classical twin study I

• Effect modification by age, sex and environmental factors, e.g. smoking or obesity

• Assess genetic covariance over time through longitudinal models

• Assess sex effects by comparison of like-sexed and opposite-sex DZ pairs

• Assess social interaction effects

Genetic Influences on Change in BMI

A longitudinal study of Finnish twins

J.v.B. Hjelmborg, C. Fagnani, K. Silventoinen, M. McGue, M. Korkeila, K. Christensen, A. Rissanen, J. Kaprio

Finnish Twin Cohort

• Twins born 1930-1955 participating in three surveys in 1975, 1981 and 1990

• Weight and height asked in each questionnaire

• 10556 twins answered all questionnaires

• Same sex pairs

• Age at baseline 20-45 y

Latent growth model for weight change in adults 1975-1990

Genetic modeling results for latent growth curve model of BMI, Finnish Twin Cohort 1975 – 1990

                                                         Males (95% CI)            Females (95% CI)
N of pairs                                               499 MZ, 1013 DZ           735 MZ, 1265 DZ
MZ correlation of BMI level                              0.79 (0.79, 0.80)         0.83 (0.82, 0.83)
DZ correlation of BMI level                              0.44 (0.44, 0.45)         0.39 (0.38, 0.39)
MZ correlation of weight gain                            0.60 (0.56, 0.68)         0.65 (0.61, 0.71)
DZ correlation of weight gain                            0.26 (0.24, 0.32)         0.30 (0.28, 0.32)
Heritability of BMI level                                0.80 (0.79, 0.80)         0.82 (0.81, 0.84)
Heritability of rate of weight gain                      0.58 (0.50, 0.69)         0.64 (0.58, 0.69)
Add. genetic correlation of BMI levels
  with rate of weight gain                               -0.070 (-0.13, -0.068)    0.041 (0.00, 0.076)
Unique environmental correlation of BMI levels
  with rate of weight gain                               0.0094 (-0.020, 0.091)    0.24 (0.14, 0.34)
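As a rough cross-check of the table, the MZ and DZ correlations of BMI level (0.79 and 0.44 in males, 0.83 and 0.39 in females) can be fed into Falconer's approximation. This is not the latent growth curve model used in the study, only a back-of-the-envelope sketch; the function and variable names are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Falconer's approximation from twin correlations.

    Returns (h2, c2, e2): additive genetic, shared environmental and
    unique environmental proportions of variance.
    """
    h2 = 2 * (r_mz - r_dz)   # twice the MZ-DZ difference
    c2 = 2 * r_dz - r_mz     # what remains of the MZ correlation
    e2 = 1 - r_mz            # whatever makes MZ twins differ
    return h2, c2, e2


# MZ and DZ correlations of BMI level, taken from the table
bmi_level = {"males": (0.79, 0.44), "females": (0.83, 0.39)}

for sex, (r_mz, r_dz) in bmi_level.items():
    h2, c2, e2 = falconer(r_mz, r_dz)
    print(f"{sex}: h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

The approximate values differ somewhat from the model-based heritabilities in the table (0.80 and 0.82). A negative c2, as obtained here for females, is usually read as a sign that shared environment is negligible or that non-additive genetic effects are present, which is one reason formal model fitting is preferred over the approximation.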

Summary of findings

• A longitudinal growth curve model provides better estimates of heritability
– c. 80% for adult BMI
– c. 60% for rate of weight gain over a 15-year period in young to middle-aged adults
• Genetic influences on baseline BMI and on rate of weight gain are weakly, if at all, correlated
• Genes regulating weight gain and loss are likely to be different from those affecting BMI
• Environmental effects on weight change appear to be larger than on BMI

Extensions of the classical twin study II

• Define phenotypes by assessing the combination of signs and symptoms with highest heritability
– for example, broad vs. narrow definitions of LBP
• Define natural history of disease by assessing genetic communality of different stages
– for example, initiation, persistence, and dependence in smoking
• Common genetic pathways across phenotypes
– for example, hip, knee and hand OA; bone density in weight-bearing & non-weight bearing bones

How to detect genetic effects and genes?

Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. ‘knockouts’, ‘knock-ins’)

Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases


Increasing the genetic signal in the data...

• ascertain pedigree units that are likely to segregate genes of relevance
– Ex: pedigrees with quasi-Mendelian disease transmission
– affected sib pair approach of linkage analysis
• ascertain families on the basis of individuals with extreme or remarkable phenotypes
– Ex: extremely discordant sibpairs
– ascertain young individuals with the disease
• ascertain individuals from isolated populations:
– more homogeneous genetically and culturally as well
• ascertain intermediate phenotypes
– physiologic phenotype is “closer” to sequence variants

... at the cost of representativeness and ability to evaluate population risk

ISOLATED POPULATION

• Wonderfully isolated Finnish population

– Small number of founders

– Subsequent isolation

– Rapid expansion

– Major bottlenecks

→ Genetic drift has moulded the gene pool

• Genetic homogeneity, longer LD blocks

• Valuable for genetic studies, especially of monogenic diseases

Two basic Analysis Strategies

1. candidate gene analysis
motto: study a few good genes

2. whole-genome searches (genome scans)
motto: cast out a net that catches all the big fish

Candidate Gene Studies

• statistically straightforward: test the association between genotypes and phenotype with contingency tables, chi-square test, regression
• principle: if an allele is more frequent in affected than in unaffected individuals, the gene may be close to a disease gene
• candidacy of a gene can come from a number of different sources:
– biological insights (e.g. gene expressed in a certain tissue)
– homology to other genes
– functional studies in model organisms
– member of a relevant gene family
• Challenge: greater biological understanding of the genes
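The contingency-table test mentioned for candidate gene studies can be sketched as follows. The allele counts are hypothetical and the function name is an assumption of this example; it computes the Pearson chi-square statistic for a 2x2 table without continuity correction and derives the 1-df p-value from the complementary error function in the standard library.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows: cases and controls. Columns: candidate allele and other alleles.
    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 200 cases and 200 controls, two alleles each
chi2, p = chi_square_2x2(a=120, b=280, c=90, d=310)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A small p-value only says that allele frequencies differ between cases and controls; it does not by itself distinguish a causal allele from linkage disequilibrium or population stratification.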

Allelic association studies test whether alleles are associated with the trait

• 2 types of association tests
– population-based association test
  • cases and controls are unrelated
  • cross-classify by genotype
  • use χ² test, ANOVA or logistic regression
– family-based association tests (e.g. TDT)
  • cases and controls are related: parents, sibs etc
  • often based on allele transmission rates
• Multivariate/data reduction approaches
– Multiple regression of all SNPs in gene
– Haplotype analyses
– False discovery rate and replication rather than p-values
• Pathway analyses
– Combination of individual SNPs/genes and pathway constraints
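For the family-based case, the transmission disequilibrium test (TDT) compares how often heterozygous parents transmit a given allele to affected offspring with how often they do not. The sketch below uses the usual McNemar-type statistic; the counts and the function name are hypothetical.

```python
import math


def tdt(transmitted, not_transmitted):
    """Transmission disequilibrium test for one allele.

    transmitted / not_transmitted: number of times heterozygous parents
    did or did not pass the allele to an affected child.
    Returns (chi-square statistic with 1 df, p-value).
    """
    b, c = transmitted, not_transmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need at least one informative transmission")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value


# Hypothetical counts from 100 informative (heterozygous) parents
chi2, p = tdt(transmitted=60, not_transmitted=40)
print(f"TDT chi-square = {chi2:.2f}, p = {p:.4f}")
```

Because the untransmitted parental alleles act as the controls, cases and "controls" share ancestry by construction.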

The 3 possible causes for association

• best: allele increases disease susceptibility
– candidate gene studies
• good: some subjects share common ancestor
– linkage disequilibrium studies
• bad: association due to population stratification
– family-based designs offer protection

[Figure: diagram of alleles and loci]

Slide by Steven Horwath, 2003

POPULATION STRATIFICATION
Hypothetical Example (by Andrew Heath)

NORTHERN EUROPEAN ANCESTRY (N=200): NO ASSOCIATION

                  Not A1 allele   A1 allele   Share of sample
NON-MED DIET      162             18          90%
MED DIET          18              2           10%
Share of sample   90%             10%

SOUTHERN EUROPEAN ANCESTRY (N=200): NO ASSOCIATION

                  Not A1 allele   A1 allele   Share of sample
NON-MED DIET      35              15          25%
MED DIET          105             45          75%
Share of sample   70%             30%

MINGLED IN AUSTRALIAN POPULATION (N=400)

                  Not A1 allele   A1 allele
NON-MED DIET      197             33
MED DIET          123             47

OR = 2.28, 95% CI 1.39 - 3.73

Falsely infer that A1 allele is a risk factor for following a traditional Mediterranean diet.
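The stratification example can be checked numerically. The sketch below reads the slide's cell counts as 162/18 and 18/2 for the northern European group and 35/15 and 105/45 for the southern European group (an interpretation of the flattened table that agrees with the stated sample sizes, percentages and pooled odds ratio). The function and variable names are hypothetical.

```python
def odds_ratio(table):
    """Odds ratio for ((nonmed_not_a1, nonmed_a1), (med_not_a1, med_a1))."""
    (a, b), (c, d) = table
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when an off-diagonal cell is 0")
    return (a * d) / (b * c)


# Rows: non-Mediterranean diet, Mediterranean diet
# Columns: not A1 allele, A1 allele
northern = ((162, 18), (18, 2))
southern = ((35, 15), (105, 45))

# Pool the two strata cell by cell, as if ancestry were ignored
pooled = tuple(
    tuple(n + s for n, s in zip(n_row, s_row))
    for n_row, s_row in zip(northern, southern)
)

print("northern OR:", odds_ratio(northern))
print("southern OR:", odds_ratio(southern))
print("pooled table:", pooled)
print("pooled OR:", round(odds_ratio(pooled), 2))
```

Within each ancestry group the odds ratio is exactly 1, yet pooling produces an apparent association, because both the A1 allele and the Mediterranean diet are more common in the southern European group.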

Population-based versus family-based association tests

• Family-based association tests avoid confounding due to ethnic stratification
– These designs automatically match "controls" to cases on ethnic ancestry.
• Conventional wisdom:
– family-based designs are generally less efficient than designs based on unrelated control subjects
– population admixture effects are negligible
• Non-conventional wisdom:
– family controls are better matched for environmental exposures
– cryptic relatedness may be an important issue in isolate populations

Pathway approach

Hung et al. Cancer Epid Biomarker Prev 2004 & Conti et al. Human Heredity 2003

Family-based Genome Scans

• involve anonymous markers, no candidate genes
• hundreds of evenly spaced genetic markers in the genome
• often hundreds of related individuals in small to large families
• linkage analysis is a statistical method to draw inferences about the co-transmission of marker locus alleles and trait-influencing alleles
• Identifies chromosomal regions harboring the genes predisposing to trait (such as nicotine dependence)
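Linkage evidence is usually summarized as a LOD score. The sketch below shows the simplest textbook case only: a two-point LOD score for phase-known, fully informative meioses, comparing a recombination fraction theta against free recombination (theta = 0.5). Real genome scans rely on multipoint methods in dedicated software; the counts and names here are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    # log10 likelihood under linkage at recombination fraction theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # log10 likelihood under no linkage (theta = 0.5)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical data: 2 recombinants observed in 20 meioses
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: lod_score(2, 20, t))
print(f"max LOD = {lod_score(2, 20, best_theta):.2f} at theta = {best_theta:.2f}")
```

The LOD score peaks at the observed recombination fraction (2/20 = 0.10 in this example) and equals 0 at theta = 0.5.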

Co-transmission of disease and alleles

[Figure: pedigree with genotypes Aa and aa, illustrating co-transmission of disease and marker alleles]

Chromosome  Phenotype  LOD ≥2 / p-value  Author and year         Country      Number of families and individuals
2           FTQ        2.61              Straub et al. 1999      New Zealand  130 families, 343 individuals
            FTQ        2.53              Sullivan et al. 2004    New Zealand  129 families
5           FTND       3.04              Gelernter et al. 2007   US           634 small nuclear families
6           FTND       2.70              Swan et al. 2006        US           158 nuclear families, 607 individuals
            FTND       2.73              Gelernter et al. 2007   US           634 small nuclear families
            FTND       2.50              Loukola et al. 2007     Finland      153 families, 505 individuals
10          HSI        4.17              Li et al. 2006          US (AA)      402 nuclear families, 1261 individuals
            FTQ        2.43              Straub et al. 1999      New Zealand  130 families, 343 individuals
            FTQ        2.02              Sullivan et al. 2004    New Zealand  129 families
11          FTND       2.31              Li et al. 2006          US (AA)      402 nuclear families, 1261 individuals
            HSI        2.15              Li et al. 2006          US (AA)      402 nuclear families, 1261 individuals
17          FTND       0.009*            Lou et al. 2007         US (EA)      200 families, 671 individuals

AA = African-American sample, EA = European-American sample. * Lou et al. 2007 reported a p-value.

Index cases are twins from pairs concordant for heavy smoking based on earlier questionnaires from the Finnish Twin Cohorts

1293 families (twin pairs) invited

762 families recruited with 2412 family members

(1278 men, 1134 women)

Data collection complete for 2143 persons

Interview, blood sample, informed consent

SAMPLE COLLECTION

– Identified Finnish families with DZ smoking twins

• Invited also siblings and parents to participate

– 153 affected twin-pair families, 505 individuals

– On average 3 individuals per family (range 2-9)

Phenotype definitions

1. Smoker (smoked ≥100 cigarettes during lifetime)

2. Nicotine dependent (Fagerström, FTND)

3. Nicotine dependent (DSM-IV)

4. Alcohol use (aiming for intoxication)

5. Co-morbid phenotype of FTND and alcohol use

STUDY SAMPLE

Chromosome 11 - Nicotine Withdrawal

[Figure: LOD score by cM position on chromosome 11, Finnish and Australian families]

Chromosome 11 - Candidate Genes for Nicotine withdrawal in Finnish and Australian families

1. DRD4
2. TH
3. CHRNA10
4. TPH1
5. ANKK1/DRD2, HTR3A, HTR3B

Genome-wide Case-Control Analyses

• involve anonymous markers, no candidate genes
• chips of 300,000 to 1,000,000 SNPs on a single array (Illumina, Affymetrix)
• Hundreds to thousands of cases and unrelated controls
• High-throughput genotyping of common SNPs such as those identified from the HapMap project
• Over the past two years many new genes in common diseases have been identified
• Two recent GWAs on nicotine dependence (Uhl et al, 2007, Bierut et al, 2007)
• New GWA on smoking cessation (Uhl G, et al, Arch Gen Psychiatr, in press) finds genes with very little overlap with earlier GWAs on nicotine dependence
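Testing 300,000 to 1,000,000 SNPs at once means that the per-SNP significance threshold has to be much stricter than 0.05. The sketch below applies a plain Bonferroni correction as an illustration; the slides do not say which correction the cited studies used, and Bonferroni is conservative because neighbouring SNPs are correlated through linkage disequilibrium. The function name is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Array sizes mentioned on the slide
for n_snps in (300_000, 1_000_000):
    threshold = bonferroni_threshold(n_snps)
    print(f"{n_snps:>9,} SNPs: per-SNP threshold = {threshold:.1e}")
```

Thresholds of this order explain why genome-wide studies need hundreds to thousands of cases and controls, and why replication in independent samples is emphasized.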

Li C-Y et al, PLoS Comput Biol 2008

Bioinformatics processing of existing information to discover biological pathways

Li C-Y et al, PLoS Comput Biol 2008

Integration of information at different levels

Developments in molecular genetics now make it possible to attempt identification of liability genes in complex, multifactorial traits, and to dissect with new precision the role of genetic predisposition and environment/life style factors in these disorders. However, an integrative framework is needed.

Complex picture

Gottesmann I, Science 1997

[Figure: path diagram relating measured genotypes (G), measured environments (E), endophenotypes and the outcome phenotype (P) over time]

Eaves et al., 2005

Information of genetic data will increase

                      past                                 present, future
genetic map           microsatellites                      millions of SNPs, bi-allelic
candidate genes       incomplete knowledge of variants,    all common genetic variants known,
                      function barely known                common function known
new technology                                             fast genotyping, sequencing, mutation detection
statistical methods   linkage analysis                     linkage disequilibrium tests

Slide from Steve Horwarth

To summarize

• Complex disease gene mapping is starting to fulfil its promise
• distinction between candidate gene studies and whole genome scans diminishes as genotyping costs decrease
• when collecting pedigrees enriched with affecteds always collect the DNA of good controls as well
• Put effort into high quality and detailed phenotyping
– multiple, longitudinal measures
– use intermediate, physiological phenotypes as traits
– Imaging, metabolomics
– gene expression and protein array measurements

Useful reading

• JL Haines, MA Pericak-Vance. Genetic Analysis of Complex Disease. Wiley, 2006

• DC Thomas. Statistical Methods in Genetic Epidemiology. Oxford, 2004

• MJ Khoury. Human Genome Epidemiology. Oxford, 2003
