Association analysis: Shaun Purcell, Boulder Twin Workshop 2004


Page 1

Association analysis

Shaun Purcell
Boulder Twin Workshop 2004

Page 2

Overview

• Candidate gene association

• Haplotypes and linkage disequilibrium

• Linkage and association

• Family-based association

Page 3

What is association?

• Categorical traits
  – disease susceptibility genes

• Continuous traits
  – quantitative trait loci (QTL)

Page 4

Disease traits

        Case   Control
AA      n1     n2
Aa      n3     n4
aa      n5     n6

Is there a difference in allele/genotype frequency between cases and controls?

Page 5

Disease traits

        Case   Control   Expected frequency (Hardy-Weinberg)
AA      30     25        $p^2$
Aa      50     50        $2p(1-p)$
aa      20     25        $(1-p)^2$

(p = frequency of allele A)

Is there a difference in allele/genotype frequency between cases and controls?

$\chi^2$ test for independence, p-value
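A minimal sketch of the 2-df test for the genotype table above, assuming Pearson's χ² statistic (the slide does not say which χ² variant is used). The helper name `chi_square_independence` is hypothetical. For exactly 2 df the χ² upper-tail probability has the closed form exp(-χ²/2), so only the standard library is needed.

```python
import math

# Genotype counts from the slide: (cases, controls) for AA, Aa, aa
counts = {"AA": (30, 25), "Aa": (50, 50), "aa": (20, 25)}


def chi_square_independence(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x 2 table."""
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            stat += (row[j] - expected) ** 2 / expected
    df = (len(rows) - 1) * (2 - 1)
    return stat, df


stat, df = chi_square_independence(list(counts.values()))
# Closed-form chi-square upper tail; valid only because df == 2 here
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With these counts the statistic is small (about 1.01 on 2 df), so this example table gives no evidence of association.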

Page 6

Disease traits

General model (2 df)

        Case      Control
AA      n1        n2
Aa      n3        n4
aa      n5        n6

Additive model (1 df)

        Case      Control
A       2n1+n3    2n2+n4
a       2n5+n3    2n6+n4

Dominant model for A (1 df)

        Case      Control
A*      n1+n3     n2+n4
aa      n5        n6

(A* = AA or Aa)

Effect sizes calculated as odds ratios
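The two 1-df collapsings can be sketched the same way. This is a minimal illustration that reuses the counts from the previous slide as hypothetical input; the function names are hypothetical. Note that the allele-count (additive) table treats the two alleles of one person as independent observations, which is usually justified by assuming Hardy-Weinberg equilibrium.

```python
import math

# Hypothetical input: (cases, controls) per genotype, reusing the earlier example
counts = {"AA": (30, 25), "Aa": (50, 50), "aa": (20, 25)}


def allelic_table(c):
    """Additive model: each person contributes two alleles (rows A and a)."""
    row_A = (2 * c["AA"][0] + c["Aa"][0], 2 * c["AA"][1] + c["Aa"][1])
    row_a = (2 * c["aa"][0] + c["Aa"][0], 2 * c["aa"][1] + c["Aa"][1])
    return row_A, row_a


def dominant_table(c):
    """Dominant model for A: carriers A* (AA or Aa) versus aa."""
    carriers = (c["AA"][0] + c["Aa"][0], c["AA"][1] + c["Aa"][1])
    return carriers, c["aa"]


def odds_ratio(exposed, unexposed):
    """Odds ratio for two rows, each given as (cases, controls)."""
    return (exposed[0] * unexposed[1]) / (exposed[1] * unexposed[0])


def chi_square_1df(exposed, unexposed):
    """Pearson chi-square for a 2 x 2 table and its 1-df p-value."""
    rows = (exposed, unexposed)
    col_totals = (exposed[0] + unexposed[0], exposed[1] + unexposed[1])
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        for j in range(2):
            expected = sum(row) * col_totals[j] / n
            stat += (row[j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df
    return stat, math.erfc(math.sqrt(stat / 2))


for name, table in [("additive", allelic_table(counts)),
                    ("dominant for A", dominant_table(counts))]:
    stat, p = chi_square_1df(*table)
    print(f"{name}: OR = {odds_ratio(*table):.2f}, chi2 = {stat:.3f}, p = {p:.3f}")
```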

Page 7

Quantitative traits

[Figure: trait distributions for genotypes aa, Aa and AA]

ID    Y      G    A    D
001   0.34   aa   -1   0
002   1.23   Aa   0    1
003   1.66   Aa   0    1
004   2.74   AA   1    0
005   1.33   AA   1    0
…     …      …    …    …

$$Y = aA + dD + e$$
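A minimal least-squares sketch of this model, fitted to the five example rows shown in the table with NumPy's `numpy.linalg.lstsq`. It assumes an intercept (mu), which the slide's equation leaves implicit. With three genotype classes and three parameters the fit simply reproduces the three genotype means, so it is illustrative only.

```python
import numpy as np

# The five example rows from the table: trait value Y and genotype G
y = np.array([0.34, 1.23, 1.66, 2.74, 1.33])
genotypes = ["aa", "Aa", "Aa", "AA", "AA"]

# Additive (A) and dominance (D) codings, as in the table's A and D columns
A_CODE = {"aa": -1.0, "Aa": 0.0, "AA": 1.0}
D_CODE = {"aa": 0.0, "Aa": 1.0, "AA": 0.0}
a_col = np.array([A_CODE[g] for g in genotypes])
d_col = np.array([D_CODE[g] for g in genotypes])

# Design matrix: intercept (an assumption), additive and dominance columns
X = np.column_stack([np.ones_like(a_col), a_col, d_col])
(mu, a, d), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"mu = {mu:.4f}, a = {a:.4f}, d = {d:.4f}")
```

Here a is half the difference between the AA and aa means, and d is the deviation of the Aa mean from their midpoint.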

Page 8

Some web resources

• BGIM
  http://statgen.iop.kcl.ac.uk/bgim/
  Introductory tutorials on twin analysis, primer on maximum likelihood, Mx language.

• GxE moderator models
  http://statgen.iop.kcl.ac.uk/gxe/

• Power calculation
  http://statgen.iop.kcl.ac.uk/gpc/

• Case/control association tools
  http://statgen.iop.kcl.ac.uk/gpc/model/

Page 10

Relative risk

Genotype   P(D|G)     RR
AA         P(D|AA)    P(D|AA)/P(D|aa)
Aa         P(D|Aa)    P(D|Aa)/P(D|aa)
aa         P(D|aa)    1

P(D|AA) / P(D|aa) labelled RR(AA)

P(D|Aa) / P(D|aa) labelled RR(Aa)

Page 11

Genetic models

Model            RR(Aa)   RR(AA)
General          x        y
Multiplicative   x        $x^2$
Dominant         x        x
Recessive        1.000    x
No effect        1.000    1.000
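Since RR(G) = P(D|G)/P(D|aa), each model in the table fixes all three penetrances once a baseline risk P(D|aa) is chosen. A small sketch follows; the function names, the baseline risk of 0.02 and x = 1.5 are hypothetical values for illustration.

```python
def genotype_relative_risks(model, x, y=None):
    """(RR(Aa), RR(AA)) relative to aa under the named genetic model."""
    if model == "general":
        if y is None:
            raise ValueError("the general model needs both x and y")
        return x, y
    if model == "multiplicative":
        return x, x * x
    if model == "dominant":
        return x, x
    if model == "recessive":
        return 1.0, x
    if model == "none":
        return 1.0, 1.0
    raise ValueError(f"unknown model: {model}")


def penetrances(baseline, model, x, y=None):
    """P(D|aa), P(D|Aa), P(D|AA) implied by a baseline risk and a model."""
    rr_het, rr_hom = genotype_relative_risks(model, x, y)
    risks = (baseline, baseline * rr_het, baseline * rr_hom)
    if max(risks) > 1:
        raise ValueError("implied penetrance exceeds 1")
    return risks


# Hypothetical baseline risk P(D|aa) = 0.02 and x = 1.5
for model in ("multiplicative", "dominant", "recessive", "none"):
    print(model, penetrances(0.02, model, 1.5))
```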

Page 12

Tests

Test                                          Alternate        Null
Any effect?                                   General          No effect
Any effect assuming a multiplicative gene?    Multiplicative   No effect
Any effect assuming a dominant gene?          Dominance        No effect
Any effect assuming a recessive gene?         Recessive        No effect
Can we assume a multiplicative effect?        General          Multiplicative
Can we assume a dominant effect?              General          Dominance
Can we assume a recessive effect?             General          Recessive

Page 13

Multiple samples

• Constrain frequencies across samples
• Constrain effects across samples
  – Can test genetic models with effects and/or frequencies constrained to be equal
  – Can perform tests of homogeneity of effects and/or frequencies across samples

Page 14

An example: 2 case/control samples

• Population frequency 5%

Sample 1
        Case   Control
AA      17     11
Aa      35     59
aa      24     40

Sample 2
        Case   Control
AA      37     10
Aa      67     43
aa      20     37

Page 16

Homogeneous effects across samples
Homogeneous allele frequencies across samples

         Sample 1                  Sample 2
Model    p      RR(Aa)  RR(AA)     p      RR(Aa)  RR(AA)     -2LL
-----    -----  ------  ------     -----  ------  ------     -------
Gen      0.367  1.979   3.663      0.367  1.979   3.663      793.143
Mult     0.367  1.911   3.651      0.367  1.911   3.651      793.199
Dom      0.401  1.990   1.990      0.401  1.990   1.990      802.927
Rec      0.405  1.000   1.921      0.405  1.000   1.921      805.064
None     0.442  1.000   1.000      0.442  1.000   1.000      815.628

Page 17

Heterogeneous effects across samples
Homogeneous allele frequencies across samples

         Sample 1                  Sample 2
Model    p      RR(Aa)  RR(AA)     p      RR(Aa)  RR(AA)     -2LL
-----    -----  ------  ------     -----  ------  ------     -------
Gen      0.367  1.235   2.136      0.367  2.890   5.547      786.498
Mult     0.367  1.440   2.073      0.367  2.282   5.208      788.262
Dom      0.401  1.216   1.216      0.401  2.936   2.936      796.422
Rec      0.405  1.000   1.519      0.405  1.000   2.195      803.849
None     0.443  1.000   1.000      0.443  1.000   1.000      815.628

Page 18

TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=========================================================
Gen vs None  (2 df) : 22.485  p = 0.000
Mult vs None (1 df) : 22.429  p = 0.000
Dom vs None  (1 df) : 12.701  p = 0.000
Rec vs None  (1 df) : 10.564  p = 0.001
Gen vs Mult  (1 df) :  0.056  p = 0.813
Gen vs Dom   (1 df) :  9.784  p = 0.002
Gen vs Rec   (1 df) : 11.921  p = 0.001

TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
===========================================================
Gen vs None  (4 df) : 29.130  p = 0.000
Mult vs None (2 df) : 27.366  p = 0.000
Dom vs None  (2 df) : 19.205  p = 0.000
Rec vs None  (2 df) : 11.779  p = 0.003
Gen vs Mult  (2 df) :  1.764  p = 0.414
Gen vs Dom   (2 df) :  9.925  p = 0.007
Gen vs Rec   (2 df) : 17.351  p = 0.000

TESTS OF EQUAL EFFECTS -- ASSUMING EQ FREQS
===========================================
w/ Gen model  (2 df) : 6.645  p = 0.036
w/ Mult model (1 df) : 4.938  p = 0.026
w/ Dom model  (1 df) : 6.505  p = 0.011
w/ Rec model  (1 df) : 1.215  p = 0.270
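Each statistic in the first block is a difference in -2LL between nested models, referred to a χ² distribution whose df equals the difference in free relative-risk parameters. A sketch using the -2LL values from the homogeneous-effects table; the dictionaries and the `chi2_sf` helper are hypothetical, and the χ² tail is computed in closed form for 1 df and even df only.

```python
import math

# -2LL values from the homogeneous-effects, homogeneous-frequencies table
neg2ll = {"Gen": 793.143, "Mult": 793.199, "Dom": 802.927, "Rec": 805.064, "None": 815.628}
# Free relative-risk parameters per model (allele frequency is in every model)
n_params = {"Gen": 2, "Mult": 1, "Dom": 1, "Rec": 1, "None": 0}


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or any even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        half = x / 2
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    raise NotImplementedError("odd df > 1 is not handled in this sketch")


def lrt(alt, null):
    """Likelihood-ratio test of a nested null model against an alternative."""
    stat = neg2ll[null] - neg2ll[alt]
    df = n_params[alt] - n_params[null]
    return stat, df, chi2_sf(stat, df)


for alt, null in [("Gen", "None"), ("Mult", "None"), ("Dom", "None"), ("Rec", "None"),
                  ("Gen", "Mult"), ("Gen", "Dom"), ("Gen", "Rec")]:
    stat, df, p = lrt(alt, null)
    print(f"{alt} vs {null} ({df} df) : {stat:.3f} p = {p:.3f}")
```

Run as written, this should reproduce the statistics and p-values of the first block of output.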

Page 19

Indirect association

[Figure: a QTL with nearby genotyped and ungenotyped markers]

Page 20

Recombination

Paternal chromosome
Maternal chromosome

Homologous chromosomes in one parent

Recombination event during meiosis

Recombinant gamete transmitted, harboring mutation

Page 21

Recombination

Paternal chromosome
Maternal chromosome

Homologous chromosomes in one parent

No recombination event during meiosis

Nonrecombinant gamete transmitted, not harboring mutation

Page 22

Linkage: affected sib pairs

Paternal chromosome
Maternal chromosome

First affected offspring, no recombination

Second affected offspring, recombinant gamete

IBD sharing from this one parent (0 or 1)

Page 23

Association analysis

• Mutation occurs on a ‘red’ chromosome


Page 25

Association analysis

• Association due to ‘linkage disequilibrium’

Page 26

Haplotypes

[Figure: alleles A/a and M/m at two loci; haplotypes AM, Am, aM, am]

This individual has aa and Mm genotypes and am and aM haplotypes

Page 27

Haplotypes

[Figure: alleles A/a and M/m at two loci; haplotypes AM, Am, aM, am]

This individual has Aa and Mm genotypes and AM and am haplotypes

… but given only genotype data, consistent with Am/aM as well as AM/am

Page 28

Haplotypes

[Figure: alleles A/a and M/m at two loci; haplotypes AM, Am, aM, am]

This individual has AA and Mm genotypes and AM and Am haplotypes

Page 29

Equilibrium haplotype frequencies

        M      m
A       pr     ps     p
a       qr     qs     q
        r      s

(p, q: frequencies of alleles A, a; r, s: frequencies of alleles M, m)

Page 30

Linkage disequilibrium

        M         m
A       pr + D    ps - D    p
a       qr - D    qs + D    q
        r         s

$$D_{\max} = \begin{cases} \min(ps,\, qr) & \text{if } D > 0 \\ \min(pr,\, qs) & \text{if } D < 0 \end{cases}$$

$$D' = D / D_{\max}$$

$$r^2 = \frac{D^2}{pqrs}$$
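A sketch computing D, D' and r² from the four haplotype frequencies, following the table above (D is the excess of the AM haplotype over pr). The function name and the example frequencies are hypothetical. D' is returned with its sign; some software reports |D'| instead.

```python
def ld_measures(f_AM, f_Am, f_aM, f_am):
    """D, D' and r^2 for two biallelic loci from the four haplotype frequencies."""
    if abs(f_AM + f_Am + f_aM + f_am - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p = f_AM + f_Am   # frequency of A
    q = 1.0 - p       # frequency of a
    r = f_AM + f_aM   # frequency of M
    s = 1.0 - r       # frequency of m
    if p * q * r * s == 0:
        raise ValueError("both loci must be polymorphic")
    D = f_AM - p * r
    # The bound on |D| depends on its sign, which keeps D' within [-1, 1]
    d_max = min(p * s, q * r) if D > 0 else min(p * r, q * s)
    d_prime = D / d_max
    r2 = D * D / (p * q * r * s)
    return D, d_prime, r2


# Hypothetical haplotype frequencies: p = 0.6, r = 0.7, D = 0.08
D, d_prime, r2 = ld_measures(0.50, 0.10, 0.20, 0.20)
print(f"D = {D:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

The example shows how the two measures diverge: D' is about 0.44 while r² is only about 0.13, because r² also depends on how well the allele frequencies at the two loci match.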

Page 31

Haplotype analysis

1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait

Haplotype   Freq.   Odds ratio
AAGG        40%     1.00*
AAGT        30%     2.21
CGCG        25%     1.07
AGCT        5%      0.92

* baseline, fixed to 1.00
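The slide does not say how step 1 is done; one widely used approach is the expectation-maximisation (EM) algorithm. A minimal sketch for two biallelic loci follows, assuming Hardy-Weinberg equilibrium for haplotype pairs; the function names and the example sample are hypothetical. Only double heterozygotes are phase-ambiguous, which is the AM/am versus Am/aM case described earlier. Step 2 (associating haplotypes with the trait) is not shown.

```python
HAPLOTYPES = ("AM", "Am", "aM", "am")


def alleles(count, upper, lower):
    """Two alleles at one locus from the count (0, 1 or 2) of the upper-case allele."""
    return [upper] * count + [lower] * (2 - count)


def em_haplotypes(genotypes, n_iter=100):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    Each genotype is (copies of A, copies of M). Only double heterozygotes
    (1, 1) are phase-ambiguous: AM/am versus Am/aM.
    """
    freqs = dict.fromkeys(HAPLOTYPES, 0.25)
    for _ in range(n_iter):
        counts = dict.fromkeys(HAPLOTYPES, 0.0)
        for n_a, n_m in genotypes:
            if (n_a, n_m) == (1, 1):
                # E-step: weight the two phases by their current probabilities
                cis = freqs["AM"] * freqs["am"]
                trans = freqs["Am"] * freqs["aM"]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["AM"] += w
                counts["am"] += w
                counts["Am"] += 1 - w
                counts["aM"] += 1 - w
            else:
                # At most one heterozygous locus: the haplotypes are known
                for a, m in zip(alleles(n_a, "A", "a"), alleles(n_m, "M", "m")):
                    counts[a + m] += 1
        # M-step: expected haplotype counts become the new frequencies
        total = 2 * len(genotypes)
        freqs = {h: c / total for h, c in counts.items()}
    return freqs


# Hypothetical sample: 10 AM/AM, 10 am/am and 5 double heterozygotes
sample = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 5
print(em_haplotypes(sample))
```

In this sample the unambiguous individuals carry only AM and am, so EM assigns the double heterozygotes almost entirely to the AM/am phase.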

Page 33

Linkage / Association

[Figure: Linkage panels plot sib correlation against IBD sharing (0, 1, 2) at the QTL and at a marker linked to it (RF: recombination fraction). Association panels plot the trait against genotype (aa, Aa, AA) at the QTL and at a marker in LD with it.]

Page 34

Variance components

• Means (ASSOCIATION)

$$\begin{pmatrix} M_1 & M_2 \end{pmatrix}$$

• Variance-covariance matrix (LINKAGE)

$$\begin{pmatrix} V_1 & C_{21} \\ C_{12} & V_2 \end{pmatrix}$$

Page 35

Variance components

• Means (ASSOCIATION)

$$\begin{pmatrix} M_1 + bG_1 & M_2 + bG_2 \end{pmatrix}$$

• Variance-covariance matrix (LINKAGE)

$$\begin{pmatrix} V_1 & C_{21} + q(\hat{\pi} - \tfrac{1}{2}) \\ C_{12} + q(\hat{\pi} - \tfrac{1}{2}) & V_2 \end{pmatrix}$$

LINKAGE: q = regression coefficient; $\hat{\pi}$ = proportion of alleles shared IBD (0, ½ or 1)

ASSOCIATION: b = regression coefficient; G = individual's genotype
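To make the two pieces concrete, a sketch that builds the expected means and covariance matrix for one sib pair. It assumes equal means, variances and covariances for the two sibs (M1 = M2, V1 = V2, C12 = C21) and the genotype coding AA = 1, Aa = 0, aa = -1; these simplifications, the function name and the example values are hypothetical.

```python
import numpy as np


def sib_pair_moments(mean, b, g1, g2, var, cov, q, pi_hat):
    """Expected means and covariance matrix for one sib pair.

    Association enters the means as b * G; linkage enters the sib covariance
    as q * (pi_hat - 1/2), where pi_hat is the proportion of alleles shared IBD.
    """
    means = np.array([mean + b * g1, mean + b * g2])
    cov_12 = cov + q * (pi_hat - 0.5)
    sigma = np.array([[var, cov_12], [cov_12, var]])
    return means, sigma


# Hypothetical values: sib 1 is AA (1), sib 2 is aa (-1), sibs share both alleles IBD
means, sigma = sib_pair_moments(mean=0.0, b=0.5, g1=1, g2=-1,
                                var=1.0, cov=0.3, q=0.2, pi_hat=1.0)
print(means)
print(sigma)
```

When π̂ = ½, the expected sharing for sibs, the linkage term vanishes and the covariance reduces to C.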

Page 36

Components of a genetic theory

• POPULATION MODEL
  – Allele & genotype frequencies
  – Demographics & population history
  – Linkage disequilibrium, haplotype structure

• TRANSMISSION MODEL
  – Mendelian segregation
  – Identity by descent & genetic relatedness

• PHENOTYPE MODEL
  – Biometrical model of quantitative traits
  – Additive & dominance components

[Figure: genotypes (G) and phenotypes (P) across generations over time]

Page 37

Linkage without association

[Figure: two families genotyped at a multi-allelic marker, parents 3/5 and 2/6 in each; affected offspring 3/2 and 5/2 in one family, 3/6 and 5/6 in the other]

Both families are ‘linked’ with the marker…

…but a different allele is involved.

Page 38

Linkage and association

[Figure: three families genotyped at a multi-allelic marker; affected offspring include 3/2, 6/2, 3/6, 5/6 and 6/6]

All families are ‘linked’ with the marker…

… and allele 6 is ‘associated’ with disease

Linkage is just association within families

Page 39

Association without linkage

[Figure: marker genotypes of controls and cases drawn from two populations, one shown in GREEN]

Allele 6 is more common in the GREEN population
The disease is more common in the GREEN population

… a ‘spurious association’

Page 40

TDT

• Transmission disequilibrium test
  – test for linkage and association

[Figure: example families with parental and offspring genotypes (AA, Aa, aa)]

Page 41

TDT: “A” disease allele

Parental mating   AA x Aa   AA x Aa   aa x Aa   aa x Aa
Offspring         AA        Aa        Aa        aa
Additive          +         -         +         -
Dominant          0.5       0.5       +         -
Recessive         +         -         0.5       0.5
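Under the additive scoring in the table (+ for a transmitted A, - for a transmitted a), the classic TDT statistic compares how often heterozygous parents transmit A (b) versus a (c), using McNemar's χ² = (b - c)²/(b + c) on 1 df. A sketch for parent-offspring trios at a biallelic marker; the function name, the 0/1/2 genotype coding and the example trios are hypothetical, and Mendelian-inconsistent trios are simply skipped. The dominant and recessive scorings are not implemented here.

```python
import math


def tdt(trios):
    """Transmission/disequilibrium test for allele A at a biallelic marker.

    Each trio is (father, mother, child), coded as copies of A (0, 1 or 2).
    Returns (b, c, chi2, p) where b and c count transmissions of A and of a
    from heterozygous parents.
    """
    b = c = 0
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        if n_het == 0:
            continue  # no heterozygous parent: uninformative
        if n_het == 2:
            # Aa x Aa: the child's genotype fixes both transmissions
            b += child
            c += 2 - child
        else:
            homozygote = mother if father == 1 else father
            from_hom = homozygote // 2   # 1 if that parent is AA, 0 if aa
            from_het = child - from_hom  # allele passed on by the Aa parent
            if from_het not in (0, 1):
                continue  # Mendelian inconsistency: skip the trio
            b += from_het
            c += 1 - from_het
    if b + c == 0:
        return b, c, 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    return b, c, chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical trios (father, mother, child)
trios = [(1, 2, 2), (1, 2, 1), (1, 1, 2), (1, 1, 1), (1, 0, 1), (0, 0, 0)]
print(tdt(trios))
```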

Page 42

Between and within components

Sib1 = B - W
Sib2 = B + W

Page 43

Between and within components

• Fulker et al (1999)

Genotype       Coded         Components    Expressed as
S1     S2      S1    S2      B      W      S1      S2
AA     AA      1     1       1      0      B+W     B-W
AA     Aa      1     0       0.5    0.5    B+W     B-W
AA     aa      1     -1      0      1      B+W     B-W

Note: W = S1 - B
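The decomposition can be sketched directly: B is the sib-pair mean of the coded genotypes and W is each sib's deviation from it. The function name is hypothetical; applied to the g1 and g2 columns of the assoc.dat excerpt later in the deck, it reproduces the b, w1 and w2 columns.

```python
# Genotype coding used on these slides: AA = 1, Aa = 0, aa = -1
CODE = {"AA": 1.0, "Aa": 0.0, "aa": -1.0}


def between_within(g1, g2):
    """Between (B) and within (W1, W2) components for a sib pair.

    B is the mean of the two coded genotypes and each W is a sib's deviation
    from B, so S1 = B + W1 and S2 = B + W2 = B - W1.
    """
    b = (g1 + g2) / 2
    return b, g1 - b, g2 - b


print(between_within(CODE["AA"], CODE["Aa"]))  # (0.5, 0.5, -0.5)
```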

Page 44

Parental genotypes

• Use parental genotypes to generate B
• Examples
  – AA from AA x AA: W = 0
  – Aa from AA x Aa: W = -0.5
  – Aa from Aa x Aa: W = 0

Pat    Mat    B
1      1      1
1      0      0.5
1      -1     0
0      1      0.5
0      0      0
0      -1     -0.5
-1     1      0
-1     0      -0.5
-1     -1     -1
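With parental genotypes, B is the mean of the two parental codes, which is also the expected offspring code under Mendelian transmission; W is the offspring's deviation from it. A sketch with a hypothetical function name, checked against the three examples and the table above.

```python
def parental_between_within(pat, mat, child):
    """Between component from parental genotypes and the offspring's within deviation.

    Genotypes coded AA = 1, Aa = 0, aa = -1. B is the mean parental code,
    which equals the expected offspring code under Mendelian transmission.
    """
    b = (pat + mat) / 2
    return b, child - b


# The three examples from the slide
print(parental_between_within(1, 1, 1))  # AA from AA x AA
print(parental_between_within(1, 0, 0))  # Aa from AA x Aa
print(parental_between_within(0, 0, 0))  # Aa from Aa x Aa
```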

Page 45

assoc.mx

• Sibling pair sample
• B and W components precalculated in input file
• Single SNP genotype
• Quantitative trait

Page 46

assoc.dat

Columns: s1 s2 g1 g2 b w1 w2

-0.007   -0.972   -1   0    -0.5   -0.5   0.5
-0.829   -0.196   1    1    1      0      0
0.369    0.645    1    1    1      0      0
0.318    1.55     0    1    0.5    -0.5   0.5
1.52     0.910    0    0    0      0      0
-0.948   -1.55    1    1    1      0      0
0.596    -0.394   1    0    0.5    0.5    -0.5
-1.91    -0.905   0    1    0.5    -0.5   0.5
0.499    0.940    1    0    0.5    0.5    -0.5
-1.17    -1.29    1    0    0.5    0.5    -0.5
-0.16    -1.81    1    1    1      0      0

Page 47

! Mx script for QTL association: sib pairs, univariate

Group 1 : Calc NG=2

Begin Matrices;
! ** Parameters
B Full 1 1 free ! association : between component
W Full 1 1 free ! association : within component
M Full 1 1 free ! mean
S Full 1 1 free ! Shared residual variance
N Full 1 1 free ! Nonshared residual variance
! ** Definition variables **
C Full 1 1 ! association : between
X Full 1 1 ! association : within, sib 1
Y Full 1 1 ! association : within, sib 2
End Matrices;

! ** Uncomment for B=W model
! Equate W 1 1 1 B 1 1 1

! Starting values
Matrix B 0
Matrix W 0
Matrix M 0
Matrix S 0.5
Matrix N 0.5

End

Page 48

Group2 : Data Group
Data NI=7 NO=0
RE file=assoc.dat
Labels Sib1 Sib2 g1 g2 b w1 w2
Select Sib1 Sib2 b w1 w2 /
Definition b w1 w2 /

Matrices = Group 1

Means M + B*C + W*X | M + B*C + W*Y /
Covariance S + N | S _
           S | S + N /

Specify C b /
Specify X w1 /
Specify Y w2 /

End
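As a cross-check on what the two Mx groups compute, a NumPy sketch of the same model's -2 log-likelihood: means M + B·b + W·w for each sib and covariance [[S + N, S], [S, S + N]] for each pair. It is a hypothetical translation, not Mx itself; Mx's reported -2LL may use a different additive constant, and only the few assoc.dat rows printed on the slide are included.

```python
import numpy as np

# Rows printed for assoc.dat: s1 s2 g1 g2 b w1 w2
DATA = np.array([
    [-0.007, -0.972, -1, 0, -0.5, -0.5, 0.5],
    [-0.829, -0.196, 1, 1, 1, 0, 0],
    [0.369, 0.645, 1, 1, 1, 0, 0],
    [0.318, 1.55, 0, 1, 0.5, -0.5, 0.5],
    [1.52, 0.910, 0, 0, 0, 0, 0],
    [-0.948, -1.55, 1, 1, 1, 0, 0],
    [0.596, -0.394, 1, 0, 0.5, 0.5, -0.5],
    [-1.91, -0.905, 0, 1, 0.5, -0.5, 0.5],
    [0.499, 0.940, 1, 0, 0.5, 0.5, -0.5],
    [-1.17, -1.29, 1, 0, 0.5, 0.5, -0.5],
    [-0.16, -1.81, 1, 1, 1, 0, 0],
])


def neg2_log_likelihood(params, data):
    """-2 log-likelihood of the between/within sib-pair model.

    params = (M, B, W, S, N): mean, between and within effects, and the
    shared and nonshared residual variances, as in the Mx matrices.
    """
    m, b_eff, w_eff, s, n = params
    sigma = np.array([[s + n, s], [s, s + n]])
    if np.any(np.linalg.eigvalsh(sigma) <= 0):
        raise ValueError("covariance matrix must be positive definite")
    _, logdet = np.linalg.slogdet(sigma)
    sigma_inv = np.linalg.inv(sigma)
    total = 0.0
    for s1, s2, _g1, _g2, b, w1, w2 in data:
        mu = np.array([m + b_eff * b + w_eff * w1, m + b_eff * b + w_eff * w2])
        resid = np.array([s1, s2]) - mu
        # Bivariate normal density, including the 2 * log(2 * pi) constant per pair
        total += logdet + resid @ sigma_inv @ resid + 2 * np.log(2 * np.pi)
    return float(total)


# Evaluated at the script's starting values: M = B = W = 0, S = N = 0.5
print(neg2_log_likelihood((0.0, 0.0, 0.0, 0.5, 0.5), DATA))
```

To fit the model, a function like this could be passed to a general-purpose optimiser such as `scipy.optimize.minimize`, keeping the variances positive.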

Page 49

Models

B & W
  B Full 1 1 free
  W Full 1 1 free
  !Equate W 1 1 1 B 1 1 1

B = W
  B Full 1 1 free
  W Full 1 1 free
  Equate W 1 1 1 B 1 1 1

B
  B Full 1 1 free
  W Full 1 1
  !Equate W 1 1 1 B 1 1 1

B=W=0
  B Full 1 1
  W Full 1 1
  !Equate W 1 1 1 B 1 1 1

Page 50

Tests

Test                          HA       H0
Standard association test     B = W    B=W=0
Test of stratification        B & W    B = W
Robust association test       B & W    B

Page 51

assoc.mx

Model    B        W        -2LL      df
B & W    -0.478   -0.365   2103.96   795
B = W    -0.420   -0.420   2105.05   796
B        -0.4778           2127.01   796
B=W=0                      2163.34   797

Test of total association
HA: B = W    (-2LL 2105.05)
H0: B=W=0    (-2LL 2163.34)

Δ-2LL = 58.29, df = 1, p < 1e-13

Page 52

assoc.mx


Test of stratification
HA: B & W    (-2LL 2103.96)
H0: B = W    (-2LL 2105.05)

Δ-2LL = 1.09, df = 1, p = 0.29

Page 53

assoc.mx


Test of within association
HA: B & W    (-2LL 2103.96)
H0: B        (-2LL 2127.01)

Δ-2LL = 23.06, df = 1, p < 1e-5
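The p-values for these three 1-df comparisons can be recomputed from the -2LL column, since for 1 df the χ² upper tail equals erfc(√(χ²/2)). A sketch; the dictionary keys simply mirror the model labels in the assoc.mx results table, and the helper name is hypothetical.

```python
import math

# -2LL values from the assoc.mx results table
NEG2LL = {"B & W": 2103.96, "B = W": 2105.05, "B": 2127.01, "B=W=0": 2163.34}


def lrt_1df(alt, null):
    """Chi-square statistic and 1-df p-value for a nested model comparison."""
    stat = NEG2LL[null] - NEG2LL[alt]
    return stat, math.erfc(math.sqrt(stat / 2))


for label, alt, null in [("total association", "B = W", "B=W=0"),
                         ("stratification", "B & W", "B = W"),
                         ("within association", "B & W", "B")]:
    stat, p = lrt_1df(alt, null)
    print(f"{label}: chi2 = {stat:.2f}, p = {p:.2g}")
```

The within-association difference computes as 23.05 here rather than 23.06, presumably because the table rounds -2LL to two decimals.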

Page 54

Implementation

• QTDT
  – Abecasis et al (2001) AJHG
  – extends between/within model to
    – general pedigrees
    – multiple alleles
    – covariates
    – combined test of linkage and association
    – discrete as well as quantitative traits

Page 55

Linkage
• families
• detectable over large distances (>10 cM)
• large effects (OR > 3, variance > 10%)

Association
• unrelateds or families
• detectable over small distances (<1 cM)
• small effects (OR < 2, variance < 1%)