Associating Genomic Variations with Phenotypes


Post on 22-Feb-2016


Page 1

Associating Genomic Variations with Phenotypes

Model comparison, rare variants, and analysis pipeline

Qunyuan Zhang
Division of Statistical Genomics & Genome Institute
Washington University School of Medicine

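Page 2

Data & Question

Relationship between X and Y?

$$
\begin{array}{c|c|cccc}
i & Y & \multicolumn{4}{c}{X} \\ \hline
1 & y_1 & x_{11} & x_{12} & \cdots & x_{1m} \\
2 & y_2 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
n & y_n & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{array}
$$

Genotypes (X): SNP, insertion, deletion, duplication, inversion, translocation, …

Phenotypes (Y): quantitative, categorical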

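Page 3

Linkage & Association

Association: (Y, X)

Linkage: (Y, Q), where Q is unobservable

$$
\begin{array}{c|c|ccc}
i & Y & X_1 & Q & X_2 \\ \hline
1 & y_1 & x_{11} & q_1 & x_{12} \\
2 & y_2 & x_{21} & q_2 & x_{22} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
n & y_n & x_{n1} & q_n & x_{n2}
\end{array}
$$

Y: phenotype; X: genotypes; Q: putative QTL

[Diagram: r1 Q r2, the putative QTL Q between flanking genotyped markers]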

Page 4

A Fixed-effect Mixture Model For Linkage
Commonly used in plant genetics

[Diagram: SNP A, r1, Q, r2, SNP B; cross P1 × P2 → F1 → F2]

$$
f(y_i) = \sum_{j=1}^{3} P(Q_j \mid X_i, r)\,
\frac{1}{\sqrt{2\pi}\,\sigma_j}
\exp\left[-\frac{1}{2}\left(\frac{y_i-\mu_j}{\sigma_j}\right)^{2}\right]
$$

$$
L(Y) = \prod_{i=1}^{n} f(y_i)
$$
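A minimal sketch of the mixture likelihood on this slide, assuming the conditional QTL genotype probabilities P(Q_j | X_i, r) have already been computed from the flanking markers. The function names and the toy numbers are hypothetical, and genotype-specific standard deviations are allowed (pass equal values for a common variance).

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def mixture_log_likelihood(y, qtl_probs, mus, sigmas):
    """log L(Y) = sum_i log( sum_j P(Q_j | X_i, r) * N(y_i; mu_j, sigma_j^2) ).

    qtl_probs[i][j] is assumed to hold P(Q_j | X_i, r) for subject i.
    """
    total = 0.0
    for y_i, probs_i in zip(y, qtl_probs):
        f_i = sum(p * normal_pdf(y_i, m, s)
                  for p, m, s in zip(probs_i, mus, sigmas))
        total += math.log(f_i)
    return total

# Toy data (made up for illustration): three F2 subjects, three QTL genotypes.
y = [9.8, 12.1, 14.3]
qtl_probs = [[0.90, 0.09, 0.01],
             [0.05, 0.90, 0.05],
             [0.01, 0.09, 0.90]]
mus = [10.0, 12.0, 14.0]
sigmas = [1.0, 1.0, 1.0]
loglik = mixture_log_likelihood(y, qtl_probs, mus, sigmas)
```

In practice the log-likelihood would be maximized over the genotype means, the variance and the QTL position; the sketch only evaluates it at fixed values.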

Page 5

A Variance-component Model For Linkage
Commonly used in human genetics

[Diagram: SNP A, r1, Q, r2, SNP B]

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}
\exp\left[-\frac{1}{2}(Y-\mu)^{T} V^{-1} (Y-\mu)\right]
$$

$$
V = \mathrm{Cov}(Y) = \Delta_Q\sigma_Q^2 + \Delta_g\sigma_g^2 + I\sigma_e^2
$$

Δ_Q: QTL IBD matrix
Δ_g: background IBD matrix
I: diagonal unit (identity) matrix

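Page 6

Variance-component Model = Random-effect Linear Model

$$
V = \Delta_Q\sigma_Q^2 + \Delta_g\sigma_g^2 + I\sigma_e^2
$$

$$
Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e
$$

Random effects:

$$
\gamma_Q \sim MVN(0, \Delta_Q\sigma_Q^2), \quad
\gamma_g \sim MVN(0, \Delta_g\sigma_g^2), \quad
e \sim N(0, \sigma_e^2)
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}
\exp\left[-\frac{1}{2}(Y-\mu)^{T} V^{-1} (Y-\mu)\right]
$$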
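The multivariate normal likelihood above can be evaluated directly once the IBD matrices are available. The following is a sketch under stated assumptions: NumPy is used for the linear algebra, the function name `vc_log_likelihood` is hypothetical, and the two-subject IBD matrices are made-up values.

```python
import numpy as np

def vc_log_likelihood(y, mu, delta_q, delta_g, var_q, var_g, var_e):
    """log L(Y) for Y ~ MVN(mu, V), V = Delta_Q*var_q + Delta_g*var_g + I*var_e."""
    y = np.asarray(y, dtype=float)
    n = y.size
    V = (var_q * np.asarray(delta_q, dtype=float)
         + var_g * np.asarray(delta_g, dtype=float)
         + var_e * np.eye(n))
    resid = y - mu
    sign, logdet = np.linalg.slogdet(V)      # log|V| without overflow
    if sign <= 0:
        raise ValueError("V must be positive definite")
    quad = resid @ np.linalg.solve(V, resid)  # (Y-mu)^T V^{-1} (Y-mu)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Toy sib pair (illustrative values only).
delta_g = [[1.0, 0.5], [0.5, 1.0]]   # background sharing
delta_q = [[1.0, 0.5], [0.5, 1.0]]   # sharing at the putative QTL
loglik = vc_log_likelihood([1.2, 0.7], 1.0, delta_q, delta_g, 0.3, 0.4, 0.3)
```

A linkage test would compare the maximized value of this function with and without the QTL variance component; the optimization step is not shown here.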

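Page 7

From Linkage to Association

Linkage model, with QTL effect(s) γ_Q:

$$
Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e
$$

Family-based association model, with marker effect(s) β as fixed effect(s):

$$
Y = \mu + X\beta + Z_g\gamma_g + e
$$

$$
V = \Delta_g\sigma_g^2 + I\sigma_e^2
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}
\exp\left[-\frac{1}{2}(Y-\mu-X\beta)^{T} V^{-1} (Y-\mu-X\beta)\right]
$$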

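Page 8

A Simple Association Model
For Unrelated Subjects

$$
Y = \mu + X\beta + e, \qquad V = I\sigma_e^2
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}
\exp\left[-\frac{1}{2}(Y-\mu-X\beta)^{T} V^{-1} (Y-\mu-X\beta)\right]
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_e}
\exp\left[-\frac{1}{2}\left(\frac{y_i-\mu-X_i\beta}{\sigma_e}\right)^{2}\right]
$$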
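For a single marker, the likelihood above is maximized by ordinary least squares, and H0: β = 0 can be tested with a likelihood ratio. The sketch below assumes one additively coded genotype (0/1/2) and uses the asymptotic 1-df chi-square reference distribution; the function name and the toy data are hypothetical.

```python
import math

def single_marker_lrt(y, x):
    """Fit y_i = mu + x_i*beta + e_i by maximum likelihood and test beta = 0."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # ML estimate of the marker effect
    mu = y_bar - beta * x_bar             # ML estimate of the intercept
    rss1 = sum((yi - mu - beta * xi) ** 2 for xi, yi in zip(x, y))
    rss0 = sum((yi - y_bar) ** 2 for yi in y)
    lrt = n * math.log(rss0 / rss1)       # 2 * (logL1 - logL0)
    # Upper tail of a chi-square with 1 df: P(Z^2 > lrt) = erfc(sqrt(lrt / 2)).
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return beta, lrt, p_value

# Toy data (made up): genotype counts and a quantitative phenotype.
x = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 1.9, 2.1, 3.2, 2.8]
beta, lrt, p_value = single_marker_lrt(y, x)
```

In a genome-wide scan this function would be applied to each marker in turn, giving one p-value per variant.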

Page 9

Covariate(s): Adjusting For Confounder(s)

$$
Y = \mu + X\beta + X_C\beta_C + e
$$

Observed confounders: age, sex, etc.
Hidden confounders: population structure

Population structure can be estimated by:
- PCA
- Clustering
- Admixture/ancestry
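As an illustration of the PCA option, the sketch below derives principal-component scores from a genotype matrix; the leading scores could then enter the model as columns of X_C. NumPy, the function name `genotype_pcs` and the toy genotypes (two groups with different allele frequencies) are assumptions made for this example.

```python
import numpy as np

def genotype_pcs(genotypes, n_components=2):
    """Principal-component scores from an n_subjects x n_markers 0/1/2 matrix."""
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)                 # center each marker
    sd = G.std(axis=0)
    keep = sd > 0                          # drop monomorphic markers
    G = G[:, keep] / sd[keep]              # scale each marker to unit variance
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy genotypes (made up): subjects 1-3 and 4-6 come from two different groups.
genotypes = [[2, 2, 0, 0],
             [2, 1, 0, 0],
             [2, 2, 0, 1],
             [0, 0, 2, 2],
             [0, 1, 2, 2],
             [0, 0, 2, 1]]
pcs = genotype_pcs(genotypes, n_components=2)
```

In this toy example the first component takes opposite signs in the two groups, which is the kind of hidden structure the covariates are meant to absorb.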

Page 10

Modeling Hidden Genetic Correlation
Between Subjects

$$
Y = \mu + X\beta + X_C\beta_C + Z_g\gamma_g + e
$$

Xβ: marker fixed effect(s)
X_Cβ_C: covariate fixed effect(s)
Z_gγ_g: genetic background random effects

$$
V = \Delta_g\sigma_g^2 + I\sigma_e^2
$$

Family data, pedigree => IBD matrix
Population data, hidden, marker data => IBS matrix
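For population data, an IBS matrix can be computed from the markers themselves. The sketch below uses one common definition (the average proportion of alleles shared identical by state across markers); other definitions exist, and the function name and toy genotypes are hypothetical.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS similarity from 0/1/2 genotype rows (one row per subject).

    At one marker, two subjects share 2 - |g_i - g_j| alleles by state,
    so the similarity is that count averaged over markers and divided by 2.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = shared / (2.0 * m)
    return K

# Toy genotypes (made up): subjects 1 and 2 are identical, subject 3 differs.
geno = [[0, 1, 2],
        [0, 1, 2],
        [2, 1, 0]]
K = ibs_matrix(geno)
```

The resulting matrix plays the role of Δ_g when no pedigree is available.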

Page 11

Modeling Rare Variants

$$
Y = \mu + X\beta + X_C\beta_C + Z_g\gamma_g + e
$$

Common variants, tested individually, H0: β1 = 0. One p-value per variant.

$$
Y = \mu + X_1\beta_1 + \dots
$$

Rare variants, tested as an entire group (burden test), usually by gene, H0: β1 = β2 = … = βk = 0. One p-value per group of variants.

$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \dots + X_k\beta_k + \dots
$$

Can be combined with variable selection, using loose criteria.

β can be treated as random effects (variance-components test) and can be weighted by prior information.

Page 12

Collapsing Model

$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \dots + X_k\beta_k + \dots
$$

$$
Y = \mu + X\beta + \dots
$$

subject  X1  X2  X3  X
1        0   0   0   0
2        0   1   1   1
3        1   0   0   1

Collapsing multiple variables into one
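The collapsing step can be sketched in a few lines, here with the three subjects from the slide's table. The function name `collapse` is hypothetical, and the rule assumed is "carrier of at least one rare variant in the group".

```python
def collapse(genotype_rows):
    """Collapse X1..Xk into one indicator X per subject.

    X = 1 if the subject carries at least one variant in the group, else 0.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotype_rows]

# Subjects 1-3 and variants X1-X3 from the slide's table.
genotypes = [[0, 0, 0],
             [0, 1, 1],
             [1, 0, 0]]
collapsed = collapse(genotypes)   # one value of X per subject
```

The collapsed X then replaces X1, …, Xk in the regression, so the group is tested with a single coefficient β.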

Page 13

Weighted Sum Model

$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \dots + X_k\beta_k + \dots
$$

$$
Y = \mu + \left(\sum_{j=1}^{k} w_j X_j\right)\beta + \dots
$$

subject  X1   X2   X3   S
weight   0.2  0.5  0.3
1        0    0    0    0.0
2        0    1    1    0.8
3        1    0    0    0.2

S: weighted sum score

$$
Y = \mu + S\beta + \dots
$$
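A short sketch of the weighted sum score S = Σ w_j X_j, using the weights (0.2, 0.5, 0.3) and the three subjects from the slide's table. The function name is hypothetical.

```python
def weighted_sum_scores(genotype_rows, weights):
    """Return S_i = sum_j w_j * X_ij for every subject i."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotype_rows]

# Weights and genotypes from the slide's table.
weights = [0.2, 0.5, 0.3]
genotypes = [[0, 0, 0],
             [0, 1, 1],
             [1, 0, 0]]
scores = weighted_sum_scores(genotypes, weights)   # approximately 0.0, 0.8, 0.2
```

Setting every weight to 1 gives a plain variant count, so the collapsing and weighted sum models differ only in how the group of variants is summarized.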

Page 14

Weighting Variants
- Based on allele frequency: continuous or binary (0,1) weight, variable threshold;
- Based on function annotation/prediction;
- Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.);
- Data-driven: using both genotype and phenotype data, learning weights (including effect directions) from data, requiring a permutation test;
- Any combination …

Grouping Variants
- By gene
- By transcript
- By exon
- By gene set / pathway
- By protein domain
- …
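For the allele-frequency option in the weighting list, a binary (0,1) weight can be sketched as a threshold on the minor allele frequency estimated from the sample. The function names, the default threshold and the toy genotypes below are illustrative assumptions, not values from the slides.

```python
def minor_allele_freq(column):
    """Minor allele frequency of one variant from 0/1/2 genotype counts."""
    p = sum(column) / (2.0 * len(column))
    return min(p, 1.0 - p)

def binary_weights(genotype_rows, maf_threshold=0.01):
    """Weight 1 for polymorphic variants with MAF below the threshold, else 0."""
    columns = list(zip(*genotype_rows))   # one tuple of genotypes per variant
    return [1 if 0.0 < minor_allele_freq(c) < maf_threshold else 0
            for c in columns]

# Toy genotypes (made up): 4 subjects x 2 variants.
genotypes = [[1, 1],
             [0, 1],
             [0, 2],
             [0, 1]]
weights = binary_weights(genotypes, maf_threshold=0.2)
```

A variable-threshold analysis would repeat the test over several values of `maf_threshold` and correct for the multiple thresholds, typically by permutation.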

Page 15

Modeling More Data Types
Generalized Linear (Mixed) Model

$$
g(Y) = \mu + X\beta + \dots + e
$$

g: link function

For binary Y, logistic model:

$$
g(Y) = \mathrm{logit}(Y) = \log\frac{P(Y=1)}{P(Y=0)}
$$

$$
P(Y=1) = \frac{\exp(\mu + X\beta + \dots + e)}{1 + \exp(\mu + X\beta + \dots + e)}
$$
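The logit link and its inverse can be sketched as follows. The intercept and effect size are made-up numbers, and the linear predictor is reduced to μ + Xβ for a single additively coded marker.

```python
import math

def logit(p):
    """Link function: log odds of P(Y=1) = p."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: exp(eta) / (1 + exp(eta)), written as 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical intercept and marker effect on the log-odds scale.
mu, beta = -2.0, 0.7
probs = [inv_logit(mu + beta * x) for x in (0, 1, 2)]   # P(Y=1) by genotype
```

Each extra copy of the allele multiplies the odds of Y = 1 by exp(β), which is why β is reported as a log odds ratio.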

Page 16

Longitudinal Data (quantitative)

Fixed effect, time as covariate

Repeated measures, random effect, correlation within subjects

[Figure: axis label "Time"]

Page 17

Longitudinal Data (binary)

Linear model, time as covariate

Survival analysis, CoxPH model, etc.

[Figure: axis label "Time"]

Page 18

Tools

SAS Procedures
REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST

R Functions/Packages
lm(), glm()
gee, nlme, kinship2/coxme, lme4, survival

Other Programs
SOLAR, MMAP, EMMA, EMMAX, SKAT

Page 19

Pipeline

Input (data + options)

Job generating/submitting module (LSF bsub)

job1, job2, …, job N
Options.jobi => self-programmed modules (SAS, R, …)
Options.jobi => external program modules (MMAP, SKAT, …)

Result 1, Result 2, …, Result N

Job number controlling module

Job status monitoring module (all done?)
No: wait …
Yes: result summarizing module
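The control flow of the pipeline (generate and submit jobs, cap the number running at once, wait until all are done, then summarize) can be sketched locally. This is only a stand-in: it uses a Python thread pool instead of LSF `bsub`, and the function name `run_pipeline` and the toy jobs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_pipeline(jobs, max_jobs, summarize):
    """Run named jobs with at most max_jobs running at once, then summarize.

    jobs: dict mapping a job name to a zero-argument callable.
    """
    results = {}
    # Job number control: the pool never runs more than max_jobs at a time.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # Job generating/submitting step.
        futures = {pool.submit(fn): name for name, fn in jobs.items()}
        # Job status monitoring: loop until every job has finished.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    # Result summarizing step, reached only when all jobs are done.
    return summarize(results)

# Toy jobs (made up): job i returns i squared.
jobs = {f"job{i}": (lambda i=i: i * i) for i in range(1, 6)}
summary = run_pipeline(jobs, max_jobs=2, summarize=lambda r: sum(r.values()))
```

On a cluster, the submit step would hand each job to the scheduler and the monitoring step would poll the scheduler's job status instead of waiting on futures.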

Page 20

gwas.sh options.gwa

gwas.sh:

#!/bin/sh
OPFILE=$1
...
…

options.gwa:

[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
phenotype_file=fhs100
markerinfo_file=mapall
marker_selection=MAF>0.01
pedigree_file=pediall
subjectID=subject
pedgreeID=famid
markername=snp
…
[ANALYSIS]
phenolist_file=
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed
[OUTPUT]
output_dir=/dsguser/qunyuan/fhs/bmi
output_file=
output_replace=no
[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
…

Pheno  type  covar    program  analysis  run
Bmi    qt    age,sex  SASGLM   mixed     YES
Obes   ql    NA       SASGLM   gee       YES
HD     ql    age      SASGLM   gee       NO
Age    …
Sex    …
…

Program  language  location                Maintainer
SASGLM   SAS       /dsg1/code/sas/glm.sas  Q.Zhang
GSTAT    R         /dsg1/code/R/gstat.R    Q.Zhang
MMAP     C         /dsg1/code/sas/mmap.sh  J. Czajkowski
…
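An options file with this section/key layout could be read as follows. This is a sketch that assumes the file is INI-compatible; it uses Python's standard `configparser` rather than the shell code of `gwas.sh`, which the slide does not show. Only a few of the keys from the slide are reproduced, and `load_options` is a hypothetical helper.

```python
import configparser

# A subset of the options shown on the slide, used here as sample input.
OPTIONS_TEXT = """
[DATA]
database=SAS
genotype_file=
phenotype_file=fhs100
marker_selection=MAF>0.01
subjectID=subject
[ANALYSIS]
pheno_list=bmi/qt
program=SASGLM
analysis=mixed
[RUN]
memsize=1000M
maxjobn=300
"""

def load_options(text):
    """Parse the options text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str          # keep key case, e.g. subjectID
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

options = load_options(OPTIONS_TEXT)
max_jobs = int(options["RUN"]["maxjobn"])   # upper limit on concurrent jobs
```

Keys with nothing after the equals sign, such as `genotype_file=`, come back as empty strings, so a caller can treat them as "not set".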

Page 21

Thanks!