Association Tests for Rare Variants Using Sequence Data


Post on 24-Feb-2016


DESCRIPTION

Association Tests for Rare Variants Using Sequence Data. Guimin Gao, Wenan Chen, & Xi Gao, Department of Biostatistics, VCU. Introduction to association tests: two hypotheses. Common variant-common disease. Common variant: minor allele frequency (MAF) >= 5%.

TRANSCRIPT

Slide 1

Guimin Gao, Wenan Chen, & Xi Gao

Department of Biostatistics, VCU
Association Tests for Rare Variants Using Sequence Data

Introduction to association tests: two hypotheses
Common variant-common disease
  Common variant: minor allele frequency (MAF) >= 5%
  Using linkage disequilibrium (LD)
Rare variant-common disease
  Rare variant: MAF < 1% (or 5%)
  High allelic heterogeneity: disease is caused collectively by multiple rare variants with moderate to high penetrances
  Associations through LD would not be suitable

Association tests for common variants
Test a single marker at a time
Cochran-Armitage trend test (CATT), assuming an additive (ADD) model
  Power: high for additive (ADD) or multiplicative (MUL) models; low for recessive (REC) or dominant (DOM) models
Genotype association test (GAT) using the chi-square statistic
  Power: a little lower for ADD, higher for REC
MAX3 = maximum of three trend test statistics across the REC, ADD, and DOM models (Freidlin et al. 2002, Hum Hered)
  Power: lower than CATT under ADD; higher than CATT & GAT under REC
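The trend test named above can be sketched in a few lines. This is a minimal illustration, not the presenters' code: it assumes the usual 2 x 3 table of genotype counts (0, 1, or 2 minor alleles) for cases and controls, the function names are hypothetical, and the REC and DOM score vectors (0, 0, 1) and (0, 1, 1) are the conventional choices. The MAX3 helper returns only the statistic, because its p-value needs the joint null distribution of the three correlated tests.

```python
import math

def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Trend test on genotype counts for 0, 1, 2 minor alleles.

    Returns (z, two-sided p-value from the normal approximation).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]
    # Trend statistic T = sum_i t_i * (r_i * S - s_i * R)
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    sum_t2n = sum(t * t * n for t, n in zip(weights, col))
    sum_tn = sum(t * n for t, n in zip(weights, col))
    # Null variance of T under the hypothesis of no association
    var_T = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    if var_T <= 0:
        raise ValueError("no variation in genotype scores or in case status")
    z = T / math.sqrt(var_T)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def max3_statistic(cases, controls):
    """Largest |z| over the REC, ADD, and DOM score vectors (statistic only)."""
    models = {"REC": (0, 0, 1), "ADD": (0, 1, 2), "DOM": (0, 1, 1)}
    return max(abs(cochran_armitage_trend(cases, controls, w)[0])
               for w in models.values())

z, p = cochran_armitage_trend([30, 20, 10], [10, 20, 30])
print(round(z, 3), p)
```

With the additive scores, z squared equals N times the squared correlation between allele count and case status, which is a convenient sanity check on any implementation.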

Association tests for common variants
Test for single marker (CATT, GAT, & MAX3)
Low power when MAF [...] CAST(WSM) > CMC; may not be true in other situations
Can be applied to rare variants & common variants
Disadvantage: gives very high weights to very rare alleles (singletons) and very low weights to common variants

An evaluation of the CMC method and the weighted sum method using GAW 17 data
Both methods are powerful (based on the authors' simulations)
Our evaluation is based on simulated datasets from GAW 17
GAW 17 data: a subset of genes with real sequence data available in the 1000 Genomes Project
Simulated phenotypes
Unrelated individuals, families
Dataset of 697 unrelated individuals
24487 SNPs in 3205 genes from 22 autosomal chromosomes
Only test the 2196 genes with non-synonymous SNPs

In favor of their method

GAW 17 dataset of unrelated individuals
Four phenotypes: Q1, Q2, Q4, and disease status
Q1, Q2, and Q4 are quantitative traits
Q1 is associated with 39 SNPs in 9 genes; Q2 is associated with 72 SNPs in 13 genes
Q4: not related to any genes
Disease status is a binary trait (affected or unaffected), associated with 37 genes
200 simulated phenotype replicates
Only one replicate of genotype data (original data)

Methods: case-control design
Transform Q1, Q2, and Q4 into binary traits
Split each distribution at its top 30% (individuals in the top 30% are treated as cases)
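The transformation above can be sketched as follows. This is an illustrative helper with a hypothetical name; rounding the number of cases to the nearest integer and breaking ties by input order are assumptions of the sketch, since the slides do not say how either was handled.

```python
def dichotomize_top_fraction(values, top_fraction=0.30):
    """Label the individuals with the largest trait values as cases (1)."""
    n = len(values)
    n_cases = round(n * top_fraction)  # assumption: round to nearest integer
    # Rank individuals from largest to smallest trait value
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    status = [0] * n
    for i in order[:n_cases]:
        status[i] = 1
    return status

status = dichotomize_top_fraction([0.2, 1.5, -0.3, 2.1, 0.9, 0.1, 1.1, -1.0, 0.4, 0.7])
print(status)
```

Under these assumptions, a sample of 697 individuals yields 209 cases and 488 controls.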

Transforming phenotypes

Criteria for evaluation of tests
Familywise error rate (FWER)
  2196 genes with non-synonymous SNPs: 2196 tests, 2196 null hypotheses
  Hj0: gene j is not associated with the trait
  Q1 is associated with 9 genes, so 9 null hypotheses are not true and (2196 - 9) null hypotheses are true
  FWER = Pr(reject at least one true null hypothesis), estimated as Nf/200
  Nf: number of replicates in which at least one true null hypothesis is rejected

Average power
  Mean of the power over all genes that affect the phenotype (e.g. the 9 genes for Q1)
Evaluate power: Q1, Q2, disease
Evaluate FWER: Q4
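The two criteria can be estimated from a replicates-by-genes matrix of p-values, as in the sketch below. The function name, the input layout, and the per-test threshold alpha are assumptions for illustration; the slides do not state which per-gene threshold was used.

```python
def fwer_and_average_power(pvalues, causal, alpha):
    """pvalues[r][g]: p-value of gene g in replicate r.
    causal: set of indices of genes truly associated with the trait.
    alpha: per-test rejection threshold (an assumption of this sketch).
    """
    n_rep = len(pvalues)
    n_genes = len(pvalues[0])
    null_genes = [g for g in range(n_genes) if g not in causal]
    # Nf: replicates in which at least one true null hypothesis is rejected
    n_f = sum(any(row[g] <= alpha for g in null_genes) for row in pvalues)
    fwer = n_f / n_rep
    if not causal:
        return fwer, None  # e.g. Q4, which has no associated genes
    # Power per causal gene = fraction of replicates in which it is rejected
    per_gene = [sum(row[g] <= alpha for row in pvalues) / n_rep for g in causal]
    return fwer, sum(per_gene) / len(per_gene)

toy = [
    [0.001, 0.50, 0.60],
    [0.200, 0.01, 0.70],
    [0.001, 0.90, 0.02],
    [0.300, 0.40, 0.50],
]
print(fwer_and_average_power(toy, {0}, alpha=0.05))
```

In the toy matrix, gene 0 is the only causal gene; two of the four replicates reject some null gene and two reject gene 0, so both estimates are 0.5.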

Distribution of MAF in the GAW 17 dataset

Figure 1. Distribution of MAF of 24487 SNPs in GAW 17

Group SNPs based on MAFs for CMC: 0 - 0.01, 0.01 - 0.1, >= 0.1
Similar to Madsen & Browning (2009)

Table 1: Average power
Trait      CMC method   Weighted sum method
Q1         0.144        0.112
Q2         0.00615      0.00308
Disease    0.00444      0.00500

Table 2: FWER (nominal = 0.05)
Trait      CMC method   Weighted sum method
Q4         0.115        0.0100

CMC has FWER inflation
  Population stratification or admixture: samples from Asia, Europe
  Relatedness among samples
Similar results in power and FWER were reported at GAW 17

Variable-threshold approach (Price et al. 2010)
Given a threshold T, calculate a score for individual i

Iij = 0, 1, 2: the count of the minor allele of individual i at locus j

Calculate the sum of scores for cases, V(T)

Calculate Z(T) = V(T)/Var(V(T))
Find T to maximize Z(T): Zmax = max(Z(T))
Use permutation to estimate the p-value for Zmax
Power: > CMC
Extended to quantitative traits
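A minimal sketch of the variable-threshold idea follows. The slide formulas did not survive extraction, so several pieces here are assumptions rather than the published method: the score of individual i is taken as the sum of Iij over loci with sample MAF <= T, the candidate thresholds are the observed MAFs, and V(T) is centered and scaled by its exact mean and standard deviation under label permutation. The function name and return layout are hypothetical.

```python
import math
import random

def variable_threshold_test(genotypes, status, n_perm=1000, seed=0):
    """genotypes[i][j] in {0, 1, 2}; status[i] is 1 for cases, 0 for controls.

    Returns (z_max, best_threshold, permutation p-value).
    """
    n, n_loci, n_cases = len(status), len(genotypes[0]), sum(status)
    maf = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_loci)]
    tables = []
    for T in sorted({f for f in maf if f > 0}):
        keep = [j for j in range(n_loci) if 0 < maf[j] <= T]
        s = [sum(row[j] for j in keep) for row in genotypes]  # score per individual
        mean_s = sum(s) / n
        ss = sum((x - mean_s) ** 2 for x in s)
        # Variance of the case sum when n_cases labels are drawn without replacement
        var_v = n_cases * (n - n_cases) / (n * (n - 1)) * ss
        if var_v > 0:
            tables.append((T, s, n_cases * mean_s, math.sqrt(var_v)))
    if not tables:
        raise ValueError("no informative threshold (check genotypes and status)")

    def z_max(labels):
        best_z, best_T = -math.inf, None
        for T, s, mean_v, sd_v in tables:
            v = sum(x for x, y in zip(s, labels) if y == 1)  # V(T): sum over cases
            z = (v - mean_v) / sd_v
            if z > best_z:
                best_z, best_T = z, T
        return best_z, best_T

    z_obs, T_obs = z_max(status)
    rng = random.Random(seed)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case/control labels
        if z_max(labels)[0] >= z_obs:
            hits += 1
    return z_obs, T_obs, (1 + hits) / (1 + n_perm)
```

Taking the maximum inside each permutation is what accounts for having searched over thresholds; comparing Zmax to a standard normal would overstate significance.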

A weighted approach (Price et al. 2010)
Calculate a weighted score for individual i

Iij = 0, 1, 2
Calculate the sum of scores for cases

Possible weight

Power: similar to the weighted sum method (Madsen & Browning 2009)

A weighted approach (Price et al. 2010)
Calculate the sum of scores for cases

Iij = 0, 1, 2
Calculate weights from the predicted functional effects
PolyPhen-2 is used to predict damaging effects of missense mutations with probabilistic scores
Probabilistic scores as weights may reduce the noise from non-functional variants
Higher power than other methods
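The weighted score can be sketched generically: any per-locus weight vector is multiplied into the allele counts and the scores are summed over cases. The weight formula on the slide was lost, so the frequency-based weight below, 1/sqrt(p(1-p)) with add-one smoothing of the sample allele frequency p, is an illustrative assumption and not necessarily the one presented. The function names are hypothetical, and functional scores such as PolyPhen-2 probabilities could be passed as the weights in the same way.

```python
import math

def frequency_weights(genotypes):
    """Illustrative weights 1/sqrt(p(1-p)); rarer variants get larger weights."""
    n = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        # Add-one smoothing keeps p strictly between 0 and 1 (assumption)
        p = (sum(row[j] for row in genotypes) + 1) / (2 * n + 2)
        weights.append(1 / math.sqrt(p * (1 - p)))
    return weights

def weighted_case_sum(genotypes, status, weights):
    """Weighted score per individual and the sum of scores over cases."""
    scores = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    case_sum = sum(s for s, y in zip(scores, status) if y == 1)
    return case_sum, scores

genotypes = [[1, 0], [0, 1], [0, 0], [2, 1]]
status = [1, 0, 0, 1]
print(weighted_case_sum(genotypes, status, frequency_weights(genotypes)))
```

As with the variable-threshold statistic, significance of the case sum would be assessed by permuting case and control labels.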

A data-adaptive sum test (Han & Pan 2010, Hum Hered)
Logistic model

xij = 0, 1, 2: the count of the minor allele of individual i at locus j
Effects in opposite directions

If j