multi-skat: general framework to test multiple phenotype ... · study used for our data...

Multi-SKAT: General framework to test multiple phenotype

associations of rare variants

Diptavo Dutta1,2, Laura Scott1,2, Michael Boehnke1,2, and Seunggeun Lee ∗1,2

1Department of Biostatistics

2Center for Statistical Genetics

University of Michigan

Ann Arbor, Michigan, USA

Abstract

In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to

identify sets of trait-associated variants within genes or regions of interest. Existing multi-phenotype

tests for rare variants make specific assumptions about the patterns of association of underlying causal

variants, and the violation of these assumptions can reduce power to detect association. Here we develop

a general framework for testing pleiotropic effects of rare variants based on multivariate kernel regression

(Multi-SKAT). Multi-SKAT models effect sizes of variants on the phenotypes through a kernel matrix

and performs a variance component test of association. We show that many existing tests are equivalent

to specific choices of kernel matrices with the Multi-SKAT framework. To increase power to detect

association across tests with different kernel matrices, we developed a fast and accurate approximation

of the significance of the minimum observed p-value across tests. To account for related individuals, our

framework uses a random effects for the kinship matrix. Using simulated data and amino acid and exome-

array data from the METSIM study, we show that Multi-SKAT can improve power over single-phenotype

SKAT-O test and existing multiple phenotype tests, while maintaining type I error rate.

∗Corresponding author. Email: [email protected]

1

.CC-BY-ND 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/229583doi: bioRxiv preprint

https://doi.org/10.1101/229583

http://creativecommons.org/licenses/by-nd/4.0/

Introduction

With the advent of array genotyping technologies, genome-wide association studies (GWAS) have identified

numerous genetic variants associated with complex traits. Despite these discoveries, GWAS loci explain only

a small proportion of heritability for most traits. This may be due, in part, to the fact that the analysis

framework of such association studies is underpowered to identify associations in rare variants1. It has been

argued that rare variants play important roles in the underlying genetics of complex traits2;3. To identify

such rare variant associations, gene- or region-based multiple variant tests have been developed. By jointly

testing rare variants in a target gene or region, these methods can increase power over a single variant test

and are now a standard approach in rare variant analysis4.

Recent GWAS results have shown that many GWAS loci are associated with multiple traits5. For

example, 44% of autoimmune risk SNPs have been estimated to be associated with multiple autoimmune

diseases6. Nearly 17% of genes in NHLBI GWAS categories are associated with multiple traits7. Detecting

such pleiotropic effects is important to understand the underlying biological structure of complex traits. In

addition, by leveraging cross-phenotype associations, the power for detecting trait-associated variants can

be increased.

Identifying the cross-phenotype effects requires a suitable joint or multivariate analysis framework that

can leverage the inter-dependency of the phenotypes. Various methods have been proposed for multiple

phenotype analysis in GWAS8;9;10;11;12. Extending them, several groups have developed multiple phenotype

tests for rare variants13;14;15;16;17;18;19. For example, Wang et al.13 proposed using a multivariate functional

linear model (MFLM) to this end; Broadaway et al.14 used a non-parametric approach, called GAMuT, to

test such effects and Wu et al.15 developed a score based sequence kernel association test for multiple traits,

MSKAT, which has been shown to be similar in performance to GAMuT14.

Despite these developments, existing methods have important limitations. Since these methods were

developed under specific assumptions on genetic effect structure to multiple phenotypes, specifically the

relationship between the effects of different genetic variants within a region, they can lose power if the un-

derlying assumptions are not satisfied12. For example, if genetic effects are heterogeneous across multiple

phenotypes, methods assuming homogeneous genetic effects can lose substantial amount of power16. Al-

though combining multiple models can improve power for common variants12, no rare variant tests have

aggregated different models for multiple phenotypes to yield robust results. In addition, most existing meth-

ods and softwares cannot adjust for the relatedness between individuals; thus, to apply these methods,

related individuals must be removed from the analysis to maintain type I error rates. For example, in the

study used for our data application, the METabolic Syndrome In Men (METSIM) study, nearly 15% of

2


https://doi.org/10.1101/229583


individuals are estimated to be related up to the second degree, and removing them from the analysis would

greatly reduce the power.

In this article, we develop a class of methods Multi-SKAT that extend the kernel regression framework in

the mixed effect models to incorporate multiple correlated continuous phenotypes while accounting for family

relatedness. Mixed effect models have been useful modeling structures for rare-variant association tests. For

example, popular rare variant tests such as SKAT20 and SKAT-O21 are based on mixed effect models. By

using kernels for the correlated phenotypes and multiple variants in a target region, Multi-SKAT allows

for flexible modeling of the genetic effects on the phenotypes. Many of the existing methods for multiple

phenotype rare variant tests can be viewed as special cases of Multi-SKAT with particular choices of kernels.

Since Multi-SKAT is score statistics-based, it requires fitting the null model only once for genome-wide data.

In addition, Multi-SKAT is computationally efficient as it can calculate asymptotic p-values. To avoid loss

of power due to model misspecification, we develop computationally efficient omnibus tests, which allow

aggregating tests over several kernels producing robust power of detection. Additionally, Multi-SKAT can

correct for relatedness by incorporating estimated kinship information.

The article is organized as follows: in the first section, we present the multivariate mixed effect model and

kernel matrices. We particularly focus on the phenotype-kernel and describe omnibus procedures that can

aggregate results across different choices of kernels and kinship adjustment. We then describe the simulation

experiments that clearly demonstrate that Multi-SKAT tests have increased power to detect associations

compared to existing methods like GAMuT, MSKAT and others. Finally, we apply Multi-SKAT tests to

detect the cross-phenotype effects of rare nonsynonymous and protein-truncating variants on a set of nine

amino acids from 8,545 Finnish men from the METSIM study.

Material and Methods

Single-phenotype region-based tests

To describe the Multi-SKAT tests, we first present the existing model of the single-phenotype gene or region-

based tests. Let yk = (y1k, y2k, · · · , ynk)T be an n× 1 vector of the kth phenotype over n individuals; X an

n× q matrix of the q non-genetic covariates including the intercept; Gj = (G1j , · · · , Gnj)T is an n× 1 vector

of the minor allele counts (0, 1, or 2) for a binary genetic variant j; and G = [G1, · · · , Gm] is an n × m

genotype matrix for m genetic variants in a target region. The following regression model can be used to

3


https://doi.org/10.1101/229583


relate m genetic variants to phenotype k,

yk = Xαk +Gβ.k + εk (1)

where αk is a q × 1 vector of regression coefficients of q non-genetic covariates, β.k = (β1k, · · · , βmk)T is an

m× 1 vector of regression coefficients of the m genetic variants, and εk is an n× 1 vector of non-systematic

error term with each element following N(0, σ2k). To test for H0 : β.k = 0, a variance component test

under the mixed effects model have been proposed to increase power over the usual F-test20. The variance

component test assumes that the regression coefficients, β.k, are random variables and follow a centered

distribution with variance τ2ΣG (see below). Under these assumptions, the test for β.k = 0 is equivalent to

testing τ = 0. The score statistic for this test is

Q = (yk − µk)TGΣGGT (yk − µk) (2)

where µk = Xαk is the estimated mean of yk under the null hypothesis of no association. The test statistic

Q asymptotically follows a mixture of chi-squared distributions under the null hypothesis and p-values can

be computed by inverting the characteristic function22.

The kernel matrix ΣG plays a critical role; it models the relationship among the effect sizes of the variants

on the phenotypes. Any positive semidefinite matrix can be used for ΣG providing a unified framework

for the region-based tests. A frequent choice of ΣG is a sandwich type matrix ΣG = WRGW , where

W = diag(w1, .., wm) is a diagonal weighting matrix for each variant, and RG is a correlation matrix between

the effect sizes of the variants. RG = Im×m implies uncorrelated effect sizes and corresponds to SKAT, and

RG = 1m1mT corresponds to the burden test, where Im×m is an m×m diagonal matrix and 1m = (1, · · · 1)T

is an m × 1 vector with all element being unity. Furthermore, a linear combination of these two matrices

corresponds to RG = ρ1m1mT + (1− ρ)Im×m, which is used for SKAT-O.

Multiple-phenotype region-based tests

Extending the idea of using kernels, we build a model for multiple phenotypes. To model K correlated

phenotypes in n unrelated individuals, we use the following multivariate linear model

Y = XA+GB + E (3)

4


https://doi.org/10.1101/229583


where Y = (y1, · · · , yK) denotes an n×K phenotype matrix; A is an n × q matrix of coefficients of X; B

= (βij) is a p×K matrix of coefficients where βij denotes the effect of the ith variant on the jth phenotype

and E is an n×K matrix of non-systematic errors. Let vec(·) denote the matrix vectorization function, and

then vec(E) follows N(0, In ⊗ V ), where V is a K ×K covariance matrix and ⊗ represents the Kronecker

product.

In addition to assuming that β.k follows a centered distribution with covariance τ2ΣG, we further assume

that βj. = (βj1, · · · , βjK)T , which is the regression coefficients of variant j to K multiple phenotypes, follows

a centered distribution with covariance τ2ΣP , which implies that vec(B) follows a centered distribution with

covariance τ2ΣG ⊗ΣP . Similarly as before, the null hypothesis H0 : vec(B) = 0 is equivalent to τ = 0. The

corresponding score test statistic is

Q = {vec(Y )− vec(µ)}T{

(GΣGGT )⊗ (V −1ΣP V

−1)}{vec(Y )− vec(µ)} (4)

where µ and V are the estimated mean and covariance of Y under the null hypothesis.

We note that ΣP plays a similar role as ΣG but with respect to phenotypes. ΣP represents a kernel in the

phenotypes space and models the relationship among the effect sizes of a variant on each of the phenotypes.

Any positive semidefinite matrix can be used as ΣP .

The proposed approach provides a double-flexibility in modeling. Through the choice of structures for

ΣG and ΣP , we can control the dependencies of genetic effects. Most of our hypotheses about the underlying

genetic structure of a set of phenotypes can be modeled through varying structures of these two matrices.

Phenotype kernel structure ΣP

The use of ΣG has been extensively studied previously in literature20;21. Here we propose several choices for

ΣP and study their effect from a modeling perspective.

Homogeneous (Hom) Kernel

It is possible that effect sizes of a variant on different phenotypes to be homogeneous in which βj1 = · · · = βjK .

Under this assumption,

ΣP,Hom = 1K1TK (5)

Under ΣP,Hom, βjk, (k = 1, · · ·K) for a variant j are the same for all the phenotypes.

Heterogeneous (Het) Kernel

5


https://doi.org/10.1101/229583


Effect sizes of a variant to different phenotypes can be heterogeneous in which βj1 6= · · · 6= βjK . Under this

assumption, we can construct

ΣP,Het = Ik×k (6)

The ΣP,Het implies that (βj1, · · · , βTjK) are uncorrelated. This also indicates that the correlation among the

phenotypes is not affected by this particular region or gene.

Phenotype Covariance (PhC) Kernel

We may model ΣP as proportional to the estimated residual covariance across the phenotypes as,

ΣP,PhC = V (7)

where V is the estimated covariance matrix among the phenotypes. This model assumes that the correlation

of the phenotypes is proportional to that of between the effect sizes.

Principal Component (PC) Kernel

Principal component analysis (PCA) is a popular tool for multivariate analysis. In multiple phenotype tests,

PC-based approaches have been used to reduce the dimension in phenotypes23. Here we show that PC-based

approach can be included in our framework.

Let Li be the loading vector for the ith PC, which produces the ith PC score Pi = Y Li. In PCA-based

analysis, PC scores are used as outcomes instead of original Y . Since the genetic information regarding the

phenotypes may not be confined to the top few PCs23, we first consider using all PCs. Let P = (P1, · · ·PK).

Since PCs are orthogonal, we assume genetic effects to multiple PCs are heterogeneous, which resulted in

Q = {vec(P )− vec(µP )}T{(GΣGG

T)⊗(V −1P V −1

P

)}{vec(P )− vec(µP )} (8)

where µP is the mean of P under the null hypothesis and VP is the estimated covariance matrix between

the PC’s. VP will be a diagonal matrix since PCs are orthogonal. The equation (8) can be written as

Q = {vec(y)− vec(µ)}T{(GΣGG

T)⊗(LV −1

P V −1P LT

)}{vec(y)− vec(µ)} (9)

where L = (L1, · · · , LK) is a K × K PC loading matrix. The equation (9) shows that by using ΣP,PC =

V LVP−1VP

−1LT V , we can carry out PC-based tests.

Instead of using all the PC’s, we can use selected PC’s that represent the majority of cumulative variation

6


https://doi.org/10.1101/229583


in phenotypes. For example, we can jointly test the PC’s that have cumulative variance of 90%. If the top

t PC’s have been chosen for analysis using ν% cumulative variance as cutoff, we can use

ΣP,PC−ν = V LselV−1P V −1

P LTselV

where Lsel = [L1, · · · , Lt, 0, · · · , 0] and 0 represents a vector of 0’s of appropriate length.

Relationship with other Multiple-Phenotype rare variant tests

We have proposed a uniform framework of Multi-SKAT tests that depend on ΣG and ΣP . There are certain

specific choices of kernels that correspond to other published methods.

• Using ΣP,PhC and ΣG = WImWT is identical to GAMuT14(with linear kernel) and MSKAT15.

• Using ΣP,Hom and ΣG = WImWT is identical to hom-MAAUSS16.

• Using ΣP,Het and ΣG = WImWT is identical to het-MAAUSS16 and MF-KM19.

For the detailed proof, please see Appendix A.

Minimum p-value based omnibus tests (minP)

The model and the corresponding test of association that we propose has two parameters, ΣG and ΣP , which

are absent in the null model of no association. Since our test is a score test, ΣG and ΣP cannot be estimated

from the data. One possible remedy is to select ΣG and ΣP based on prior knowledge; however, if the

selected ΣG and ΣP do not reflect underlying biology, the test may have substantially reduced power12;16.

In an attempt to achieve robust power, we aggregate results across different ΣG and ΣP . Here we propose a

general strategy to combine the p-values of a number of Multi-SKAT tests with different kernels to produce

an aggregate p-value of association.

Suppose pi is the p-value for Qi with given ith ΣG and ΣP , i = 1, · · · , b, and TP = (p1, · · · , pb)T is an

b× 1 vector of p-values of b such Multi-SKAT tests. The minimum p-value test statistic after the Bonferroni

adjustment is b× pmin, where pmin is the minimum of the b p-values. In the presence of positive correlation

among the tests, this approach is conservative and hence might lack power of detection. Rather than using

Bonferroni corrected pmin, more accurate results can be obtained if the joint null distribution or more

specifically the correlation structure of TP can be estimated. Here we adopt a resampling based approach

to estimate this correlation structure. Note that our test statistic is equivalent to

Q = {vec(y)− vec(µ)}T (In ⊗ V −1)(G⊗ IK)(ΣG ⊗ ΣP )(GT ⊗ IK)(In ⊗ V −1) {vec(y)− vec(µ)} (10)

7


https://doi.org/10.1101/229583


Under the null hypothesis (In ⊗ V − 12 )(vec(y) − vec(µ)) approximately follows an uncorrelated normal dis-

tribution N(0, InK). Using this, we propose the following resampling algorithm

• Step 1. Generate nK samples from an N(0, 1) distribution, say SR.

• Step 2. Calculate b different test statistics as TR = STR

{(GΣGG

T )⊗ (V − 12 ΣP V

− 12 )}SR for all the

choices of ΣP and calculate p-values.

• Step 3. Repeat the previous steps independently for R(= 1000) iterations, and calculate correlation

between the p-values of the tests from the R resampling p-values.

With the estimated null correlation structure, we use Copula to approximate the joint distribution of

TP24. Copula is a statistical approach to construct joint multivariate distribution using marginal distribution

of each variable and correlation structure. Since marginally each test statistic Q follows a mixture of chi-

square distribution, which has a heavier tail than normal distribution, we propose to use a t-Copula to

approximate the joint distribution, i.e, we assume the joint distribution of TP to be multivariate t with the

estimated correlation structure. The final p-value for association is then calculated from the distribution

function of the assumed t-Copula.

When calculating the correlation across the p-values, Pearson’s correlation coefficient can be unreliable

since it depends on normality and homoscedasticity assumptions. To avoid such assumptions we recommend

estimating the null-correlation matrix of the p-values through Kendall’s tau (τ), which is a non-parametric

approach based on concordance of ranks.

The minimum p-value approach can be used to combine different ΣP given ΣG, or combine both ΣP and

ΣG. To differentiate the latter, we will call it minPcom which combines SKAT and Burden type kernels of

ΣG.

Adjusting for relatedness

We formulated model (3) and corresponding tests under the assumption of independent individuals. If

individuals are related, this assumption is no longer valid, and the tests may have inflated type I error rates.

Since our method is regression-based, we can relax the independence assumption by introducing a random

effect term representing the correlation structure of phenotypes among related individuals.

Let Φ be the kinship matrix of the individuals and Vg is a co-heritability matrix, denoting the shared

heritability between the phenotypes. Extending the model (3), we incorporate Φ and Vg as

Y = XA+GB + Z + E (11)

8


https://doi.org/10.1101/229583


where Z is an n ×K matrix with vec(Z) following N(0,Φ ⊗ Vg). Z represents a matrix of random effects

arising from shared genetic effects between the samples due to the relatedness. The remaining terms are the

same as in (3). The corresponding score test statistic is

QKin = {vec(Y )− vec(µ)}T Ve−1

(G⊗ Ik)(ΣG ⊗ ΣP )(GT ⊗ Ik)Ve−1{vec(y)− vec(µ)} (12)

where Ve = Φ⊗ Vg + In⊗ V is the estimated covariance matrix of vec(Y ) under the null hypothesis. Similar

to the previous versions for unrelated individuals, QKin asymptotically follows a mixture of chi-square under

the assumption of no association.

This approach depends on the estimation of the matrices Φ, Vg and V . The kinship matrix Φ can

be estimated using the genome-wide genotype data25. Several of the published methods like LD-Score26,

PHENIX27 and GEMMA10 28 can jointly estimate Vg and V . In our numerical analysis, we have used

PHENIX. This is an efficient method to fit local maximum likelihood variance components in a multiple

phenotype mixed model through an E-M algorithm.

Once the matrices Φ, Vg and V are estimated, we compute the asymptotic p-values for QKin by using

a mixture of chi-square distribution. The computation of QKin requires large matrix multiplications, which

can be time and memory consuming. To reduce computational burden, we employ several transformations.

We perform an eigen-decomposition on the kinship matrix Φ as Φ = UΛUT , where U is an orthogonal

matrix of eigenvectors and Λ is a diagonal matrix of corresponding eigenvalues. We obtain the transformed

phenotype matrix as Y = Y U , the transformed covariate matrix as X = XU , the transformed random effects

matrix Z = ZU and transformed residual error matrix E = EU . The equation (11) can be transformed into

Y = XA+ GB + Z + E; vec(Z) ∼ N(0,Λ⊗ Vg); vec(e) ∼ N(0, I ⊗ V ) (13)

For each individual l, the transformed phenotypes given the transformed covariates asymptotically follow

independent (but not identical) multivariate normal distributions.

yl = Axl +Bgl + zl + el; zl ∼ N(0, dl ⊗ Vg); el ∼ N(0, V ); (14)

All the properties of the tests in the model (3) are directly applicable to those for (13). QKin can be

computed from this transformed model as,

QKin ={vec(Y )− vec(µ)

}TV −1e (G⊗ Ik)(ΣG ⊗ ΣP )(GT ⊗ Ik)V −1

e

{vec(Y )− vec(µ)

}(15)

9


https://doi.org/10.1101/229583


where Ve = Λ ⊗ Vg + In ⊗ V . Asymptotic p-values can be obtained from the corresponding mixture of

chi-squares distribution. Further, omnibus strategies developed for (3) are applicable in this case with

similar modifications. For example, the resampling algorithm for minimum p-value based omnibus test can

be implemented here as well by noting that V− 1

2e

{vec(Y )− vec(µ)

}approximately follows an uncorrelated

normal distribution.

Simulations

We carried out extensive simulation studies to evaluate the type I error control and power of Multi-SKAT

tests. Our simulation studies focus on evaluating the proposed tests when the number of phenotypes are 5, 6

or 9. For type I error simulations without related individuals and all power simulations, we generated 10,000

chromosomes over 1Mbp regions using a coalescent simulator with European demographic model29. Since

the average length of the gene is 3 kbps we randomly selected a 3 kbps region for each simulated dataset

to test for associations. For the type I error simulations with related individuals, to have a realistic kinship

structure, we used the METSIM study genotype data.

Phenotypes were generated from the multivariate normal distribution as

yi ∼MVN{(β1G1 + · · ·+ βmGm)I, V }, (16)

where yi = (yi1, · · · , yiK)T is the outcome vector, Gj is the genotype of the jth variant, and βj is the

corresponding effect size, and V is a covariance of the non-systematic error term. I is a k × 1 indicator

vector, which has 1 when the corresponding phenotype is associated with the region and 0 otherwise. For

example, if there are 5 phenotypes and the last three are associated with the region, I = (0, 0, 1, 1, 1)T .

To evaluate whether Multi-SKAT can control type I error under realistic scenarios, we simulated a

dataset with 9 phenotypes with a correlation structure identical to that of 9 amino acid phenotypes in the

METSIM data (See Supplementary Figure S1). Phenotypes were generated using equation (16) with β = 0.

Total 5, 000, 000 datasets with 5, 000 individuals were generated to obtain the empirical type-I error rates at

α = 10−4, 10−5 and 2.5× 10−6, which are corresponding to candidate gene studies to test for 500 and 5000

genes and exome-wide studies to test for all 20, 000 protein coding genes, respectively.

Next, we evaluated type I error controls in the presence of related individuals. To have a realistic related

structure, as aforementioned, we used the METSIM study genotype data. We generated a random subsample

of 5000 individuals from the METSIM study individuals and generated null values for the 9 phenotypes from

MVN(0, Ve), where Ve = Φ5k ⊗ Vg;5k + I ⊗ V5k, Φ5k is the estimated kinship matrix of the 5000 selected

individuals, Vg;5k and V5k are estimated co-heritability and residual variance matrices respectively for these

10


https://doi.org/10.1101/229583


samples. For each set of 9 phenotypes, we performed the Multi-SKAT tests for a randomly selected 5000

genes in the METSIM data. For the details about the data, see next section. We carried out this procedure

1000 times and obtained 5, 000, 000 p-values, and estimated type I error rates as proportions of p-values

smaller than the given level α.

Power simulations were performed both in situations when there was no pleiotropy (i.e., only one of the

phenotypes was associated with the causal variants) and also when there was pleiotropy. Under pleiotropy,

since it is unlikely that all the phenotypes are associated with genotypes in the region, we varied the number

of phenotypes associated. For each associated phenotype, 30% or 50% of the rare variants (MAF < 1%)

were randomly selected to be causal variants. We modeled the rarer variants to have stronger effect, as

|βj | = c|log10(MAFj)|. We used c = 0.3 which yields |βj | = 0.9 for variants with MAF= 10−3. Our choice of

β yielded the average heritability of associated phenotypes between 1% to 4%. We also considered situations

that all causal variants were trait-increasing variants (i.e. positive β) or 20 % of causal variants were trait-

decreasing variants (i.e. negative β). Empirical power was estimated from 1000 independent datasets at

exome-wide α = 2.5× 10−6.

In type I error and power simulations, we compared the following tests: Bonferroni adjusted minimum

p-values from SKAT on each phenotype (minPhen); Multi-SKAT with ΣP,Hom (Hom); Multi-SKAT with

ΣP,Het (Het); Multi-SKAT with ΣP,PhC (PhC); Multi-SKAT with ΣP,PC−0.9 (PC-Sel); Minimum P-value

of Hom, Het, PhC and PC-Sel using Copula (minP). For the Multi-SKAT tests, we used two different ΣG’s

corresponding to SKAT (i.e. ΣG = WW ) and Burden tests (i.e. ΣG = W1m1TmW ). We further evaluated

minPcom, by combining two ΣG’s (SKAT and Burden) and four ΣP ’s (Hom, Het, PhC and PC-Sel), and hence

a total of 8 tests. For the variant weighting matrix W = diag(w1, · · · , wm), we used wj = Beta(MAFj , 1, 25)

function to upweight rarer variants, as recommended by Wu et al.20.

Analysis of the METSIM study exomechip data

To investigate the cross-phenotype roles of low frequency and rare variants on amino acids, we analyzed

data on 8545 participants of the METSIM study on whom all 9 Amino acids (Alanine, Leucine, Isoleucine,

Glycine,Valine, Tyrosine, Phenylalanine, Glutamine, Histidine) were measured by proton nuclear magnetic

resonance spectroscopy. Samples were genotyped on the Illumina ExomeChip and OmniExpress arrays and

we included samples that passed sample QC filters30. The kinship between the samples was estimated via

KING (version 2.0). We adjusted the amino acid levels (AA) for age, age2 and BMI and inverse-normalized

the residuals. The phenotype correlation matrix after covariate adjustment is shown in Figure 3.

We included rare (MAF < 1%) nonsynonymous and protein-truncating variants with a total rare minor

11


https://doi.org/10.1101/229583


allele count of at least 5 for genes that had at least 3 rare variants leaving 5207 genes for analysis.

Results

Type I Error simulations

We estimated empirical type I error rates of the Multi-SKAT tests with and without related individuals. For

unrelated individuals, we simulated 5, 000 individuals and 9 phenotypes based on the correlation structure

for the amino acids phenotypes in the METSIM study data. For related individuals, we simulated 5, 000

individuals using the kinship matrix for randomly chosen METSIM individuals (see the Method section).

We performed association tests and type I error rates were estimated by the proportion of p-values less than

the specified α levels. Type I error rates of the Multi-SKAT tests were well maintained at α = 10−4, 10−5

and 2.5× 10−6 for both unrelated and related individuals (Table 1). For example, at level α = 2.5× 10−6,

the largest empirical type I error rates from any of the Multi-SKAT tests was 3.4× 10−6, which was within

the 95% confidence interval (CI = (1.6× 10−6, 4× 10−6)).

Power simulations

We compared the empirical power of the minPhen and Multi-SKAT tests. For each simulation setting,

we generated 1, 000 datasets with 5, 000 unrelated individuals and for each test estimated empirical power

as the proportion of p-values less than α = 2.5 × 10−6, reflecting Bonferroni correction for testing 20, 000

independent genes. Since Hom and Het are identical to hom-MAAUSS and het-MAAUSS, respectively, and

using PhC is identical to GAMuT (with linear kernel) and MSKAT, our power simulation studies effectively

compare the existing multiple phenotype tests.

In Figure 1, we show the results for 5 phenotypes with compound symmetric correlation structure with

the correlation 0.3 or 0.7, where 30% of rare variants (MAF < 0.01) were positively associated with 1, 2 or 3

phenotypes. In the most scenarios, PhC, PC-Sel and Het have greater power among the Multi-SKAT tests

with pre-specified phenotype kernels (i.e. Hom, Het, PhC and PC-Sel) while the Multi-SKAT omnibus test,

minP, maintained high power as well. For example, when the correlation between the phenotypes is 0.3 (i.e.

ρ = 0.3) and SKAT kernel is used for ΣG, if 3 phenotypes are associated with the region, minP and PhC are

more powerful than the other tests. If the correlation between the phenotypes is ρ = 0.7 and Burden kernel

is used for ΣG, Het, PC-Sel and minP had higher power than the rest of the tests when 2 phenotypes were

associated. It is noteworthy that Hom had the lowest power in almost all the scenarios of Figure 1.

As aforementioned, PhC is equivalent to GAMuT and MSKAT. Figure 2 demonstrates scenarios involving

12


https://doi.org/10.1101/229583


6 phenotypes and clustered correlation structures where PhC can be outperformed by other choices of ΣP .

For example, Figure 2 upper panel shows a situation when the phenotypes have clustered correlation structure

and the correlation within the clusters is low (ρ = 0.3); phenotypes 1, 2, 3, 4 and 6 were associated with the

causal variants. In this situation, Hom and minP outperformed PhC when SKAT kernel was used. This may

be because the phenotype correlation structure does not reflect the genetic association pattern. Similarly

in Figure 2 lower panel the phenotypes consist of 2 small high correlation(ρ = 0.7) cluster and one larger

low correlation (ρ = 0.3) cluster as shown in right column; if phenotypes 1, 3 and 6 are associated with the

region, Het and minP had higher power than PhC.

When 20% of causal variants were trait-decreasing variants, the power of Multi-SKAT tests with Burden

ΣG was substantially reduced (Supplementary Figure S2 and S3). This is because the association signals were

canceled out due to the presence of negative genetic effect coefficients. Since SKAT is robust regardless of the

association direction, power with SKAT ΣG was largely maintained. The relative performance of methods

with different ΣP given ΣG was quantitatively similar to the results without trait-decreasing variants.

Further, we estimated power of minPcom, which combines both ΣP and ΣG. The power of minPcom was

evaluated for the compound symmetric phenotype correlation structure presented in Figure 1 and has been

compared with the two minP tests of SKAT (minP-SKAT) and Burden (minP-Burden) ΣG kernels. Figure

3 shows empirical power with and without trait-decreasing variants. When all genetic effect coefficients were

positive (Figure 3, left panel), and the correlation among phenotypes were low (i.e. ρ = 0.3), minP-SKAT was

slightly more powerful than minP-Burden. When the correlation was high (i.e. ρ = 0.7), minP-Burden was

more powerful. When 20% of genetic effect coefficients were negative (Figure 3, right panel), as expected,

the power of minP-Burden was substantially decreased. Across all the situations, the power of minPcom

was either similar or slightly higher than the most powerful minP with fixed ΣG. When 50% of variants

were causal variants and all genetic effect coefficients were positive (Supplementary Figure S4, left panel),

minP-Burden was more powerful than minP-SKAT, and minPcom had similar or slightly higher power than

minP-Burden.

Overall, our simulation results show that the omnibus tests, especially minPcom, had robust power

throughout all the simulation scenarios considered. When ΣG and ΣP were fixed, power depended on

the model of association and the correlation structure of the phenotypes. It is also noteworthy that the

proposed Multi-SKAT tests generally outperformed the single phenotype test (minPhen), even when only

one phenotype was associated with genetic variants.

13


https://doi.org/10.1101/229583


Application to the METSIM study exomechip data

Inborn errors of amino acid metabolism cause mild to severe symptoms like mental retardation, seizure,

coma, and osteoporosis. Amino acid levels are perturbed in other disease states, e.g., Glutamic and aspar-

tic acid levels are reduced in Alzheimer disease brains31; Isoleucine, glycine, alanine, phenylalanine, and

threonine levels are increased in cerebo-spinal fluid (CSF) of motor neuron disease patients32. To find rare

variants associated with 9 amino acid levels, we applied the Multi-SKAT tests to the METSIM study data.

We estimated the relatedness between individuals by KING25, and coheritability and residual variance using

PHENIX27 (Supplementary Figure S1). Among the 8,545 METSIM participants with non-missing pheno-

types and covariates, 1332 individuals were found to be as closely related as degree 2 or closer. A total

of 5207 genes with at least three rare variants were included in our analysis. The Bonferroni corrected

significance threshold was α = 0.05/5207 = 9.6× 10−6. Further we used a less significant cutoff of α = 10−4

for a gene to be suggestive. To investigate the validity of our results, we applied SKAT-O to test for single

phenotype associations. After identifying associated genes, we carried out backward elimination procedure

(Appendix B) to investigate which phenotypes are associated with the gene. This procedure iteratively

removes phenotypes based on minPcom p-values.

QQ plots for the p-values obtained by minPhen and Multi-SKAT omnibus tests are displayed in Figure 4.

Due to the presence of several strong associations, for the ease of viewing, any p-value < 10−12 was collapsed

to 10−12. Table 2 shows significant and suggestive genes by either minPhen or minPcom, and Supplementary

Table S3 shows SKAT-O p-values between each of these genes and each phenotype. Among the eight signifi-

cant or suggestive genes displayed in Table 2, minPcom provides more significant p-values than minPhen for

six genes: Glycine decarboxylase (GLDC [MIM: 238300]), Histidine ammonia-lyase (HAL [MIM: 609457]),

Phenylalanine hydroxylase (PAH [MIM: 612349]), Dihydroorotate dehydrogenase (DHODH [MIM: 126064]),

Mediator of RNA polymerase II transcription subunit 1 (MED1 [MIM: 604311]), Serine/Threonine Kinase

33 (STK33 [MIM: 607670]). Interestingly, PAH and MED1 are significant by minPcom, but not signifi-

cant by minPhen. PAH encodes an Phenylalanine hydroxylase, which catalyzes the hydroxylation of the

aromatic side-chain of phenylalanine to generate tyrosine. Backward elimination shows that Phenylalanine

and Tyrosine are two last phenotypes to be eliminated (Supplementary Table S2). MED1 is involved in

the regulated transcription of nearly all RNA polymerase II-dependent genes. This gene does not show any

single phenotype association, but the cross-phenotype analysis produced the evidence of association.

Among other genes, GLDC has the smallest p-value. Variants in GLDC are known to cause glycine

encephalopathy (MIM: 605899)33. Univariate SKAT-O test with each of these phenotype reveals that this

gene has a strong association with Glycine (p-value = 2.5 × 10−64, Supplementary Table S3). Among the

14


https://doi.org/10.1101/229583


variants genotyped in this gene, rs138640017 (MAF = 0.009) appears to drive the association (single variant

p-value = 1.0×10−64). Variants in HAL cause histidinemia (MIM: 235800) in human and mouse. This gene

manifests significant univariate association with Histidine (SKAT-O p-value = 3.2 × 10−8, Supplementary

Table S3) which in turn is influenced by the association of rs141635447 (MAF = 0.005) with Histidine (single

variant p-value = 3.7 × 10−13). Similarly, variants in DHODH, which have been previously found to be

associated with postaxial acrofacial dysostosis (MIM: 263750), have significant cross-phenotype association

though mostly driven by the association with Alanine (SKAT-O p-value =1.4×10−07, Supplementary Table

S3). ALDH1L1 catalyzes conversion of 10-formyltetrahydrofolate to tetrahydrofolate. Published results

show that common variant rs1107366, 5kb upstream of ALDH1L1, is associated with Glycine-Seratinine

ratio34. Down-regulation of BCAT2 in mice causes elevated serum branched chain amino acid levels and

features of maple syrup urine disease.

Table 3 shows p-values of Multi-SKAT kernel and omnibus tests with two genotype kernels (SKAT and

Burden). Among phenotype kernels, PhC and Het generally produced the smallest p-values. We further

applied Multi-SKAT tests without kinship adjustment on the whole METSIM study sample. As expected,

this produced inflation in QQ plots (Supplementary Figure S5).

Overall, our METSIM amino acid data analysis suggests that the proposed method can be more powerful

than the single phenotype tests, while maintaining type I error rates even in the presence of the sample

relatedness. It also shows that the omnibus test provides robust performance by effectively aggregating

results of various kernels.

Computation Time

When ΣP and ΣG are given, p-values of Multi-SKAT are computed by the Davies method22, which inverts

the characteristic function of the mixture of chi-squares. On average, Multi-SKAT tests given ΣP and ΣG

required less than 1 CPU sec (Intel Xeon 2.80 GHz) when applied to a dataset with 5000 independent

samples, 50 variants and 5 phenotypes (Supplementary Table S1). With the kinship adjustment for 5000

related individuals, computation time was increased to 5 CPU sec. Since minPcom requires only a small

number of resampling to estimate the correlation among tests, it is still scalable for genome-wide analysis. In

the same dataset, minPcom required 16 and 19 CPU sec on average without and with the kinship adjustment,

respectively. Analyzing the METSIM dataset required 10 hours when parallelized into 5 processes.

15


https://doi.org/10.1101/229583


Discussion

In this article, we have introduced a new class of methods, Multi-SKAT, for rare variant test for multiple

phenotypes. As demonstrated, Multi-SKAT gains flexibility in terms of modeling the complex interaction

between the phenotypes and the genotypes through the use of the kernels ΣP and ΣG. Many published

methods of gene-based analysis of multiple phenotypes, including GAMuT, MSKAT and MAAUSS, can be

viewed as special cases of the Multi-SKAT test, corresponding to different values of ΣP and ΣG. Thus,

Multi-SKAT, in essence, provides a general framework for region based tests for cross-phenotype effects.

It is natural to assume that different genes follow different models of association. For some genes, the

effect of the variants on the phenotypes might be independent of each other, resembling the Het kernel for

ΣP , while for others, the effects might be nearly the same, which is similar to the Hom kernel. The omnibus

test, which combines various choices of kernels, can be a useful method in this regard. The minimum p-value

approaches using copula (minP and minPcom) combines the tests from a p-value perspective and provide

robust power regardless of underlying genetic models.

Multi-SKAT retains most of the desirable properties of SKAT. The asymptotic p-values of all the Multi-

SKAT tests, other than minP and minPcom, can be analytically obtained via Davies’ method. The p-value

calculations for minP and minPcom depend on a resampling based approach but a reliable estimate can be

obtained from a small number of resampling. Thus, computationally all the Multi-SKAT tests are scalable

at the genome-wide level. This method also allows the inclusion of prior information through weighing of

variants.

Additionally, Multi-SKAT can adjust for the relatedness among study individuals by accounting for

the kinship matrix. As shown in Supplementary Figure S5, in the presence of related individuals, not

adjusting for the relatedness can produce inflated type I error rates. Since Multi-SKAT is a regression based

approach, it effectively incorporates the relatedness by including a random effect term due to kinship. Type

I error simulation and METSIM data analysis show that our approach produced more accurate p-values

than alternative methods while controlling type I error rates.

Although it provides a general framework for gene-based multiple phenotype tests, the current approach

is limited to continuous phenotypes. In the future, using a generalized mixed effect model framework, we

aim to extend Multi-SKAT to binary phenotypes.

In summary, we have developed a powerful multiple phenotype test for rare variants. The proposed

method has robust power regardless of the underlying biology and can adjust for family relatedness. Our

method can be a scalable and practical solution to test for multiple phenotypes and will contribute to

detecting rare variants with pleiotropic effects. All our methods are implemented in the R package Multi-

16


https://doi.org/10.1101/229583


SKAT.

Supplemental Data

Five figures and three tables that were referred to in this paper are provided in the Supplemental tables and

figures.

Acknowledgments

This work was supported by grants R01 HG008773 (D.D. and S.L.) from the National Institutes of Health.

We would like to thank investigators of the METSIM study for access to the genotype and amino-acid

phenotype data.

Web Resources

MultiSKAT R-package: https://github.com/diptavo/MultiSKAT

Online Mendelian Inheritance in Man (OMIM): http://www.omim.org

Appendices

A. Relationship between Multi-SKAT and existing methods

A.1. MSKAT

The MSKAT test statistic is

QMSKAT = vec(S)T (WW ⊗ V −1)vec(S), (17)

where S = GT (Y − µ) is a matrix of score statistics15. From the fact

(I ⊗ V −1)vec(S) = (GT ⊗ I)(I ⊗ V −1) {vec(Y )− vec(µ)}

the QMSKAT can be written as

{vec(Y )− vec(µ)}T{

(GWWGT )⊗ V −1}{vec(Y )− vec(µ)} ,

17


https://doi.org/10.1101/229583


which is a Multi-SKAT test statistics with ΣG = WW and ΣP = V .

A.2. GAMuT

Suppose Yadj = HY and Gadj = HG are covariate adjusted phenotype and genotype matrices where H =

I −X(XTX)−1XT . We note that with the intercept in X, Yadj and Gadj are mean centered. The covariate

adjusted GAMuT test statistics is

QGAMuT =tr(PcXc)

n

where Pc = Yadj(YTadjYadj)

−1Y Tadj and Xc = GadjWWGTadj . Using the fact that Y TadjYadj/n = V is the

estimate of variance after adjusting covariates and GTadjYadj = GHY = GTYadj , we show

tr(PcXc)/n = tr(Yadj V−1Y TadjGadjWWGTadj)

= tr(V − 12Y TadjGWWGTYadj V

− 12 )

= vec(WGTYadj V− 1

2 )T vec(WGTYadj V− 1

2 )

= ((WGT ⊗ V − 12 )vec(Yadj))

T ((WGT ⊗ V − 12 )vec(Yadj))

= (vec(Y )− vec(µ))T (GWWGT ⊗ V −1)(vec(Y )− vec(µ))

which the Multi-SKAT test statistic with ΣG = WW and ΣP = V .

A.3. MAAUSS and MF-KM

There exists two different version of the MAAUSS tests. The homogeneous version of MAAUSS assumes

that the effects of a variant on multiple phenotypes are identical and uses the following test statistics

QMAAUSS−HOM = (vec(Y )−vec(µ))T (In⊗ V −1)(G⊗I)(WW ⊗1m1Tm)(GT ⊗I)(In⊗ V −1)(vec(Y )−vec(µ))

(18)

which is identical to the Multi-SKAT test statistic with ΣG = WW and ΣP = 1m1Tm. The heterogeneous

version of MAAUSS assumes that the effects of a variant on multiple phenotypes are independent, and use

the following test statistics

QMAAUSS−HET = (vec(Y )−vec(µ))T (In⊗V −1)(G⊗I)(WW⊗I)(GT⊗I)(In⊗V −1)(vec(Y )−vec(µ)) (19)

which is identical to the Multi-SKAT test statistic with ΣG = WW and ΣP = I. Note that the test statistic

of MF-KM is exactly the same as QMAAUSS−HET .

18


https://doi.org/10.1101/229583


B. Backward elimination procedure to identify associated phenotypes

After identifying the gene or region associated with multiple phenotypes, next question would be identifying

truly associated phenotypes. Here we present a simple backward elimination algorithm to iteratively remove

relatively less important phenotypes. Similar method has previously been applied to identify rare causal

variants in an associated gene35.

• Step 1. Start with a set of k phenotypes PhenCurrent = {y1, y2, · · · yk} and compute a Multi-SKAT

test association p-value for the set PhenCurrent denoted by pCurrent.

• Step 2. Remove each of the phenotypes one at a time from the set PhenCurrent. The resulting set is

Phen−i = {y1, y2, · · · yi−1, yi+1 · · · yk} for i = 1, 2, · · · , k and compute the corresponding p-values p−i

for that same Multi-SKAT test.

• Step 3. Remove the phenotype j that leads to the smallest p-value, i.e. j = argmin{p−1, p−2, · · · , p−k}.

Update PhenCurrent to Phen−j .

• Step 4. Continue removing phenotypes till only 1 phenotype is left.

Supplementary Table S2 shows the backward elimination results of 5 most significant and suggestive genes

in the METSIM study data analysis as per the p-values reported by minPcom. Although this procedure does

not provide a set of phenotypes truly associated, it provides the relative importance of the phenotypes in

driving association signals. For example, the minPcom p-value for GLDC was 2.3 × 10−72. When each of

the phenotypes were removed one at a time and the minPcom p-values were calculated on the remaining 8

phenotypes, we found that eliminating Isoleucine (Ile) actually improved the signal. The minPcom p-value

of the set of 8 amino acids leaving out Ile was 2.8× 10−73. This indicates that Isoleucine has very minimal

contribution to the association between the amino acids and GLDC. Subsequently, Valine was the next

phenotype to be eliminated indicating that it has the next lowest contribution after Isoleucine. Carrying out

this procedure further, we find that Glycine is the last phenotype to remain indicating that it is the strongest

driver of the signal. This is in agreement to the single phenotype SKAT-O results (Supplementary Table

S3). Similary for genes HAL, DHODH, PAH and MED1, Histine, Alanine, Phenylalanine and Tyrosine were

the most associated phenotypes, respectively. Interestingly for PAH and MED1, single phenotype p-values

are not significant, which suggests that multiple phenotypes are associated with these genes.

19


https://doi.org/10.1101/229583


References

[1] Arthur Korte and Ashley Farlow. The advantages and limitations of trait analysis with gwas: a review.

Plant Methods, 9:9–29, 2013.

[2] Teri A. Manolio, Francis S. Collins, Nancy J. Cox, David B. Goldstein, Lucia A. Hindorff, David J.

Hunter, Mark I. McCarthy, Erin M. Ramos, Lon R. Cardon, Aravinda Chakravarti, Judy H. Cho,

Alan E. Guttmacher, Augustine Kong, Leonid Kruglyak, Elaine Mardis, Charles N. Rotimi, Montgomery

Slatkin, David Valle, Alice S. Whittemore, Michael Boehnke, Andrew G. Clark, Evan E. Eichler, Greg

Gibson, Jonathan L. Haines, Trudy F. C. Mackay, Steven A. McCarroll, , and Peter M. Visscher. Finding

the missing heritability of complex diseases. Nature, 461:747–753, 2009.

[3] The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human

genomes. Nature, 491:56–65, 2012.

[4] Seunggeung Lee, Goncalo R Abecasis, Michael Boehnke, and Xihong Lin. Rare-variant association

analysis: study designs and statistical tests. The American Journal of Human Genetics, 95(1):5–23,

2014.

[5] Nadia Solovieff, Chris Cotsapas, Phil H Lee, Shaun M Purcell, and Jordan W Smoller. Pleiotropy in

complex traits: challenges and strategies. Nature reviews. Genetics, 14(7):483, 2013.

[6] Chris Cotsapas, Benjamin F Voight, Elizabeth Rossin, Kasper Lage, Benjamin M Neale, Chris Wallace,

Goncalo R Abecasis, Jeffrey C Barrett, Timothy Behrens, Judy Cho, et al. Pervasive sharing of genetic

effects in autoimmune disease. PLoS genetics, 7(8):e1002254, 2011.

[7] Shanya Sivakumaran, Felix Agakov, Evropi Theodoratou, James G Prendergast, Lina Zgaga, Teri Mano-

lio, Igor Rudan, Paul McKeigue, James F Wilson, and Harry Campbell. Abundant pleiotropy in human

complex diseases and traits. The American Journal of Human Genetics, 89(5):607–618, 2011.

[8] M.A. Ferreira and S.M. Purcell. A multivariate test of association. Bioinformatics, 25:132–133, 2009.

[9] J. Huang, A.D. Johnson, and C.J. O’Donnell. Prime: a method for characterization and evaluation of

pleiotropic regions from multiple genome-wide association studies. Bioinformatics, 27:1201–1206, 2011.

[10] Xiang Zhou and Matthew Stephens. Efficient multivariate linear mixed model algorithms for genome-

wide association studies. Nature Methods, 11:407–409, 2014.

20


https://doi.org/10.1101/229583


[11] J. S. Ried, A. Doring, K. Oexle, C. Meisinger, J. Winkelmann, N. Klopp, T. Meitinger, A. Peters,

K. Suhre, H.E. Wichmann, and C. Gieger. Psea: Phenotype set enrichment analysis–a new method for

analysis of multiple phenotypes. Genetic Epidemiology, 36:244–252, 2012.

[12] Debashree Ray, James S Pankow, and Saonli Basu. Usat: A unified score-based association test for

multiple phenotype-genotype analysis. Genetic epidemiology, 40(1):20–34, 2016.

[13] Y. Wang, A. Liu, J.L. Mills, M. Boehnke, A.F. Wilson, J.E. Bailey-Wilson, M. Xiong, C.O. Wu, and

R. Fan. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models.

Genetic Epidemiology, 39:259–275, 2015.

[14] K. Alaine Broadaway, David J. Cutler, Richard Duncan, Jacob L. Moore, Erin B. Ware, Lawrence

F. Bielak Min A. Jhun, Wei Zhao, Jennifer A. Smith, Patricia A. Peyser, Sharon L.R. Kardia, Debashis

Ghosh, and Michael P. Epstein. A statistical approach for testing cross-phenotype effects of rare variants.

American Journal of Human Genetics, 98(3):525–540, 2016.

[15] Baolin Wu and J.S. Pankow. Sequence kernel association test of multiple continuous phenotypes. Genetic

Epidemiology, 40(2):91–100, 2016.

[16] Selyeong Lee, Sungho Won, Young Jin Kim, Yongkang Kim, Bong-Jo Kim, and Taesung Park. Rare

variant association test with multiple phenotypes. Genetic Epidemiology, 2016.

[17] Jianping Sun, Karim Oualkacha, Vincenzo Forgetta, Hou-Feng Zheng, J Brent Richards, Antonio

Ciampi, and Celia MT Greenwood. A method for analyzing multiple continuous phenotypes in rare vari-

ant association studies allowing for flexible correlations in variant effects. European Journal of Human

Genetics, 24(9):1344–1351, 2016.

[18] A. Maity, P.F. Sullivan, and J.Y. Tzeng. Multivariate phenotype association analysis by marker-set

kernel machine regression. Genetic Epidemiology, 36:686–695, 2012.

[19] Qi Yan, Daniel E Weeks, Juan C Celedon, Hemant K Tiwari, Bingshan Li, Xiaojing Wang, Wan-Yu

Lin, Xiang-Yang Lou, Guimin Gao, Wei Chen, et al. Associating multivariate quantitative phenotypes

with genetic variants in family samples with a novel kernel machine regression method. Genetics,

201(4):1329–1339, 2015.

[20] Michael C. Wu, Seunggeun Lee, Tianxi Cai, Yun Li, Michael Boehnke, and Xihong Lin. Rare-variant

association testing for sequencing data with the sequence kernel association test. Americal Journal of

Human Genetics, 89:82–93, 2011.

21


https://doi.org/10.1101/229583


[21] Seunggeun Lee, Michael C Wu, and Xihong Lin. Optimal tests for rare variant effects in sequencing

association studies. Biostatistics, 13(4):762–775, 2012.

[22] Robert B Davies. The distribution of a linear combination of x2 random variables. Applied Statistics,

29(3):323–333, 1980.

[23] Hugues Aschard, Bjarni J Vilhjalmsson, Nicolas Greliche, Pierre-Emmanuel Morange, David-Alexandre

Tregouet, and Peter Kraft. Maximizing the power of principal-component analysis of correlated pheno-

types in genome-wide association studies. The American Journal of Human Genetics, 94(5):662–676,

2014.

[24] Stefano Demarta and Alexander J McNeil. The t copula and related copulas. International Statistical

Review/Revue Internationale de Statistique, pages 111–129, 2005.

[25] A Manichaikul, J C Mychaleckyj, S S Rich, K Daly, M Sale, and W M Chen. Robust relationship

inference in genome-wide association studies. Bioinformatics, 26(22):2867–2873, 2010.

[26] Brendan K Bulik-Sullivan, Po-Ru Loh, Hilary K Finucane, Stephan Ripke, Jian Yang, Schizophrenia

Working Group of the Psychiatric Genomics Consortium, Nick Patterson, Mark J Daly, Alkes L Price,

and Benjamin M Neale. Ld score regression distinguishes confounding from polygenicity in genome-wide

association studies. Nature Genetics, 47:291–295, 2015.

[27] Andrew Dahl, Valentina Iotchkova, Amelie Baud, Asa Johansson, Ulf Gyllensten, Nicole Soranzo,

Richard Mott, Andreas Kranis, and Jonathan Marchini. A multiple-phenotype imputation method

for genetic studies. Nature Genetics, 48:466–472, 2016.

[28] Xiang Zhou, Peter Carbonetto, and Stephens Matthew. Polygenic modeling with bayesian sparse linear

mixed models. PLoS Genetics, 9(2):e1003264, 2013.

[29] Stephen F Schaffner, Catherine Foo, Stacey Gabriel, David Reich, Mark J Daly, and David Altshuler.

Calibrating a coalescent simulation of human genome sequence variation. Genome research, 15(11):1576–

1583, 2005.

[30] Jeroen R Huyghe, Anne U Jackson, Marie P Fogarty, Martin L Buchkivoch, Alena Stancakova,

Heather M Stringham, Xueling Sim, Lingyao Yang, Christian Fuchsberger, Henna Cederberg, and et al.

Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and

secretion. Nature Genetics, 45:197–201, 2013.

22


https://doi.org/10.1101/229583


[31] D. Allan Butterfield and Chava B. Pocernich. The glutamatergic system and alzheimer’s disease. CNS

Drugs, 17(9):641–652, 2003.

[32] J. de Belleroche, A. Recordati, and F. Clifford Rose. Elevated levels of amino acids in the csf of motor

neuron disease patients. CNS Drugs, 17(9):641–652, 2003.

[33] Ian Hughes. Pediatric endocrinology and inborn errors of metabolism. Journal of Paediatrics and Child

Health, 45(11):668–669, 2009.

[34] Weijia Xie, Andrew R. Wood, Valeria Lyssenko, Michael N. Weedon, Joshua W. Knowles, Sami Alka-

yyali, Themistocles L. Assimes, Thomas Qaurtermous, Fahim Abbasi, Jussi Paananen, and et al. Genetic

variants associated with glycine metabolism and their role in insulin sensitivity and type 2 diabetes.

Diabetes, 62:2141–2150, 2013.

[35] Iuliana Ionita-Laza, Marinela Capanu, Silvia De Rubeis, Kenneth McCallum, and Joseph D Buxbaum.

Identification of rare causal variants in sequence-based studies: methods and applications to vps13b, a

gene involved in cohen syndrome and autism. PLoS Genet, 10(12):e1004729, 2014.

23


https://doi.org/10.1101/229583


ρ =0.3 ΣG= SKAT

Associated Phenotypes

00.

50.

7

1 2 3

Pow

er

ρ =0.3 ΣG= Burden


00.

50.

7

1 2 3

ρ =0.7 ΣG= SKAT


00.

50.

7

1 2 3

Pow

er

ρ =0.7 ΣG= Burden


00.

50.

7

1 2 3

minPhenHomHetPhCPC−SelminP

Figure 1: Power for Multi-SKAT tests when phenotypes have compound symmetric correlation struc-tures. Empirical power for minPhen, Hom, Het, PhC, PC-Sel, minP plotted against the number ofphenotypes associated with the gene of interest with a total of 5 phenotypes under consideration.Upper row shows the results for ρ = 0.3 and lower row for ρ = 0.7. Left column shows results withSKAT kernel ΣG, and right columns shows results with Burden kernel. All the causal variants weretrait-increasing variants.

24


https://doi.org/10.1101/229583


ΣG= SKAT

Pow

er

00.

50.

7

ΣG= SKAT

Pow

er

00.

50.

7

ΣG= Burden

Pow

er

00.

50.

7Po

wer

ΣG= Burden

Pow

er

00.

50.

7Po

wer

●●

●

●

●

●

●

●●

●

●

●

●

●

●●

●●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●

●

● −1

1

x x x x x

x

x

x

x

x

●●

●

●

●

●

●●●

●

●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●

●● −1

1

x x x

x

x

x

minPhenHomHetPhCPC−SelminP

Figure 2: Power for Multi-SKAT tests when phenotypes have clustered correlation structures. Empirical powers forminPhen, Hom, Het, PhC, PC-Sel, minP are plotted under different levels of association with a total of 6 phenotypesand with clustered correlation structures. Middle column shows the empirical powers for different combinations ofphenotypes associated with SKAT kernel ΣG; the rightmost column shows the corresponding results with Burdenkernel; left column shows the corresponding correlation matrices for the phenotypes. The associated phenotypes areindicated in red cross marks across the correlation matrices. All the causal variants were trait-increasing variants.

25


https://doi.org/10.1101/229583


1 2 3

ρ =0.3 β +/− = 100/0


00.

50.

7

Pow

er

1 2 3

ρ =0.3 β +/− = 80/20


00.

50.

7

1 2 3

ρ =0.7 β +/− = 100/0


00.

50.

7

Pow

er

1 2 3

ρ =0.7 β +/− = 80/20


00.

50.

7

minP−BurdenminP−SKATminPcom

Figure 3: Power for Multi-SKAT by combining tests with ΣP as Hom, Het, PhC, PC-Sel and ΣG as SKAT andBurden when phenotypes have compound symmetric correlation structures. Empirical powers for minP-Burden,minP-SKAT and minPcom are plotted against the number of phenotypes associated with the gene of interest witha total of 5 phenotypes under consideration. Upper row shows the results for ρ = 0.3 and lower row for ρ = 0.7.Left column shows results when all the causal variants were trait-increasing variants, and right column shows resultswhen 80%/20% of the causal variants were trait-increasing/trait-decreasing variants.

26


https://doi.org/10.1101/229583


minPhen

Expected (- log10 p-value)

Obs

erve

d (-

log

10 p

-val

ue)

2

4

6

8

10

2 4 6 8 10

(a)

minP-SKAT


Obs

erve

d (-

log

10 p

-val

ue)

2

4

6

8

10

2 4 6 8 10

(b)

minP-Burden


Obs

erve

d (-

log

10 p

-val

ue)

2

4

6

8

10

2 4 6 8 10

(c)

minPcom


Obs

erve

d (-

log

10 p

-val

ue)

2

4

6

8

10

2 4 6 8 10

(d)

Figure 4: QQplot of the p-values of minPhen and Multi-SKAT omnibus tests for the METSIM data. Forthe ease of viewing, any associations with p-values < 10−12 have been collapsed to 10−12

27


https://doi.org/10.1101/229583


Table 1: Empirical type I error rates of the Multi-SKAT tests. The number of phenotypes were nine and the correlation structure among the phenotypeswere similar to that of the amino acid phenotypes in the METSIM study data. Type I error rates were estimated based on 5,000,000 simulated nulldatasets.

Level ΣG Hom Het PhC PC-Sel minP minPcom

SKAT 2.6× 10−06 2.6× 10−06 2.8× 10−06 2.6× 10−06 3.4× 10−06

2.5× 10−06 2.8× 10−06

Burden 2.8× 10−06 2.6× 10−06 2.6× 10−06 2.6× 10−06 3.0× 10−06

SKAT 9.6× 10−06 9.8× 10−06 9.8× 10−06 9.8× 10−06 9.4× 10−06

Independent samples 10−05 1.2× 10−05

(without kinship adjustment) Burden 9.6× 10−06 9.8× 10−06 9.6× 10−06 9.6× 10−06 9.6× 10−06

SKAT 9.5× 10−05 9.6× 10−05 9.8× 10−05 9.8× 10−05 9.8× 10−05

10−04 1.1× 10−04

Burden 9.6× 10−05 9.4× 10−05 9.7× 10−05 9.6× 10−05 9.7× 10−05

SKAT 2.4× 10−06 2.4× 10−06 2.8× 10−06 2.6× 10−06 3.0× 10−06

2.5× 10−06 2.6× 10−06

Burden 2.6× 10−06 2.4× 10−06 2.6× 10−06 2.4× 10−06 3.2× 10−65

SKAT 9.8× 10−06 9.8× 10−06 9.8× 10−06 9.8× 10−06 9.4× 10−06

Related samples 10−05 1.4× 10−05

(with kinship adjustment) Burden 9.6× 10−06 9.4× 10−06 9.4× 10−06 9.4× 10−06 9.4× 10−65

SKAT 9.6× 10−05 9.7× 10−05 9.8× 10−05 9.4× 10−05 9.8× 10−05

10−04 1.2× 10−04

Burden 9.7× 10−05 9.6× 10−05 9.8× 10−05 9.8× 10−05 9.7× 10−05

28

.C

C-B

Y-N

D 4.0 International license

is made available under a

The copyright holder for this preprint (w

hich was not peer-review

ed) is the author/funder. It.

https://doi.org/10.1101/229583doi:

bioRxiv preprint

https://doi.org/10.1101/229583


Table 2: Significant and suggestive genes associated with 9 amino acid phenotypes. Genes with p-values < 10−4 by minPhen or any Multi-SKAT tests(minP-SKAT, minP-Burden and minPcom) were reported in this table. Multi-SKAT tests were applied with the kinship adjustment, and 5207 genes withat least three rare (MAF < 0.01) nonsynonymous and protein-truncating variants were used in this analysis. The total sample size was n=8545. P-valuessmaller than the Bonferroni corrected significance α = 9.6 × 10−06 were marked as bold. Smallest p-value for each gene among all the tests have beenunderlined.

Gene Chromosome Rare SNPs MAC minPhen minP-SKAT minP-Burden minPcom

GLDC 9 4 183 2.2 × 10−63 1.1 × 10−72 9.7 × 10−64 2.3 × 10−72

HAL 3 6 42 2.9 × 10−08 1.1× 10−05 4.2 × 10−11 9.5 × 10−11

DHODH 16 5 90 1.3 × 10−06 9.7 × 10−08 7.7 × 10−06 1.1 × 10−07

PAH 12 6 27 6.1× 10−04 1.4 × 10−06 1.3 × 10−06 1.9 × 10−06

MED1 17 3 147 6.2× 10−01 3.9 × 10−06 3.4× 10−04 4.9 × 10−06

STK33 11 6 180 8.9× 10−01 3.2× 10−05 3.3× 10−02 4.2× 10−05

ALDH1L1 3 8 103 8.4 × 10−08 3.8× 10−04 4.2× 10−05 5.4× 10−05

BCAT2 19 3 133 1.1× 10−05 9.3× 10−03 3.4× 10−05 6.2× 10−05

29

.C

C-B

Y-N







bioRxiv preprint

https://doi.org/10.1101/229583


Table 3: P-values for MultiSKAT tests (Hom, Het, PhC,PC-Sel, minP) with SKAT and burden kernels for the genes reported in Table 2. Multi-SKATtests were applied with the kinship adjustment. Only genes with at least three rare (MAF < 0.01) nonsynonymous and protein-truncating variants wereused in this analysis. The total sample size was n=8545. P-values smaller than the Bonferroni corrected significance α = 9.6× 10−06 were marked as bold.

Gene Multi-SKAT

ΣG Name Chromosome Rare SNPs MAC minPhen Hom Het PhC PC-Sel minP

GLDC 9 4 183 6.8 × 10−63 1.3× 10−03 1.3 × 10−17 2.9 × 10−73 5.3 × 10−59 1.1 × 10−72

DHODH 16 5 90 5.1 × 10−08 1.4× 10−01 3.2× 10−03 3.1 × 10−08 7.1 × 10−06 9.7 × 10−08

PAH 12 6 27 4.3× 10−05 7.9× 10−02 1.8× 10−05 3.9 × 10−07 6.3× 10−04 1.4 × 10−06

SKAT MED1 17 3 147 8.7× 10−02 3.4× 10−01 9.3 × 10−07 2.8× 10−04 1.9× 10−04 3.9 × 10−06

HAL 12 6 42 3.8× 10−05 5.5× 10−01 3.5× 10−03 2.9 × 10−06 1.5× 10−04 1.1× 10−05

STK33 11 6 180 4.0× 10−01 2.3× 10−01 8.9 × 10−06 1.1× 10−03 5.7× 10−03 3.2× 10−05

ALDH1L1 3 8 103 4.4× 10−02 1.1× 10−01 2.2× 10−02 1.6× 10−04 4.2× 10−04 3.8× 10−04

BCAT2 19 3 133 9.7× 10−04 5.0× 10−01 8.1× 10−02 2.4× 10−03 3.5× 10−03 9.3× 10−03

GLDC 9 4 183 1.4 × 10−59 7.5× 10−04 1.5 × 10−15 1.3 × 10−64 2.1 × 10−54 9.7 × 10−64

HAL 12 6 42 2.4 × 10−08 2.8× 10−05 4.6× 10−01 4.2 × 10−12 7.9× 10−03 4.2 × 10−11

PAH 12 6 27 4.3× 10−05 2.0× 10−01 5.3× 10−04 7.2 × 10−06 9.3× 10−03 1.3 × 10−06

Burden DHODH 16 5 90 1.3 × 10−07 5.3× 10−02 1.3× 10−02 2.9 × 10−06 7.6× 10−04 7.7 × 10−06

BCAT2 19 3 133 8.4 × 10−06 4.0× 10−01 1.7× 10−02 1.1× 10−05 6.1× 10−03 3.7× 10−05

ALDH1L1 3 8 103 8.3 × 10−08 1.7× 10−02 5.1× 10−03 2.1 × 10−06 1.7× 10−05 4.2× 10−05

MED1 17 3 147 9.4× 10−01 1.8× 10−01 8.7× 10−05 2.1× 10−03 7.1× 10−01 3.4× 10−04

STK33 11 6 180 4.8× 10−01 7.1× 10−01 7.0× 10−04 3.8× 10−02 6.4× 10−01 6.3× 10−03

30

.C

C-B

Y-N







bioRxiv preprint

https://doi.org/10.1101/229583


multi-skat: general framework to test multiple phenotype ... · study used for our data...

Documents