treelet covariance smoothers - benjamindravesb.github.io/presentation-slides/thesis_defense.pdf ·...

48
Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves 1 1 Department of Mathematics Lafayette College Advisor: T. Gaugler Lafayette College, 2017 Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Upload: others

Post on 11-Aug-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance SmoothersEstimation of Genetic Parameters

Benjamin Draves1

1Department of MathematicsLafayette College

Advisor: T. Gaugler

Lafayette College, 2017

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 2: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Overview

1 Motivation in Statistical Genetics

2 Treelets

3 Treelet Covariance Smoothers

4 Simulation Studies

5 Health Aging and Body Composition Study

6 Conclusion

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 3: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Molecular Biology Review

Each person’s geneticcomposition coded onchromosomes

Most humans have 46 in total,all occurring in pairs

The 23rd pair determines sex

We can compare the geneticdata coded by the first 22 pairsfor all humans

Find patterns between thisgenetic data and realized traits& diseases

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 4: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Traditional Genetic Studies

We wish to estimate the penetrance function, P(Y |G)

Y is some phenotype of interestG codes the underlying genotype

Kinda hard to do without G...

Linkage Analysis studies have had considerable success understandingG indirectly by analyzing Y through numerous generations

Hard to do with human genetics

Next Generation Sequencing (NGS) technology allows us to samplefrom G directly

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 5: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Single Nucleotide Polymorphisms (SNPs)

So how do we encode this genetic information?Code the chromosome pairsExploit the complimentary fashion of DNA

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 6: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

SNPs (cont.)

SNPs Recode Count Minor Alleles

(A,T) (A,T)

(G,C) (A,T)...

...

(G,C) (A,T)

(G,C) (G,C)

=⇒

α α

β α...

...

β α

β β

=⇒

2

1...

1

0

Each row in this diagram represents a SNP

The pair, either (A,T ) or (G ,C ), is called a polymorphism or an allele

An allele is called a minor allele if appears less frequently in thepopulation

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 7: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Minor Allele Counts as Random Variables

For each locus, k, we code can code individual i ’s minor allele count(MAC) by c

(i)k ∈ {0, 1, 2}

For m loci, we can describe the full genotype by

Minor Allele Count (MAC)

c(i)∗ = {c(i)

1 , c(i)2 , . . . , c

(i)m } ∈ {0, 1, 2}m

If we assume random recombination of alleles, c(i)k ∼ Binom(2, pk)

Where pk is the minor allele frequency

This is a pretty strong assumption, but using this framework allowsfor simple model construction

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 8: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Scaled Minor Allele Counts

Under the assumption that alleles are independent, we can center ourcount vector

Let z(i)k := (c

(i)k − 2pk)/(2pk(1− pk))1/2 be the scaled minor allele

count at locus k

Then for each SNP, k , we define the scaled minor allele count by

Scaled Minor Allele Count (SMAC)

z∗k = (z(1)k , z

(2)k , . . . , z

(n)k )t

Where n is the number of individuals in the sample

Then for a sample of m genetic markers, we organize this data asZ = (z∗1, z

∗2, . . . , z

∗m) ∈ Rn×m

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 9: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Did everyone get that?

Z =

z∗1 z∗2 . . . z∗m

z(1)∗ z

(1)1 z

(1)2 . . . z

(1)m

z(2)∗ z

(2)1 z

(2)2 . . . z

(2)m

......

.... . .

...z

(n)∗ z

(n)1 z

(n)2 . . . z

(n)m

Individual n

SNP 2

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 10: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Genetic Parameters of Interest

Additive Genetic Relatedness (A)

Denoted Aij for relatedness between individuals i and jAdditive covariance between genetic markersI’ll refer to this as Relatedness

Narrow Sense Heritability (h2)

Incorporates a small contribution for the m genetic markers,independentlyDoesn’t try to understand the joint distribution of the allelesTraditional studies implicitly use this joint distribution to infer broadsense heritabilityI’ll refer to this as Heritability

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 11: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Estimating Relatedness

We consider alleles Identical By Descent (IBD)

Relatedness is the expected proportion of alleles IBD betweenindividuals

Under this interpretation of A, at SNP k , Aij = Cov(z(i)k , z

(j)k )

Using this information, we can estimate A by

Method of Moments Estimate of A

A =1

m

m∑

k=1

z∗k(z∗k)t =ZZt

m

As m increases, we expectZZt

m→ A

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 12: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Estimating Heritability

Phenotype Model (1)

y = Xβββ + Zu + εεε with Var(y) = ZZtσ2u + Iσ2

ε

y vector of phenotypes, Xβββ fixed effects, u vector of random effectsof the causal SNPs with Var(u) = Iσ2

u, εεε ∼ N (0, Iσ2ε ) residual errors

But remember, we want to understand the ratio of genetic variance tototal variance

Let u = (u1, u2, . . . , uJ)t ∈ RJ be the vector of effects correspondingto the J casual SNPs

Let σ2g = Jσ2

u be the variance explained by all the SNPs

We can then write the genetic effect of individual i as gi =J∑

j=1

z(i)j uj

where Var(g) = Aσ2g

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 13: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Estimating Heritability (cont.)

Phenotype Model (2)

y = Xβββ + g + εεε with Var(y) = Aσ2g + Iσ2

ε

We can partition the variability of phenotypic expression into genetic(σ2

g ) and environmental (σ2ε ) factors

From here we define narrow sense heritability as

Narrow Sense Heritability

h2 =σ2g

σ2g + σ2

ε

We can estimate this value via restricted maximum likelihood(REML) algorithms

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 14: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Motivation in Statistical Genetics

Possible Problems

Assume we have three random individuals, who happen to be namedBen, Josh, and Trent

Trent and Ben, coming from small Midwest towns, are 7th degreerelatives

Josh, from the west coast, is unrelated to Trent and Ben

Ben : 0 2 1 · · · · · · · · · 2

Trent : 1 2 0 · · · · · · · · · 0

Josh : 1 2 1 · · · · · · · · · 1

A(Ben, Trent) = 1130 , A(Ben, Josh) = 1

130

How do we differentiate between distantly and unrelated individuals?

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 15: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelets

Preliminaries: Principal Component Analysis (PCA)

Goal: Rotate underlying space so variability lies on few vectors

We can rotate the space via a Jacobian matrix corresponding to theprincipal components

Also used as a dimensionality reduction tool

● ●●

●●

● ●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●●

−20

0

20

−20 −10 0 10 20 30

Centered X1

Cen

tere

d X

2

PCA Example Data

●●

●● ●●

●●●

●●

● ●●

● ●●

●●

●●●

●●

●●

●●●

●●

●●

● ●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

−20

−10

0

10

20

−25 0 25 50

PCA 1

PC

A 2

PCA Example Data

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 16: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelets

Preliminaries: Wavelet Thresholding

Soft and Hard Thresholding

sλ(Aij) =

Aij + λ if Aij < λ

0 if −λ ≤ Aij ≤ λ,Aij − λ if Aij > λ

fλ(Aij) =

{Aij if |Aij | ≥ λ0 if |Aij | < λ

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 17: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelets

Treelet Algorithm: The Idea

Focus on estimating closerelatives well

Preserve local familiarstructures

Try to extend that structure todistant relatives

1 12 5 6 13 4 14 11 2 3 8 9 15 7 10

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 18: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelets

Treelet Algorithm

1 Let z∗ be a random vector representing the SMAC at any SNP withcovariance Σ = A, which is the additive genetic relationship matrix,corresponding to ` = 0

2 Let V0 be the basis corresponding to this vector

3 Compute the variance-covariance matrix Σ(0) with correspondingsimilarity matrix M(0) defined by

Mij(0)

=Σij

(0)

√Σii

(0)Σjj

(0)

4 Initialize the sum variable indices to S0 = {1, 2, . . . ,N}

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 19: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelets

Treelet Algorithm (cont.)

4 For ` = 1, 2, . . . , L for L ≤ N − 11 Find the two most closely related individuals according M(`−1). Let

(α`, β`) = arg maxi,j∈S`−1

M(`−1)ij

2 Rotate the genetic space to decorrelate zα`and zβ`

3 Rotate Σ(`−1) and update M(`−1)

4 Assuming α` and β` represent the first and and second principalcomponent, respectively

5 Update the sum set S` = S`−1 \ {β`}

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 20: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelets

Treelet Algorithm Visualized

z1(0) z2(0) z3(0) z4(0) z5(0)

s(1),d(1)

s(2),d(2)

s(3),d(3)

s(4),d(4)

` = 0

` = 1

` = 2

` = 3

` = 4

filler�s(4),d(3),d(2),d(4),d(1)

�t

�s(3),d(3),d(2), s(1),d(1)

�t

⇣v

(0)1 , s(2),d(2), s(1),d(1)

⌘t

⇣v

(0)1 ,v

(0)2 ,v

(0)3 , s(1),d(1)

⌘t

⇣v

(0)1 ,v

(0)2 ,v

(0)3 ,v

(0)4 ,v

(0)5

⌘t

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 21: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelets

Treelet Decomposition

At each level ` we have an orthonormal basis V` =[v

(`)1 . . . v

(`)N

]

Using this basis, write z∗(0) =N∑

i=1

α(`)i v

(`)i where α

(`)i = 〈z∗(0), v

(`)i 〉

represent the projections onto that basis vector at level `

This gives rise to the decomposition of the variance of z∗(0)

Treelet Decomposition

Σ = Var[z∗(0)] =N∑

i=1

γ(`)i ,i v

(`)i

(v

(`)i

)t+

N∑

i 6=j

γ(`)i ,j v

(`)i

(v

(`)j

)t= V`Γ`

(V`)t

Where γ(`)i ,j = Cov[α

(`)i , α

(`)j ] and Γ` =

(`)i ,j

]

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 22: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Formalization of Problem

For large samples, we expect A to be quite sparse

We want to enforce this sparsity on our estimates of A

We can do this directly, but run into the Trent, Ben, and Joshproblem

Idea: Use a Treelet representation of A and enforce sparsity of theprojected covariances, ΓΓΓ`, via wavelet hard thresholding

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 23: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Treelet Covariance Smoothing (TCS)

Crosset et al. (2013) first employed this method and called it TreeletCovariance Smoothing (TCS)

Gaugler et al. (2014) used this method to show that the majority ofrisk of Autism resides in common variants

It also partially got Trent a job at Lafayette

TCS Estimator

A(λ) =N∑

i=1

fλ[γi ,i ]vi (vi )t +

N∑

i 6=j

fλ[γi ,j ]vi (vj)t = Vfλ

[Γ]

Vt

Where fλ is a hard-thresholding function with optimal smoothingparameter λ

TCS utilizes the top level of the tree (` = N − 1)

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 24: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Possible Improvements/Heuristic Strategies

We anticipate clusters of closely related individuals in our samplesVarying `, we attain a more representative basis set for the underlyinggenetic space

z1(0) z2(0) z3(0) z4(0) z5(0)

s(1),d(1)

s(2),d(2)

s(3),d(3)

s(4),d(4)

` = 0

` = 1

` = 2

` = 3

` = 4

filler�s(4),d(3),d(2),d(4),d(1)

�t

�s(3),d(3),d(2), s(1),d(1)

�t

⇣v

(0)1 , s(2),d(2), s(1),d(1)

⌘t

⇣v

(0)1 ,v

(0)2 ,v

(0)3 , s(1),d(1)

⌘t

⇣v

(0)1 ,v

(0)2 ,v

(0)3 ,v

(0)4 ,v

(0)5

⌘t

We can induce additional smoothing by projecting only onto the firstprincipal component at each level

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 25: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Treelet Covariance Blocking (TCB)

We employ this idea in our proposed method Treelet CovarianceBlocking (TCB)

To utilize the first principal component only and write

z∗(`) =∑

i∈Sl

α(`)i v

(`)i

Using this projection, we estimate Var (z∗(`)) by

TCB Estimator

A(`) =∑

i∈S`

γ(`)i ,i v

(`)i

(v

(`)i

)t+∑

i ,j∈S`i 6=j

γ(`)i ,j v

(`)i

(v

(`)j

)t= V`Γ`

(V`)t

Where ` is the optimal level of the tree

We implicitly enforce sparsity by only projecting the data onto basisvectors that are supported by familial blockings in the data

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 26: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Treelet Covariance Blocked Smoothing (TCBS)

To further eliminate erroneous inter-familial relatedness, it may beadvantageous to utilize a hard thresholding function

We call this method Treelet Covariance Blocked Smoothing (TCBS)

Using the same projection onto the first principal components we have

TCBS Estimator

A(θ) =∑

i∈Sl

fλ[γ(`)i ,i ]v

(`)i

(v

(`)i

)t+∑

i ,j∈Sli 6=j

fλ[γ(`)i ,j ]v

(`)i

(v

(`)j

)t= V`fλ

[Γ`] (

V`)t

Where θ = (`, λ) is the optimal level, smoothing parametercombination

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 27: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Optimal Parameter Selection

All of our methods rely on choosing optimal smoothing parameters

We considered clustering techniques, cross validation, and likelihoodbased methods to search over the parameter space Θ

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 28: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Cross Validation

Partition chromosomes into two sets: A and BFind a robust, no smoothing estimate, A using the SNPs from ATrain our algorithms on A to attain A(θ) for each θ ∈ Θ

Partition B into K groups

For each k = 1, 2, . . . ,K , attain Ak and compare to smoothingestimates A(θ) via

Cost Function

H(θ) =1

(N − 1)NK

K∑

k=1

N∑

i<j

wij(Aij ,k − Aij(θ))2

wij = |Γ(`)ij | corresponds to the Γ matrix at level ` determined by θ

The optimal parameter is given by θ = arg minθ∈Θ

H(θ)

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 29: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Pretty pictures for those who are lost or bored

0.00 0.05 0.10 0.15 0.20

310.

431

1.0

311.

531

2.0

312.

4

●●

● ●

● ● ● ● ● ●

λ

H(θ

TC

S)

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 30: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Treelet Covariance Smoothers

Pretty pictures for those who are lost or boredH(θTCB)

0 100 200 300 400

312

314

316

318

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 31: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Simulation Data - HapMap3 Data

We utilize a pedigree structureused in other simulation studies

Seven generation family - onlyconsider 20 individuals

Most closely related was degreethree ( 1

8 genetic information)

Most distantly related wasdegree eleven ( 1

2048 geneticinformation)

Still unrelated individuals inthis sample

Liebners

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 32: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Simulation Design - Relatedness

1 Create a sample of 500 individuals by iteratively sampling 10 personblocks from the Liebner pedigree

2 Record the relatedness of individuals within the blocks

3 Set relatedness for individuals not in the same block to 0

A1 =

A1 0 . . . 00 A2 . . . 0...

.... . .

...0 0 . . . A50

4 Use the genetic information for this pedigree to attain A1(θ)

5 Compare these estimates to the true A1

6 Repeat this process ten times (e.g. A1,A2, . . . ,A10)

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 33: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Relatedness Results

● ●

0.0

0.1

0.2

0.3

3 4 5 6

Degree of Relatedness

RM

SE

Method

NS

TCB

TCBS

TCS

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 34: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Relatedness Results (cont.)

●●

●●● ●0.00

0.01

0.02

0.03

7 8 9 10 11

Degree of Relatedness

RM

SE

Method

NS

TCB

TCBS

TCS

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 35: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Simulation Design - Heritability

1 Use Phenotype Model (2), y = µµµ+ g + εεε, to generate ten phenotypevectors with heritability σ2

g

2 Do this for σ2g ∈ {.1, .2, . . . , .9}

3 Do this for all ten population structures represented byA1,A2, . . . ,A10

4 In aggregate, each population will have 10 phenotype vectors for eachσ2g considered

5 Use A(θ) in the REML algorithm to estimate heritability, σ2g , and

compare to the known σ2g

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 36: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Heritability Results

●●

● ●

●●

●●

●●

●●

●● ●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

0.00

0.25

0.50

0.75

1.00

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

σg2

σ g2

Method

NS

TCB

TCBS

TCS

Huh?

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 37: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Is this really how academics fight?

Kumar et al. (2016) - January 5, 2016

Here, we show that GCTA applied to current SNP data cannot producereliable or stable estimates of heritability.

Yang et al. (2016) - July 25, 2016

We show below that those claims are false due to their misunderstandingof the theory and practice of random-effect models underlyinggenome-wide complex trait analysis.

Kumar et al. (2016) - July 25, 2016

We do not understand the basis for the claim that “the GREML fits all ofthe SNPs jointly in a random-effect model so that each SNP effect isfitted conditioning on the joint effects of all of the SNPs.” Although Yangand colleagues insist on this fact, they do not provide any mathematicaljustification for this conclusion.

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 38: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Simulation Studies

Simulation Take-Aways

Relatedness

Our newly proposed methods, like most shrinkage estimators, fail toestimate close relatives accuratelyRefine the estimate of distant relatednessOffer comparable, if not better, estimates for relatedness above 5thdegree

Heritability

Uhhh...Quite difficult to attain any reasonable interpretation of these resultsNeed a better REML algorithm to utilize this modelDoctoral thesis? Draves et al. (2021)

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 39: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Health Aging and Body Composition Study

Health ABC - Study Description

3,075 men and women from Memphis and Pittsburgh areas betweenthe ages of 70 and 79

45% of women and 33% of men self reported African-American race

We only consider the 1663 individuals who self reported White raceseeing they made up the majority of the sample

The study records and maintains SNP level information as well asseveral phenotypes

Body Mass Index (BMI)Abdominal Visceral Fat Density (AVFD)

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 40: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Health Aging and Body Composition Study

Relatedness Estimates

NS

Relatedness

Density

-0.02 0.00 0.02 0.04 0.06

020

4060

80TCS

Relatedness

Density

-0.02 0.00 0.02 0.04 0.06

020

4060

80

TCB

Relatedness

Density

-0.02 0.00 0.02 0.04 0.06

020

4060

80

TCBS

Relatedness

Density

-0.02 0.00 0.02 0.04 0.06

020

4060

80

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 41: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Health Aging and Body Composition Study

Heritability of BMI and AVFD

BMI is 30-40% heritable [Zhang and Lupski (2015)]

AVFD has maximal heritability, including non-genetic factors, of 48%[Rice et al. (1997)]

Method BMI AVFD

NS 44.5% 14.6%TCS 99.9% 54.0%TCB 22.8% 17.0%

TCBS 15.4% 18.0%

TCS over estimates the heritability for both traits

TCB and TCBS have more stable behavior and appropriately estimatethese parameters

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 42: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Conclusion

Conclusions

This thesis develops two new methods that better utilize genome-levelgenetic data

These methods better represent the inherent familial blockings withinlarge samples to better estimate distant relatedness

Our methods offer comparable estimates of relatedness for degree 5relatives and higher

We refine the estimate of relatedness for degree 7 and higher

These better estimates should lead to better estimates of heritability

Applying these methods to the Health ABC study, we show ourmethods stabilize the estimate of heritability in this setting

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 43: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Conclusion

Future Work

Better parameter selection - hierarchical clustering methods

Account for SNP - SNP correlation via genetic distant & othercorrelation metrics

Implement decompositions into software package

Implement alternative methodologies for estimating heritability (e.g.regression techniques, mixture modeling)

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 44: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Conclusion

Thank You

Advisor: Trent Gaugler

Committee: Eric Ho & Joy Zhou

Jayne Trent

Josh Arfin

Math Lounge Rabble

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 45: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Conclusion

Questions?

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 46: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Conclusion

References I

Crossett, A., A. B. Lee, L. Klei, B. Devlin and K. Roeder (2013):”Refining genetically inferred relationships using treelet covariancesmoothing,” The Annals of Applied Statistics, 7, 669-690.

Gaugler, T., L. Klei, S. J. Sanders, C. A. Bodea, A. P. Goldberg, A. B.Lee, M. Mahajan, D. Manaa, Y. Pawitan, J. Reichert, S. Ripke, S.Sandin, P. Sklar, O. Svantesson, A. Reichenberg, C. M. Hultman, B.Devlin, K. Roeder and J. D. Buxbaum (2014): ”Most genetic risk forautism resides with common variation,” Nature Genetics, 46, 881-885.

Kumar S.K.,Feldman M.W., Rehkopf D.H., and Tuljapurkar S. (2015).Limitations of GCTA as a solution to the missing heritability problem.PNAS 2016 113 (1) E61-E70; published ahead of print December 22,2015.

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 47: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Conclusion

References II

Lee, A. B., Nadler, B. and Wasserman, L. (2008). Treeletsan adaptivemulti-scale basis for sparse unordered data. Ann. Appl. Stat. 2 435471.

Rice, T., Desprs, J. P., Daw, E. W., Gagnon, J., Borecki, I. B., Prusse,L., Leon, A. S., Skinner, J. S., Wilmore, J. H., Rao, D. C., andBouchard, C. (1997). Familial Resemeblance for Abdominal ViseralFat: The HERITAGE Family Study. Int J Obes Relat Metab Disord 21(11), 1024-1031.

Yang, J., Benyamin, B., McEvoy, B. P.,Gordon, S.,Henders, A.K.,Nyholt, D. R., Madden, P. A., Heath, A. C., Martin, N. G.,Montgomery, G. W. et al. (2010a). Common SNPs explain a largeproportion of the heritability for human height. Nature Genetics 42565569.

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers

Page 48: Treelet Covariance Smoothers - Benjamindravesb.github.io/presentation-slides/thesis_defense.pdf · Treelet Covariance Smoothers Estimation of Genetic Parameters Benjamin Draves1 1Department

Conclusion

References III

Yang, J., Lee, S. H., Goddard, M. E. and Visscher, P. M. (2010b).GCTA: A tool for genome-wide complex trait analysis. The AmericanJournal of Human Genetics 88 7682.

ang, J., Lee, S.H., Wray, N.R., Goddard, M.E., and Cisscher, P.M.(2016). GCTA-GREML accounts for linkage disequilibrium whenestimating genetic variance from genome-wide SNPs. PNAS 2016 113(32) E4579-E4580; published ahead of print July 25, 2016.

Zhang F, Lupski JR (2015). Non-coding genetic variants in humandisease. Hum Mol Genet. 2015;24:R10210. doi: 10.1093/hmg/ddv259

Benjamin Draves (Lafayette College) Treelet Covariance Smoothers