treelet covariance smoothers - github pages
TRANSCRIPT
Treelet Covariance SmoothersEstimation of Genetic Parameters
B. Draves1
1Department of MathematicsLafayette College
Advisor: T. Gaugler
Moravian College, 2017
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 1 / 11
Treelet Covariance Smoothers
Overview
Principle Component Analysis
Motivation in Statistical Genetics
Existing Methods
New Methods & Results
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 2 / 11
Treelet Covariance Smoothers
Overview
Principle Component Analysis
Motivation in Statistical Genetics
Existing Methods
New Methods & Results
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 2 / 11
Treelet Covariance Smoothers
Overview
Principle Component Analysis
Motivation in Statistical Genetics
Existing Methods
New Methods & Results
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 2 / 11
Treelet Covariance Smoothers
Overview
Principle Component Analysis
Motivation in Statistical Genetics
Existing Methods
New Methods & Results
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 2 / 11
Treelet Covariance Smoothers
Principle Component Analysis (PCA)
PCA looks to reduce the number of input variables X1,X2, . . . ,Xk (klarge) to k0 < k variables
Reduce to fewer inputs while preserving as much variability in thedata as possible
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 3 / 11
Treelet Covariance Smoothers
Principle Component Analysis (PCA)
PCA looks to reduce the number of input variables X1,X2, . . . ,Xk (klarge) to k0 < k variables
Reduce to fewer inputs while preserving as much variability in thedata as possible
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 3 / 11
Treelet Covariance Smoothers
Principle Component Analysis (PCA)
PCA looks to reduce the number of input variables X1,X2, . . . ,Xk (klarge) to k0 < k variables
Reduce to fewer inputs while preserving as much variability in thedata as possible
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 3 / 11
Treelet Covariance Smoothers
Principle Component Analysis (PCA)
PCA looks to reduce the number of input variables X1,X2, . . . ,Xk (klarge) to k0 < k variables
Reduce to fewer inputs while preserving as much variability in thedata as possible
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 3 / 11
Treelet Covariance Smoothers
Principle Component Analysis (PCA)
PCA looks to reduce the number of input variables X1,X2, . . . ,Xk (klarge) to k0 < k variables
Reduce to fewer inputs while preserving as much variability in thedata as possible
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 3 / 11
Treelet Covariance Smoothers
Genetics Motivation
Individual’s genetic material can be described by a panel of genotypedSNPs
Using this genetic information, an estimate of the relationship matrix,A, can be calculated
Genetically inferred relationship matrices are typically very noisy
Idea: use PCA to set the data on top of a more “natural” basis
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 4 / 11
Treelet Covariance Smoothers
Genetics Motivation
Individual’s genetic material can be described by a panel of genotypedSNPs
Using this genetic information, an estimate of the relationship matrix,A, can be calculated
Genetically inferred relationship matrices are typically very noisy
Idea: use PCA to set the data on top of a more “natural” basis
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 4 / 11
Treelet Covariance Smoothers
Genetics Motivation
Individual’s genetic material can be described by a panel of genotypedSNPs
Using this genetic information, an estimate of the relationship matrix,A, can be calculated
Genetically inferred relationship matrices are typically very noisy
Idea: use PCA to set the data on top of a more “natural” basis
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 4 / 11
Treelet Covariance Smoothers
Genetics Motivation
Individual’s genetic material can be described by a panel of genotypedSNPs
Using this genetic information, an estimate of the relationship matrix,A, can be calculated
Genetically inferred relationship matrices are typically very noisy
Idea: use PCA to set the data on top of a more “natural” basis
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 4 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothing
Goal: preserve local structure of data
Iteratively change basis via PCA to preserve the variability betweentwo most closely related individuals
Repeat this process until all individuals are processed
Once all individuals are processed, enforce sparsity to improve theestimator
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 5 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothing
Goal: preserve local structure of data
Iteratively change basis via PCA to preserve the variability betweentwo most closely related individuals
Repeat this process until all individuals are processed
Once all individuals are processed, enforce sparsity to improve theestimator
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 5 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothing
Goal: preserve local structure of data
Iteratively change basis via PCA to preserve the variability betweentwo most closely related individuals
Repeat this process until all individuals are processed
Once all individuals are processed, enforce sparsity to improve theestimator
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 5 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothing
Goal: preserve local structure of data
Iteratively change basis via PCA to preserve the variability betweentwo most closely related individuals
Repeat this process until all individuals are processed
Once all individuals are processed, enforce sparsity to improve theestimator
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 5 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
TCS Properties
P8
P7
P6
P5
P4
P3
P2
P1
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 6 / 11
Treelet Covariance Smoothers
Expanding TCS
Why do we require merging into one cluster?
Idea: Stop merging variables to utilize familiar blocks
Smooth by projecting data onto sum variables
Treelet Covariance Blocking (TCB)
Further enforce sparsity by thresholding estimates
Treelet Covariance Blocked Smoothing (TCBS)
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 7 / 11
Treelet Covariance Smoothers
Expanding TCS
Why do we require merging into one cluster?
Idea: Stop merging variables to utilize familiar blocks
Smooth by projecting data onto sum variables
Treelet Covariance Blocking (TCB)
Further enforce sparsity by thresholding estimates
Treelet Covariance Blocked Smoothing (TCBS)
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 7 / 11
Treelet Covariance Smoothers
Expanding TCS
Why do we require merging into one cluster?
Idea: Stop merging variables to utilize familiar blocks
Smooth by projecting data onto sum variables
Treelet Covariance Blocking (TCB)
Further enforce sparsity by thresholding estimates
Treelet Covariance Blocked Smoothing (TCBS)
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 7 / 11
Treelet Covariance Smoothers
Expanding TCS
Why do we require merging into one cluster?
Idea: Stop merging variables to utilize familiar blocks
Smooth by projecting data onto sum variables
Treelet Covariance Blocking (TCB)
Further enforce sparsity by thresholding estimates
Treelet Covariance Blocked Smoothing (TCBS)
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 7 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothers
At each level of the tree there are basis vectors {v (`)1 , . . . , v(`)N }
We can then write our estimate of A, Σ, by
Σ =∑i∈S`
γ(`)i ,i v
(`)i
(v(`)i
)t+
∑i ,j∈S`,i 6=j
γ(`)i ,j v
(`)i
(v(`)j
)t
Here γi ,j(`) are the variance - covariance estimates of transformed
relationship variables
We can enforce sparsity by further thresholding these γi ,j(`) values
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 8 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothers
At each level of the tree there are basis vectors {v (`)1 , . . . , v(`)N }
We can then write our estimate of A, Σ, by
Σ =∑i∈S`
γ(`)i ,i v
(`)i
(v(`)i
)t+
∑i ,j∈S`,i 6=j
γ(`)i ,j v
(`)i
(v(`)j
)t
Here γi ,j(`) are the variance - covariance estimates of transformed
relationship variables
We can enforce sparsity by further thresholding these γi ,j(`) values
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 8 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothers
At each level of the tree there are basis vectors {v (`)1 , . . . , v(`)N }
We can then write our estimate of A, Σ, by
Σ =∑i∈S`
γ(`)i ,i v
(`)i
(v(`)i
)t+
∑i ,j∈S`,i 6=j
γ(`)i ,j v
(`)i
(v(`)j
)t
Here γi ,j(`) are the variance - covariance estimates of transformed
relationship variables
We can enforce sparsity by further thresholding these γi ,j(`) values
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 8 / 11
Treelet Covariance Smoothers
Treelet Covariance Smoothers
At each level of the tree there are basis vectors {v (`)1 , . . . , v(`)N }
We can then write our estimate of A, Σ, by
Σ =∑i∈S`
γ(`)i ,i v
(`)i
(v(`)i
)t+
∑i ,j∈S`,i 6=j
γ(`)i ,j v
(`)i
(v(`)j
)t
Here γi ,j(`) are the variance - covariance estimates of transformed
relationship variables
We can enforce sparsity by further thresholding these γi ,j(`) values
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 8 / 11
Treelet Covariance Smoothers
Simulation Results
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 9 / 11
Treelet Covariance Smoothers
Conclusion
Treelet Covariance Smoothers is a class of methods which improvethe estimation of distant relationships
Estimating relationships is the centerpiece of successful estimation ofother genetic parameters such as heritability
Understanding of genetic basis of heritability can lead to bettertreatment
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 10 / 11
Treelet Covariance Smoothers
Conclusion
Treelet Covariance Smoothers is a class of methods which improvethe estimation of distant relationships
Estimating relationships is the centerpiece of successful estimation ofother genetic parameters such as heritability
Understanding of genetic basis of heritability can lead to bettertreatment
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 10 / 11
Treelet Covariance Smoothers
Conclusion
Treelet Covariance Smoothers is a class of methods which improvethe estimation of distant relationships
Estimating relationships is the centerpiece of successful estimation ofother genetic parameters such as heritability
Understanding of genetic basis of heritability can lead to bettertreatment
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 10 / 11
Treelet Covariance Smoothers
Thanks for listening
Questions? Comments?
B. Draves (Lafayette College) Treelet Covariance Smoothers Moravian College, 2017 11 / 11