
Predicting regulatory variants with composite statistic

MJ Li et al.

Presented by Yuchuan Wang

10/03/2016


Introduction

• Goal: predict and prioritize human non-coding regulatory variants

• Existing tools use functional genomics data and evolutionary information to evaluate the function of non-coding variants

• Different algorithms give inconsistent and even conflicting predictions

• This work integrates prediction scores from eight tools that are widely used to predict non-coding regulatory variants


Methods

• Variant prediction scores collection and processing

| Score name | Source link | Pre-calculated |
| --- | --- | --- |
| CADD_CScore, CADD_PHRED | http://krishna.gs.washington.edu/download/CADD/v1.1/whole_genome_SNVs.tsv.gz | Y |
| DANN | https://cbcl.ics.uci.edu/public_data/DANN/data/DANN_whole_genome_SNVs.tsv.bgz | Y |
| FunSeq | http://funseq.gersteinlab.org/data | Y |
| FunSeq2 | http://archive.gersteinlab.org/funseq2/hg19_wg_score.tsv.gz | Y |
| GWAS3D | http://jjwanglab.org/gwas3d | N |
| GWAVA_Region, GWAVA_TSS, GWAVA_Unmatched | ftp://ftp.sanger.ac.uk/pub/resources/software/gwava/v1.0/annotated/gwava_db_csv.tgz | N |
| SuRFR | http://www.cgem.ed.ac.uk/resources/SuRFR/SuRFR_0.99.0.tar.gz | N |
| FATHMM-MKL | http://fathmm.biocompute.org.uk/database/fathmm-MKL_Current.tab.gz | Y |


Methods: Construction of training/testing datasets

• A dataset of disease-causal or functional regulatory variants was built by combining four different resources

• 81 experimentally validated regulatory variants were manually curated from recent publications and served as an independent set of causal variants for evaluating the existing algorithms and the composite model


Methods: Composite model

• Calculate the probability density function (pdf) of the scores from each of the eight tools

• Assume independence between tests (tools)

Given a set of scores $S = \{s_1, s_2, \ldots, s_n\}$, we can calculate the Bayes factor (BF):

$$BF = \prod_{i=1}^{n} \frac{P(s_i \mid \text{causal})}{P(s_i \mid \text{neutral})}$$
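The Bayes factor can be sketched in a few lines of Python. This is only an illustration: the slides do not say how the per-tool pdfs are estimated, so the histogram estimator with a pseudocount is an assumption, and the names `histogram_pdf` and `log_bayes_factor` are hypothetical. The product is computed in log space so that it stays numerically stable when many tools are combined.

```python
import math
from bisect import bisect_right


def histogram_pdf(samples, edges, pseudocount=1.0):
    """Estimate a piecewise-constant density from training scores.

    ASSUMPTION: a histogram with a pseudocount stands in for whatever
    density estimate the original method uses.
    """
    n_bins = len(edges) - 1

    def bin_index(x):
        # Clamp out-of-range scores into the first or last bin.
        return min(max(bisect_right(edges, x) - 1, 0), n_bins - 1)

    counts = [pseudocount] * n_bins
    for x in samples:
        counts[bin_index(x)] += 1
    total = sum(counts)
    density = [
        counts[i] / (total * (edges[i + 1] - edges[i])) for i in range(n_bins)
    ]

    def pdf(x):
        return density[bin_index(x)]

    return pdf


def log_bayes_factor(scores, causal_pdfs, neutral_pdfs):
    """log BF = sum_i [log P(s_i | causal) - log P(s_i | neutral)].

    Summing over tools relies on the independence assumption stated above.
    """
    return sum(
        math.log(c(s)) - math.log(n(s))
        for s, c, n in zip(scores, causal_pdfs, neutral_pdfs)
    )
```

A positive log BF favors the causal class and a negative one favors the neutral class; exponentiating it recovers BF as defined above.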


Methods: Composite model

• The probability of the variant being causal is computed as the composite likelihood

$$P(\text{causal} \mid S) = \prod_{i=1}^{n} \frac{P(s_i \mid \text{causal}) \times \pi}{P(s_i \mid \text{causal}) \times \pi + P(s_i \mid \text{neutral}) \times (1 - \pi)}$$

• A flat prior probability π = 0.5 is used for the causal probability of each variant
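The composite statistic can be written out as a short Python sketch. It follows the formula exactly as it appears on this slide (a product of per-tool terms); the function name `composite_posterior` and the example likelihood values are hypothetical, and the per-tool likelihoods are assumed to come from previously estimated pdfs.

```python
def composite_posterior(causal_liks, neutral_liks, prior=0.5):
    """Product over tools of P(s_i|causal)*pi / (P(s_i|causal)*pi + P(s_i|neutral)*(1-pi)).

    causal_liks[i]  : P(s_i | causal)  for tool i (assumed precomputed)
    neutral_liks[i] : P(s_i | neutral) for tool i (assumed precomputed)
    prior           : pi, the flat prior of 0.5 by default
    """
    prob = 1.0
    for p_causal, p_neutral in zip(causal_liks, neutral_liks):
        numerator = p_causal * prior
        prob *= numerator / (numerator + p_neutral * (1.0 - prior))
    return prob


# Hypothetical likelihoods for two tools, each favoring "causal" 4:1.
example = composite_posterior([1.6, 1.6], [0.4, 0.4])
```

With π = 0.5 the prior cancels, so each factor reduces to BF_i / (1 + BF_i), where BF_i is the per-tool likelihood ratio. Each factor lies between 0 and 1, so the product can only shrink as more tools are added.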


Results: Integrative resources for non-coding regulatory variant functional annotation and prediction

• Prediction scores for around 8.6 billion possible SNPs in the human genome

• 5247 genome-wide non-redundant variants with reliable causal evidence as the training set

• A control dataset (10 times the size of the positive set) that does not contain causal or disease-associated variants

• Independent QTL datasets collected


Results: Integrative resources for non-coding regulatory variant functional annotation and prediction

Mulin Jun Li et al. Bioinformatics 2016;32:2729-2736


Results: Existing methods show inconsistent prioritization of non-coding regulatory variants

• Spearman's rank correlation (SRC) tests for each pair of algorithms
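A pairwise SRC comparison can be sketched in plain Python as follows. The tool names and scores in the usage lines are made up for illustration, and the helper names are hypothetical; the sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which handles tied scores.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(scores_by_tool):
    """Map each unordered pair of tool names to its Spearman correlation."""
    return {
        (a, b): spearman(scores_by_tool[a], scores_by_tool[b])
        for a, b in combinations(sorted(scores_by_tool), 2)
    }


# Hypothetical scores for the same three variants from two tools.
matrix = pairwise_spearman({"toolA": [0.1, 0.4, 0.9], "toolB": [3, 2, 1]})
```

Each list must score the same variants in the same order. A coefficient near 1 means two tools rank the variants alike, while values near 0 or below indicate the kind of disagreement this slide refers to.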


SRC among eight tools for (A) refined causal dataset and (B) curated experimentally validated dataset.

Mulin Jun Li et al. Bioinformatics 2016;32:2729-2736


Results: Composite of multiple signals improves causal regulatory variant detection

• A composite likelihood statistic estimates the probability that the investigated variant is causal

• Ten-fold cross-validation

• AUC 0.84 and MCC 0.41
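The evaluation metrics named above can be sketched in plain Python. This is an illustration of the standard definitions, not the authors' evaluation code: the fold assignment, the 0.5 decision threshold for MCC, and all function names are assumptions.

```python
import math


def k_fold_indices(n_samples, k=10):
    """Split sample indices into k disjoint folds (simple interleaved split)."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the fraction of (positive, negative) pairs in which the positive
    scores higher; ties count as half. Quadratic in the number of samples,
    which is fine for a sketch.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


def mcc(labels, scores, threshold=0.5):
    """Matthews correlation coefficient at a fixed threshold (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for label, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a ten-fold run, each fold in turn is held out for scoring while the pdfs are estimated on the other nine, and AUC and MCC are then computed on the held-out predictions. MCC is a useful companion to AUC here because the control set is ten times larger than the positive set.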


Regulatory variant predictions performance of different methods.

Mulin Jun Li et al. Bioinformatics 2016;32:2729-2736


Results: Composite of multiple signals improves causal regulatory variant detection

• Combining only a subset of the eight methods can achieve better predictive power

• CADD_CScore, GWAVA_TSS, GWAS3D and SuRFR

Mulin Jun Li et al. Bioinformatics 2016;32:2729-2736


Results: Evaluation of composite model on eQTL, allelic imbalance and dsQTL datasets

• Three independent human QTL datasets were used to further validate the full composite model

• Improvement in predicting eQTLs (AUC of 0.81) and allelic imbalanced loci (AUC of 0.92)

• Similar performance to FunSeq2 on the dsQTL dataset (AUC of 0.71 for both)


Performance of regulatory QTLs prediction from different methods.

Mulin Jun Li et al. Bioinformatics 2016;32:2729-2736


Results: Comparison with unsupervised integrative approach


Conclusion

• Existing methods show inconsistent prioritization of non-coding regulatory variants

• The composite strategy takes advantage of the complementary attributes of individual tools to achieve better performance

• Identifying a high-quality, high-confidence training dataset of causal regulatory variants and a corresponding control set is challenging

• The low correlations among existing methods may be attributed to the different perspectives and logic of the underlying algorithms

• A large, independent gold standard is needed to test the correlation of different tools and the stability of the reduced combination model
