An Empirical Framework for Genome-wide Single Nucleotide Polymorphism-based Predictive Modeling

Upload: amia

Post on 03-Apr-2018

  • 7/28/2019 An Empirical Framework for Genome-wide Single Nucleotide Polymorphism-based Predictive Modeling

    An empirical framework for genome-wide single nucleotide polymorphism-based predictive modeling

    Charalampos S. Floudas, MD, PhD, MS

    Jeya Balaji Balasubramanian, MS

    Marjorie Romkes, PhD

    Vanathi Gopalakrishnan, PhD

    Department of Biomedical Informatics

    Department of Biomedical Informatics, PRoBE Lab

    TBI 2013

    A workflow for prediction in cancer

    Predicting risk of early recurrence in early stage non-small cell lung cancer (NSCLC)

    SNPR workflow

    Genome-wide single nucleotide polymorphisms (SNPs)

    Bayesian rule learning (BRL) system

    Translational Bioinformatics

    Includes prediction of clinical outcomes from available genomic data

    Genomic data:

    High-dimensional

    Many modalities

    Different aspects of disease

    Translational Bioinformatics

    Multiple clinical outcomes

    Combinations of datasets and outcomes

    Collaborative effort

    Many tools available

    Workflows

    Flexibility of design

    Reproducibility of research

    Core elements

    Subjects: 86 early stage NSCLC patients

    University of Pittsburgh Cancer Institute

    Lung SPORE cohort

    Importance

    Predictors dataset: Affymetrix SNP Array 6.0, 1 million SNPs

    Outcome: categorical disease-free survival (DFS), good vs. poor, 1952 days

    Workflow tools

    Affymetrix Genotyping Console

    Quality control (QC), genotype calling

    After QC: 67 samples (50 poor DFS, 17 good)

    PLINK

    QC of genotypes (MAF, etc.), feature selection (χ²) for BRL, and export of features

    BRL system

    Predictive rules (sets of SNPs) and metrics
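
The feature-selection step can be illustrated with a small sketch. PLINK's basic case/control association test is a 1-degree-of-freedom allelic chi-square test; the code below assumes SNPs are ranked by such a statistic and the top k are exported. The function names, the 0/1/2 minor-allele coding, and the "poor"/"good" labels are illustrative assumptions, not PLINK's interface or the authors' code.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def allele_counts(genotypes: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Return (minor, major) allele counts from 0/1/2 genotype codes.

    None marks a missing call and is skipped (assumed coding).
    """
    called = [g for g in genotypes if g is not None]
    minor = sum(called)
    return minor, 2 * len(called) - minor


def allelic_chi2(group_a: Tuple[int, int], group_b: Tuple[int, int]) -> float:
    """Pearson chi-square (1 df) on a 2x2 table of allele counts."""
    a, b = group_a
    c, d = group_b
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        return 0.0  # monomorphic SNP or empty group: no evidence either way
    return n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])


def top_k_snps(genotypes_by_snp: Dict[str, List[Optional[int]]],
               labels: Sequence[str], k: int) -> List[str]:
    """Rank SNPs by allelic chi-square between 'poor' and 'good' samples."""
    scores = {}
    for snp, gts in genotypes_by_snp.items():
        poor = [g for g, y in zip(gts, labels) if y == "poor"]
        good = [g for g, y in zip(gts, labels) if y == "good"]
        scores[snp] = allelic_chi2(allele_counts(poor), allele_counts(good))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In a real run, PLINK would compute the statistic over the full array and only the selected SNP identifiers would be passed on to BRL.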

    BRL system elements

    Rule learner (RL)

    Bayesian Rule Learner (BRL)

    Bayesian scoring induces Bayesian networks

    Rule models

    Global (GBRL): full

    Local (LBRL): decision tree representation
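
To make "Bayesian scoring" concrete, here is a minimal sketch of how a candidate set of SNPs could be scored as predictors of the outcome. It uses the K2 metric (a uniform-prior Bayesian network score) as a stand-in; the slides do not state which score the BRL system uses, so the metric, the function name `k2_log_score`, and the row format are all assumptions.

```python
from collections import Counter, defaultdict
from math import lgamma
from typing import Dict, Hashable, List, Sequence


def k2_log_score(rows: List[Dict[str, Hashable]],
                 parents: Sequence[str],
                 target: str,
                 n_classes: int) -> float:
    """Log K2 marginal likelihood of `target` given the `parents` variables.

    For each observed parent configuration j with class counts N_jk:
        log[(r - 1)! / (N_j + r - 1)!] + sum_k log(N_jk!)
    where r = n_classes and N_j = sum_k N_jk.
    """
    counts = defaultdict(Counter)
    for row in rows:
        config = tuple(row[p] for p in parents)
        counts[config][row[target]] += 1

    score = 0.0
    for class_counts in counts.values():
        n_j = sum(class_counts.values())
        score += lgamma(n_classes) - lgamma(n_j + n_classes)
        score += sum(lgamma(c + 1) for c in class_counts.values())
    return score
```

A higher (less negative) score means the chosen SNPs explain the outcome better. Under this sketch, each observed parent configuration would correspond to one candidate rule of the form "IF SNP values THEN outcome", with the class counts giving its support.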

    Workflow tools

    SQLite

    Fine selection of datasets and clinical parameters

    Unix command line tools

    Operations on datasets (Affymetrix genotypes to PLINK, PLINK selected features to BRL)
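
As an illustration of selecting subjects by clinical parameters with SQLite, the sketch below uses Python's standard `sqlite3` module. The schema (tables `subjects` and `clinical`, columns `passed_qc`, `stage`, `dfs_class`) and the demo rows are hypothetical; the slides do not describe the actual database layout.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a made-up schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subjects (subject_id TEXT PRIMARY KEY, passed_qc INTEGER);
        CREATE TABLE clinical (subject_id TEXT, stage INTEGER, dfs_class TEXT);
    """)
    conn.executemany("INSERT INTO subjects VALUES (?, ?)",
                     [("S1", 1), ("S2", 0), ("S3", 1)])
    conn.executemany("INSERT INTO clinical VALUES (?, ?, ?)",
                     [("S1", 1, "poor"), ("S2", 2, "good"), ("S3", 3, "good")])
    return conn


def select_subjects(conn: sqlite3.Connection, stage_max: int,
                    passed_qc: bool = True) -> List[Tuple[str, str]]:
    """Return (subject_id, dfs_class) for subjects matching the filters."""
    query = """
        SELECT s.subject_id, c.dfs_class
        FROM subjects AS s
        JOIN clinical AS c ON c.subject_id = s.subject_id
        WHERE c.stage <= ? AND s.passed_qc = ?
        ORDER BY s.subject_id
    """
    return conn.execute(query, (stage_max, int(passed_qc))).fetchall()
```

Keeping the selection in a parameterized query makes it easy to rerun the same workflow on a different subset of subjects or a different outcome.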

    Results - feature selection

    100 SNPs from PLINK χ²

    44 intragenic -> 33 genes

    Functional analysis (Ingenuity IPA)

    most significantly associated disease is cancer (9 of 33 genes)

    most significantly associated biological function is cell-to-cell signaling and interaction (8 of 33 genes)

    Results - feature selection

    CHODL (chondrolectin) gene

    associated with shorter survival in NSCLC

    CDH13 (cadherin 13) gene

    hypermethylated in NSCLC

    CHST11 (carbohydrate (chondroitin 4) sulfotransferase 11) gene

    associated with lung colonization in breast cancer

    Results - BRL prediction

    5-fold cross-validation
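
For readers unfamiliar with the evaluation scheme, here is a minimal sketch of a 5-fold split over the 67 post-QC samples (50 poor DFS, 17 good). It assumes stratification by outcome class so that each fold keeps roughly the same class ratio; the slides do not say whether the folds were stratified, and the function names are illustrative.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], n_folds: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Assign sample indices to folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_test_splits(labels: Sequence[str], n_folds: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), one pair per held-out fold."""
    folds = stratified_folds(labels, n_folds, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With so few samples in the minority class, feature selection would ideally be repeated inside each training split; selecting SNPs on all samples before cross-validation tends to give optimistic performance estimates.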

    Conclusions

    Our empirical workflow (SNPR)

    Efficiently overcomes challenges of prediction using high-dimensional datasets

    Achieves biological relevance and good predictive performance

    Can be generalized and adapted

    Other experimental platforms, data mining tasks

    Limitations

    Small sample size

    No independent testing cohort

    Categorization of survival instead of time-to-event analysis

    Acknowledgments

    Cancer Biomarkers Facility of the University of Pittsburgh Cancer Institute, award P30CA047904

    Grant support: National Cancer Institute Award Number P50CA090440

    National Library of Medicine Award Number R01LM010950

    National Institute of General Medical Sciences Award Number R01GM100387

    Thank you

    [email protected]