Objective Bayesian Nets for Integrating Cancer Knowledge. Sylvia Nagl PhD, Cancer Systems Biology & Biomedical Informatics, UCL London

Post on 02-Jan-2016

TRANSCRIPT

  • Objective Bayesian Nets for Integrating Cancer Knowledge

    Sylvia Nagl PhD

    Cancer Systems Biology & Biomedical Informatics

    UCL London

  • caOBNET: Overview

    Knowledge integration by objective Bayesian networks (obNETS)

    Maximum entropy method

    An integrated clinico-genomic obNET for breast cancer

    Conclusions

  • Bayesian networks

    Graphical models: directed acyclic graph (DAG)

    Joint multivariate probability distribution with conditional independencies between variables

    Given the data, the optimal network topology can be estimated using heuristic search algorithms and scoring criteria

    Statistical significance of edge strengths: Bayesian methods, bootstrapping

    Example: Apolipoprotein E gene SNPs and plasma apoE level (Rodin & Boerwinkle 2005)
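    As a minimal sketch of what "joint distribution with conditional independencies" means in practice, the Python below builds the joint of a hypothetical three-node chain A -> B -> C as a product of per-node tables. The structure and every number are assumptions made up for illustration; none come from the talk.

```python
from itertools import product

# Hypothetical DAG: A -> B -> C, all variables binary.
# All probabilities are illustrative, not taken from the slides.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}    # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """Joint probability as the product of each node's table given its parents."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def cond_c(c, b, a):
    """P(C=c | B=b, A=a), computed from the joint rather than read off a table."""
    return joint(a, b, c) / sum(joint(a, b, k) for k in (0, 1))


# The factorised joint is a proper distribution.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Once B is known, A carries no further information about C.
same = abs(cond_c(1, 1, 0) - cond_c(1, 1, 1)) < 1e-12
```

    The last check illustrates the conditional independence encoded by the missing A -> C edge: given B, the value of A does not change the belief in C.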

  • Knowledge integration

    Cancer treatment decisions should be based on all available knowledge

    Knowledge is complex and varied: patient's symptoms, expert knowledge, clinical databases relating to past patients, molecular databases, scientific papers, medical informatics systems

    Generated by independent studies with diverse protocols

  • Knowledge integration

    Diverse data types

    Genomic, transcriptomic, proteomic, SNPs, tissue microarray, histopathology, clinical etc.

    New data types, e.g., epigenetic data

    All data types capture different characteristics of a dynamic complex system

    At different spatial and temporal scales

    Cell, tumour, patient, and therapeutic system of patient-therapy interactions

    How can this disparate data be used for an integrated understanding on which to base our actions?

  • Objective Bayesianism

    Data and knowledge impinge on belief: we try to find a coherent set of beliefs with best fit

    Beliefs are based on undefeated items of knowledge

    In case of conflict, try to find compromise beliefs

    Objective Bayesianism offers a formalism for determining the beliefs that best fit background knowledge

    Applying Bayesian theory, an agent's degree of belief should be representable by a probability function p

    Empirical knowledge imposes quantitative constraints on p

    Represented in an obNET (learnt from database)

  • obNETS for prediction

    Standard algorithms can be used to calculate the probability of a specific outcome

    A direct link between variables may suggest a causal connection
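    A hedged sketch of such a calculation: the Python below answers queries on a tiny net by brute-force enumeration of the joint distribution. The structure (grade -> lymph node status -> recurrence), the variable names, and all probabilities are hypothetical; tools such as Hugin use more efficient exact algorithms.

```python
from itertools import product

# Hypothetical net: grade -> ln (lymph node status) -> rec (recurrence).
# All numbers are illustrative assumptions.
DOMAINS = {"grade": ("low", "high"), "ln": (0, 1), "rec": (0, 1)}
p_grade = {"low": 0.6, "high": 0.4}
p_ln = {"low": {0: 0.8, 1: 0.2}, "high": {0: 0.5, 1: 0.5}}  # p_ln[grade][ln]
p_rec = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.6, 1: 0.4}}          # p_rec[ln][rec]


def joint(s):
    """Joint probability of a full assignment s, using the net's factorisation."""
    return p_grade[s["grade"]] * p_ln[s["grade"]][s["ln"]] * p_rec[s["ln"]][s["rec"]]


def posterior(var, evidence):
    """P(var | evidence): sum the joint over all assignments consistent with the evidence."""
    names = list(DOMAINS)
    scores = {v: 0.0 for v in DOMAINS[var]}
    for values in product(*(DOMAINS[n] for n in names)):
        s = dict(zip(names, values))
        if all(s[k] == v for k, v in evidence.items()):
            scores[s[var]] += joint(s)
    z = sum(scores.values())
    return {v: p / z for v, p in scores.items()}


# Predictive query: probability of recurrence for a high-grade tumour.
pred = posterior("rec", {"grade": "high"})
# Diagnostic query: probability the tumour was high grade, given recurrence.
diag = posterior("grade", {"rec": 1})
```

    Enumeration is exponential in the number of variables, so it only serves to show what the query means; it is not a suggestion for nets of realistic size.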

  • Bayesian networks

    Can BNs be integrated?

    Spanning genetic/molecular and clinical levels

    obNETS offer a principled path to knowledge integration

  • Maximum entropy principle

    From all probability functions p that satisfy the constraints, adopt the one that is maximally equivocal

    Williamson, J. (2002): Maximising Entropy Efficiently.

    Williamson, J. (2005a): Bayesian Nets and Causality.

    Williamson, J. (2005b): Objective Bayesian nets.

    www.kent.ac.uk/secl/philosophy/jw/

  • Example

    Two items of empirical knowledge may conflict:

    Study 1: Cancer will recur in 50% of patients with a given set of characteristics

    Degree of belief in recurrence in an individual patient = 0.5

    Study 2: Frequency of recurrence is 30%

    Degree of belief will be constrained to the closed interval [0.3, 0.5]

    In general:

    The belief function will lie within a closed set of probability functions

    There will be a unique function that maximises entropy
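    For the single binary outcome in this example, the entropy maximiser can be worked out directly. The sketch below assumes no constraints other than the interval [0.3, 0.5]; the function names are illustrative, and the grid search is only a sanity check on the closed-form answer.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a binary outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def max_entropy_belief(lo, hi):
    """Entropy is concave and peaks at 0.5, so the maximiser on [lo, hi]
    is the point of the interval closest to 0.5."""
    return min(max(0.5, lo), hi)


# Interval from the two conflicting studies.
belief = max_entropy_belief(0.3, 0.5)

# Sanity check: a brute-force grid search over the interval agrees.
grid = [0.3 + i * 0.001 for i in range(201)]
best_on_grid = max(grid, key=binary_entropy)
```

    Under these assumptions the least committal degree of belief in recurrence is the end of the interval nearest to 0.5.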

  • obNet integration

  • obNet integration

    Original obNETs provide probability distributions

  • obNET integration

  • obNET integration

  • obNET integration

    n number of nets

  • obNET integration

    Maximum entropy principle

    If CPTs for merged nodes disagree on probabilities, assign a closed interval and take the least committal value in that range
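    A simplified sketch of this merging rule for a binary node, assuming each source net supplies P(node = 1 | parent state) and that "least committal" means the value in the interval closest to 0.5. The function name and both example tables are hypothetical; the full obNET method maximises entropy over the whole network rather than entry by entry.

```python
def merge_binary_cpts(cpt_a, cpt_b):
    """Merge two tables of P(node = 1 | parent state) for the same binary node.

    Where the sources disagree, the two values span a closed interval and the
    entry closest to 0.5 within it is kept as the least committal choice.
    """
    merged = {}
    for parent_state in cpt_a:
        lo, hi = sorted((cpt_a[parent_state], cpt_b[parent_state]))
        merged[parent_state] = {
            "interval": (lo, hi),
            "p": min(max(0.5, lo), hi),
        }
    return merged


# Hypothetical values for a node with one three-state parent (-1, 0, 1).
source_1 = {-1: 0.85, 0: 0.50, 1: 0.85}
source_2 = {-1: 0.70, 0: 0.45, 1: 0.90}

merged = merge_binary_cpts(source_1, source_2)
```

    With these made-up inputs the merged entries are 0.70, 0.50 and 0.85: in each case the end of the interval nearest to 0.5.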

  • obNET integration: Proof of principle

    Two obNETs from breast cancer knowledge domain

    Genomic: Comparative genome hybridisation (CGH) data - progenetix database

    Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)

    Clinical: American Surveillance, Epidemiology and End Results (SEER) database

  • Clinical and genomic nets (Hugin 6.6)

    SEER database: 4731 cases

    progenetix database: 28 bands/502 cases?

  • obNet integration

    obNet learnt from 2nd progenetix dataset - 119 cases with clinical annotation (lymph node status, tumour size, grade)

    CPT

    22q12:   -1      0      1

    LN = 0   0.148   0.5    0.148

    LN = 1   0.852   0.5    0.852

  • Additional empirical knowledge

    Fridlyand et al. 2006

    chr. 22

  • obNet integration

    Fridlyand et al. 2006

    chr. 22

    CPT

  • Metastasis-associated genes

    KREMEN1, MYH9, cadherin11, CD97

    BMP7, ELMO2, BCAS1, BCAS4, ZNF217

  • KREMEN1

    Howard et al., 2003

    Biological knowledge suggests possible causal link (in context of whole obNET: HR status!)

  • Knowledge integration

    Molecular profiling of tumours

    Cancer clinical data & epidemiology

    Translation of clinical data to genomics research

    Multi-scale obNETs

    Predictive markers

  • Acknowledgements

    Jon Williamson (Philosophy, University of Kent), www.kent.ac.uk/secl/philosophy/jw/

    Matt Williams (Cancer Research UK)

    Nadjet El-Mehidi (Cancer Systems Biology, UCL)

    Vivek Patkar (Cancer Research UK)

    Contact: s.nagl@ucl.ac.uk

  • obNET integration: Proof of principle

    Two obNETs

    Non-independent rearrangements at chromosomal locations in breast cancer from comparative genome hybridisation (CGH) data - progenetix database

    Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)

    Probabilistic dependencies between clinical parameters from the American Surveillance, Epidemiology and End Results (SEER) database

  • HR status link

  • Genomic systems

    Genomes are dynamic molecular systems

    Selection acts on unstable cancer genomes as integrated wholes, not just on individual oncogenes or tumour suppressors.

    A multitude of ways to solve the problems of achieving a survival advantage in cancer cells:

    Irreversible evolutionary processes

    Randomness of mutation

    Modularity and redundancy of complex systems

  • Genome-wide rearrangements

    Can we identify probabilistic dependency networks in large sample sets of genomic data from individual tumours? If so, under which conditions may these be interpreted as causal networks?

    Can we identify probabilistic dependency networks involving molecular and clinical levels?

  • Systems Biology and Causation

    Profound conceptual challenge regarding physical causation in complex biological systems

    Mutual dependence of physical causes

    The biological relevance of any factor, and therefore the information it conveys, is jointly determined, frequently in a statistically interactive fashion, by that factor and the system state (Susan Oyama, The Ontogeny of Information, 2000)

    The influence of a gene, or a genetic mutation, depends on the context, such as availability of other molecular agents and the state of the biological system, including the rest of the genome

  • Cell networks are dynamically instantiated: genes for components are switched on or off in response to signals and cell state

    (Figure labels: System state, agents)

  • Cell networks are reconfigured in response to changes in environment or the cell's internal state

    (Figure label: System state)

  • Cell computation networks are reconfigured in response to changes in environment or the cell's internal state

    (Figure label: System state)

  • Cancer: Genome instability re-programs cell networks

    Selection for increased proliferation, resistance, invasiveness etc.

    Driven by tumour cell-tissue interactions

  • Genome-wide rearrangements

    Can we identify probabilistic dependency networks in large sample sets of genomic data from individual tumours?

    Can we identify probabilistic dependency networks involving molecular and clinical levels?

  • Proof of principle

    Screen the whole genome for chromosomal abnormalities in one experiment

    Cytogenetics

    Comparative genomic hybridization (CGH)

    Fluorescence in situ hybridization (FISH) and multicolour fluorescence in situ hybridization (MFISH)

    Detection of allelic instabilities, loss of heterozygosity (LOH)

    Markov condition: Conditional on its parents, any variable is probabilistically independent of all other variables apart from its descendants.

    CPT represents the least committal belief we can adopt given the data

    Additional empirical data further constrain our belief

    Components are put in place by the cell by switching genes on or off or increasing/decreasing numbers of components