Cancer Genome Analysis: PARADIGM
Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010. (C. J. Vaske et al.)
02-715 Advanced Topics in Computational Genomics
Motivation
• Integrative analysis of cancer genome data
– Copy number variations, gene expression
• Leverage pathway information to find frequently occurring pathway perturbations
– NCI Pathway Interaction Database, KEGG, etc.
Motivation
• Pathway information describes how genes are supposed to behave
PARADIGM
PARADIGM Model
• Physical entities for variables:
– Protein-coding genes, small molecules, complexes
– Gene families: collections of genes in which any single gene is sufficient to perform a specific function
– Abstract processes: the overall role of the pathway, e.g., apoptosis
PARADIGM Model
• Factor graph representation of the various entities corresponding to a single gene
PARADIGM Model: Gene Interactions
PARADIGM Model:
• A factor graph for a pathway
Model Specification
• Convert an NCI pathway into a factor graph
– NCI pathway to Bayesian network
• Directed network
• Each variable takes values of -1 (deactivation), 0 (normal), 1 (activation)
– mRNA: overexpression for activation
– Copy number variations: more than two copies for activation
• Probability distribution of each node
– Labeled edges for positive/negative interactions
– Set the value of the child node as weighted votes from its parents
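The weighted-vote rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `expected_child_state`, the encoding of edge labels as +1/-1, and the default of equal weights are all assumptions made for the example.

```python
def expected_child_state(parent_states, edge_signs, weights=None):
    """Return the child's expected state (-1, 0, or 1) from its parents.

    parent_states: state of each parent, each in {-1, 0, 1}
    edge_signs:    +1 for a positive (activating) edge, -1 for a negative one
    weights:       optional vote weight per parent (assumed equal if omitted)
    """
    if weights is None:
        weights = [1.0] * len(parent_states)
    # Each parent casts a vote: its state, flipped if the edge is negative.
    total = sum(w * s * x for w, s, x in zip(weights, edge_signs, parent_states))
    # The sign of the weighted sum gives the child's expected state.
    return (total > 0) - (total < 0)
```

For example, two active parents with one positive and one negative edge cancel out, so the child is expected to stay at 0 (normal).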
Model Specification
• Converting the Bayesian network to a factor graph
– Assign a factor to each group of variables consisting of a node and its parents
• Z: normalization constant
• ε = 0.001
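The slide's formula did not survive extraction, so the following is only a sketch of how a factor with ε = 0.001 might be evaluated. It assumes, based on my reading of the cited paper rather than on the slide text, that a factor gives weight 1 - ε to the child state predicted by the parents and ε/2 to each of the other two states. The name `factor_value` is hypothetical.

```python
EPSILON = 0.001  # value given on the slide


def factor_value(child_state, expected_state, eps=EPSILON):
    """Weight a factor assigns to one child state.

    Assumption: 1 - eps when the child matches the state predicted by its
    parents, eps / 2 for each of the two remaining states.
    """
    if child_state == expected_state:
        return 1.0 - eps
    return eps / 2.0
```

Under this assumption the weights over the three states (-1, 0, 1) sum to 1 for each factor, and the joint distribution is the product of all factor values divided by the normalization constant Z.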
Inference
• Observed variables: copy number variations, gene expression
• Unobserved variables: protein, protein activity, overall pathway activity state
• Learn models with EM algorithm
– E step: impute the unobserved variables
– M step: what are the parameters?
Log-likelihood Ratio Test
• Test statistic for assessing entity i's activity given data D
– The probabilities can be obtained by performing inference on the factor graph
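The statistic itself is missing from the transcript. As a hedged sketch, assume it is the log ratio of P(D | x_i = a) to P(D | x_i ≠ a). By Bayes' rule this equals the posterior log-odds of x_i = a minus the prior log-odds, so it can be computed from two marginals that inference on the factor graph provides. The function name and argument names below are illustrative.

```python
import math


def log_likelihood_ratio(posterior, prior):
    """Log-likelihood ratio for entity i being in state a.

    posterior: P(x_i = a | D), the marginal with the data attached
    prior:     P(x_i = a), the marginal without the data
    Both must lie strictly between 0 and 1.

    Assumed form: log[P(D | x_i = a) / P(D | x_i != a)], rewritten with
    Bayes' rule as posterior log-odds minus prior log-odds.
    """
    posterior_log_odds = math.log(posterior / (1.0 - posterior))
    prior_log_odds = math.log(prior / (1.0 - prior))
    return posterior_log_odds - prior_log_odds
```

A positive value means the data make state a more likely than it was a priori; a value of 0 means the data are uninformative about that state.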
Log-likelihood Ratio Test
• Aggregating over the multiple values entity i takes
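The aggregation formula is also lost in the transcript. The sketch below follows my reading of the cited paper's integrated pathway activity (IPA) and should be treated as an assumption: take the activation ratio if it is the largest of the three, the negated deactivation ratio if that is the largest, and 0 otherwise. The function name and the dictionary layout are hypothetical.

```python
def integrated_pathway_activity(llr):
    """Collapse per-state log-likelihood ratios into one signed score.

    llr: dict mapping each state (-1, 0, 1) to its log-likelihood ratio.
    Assumed rule: the winning state decides the sign; a winning normal
    state (or a tie) yields 0.
    """
    up, normal, down = llr[1], llr[0], llr[-1]
    if up > down and up > normal:
        return up          # evidence favors activation
    if down > up and down > normal:
        return -down       # evidence favors deactivation
    return 0.0             # normal state wins, or no clear winner
```

With this convention a positive score indicates activation and a negative score indicates deactivation, which makes scores comparable across entities and patients.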
Dataset
• Breast cancer copy number and gene expression data
• TCGA Glioblastoma copy number and gene expression data
• Pathways from the NCI Pathway Interaction Database (PID)
EM Convergence
• Original data vs. permuted data
Red: real data; green: permuted data
Top PARADIGM Pathways of Breast Cancer
Top PARADIGM Pathways of Glioblastoma
Glioblastoma Subtypes
Survival Rates for Each Subtype
CircleMap of ErbB2 Pathway
• ER status, IPAs, expression data, and copy-number data
Summary
• PARADIGM integrates different types of data, including gene expression, copy number variation, and pathway databases, to infer pathway activities for individual cancer patients.
– Factor graph model for representing pathways and modeling datasets
– Pathway activities inferred by PARADIGM can be used to identify cancer subtypes