identifying changes in signaling from high-throughput data

Post on 15-Jan-2016


Bioinformatics Fox Chase Cancer Center

Identifying Changes in Signaling from High-Throughput Data

Michael Ochs, Fox Chase Cancer Center


The “New” Paradigm

Personalized Medicine

Targeted Therapies

[Figure: overall survival (years, 0 to 10) of Group 1 and Group 2 patients; "Your Chromosomes Here"]


Outline

• Signaling and Gene Expression

• Bayesian Decomposition

• Examples of Analyses


Cellular Signaling

[Figure: extracellular signal, signal transduction, metabolic changes, transcription (Downward, Nature, 411, 759, 2001)]


Gene Expression


Identifying Pathways

[Figure: pathway diagram with components A, B, C, D, E (labels M, F, H)]


Goal of Analysis

Take measurements of thousands of genes, some of which are responding to stimuli of interest, then identify the pathways and find the correct set of basis vectors that link to them.

[Figure: responding genes (*) grouped under pathways 1, 2, 3]


Biological Model

Blocking a protein-protein interaction leads to loss of some transcripts and reduction of others, depending on which signaling pathways are active.

But the gene lists are incomplete, as are the network diagrams!


Issues to Solve

• Overlapping Signals
  – Genes are involved in multiple processes
  – Various processes are active simultaneously in any observed data

• Identification of Process Behind Signal
  – If a signal is found, what is its cause?
  – Do identification without a complete model


Outline

• Signaling and Gene Expression

• Bayesian Decomposition

• Examples of Analyses


Data

(Spellman et al, Mol Biol Cell, 9, 3273, 1999) (Cho et al, Mol Cell, 2, 65, 1998)

[Figure: expression data]


BD: Identification of Signals

[Figure: Data (vs. Mock; genes 1 to N × conditions 1 to M) = Distribution of Patterns (genes 1 to N × patterns 1 to k) × Patterns of Behavior (patterns 1 to k × conditions 1 to M)]

Complex behavior is explained as combinations of simpler behaviors.
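The idea that complex behavior is a combination of simpler behaviors can be illustrated with a toy forward model. This sketch assumes the linear form Data = A × P, where A (genes × patterns) is the distribution of patterns and P (patterns × conditions) holds the patterns of behavior; the names A and P and all numbers are hypothetical. Gene 3 is deliberately driven by both patterns to show an overlapping signal.

```python
import numpy as np

# Hypothetical patterns of behavior P: 2 patterns x 4 conditions.
P = np.array([
    [1.0, 2.0, 3.0, 4.0],   # pattern 1: steadily rising response
    [0.0, 3.0, 0.0, 3.0],   # pattern 2: alternating response
])

# Hypothetical distribution of patterns A: 3 genes x 2 patterns.
A = np.array([
    [1.0, 0.0],   # gene 1: pattern 1 only
    [0.0, 1.0],   # gene 2: pattern 2 only
    [0.5, 1.0],   # gene 3: overlapping, takes part in both patterns
])

# Complex behavior (the data) as a combination of simpler behaviors.
D = A @ P

# The same data written as a sum of one rank-1 term per pattern.
D_sum = sum(np.outer(A[:, j], P[j, :]) for j in range(P.shape[0]))
assert np.allclose(D, D_sum)
print(D[2])   # gene 3 mixes both patterns: 0.5 * pattern 1 + pattern 2
```

Bayesian Decomposition works in the opposite direction: starting from the data, it searches for A and P; the toy example only shows the forward model that such a search assumes.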


Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) is used to explore the possible solutions.

We cannot always solve the problem directly; we can only estimate the relative probabilities of possible solutions.
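Why relative probabilities are enough can be seen in a minimal Metropolis sampler, one of the simplest MCMC algorithms. This is a generic sketch, not the sampler used in Bayesian Decomposition: the target is a hypothetical unnormalized density (proportional to a Gaussian centered at 3), and only ratios of that density are ever evaluated, so its normalizing constant is never needed.

```python
import math
import random

def unnormalized_density(x: float) -> float:
    # Hypothetical target: proportional to a Gaussian with mean 3, sd 1.
    # The normalizing constant 1/sqrt(2*pi) is deliberately omitted.
    return math.exp(-0.5 * (x - 3.0) ** 2)

def metropolis(n_samples: int, step: float = 1.0, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    x = 0.0                                   # arbitrary starting point
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)   # symmetric random-walk proposal
        # Acceptance uses only the RATIO of densities (a relative probability).
        ratio = unnormalized_density(proposal) / unnormalized_density(x)
        if rng.random() < ratio:
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20000)
burned = samples[2000:]                       # discard burn-in
print(sum(burned) / len(burned))              # should be close to 3
```

The step size and burn-in length are illustrative choices; in practice both are tuned to the problem.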


Bayesian Statistics

$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model}) \, p(\text{model})}{p(\text{data})}$$

[Figure: the model, Distribution of Patterns (genes 1 to N × patterns 1 to k) × Patterns of Behavior (patterns 1 to k × conditions 1 to M), compared with the data (genes 1 to N × conditions 1 to M)]
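To connect Bayes' rule to the matrix picture, here is a sketch of an unnormalized log posterior for a proposed pair of matrices A (distribution of patterns) and P (patterns of behavior). It assumes independent Gaussian errors with known per-point uncertainties sigma and, purely for illustration, an exponential prior that restricts A and P to nonnegative values; the function names, the prior, and its rate are hypothetical choices, not taken from the slides. Because p(data) is the same for every model, it is dropped, which is exactly what an MCMC sampler can afford to do.

```python
import numpy as np

def log_likelihood(D, A, P, sigma):
    # Gaussian errors: log p(data | model) up to an additive constant.
    residual = (D - A @ P) / sigma
    return -0.5 * np.sum(residual ** 2)

def log_prior(A, P, rate=1.0):
    # Hypothetical prior: independent exponential(rate) on every entry,
    # which gives zero probability (log = -inf) to negative values.
    if np.any(A < 0) or np.any(P < 0):
        return -np.inf
    return -rate * (A.sum() + P.sum())

def log_posterior_unnormalized(D, A, P, sigma):
    # log p(model | data) = log p(data | model) + log p(model) - log p(data);
    # the last term does not depend on the model, so it is omitted.
    return log_likelihood(D, A, P, sigma) + log_prior(A, P)

# Hypothetical true factors and noise-free data.
A_true = np.array([[1.0, 0.0], [0.5, 1.0]])
P_true = np.array([[1.0, 2.0, 3.0], [0.0, 3.0, 0.0]])
D = A_true @ P_true
sigma = np.full(D.shape, 0.1)

good = log_posterior_unnormalized(D, A_true, P_true, sigma)
worse = log_posterior_unnormalized(D, A_true, P_true * 1.1, sigma)
print(good > worse)  # True: the exact factors explain the data better
```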


Outline

• Signaling and Gene Expression

• Bayesian Decomposition

• Examples of Analyses


Acknowledgements

• Tom Moloshok (Cell Cycle, Mouse)

• Ghislain Bidaut (Yeast Deletion Mutants)

• Andrew Kossenkov (TFs, YDMs)

• Bill Speier, DJ Datta, Daniel Chung, Ryan Goldstein, Matt Lewandowski


Cell Cycle

Tobin and Morel, Asking About Cells, Harcourt Brace, 1997


Data

• Data: Expression data for 788 yeast cell-cycle regulated genes [Cho, 1998] across 17 time points were analyzed.

• Coregulation: 11 groups of 5 to 17 genes each (67 genes in total; 18 of the 67 belong to more than one group) were compiled from a literature review (not the cell-cycle literature).

• Analysis: with and without coregulation information


Validation

Cherepinsky et al, PNAS, 100, 9668, 2003


ROC Analysis

$$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{1}{1 + FN/TP} \qquad \text{Specificity} = \frac{TN}{TN + FP} = \frac{1}{1 + FP/TN}$$

TP: true positive; TN: true negative; FP: false positive; FN: false negative.

Sensitivity is the fraction of actual positives that are correctly called positive; specificity is the fraction of actual negatives that are correctly called negative.

[Figure: ROC curves plotting sensitivity against 1 - specificity, both from 0 to 1, with example curves labeled "almost perfect", "good", and "worthless"]

ROC: Receiver Operating Characteristic. The area under the curve is the measure of algorithm efficacy.
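The sensitivity and specificity definitions translate directly into code. This sketch sweeps a threshold over hypothetical classifier scores, records one (1 - specificity, sensitivity) point per distinct score, and integrates the curve with the trapezoid rule to obtain the area under the curve. The scores and labels are invented for illustration.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]                       # threshold above every score
    for t in sorted(set(scores), reverse=True):
        # Everything scoring >= t is called positive.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = pos - tp
        tn = neg - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        points.append((1.0 - specificity, sensitivity))
    return points

def auc(points):
    # Trapezoid rule over the curve, x = 1 - specificity, y = sensitivity.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scores; label 1 = true positive case, 0 = true negative case.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts))   # about 0.889 (= 8/9) for this toy example
```

A scorer that ranks cases at random (the "worthless" curve) gives an area near 0.5, and a perfect one gives 1.0.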


Hierarchical Clustering

[Figure: ROC curve] (Cherepinsky et al, PNAS, 100, 9668, 2003)


Bayesian Decomposition

[Figure: ROC curve, sensitivity vs. 1 - specificity]


Deletion Mutant Data Set

• 300 Deletion Mutants in S. cerevisiae
  – Biological/Technical Replicates with Gene-Specific Error Model
  – Filter Genes
    • >25% Data Missing in Ratios or Uncertainties
    • < 2 Experiments with 3-Fold Change
  – Filter Experiments
    • < 2 Genes Changing by 3-Fold

228 Experiments/764 Genes

(Hughes et al, Cell, 102, 109, 2000)
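The filtering rules on this slide can be expressed as boolean masks over a genes × experiments ratio matrix. This is a minimal sketch under stated assumptions: ratios are linear (not log) expression ratios with missing values stored as NaN, a 3-fold change means a ratio of at least 3 or at most 1/3, only the ratio matrix (not the uncertainties) is checked for missing data, and genes are filtered before experiments. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def filter_deletion_data(ratios, max_missing=0.25, fold=3.0, min_count=2):
    """Return (kept_ratios, gene_mask, experiment_mask).

    ratios: genes x experiments array of linear expression ratios, NaN = missing.
    """
    missing = np.isnan(ratios)
    with np.errstate(invalid="ignore"):
        # Change of at least `fold` in either direction; NaN compares as False.
        changed = (ratios >= fold) | (ratios <= 1.0 / fold)

    # Filter genes: drop if >25% missing or < 2 experiments with a 3-fold change.
    gene_mask = (missing.mean(axis=1) <= max_missing) & (changed.sum(axis=1) >= min_count)

    # Filter experiments (on surviving genes): drop if < 2 genes change 3-fold.
    experiment_mask = changed[gene_mask].sum(axis=0) >= min_count

    kept = ratios[gene_mask][:, experiment_mask]
    return kept, gene_mask, experiment_mask

# Hypothetical 4 genes x 4 experiments.
ratios = np.array([
    [4.0, 0.2, 1.0, 1.1],            # two 3-fold changes: kept
    [4.0, 1.0, 1.0, 1.0],            # only one 3-fold change: dropped
    [5.0, np.nan, np.nan, 0.1],      # 50% missing: dropped
    [3.5, 0.3, 1.0, 1.0],            # two 3-fold changes: kept
])
kept, gene_mask, exp_mask = filter_deletion_data(ratios)
print(gene_mask, exp_mask)
```

The exact thresholds, the handling of uncertainties, and the order of the two filters used to reach 228 experiments and 764 genes may differ from this sketch.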


BD: Matrix Decomposition

[Figure: Data (genes 1 to N × mutants 1 to M) = Distribution of Patterns (genes 1 to N × patterns 1 to k: what genes are in patterns) × Patterns of Behavior (patterns 1 to k × mutants 1 to M: does mutant contain pattern)]
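Reading the two factor matrices answers the two questions on this slide: which genes are in each pattern (columns of the distribution matrix) and which mutants contain each pattern (rows of the pattern matrix). The sketch uses hypothetical small matrices named A and P, hypothetical gene and mutant labels, and an arbitrary cutoff; a real analysis would likely take the uncertainty of each estimate into account when choosing a cutoff.

```python
import numpy as np

def pattern_members(A, P, genes, mutants, cutoff):
    """Map each pattern (1-based) to (genes in it, mutants containing it)."""
    result = {}
    for k in range(A.shape[1]):
        in_pattern = [g for g, a in zip(genes, A[:, k]) if a > cutoff]
        containing = [m for m, p in zip(mutants, P[k, :]) if p > cutoff]
        result[k + 1] = (in_pattern, containing)
    return result

# Hypothetical labels and factor matrices.
genes = ["gene_1", "gene_2", "gene_3"]
mutants = ["mutant_a", "mutant_b", "mutant_c"]
A = np.array([[2.0, 0.0],          # genes x patterns (distribution of patterns)
              [1.5, 0.1],
              [0.0, 1.2]])
P = np.array([[0.9, 0.0, 0.4],     # patterns x mutants (patterns of behavior)
              [0.0, 1.0, 0.8]])

members = pattern_members(A, P, genes, mutants, cutoff=0.5)
for k, (g, m) in members.items():
    print(f"pattern {k}: genes {g}, mutants {m}")
```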


Analysis

• Bayesian Decomposition
  – Identify patterns and linked genes
  – Use genes to determine function

• Interpretation of Functions
  – Gene Ontology
  – Transcription factor data

• Validation
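One common way to use a pattern's genes to determine function with the Gene Ontology is an over-representation test: how surprising is it that so many of the pattern's genes carry a given GO term? The slides do not state which test was used; the hypergeometric upper-tail probability below is a standard choice shown only as an illustration, and all counts are hypothetical.

```python
from math import comb

def go_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the analysis        K: of those, genes annotated with the GO term
    n: genes linked to the pattern  k: of those, genes annotated with the term
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 764 genes analyzed, 20 carry the term,
# 30 genes are linked to a pattern and 8 of them carry the term.
print(go_enrichment_p(N=764, K=20, n=30, k=8))
```

A small value suggests the term is over-represented among the pattern's genes; when many GO terms are tested, a multiple-testing correction would normally be applied.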


Use of Ontology: Pattern 13

[Figure: patterns 13 and 15]


The Other Pattern: 15

[Figure: patterns 13 and 15]


Transcription Factors

Signaling Pathways to Transcription Factors to mRNA Changes


Genes from Pattern 13: *Fig1, *Prm6, *Fus1, *Ste2, *Aga1, *Fus3, Pes4, *Prm1, ORF, *Bar1

* known to be involved in mating response

known to be regulated by Ste12p


Validation

(Posas et al, Curr Opin Microbiology, 1, 175, 1998)

Amount of Behavior Explained by Mating Pathway for Mutants
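The transcript does not define how the amount of behavior explained by a pathway is measured. One hypothetical way to quantify it, assuming nonnegative factor matrices A (genes × patterns) and P (patterns × mutants), is the share of each mutant's reconstructed signal contributed by a chosen pattern k: the sum over genes of A[g, k]·P[k, m] divided by the sum over genes of (A·P)[g, m]. The matrices and the pattern index here are invented.

```python
import numpy as np

def fraction_explained(A, P, k):
    """Hypothetical measure: share of each mutant's reconstructed signal due to pattern k.

    A: genes x patterns, P: patterns x mutants, both assumed nonnegative.
    A mutant whose total reconstructed signal is zero would divide by zero.
    """
    total = (A @ P).sum(axis=0)          # per-mutant total reconstructed signal
    from_k = A[:, k].sum() * P[k, :]     # sum over genes of A[g, k] * P[k, m]
    return from_k / total

A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(fraction_explained(A, P, k=0))     # 1.0, 0.0 and 0.5 for the three mutants
```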


Pattern 13 Mutants


Pattern 15 Mutants


Conclusions

• Transcriptional Response Provides Signatures of Pathway Activity

• Ontologies Can Guide Interpretation

• Bayesian Decomposition Can Dissect Strongly Overlapping Signatures


Acknowledgements

Tom Moloshok, Jeffrey Grant, Yue Zhang, Elizabeth Goralczyk, Liat Shimoni, Luke Somers (UPenn), Olga Tchuvatkina, Michael Slifker, Sinoula Apostolou, Brendan Reilly

Collaborators

A. Godwin (FCCC), A. Favorov (GosNIIGenetika), J.-M. Claverie (CNRS), G. Parmigiani (JHU), O. Favorova (RMSU)

Ghislain Bidaut (UPenn CBIL), Andrew Kossenkov, Vladimir Minayev (MPEI), Garo Toby (Dana Farber), Yan Zhou, Aidan Petersen

Bill Speier (Johns Hopkins), Daniel Chung (Columbia), DJ Datta (UCSF), Elizabeth Faulkner (UPenn)

Frank Manion, Bob Beck

Fox Chase


Patterns as Basis Vectors

[Figure panels: PCA, BD, Fuzzy Clustering]
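A key difference between these kinds of basis vectors is that PCA's components are forced to be mutually orthogonal, whereas overlapping biological patterns need not be. The toy sketch below builds data from two hypothetical, non-orthogonal nonnegative patterns and shows that the principal axes from an SVD are orthonormal, so they cannot both line up with the original patterns. It illustrates only this constraint; it does not implement BD or fuzzy clustering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical overlapping patterns over 5 conditions (they share condition 3).
patterns = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0, 1.0]])

# 200 hypothetical genes, each a nonnegative mix of the two patterns.
weights = rng.uniform(0.0, 1.0, size=(200, 2))
data = weights @ patterns

# PCA via SVD of the mean-centered data; rows of Vt are the principal axes.
centered = data - data.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = Vt[:2]

print(np.round(pcs @ pcs.T, 6))      # approximately the 2x2 identity: orthonormal axes
print(patterns[0] @ patterns[1])     # nonzero: the true patterns overlap
```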


Making Proteins (Phenotype)


ROSETTA DATA

• From 5 to 20 patterns were posited in the analysis.

• Results were checked against metabolic pathway information from the Saccharomyces Genome Database: 11 groups of 4 to 6 genes known to be involved in the same metabolic pathways.

• ROC analysis was performed.
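The slides do not spell out how known gene groups are turned into an ROC analysis. One plausible scheme, shown only as a hypothetical sketch, treats every pair of genes as a test case: pairs in the same known group are positives, all other pairs are negatives, and each pair is scored by the cosine similarity of the two genes' pattern-loading rows. The area under the curve is then computed with the rank-based (Mann-Whitney) formula, which equals the ROC area. The loadings and groups are invented.

```python
from itertools import combinations

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_auc(loadings, groups):
    """AUC for separating same-group gene pairs from all other pairs.

    loadings: genes x patterns array; groups: list of sets of gene indices.
    """
    pos, neg = [], []
    for i, j in combinations(range(loadings.shape[0]), 2):
        score = cosine(loadings[i], loadings[j])
        same_group = any(i in g and j in g for g in groups)
        (pos if same_group else neg).append(score)
    # Mann-Whitney form: P(positive score > negative score), ties count 1/2.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical loadings of 4 genes on 2 patterns, and 2 known groups.
loadings = np.array([[1.0, 0.1],
                     [0.9, 0.2],
                     [0.1, 1.0],
                     [0.2, 0.8]])
groups = [{0, 1}, {2, 3}]
print(pair_auc(loadings, groups))   # 1.0: within-group pairs score highest
```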


ROSETTA DATA

[Figure: area under ROC (0.64 to 0.76) versus number of patterns (9 to 20), WITH and WITHOUT coregulation info]
