
Page 1: Spectral Algorithms for Biological Networks · 2014. 6. 13. · Nataša Pržulj, U. C. Irvine Modelling protein-protein interaction networks via a stickiness index, J. Royal Society

Spectral Algorithms for Biological Networks

Des Higham

Department of Mathematics

University of Strathclyde


graphnet07 – p.1/42


Collaborators

Julie Morrison, formerly U. Strathclyde

David Gilbert, U. Glasgow

Rainer Breitling, U. Groningen

A lock-and-key model for protein-protein interactions, Bioinformatics, 2006

Nataša Pržulj, U. C. Irvine

Modelling protein-protein interaction networks via a stickiness index, J. Royal Society Interface, 2006

Marija Rašajski, U. C. Irvine & U. Belgrade


EPSRC-funded project

Theory and Tools for Complex Biological Networks (2007–2010)

Peter Grindrod, University of Reading

Gabriela Kalna, University of Strathclyde

Alastair Spence, University of Bath

Zhivko Stoyanov, University of Bath

Keith Vass, Beatson Inst. for Cancer Research


Overview

Protein-protein interaction (PPI) networks

Random graph models

Geometric model

Algorithm for testing geometric model

Lock-and-key model

Algorithm for discovering locks and keys

Results on biological data


Central Dogma of Molecular Biology


Yeast 2-Hybrid Protein-Protein Interaction Networks

Data:

list of N proteins (nodes)

list of protein pairs (edges)

This is an undirected, unweighted graph

Also, a symmetric N × N matrix of 0’s and 1’s

Yeast has N ≈ 3,000
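A minimal Python sketch of this data structure, assuming the node list holds protein names and the edge list holds name pairs. The function name and protein labels are hypothetical, and dropping self-interactions is an assumption of the sketch rather than something the slides specify.

```python
def adjacency_matrix(proteins, interactions):
    """Symmetric N x N 0/1 matrix of an undirected, unweighted PPI graph."""
    index = {name: k for k, name in enumerate(proteins)}
    n = len(proteins)
    w = [[0] * n for _ in range(n)]
    for a, b in interactions:
        i, j = index[a], index[b]
        if i == j:
            continue  # assumption: self-interactions are ignored
        w[i][j] = 1
        w[j][i] = 1  # undirected graph, so the matrix is symmetric
    return w


# Hypothetical four-protein example
proteins = ["P1", "P2", "P3", "P4"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4")]
W = adjacency_matrix(proteins, interactions)
for row in W:
    print(row)
```

For N ≈ 3,000 a dense list of lists has about nine million entries, which is still manageable; a sparse representation becomes preferable for larger networks.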


Uetz et al. 2000, Yeast PPI

[Figure: drawing of the network (yeast.gif, http://www-personal.umich.edu/~mejn/networks/yeast.gif)]

Specificity and stability in topology of protein networks, S. Maslov & K. Sneppen, Science, 2002


Adjacency Matrix: Uetz et al. 2000, Yeast PPI

[Figure: spy plot of the adjacency matrix, rows and columns 0 to 1000]


Adjacency Matrix: Ito et al. 2001, Yeast PPI

[Figure: spy plot of the adjacency matrix, rows and columns 0 to 3000]


Y2H Protein-Protein Interaction Networks

Noisy: 50–90% false positive, 50–90% false negative

Two types of false positive

Technical: experimental limitations

Biological: don’t occur in vivo

not expressed at same time

not in same sub-cellular compartment, or same tissue

Interactions may also depend on the environment

How can we use this data . . . . . . ?


Fine details. . .

Typical questions:

Are there any other proteins like protein Y?

What is the biological function of protein X?

Which proteins act together?

What happens if protein Z is removed?

Also: which are the false pos/negs?


Big picture. . .

PPI networks are not regular

Describe them by a random graph model?

capture many PPI networks with a small number of parameters:

distinguish between different organisms

get evolutionary insights

generate synthetic data sets to test algorithms

Several random graph “models” have been proposed . . . . . .


Comparing Networks

Global Measures

Degree distribution

Pathlength distribution

Clustering coefficients

Local Measure

graphlet frequencies . . .


Graphlet Frequencies (Pržulj et al.)

[Figure: the 30 graphlets, numbered 0 to 29: 3-node graphlets (0, 1), 4-node graphlets (2 to 8) and 5-node graphlets (9 to 29)]

Frequency of graphlet i (0 ≤ i ≤ 29):

$$\text{frequency}_i = \frac{\text{number of graphlets of type } i}{\text{total number of graphlets}}$$
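To make the ratio concrete, here is a brute-force Python sketch restricted to the two 3-node graphlets, assuming graphlet 0 is the path on three nodes and graphlet 1 the triangle. Normalising over these two types only is a simplification of "total number of graphlets", the function name is hypothetical, and the O(N³) loop is meant for small illustrative graphs only.

```python
from itertools import combinations


def three_node_graphlet_frequencies(w):
    """Frequencies of graphlet 0 (path on 3 nodes) and graphlet 1 (triangle),
    normalised by the number of connected 3-node induced subgraphs only."""
    counts = [0, 0]
    for i, j, k in combinations(range(len(w)), 3):
        edges = w[i][j] + w[j][k] + w[i][k]
        if edges == 2:
            counts[0] += 1  # connected but not closed: induced path
        elif edges == 3:
            counts[1] += 1  # triangle
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0, 0.0]


# Triangle on nodes 0, 1, 2 plus a pendant edge 2-3
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(three_node_graphlet_frequencies(W))  # [0.6666666666666666, 0.3333333333333333]
```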


Geometric Model (Pržulj et al., 2004)

[Figure: nodes scattered in the unit square, axes 0 to 1]


Geometric Random Graph

randomly place N nodes in unit square

connect nodes within distance ε
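The two-step recipe above can be sketched in Python as follows, assuming "within distance ε" means Euclidean distance at most ε. The function name and seed are illustrative; the parameters match the N = 100, ε = 0.25 example that appears later in these slides.

```python
import math
import random


def geometric_random_graph(n, eps, seed=None):
    """Place n nodes uniformly at random in the unit square and connect
    every pair at Euclidean distance at most eps."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= eps:
                w[i][j] = w[j][i] = 1
    return pts, w


pts, W = geometric_random_graph(100, 0.25, seed=1)
print(sum(map(sum, W)) // 2, "edges")
```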

Able to match PPI properties (pathlengths, clustering coefficients, degree distributions, graphlet frequencies)

Modeling Interactome: Scale-Free or Geometric?, N. Pržulj, D. Corneil and I. Jurisica, Bioinformatics, 2004

Analyzing Large Biological Networks . . . , N. Pržulj, Ph.D. Thesis, University of Toronto, 2005

Question: Given a PPI network, can we map it onto a geometric random graph?

⇒ develop a tool for reverse engineering a GRG

Given nodes and edges, optimally place the nodes in $\mathbb{R}^2$ such that nodes within a distance ε are connected

Multi-Dimensional Scaling (MDS)

Problem:

Given all pairwise distances $\{d_{ij}\}_{i,j=1}^{N}$, find vectors $\{x^{[i]}\}_{i=1}^{N} \in \mathbb{R}^m$ such that

$$\|x^{[i]} - x^{[j]}\| = d_{ij}, \quad \forall i, j,$$

i.e. go from pairwise distance to location

Notation

$$X = \begin{bmatrix} x^{[1]} & x^{[2]} & \cdots & x^{[N]} \end{bmatrix} \in \mathbb{R}^{m \times N}$$


MDS Theory

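Define $A \in \mathbb{R}^{N \times N}$ (symmetric, and positive semidefinite when the $d_{ij}$ are Euclidean distances) by

A(i,j) = -0.5*( Dsq(i,j) - mean(Dsq(i,:)) - mean(Dsq(:,j)) + mean(mean(Dsq)) );

where Dsq(i,j) = $d_{ij}^2$.

Then $X^T X = A \;\Rightarrow\; \|x^{[i]} - x^{[j]}\| = d_{ij}$

Symm. Real Schur Decomp. $A = U^T \Sigma U$ ⇒ use ($u^{[k]}$ is the k-th row of U)

$$X = \Sigma^{1/2} U = \begin{bmatrix} \sqrt{\sigma_1}\, u^{[1]} \\ \sqrt{\sigma_2}\, u^{[2]} \\ \vdots \\ \sqrt{\sigma_N}\, u^{[N]} \end{bmatrix} \in \mathbb{R}^{N \times N}$$

To embed into, say, $\mathbb{R}^2$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_N$, the "best" approximation is

$$X = \begin{bmatrix} \sqrt{\sigma_1}\, u^{[1]} \\ \sqrt{\sigma_2}\, u^{[2]} \end{bmatrix} \in \mathbb{R}^{2 \times N}$$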
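A dependency-free Python sketch of classical MDS as described above, assuming Dsq holds squared distances. Shifted power iteration with deflation stands in for the symmetric Schur decomposition purely to keep the example self-contained; in practice a library eigensolver (MATLAB's `eig`, or `numpy.linalg.eigh`) would be the natural choice. Function names are illustrative, and negative eigenvalues, which can appear when the "distances" are not Euclidean, are clipped to zero.

```python
import math
import random


def double_center(dsq):
    """A(i,j) = -0.5*(Dsq(i,j) - mean of row i - mean of column j + grand mean)."""
    n = len(dsq)
    row = [sum(r) / n for r in dsq]
    col = [sum(dsq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[-0.5 * (dsq[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]


def leading_eigenpairs(a, k, iters=2000, seed=0):
    """k algebraically largest eigenpairs of a symmetric matrix by shifted
    power iteration with deflation (adequate for small examples only)."""
    n = len(a)
    shift = max(sum(abs(x) for x in r) for r in a)  # bounds every |eigenvalue|
    b = [[a[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mu = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((mu - shift, v))
        # Deflate so the next sweep finds the next-largest eigenvalue.
        b = [[b[i][j] - mu * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def classical_mds(dsq, dim=2):
    """Point i gets coordinates sqrt(sigma_k) * u[k][i] for the dim largest sigma_k."""
    pairs = leading_eigenpairs(double_center(dsq), dim)
    rows = [[math.sqrt(max(s, 0.0)) * c for c in u] for s, u in pairs]
    return [tuple(r[i] for r in rows) for i in range(len(dsq))]


pts = [(0, 0), (3, 0), (0, 1), (3, 1)]
dsq = [[(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in pts] for p in pts]
X = classical_mds(dsq)
print([round(s, 6) for s, _ in leading_eigenpairs(double_center(dsq), 2)])  # [9.0, 1.0]
```

The recovered coordinates may be rotated or reflected relative to the originals, since only pairwise distances are determined.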


MDS to reverse engineer a GRG?

PPI data is “0 or 1”, we don’t have Euclidean distances

Idea: use pathlength, $d_{ij}^2$ = pathlength from node i to node j

Compute pathlengths 1, 2, . . . , K, set the rest to $K_{\max}$, so

distance matrix is sparse plus rank 1

∞’s avoided

Now apply MDS to recover locations in $\mathbb{R}^2$
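A breadth-first-search sketch of this truncation in Python, assuming an unweighted 0/1 adjacency matrix; the function name is hypothetical, and the returned matrix holds the values used as $d_{ij}^2$. Pairs more than K hops apart, including disconnected pairs, receive $K_{\max}$.

```python
from collections import deque


def truncated_pathlengths(w, k, k_max):
    """Dsq(i,j) = pathlength from i to j if it is at most k, otherwise k_max.
    Unreachable pairs also get k_max, so no infinite entries appear."""
    n = len(w)
    nbrs = [[j for j in range(n) if w[i][j]] for i in range(n)]
    dsq = [[k_max] * n for _ in range(n)]
    for s in range(n):
        dsq[s][s] = 0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == k:
                continue  # do not explore beyond k hops
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    dsq[s][v] = dist[v]
                    queue.append(v)
    return dsq


# Path graph 0-1-2-3-4
W = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
D = truncated_pathlengths(W, k=2, k_max=3)
print(D[0])  # [0, 1, 2, 3, 3]
```

Each search stops expanding at depth K, so only the sparse part has to be computed; every other entry is the constant $K_{\max}$, which gives the "sparse plus rank 1" structure.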


N = 100, ε = 0.25 ($K = 4$, $K_{\max} = 5$)

Eigs of A: 38.2, 30.1, 10.7, 8.9, 6.1, 3.8, . . .

[Figure panels: Original Nodes (unit square); Adjacency Matrix (100 × 100); Relocated Nodes (axes 0 to 1); $\|W - W_\varepsilon\|_2 / \|W\|_2$ plotted against ε from 0 to 0.5]


Same Example, “optimal” ε

[Figure: 100 × 100 matrix plot, axes 0 to 100]

Correct Neg: 4091, Correct Pos: 610, False Neg: 102, False Pos: 147
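As a check, assuming each unordered protein pair is scored once: 4091 + 610 + 102 + 147 = 4950 = 100 × 99 / 2, so every pair of the N = 100 nodes is accounted for. A Python sketch of the tally (function name hypothetical), followed by the sensitivity and specificity these slide counts give:

```python
def confusion_counts(w_true, w_pred):
    """Count TP, TN, FP, FN over all unordered pairs i < j of two symmetric
    0/1 adjacency matrices (true network vs. predicted network)."""
    n = len(w_true)
    tp = tn = fp = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            actual, predicted = w_true[i][j], w_pred[i][j]
            if actual and predicted:
                tp += 1
            elif not actual and not predicted:
                tn += 1
            elif predicted:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


w_true = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
w_pred = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(confusion_counts(w_true, w_pred))  # (1, 2, 1, 2)

# Slide counts: sensitivity TP/(TP+FN) and specificity TN/(TN+FP)
tp, tn, fp, fn = 610, 4091, 147, 102
print(round(tp / (tp + fn), 3), round(tn / (tn + fp), 3))  # 0.857 0.965
```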


Same Example, ROC curve

Area under curve is 0.965

[Figure: ROC curve; y-axis Sensitivity: TP/(TP + FN); x-axis 1 − Specificity: 1 − [TN/(TN + FP)]; both axes 0 to 1]
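One way to compute such an area, assuming (as the "optimal ε" slide suggests, though it is not stated here) that the curve is traced by predicting an edge whenever the relocated distance is at most ε and sweeping ε. The area then equals the probability that a randomly chosen true edge is shorter than a randomly chosen non-edge, with ties counted as one half. A quadratic-time Python sketch with a hypothetical function name:

```python
import math
from itertools import combinations


def roc_auc_from_distances(points, w_true):
    """Area under the ROC curve of the rule 'edge iff distance <= eps' as eps
    sweeps all values: P(true edge shorter than non-edge), ties count 1/2."""
    pos, neg = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        (pos if w_true[i][j] else neg).append(d)
    if not pos or not neg:
        raise ValueError("need at least one edge and one non-edge")
    wins = 0.0
    for dp in pos:
        for dn in neg:
            if dp < dn:
                wins += 1.0
            elif dp == dn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
print(roc_auc_from_distances(pts, w))  # 1.0
```

An area of 0.5 corresponds to guessing, which is the benchmark the coin-flip slide illustrates.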


GRG data, coin flip to predict links

Area under curve is 0.48

[Figure: ROC curve; y-axis Sensitivity: TP/(TP + FN); x-axis 1 − Specificity: 1 − [TN/(TN + FP)]; both axes 0 to 1]


Erdős–Rényi Random Graph with MDS algorithm

Area under curve is 0.67

[Figure: ROC curve; y-axis Sensitivity: TP/(TP + FN); x-axis 1 − Specificity: 1 − [TN/(TN + FP)]; both axes 0 to 1]


Nineteen PPI networks

[Figure: Values of the areas under the ROC curve for PPI networks, y-axis 0.5 to 1, legend dim=2, dim=3, dim=4; networks YU, YICU, YIC, YIP, YHC, Y11K, YK, YD, YM, FH, WS, WC, WE, HV, HSH, HSM, HSL, HM, HH]

High Confidence von Mering et al. gave 0.89

Lock-and-Key Model

On the structure of protein-protein interaction networks, A. Thomas, R. Cannings, N. Monk, C. Cannings, Biochemical Society Transactions 31, 2003

Idea: two proteins interact because they ‘fit together’ ⇒ complementary domains, i.e. locks and keys


Lock-and-Key Model

Thomas et al. model: m locks and m matching keys

let each protein have each lock and key with independent probability p

put an edge between two proteins ⇔ they share at least one lock/key pair
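A Python sketch of this generative model as stated (function name and seed hypothetical). Reading "share a lock/key pair" as "one protein has lock k and the other has key k, in either direction", and excluding self-loops, are assumptions of the sketch.

```python
import random


def lock_and_key_graph(n, m, p, seed=None):
    """Thomas et al. style model: each of n proteins has each of m locks and
    each of m matching keys independently with probability p; proteins i and
    j are joined if one has lock k and the other has key k for some k."""
    rng = random.Random(seed)
    locks = [{k for k in range(m) if rng.random() < p} for _ in range(n)]
    keys = [{k for k in range(m) if rng.random() < p} for _ in range(n)]
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if locks[i] & keys[j] or keys[i] & locks[j]:
                w[i][j] = w[j][i] = 1
    return locks, keys, w


locks, keys, W = lock_and_key_graph(n=50, m=10, p=0.1, seed=0)
print(sum(map(sum, W)) // 2, "edges")
```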

Thomas et al. looked at big picture issue: Does this model reproduce the almost scale-free nature of PPI networks?

Our approach:

introduce different modelling assumptions

develop an algorithm for inferring locks and keys

answer both big picture and fine detail questions


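Our Assumptions

There exists a lock/key pair in the network such that any protein with this lock/key

does not have the matching key/lock

will only interact with a protein having the matching key/lock

only has a fixed proportion $0 \le \theta \le 1$ of its lock/key matches recorded as interactions

⇒ the adjacency matrix has a pair of eigenvalues

$$\lambda = \pm\,\theta \sqrt{\text{locksum} \times \text{keysum}}$$

with eigenvectors

$$\sqrt{\text{keysum}}\; \text{ind}^{[\text{lock}]} \pm \sqrt{\text{locksum}}\; \text{ind}^{[\text{key}]}$$

where locksum and keysum are the numbers of proteins with the lock and with the key, and $\text{ind}^{[\text{lock}]}$, $\text{ind}^{[\text{key}]}$ are the corresponding 0/1 indicator vectors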
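The eigenpair can be checked numerically. This Python sketch assumes an idealised matrix in which every lock/key match carries weight θ (a 0/1 matrix with a proportion θ of matches recorded only agrees with this on average), takes locksum and keysum to be the numbers of lock and key proteins, and uses hypothetical helper names.

```python
import math


def lock_key_block(locksum, keysum, theta):
    """Idealised matrix: weight theta between each lock protein (listed first)
    and each key protein (listed next), zero everywhere else."""
    n = locksum + keysum
    return [[theta if (i < locksum) != (j < locksum) else 0.0 for j in range(n)]
            for i in range(n)]


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


locksum, keysum, theta = 2, 3, 0.5
W = lock_key_block(locksum, keysum, theta)
ind_lock = [1.0] * locksum + [0.0] * keysum
ind_key = [0.0] * locksum + [1.0] * keysum

residuals = []
for sign in (+1, -1):
    lam = sign * theta * math.sqrt(locksum * keysum)
    v = [math.sqrt(keysum) * a + sign * math.sqrt(locksum) * b
         for a, b in zip(ind_lock, ind_key)]
    residuals.append(max(abs(x - lam * y) for x, y in zip(matvec(W, v), v)))
    print(f"lambda = {lam:+.4f}, max |Wv - lambda v| = {residuals[-1]:.1e}")
```

Both residuals sit at rounding-error level, confirming $W v = \lambda v$ for each sign.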


Algorithm

Calculate eigenvals/vecs

Group into ≈ ±λ pairs

For each pair with eigvecs $u_a$ and $u_b$

choose a threshold, K

$|u_a + u_b|_i \ge K$ means protein i has lock

$|u_a - u_b|_i \ge K$ means protein i has key

Successful at recovering locks and keys in synthetically generated networks (good sensitivity and specificity)
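The thresholding step is short enough to sketch directly. The Python below (hypothetical names) feeds it the exact unit eigenvectors of an ideal lock/key block rather than computed ones. Note that an eigensolver may return either eigenvector with the opposite sign, in which case sum and difference swap roles.

```python
import math


def locks_and_keys(ua, ub, threshold):
    """|ua + ub|_i >= threshold flags a lock, |ua - ub|_i >= threshold a key."""
    locks = [i for i, (a, b) in enumerate(zip(ua, ub)) if abs(a + b) >= threshold]
    keys = [i for i, (a, b) in enumerate(zip(ua, ub)) if abs(a - b) >= threshold]
    return locks, keys


# Unit eigenvectors of an ideal block: proteins 0-1 hold the lock,
# proteins 2-4 the key, protein 5 is not part of the pair.
n_lock, n_key = 2, 3
scale = math.sqrt(2 * n_lock * n_key)
ua = [math.sqrt(n_key) / scale] * n_lock + [math.sqrt(n_lock) / scale] * n_key + [0.0]
ub = [math.sqrt(n_key) / scale] * n_lock + [-math.sqrt(n_lock) / scale] * n_key + [0.0]
print(locks_and_keys(ua, ub, threshold=0.1))  # ([0, 1], [2, 3, 4])
```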


Spectral Properties: Uetz (2000) data

>> [U,D] = eigs(W,8,'BE');
>> diag(D)
ans =
   -6.4614
   -5.1460
   -4.1557
   -4.1270
    4.3778
    5.4309
    5.8397
    7.4096

[Figure: Sum and Difference, x-axis 0 to 1000, y-axis −0.6 to 0.8]


Result for Uetz et al. (2000) yeast data

[Figure: result network, proteins labelled YFR024C, SLA1, YSC84, BZZ1, KTR3, YPL246C, YPR171W, YKR030W, SNA3, YOR284W, YGR268C, APP1, SYS1, BOS1, VPS73, YMR253C, YLR064W, YJR083C, YMR192W, ACF2, MMS4, RVS167]


Further Investigation . . .

Other biological data shows that all five proteins in one group possess the SH3 domain

⇒ we have identified the key!

Recent experiments (Kessels & Qualmann 2004, Friesen et al. 2005) show that the SH3 domain is involved in trafficking of vesicles

All proteins in the other group are part of the actin cortical patch assembly mechanism of vesicle endocytosis (Drees et al. 2001)

[vesicle: small, enclosed compartment within a cell]

Arabidopsis thaliana (small flowering plant)

[Figure: axes 0 to 10]

Homeobox Transcription Factor module?


Saccharomyces cerevisiae (yeast)

[Figure: axes 0 to 20]

Protein Trafficking module?


Homo sapiens (us!)

[Figure: axes 0 to 11]

Smad Transcription Factor module?


Drosophila melanogaster (fruit fly)

[Figure: axes 0 to 50]

Cell Cycle Transcriptional Regulation module?

. . . plus many more . . .

Recap

Lock-and-key model: Extension of Thomas et al. (2003) model to

make testable predictions about PPI network structure

extract important structural information from (noisy) PPI data sets

Note: different to traditional clustering

Essentially clustering on paths of length two
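One way to see the "paths of length two" remark, offered as an illustrative reading rather than the authors' derivation: in a lock/key block no two lock proteins are directly linked, yet $(W^2)_{ij}$, which counts paths of length two, groups lock proteins with each other and key proteins with each other. A small Python sketch with hypothetical names:

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Proteins 0-1 carry a lock, proteins 2-4 the matching key.
n_lock, n_key = 2, 3
n = n_lock + n_key
W = [[1 if (i < n_lock) != (j < n_lock) else 0 for j in range(n)] for i in range(n)]
W2 = matmul(W, W)  # (W^2)[i][j] = number of paths of length two from i to j
for row in W2:
    print(row)
```

Since the eigenvalues of $W^2$ are the squares of those of W, a ±λ pair of W becomes a repeated eigenvalue $\lambda^2$ of $W^2$.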

Match local/global PPI properties? . . .


Stickiness Model

Back to the big picture

Can we produce a model that matches PPI network properties?

Inferring the number and distribution of locks and keys in a real (noisy) network: very challenging

Idea: summarize abundance/popularity of binding domains as a single number per protein: stickiness index

[Analogous idea of fitness in physics community]


Modelling Assumptions

Assumption 1

High degree implies many and/or popular binding domains: high stickiness

So high degree ⇒ high stickiness

Assumption 2

A pair of proteins is more likely to interact (share complementary binding domains) if they both have high stickiness index

Take the product of stickiness indices

Hence, we suppose $P(i \leftrightarrow j) = f(\deg_i)\, f(\deg_j)$

Match expected degree ⇒ $f(\deg_i) = \dfrac{\deg_i}{\sqrt{\sum_{k=1}^{N} \deg_k}}$
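The "match expected degree" step can be spelled out, assuming every product $f(\deg_i)\, f(\deg_j)$ is at most 1 (so it is a valid probability) and letting the sum run over all $j$, including $j = i$:

$$
\mathbb{E}[\deg_i] = \sum_{j=1}^{N} P(i \leftrightarrow j)
= \sum_{j=1}^{N} \frac{\deg_i \, \deg_j}{\sum_{k=1}^{N} \deg_k}
= \deg_i \, \frac{\sum_{j=1}^{N} \deg_j}{\sum_{k=1}^{N} \deg_k}
= \deg_i .
$$

Excluding the self-pair $j = i$ lowers this by $\deg_i^2 / \sum_k \deg_k$, which is small unless protein i is a large hub.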


PseudoCode

input {deg_i}, i = 1, ..., N

output {w_ij}, i, j = 1, ..., N

for i = 1 to N
    θ_i = deg_i / √(∑_{j=1}^{N} deg_j)
end
Initialize all w_ij = 0
for i = 1 to N − 1
    for j = i + 1 to N
        compute a uniform (0, 1) sample, r
        if r ≤ θ_i θ_j
            w_ij = 1 and w_ji = 1
        end if
    end for
end for
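A runnable Python version of the pseudocode, offered as a sketch (function name and seed hypothetical). It departs from the pseudocode in two small ways: it draws r on [0, 1) and tests r < θ_i θ_j, which links a pair with probability exactly min(θ_i θ_j, 1), and it returns an empty graph when all degrees are zero to avoid dividing by zero.

```python
import math
import random


def stickiness_graph(degrees, seed=None):
    """Random graph from the stickiness-index model: theta_i = deg_i / sqrt(sum_j deg_j),
    and each pair i < j is linked independently with probability theta_i * theta_j."""
    rng = random.Random(seed)
    n = len(degrees)
    w = [[0] * n for _ in range(n)]
    total = sum(degrees)
    if total == 0:
        return w  # no interactions to match
    theta = [d / math.sqrt(total) for d in degrees]
    for i in range(n - 1):
        for j in range(i + 1, n):
            # random() is uniform on [0, 1); "<" gives probability exactly
            # min(theta_i * theta_j, 1), so products above 1 always link.
            if rng.random() < theta[i] * theta[j]:
                w[i][j] = w[j][i] = 1
    return w


W = stickiness_graph([4] * 200, seed=0)
mean_degree = sum(map(sum, W)) / len(W)
print(round(mean_degree, 2))  # should be near 199 * 0.02 = 3.98
```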


Graphlet Frequency Comparison

[Figure: Relative Graphlet Frequency Distances Between PPI and Model Networks; x-axis PPI Networks (YHC, Y11K, YIC, YU, YICU, FE, FH, WE, WC, HS, HG); y-axis Relative Graphlet Frequency Distance, 0 to 250; models ER, ER-DD, SF, GEO-3D, STICKY]


What’s New?

Spectral algorithm for discovering bipartite subgraphs (locks and keys)

Realistic results on PPI networks

Spectral algorithm for reverse engineering a geometric graph

Supports the claim that PPI networks have some geometric structure

Simplified stickiness model gives excellent local and global fit to PPI data

with Alan Taylor: CONTEST (CONTrollable TEST matrices) for MATLAB at http://www.maths.strath.ac.uk/research/groups/numerical_analysis/contest
