Preservation of protein-protein interaction networks
Simple simulated example
Peter Langfelder and Steve Horvath
May 31, 2011
Contents
1 Overview
1.a Setting up the R session
2 Calculation of module preservation
3 Analysis of module preservation statistics
A Simulation of PPI networks
1 Overview
This document contains a simple illustration of the use of the function modulePreservation [1] to study the preservation of complexes in protein-protein interaction (PPI) networks. We simulate two PPI networks. Each network contains 10 complexes with sizes between 10 and 50 proteins. Five of the 10 complexes, labeled 1–5, are preserved between the two networks, while the other five complexes (labeled 6–10) are not preserved.
We encourage readers unfamiliar with any of the functions used in this tutorial to type, in an active R session,
help(functionName)
(replace functionName with the actual name of the function) to get a detailed description of what the function does, what the input arguments mean, and what the output is.
1.a Setting up the R session
After starting R we execute a few commands to set the working directory and load the requisite packages:
# Display the current working directory
getwd();
# If necessary, change the path below to the directory where the data files are stored.
# "." means current directory. On Windows use a forward slash / instead of the usual \.
workingDir = ".";
setwd(workingDir);
# Load the package
library(WGCNA);
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE);
2 Calculation of module preservation
We use simulated PPI networks that are generated using code provided in Appendix A. For simplicity, we load the networks saved there.
load(file = "simulatedPPInetworks.RData");
The above command loads two objects, PPInetwork1 and PPInetwork2. Each of them is a list with two components: the component adjacency contains the network adjacency matrix, and the component labels contains the module (or protein complex) labels. The modules are labeled by numbers 1–10. Proteins that are not part of any complex carry the label 0. To get a basic idea of how big the network is, we can use
dim(PPInetwork1$adjacency)
which will tell us that the network contains 350 proteins. Also note that the columns of the adjacency matrix must carry protein names. In our example we named the simulated proteins simply "Protein.1"–"Protein.350":
colnames(PPInetwork1$adjacency)
Column names for the adjacency matrices are important because they allow the module preservation function to match proteins between reference and test networks. Even though here we use the same proteins in the same order, in practice this may not be the case.
We next create “multi-adjacency” and module “multi-labels”. These variables are lists with one component per data set. In this example we study two data sets, a reference set (1) and a test set (2). Note that the components of the list must be named. The names are used as identifiers for the data sets.
multiAdj = list( network1 = list(data = PPInetwork1$adjacency),
network2 = list(data = PPInetwork2$adjacency));
multiLabels = list(network1 = PPInetwork1$labels);
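Before running a long calculation it can help to confirm that the list components are named and that the networks share protein names. The following is a minimal sketch in base R on toy matrices; the helper makeAdj and the toy protein names are hypothetical, and the code illustrates name-based matching in general rather than what modulePreservation does internally.

```r
# Hypothetical helper: identity adjacency matrix with the given protein names
makeAdj = function(proteinNames)
{
  adj = diag(length(proteinNames));
  dimnames(adj) = list(proteinNames, proteinNames);
  adj;
}

# Toy stand-ins for the two networks; note the different protein order
toyMultiAdj = list(
  network1 = list(data = makeAdj(c("Protein.1", "Protein.2", "Protein.3"))),
  network2 = list(data = makeAdj(c("Protein.3", "Protein.1", "Protein.4"))));

# The components of the list must be named
stopifnot(!is.null(names(toyMultiAdj)), all(names(toyMultiAdj) != ""));

# Proteins present in every network, and their positions in each network
common = Reduce(intersect, lapply(toyMultiAdj, function(x) colnames(x$data)));
idx = lapply(toyMultiAdj, function(x) match(common, colnames(x$data)));
print(common)
print(idx)
```

In this toy case only Protein.1 and Protein.3 are shared, and they sit at different positions in the two matrices, which is why matching has to go through the names rather than the column order.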
We now call the modulePreservation function to calculate network module preservation statistics. This calculation may take up to a few hours, depending on the available computational speed.
mp = modulePreservation(multiAdj, multiLabels, dataIsExpr = FALSE,
referenceNetworks = 1, restrictSummaryForGeneralNetworks = FALSE,
nPermutations = 100,
calculateCor.kIMall = FALSE, verbose = 3);
# Save the results
save(mp, file = "mp.RData");
We saved the results so the calculation only needs to be run once. The results can be re-loaded using the following command:
load(file = "mp.RData");
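One way to combine the save and load steps is to run the calculation only when the saved file does not yet exist. The sketch below shows that pattern with a hypothetical stand-in function (slowCalculation) and an illustrative file in a temporary directory; neither name comes from the tutorial.

```r
# Hypothetical stand-in for a long-running calculation such as modulePreservation
slowCalculation = function() sum((1:10)^2);

# Illustrative file name in a temporary directory (the tutorial itself uses "mp.RData")
resultFile = file.path(tempdir(), "toyResult.RData");

if (file.exists(resultFile))
{
  # Restores the object named 'result' that was saved earlier
  load(file = resultFile);
} else {
  result = slowCalculation();
  save(result, file = resultFile);
}
print(result)
```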
3 Analysis of module preservation statistics
We now isolate the medianRank and the Z statistics and plot them as a function of module size.
stats = cbind(medianRank = mp$preservation$observed[[1]][[2]]$medianRank.pres[-c(1,2)],
mp$preservation$Z[[1]][[2]][-c(1,2), -1]);
moduleSizes = mp$preservation$Z[[1]][[2]][-c(1,2), 1];
# Order rows by module label
order = order(as.numeric(rownames(stats)))
stats = stats[order, ]
moduleSizes = moduleSizes[order]
labels = as.numeric(rownames(stats))
# Indicate preserved modules by red color and non-preserved by black color
preserved = c(1:5);
presInd = match(preserved, labels);
presColor = rep(1, length(labels));
presColor[presInd] = 2;
# Open a suitably sized graphics window or, alternatively, open a pdf file to hold the plot
sizeGrWindow(10,7);
#pdf(file = "Plots/PPIsimulation-halfPreserved.pdf", wi=10, he=8)
# Set sectioning and margins
par(mfrow = c(3,4))
par(mar = c(3.2, 3.2, 2, 0.5))
par(mgp = c(2.0, 0.6, 0))
# Plot the individual statistics
for (s in 1:ncol(stats))
{
min = min(stats[, s], na.rm = TRUE);
max = max(stats[, s], na.rm = TRUE);
if (s > 1)
{
if (min > -max/5) min = -max/5;
} else {
tmp = min; min = max; max = tmp;
}
plot(moduleSizes, stats[, s], main = colnames(stats)[s],
ylab = colnames(stats)[s], type = "n", xlab = "Module size",
cex.main = 1, ylim = c(min, max))
text(moduleSizes, stats[, s], labels = labels, col = presColor);
box = par("usr");
if (s==1) legend(x = box[2], y = (max+min)/2, xjust = 1, yjust = 0.5,
legend = c("Preserved", "Non-preserved"), fill = c(2,1), cex = 0.8)
if (s>1)
{
abline(h=0)
abline(h=2, col = "blue", lty = 2);
abline(h=10, col = "darkgreen", lty = 2);
}
}
# If plotting into a file, close it.
dev.off();
The resulting plot is shown in Figure 1. We note that in this example the composite statistics medianRank and Zsummary work best at separating the preserved and non-preserved modules. While medianRank appears largely independent of module size, the Z statistics for preserved modules show a marked dependence on module size. This agrees with the intuition that it is more significant to observe a preservation of a pattern among 50 proteins than among 10 proteins.
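The size dependence of permutation Z statistics can be reproduced on a toy network. The sketch below is a simplified stand-in, not the WGCNA implementation: it plants two equally dense complexes of 10 and 50 proteins in a sparse random network and computes Z = (observed - mean of permuted) / sd of permuted for the mean adjacency, using random protein sets of the same size as the permutation reference. All function names and parameter values are assumptions chosen for illustration.

```r
set.seed(42);

# Symmetric 0/1 adjacency matrix with independent links of probability p
randomAdjacency = function(n, p)
{
  adj = matrix(0, n, n);
  adj[upper.tri(adj)] = rbinom(n * (n - 1) / 2, size = 1, prob = p);
  adj = adj + t(adj);
  diag(adj) = 1;
  adj;
}

# Mean adjacency (link density) among the proteins in idx
meanAdjacency = function(adj, idx)
{
  sub = adj[idx, idx];
  mean(sub[upper.tri(sub)]);
}

# Permutation Z: observed density versus random protein sets of equal size
permutationZ = function(adj, idx, nPermutations = 200)
{
  observed = meanAdjacency(adj, idx);
  permuted = replicate(nPermutations,
                       meanAdjacency(adj, sample(nrow(adj), length(idx))));
  (observed - mean(permuted)) / sd(permuted);
}

nProteins = 500;
adj = randomAdjacency(nProteins, p = 0.05);
small = 1:10;      # planted complex of 10 proteins
large = 11:60;     # planted complex of 50 proteins
adj[small, small] = randomAdjacency(length(small), p = 0.5);
adj[large, large] = randomAdjacency(length(large), p = 0.5);

zValues = c(size10 = permutationZ(adj, small), size50 = permutationZ(adj, large));
print(round(zValues, 1))
```

Both complexes have the same underlying link density, yet the larger one yields the higher Z because the density of a random set of 50 proteins fluctuates much less than that of a random set of 10.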
[Figure 1 image: a 3 × 4 grid of panels titled medianRank, Zsummary, Zdensity, Zconnectivity, Z.propVarExplained, Z.meanKIM, Z.meanAdj, Z.meanClusterCoeff, Z.cor.kIM, Z.cor.kME, Z.cor.adj and Z.cor.clusterCoeff. Each panel plots the named statistic (y-axis) against module size (x-axis, 10 to 50), with modules drawn as their numeric labels 1–10; the medianRank panel carries the legend "Preserved" / "Non-preserved".]
Figure 1: Module preservation statistics of simulated modules in this study. Each plot shows one of the preservation statistics (indicated in the title) as a function of the module size. Modules are labeled by their numeric labels; red color denotes preserved and black non-preserved modules. The blue and green dashed lines denote the thresholds Z = 2 and Z = 10. The statistics medianRank and Zsummary do the best job of distinguishing the preserved and non-preserved modules in this study.
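The thresholds Z = 2 and Z = 10 follow the guidelines of [1], where Zsummary below 2 indicates no evidence of preservation, values between 2 and 10 weak to moderate evidence, and values above 10 strong evidence. A minimal sketch of applying these thresholds is shown below; the Zsummary values are hypothetical and are not results of this tutorial.

```r
# Hypothetical Zsummary values, named by module label (for illustration only)
zSummary = c("1" = 25.3, "2" = 12.1, "3" = 6.4, "6" = 1.2, "7" = -0.8);

# Intervals are [-Inf, 2), [2, 10) and [10, Inf)
evidence = cut(zSummary, breaks = c(-Inf, 2, 10, Inf),
               labels = c("none", "weak to moderate", "strong"),
               right = FALSE);

print(data.frame(module = names(zSummary), Zsummary = zSummary,
                 evidence = as.character(evidence), row.names = NULL))
```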
A Simulation of PPI networks
Here we generate the reference and test networks used in this tutorial. We start by defining two functions, one for simulating a protein complex (a group of densely interconnected proteins) and one for simulating a network consisting of several complexes.
simulateComplex = function(nProteins, minScaledK, maxScaledK)
{
k = seq(from = maxScaledK, to=minScaledK, length.out = nProteins) * nProteins;
K = sum(k);
adjacency = matrix(1, nProteins, nProteins);
pMat = matrix(NA, nProteins, nProteins)
for (i in 1:(nProteins-1))
for (j in (i+1):nProteins)
{
p = k[i]*k[j] / (K - (k[i] + k[j])/2);
if (p >1) p = 1;
pMat[i,j] = pMat[j,i] = p;
adjacency[i,j] = adjacency[j,i] = sample(c(0,1), size = 1, prob = c(1-p, p))
}
adjacency;
}
simulateProteinNetwork = function(
complexSizes, nSingletons,
minScaledK = 0.2, maxScaledK = 0.9,
propMissingLinks = 0,
propInterComplexLinks = 0)
{
nComplexes = length(complexSizes);  # derived locally rather than taken from a global variable
nProteins = sum(complexSizes) + nSingletons;
adjacency = matrix(0, nProteins, nProteins);
diag(adjacency) = 1;
labels = rep(0, nProteins);
starts = c(1, cumsum(complexSizes)+1);
ends = c(cumsum(complexSizes), nProteins);
for (c in 1:nComplexes)
{
st = starts[c];
en = ends[c];
adj.complex = simulateComplex(complexSizes[c], minScaledK, maxScaledK);
adj.dst = as.dist(adj.complex);
leaveOut = sample(c(FALSE, TRUE), size = length(adj.dst),
prob = c(1-propMissingLinks, propMissingLinks),
replace = TRUE);
adj.dst[leaveOut] = 0;
adj.complex = as.matrix(adj.dst);
diag(adj.complex) = 1;
adjacency[st:en, st:en] = adj.complex;
labels[st:en] = c;
}
for (c1 in 1:(nComplexes+1))
{
if (c1 <= nComplexes)
{
c1x = c1 + 1
} else
c1x = c1;
for (c2 in c1x:(nComplexes + 1))
{
st1 = starts[c1];
en1 = ends[c1];
st2 = starts[c2];
en2 = ends[c2];
n1 = en1 - st1 + 1;
n2 = en2 - st2 + 1;
interAdj = sample(c(0, 1), size = n1*n2, prob = c(1-propInterComplexLinks, propInterComplexLinks),
replace = TRUE);
dim(interAdj) = c(n1, n2);
if (c1==c2)
{
interAdj = as.matrix(as.dist(interAdj));
diag(interAdj) = 1;
}
adjacency[st1:en1, st2:en2] = interAdj;
adjacency[st2:en2, st1:en1] = t(interAdj);
}
}
colnames(adjacency) = spaste("Protein.", c(1:nProteins));
rownames(adjacency) = spaste("Protein.", c(1:nProteins));
list(adjacency = adjacency, labels = labels);
}
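The link probabilities that simulateComplex draws from can be inspected on their own. The sketch below computes the same quantity, p = k[i]*k[j] / (K - (k[i] + k[j])/2) capped at 1, for all protein pairs at once; the function name connectionProbabilities is hypothetical and the vectorized form is an illustration rather than part of the tutorial code.

```r
# Hypothetical helper: matrix of within-complex link probabilities
connectionProbabilities = function(nProteins, minScaledK = 0.2, maxScaledK = 0.9)
{
  # Target connectivities decrease linearly from the first to the last protein
  k = seq(from = maxScaledK, to = minScaledK, length.out = nProteins) * nProteins;
  K = sum(k);
  p = outer(k, k) / (K - outer(k, k, "+") / 2);
  p = pmin(p, 1);      # probabilities cannot exceed 1
  diag(p) = NA;        # self-links are set to 1 separately, not sampled
  p;
}

p = connectionProbabilities(10);
print(round(p, 2))
```

With the default settings the first few proteins of a complex are linked to each other with probability 1, while the last ones are linked only occasionally, so each simulated complex has a dense core and a sparser periphery.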
We next define basic parameters of the simulation.
nComplexes = 10;
nPreserved = 5;
preserved = c(1:nPreserved)
nNonPreserved = nComplexes - nPreserved;
nonPreserved = c(1:nComplexes)[-preserved];
complexSizes1 = seq(from = 50, to = 10, length.out = nPreserved);
complexSizes = rep(complexSizes1, 2);
nSingletons = 50;
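As a quick check of these settings, the sketch below recomputes the complex sizes and the block boundaries that the simulation function derives from them. The variable nProteins and the printed table are added here for illustration only.

```r
nPreserved = 5;
complexSizes1 = seq(from = 50, to = 10, length.out = nPreserved);
complexSizes = rep(complexSizes1, 2);
nSingletons = 50;

# Total network size: 2 * (50 + 40 + 30 + 20 + 10) + 50 = 350
nProteins = sum(complexSizes) + nSingletons;

# First and last protein index of each complex; the last block holds the singletons
starts = c(1, cumsum(complexSizes) + 1);
ends = c(cumsum(complexSizes), nProteins);
print(data.frame(block = c(1:10, "singletons"), start = starts, end = ends))
```

The total of 350 proteins agrees with the dimensions of the adjacency matrix reported in Section 2.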
We call the simulation function twice to generate two separate networks with the same complex structure but slightly different connections within complexes. For simplicity we do not simulate any connections between proteins in different complexes, although the above functions support it.
set.seed(10);
PPInetwork1 = simulateProteinNetwork(complexSizes, nSingletons);
PPInetwork2 = simulateProteinNetwork(complexSizes, nSingletons);
The networks can be visualized, for example, using the image function:
sizeGrWindow(8,8);
#pdf(file = "Plots/networkImage.pdf", wi=8, he=8);
image(PPInetwork1$adjacency, xaxt = "n", yaxt = "n")
dev.off();
The plot is shown in Figure 2. The network image verifies that we have simulated 10 complexes of different sizes.
We now permute the proteins in the non-preserved complexes in the test data set.
starts = c(1, cumsum(complexSizes)+1);
ends = cumsum(complexSizes);
scramble = starts[ min(nonPreserved)]:ends[max(nonPreserved)];
newOrder = sample(scramble);
PPInetwork2$adjacency[scramble, scramble] = PPInetwork2$adjacency[newOrder, newOrder];
PPInetwork2$labels[scramble] = PPInetwork2$labels[newOrder];
Figure 2: Image of the simulated reference PPI network. Each row and column represents one protein; red color means not connected and white color means connected. Squares along the diagonal with dense connections correspond to simulated complexes.
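The effect of the permutation step can be checked on a toy network. In the sketch below, which uses hypothetical protein names P1 to P6, the rows and columns in the scrambled range are reordered while the row and column names stay in place, so the proteins outside the range keep their connections and the ones inside it may exchange theirs.

```r
set.seed(1);

# Toy network: complex {P1, P2, P3}, a linked pair {P4, P5} and a singleton P6
proteinNames = paste0("P", 1:6);
adj = matrix(0, 6, 6, dimnames = list(proteinNames, proteinNames));
adj[1:3, 1:3] = 1;
adj[4:5, 4:5] = 1;
diag(adj) = 1;

# Permute the proteins in positions 4 to 6, as done for the non-preserved complexes
scramble = 4:6;
newOrder = sample(scramble);
permuted = adj;
permuted[scramble, scramble] = adj[newOrder, newOrder];

print(newOrder)
print(permuted)
```

The names remain attached to the positions, not to the connection patterns. Because proteins are matched between networks by name, a permuted protein in the test network no longer carries the connections it has in the reference network.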
Lastly, we save the networks for future use.
save(PPInetwork1, PPInetwork2, file = "simulatedPPInetworks.RData");
The resulting file is used as input at the start of this tutorial.
References
[1] Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath. Is my network module preserved and reproducible? PLoS Comput Biol, 7(1):e1001057, 01 2011.