Graphs and Networks with Bioconductor
Wolfgang Huber
EMBL/EBI
Bioconductor Conference 2005
Based on chapters from "Bioinformatics and Computational Biology Solutions using R and
Bioconductor", Gentleman, Carey, Huber, Irizarry, Dudoit, Springer Verlag.
Graphs
Set of nodes and set of edges.
Nodes: objects of interest
Edges: relationships between them
A useful abstraction to talk about relationships and interactions (think of integer numbers, apples and fingers)
Edges may have weights, directions, types
Practicalities
As always, need to distinguish between the true, underlying property of nature that you want to measure, and the actual result of a measurement (experiment)
1. False positive edges
2. False negative edges (were tested, were not found, but are there in nature)
3. Untested edges (were not tested, are not in your data, but are there in nature)
Uncertainty is not usually considered in mainstream graph theory, but cannot be ignored in functional genomics.
Nice application of these concepts to protein interactions: Gentleman and Scholtens, SAGMB 2004
Representation
Node-edge lists
Adjacency matrix (straightforward)
Adjacency matrix (sparse)
From-To matrix
They are equivalent, but may be hugely different in performance and convenience for different applications.
Can coerce between the representations
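A minimal base-R sketch of coercing between these representations. It does not use the graph package's own coercion methods; the helper names `edgeL2adjM` and `adjM2ft` and the toy graph are hypothetical, chosen only for illustration.

```r
# Node-edge list: for each node, the names of the nodes its edges point to.
edgeL <- list(a = c("b", "c"), b = "c", c = "a", d = character(0))

# Hypothetical helper: node-edge list -> dense adjacency matrix.
edgeL2adjM <- function(el) {
  nn <- names(el)
  m <- matrix(0L, length(nn), length(nn), dimnames = list(nn, nn))
  for (from in nn) {
    if (length(el[[from]]) > 0) m[from, el[[from]]] <- 1L
  }
  m
}

# Hypothetical helper: adjacency matrix -> from-to matrix (one row per edge).
adjM2ft <- function(m) {
  idx <- which(m != 0, arr.ind = TRUE)
  idx <- idx[order(idx[, 1], idx[, 2]), , drop = FALSE]
  cbind(from = rownames(m)[idx[, 1]], to = colnames(m)[idx[, 2]])
}

adjM <- edgeL2adjM(edgeL)
ft <- adjM2ft(adjM)
```

The dense matrix costs memory quadratic in the number of nodes, while the node-edge list and the from-to matrix grow with the number of edges, which is one reason the choice of representation matters for large sparse graphs.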
Algorithms
Bioconductor project emphasizes re-use and interfacing to existing, well-tested software implementations rather than reimplementing everything from scratch ourselves.
RBGL package: interface to Boost Graph Library; started by V. Carey, R. Gentleman, now driven by Li Long.
Example: a pathway
Elementary computations on IMCA pathway
> library("graph")
> data("integrinMediatedCellAdhesion")
> class(IMCAGraph)
> s = acc(IMCAGraph, "SOS")
Ha-Ras Raf MEK
1 2 3
ERK MYLK MYO
4 5 6
F-actin cell proliferation
7 5
Machine-readable pathway databases
KEGG
reactome
BioCarta (biocarta.com)
National Cancer Institute cMAP
Gene Ontology (GO)
A structured vocabulary to describe molecular function of gene products, biological processes, and cellular components.
Plus
A set of "is a", "is part of" relationships between these terms
Directed acyclic graph
GO graphs
> tfG = GOGraph("GO:0003700", GOMFPARENTS)
Gene-Literature graphs
DKC1
Graphs: vocabulary
Directed, undirected graphs
Adjacent nodes
Accessible nodes
Self-loop
Multi-edge
Node degree
Walk: alternating sequence of nodes and incident edges
Closed walk
Distance between nodes, shortest walk
Trail: walk with no repeated edges
Path: trail with no repeated nodes (except possibly first/last)
Cycle
Connected graph
Weakly connected directed graph (see next page)
Strong and weak connectivity
Graphs: vocabulary
Cut: remove edges to disconnect a graph
Cut-set: remove nodes to disconnect a graph
Connectivity of a graph
Cliques
Special types of graphs
Bipartite graph
Bipartite graphs
$A_G$: adjacency matrix ($n \times m$) of a bipartite graph $G$ with node sets $U$, $V$
One-mode graphs:
$A_U = A_G^{t} A_G$
$A_V = A_G A_G^{t}$
(Boolean algebra)
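A small base-R sketch of the one-mode projections. It assumes that the rows of the adjacency matrix are indexed by the V nodes and the columns by the U nodes, which is the convention under which the two products above have matching dimensions; the gene/paper data and the helper `boolProd` are made up for illustration.

```r
# Toy bipartite graph: rows are the V nodes (papers), columns the U nodes (genes).
# This row/column convention and the data are assumptions for illustration.
AG <- matrix(c(1, 1, 0,
               0, 1, 1,
               0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("paper1", "paper2", "paper3"),
                             c("geneA", "geneB", "geneC")))

# Boolean matrix product: an entry is 1 if the ordinary product is positive.
boolProd <- function(x, y) (x %*% y > 0) * 1

AU <- boolProd(t(AG), AG)  # genes linked if they share at least one paper
AV <- boolProd(AG, t(AG))  # papers linked if they share at least one gene
```

Dropping the `> 0` step and keeping the ordinary matrix product would instead give the number of shared neighbours, which can serve as an edge weight.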
Multigraphs
Can have different types of edges
Hypergraphs
:= set of Nodes + set of hyperedges
A hyperedge is a set of nodes (can be more than 2)
A directed hyperedge: pair (tail and head) of sets of nodes
Directed acyclic graphs
Useful for representing hierarchies, partial orderings (e.g. in time, from general to special, from cause to effect)
Many applications: GO, MeSH, graphical models
Random Edge Graphs
n nodes, m edges
p(i,j) = 1/m
with high probability:
m < n/2: many disconnected components
m > n/2: one giant connected component: size ~ n.
(next biggest: size ~ log(n)).
degrees of separation: log(n).
Erdős and Rényi 1960
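The threshold at m = n/2 can be explored with a short simulation. This is a base-R sketch, not the graph package's `randomEGraph`; the helper names `componentSizes` and `randomEdges` and the choice n = 200 are assumptions for illustration.

```r
# Sizes of the connected components of an undirected graph given as a
# two-column matrix of node indices (one row per edge), via union-find.
componentSizes <- function(n, edges) {
  parent <- seq_len(n)
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (k in seq_len(nrow(edges))) {
    a <- find(edges[k, 1])
    b <- find(edges[k, 2])
    if (a != b) parent[a] <- b
  }
  roots <- vapply(seq_len(n), find, integer(1))
  sort(as.vector(table(roots)), decreasing = TRUE)
}

# Draw m distinct edges uniformly from all possible node pairs.
randomEdges <- function(n, m) {
  allPairs <- t(combn(n, 2))
  allPairs[sample(nrow(allPairs), m), , drop = FALSE]
}

set.seed(1)
n <- 200
sparse <- componentSizes(n, randomEdges(n, 50))   # m < n/2
dense  <- componentSizes(n, randomEdges(n, 400))  # m > n/2
```

Comparing `sparse[1]` with `dense[1]` (the largest component in each case) shows the qualitative difference between the two regimes.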
Random graphs
Random edge graph: randomEGraph(V, p, edges)
V: nodes
either p: probability per edge
or edges: number of edges
Random graph with latent factor: randomGraph(V, M, p, weights=TRUE)
V: nodes
M: latent factor
p: probability
For each node, generate a logical vector of length length(M), with P(TRUE)=p. Edges are between nodes that share >= 1 elements. Weights can be generated according to number of shared elements.
Random graph with predefined degree distribution: randomNodeGraph(nodeDegree)
nodeDegree: named integer vector
sum of all node degrees must be even
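The latent-factor construction described above can be sketched in base R as follows. This mirrors the verbal description only and is not the source code of `randomGraph`; the node names, `M`, `p` and the variable names are illustrative assumptions.

```r
set.seed(42)
V <- letters[1:6]  # nodes
M <- 1:4           # latent factor elements
p <- 0.3           # probability that a node carries a given element

# One row per node, one column per element of M; TRUE with probability p.
member <- matrix(runif(length(V) * length(M)) < p, nrow = length(V),
                 dimnames = list(V, as.character(M)))

# Number of elements shared by each pair of nodes.
m01 <- member * 1
shared <- m01 %*% t(m01)
diag(shared) <- 0

adjM <- (shared > 0) * 1  # edge if at least one element is shared
weights <- shared         # optional weights: number of shared elements
```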
Random edge graph
100 nodes 50 edges
degree distribution
Random graphs versus permutation graphs
For statistical inference, one can consider null hypotheses based on aforementioned random graph models; and ones based on node permutation of data graphs.
The second is often more appropriate.
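A node-permutation null distribution can be sketched in base R as below. The two graphs are simulated stand-ins for data graphs, and `randomSym` and `commonEdges` are hypothetical helpers; only the permutation step is the point. Relabelling the nodes keeps the structure of each graph (degree distribution, clusters) intact and only breaks the correspondence between the two.

```r
set.seed(7)
n <- 30

# Hypothetical helper: symmetric 0/1 adjacency matrix with independent edges.
randomSym <- function(n, prob) {
  m <- matrix(0L, n, n)
  m[upper.tri(m)] <- rbinom(n * (n - 1) / 2, 1, prob)
  m + t(m)
}
g1 <- randomSym(n, 0.1)  # stand-in for data graph 1
g2 <- randomSym(n, 0.1)  # stand-in for data graph 2

# Number of edges present in both graphs (each undirected edge counted once).
commonEdges <- function(a, b) sum(a[upper.tri(a)] * b[upper.tri(b)])

observed <- commonEdges(g1, g2)
nullDist <- replicate(1000, {
  perm <- sample(n)                # random relabelling of the nodes of g2
  commonEdges(g1, g2[perm, perm])
})
pValue <- mean(nullDist >= observed)
```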
Cohesive subgroups
For data graphs, the concept of clique is usually too restrictive (false negative or untested edges)
n-clique: distance between all members is <=n. (Clique: n=1)
k-plex: maximal subgraph G in which each member is neighbour of at least |G|-k others. (Clique: k=1)
k-core: maximal subgraph G in which each member is neighbour of at least k others. (Clique: k=|G|-1)
After: Social Network Analysis, Wasserman and Faust (1994)
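As an illustration of the k-core definition, the following base-R sketch finds the k-core by repeatedly removing nodes with fewer than k remaining neighbours. The function name `kCore` and the toy graph (a triangle a, b, c with a tail c-d-e) are assumptions for illustration, not part of any Bioconductor package.

```r
# Iteratively remove nodes with fewer than k neighbours among the kept nodes.
# adjM: symmetric 0/1 adjacency matrix with row and column names.
kCore <- function(adjM, k) {
  keep <- rep(TRUE, nrow(adjM))
  repeat {
    deg <- rowSums(adjM[, keep, drop = FALSE])
    low <- keep & deg < k
    if (!any(low)) break
    keep[low] <- FALSE
  }
  rownames(adjM)[keep]
}

nodes <- c("a", "b", "c", "d", "e")
adjM <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- rbind(c("a", "b"), c("a", "c"), c("b", "c"), c("c", "d"), c("d", "e"))
adjM[edges] <- 1
adjM[edges[, 2:1]] <- 1  # undirected: add the reverse direction

core2 <- kCore(adjM, 2)
```

Here `core2` contains the triangle only: removing e leaves d with a single neighbour, so d is removed in the next round.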
graph, RBGL, Rgraphviz
graph: basic class definitions and functionality
RBGL: interface to graph algorithms
Rgraphviz: rendering functionality. Different layout algorithms. Node plotting, line type, color etc. can be controlled by the user.
Creating our first graph
> library("graph"); library(Rgraphviz)
> myNodes = c("s", "p", "q", "r")
> myEdges = list(s = list(edges = c("p", "q")), p = list(edges = c("p", "q")), q = list(edges = c("p", "r")), r = list(edges = c("s")))
> g = new("graphNEL", nodes = myNodes, edgeL = myEdges, edgemode = "directed")
> plot(g)
Querying nodes, edges, degree
> nodes(g)
[1] "s" "p" "q" "r"
> edges(g)
$s
[1] "p" "q"
$p
[1] "p" "q"
$q
[1] "p" "r"
$r
[1] "s"
> degree(g)
$inDegree
s p q r
1 3 2 1
$outDegree
s p q r
2 2 2 1
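The degrees reported above can be read directly off the node-edge list. A base-R sketch, using the same edge list as in the graph creation example but without the graph package:

```r
# Node-edge list of the example graph: each entry lists the targets of a node.
edgeL <- list(s = c("p", "q"), p = c("p", "q"), q = c("p", "r"), r = "s")

# Out-degree: number of targets listed for each node.
outDegree <- lengths(edgeL)

# In-degree: how often each node appears as a target.
inDegree <- table(factor(unlist(edgeL), levels = names(edgeL)))
```

Note that the self-loop at p contributes to both its in-degree and its out-degree.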
Graph manipulation
> ## g is assumed here to have nodes "a", "b", "c", "d"
> g1 <- addNode("e", g)
> g2 <- removeNode("d", g)
> ## addEdge(from, to, graph, weights)
> g3 <- addEdge("e", "a", g1, pi/2)
> ## removeEdge(from, to, graph)
> g4 <- removeEdge("e", "a", g3)
> identical(g4, g1)
[1] TRUE
adjacent and accessible nodes
> adj(g, c("b", "c"))
$b
[1] "b" "c"
$c
[1] "b" "d"
> acc(g, c("b", "c"))
$b
a c d
3 1 2
$c
a b d
2 1 1
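The distances reported by `acc` correspond to a breadth-first search from the start node. The following base-R sketch reproduces them; the function `accBFS` is hypothetical, and the edge list is an assumption that is consistent with the `adj` and `acc` output shown above.

```r
# Assumed edge list (consistent with the output above): a->b,c  b->b,c  c->b,d  d->a
edgeL <- list(a = c("b", "c"), b = c("b", "c"), c = c("b", "d"), d = "a")

# Breadth-first search: number of edges on a shortest directed walk from
# 'start' to every node that can be reached from it.
accBFS <- function(el, start) {
  dist <- setNames(rep(NA_integer_, length(el)), names(el))
  dist[start] <- 0L
  queue <- start
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    for (nb in el[[node]]) {
      if (is.na(dist[nb])) {
        dist[nb] <- dist[node] + 1L
        queue <- c(queue, nb)
      }
    }
  }
  dist[!is.na(dist) & names(dist) != start]
}

fromB <- accBFS(edgeL, "b")
fromC <- accBFS(edgeL, "c")
```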
Graph representations: from-to-matrix
> ft
     [,1] [,2]
[1,]    1    2
[2,]    2    3
[3,]    3    1
[4,]    4    4
> ftM2adjM(ft)
  1 2 3 4
1 0 1 0 0
2 0 0 1 0
3 1 0 0 0
4 0 0 0 1
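What this conversion does can be sketched in a few lines of base R. This is an illustration of the idea, not the implementation of `ftM2adjM`: each row of the from-to matrix is used as a (row, column) index into the adjacency matrix.

```r
# From-to matrix: one row per edge, column 1 = source, column 2 = target.
ft <- cbind(c(1, 2, 3, 4), c(2, 3, 1, 4))

n <- max(ft)
adjM <- matrix(0, n, n,
               dimnames = list(as.character(1:n), as.character(1:n)))

# A two-column matrix used as an index addresses one cell per row.
adjM[ft] <- 1
```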
GXL: graph exchange language
<gxl>
  <graph edgemode="directed" id="G">
    <node id="A"/>
    <node id="B"/>
    <node id="C"/>
    …
    <edge id="e1" from="A" to="C">
      <attr name="weights">
        <int>1</int>
      </attr>
    </edge>
    <edge id="e2" from="B" to="D">
      <attr name="weights">
        <int>1</int>
      </attr>
    </edge>
    …
  </graph>
</gxl>
from graph/GXL/kmstEx.gxl
GXL (www.gupro.de/GXL) is "an XML sublanguage designed to be a standard exchange format for graphs". The graph package provides tools for importing and exporting graphs as GXL.
RBGL: interface to the Boost Graph Library
Connected components
cc = connComp(rg)
table(listLen(cc))
 1  2  3  4 15 18
36  7  3  2  1  1
Choose the largest component
wh = which.max(listLen(cc))
sg = subGraph(cc[[wh]], rg)
Depth first search
dfsres = dfs(sg, node = "N14")
nodes(sg)[dfsres$discovered]
 [1] "N14" "N94" "N40" "N69" "N02" "N67" "N45" "N53"
 [9] "N28" "N46" "N51" "N64" "N07" "N19" "N37" "N35"
[17] "N48" "N09"
rg
depth / breadth first search
dfs(sg, "N14")
bfs(sg, "N14")
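The difference between the two visiting orders can be seen on a small tree. This base-R sketch does not call RBGL; `dfsOrder`, `bfsOrder` and the toy graph are hypothetical.

```r
# Small undirected tree as a node-edge list: A-B, A-C, B-D, C-E.
el <- list(A = c("B", "C"), B = c("A", "D"), C = c("A", "E"), D = "B", E = "C")

# Depth first: follow one branch to its end before backtracking.
dfsOrder <- function(el, start) {
  seen <- character(0)
  visit <- function(node) {
    seen <<- c(seen, node)
    for (nb in el[[node]]) if (!(nb %in% seen)) visit(nb)
  }
  visit(start)
  seen
}

# Breadth first: visit all neighbours before moving one step further out.
bfsOrder <- function(el, start) {
  seen <- start
  queue <- start
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    new <- setdiff(el[[node]], seen)
    seen <- c(seen, new)
    queue <- c(queue, new)
  }
  seen
}

dfsRes <- dfsOrder(el, "A")  # A B D C E
bfsRes <- bfsOrder(el, "A")  # A B C D E
```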
connected components
sc = strongComp(g2)
nattrs = makeNodeAttrs(g2, fillcolor="")
for(i in seq_along(sc)) nattrs$fillcolor[sc[[i]]] = myColors[i]
plot(g2, "dot", nodeAttrs=nattrs)
wc = connComp(g2)
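To make the distinction between strong and weak components concrete, here is a base-R sketch based on reachability. It is not how RBGL computes them; `reach`, `componentsFrom` and the four-node graph are assumptions for illustration. Two nodes are in the same strong component if each can be reached from the other; weak components are obtained by ignoring edge directions.

```r
nodes <- c("a", "b", "c", "d")
adjM <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adjM["a", "b"] <- 1
adjM["b", "a"] <- 1
adjM["b", "c"] <- 1
adjM["d", "c"] <- 1

# Reflexive-transitive closure (Warshall's algorithm) as a logical matrix.
reach <- function(adjM) {
  r <- (adjM > 0) | (diag(nrow(adjM)) > 0)
  for (k in seq_len(nrow(r))) r <- r | outer(r[, k], r[k, ], "&")
  r
}

# Group nodes by identical rows of an equivalence relation.
componentsFrom <- function(rel) {
  labels <- apply(rel, 1, function(row) paste(colnames(rel)[row], collapse = ","))
  unname(split(rownames(rel), labels))
}

r <- reach(adjM)
strong <- componentsFrom(r & t(r))               # mutual reachability
weak <- componentsFrom(reach(adjM + t(adjM)))    # directions ignored
```

In this toy graph there are three strong components ({a, b}, {c}, {d}) but a single weak component.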
shortest path algorithms
Different algorithms for different types of graphs
o all edge weights the same
o positive edge weights
o real numbers
…and different settings of the problem
o single pair
o single source
o single destination
o all pairs
Functions
bfs
dijkstra.sp
sp.between
johnson.all.pairs.sp
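For the single-source case with non-negative weights, the idea behind Dijkstra's algorithm can be sketched in base R. This is a simple quadratic-time illustration, not the Boost implementation behind `dijkstra.sp`; the function name `dijkstra` and the weight matrix are made up.

```r
# w: matrix of non-negative edge weights, Inf where there is no edge.
dijkstra <- function(w, source) {
  n <- nrow(w)
  dist <- setNames(rep(Inf, n), rownames(w))
  dist[source] <- 0
  done <- setNames(rep(FALSE, n), rownames(w))
  while (any(!done & is.finite(dist))) {
    cand <- which(!done & is.finite(dist))
    u <- cand[which.min(dist[cand])]       # closest node not yet finalised
    done[u] <- TRUE
    dist <- pmin(dist, dist[u] + w[u, ])   # relax all edges leaving u
  }
  dist
}

nodes <- c("a", "b", "c", "d")
w <- matrix(Inf, 4, 4, dimnames = list(nodes, nodes))
w["a", "b"] <- 1
w["b", "c"] <- 2
w["a", "c"] <- 5
w["c", "d"] <- 1

d <- dijkstra(w, "a")
```

The direct edge a to c (weight 5) is beaten by the route via b (weight 3), which is why the edge weights, and not just the number of edges, determine the shortest path.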
shortest path
set.seed(123)
rg2 = randomEGraph(nodeNames, edges = 100)
fromNode = "N43"
toNode = "N81"
sp = sp.between(rg2, fromNode, toNode)
sp[[1]]$path
 [1] "N43" "N08" "N88"
 [4] "N73" "N50" "N89"
 [7] "N64" "N93" "N32"
[10] "N12" "N81"
sp[[1]]$length
[1] 10
shortest path
ap = johnson.all.pairs.sp(rg2)
hist(ap)
minimal spanning tree
mst = mstree.kruskal(gr)
gr
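Kruskal's algorithm itself is short enough to sketch in base R: sort the edges by weight and keep an edge whenever it joins two different components. This illustrates the algorithm and is not the code behind `mstree.kruskal`; the function `kruskal` and the edge table are hypothetical.

```r
# edges: data frame with columns from, to (node indices) and weight.
kruskal <- function(n, edges) {
  edges <- edges[order(edges$weight), ]
  comp <- seq_len(n)                  # component label of each node
  keep <- logical(nrow(edges))
  for (k in seq_len(nrow(edges))) {
    a <- comp[edges$from[k]]
    b <- comp[edges$to[k]]
    if (a != b) {                     # edge joins two components: no cycle
      keep[k] <- TRUE
      comp[comp == a] <- b            # merge the two components
    }
  }
  edges[keep, ]
}

edges <- data.frame(from = c(1, 1, 2, 2, 3),
                    to = c(2, 3, 3, 4, 4),
                    weight = c(1, 4, 2, 6, 3))
mst <- kruskal(4, edges)
```

For a connected graph with n nodes the result always has n - 1 edges; here the three lightest edges happen to form the tree.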
connectivity
Consider graph g with single connected component.
Edge connectivity of g: minimum number of edges in g that can be cut to produce a graph with two components.
Minimum disconnecting set: the set of edges in this cut.
> edgeConnectivity(g)
$connectivity
[1] 2
$minDisconSet
$minDisconSet[[1]]
[1] "D" "E"
$minDisconSet[[2]]
[1] "D" "H"
Rgraphviz: the different layout engines
dot: directed graphs. Works best on DAGs and other graphs that can be drawn as hierarchies.
neato: undirected graphs using ’spring’ models
twopi: radial layout. One node (‘root’) chosen as the center. Remaining nodes on a sequence of concentric circles about the origin, with radial distance proportional to graph distance. Root can be specified or chosen heuristically.
Combining R graphics and graphviz: custom node drawing functions
Combining: graphviz layout and R plot
ImageMap
lg = agopen(g, …)
imageMap(lg, con=file("imca-frame1.html", open="w"),
  tags=list(HREF = href, TITLE = title, TARGET = rep("frame2", length(AgNode(nag)))),
  imgname=fpng, width=imw, height=imh)
Show drosophila interaction network example
Application: comparing gene co-expression and protein interaction data
Nodes: all yeast genes
Graph 1: co-expression clusters from yeast cell cycle microarray time course
Graph 2: protein interactions reported in the literature
Graph 3: protein interactions found in a yeast-two-hybrid experiment
Questions:
Do the graphs overlap more than random?
Is there anything special about overlapping edges?
nPdist: number of common edges as computed by a node label permutation model.
Number observed in data: 42
Further questions for exploratory data analysis
• Which expression clusters have intersections with which of the literature clusters?
• Are known cell-cycle regulated protein complexes indeed clustered together in both graphs?
• Are there expression clusters that have a number of literature cluster edges going between them, suggesting that expression clustering was too fine or that literature clusters are not cell-cycle regulated?
• Is the expression behavior of genes that are involved in multiple protein complexes different from that of genes that are involved in only one complex?
Generalization
Nothing in the preceding treatment was specific to physical protein interactions or microarray clustering. Can use similar reasoning for many other graphs, e.g. genomic vicinity, domain composition similarity.
Application: Using GO to interpret gene lists
Packages: GOstats, Rgraphviz
Gene-Literature graphs
DKC1
The bipartite gene-literature graph: actor and event size adjustment
actors: genes
actor size: number of papers that a gene appears in
event: paper
event size: number of genes that appear in a paper
Example: R. Strausberg et al. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. PNAS 99:16899–903, 2002
cites 15,000 genes
Are two genes remarkably often co-cited?
Note: usually one count (w.l.o.g. n22) is much larger than all the others. Test statistics that do not depend on n22:
Closing gene lists with literature
Boundary of gene list L: set of all genes that have co-citation (above threshold weight) with genes in L.
Figure labels: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, Gene X, Gene Y
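The boundary definition can be sketched in base R as follows. The co-citation weights are invented, the function `boundary` is hypothetical, and the sketch assumes that the boundary excludes genes that are already in L.

```r
genes <- c("Gene1", "Gene2", "Gene3", "GeneX", "GeneY")

# Hypothetical symmetric matrix of co-citation weights between genes.
w <- matrix(0, 5, 5, dimnames = list(genes, genes))
w["Gene1", "GeneX"] <- w["GeneX", "Gene1"] <- 3
w["Gene2", "GeneY"] <- w["GeneY", "Gene2"] <- 1
w["Gene1", "Gene2"] <- w["Gene2", "Gene1"] <- 5

# Genes outside L that are co-cited above the threshold with any gene in L.
boundary <- function(w, L, threshold) {
  outside <- setdiff(rownames(w), L)
  hit <- apply(w[L, outside, drop = FALSE] > threshold, 2, any)
  outside[hit]
}

L <- c("Gene1", "Gene2", "Gene3")
b2 <- boundary(w, L, threshold = 2)
b0 <- boundary(w, L, threshold = 0)
```

With the stricter threshold only GeneX is added; lowering the threshold also brings in GeneY, whose single co-citation with Gene2 is weak.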
A pathway graph
CGH aberration data
From: B. Gunawan et al., Cancer Res. 63: 6200-6205 (2003)
Figure axes: Tumours; Genetic aberrations
oncotree package by Anja von Heydebreck
Graphical model for CGH aberration data
Summary
Graphs are a natural way to represent relationships, just as numbers are a natural way to represent quantities.
Three main applications:
(1) to represent data (e.g. PPI)
(2) to represent knowledge (e.g. GO)
(3) to represent high-dimensional probability distributions
Bioconductor provides a rich set of tools mainly for (1) and (2). Various parts of R for (3), see also gR project.
There are still many challenges that call for methods to model uncertainty, make inference, and predictions.
Further exercises
Fine control of graph rendering
GOstats example