External-Memory MST
(Arge, Brodal, Toma)
Minimum-Spanning Tree
• Given a weighted, undirected graph G=(V,E), the minimum-spanning tree (MST) problem is the problem of finding a spanning tree for G of minimum weight.
• Assumptions:
1. G is connected;
2. No two edges in G have the same weight.
External-Memory Graph Algorithms
• Standard two-level I/O model with a single disk:
  • N = V + E
  • M = number of vertices/edges that can fit into internal memory.
  • B = number of vertices/edges per disk block.
• The graph is given as a list of edges sorted by vertex.
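The bounds on the following slides are stated in terms of scan and sort costs. As a rough illustration only, the sketch below evaluates the standard external-memory bounds scan(N) = N/B and sort(N) = (N/B) log_{M/B}(N/B), ignoring constant factors. The function names and the machine parameters are assumptions made for this example and do not come from the slides.

```python
import math

def scan_ios(n: int, B: int) -> int:
    """I/Os needed to read n items sequentially: ceil(n / B)."""
    return math.ceil(n / B)

def sort_ios(n: int, M: int, B: int) -> float:
    """External sorting bound (n/B) * log_{M/B}(n/B), constants ignored."""
    blocks = n / B
    if blocks <= 1:
        return 1.0
    # At least one pass over the data is always needed.
    return blocks * max(1.0, math.log(blocks, M / B))

# Hypothetical machine: memory holds 2^20 items, a block holds 2^10 items.
M, B = 2**20, 2**10
E = 2**30  # number of edges

print("scan(E):", scan_ios(E, B))
print("sort(E):", sort_ios(E, M, B))
print("one I/O per edge:", E)
```

With these made-up parameters, sort(E) is only about twice scan(E), while spending one I/O per edge costs roughly a thousand times more. That gap is why an Ω(E) algorithm is considered inefficient in this model.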
External-Memory Graph Algorithms (2)
• For MST and connected components (CC), randomized algorithms using O(sort(E)) I/Os are known.
Prim’s Algorithm
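[Figure: example graph on vertices a, b, c, d, e, f with edges (b,a) 1, (c,d) 2, (a,c) 3, (d,e) 4, (b,c) 5, (b,d) 6, (a,f) 7, (c,e) 8, (e,f) 9. Prim's algorithm is run from b with the vertices kept in a priority queue. Tree edges are selected in the order (b,a), (a,c), (c,d), (d,e), (a,f).]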
Prim’s Algorithm (2)
• Prim’s algorithm cannot be implemented efficiently in external memory:
• It is not guaranteed that even the priority queue alone fits in memory.
• Thus, we cannot in general get the current vertex priority without using an I/O.
• A direct implementation leads to an Ω(E) I/O algorithm.
Prim’s Algorithm (3)
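[Figure: the same example graph, with Prim's algorithm run from b. Tree edges are selected in the order (b,a), (a,c), (c,d), (d,e), (a,f).]
Modification: store edges in the priority queue instead of vertices.
Priority queue contents as the run proceeds, written as edge (weight):
• Visit b: b,a (1); b,c (5); b,d (6)
• Visit a: a,c (3); b,c (5); b,d (6); a,f (7)
• Visit c: c,d (2); c,b (5); b,c (5); b,d (6); a,f (7); c,e (8)
• Visit d: d,e (4); c,b (5); b,c (5); d,b (6); b,d (6); a,f (7); c,e (8)
• Visit e: c,b (5); b,c (5); b,d (6); d,b (6); a,f (7); e,c (8); c,e (8); e,f (9)
• After removing c,b and b,c: b,d (6); d,b (6); a,f (7); e,c (8); c,e (8); e,f (9)
• After removing b,d and d,b: a,f (7); e,c (8); c,e (8); e,f (9)
• Visit f: e,c (8); c,e (8); e,f (9); f,e (9)
Any two edges have distinct weights.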
Modified Prim Algorithm
• The correctness follows directly from the correctness of the original algorithm (“blue rule” still applies).
• Efficiency:
– At least one I/O per vertex in order to read its adjacency list => O(V + E/B) I/Os.
– O(E) operations on the external priority queue can be performed in O(sort(E)) I/Os.
– Thus in total we have O(V + sort(E)) I/Os.
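A minimal in-memory sketch of the edge-based variant follows. It assumes, as the slides do, that all weights are distinct. `heapq` stands in for the external priority queue and the `adj` dictionary stands in for adjacency lists on disk; both names, and the rule for skipping the arrival edge, are choices made for this illustration. The sketch never asks whether a vertex is already in the tree. Instead, if the two smallest entries in the queue have the same weight, they must be the two copies of one edge, so both endpoints are already in the tree and both copies are dropped.

```python
import heapq
from collections import defaultdict

EXAMPLE = [("b", "a", 1), ("b", "c", 5), ("b", "d", 6),
           ("a", "c", 3), ("a", "f", 7), ("c", "d", 2),
           ("c", "e", 8), ("d", "e", 4), ("e", "f", 9)]

def modified_prim(edges, source):
    """Prim with edges in the queue; weights must be pairwise distinct."""
    adj = defaultdict(list)            # stands in for adjacency lists on disk
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))

    pq = [(w, source, v) for w, v in adj[source]]
    heapq.heapify(pq)
    tree = []
    while pq:
        w, u, v = heapq.heappop(pq)
        if pq and pq[0][0] == w:
            # The twin copy (v, u) is queued too, so v is already in the tree.
            heapq.heappop(pq)
            continue
        tree.append((u, v, w))
        for w2, x in adj[v]:           # read v's adjacency list once
            if x != u:                 # skip the edge we arrived through
                heapq.heappush(pq, (w2, v, x))
    return tree

tree = modified_prim(EXAMPLE, "b")
print(tree)
```

On the example graph this selects (b,a), (a,c), (c,d), (d,e), (a,f), in that order.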
Boruvka’s Algorithm
[Figure: the example graph. In the first round the selected edges are (b,a), (c,d), (d,e), (a,f).]
(1) Select for each vertex the minimum weight edge adjacent to it.
(2) Contract the graph and return to (1)
Boruvka’s Algorithm
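[Figure: after the first contraction the graph has two supervertices, abf and cde, joined by parallel edges of weights 3, 5, 6, 9. MST edges: (b,a), (a,c), (c,d), (d,e), (a,f).]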
(1) Select for each vertex the minimum weight edge adjacent to it.
(2) Contract the graph and return to (1)
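The two steps can be sketched in memory as follows. This is an illustration under simplifying assumptions, not the external-memory procedure: contraction is simulated by relabeling a `comp` dictionary (a name chosen here), and the weights are assumed distinct so that the selected edges never form a cycle.

```python
EXAMPLE = [("b", "a", 1), ("b", "c", 5), ("b", "d", 6),
           ("a", "c", 3), ("a", "f", 7), ("c", "d", 2),
           ("c", "e", 8), ("d", "e", 4), ("e", "f", 9)]

def boruvka(vertices, edges):
    comp = {v: v for v in vertices}    # current supervertex of each vertex
    mst = set()
    while True:
        # (1) Lightest edge leaving each supervertex.
        best = {}
        for u, v, w in edges:
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue               # self edge of the contracted graph
            for c in (cu, cv):
                if c not in best or w < best[c][2]:
                    best[c] = (u, v, w)
        if not best:
            return sorted(mst, key=lambda e: e[2])
        # (2) Contract: merge the supervertices joined by selected edges.
        for u, v, w in best.values():
            cu, cv = comp[u], comp[v]
            if cu != cv:
                mst.add((u, v, w))
                for x in comp:
                    if comp[x] == cv:
                        comp[x] = cu

mst = boruvka("abcdef", EXAMPLE)
print(mst)
```

On the example graph the first round selects the edges of weight 1, 2, 4 and 7, and the second round adds the edge of weight 3.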
External-Memory Boruvka’s Step
• For each vertex v, let C(v) be the neighbor joined to v by the lightest edge incident to v.
• Let G’ be the graph obtained by taking only edges of the form (v, C(v)) for each v.
• Let G’d be the graph obtained by directing each edge (v, C(v)) in G’ from C(v) to v.
• The goal is to contract each connected component in G’ into a single vertex.
Unique Representatives
In each connected component of G’d:
• Each vertex has indegree 1.
• The weight of the edges along any root-leaf path is increasing.
• There is exactly one cycle: the two-cycle formed by the minimum-weight edge of the component, which is selected by both of its endpoints.
External-Memory Boruvka’s Step (2)
• The roots can be easily identified, and we can choose them to be the unique representatives of the components in G’.
• We would like to replace each edge (u, v) with an edge (ur, vr), where ur and vr are the unique representatives of the components containing u and v respectively.
• Then, we can remove parallel & self edges, and obtain the contracted graph.
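As a small illustration of how the roots can be identified, the sketch below computes C(v) with one pass over the edges and then looks for pairs with C(C(v)) = v. Those pairs are exactly the two-cycles formed by each component's lightest edge. Which endpoint of a two-cycle becomes the root is arbitrary. This sketch picks the smaller label, which is an assumption made here and need not match the choice made in the figures.

```python
EXAMPLE = [("b", "a", 1), ("b", "c", 5), ("b", "d", 6),
           ("a", "c", 3), ("a", "f", 7), ("c", "d", 2),
           ("c", "e", 8), ("d", "e", 4), ("e", "f", 9)]

def lightest_neighbors(edges):
    """Return C with C[v] = (weight, neighbor) of the lightest edge at v."""
    C = {}
    for u, v, w in edges:
        for x, y in ((u, v), (v, u)):
            if x not in C or w < C[x][0]:
                C[x] = (w, y)
    return C

def find_roots(C):
    """One root per component: break each two-cycle at its smaller endpoint."""
    return sorted(v for v, (_, u) in C.items() if C[u][1] == v and v < u)

C = lightest_neighbors(EXAMPLE)
component_roots = find_roots(C)
print(C)
print(component_roots)
```

For the example graph the two-cycles are a-b (weight 1) and c-d (weight 2), so there are two components and two roots.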
External-Memory Boruvka’s Step (3)
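[Figure: the example graph G, the selected-edge graph G’, and its directed version G’d, with roots b and c.]
L (each entry is an edge followed by the edges leaving its head, sorted by the weight of the first edge):
(b,a) (1); (a,f) (7)
(c,d) (2); (d,e) (4)
(d,e) (4)
(a,f) (7)
Priority Queue: initialized with each vertex that is an immediate successor of a root vertex. Entries are written as vertex (weight) [representative]:
• Initially: a (1) [b]; d (2) [c]
• After extracting a: d (2) [c]; f (7) [b]
• After extracting d: e (4) [c]; f (7) [b]
Output (vertex → representative):
b → b, c → c, a → b, d → c, e → c, f → b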
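The interplay between L and the priority queue can be sketched as below. Because weights increase along every root-leaf path, the extract-min operations return the tree edges in increasing weight order. That is the same order in which L is sorted, so L is read in a single sequential scan. The function name and the tuple layouts are assumptions for this illustration, and `heapq` stands in for an external priority queue.

```python
import heapq

def propagate_representatives(tree_edges, root_list):
    """tree_edges: (parent, child, weight) edges of G'_d after breaking cycles."""
    children = {}
    for p, c, w in tree_edges:
        children.setdefault(p, []).append((w, c))
    # L: one entry per edge, followed by the edges leaving its head.
    L = sorted((w, c, children.get(c, [])) for _, c, w in tree_edges)
    rep = {r: r for r in root_list}
    # Queue entries are (weight, vertex, representative).
    pq = [(w, c, r) for r in root_list for w, c in children.get(r, [])]
    heapq.heapify(pq)
    for w, c, out_edges in L:          # single sequential scan of L
        pw, v, r = heapq.heappop(pq)
        assert (pw, v) == (w, c)       # extraction order matches the order of L
        rep[v] = r
        for w2, x in out_edges:        # successors inherit the representative
            heapq.heappush(pq, (w2, x, r))
    return rep

# Directed tree edges of the example, with roots b and c as in the figure.
TREE = [("b", "a", 1), ("a", "f", 7), ("c", "d", 2), ("d", "e", 4)]
rep = propagate_representatives(TREE, ["b", "c"])
print(rep)
```

The result assigns a and f to b, and d and e to c, in agreement with the output listed in the figure.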
External-Memory Boruvka’s Step (4)
To finish the contraction:
1. Sort the output of the previous phase and E by the first component. Then scan the two lists simultaneously, replacing each edge (v, u) in E with (vr, u).
2. Sort E by the second component and scan it together with the output (still sorted by vertex), replacing each edge (vr, u) in E with (vr, ur).
3. Sort E by both components and by weight, and with a single scan remove self edges and duplicate edges, keeping the lightest edge of each parallel group.
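The three sort-and-scan passes can be mimicked in memory as follows. This is a sketch under assumptions: `relabel` and `contract` are names invented here, Python's `sorted` stands in for external sorting, and the representative map is the one from the running example. A full implementation would also carry each edge's original endpoints along, so that MST edges can be reported in terms of G; that is omitted here.

```python
EXAMPLE = [("b", "a", 1), ("b", "c", 5), ("b", "d", 6),
           ("a", "c", 3), ("a", "f", 7), ("c", "d", 2),
           ("c", "e", 8), ("d", "e", 4), ("e", "f", 9)]
REP = {"a": "b", "b": "b", "f": "b", "c": "c", "d": "c", "e": "c"}

def relabel(edges, reps, pos):
    """Sort edges by endpoint `pos`, then replace it in one simultaneous scan."""
    out, i = [], 0
    for e in sorted(edges, key=lambda e: e[pos]):
        while reps[i][0] != e[pos]:    # advance in the representative list
            i += 1
        e = list(e)
        e[pos] = reps[i][1]
        out.append(tuple(e))
    return out

def contract(edges, rep):
    reps = sorted(rep.items())         # (vertex, representative), by vertex
    edges = relabel(relabel(edges, reps, 0), reps, 1)
    # Drop self edges and orient the rest so parallel edges sort together.
    edges = sorted((min(u, v), max(u, v), w) for u, v, w in edges if u != v)
    out = []
    for u, v, w in edges:              # the lightest copy of each pair is first
        if not out or out[-1][:2] != (u, v):
            out.append((u, v, w))
    return out

contracted = contract(EXAMPLE, REP)
print(contracted)
```

On the example, the four parallel edges between the two supervertices (weights 3, 5, 6, 9) collapse to the single edge of weight 3.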
Boruvka’s Step - I/O efficiency
1. Lightest incident edges can be collected in O(E/B) I/Os in a simple scan of the edge-list representation of G (we assume it is sorted).
2. Detection of cycles in G’d can be done in O(sort(V)) I/Os:
• sort the collected edges by weight and find duplicates in a single scan.
• remove edges to break cycles and identify unique representatives.
Boruvka’s Step - I/O efficiency (2)
3. The list L contains each edge in G’d at most twice, and can be constructed in O(sort(V)) I/Os:
• sort one instance of the list of edges by the second component.
• sort another instance by the first component.
• create the structure of L in a single scan and sort it by weight.
4. The PQ can be initialized in a similar way in O(sort(V)) I/Os.
Boruvka’s Step - I/O efficiency (3)
5. We perform a total of V insertions to PQ, and V extract-min operations. That can be performed in O(sort(V)) I/Os.
6. Replacing the edges of G with the unique representatives is done using a few sorting and scanning operations as described before. Here the entire edge list is sorted, and thus O(sort(E)) I/Os are needed.
Total:
O(E/B + sort(V) + sort(E)) = O(sort(E)) I/Os.
Results So Far
• Modified Prim: O(V + sort(E)) I/Os.
• Modified Boruvka: O(sort(E) · lg V) I/Os.
• Combined algorithm: O(sort(E) · lg(V·B/E)) I/Os.
  1. Contract G until V ≤ E/B using Boruvka’s steps.
  2. Run Prim on the result.
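The bound for the combined algorithm follows from a short calculation. It relies on the standard fact that a Boruvka step at least halves the number of vertices, since every vertex is merged with at least one neighbor. The step count k is a symbol introduced here for the derivation.

```latex
\begin{align*}
\frac{V}{2^{k}} \le \frac{E}{B}
  \;&\Longleftrightarrow\; k \ge \lg\frac{V \cdot B}{E}, \\
\text{cost of the Boruvka steps} &= O\!\left(\mathrm{sort}(E) \cdot \lg\frac{V \cdot B}{E}\right), \\
\text{cost of Prim on the result} &= O\!\left(\frac{E}{B} + \mathrm{sort}(E)\right) = O(\mathrm{sort}(E)).
\end{align*}
```

The first term dominates, which gives the stated total.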
It is possible to perform lg(V·B/E) Boruvka’s steps using lglg(V·B/E) superphases requiring O(sort(E)) I/Os each.
Yet a better MST algorithm
Superphase Algorithm
At superphase $i$:
• Let $N_i = 2^{(3/2)^i}$, so that $N_{i+1} = N_i \cdot \sqrt{N_i}$.
• Let $G_i = (V_i, E_i)$ be the graph prior to superphase $i$.
• Let $E_i' \subseteq E_i$ be the set that contains, for each vertex, the $\sqrt{N_i}$ lightest edges incident to it.
• Let the blocking value of a vertex be the weight of the $(\sqrt{N_i} + 1)$-th lightest edge incident to it (or infinity if no such edge exists).
• $E_i'$ and the blocking values can be found with $O(\mathrm{sort}(E_i))$ I/Os as described earlier.
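A minimal sketch of this preprocessing step follows. It sorts the incidence list by (vertex, weight) and scans it once. The parameter `k` stands for the number of edges kept per vertex (the square root of N_i on the slides); how that value is rounded to an integer is not specified in the source, so it is passed in directly. The function name is likewise an assumption.

```python
import math

EXAMPLE = [("b", "a", 1), ("b", "c", 5), ("b", "d", 6),
           ("a", "c", 3), ("a", "f", 7), ("c", "d", 2),
           ("c", "e", 8), ("d", "e", 4), ("e", "f", 9)]

def light_edges(edges, k):
    """Keep the k lightest edges at each vertex; return (E_prime, blocking)."""
    # One record per (vertex, incident edge), sorted by vertex, then weight.
    incident = sorted((x, w, u, v) for u, v, w in edges for x in (u, v))
    E_prime, blocking, seen = set(), {}, {}
    for x, w, u, v in incident:        # one scan of the sorted incidence list
        seen[x] = seen.get(x, 0) + 1
        if seen[x] <= k:
            E_prime.add((u, v, w))
            blocking.setdefault(x, math.inf)
        elif seen[x] == k + 1:
            blocking[x] = w            # weight of the (k+1)-th lightest edge
    return E_prime, blocking

E_prime, blocking = light_edges(EXAMPLE, k=2)
print(sorted(E_prime, key=lambda e: e[2]))
print(blocking)
```

With k = 2 on the example graph, only the edge (b,d) of weight 6 is left out, and f has blocking value infinity because it has just two incident edges.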
Superphase Algorithm
• At superphase $i$, perform $\log \sqrt{N_i}$ contraction phases on $G_i'$ as described before, but now select the lightest edge incident to a vertex only if its weight is smaller than the vertex's blocking value.
• After a single contraction, the blocking value of a supervertex is set to the minimum of the blocking values of the contracted vertices.
• After that, the remaining edges of $E_i'$ contain all edges of $E_i$ incident to a supervertex $v$ with weight smaller than the blocking value of $v$.
• Thus only edges that actually belong to the MST are contracted.
Superphase Algorithm (2)
But how many vertices remain after each superphase?
• The blocking value might prevent us from selecting an edge for $v$. But if so, then:
  • The blocking value of $v$ is the blocking value of some vertex $u$ in $V_i$, and $v$ must contain both endpoints of each of the $\sqrt{N_i}$ edges incident to $u$ in $E_i'$.
  • Thus $v$ must be the contraction of at least $\sqrt{N_i}$ vertices from $V_i$.
• If no blocking value prevents us from selecting an edge for $v$, then after $\log \sqrt{N_i}$ phases, $v$ must be the contraction of at least $2^{\log \sqrt{N_i}} = \sqrt{N_i}$ vertices.
Superphase Algorithm (3)
• It can be proved by induction on $i$ that $V_i \le 2V / N_i$:
  • For $i = 0$, $N_0 = 2$ and $V_0 = V$.
  • $V_{i+1} \le V_i / \sqrt{N_i} \le (2V / N_i) / \sqrt{N_i} = 2V / N_{i+1}$
• Conclusion: $E_i' \le V_i \sqrt{N_i} \le 2V / \sqrt{N_i}$
• Thus, in order to reduce the number of vertices by a factor of $\sqrt{N_i}$ we used so far:
$O(\mathrm{sort}(E_i) + \mathrm{sort}(E_i') \cdot \log \sqrt{N_i}) = O(\mathrm{sort}(E) + \mathrm{sort}(V / \sqrt{N_i}) \cdot \log \sqrt{N_i}) = O(\mathrm{sort}(E))$ I/Os.
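The last equality compresses a few observations. One way to spell them out, assuming the standard bound sort(n) = Θ((n/B) log_{M/B}(n/B)), which the slides use but do not state, is:

```latex
\begin{align*}
\mathrm{sort}\!\left(\frac{V}{\sqrt{N_i}}\right) \cdot \log \sqrt{N_i}
  &\le \mathrm{sort}(V) \cdot \frac{\log \sqrt{N_i}}{\sqrt{N_i}}
  && \text{because } \mathrm{sort}(n/x) \le \mathrm{sort}(n)/x \text{ for } x \ge 1 \\
  &\le \mathrm{sort}(V)
  && \text{because } \log x \le x \text{ for } x \ge 1 \\
  &= O(\mathrm{sort}(E))
  && \text{because } G \text{ is connected, so } V \le E + 1 .
\end{align*}
```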
Superphase Algorithm (4)
• In order to finish a superphase, we need to reincorporate into the graph the edges of $E_i$ that were not selected for $E_i'$:
  • During the contraction phases, maintain a list $C$ of pairs $(v, v_s)$ for $v \in V_i$, where $v_s$ is the supervertex that currently contains $v$.
  • Use the output of the Boruvka’s step, as described earlier, in order to update $C$: sort $C$ by second component and the output by first component, and scan them simultaneously.
  • This is done using $O(\mathrm{sort}(V_i))$ I/Os.
• In total, in order to maintain $C$, we use:
$O(\mathrm{sort}(V_i) \cdot \log \sqrt{N_i}) = O(\mathrm{sort}(V / N_i) \cdot \log \sqrt{N_i}) = O(\mathrm{sort}(V))$ I/Os.
Superphase Algorithm – I/O Efficiency
1. $E_i'$ and blocking values are computed in $O(\mathrm{sort}(E_i))$ I/Os.
2. Each superphase takes up $O(\mathrm{sort}(E))$ I/Os.
3. Maintaining the list $C$ during the superphase is done with $O(\mathrm{sort}(V))$ I/Os.
4. Given $C$, the edges in $E_i \setminus E_i'$ can be reincorporated in $O(\mathrm{sort}(E))$ I/Os as we did in the single contraction algorithm.
5. Finally, in order to reduce $V$ to $E/B$, $\log_{3/2} \lg(V \cdot B / E)$ superphases are needed.
6. Total: $O(\mathrm{sort}(E) \cdot \lg \lg(V \cdot B / E))$ I/Os.
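To get a feel for how quickly the doubly exponential sequence grows, the sketch below counts superphases until the bound 2V / N_i from the induction drops to E/B. The stopping rule, the function name and the parameter values are assumptions for illustration. Because of the constant factor 2 in the bound, the count may differ slightly from the closed-form expression above.

```python
import math

def superphases_needed(V: int, E: int, B: int) -> int:
    """Smallest i with 2V / N_i <= E / B, where N_i = 2^((3/2)^i)."""
    i = 0
    while 2 * V / 2 ** (1.5 ** i) > E / B:
        i += 1
    return i

# Hypothetical instance: 2^30 vertices, 2^32 edges, 2^10 items per block.
V, E, B = 2**30, 2**32, 2**10
needed = superphases_needed(V, E, B)
boruvka_steps = math.ceil(math.log2(V * B / E))

print("superphases:", needed)
print("plain Boruvka steps lg(V*B/E):", boruvka_steps)
```

For this made-up instance a handful of superphases suffices. Here each superphase and each plain Boruvka step costs O(sort(E)) I/Os, and the saving over plain steps grows as V·B/E grows.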