an efficient parallel algorithm for the row minima of a
TRANSCRIPT
![Page 1: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/1.jpg)
Purdue University Purdue University
Purdue e-Pubs Purdue e-Pubs
Department of Computer Science Technical Reports Department of Computer Science
1990
An Efficient Parallel Algorithm for the Row Minima of a Totally An Efficient Parallel Algorithm for the Row Minima of a Totally
Monotone Matrix Monotone Matrix
Mikhail J. Atallah Purdue University, [email protected]
S. Rao Kosaraju
Report Number: 90-959
Atallah, Mikhail J. and Kosaraju, S. Rao, "An Efficient Parallel Algorithm for the Row Minima of a Totally Monotone Matrix" (1990). Department of Computer Science Technical Reports. Paper 813. https://docs.lib.purdue.edu/cstech/813
This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information.
![Page 2: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/2.jpg)
AN EFFICIENT PARALLEL ALGORITHMFOR THE ROW MINIMA OF A
TOTALLY MONOTONE MATRlX
Mikhail J. AtaJJahS. Rna Kosaraju
CSD-TR-959February 1990
(Revised April 1991)
![Page 3: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/3.jpg)
An Efficient Parallel Algorithm for the Row Minimaof a Totally Monotone Matrix
Mikhail J. AtaUah"Purdue University
S. Rae Kosarajut
Johns Hopkins University
Abstract
We give a parallel algorithm for the problem of computing the row minima of atotally monotone two-dimensional matrix. Whereas the previous best CREW-PRAMalgorithm for this problem ran in O(log n log log n) time with O(n/ log log n) processors,our algorithm runs in O(log n) time with O(n) processors in the (weaker) EREW·PRAMmodel. Thus we simultaneously improve the time complexity without any deteriorationin the time x processors produd, even using a weaker model of parallel computation.
·Dept. of Computer Science, Purdue University, Weat La£ayette, IN 47907. This author's research wassupported by the Office of NavaJ Research under Contracts N00014·84-K-0502 and N00014-86-K-0689, theAir Force Office of Scientific Re~earch under Grant AFOSR-90-0107, Lhe National Science Foundation underGrant DCR-8451393, and the National Library of Medicine under Grant R01·LM05118.
IDept. of Computer Science, Johns Hopkins University, Baltimore, MD 21218. This author's researchwas supported by the National Science Foundation under Grant CCR-8804284 and NSF/DARPA GrantCCR·890a092.
1
![Page 4: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/4.jpg)
1 Introduction
First we review the problem, which was introduced and solved in linear sequential time in
[2]. It has myriads of applications to geometric and combinatorial problems [1,2].
For any m X n matrix A, the row minima are an m-vector 0A such that, for every row
index T (1 ::; r ::; m), OA(r) is the smallest column index c that minimizes A(r,c) (that
is, among all c's that minimize A(r,c), BA(r) is the smallest). If 0A satisfies the following
sorted property:
and if for every submatrix A' of A, (JA' also satisfies the sorted property, then matrix A is
said to be totally monotone [1, 2]. (Note: this definition may appear different from that
given in [1,2]. but it is easily seen to be equivalent, by negating all the entries of the matrix.)
Given a totally monotone matrix A, the problem of computing the (}A array is known as
that of "computing the row minima of a totally monotone matrix" [I}. The best previous
CREW-PRAM algorithm for this problem ran in O(logn log log n) time and O(nj loglogn)
processors [1] (where m = n). The main result of our paper is an EREW~PRAM algorithm
of time complexity O(logn) and processor complexity O(n). This improves on the time
complexity of [1] without any increase in the time x processors product, and using a weaker
parallel model. It also implies corresponding improvements on the parallel complexities of
the many applications of this problem (which we refrain from listing-they can be found in
[1,2]). More specifically, our bounds are as follows.
Theorem 1.1 The row minima (that is, the arTay (}A) of an m x n totally monotone matrix
A can be computed in O(logm+log n) time with O(m+ n) processors in the EREW-PRAM
model.
In fact we prove a somewhat stronger result: that an implicit description of (}A can be
computed, within the same time bound as in the above theorem, by O(n) processors. From
this implicit description, a single processor can obtain any particular (}A(r) value in O(1ogn)
time.
Section 3 gives a preliminary result that is a crucial ingredient of the scheme of Section 4:
an O(ma.x(n,logm)) time algorithm using m.in(n,logm) processors in the EREW-PRAM
model. Section 4, which contains the heart of our CREW method, uses a new kind of
sampling and pipelining, where samples are evenly spaced (and progressively finer) clusters
2
![Page 5: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/5.jpg)
of elements and where the "help" for computing the information at a node does not come
only from its children (as it did in [5]) but also from some of its subtree's leaves. Section 5
transforms the CREW algorithm of Section 4 into an EREW algorithm. This is not achieved
by using known methods such as [6], but rather relies on (i) storing each leaf solution in a
suitable parallel data structure, and (ii) re-defining the nature of the information stored at
the internal nodes.
Recall that the EREW-PRAM is the parallel model where the processors operate syn
chronously and share a common memory, but no two of them are allowed simultaneous
access to a memory cell (whether the access is for reading or for writing in that cell). The
CREW-PRAM differs from the EREW·PRAM in that simultaneous reading is allowed (but
simulateous writing is still forbidden).
2 Preliminaries
In this section we introduce some notation, terminology, and conventions.
Since the matrix A is understood, we henceforth use 0 as a shorthand for 0A. Through
out, R will denote the set of m row indices of A, and C will denote its n column indices.
To avoid cluttering the exposition, we assume that m and n are powers of two (our scheme
can easily be modified for the general case).
An interval of rows or columns is a non-empty set of contiguous (row or column) indices
[i,j] = {i, i +1,' .. ,j}. We imagine row indices to lie on a horizontal line, so that a row is
to the left of another row if and only if it has a smaller index (similarly "left of" is defined
for columns). We say that interval II is to the left of interval 12 , and 12 is to the right of II,
if the largest index of II is smaller than the smallest index of h. (Note: for rows, it might
seem more appropriate to say "above" rather than "to the left of' , but the latter js in fact
quite appropriate because we shall later map the rows to the leaves of a tree.)
Let I be a column interval, and let AI be the m X III submatrix of A consisting of the
columns of A in I. We use Of as a shorthand for OAr' That is,jfr is arowjndex, then Of(r)
denotes the smallest c E I for which A(r,c) is minimized. Note that Of(r) usually differs
from O(r), since we are minimizing only over I rather than C.
Throughout the paper, instead of storing Of directly, we shall instead store a function
1rf which is an implicit description of 0[.
Definition 2.1 For any column interval! and any column index c, 1r[(c) is the row interval
3
![Page 6: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/6.jpg)
such that, for every row index T in that interval, we have t1[(r) = c; 1I"[(c) is empty if no
such r exists.
Note that the monotonicity of A implies that, if Cl < C2, then 'Il"[(ct} is to the left of
IT/(C2).
Note that each 'Il"/(c) can be stored in 0(1) space, since we need only store the beginning
and end of that row interval. Throughout the paper, we shall use 11"/ as an implicit descrip
tion of 8/. The advantage of doing so is that we use 0(1/1) storage instead of the O(lRI)
that would be needed for explicitly storing 8/. The disadvantage is that , given a row index
T, a processor needs to binary search in the 'Il"/ array for the position of row index r in order
to determine 6'/(r). Had we stored directly e[, 6'[(r) would be readily available in constant
time. From now on, we consider our problem to be that of computing the 1I"c array. Once
we have 1I"C, it is easy to do a postprocessing computation that obtains (explicitly) 9 from
1I"c with m processors: each column c gets assigned 111"c(c) I processors which set 9(r) = c
for every T E 1I"c(c). Therefore Theorem 1.1 would easily follow if we can establish the
following.
Theorem 2.1 1I"c can be computed in O(logm + logn) time and O(n) processors in the
EREW-PRAM model.
The rest of this paper proves the above theorem.
Definition 2.2 The 8-sample of the set R of row indices is obtained by choosing every 8-th
element of R (i.e., every row index which is a multiple of s). For example, the 4-sample of
R is (4,8,··· ,m). For k E [O,logm], let R, denote the (m/2')-sample of R.
For e.xample, Ro = (m),
R, = (m/8,m/4,3m/8,m/2,5m/8,3m/4, 7m/8,m),
and R!ogm = (1,2,"',m) = R.
Nole lhal IR,I = 2' = 2IR,_d.
3 A log m Processor Algorithm
This section gives an algorithm which is needed as an important ingredient in the algorithm
of the next section. It has the feature that its complexity bounds depend on the number of
columns in a stronger way than on the number of rows.
4
![Page 7: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/7.jpg)
Lemma 3.1 The trc array can be computed in O(max(n,logm)) time with min(n,logm)
processors in ihe EREW-PRAM model.
The bounds of the above lemma might look unappealing at first sight, but their signifi
cance lies in the fact that m can be much larger than n. In fact we shall use this lemma, in
the next section, on problems of size m X (log n). The rest of this sedion proves the lemma.
Simple-minded approaches like "use one processor to binary search for trc(c) in parallel
for each c E G" do not work, the difficulty being that we do not know which trc{c)'s are
empty. In fact, if we knew which trc(c)'s are empty then we could easily achieve O(log m)
time with n processors (by using the above-mentioned straightforward binary search-e.g.,
binary search for the right endpoint of trc( c) by doing log m comparisons involving the two
columns c and c' , where d is the nearest column to the right of c having a nonempty trc(c')).
We shall compute the trc array by computing two arrays,LeftExtend and RightExtend,
whose significance is as follows.
Definition 3.1 For any column c, let LeftExtend(c) be the left endpoint of row interval
tr[l,c](C). That is, LeftExtend(c) is the minimum row index r such that, for any d < c,
A(r,c) < A(r,c'). Let RightExtend(c) be ihe right endpoint of row interval trlc,n)(c). That
is, RightExtend(c) is the maximum row index r such that, for any d > c, A(r, c) ~ A(r, c').
Intuitively, LeftExtend(c) measures how far to the left (in R) column c can "extend its
influence ll if the only competition to it came from columns to its left. The intuition for
RightExtend(c) is analogous, with the roles of "left" and "right" being interchanged. Note
that LeftExtend(c) (resp., RightExtend(c)) might be undefined, which we denote by set·
ting it equal to the nonexistent row m + 1 (resp., 0).
Once we have the RightExtend and LeftExtend arrays, it is easy to obtain the trc array,
as follows. If either LeftExtend(c) =m+ 1 or RightExtend(c) =0 then obviously trc( c) is
empty. Otherwise we distinguish two cases: (i) if RightExtend(c) < LeftExtend(c) then
trc(c) is empty, and (ii) if LeftExtend(c) ~ RightExtend(c) then interval trc(c) is not
empty and has LeftExtend(c) and RightExtend(c) as its two endpoints. Hence it suffices
to compute the RightExtend and LeftExtend arrays. The rest of this section explains how
to compute the LeftExtend array (the computation of RightExtend is symmetrical and is
therefore omitted). Furthermore, for the sake of definiteness, we shall describe our scheme
assuming n ~ log m (it will be easy for the reader to see that it also works if n < log m).
5
![Page 8: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/8.jpg)
Thus we have min{n,logm} = logm processors and wish to achieve O(max{n,logm}) =
O(n) time performance. We shall show how to use the logm processors to compute the
LeftExtend array in n + logm (= O(n)) time steps. To simplify the presentation we
assume, without loss of generality, that A(m, 1) > A(m,2) > ... > A(m,n) (one can always
add a "dummy" last row to A in order to make this hold-obviously this does not destroy
the monotonicity of A). Th.is assumption simplifies the presentation because it causes every
LeftExtend(c) to be ~ m (i.e., it is defined).
We first give a rough overview of our scheme. Imagine that the row indices Rare
organized as a complete binary search tree TR: the leaves contain R sorted by increalling
order, and each internal node 11 contains the row index r of the largest leaf in the subtree of
11 's left child (in which Calle we can simply refer to 11 as "internal node r" rather than the more
cumbersome "the internal node that contains T"). Note that a row index T < m appears
exactly twice in TR: once at a leaf, and once at an internal node (m appears only once, as
the rightmost leaf). Having only logm processors, we clearly cannot afford to build all of
Tn. Instead, we shall build a portion of it, starting from the root and expanding downwards
along n root-to-Ieaf paths PI,'" I Pn. Path Pc is in charge of computing LeftExtend(c),
and does so by performing a binary search for it as it traces a root-to-leaf path in the binary
search tree Tn. If path Pc exits at leaf T then LeftExtend(c) = T. The tracing of all the
Pi'S is done in a total of n+logm time steps. Path Pc is inactive until time step c, at which
time it gets assigned one processor and begins at the root, and at each subsequent step it
makes one move down the tree, until it exits at some leaf at step c + log m. Clearly there
are at most logm paths that are simultaneously active, so that we use logm processors.
At time step n + logm the last path (PI1 ) exits a leaf and the computation terminates. If,
at a certain time step, path Pc wants to go down to a node of Tn not traced earlier by a
PCI (c' < c), then its processor builds that node of Tn (we use a pointer representation for
the traced portion of Tn- we must avoid indexing since we cannot afford using m space).
During path Pc's root-to-Ieaf trip, we shall maintain the property that, when Pc is at node
r of Tn, LeftExtend(c) is guaranteed to be one of the leaves in r's subtree (this property
is obviously true when Pc is at the root, and we shall soon show how to maintainlt as Pc
goes from a node to one of its children). It is because of tltis property that the completion
of a Pc (when it exits from a leaf of Tn) corresponds to the end of the computation of
LeftExtend(c).
6
![Page 9: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/9.jpg)
The above overview implies that at each time step, the lowermost nodes of the logm
active paths are "staggered" along the levels of TR in that they are at levels 1,2,· .. ,log m
respectively. Hence at each time step, at most one processor is active at each level of Tn,
and the computation of exactly one LeftExtend(e) gets completed (at one of the leaves).
We have omitted a crucial detail in the above overview: what additional information
should be stored in the traced portion of Tn in order to aid the downward tracing of each
Pc (such information is needed for Pc to know whether to branch left or right on its trip
down). The idea is for each path to leave behind it a trail of information that will help
subsequent paths. Note that LeftExtend(e) depends upon columns l,2,···,c -1 and is
independent of columns c+ 1,"',n. Before specifying the additional information, we need
some definitions.
Definition 3.2 Let c and r! be column indices, r be a row index. We say thai c is better
than c' for T (denoted by c <r c') iff one of the following holds: (i) A(r,c) < A(r,d), or
(ii) A(T,C) = A(T,c') and c < c'.
Note that for any columns c,f! and row T, we must have either c <r c: or c' <r c.
When the algorithm compares A(r,e) to A(r,e') in order to determine whether C <r e' or
e' <r c, it is useful if the reader thinks of such a comparison as a competition between c
and c' for r: e wins the competition over e' if the outcome is e <r c', otherwise it loses T to
c'. When a path Pc is at an internal node r, it competes for r with the best column for r
among the columns in [l,c -1] (that is, it competes with 8[1,c_lj(r». If c beats 8[l,c_Ij(r)
in this competition for r, then Pc obviously branches down to the left ehild of r in Tn (its
LeftExtend(c) is certainly not greater than r). Otherwise it branches to the right child of
T. However, the above assumes that the 8[1.c_lj(r) values are available when needed. We
must now make sure that, when Pc enters node r, it can easily (i.e., in constant time) obtain
8[1,c_lj(r). Note that we can neither maintain the needed 8[1,C_I)(r) at r, nor can we carry
it down with Pc on its downward trip (it is not hard to see that either one of these two
approaches runs into trouble). Instead, we shall use a judicious combination of both: some
information is maintained locally in r, some is carried along by Pc. When Pc enters r, Pc
combines the information it is carrying, with the information in T, to obtain in constant
time 8[l,c_l](T). This is made more precise below.
Each internal node r' that has been already visited by a path contains, in a register
label(r'), the best c' for T' among the subset of columns whose path went through r' (hence
7
![Page 10: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/10.jpg)
label( r') is empty if no path visited r' so far).
When Pc enters internal node r of TR' it carries with it, in a register rival(c), the largest
cl such that e' < c and Le/tExtend(c' ) is smaller than the leftmost leaf in the subtree of r
(hence rival(c) is empty when Pc is at the root).
Note that the current rival(e) and label(r) allow Pc to obtain 8[1,e_1j(r) in constant time,
as follows. Recall that 8[1,e_1j(r) is the best for r (Le., smallest under the <r relationship)
among the columns in [1, e -1]. Now, view [1, e - 1] as being partitioned into three subsets:
the subset Sl (resp., S3) consisting of the columns c! whose Le/tExtend(c') is smaller
(resp., larger) than the leftmost (resp., rightmost) leaf in the subtree of r, and the subset
52 of the columns cl whose Le/tExtend(c!) is in the subtree of r. Now, no column in 53
can be 8(1,e_1j(r) (by the definition of S3). The best for r in S2 is label(r) (by definition).
The best for r in 51 is rival(c), by the following argument. Suppose to the contrary
that there is a c" E Sl, c" < rival(c), such that e" is better for r than rival(c). A
contradiction with the monotonicity of A is obtained by observing that we now have: (i)
elf < rival(c), (ii) Le/tExtend(rival(c)) < r, and (iii) c" is better than rival(e) for r but not
for Le/tExtend(rival(c)). Hence rival(e) must be the best for r in 51. Hence 8(1,e_lj(r)
is one of {rival(c),label(r)}, which can be obtained in constant time from rival(c) and
label(r), both of which are available (Pe carried rival(c) down with it when it entered r,
and r itself maintained label(r) during the previous time step).
The main problem that remains is how to update, in constant time, the rival(c) and the
label(r) registers when Pe goes from r down to one of r's two children. We explain below
how Pc updates its rival(c) register, and how r updates itslabel(r) register. (We need not
worry about updating the label( r') of a row r' not currently being visited by a Pc" since
such it label(r') remains by definition unchanged.)
By its very definition, label(r) depends on all the paths Pc' (c' < c) that previously
went through r. Sjnce Pe has just visited r, we need to make c compete, for row r, with the
previous value of label(r): jf c wjns then label(r) becomes c, otherwise it remains unchanged.
The updating of rival(c) depends upon one of the following two cases.
The first case is when c won the compeHtion at r, i.e., Pe has moved to the left child
of r (call it rl). In that case by its very definition rival(c) remains unchanged (since the
leftmost leaf in the subtree of r' is the same as the leftmost leaf in the subtree of r).
The second case is when c lost the competition at r, i.e., Pc has moved to the right child
of r (call it r"). In that case we claim that it suffices to compare the old label(r) to the
8
![Page 11: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/11.jpg)
old rival(c): the one which is better for r" is the new value of rival(c). We now prove the
claim. Let rl (resp., r2) be the leftmost leaf in the subtree of r (resp., r"). Let C' (resp'l Gil)
be the set of paths consisting of the columns (J such that (J < c and LeftExtend(fJ) < rl
(resp., LeftExtend(fJ) < f2). By definition, the old (resp., new) value of rival(c) is the
largest column index in C 1 (resp., Gil). The claim would follow if we can prove that the
largest column index in Gil - G' is the old label(r) (by "old label(r)" we mean its value
before updating, Le., its value when Pc first entered f). We prove this by contradiction:
let c denote the old labe.l(r), and suppose that there is a '/ E Gil - G' such that '/ > c.Since both '/ and care in Gil - C1
, their respective paths went from r to the left child of r.
However, P"f did so laler than Pc (because '/ > c). This in turn implies that '/ is better for
r than c, contradicting the fact that cis the old label(r). This proves the claim.
Concerning the implementation of the above scheme, the assignment of processors to
their tasks is trivial: each active Pc carries with it its own processor, and when it exlts from
a leaf it releases that processor which gets assigned to Pc+logm which is just beginning at
the root of TR.
4 The Tree-Based Algorithm
This section builds on the algorithm of the previous section and establishes the CREW
version of Theorem 2.1 (the next section will extend it to EREW). The sampling and
iterative refinement methods used in this section are reminiscent of the cascading divide
and-conquer technique [5, 4], and are similar to those used in [3] for solving in parallel the
string editing problem. In fact, although [3] deals with a different problem, it implicitly
contains a solution to our problem whose complexity is also O(1ogn) time but with a
disappointing O(nlogn) processors. As in [3], it is useful to think of the computation
as progressing through the nodes of a tree T which we now proceed to define (it differs
substantially from [3)).
Partition the column indices into nJ log n adjacent intervals II,' .. ,!n/logn of size log n
each. Call each such interval Ii a fat column. Imagine a complete binary tree T on top
of these fat columns, and associate with each node v of this tree a fat interval I(v) (i.e.,
an interval of fat columns) in the following way: the fat interval associated with a leaf is
simply the fat column corresponding to it, and the fat interval associated with an internal
node is the union of the two fat intervals of its children. Thus a node v at height h has a fat
9
![Page 12: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/12.jpg)
interval I(v) consisting of II(v)1 = 2h fat columns. The storage representation we use for
a fat interval I(v) is a list containing the indices of the fat columns in it; we also call that
list I(v), in order to avoid introducing extra notation. For example, if v is the left child of
the root, then the I(v) array contains (1,2,···,nj(2Iogn». Observe that LueTII(v)1 =
O(ITllogITI) = O«n/logn) log n) = O(n).
The ultimate goal is to compute 1I"0(c) for every c E C.
Let leaf problem Ii be the problem of computing 1I"r,(c) for all c E Ii. Thus a leaf problem
is a subproblem of size m X logn. From Lemma 3.1 it follows that a leaf problem can be
solved in O(log n + log m) time by min{log n,log m} (~ log n) processors. Since there are
njlogn leaf problems, they can be solved in O(logn + logm) time by n processors. We
assume that this has already been done, i.e., that we know the 1I"Ii array for each leaf
problem Ii. The rest of this section shows that an additional O(logn + logm) time with
O(n) processors is enough for obtaining 11"0·
Definition 4.1 Let J(v) be the interval of original columns tllat belong to fat intervals in
l(v) (hence IJ(v)1 = II(v)1 ·logn ). Fo' every vET, lot column I E ltv), and subset
R' of R, let ¢u(R',j) be the internal in R' such that, for every r in that interval, 8J(u)(r)
is a column in fat column f. We use ''¢u(R' ,.)" as a shorthand for 'VJv(R',j) for all
IE l(v))".
We henceforth focus on the computation of the ¢roo!(T)(R,.) array, where root(T) is the
root node of T. Once we have the array ¢root(T)(R,*), it is easy to compute the required
11"0 array within the prescribed complexity bounds: for each fat column f, we replace the
¢root(T)(R,j) row interval by its intersection with the row intervals in the'1rIJ array (which
are already available at the leaf If of T). The rest of this section proves the following
lemma.
Lemma 4.1 tPu(R, *) for every VET can be computed by a CREW·PRAM in timeO(height(T)+
log m) and O(n) processors, where height(T) is the height of T (= O(log n)).
Proof. Since LveT(II(v )I+log n) = O(n), we have enough processors to assign 11(v)l+log n
of them to each vET. The computation proceeds in log m +height(T) - 1 stages, each
of which takes constant time. Each vET will compute Wu(R' ,*) for progressively larger
subsets R' of R, subsets R' that double in size from one stage to the next of the computation.
10
![Page 13: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/13.jpg)
We now state precisely what these subsets are. Recall that 14 denotes the (m/2i )-sample
of R, 00 that 1&1 = 2'.
At the t-th stage of the algorithm, a node v of height h in T will use its II(v)1 + log n
processors to compute, in constant time, ¢tJ(RI_h, *) if h $ t $ h+log m. It does 50 with the
help of information from ¢t/(R1-1-h, *), ¢Le/tChild{tJ)(Rt-h, *), and ¢RightChild{t/)(Rt-h, *),
all of which are available from the previous stage t -1 (note that (t - 1) - (h - 1) = t - h).
If t < h or t > h + log m then node v does nothing during stage t. Thus before stage h the
node v lies "dormant", then at stage t = h it first "wakes up" and computes ¢tJ(14J,*), then
at the next stage t = h + 1 it computes WI/(Rt, *), etc. At stage t = h + logm it computes
wt/(R!ogm,*), after which it is done.
The details of what information v stores and how it uses its lIe v)1 + log n processors to
perform stage t in constant time are given below. In the description, tree nodes u and w
are the left and right child, respectively, of v in T.
After stage t, node v (of height h) contains wt/(RI-h, *) and a quantity CriticaltJ(Rt_h)
whose significance is as follows.
Definition 4.2 Let R' be any subset of R. Criticall/(R') is the largest r E R' that is
contained in WI/(R',I) for some f E I(u); if there is no such r then Criticalt/(R') = O.
The monotonicity of A implies that for every r' < Criticalt/(R') (resp., r ' > Criticalt/(R')),
r' is contained in wt/(R',I) for some f E I(u) (resp., f E lew)).
We now explain how v performs stage t, i.e., how it obtains CriticaltJ(Rt_h) and
¢t/(Rt_h,*) using 1/Ju(RI-h,*), Ww(RI-h,*), and CriticaltJ(R1_1_h) (all three of which were
computed in the previous stage t - 1). The fact that the II(v)1 + logn processors can do
this in constant time is based on the following observations, whose correctness follows from
the definitions.
Observation 4.1 1. Criticalt/(Rl_h) is either the same as CriticaluCRt_1_h), or the
successor of Criticalv(Rt_l_h) in Rt-h'
2. Iff E I(u), then wtJ(RI-h, J) is the portion of intervalWu(Rt_h, I) that is $ Criticall/(Rt_h).
3. Iff E I(w), then ¢t/(RI_h, I) is the portion of interval Ww(Rt-h, J) that is > CriticaltJ(Rt_h)'
The algorithmic implications of the above observations are discussed next.
11
![Page 14: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/14.jpg)
4.1 Computing Critical.(R'_h).
Relationship (1) of Observation 4.1 implies that, in order to compute Criticalv(RI_h), all v
has to do is determine which of Criticalv(R1_1_h) or its successor in RC_h is the correct value
of CriticaliRt_h). This is done as follows. If Critica[u(Rt_1_h) has no successor in RC-h
then Critical l1(Rt_l_h) = m (the last row) and hence Criticalv(Rt_h) = Criticalv(Rt_1_h).
Otherwise the updating is done in the following two steps. For conciseness, let T denote
Criticalv(Rt_1_h), and let s denote the successor of T in Rt_h.
• The first step is to compute 8J (u)(s) and 8J(w)(s) in constant time. This involves
a search in leu) (resp., lew)) for the fat column f' E leu) (resp., r E lew»~ whose
1/J..(Rt-h,J') (resp., tPw(Rt-h,I"» contains s. These two searches in I(u) and I(w) are
done in constant time with the II(v)l processors available. We explain how the search
for f' in I(u) is done (that for 1" in I(w) is similar and omitted). Node v assigns a
processor to each I E I(u), and that processor tests whether s is in tPu(Rt-Ii,J)i the
answer is "yes" for exactly one of those II(u)l processors and thus can be collected
in constant time. Next, v determines 9J (u)(s) and 9J (w)(3) in constant time by using
logn processors to search for s in constant time in the leaf solutions 'TrII, and'TrI
J"
available at leaves If and pi, respectively. If the outcome of the search for s in 7r11,
is that s E 1I"[J'(c') for If E If" then OJ(u)(s) = C. Similarly, 8J(w)(s) is obtained from
the outcome of the search for s in 7rII".
• The next step consists of comparing A(s,8J(u)(s» to A(s,OJ(w)(s)). If the outcome is
A(s,OJ(u)(s)) > A(s,8J(w)(s »), then Critical..(Rt_h) is the same as Critical,,(Rt_ 1_ h ).
Otherwise Critical,,(Rt_h) is s.
We next show how the just computed Critical,,(Rt_h) value is used to compute tP,,(Rt.-h, *)
in constant time.
4.2 Computing ,p.(R,_h,*).
Relationship (2) of Observation 4.1 implies the following for each I E I(u):
• If ,p.(R'_h, J) is to the right of Critical.(R'_h) then VJ.(R'_h,j) = 0.
12
![Page 15: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/15.jpg)
• If .,pu(R~_h,J) contains Criticalu(RI_h) then Wu(RI-h, J) consists of the portion of
v>,(Rt-h,j) up to (and including) Critical,(Rt_h)'
The above three facts immediately imply that 0(1) time is enough for II(u)1 of the II(v)1
processors assigned to v to compute .,pu(Rt-h, J) for all j E I(u) (recall that the .,pu(RI_h, *)
array is available in u from the previous stage t - 1, and Criticalu(Rt_h) has already been
computed).
A simllar argument, using relationship (3) of Obervation 4.1, shows that II(w)j pro
cessors are enough for computing 'l/Ju(Rt-h,J) for all f E I(w). Thus .,pu(Rt_h,*) can be
computed in constant time with II(v)1 processors.
This completes the proof of Lemma 4.1. 0
5 Avoiding Read Conflicts
The scheme of the previous section made crucial use of the "concurrent read" capability of
the CREW-PRAM. This occurred in the computation of Criticalu(RI_h) and also in the
subsequent computation of .,pu(Rt-h,*). In its computation of Criticalu(Rt_h), there are
two places where the algorithm of the previous section uses the "concurrent read" capability
of the CREW (both of them occur during the computation of OJ(u)(.9) and 9;(w)(s) in
constant time). After that, the CREW part of the computation of 'l/Jv(Rt-h,*) is the
common reading of Criticalu(RI_h). We review these three problems next, using the same
notation as in the previous section (Le., u is the left child of v in T, w is the right child of
v in T, v hail height h in T, s is the successor of Criticalu(Rt_1_h) in Rt-h, etc.).
• Problem 1: This arises during the search in I(u) (resp., I(w)) for the fat column
f' E I(u) (resp., !" E I(w)) whose v>,(Rt-h,f') (,esp., V>w(Rt-h,!")) contains •.
Specifically, for finding (e.g.) ff, node v assigns a processor to each j E I(u), and
that processor tests whether s is in 'l/Ju(R~-h,J)i the answer is "yes" for exactly one
of those II(u)1 processors and thus can be collected in constant time.
• Problem 2: Having found jf and r, node v determlnes 9](u)(.9) and O](w)(s) in
constant time by using logn processors to search for s, in constant time, in the leaf
solutions 1r/f, and Tr/
f" available at leaves l' and j", respectively. There are two
parts to this problem: (i) many ancestors of a leaf If (possibly all log n of them) may
simultaneously access the same leaf solution 7rII , and (ii) each of those ancestors uses
13
![Page 16: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/16.jpg)
log n processors to do a constant-time search in the leaf solution 1fIJ (in the EREW
model, it would take Q(loglogn) time just to tell the processors in which leaf to
search).
• Problem 3: During the computation of Wv(Rt_h,*), the common reading of the
CriticaluCRt_h) value by the fat columns f E f(v).
Any solution we design for Problems 1-3 should also be such that no concurrent reading of
an entry of matrix A occurs. We begin by discussing how to handle Problem 2.
5.1 Problem 2.
To avoid the "many ancestors" part of Problem 2 (i.e., part (i)), it naturally comes to
mind to make logn copies of each leaf If and to dedicate each copy to one ancestor of
If, especially since we can easily create these logn copies of If in O(logn) time and logn
processors (because the space taken by 1fTJ
is O(logn)). But we are still left with part (ii)
of Problem 2, Le., how an ancestor can search the copy of If dedicated to it in constant
time by using its log n processors in an EREW fashion. On the one hand, just telling all of
those logn processors which If to search takes an unacceptable Q(loglogn) time, and on
the other hand a single processor seems unable to search 1fIJ in constant time. We resolve
this by organizing the information at (each copy of) If in such a way that we can replace the
log n processors by a single processor to do the search in constant time. Instead of storing a
leaf solution in an array rr IJ
, we store it in a tree structure (call it Tree(f)) that enables us
to exploit the highly structured nature of the searches to be performed on it. The search to
be done at any stage t is not arbitrary, and is highly dependent on what happened during
the previous stage t - 1, which is why a single processor can do it in constant time (as we
shall soon see).
We now define the tree Tree(f). Let List = [1,Tl],[rl + I,T2],···,[Tp + I,m] be the
list of (at most logn) nonempty intervals in 1fIJ
, in sorted order. Each node of Tree(f)
contains one of the intervals of List. (It is implicitly assumed that the node of Tree(f)
that contains 1I"IJ (C) also stores its associated column c.) Imagine a procedure that builds
Tree(f) from the root down, in the following way (this is not how Tree(J) is actually built,
but it is a convenient way of defining it). At a typical node x, the procedure has available a
contiguous subset of List (call it L(x)), together with an integer d(x), such that no interval
of L(x) contains a multiple of mj(2d(:z:l). (The procedure starts at the root of Tree(J) with
14
![Page 17: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/17.jpg)
Figure 1. illustrating Tree(J).
L(root) = List and d(root) = -1.) The procedure determines which interval of L(x) to
store in x by finding the smallest integer i > d(x) such that a multiple of mj(2 i ) is in an
interval of L(x) (we call i the priority of that interval), together with the interval of L(x)
for which this happens (say it is interval [rk + 1,rk+1], and note that it is unique). Interval
[Tk + 1, rk+1] is then stored at x (together with its associated column), and the subtree of
x is created recursively, as follows. Let L' (resp., L") be the portion of L(x) to the left
(resp" right) of interval [Tk + l,rk+ll. If L' #- 0 then the procedure creates a left child for
x (call it y) and recursively goes to y with d(y) = i and with L(y) =L'. If L" #- 0 then the
procedure creates a right child for x (call it z) and recursively goes to z with d(z) = i and
with L(z) = L".
Note that the root of Tree(J) has priority zero and no right child, and that its left child
w has d(w) =0 and L(w) = (List minus the last interval in List).
Figure 1 shows the Tree(J) corresponding to the case where m = 32, logn = 8, and
"IJ = [1,6) [7,8) [9,14) [] [15,15] [16,17] [18,22) [23,32].
In that figure, we have a8sumed (for convenience) that the columns in If are numbered
1"" ,8, and we have shown both the columns c (circled) and their associated intervals
'1l'l](C). For this example, the priorities of the nonempty intervals of 7fT] are (respectively)
3,2,3,5,1,3,0. The concept of priority will be useful for building Tree(J) and for proving
various facts about it. The following proposition is an easy consequence of the above
definition of Tree(J).
Proposition 5.1 Let X be an interoal in List, of priority i. Let X' (resp., XII) be the
nearest interval that is to the left (resp., right) of X in List and that has priority smaller
15
![Page 18: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/18.jpg)
than i. Let i' (resp., i") be the priority of X' (resp., X"). Then we have the following:
1. If i > 0 then at least one of {X', X"} exists.
2. If only one of {X',X"} exists, then X is its child in Tree(J) (right child in case of
X', left child in case of X").
3. If X' and X" both exist, then i' =f:. i". Furthermore, if i' > iff then X is the right child
of X' in Tree(J), otherwise (i.e., if i' < i") X is the left child of X".
Proof. The proof refers to the hypothetical procedure we used to define Tree(f). For
convenience, we denote a node x of Tree(J) by the interval X that it contains; i.e., whereas
in the description of the procedure we used to say "the node x of Tree(J) that contains
X" and "list L(x)", we now simply say "node X" and "list L(X)" (this is somewhat of an
abuse of notation, since when the procedure first entered x it did not yet know X).
That (1) holds is obvious.
That (2) holds follows from the following observation. Let X be a child of X' in Tree(J).
When the procedure we used to define Tree(J) was at X', it went to node X with the list
L(X) set equal to the portion of L(X') before X' (if X is left child of X') or after X' (if
X is right child of X'). This implies that the priorities of the intervals between X' and X
in List are all larger than the priority of X. This implies that, when starting in List at
X and moving along List towards X', X' is the first interval that we encounter that has a
lower priority than that of X. Hence (2) holds.
We now prove (3). Note that the proof we just gave for (2) also implies that the parent
of X is in {X/,X"}. Hence to prove (3), it suffices to show that one of {X',X"} is ancestor
of the other (this would imply that they are both ancestors of X, and that the one with
the larger priority is the parent of X). We prove this by contradiction: let Z be the lowest
common ancestor of X' and X" in Tree(J), with Z ¢ {X', X"}. Since Z has lower priority
than both XI and X", it cannot occur between X' and X" in List. However, the fact that
X' (resp., X") is in the subtree of the left (resp., right) child of Z implies that it is in the
portion of L(Z) before Z (resp., after Z). This implies that Z occurs between X' and X"
in List, a contradiction. 0
We now show that Tree(J) can be built in O(logm + log n) time with IListl (:::; logn)
processors. Assign one processor to each interval of List, and do the following in parallel
for each such interval. The processor assigned to an interval X computes the smallest
16
![Page 19: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/19.jpg)
integer i such that a. multiple of mJ(2i ) is in that interval (recall that this integer i is the
priority of interval X). Following this O(logm) time computation of the priorities of the
intervals in List, the processor assigned to each interval determines its parent in Tree(f),
and whether it is left or right child of that parent, by using the above Proposition 5.1.
Reading conflicts are eaBy to avoid during this G(IListl) time computation (the details are
trivial and omitted).
Note: Although we do not need to do so, it is in fact possible to build Tree(J) in G(log m+
log n) time by using only one processor rather than log n processors, but the construction
is somewhat more involved and we refrain from giving it in order not to break the flow of
the exposition.
As explained earlier, after Tree(J) is built, we must make log n copies of it (one for each
ancestor in T of leaf If).
We now explain how a single processor can use a copy of Tree(J) to perform constant
time searching. From now on, if a node of Tree(J) contains intervall!'TJ(c), we refer to that
node as either "node l!'TJ(C)" or as "node c". We use RightChild(c) and LeftChild(c) to
denote the left and right child, respectively, of c in Tree(f).
Proposition 5.2 Let r E Ric. r' be the predecessor of r in Rk. Let r E l!'/J(e), and let
r' E l!'TJ(c'). Then the predecessor of r in Rk+l (call it r") is in l!'/,(c") where d' E
(.I, HightChild(.I), Le/tChild(c), cj.
Proof. If c" E {d,e} then there is nothing to prove, so suppose that d' rt. {c,d}. This
implies that c 1: d (since c = d would imply that ell;:: c). Then l!'/,(c'), 1!'I,(e") and 11'/, (c)
are distinct and occur in that order in List (not necessarily adjacent to one another in
List). Note that 11'1, (c') and l!'/J(c) each contains a row in Rk' that l!'/J(c") contains a row
in Rk+l but no row in Rk' and that all the other intervals of List that are between 11'1, (e')
and l!'/J(c) do not contain any row in Rk+l. This, together with the definition of the priority
of an interval, implies that 7l'[J(c') and 1!'I,(c) have priorities no larger than k, that l!'/,(c")
has a priority equal to k +1, and that all the other intervals of List that are between 11'1,(e')
and 1!'/,(e) have priorities greater than k + 1. This, together with proposition 5.1, implies
that .I' E (HightChild(<!),Le/tChild(c)). o
Proposition 5.3 Let r E Rk' r' be the successor of r in Rk. Let r E 11'1, (c), and let
r' E l!'/J(e'). Then the successor of r in Rk+l (call it r") is in l!'/J(c") where e" E
(c, HightChild(c) ,Le/tChild(c') , <!j.
17
![Page 20: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/20.jpg)
Proof. Similar to that of Proposition 5.2, and therefore omitted. o
Now recall that, in the previous section, the searches done by a particular v in a leaf
solution at If had the feature that, if at stage t node vET asked which column ofIf contains
a certain row r E Ric, then at stage t +1 it is asking the same question about r' where r f is
either the successor or the predecessor of r in Rk+!_ This, together with Propositions 5.2
and 5.3, implies that a processor can do the search On Tree(J)) at stage t +1 in constant
time, so long as it maintains, in addition to the node of Tree(J) that contains the current
r, the two nodes of Tree(J) that contain the predecessor and (respectively) successor of r
in RI;. These are clearly easy to maintain. 0
We next explain how Problems 1 and 3 are handled.
5.2 Problems 1 and 3.
Right after stage t - 1 is completed, v stores the followjng information (recall that the
height of v in T is h). The fat columns I E I(v) for which interval ¢v(Rt- 1-h,J) is not
empty are stored in a doubly linked list. For each such fat column I, we store the following
information: (i) the row interval 'l/Jv(Rt-1-h,J) = [<:1:1,0'21, and (ii) for row 0'1 (resp., 0'2), a
pointer to the node, jn v's copy of Tree(J) , whose interval contains row 0'1 (resp., 0'2)' In
addition, v stores the following. Let z be the parent of 11 in T, and let Sf he the successor
of Criticalz-(Rt_1_(h+l)) in Rt-(h+l)' with s' in 'I/J'U(Rt-1-h,g) for some 9 E lev). Then v
has 9 marked as being its distinguished lat column, and v also stores a pointer to the node,
in v's copy of Tree(g), whose interval contains s'. Of course, information similar to the
above for v is stored in every node x ofT (including v's children, u and w), with the height
of x playing the role of h. In particular, in the formulation of Problem 1, the fat columns
we called J' and jff are the distinguished fat columns of u and (respectively) w, and thus
are available at u and (respectively) w, each of which also stores a pointer to the node
containing s in its copy of Tree(J') (for u) or of Tree(f//) (for w).
Assume for the time being that we are able to maintain the above information in constant
time from stage t -1 to stage t. This would enable us to avoid Problem 1 because instead of
searching for the desired fat column If (resp., r), node u (resp., w) already has it available
as its distinguished fat column. Problem 3 would also be avoided, because now only the
distinguished fat columns of u and w need to read from v the Critical'U(Rt_ll) value (whereas
previously aU of the fat columns in I(u) UI( w) read that value from v). It therefore suffices
to show how to maintain, from stage t - 1 to stage t, the above information (Le., v's linked
18
![Page 21: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/21.jpg)
list and its associated pointers to the Tree(J)s of its elements, v's distinguished fat column
g, and the pointer to v's copy of Tree(g)). We explain how this is done at v.
First, v computes its Criticalv(Rt_h): since we know from stage t - 1 the distinguished
fat columns t and r of u and (respectively) w, and their associated pointers to u's copy of
Tree(f') and (respectively) w's copy of Tree(f"), v can compare the two relevant entries of
matrix A (Le., A(.9, 9J (u)(s)) and A(s,9J (w)(.9))) and it can decide, based on this comparison,
whether Criticalv(Rt_h) remains equal to Criticalv(Rt_1_h) or becomes equal to s, its
successor in Rt_h. But since this is done in parallel by all v's, we must show that no two
nodes of T (say, v and Vi) try to access the same entry of matrix A. The reason this does
not happen is as follows. If none of {v, Vi} is ancestor of the other then no read conflict in A
can occur between v and Vi because their associated columns (that is, J(v) and J(v')) are
disjoint. Ifone of v, Vi is ancestor of the other, then no read conflict in A can occur between
them because the rows they are interested in are disjoint (this is based on the observation
that v is interested in rows in RI_h - ut;;~-hR; and hence will have no conflict with any of
its ancestors).
As a side effect of the computation of Criticalv(RI_h), v also knows which fat column
j E {P .J"} is such that 'l./Jv(Rt-h,i) contains Criticalv(Rt_h)' It uses its knowledge of j
to update its linked list of nonempty fat columns (and their associated row intervals) as
follows (we distinguish two cases):
1. j = 1'. In that case v's new linked list consists of the portion of u's old (Le., at
stage t - 1) linked list whose fat columns are < !' (the row intervals associated
with these fat columns are as in u's list at t - 1), followed by l' with an associ
ated interval 'l./Jv(RI-h,I') equal to the portion of 'l./Ju(RI-h,!') up to and including
Criticalv(Re_h)' followed by Iff with an associated interval 'l./Jv(Rt-h' r) equal to the
portion of 'l./Jw(Rt_h' r) larger than Criticalv(Rt_h) if that portion is nonempty (if
it is empty then r is not included in v's new linked list), followed by the portion of
w's old (Le., at stage t - 1) linked list whose fat columns are> r (the row intervals
associated with these fat columns are as in w's list at t - 1).
2. j = r. Similar to the first case, except that Criticalv(Rt_h) is now in the interval
associated with r rather than 1'.
It should be clear that the above computation of v's new linked list and its associated
intervals can be implemented in constant time with II(v)1 processors (by copyjng the needed
19
![Page 22: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/22.jpg)
information from u and w, since these are at height h - 1 and hence at stage t - 1 already
"knew" their information relative to R t_1_(h_l) = RI_h).
In either one of the above two cases (1) and (2), for each endpoint Q of a ¢v(Rt-h,j)
in v's linked list, we must compute the pointer to the node, in v's copy of Tree(f), whose
interval contains that a. We do it as follows. If ! was not in v's list at stage t - I, then
we obtain the pointer from u or w, simply by copying it (more specifically, if in u or w it
points to a node of u's or w's copy of Tree(f), then the "copy" we make of that pointer
is to the same node but in v's own copy of Tree(J». On the other hand, if f was in v's
list at stage t -I, then we distinguish two cases. In the case where Q was also an endpoint
of ¢v(Rt_ 1_h, I), we already have its pointer (to v's copy of Tree(J» available from stage
t - 1. H Q was not an endpoint of ¢v(Rt-1-h,f) then a is predecessor or successor of an
endpoint of Wv(Rt- 1-h,!) in Rt-hl and therefore the pointer for Q can be found in constant
time (by using Propositions 5.1 and 5.2).
Finally, we must show how v computes its new distinguished fat column. It does so by
first obtaining, from its parent z, Criticalz(Rt_(hH» that z has just computed. The old
distinguished fat collUlln 9 stored at v had its Wv(Rt- 1-h,g) containing the successor Sf of
Criticalz(Rt_1_(h+1» in Rt-(hH)' It must be updated into a fJ such that Criticalv(Rt_h, g)
contains the successor 5" of Criticalz(Rt_(h+1» in RtH-(hH)' We distinguish two cases.
1. Criticalz(Rt_(hH) = Critical%(Rt_1_(h+t). In that case g" is the predecessor of 5'
in R t+1_(h+l), and the fat column fJ E 1(v) for which Wv(Rt_h,g) contains 5" is either
the same as 9 or it is the predecessor of 9 in the linked list for v at stage t (which we
already computed). It is therefore easy to identify 9 in constant time in that case.
2. Criticalz(Rt_(h+t) = 5'. In that case SIl is the successor of 5' in Rt+t-(h+l), and the
fat column 9 E 1(v) for which Wv(Rt-h,fJ) contains 1/ is either the same as 9 or it is
the successor of 9 in the linked list for v at stage t (which we already computed). It
is therefore easy to identify 9 in constant time in that case as well.
In either one of the above two cases, we need to also compute a pointer value to the
node, in v's copy of Tree(fJ), whose interval contains SIl. This is easy if 9 = g, because we
know from stage t - 1 the pointer value for 5/ into v's copy of Tree(g), and the pointer
value for gil can thus be found by using Propositions 5.2 and 5.3 (since gil is predecessor
or successor of Sf in Rt_h). So suppose fJ =f:. g, Le., 9 is predecessor or successor of 9 in v's
20
![Page 23: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/23.jpg)
linked list of nonempty fat columns at t. The knowledge of the pointer for s' to v's copy of
Tree(g) at t - 1 is of little help in that case, since we now care about v's copy of Tree(g)
rather than Tree(g). What saves us is the following observation: 3" must be an endpoint of
row interval 'l/Jv(Rt-h, g). Specifically, Sll is the left (i.e., beginning) endpoint of 'l/Jv(Rt-h, g)
if 9 is the successor of 9 in v's linked list at stage t (Case 2 above), otherwise it is the right
endpoint of 'l/Jv(Rt-h,g) (Case 1 above). Since 51 is such an endpoint, we already know the
pointer for if to v's copy of Tree(g) (because such pointers are available, in v's linked list,
for all the endpoints of the row intervals in that linked list).
References
[1] A. Aggarwal and J. Park, "Parallel searching in multidimensional monotone arrays," to appearin J. of Algorithms. (A preliminary version appeared in Proe. 29th Annual IEEE Symposiumon Foundations of Computer Science, 1988, pp. 497-512.)
[2] A. Aggarwal, M. M. Klawe, S. Moran, P. Shor and R. Wilber, "Geometric Applications of aMatrix Searching Algorithm," Algorilhmica, Vol. 2, pp. 209-233,1987.
[3] A. Apostolieo, M. J. Atallah, L. Larmore, and H. S. McFaddin, "Efficient Parallel AlgorithlDBfor String Editing and Related Problems," SIAM J. Comput., Vol. 19, pp. 968-988, 1990.
[4] M.J. Atallah, R. Cole and M.T. Goodrich, "Cascading Divide-and-Conquer: A Technique rorDesigning Parallel Algorithms," SIAM J. Comput., Vol. 18, pp. 499--532, 1989.
[5] R. Cole, "Parallel merge sort," SIAM J. Comput., Vol. 17, pp. 770-785, 1988.
[6] Paul, W., Vishkin, U., and Wagener, H. "Parallel dictionaries on 2-3 trees." Proc. 10th Call.on Autom., Lang., and Prog., LNCS 154, Springer, Berlin, 1983, pp. 597-609.
21
![Page 24: An Efficient Parallel Algorithm for the Row Minima of a](https://reader031.vdocuments.mx/reader031/viewer/2022020622/61edf6cd4de9e864f5705e38/html5/thumbnails/24.jpg)
CD[1,61
[16.17]®
[23.321®
[15.15J
Figure