the design of a distributed matlab-based environment for...

12
Future Generation Computer Systems xxx (2004) xxx–xxx The design of a distributed MATLAB-based environment for computing pseudospectra C. Bekas a , E. Kokiopoulou a , E. Gallopoulos b,a Computer Science and Engineering Department, University of Minnesota, Minneapolis, MN USA b Computer Engineering and Informatics Department, University of Patras, Patras, Greece Received 24 September 2003; received in revised form 30 October 2003 Abstract It has been documented in the literature that the pseudospectrum of a matrix is a powerful concept that broadens our understanding of phenomena based on matrix computations. When the matrix A is non-normal, however, the computation of the pseudospectrum becomes a very expensive computational task. Thus, the use of high performance computing resources becomes key to obtaining useful answers in acceptable amounts of time. In this work we describe the design and implementation of an environment that integrates a suite of state-of-the-art algorithms running on a cluster of workstations to enable the matrix pseudospectrum become a practical tool for scientists and engineers. The user interacts with the environment via the graphical user interface PPsGUI. The environment is constructed on top of CMTM, an existing environment that enables distributed computation via an MPI API for MATLAB. © 2003 Elsevier B.V. All rights reserved. Keywords: Eigenvalues; Pseudospectrum; Problem solving environments; Parallel MATLAB; MPI; CMTM; Grid computing 1. Introduction and motivation The -pseudospectrum, Λ (A), of a matrix (pseu- dospectrum for short) describes the locus of eigenval- ues of A + E for all possible E such that E for some matrix norm and given , that is Λ (A) ={z C : z Λ(A + E), E}, (1) where Λ(A) denotes the set of eigenvalues of matrix A. For the remainder of this paper we will assume that the underlying matrix norm bounding the perturba- tions is the two-norm. The -pseudospectra are regions Corresponding author. E-mail addresses: [email protected] (C. Bekas), [email protected] (E. Kokiopoulou), [email protected] (E. Gallopoulos). of the complex plane that show where the eigenvalues of a matrix could go when the matrix is subjected to perturbations. They have many interesting properties and can provide useful information regarding the be- havior of iterative methods for the major problems of numerical linear algebra as well for stability studies that frequently are more informative than the eigen- values of the underlying matrix A (e.g. [15,37–39]). It is thus argued that pseudospectra calculations are a necessary component in any suite of tools that re- turns qualitative information about a matrix computa- tion. As a result, it becomes of interest to build a tool that would compute pseudospectra, possibly together with other information about the matrix (e.g. condi- tion numbers, indices of eigenvalue sensitivity, etc.) Unfortunately, it is well known that in many cases of interest the computation of the pseudospectrum 0167-739X/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2003.12.017

Upload: others

Post on 07-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

Future Generation Computer Systems xxx (2004) xxx–xxx

The design of a distributed MATLAB-basedenvironment for computing pseudospectra

C. Bekasa, E. Kokiopouloua, E. Gallopoulosb,∗a Computer Science and Engineering Department, University of Minnesota, Minneapolis, MN USA

b Computer Engineering and Informatics Department, University of Patras, Patras, Greece

Received 24 September 2003; received in revised form 30 October 2003

Abstract

It has been documented in the literature that the pseudospectrum of a matrix is a powerful concept that broadens ourunderstanding of phenomena based on matrix computations. When the matrixA is non-normal, however, the computation ofthe pseudospectrum becomes a very expensive computational task. Thus, the use of high performance computing resourcesbecomes key to obtaining useful answers in acceptable amounts of time. In this work we describe the design and implementationof an environment that integrates a suite of state-of-the-art algorithms running on a cluster of workstations to enable the matrixpseudospectrum become a practical tool for scientists and engineers. The user interacts with the environment via the graphicaluser interfacePPsGUI. The environment is constructed on top ofCMTM, an existing environment that enables distributedcomputation via an MPI API forMATLAB.© 2003 Elsevier B.V. All rights reserved.

Keywords:Eigenvalues; Pseudospectrum; Problem solving environments; Parallel MATLAB; MPI; CMTM; Grid computing

1. Introduction and motivation

The ε-pseudospectrum,Λε(A), of a matrix (pseu-dospectrum for short) describes the locus of eigenval-ues ofA + E for all possibleE such that‖E‖ ≤ ε forsome matrix norm and givenε, that is

Λε(A) = z ∈ C : z ∈ Λ(A + E), ‖E‖ ≤ ε, (1)

whereΛ(A) denotes the set of eigenvalues of matrixA. For the remainder of this paper we will assume thatthe underlying matrix norm bounding the perturba-tions is the two-norm. Theε-pseudospectra are regions

∗ Corresponding author.E-mail addresses:[email protected] (C. Bekas),[email protected] (E. Kokiopoulou), [email protected](E. Gallopoulos).

of the complex plane that show where the eigenvaluesof a matrix could go when the matrix is subjected toperturbations. They have many interesting propertiesand can provide useful information regarding the be-havior of iterative methods for the major problems ofnumerical linear algebra as well for stability studiesthat frequently are more informative than the eigen-values of the underlying matrixA (e.g. [15,37–39]).It is thus argued that pseudospectra calculations area necessary component in any suite of tools that re-turns qualitative information about a matrix computa-tion. As a result, it becomes of interest to build a toolthat would compute pseudospectra, possibly togetherwith other information about the matrix (e.g. condi-tion numbers, indices of eigenvalue sensitivity, etc.)Unfortunately, it is well known that in many casesof interest the computation of the pseudospectrum

0167-739X/$ – see front matter © 2003 Elsevier B.V. All rights reserved.doi:10.1016/j.future.2003.12.017

Page 2: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

2 C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

becomes a difficult task because of its high computa-tional cost. For example, the application of definition(1) would entail a cost that would be a multiple ofthe computation of all the eigenvalues of the matrix,a daunting task for large matrices. As a result, therehave been several research efforts leading to a varietyof methods that researchers actively deploy to approx-imate pseudospectra ranging from simple applicationsof (1) or its equivalents (2, 3 presented in the next sec-tion) (e.g. see[22] for some codes), to state-of-the-artpolyalgorithms (cf.[8,38]).

In this work we outline the design and implemen-tation of a system designed to provide all the com-putational facilities to compute matrix pseudospectra.The system is built to be flexible so as to permit theincorporation of novel algorithms and is deployed viaPPsGUI, a powerful graphical user interface. Thesystem integrates a suite of state-of-the-art parallelalgorithms running in a distributed mode to facilitatescientists and engineers mine pseudospectral infor-mation from any matrix at a cost that is as little aspossible without specific knowledge of the algorithmsunderlying its computation. Related work includes thegraphical tool presented in[29] as well as the popularand continuously evolvingeigtool; cf. [40–42].These tools implement a limited combination ofmatrix-based and domain-based algorithms. To these,our environment contributes a flexible combinationof parallel “domain” and “matrix-based” approaches(cf. Section 2) that permit the fast and convenientcomputation of the pseudospectrum of large matrices.Indeed, the work presented herein is the first effort tocombine a wide variety of methods from both of theabove categories in a single tool. Our programmingplatform isMATLAB that provides an open environ-

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

−0.4

−0.3

−0.2

−0.1

0 0.2 0.4 0.6 0.8 1−0.2

Fig. 1. EstimatingΛε(A) for matrix kahan(100) by random bounded perturbationsA + E, where‖E‖ ≤ ε, ε = 10−11, 10−7 and 10−3

(left to right). Dots denote “pseudoeigenvalues” while crosses denote the eigenvalues ofA.

ment equipped with sophisticated visualization capa-bilities. Parallelism is supported by means of the Mes-sage Passing Interface paradigm (MPI). This choiceis based on the specifications ofCMTM, a softwaremodule that providesMPI functionality forMATLABand thus allows the concurrent execution ofMATLABprocesses on clusters of workstations (COWs)[43].

The remainder of this paper is organized as follows.In Section 2 we present some examples of pseu-dospectra,GRID—a simple classical algorithm fortheir computation and a classification of methods. InSection 3we outline three domain-based algorithms,a matrix-based methodology and their hybrids. InSection 4we describe the cluster architecture and par-allel programming withCMTM in MATLAB. Section 5describes the GUI front-end of the environment andthe applicability of the proposed tool in the Gridframework. Finally, Section 6 contains concludingremarks.

2. Computing pseudospectra

A first example of pseudospectra is presented inFig. 1. We use definition(1) to estimateΛε(A) for amatrix that is routinely used as a benchmark; this ismatrix kahan (see[22] for the generatingMATLABcode) of sizen = 100 when the norm ofE is boundedby different values ofε = 10−3, 10−7 and 10−11. Tothis end, we computed the eigenvalues of random per-turbationsA + E where for each value ofε we con-ducted 100 experiments. Then, for eachε we created aplot where we superimpose the “pseudoeigenvalues”(eigenvalues ofA + E) and the original eigenvaluesof A. It readily follows from the definition that for

Page 3: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 3

different values ofε, theε-pseudospectra form nestedpoint sets that “collapse” to the spectrum ofA as ε

tends to zero. The figures indicate that the smallereigenvalues ofA are more sensitive to perturbations.When A is normal (i.e. satisfies the relationAA∗ =A∗A), the pseudospectrum is readily computed fromthe eigenvalues since classical matrix theory predictsthat it will be the union of the disks of radiusε sur-rounding each eigenvalue ofA. The pseudospectrumbecomes of interest on its own or as an alternativeto standard eigenvalue analysis in cases such as theabove, whereA is non-normal (e.g. nonsymmetric).Even for medium size matrices, however, the compu-tation of their pseudospectrum is a difficult task[38].Regarding definition(1) for example, even if we per-form a large number, says, of random perturbations,we cannot guarantee that they will capture the extremedislocations that are possible for the eigenvalues underall possibleε-bounded perturbations. Furthermore, asalready mentioned earlier, the cost of a pseudospec-trum approximation method based on definition(1) isat least equal to the cost of computing all the eigen-values ofs matrices of the formA + E. One way toget round this problem is by means of two alternativedefinitions ofΛε(A) that can be shown to be equiva-lent to (1) [38]:

Λε(A) = z ∈ C : ‖(A − zI)−1‖ ≥ ε−1, (2)

Λε(A) = z ∈ C : σmin(A − zI) ≤ ε, (3)

whereσmin denotes the smallest singular value of itsmatrix argument. The standard algorithm (GRID) forthe computation ofΛε(A) consists of three phases thatare presented in Algorithm (1).

Algorithm (GRID).

1. Let Ω be such thatΩ ⊇ Λε(A) for the largestεof interest andΩh be a discretization ofΩ withgridpointszk.

2. Computes(zk) := σmin(zkI − A) ∀zk ∈ Ωh.3. Plot the contours ofs(zk) for all values ofε of

interest to obtain∂Λε(A).

Therefore, if we knowΩ (an interesting issue in itsown right that can be addressed by special algorithms,e.g. see[8,10]), the cost to decide ifz is a member ofΛε(A) or not amounts to the cost of computings(z).Fig. 2 depicts the pseudospectra boundaries∂Λε(A)

0.5 0 0.5 1

0.4

0.3

0.2

0.1

0

0.1

0.2

0.3

0.4

Fig. 2. The pseudospectra boundaries∂Λε(A) of matrixkahan(100) for ε = 10−k, k = 3, . . . , , 11.

of kahan(100) for ε = 10−k, k = 3, . . . , , 11. Twoimportant features ofGRID are its straightforwardsimplicity, robustness and potential for embarrassinglyparallel implementation. Its cost is typically modeledby

CGRID = |Ωh| × Cσmin, (4)

where|Ωh| denotes the number of nodes ofΩh andCσmin is a measure of the average cost for the com-putation ofs(z). The total cost quickly becomes pro-hibitive as the size ofA and/or the number of nodesincrease. Given that the cost of computings(z) is atleast O(n2) and that a typical mesh could easily con-tain O(104) points, the cost can be very high even formatrices of moderate size. For example computing thepseudospectrum of a dense matrix of sizen = 1000on a Pentium III workstation using a 50×50 mesh andMATLAB 6.1 requires more than a day. Cost formula(4) readily indicates two major classes of methods foraccelerating the computation, based on the mathemat-ics and numerics of the problem: (a) domain-basedmethods, that aim to reduce the number of nodeswheres(z) is computed, and (b) matrix-based meth-ods that focus on reducing the cost for the evalua-tion of s(z). The above methods can be combinedwith system-level approaches, such as the exploitationof hierarchical memory and parallel processing. See[38,42] for informative surveys of recent efforts in thearea and thePseudospectra Gatewaysite [31] for acomprehensive repository of links to research efforts,references and software.

Page 4: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

4 C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

3. Parallel algorithms for the pseudospectrum

The algorithms driving our environment draw froma number of domain as well as matrix-based methodsthat were developed in recent years that lend them-selves to parallel computation.

3.1. Domain-based methods

The three major domain-based components of ourenvironment are the following (the numbers in boldbrackets denote the reference where extended discus-sion about the method can be found.)

• Cobra [4]: Parallel path following method that nu-merically traces a single boundary ofΛε(A) by con-current estimation, using Newton iteration to solveσmin(A−zI)−ε = 0, ofm pointszi

k, i = 1, . . . , m ∈∂Λε(A). The method inherits the flexibility of nu-merical continuation methods (see[1,11]), whileremedying a number of weaknesses such as dangerof breakdown at steep turns of∂Λε(A) and lack oflarge grain parallelism.

• PsDM [5]: Using the pointszk ∈ ∂Λε(A), k =1, . . . , , N that are already available (i.e. computedby Cobra), the method proceeds to estimate an in-ner boundary∂Λδ(A), δ < ε, by concurrent Newton

Fig. 3. Instances of parallel domain-based methods for computing matrix pseudospectra. Top:Cobra. Bottom:PsDM (left) andPMoG (right).

corrections towards the pointsyk ∈ ∂Λδ(A), k =1, . . . , , N. The method combines the flexibilityof path following methods with the robustness ofGRID, while preserving the opportunities for largegrain parallelism of the latter.

• PMoG [8]: Parallel variant ofGRID that drasticallyreduces the costCGRID by quickly excludingpoints zh of the meshΩh that are not part ofΛε(A). The method can be used to quickly esti-mate the region of the complex plane that containsthe pseudospectrum. If needed, it can also provideapproximations to∂Λε(A), at very low cost forcoarse resolutions.

Fig. 3depicts instances of the above methods. Eachof the aforementioned parallel domain-based meth-ods has a different intuitive geometrical interpretation.Cobra generates a single (or more)∂Λε(A) at a time.PsDM generates nested∂Λε(A). PMoG is a generaliza-tion of GRID that is based on fast exclusions. There-fore, it is natural to combine the above approachesin order to exploit all of their features.PMoG pro-vides a very effective scheme for deciding where onthe complex plane to concentrate our efforts. The dataresulting fromPMoG can be used as input points forPsDM. In particular, one can quickly come close tothe pseudospectrum boundary∂Λε(A) by means of

Page 5: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 5

PMoG and then usePsDM to obtain it at greater detail.Frequently, one is interested in observing how do the∂Λε(A) pseudospectral boundaries vary withε. To dothis, one can first applyCobra to compute a pseu-dospectral boundary for a singleε and then estimateadditional boundaries corresponding to smallerε us-ing PsDM (seeSection 5. As already mentioned, allthree methods lend themselves to parallel computa-tion.Cobra is the most limiting of the three because itallows the concurrent computation of a small numberof points defining a given pseudospectral boundary ata time. Thus we have a fixed amount of parallelismper step and the algorithm cannot scale on geometricalconsiderations alone; cf.[4]. PsDM allows the concur-rent computation of all points of a new pseudospectralcurve from a current one in a “level set”-like manner.Finally, PMoG is closely related toGRID and allowsthe concurrent computation at any points ofΩh, ex-cept that the algorithm also leads to the fast exclusionof points ofΩh.

3.2. Matrix-based methods and hybrids

In spite of the significant improvements in compu-tational savings that have been demonstrated in theliterature via domain-based methods, they are not suf-ficient for the fast computation of pseudospectra oflarge matrices. Even inGRID, the cost is linear in thenumber of gridpoints but at least quadratic in the sizeof the matrix. In fact, the computation ofs(z), even bymeans of advanced methods (e.g. those presented in[2,23,25,26]), is very expensive, especially when thesmallest singular values are clustered. Moreover, weneed to computes(z) for many values ofz. Unfortu-nately, however, the singular values ofA−zkI are notreadily obtained by shifting the singular values ofA;that is, in general,σmin(A− zkI) = σmin(A)− zk eventhough the set of eigenvaluesΛ(A−zkI) = Λ(A)−zk.This is the root of the domain-based complexity ofpseudospectra: Standard dense or iterative methodsfor computings(z) require a different run for eachzk.This is also the reason thatGRID is embarrassinglyparallel and apparently well suited for a distributed oreven Grid-like implementation. What we would liketo derive is a method that concurrently provides ac-curate approximations ofs(zk) for a large number ofvalues zk. This would enable the creation of pow-erful hybrid algorithms based on the aforementioned

domain approaches. One early method in that direc-tion was described in[36]. This method contains twomajor phases: The first is an (expensive) computationthat is common to all valueszk while the second con-sists of cheaper computations that are independent foreachzk. An alternative approach that appears to returnmore accurate approximations is based on a “transferfunction” framework[7,34]. This framework is basedon definition(2) and an approximation ofΛε(A) byΛε(Gz,m(A)), where

Gz,m(A) = V ∗m+1(A − zI)−1Vm+1. (5)

Here,Vm+1 denotes an orthogonal basis for the KrylovsubspaceKm(A, v1) computed by the Arnoldi method[33]. We call this “transfer function” framework in-spired by the term used in Control for functions likeGz,m(A). In [3,7,34] it was shown that‖Gzh,m(A)‖can be efficiently computed iteratively for a large num-ber of shiftszk using Krylov linear solvers such asGMRES and restartedFOM (see[33]) at an additionalcost of O(m3) for each shift compared to a standardsingular value solver that computess(zk). Since typ-ically m is much smaller than the size of the matrixwe achieve substantial computational savings.

As a natural next step we developed hybrid al-gorithms that combine the domain methods of theprevious section with the transfer function frame-work. The resulting algorithms permit us to addressa variety of problems with very large matrices. Inparticular,GRID-like hybrids withPMoG extensionsare ideal when we are interested in the completepseudospectrumΛε(A). On the other hand, hybridsbased onCobra andPsDM are particularly suitablewhen we are interested only in local behavior of thepseudospectrum, for example near the imaginary axisin applications concerning stability. InTable 1 weillustrate the expected performance of the transferfunction approach on a Pentium III workstation with1GB of RAM as matrix sizes scale.

Table 1Typical runtimes on a Pentium III workstation with 1GB RAMusing state-of-the-art hybrid pseudospectra computing algorithms

Matrix size

103 104 105 106

Runtimes 1–2 min 5–10 min 2–3 h 8–16 h

Page 6: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

6 C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

4. Computational platforms and issues

Given the high computational complexity of pseu-dospectra computations for large matrix sizes andresolutions, it becomes necessary to use high perfor-mance computing (HPC) resources. At the level of thehardware infrastructure, to render pseudospectra prac-tical for the common user, it would be preferable touse off-the-shelf systems such as clusters or networksof workstations that have recently become a particu-larly appealing, low-cost solution to supercomputingas well as a component element to more distributed,Grid-type configurations. In fact, network-based com-puting is rapidly becoming a dominant paradigm forComputational Science and Engineering. All algo-rithms described inSection 3and incorporated in ourenvironment were implemented to run on a COWused as an integrated computing resource. The ac-tual configuration used for most of our experimentsconsisted of eight nodes of Pentium III PCs runningWindows 2000 connected via a 100 Mb fast Ethernet.

On the algorithmic side, it is expected that a suc-cessful approach would combine several of the abovemethods into a polyalgorithm[32]. In particular, weseek an open software environment integrating andinterconnecting available algorithms, providing trans-parent access to parallel architectures and at the sametime offering visualization features. Moreover, in ad-dition to being fast and accurate, a practical systemwould also need to be user-friendly. The algorithmsdescribed inSection 3demonstrate parallelism both atthe level of the independent computations that occurin the domain as well as at the level of the parallelismthat is possible within the computation of the singulartriplets (meaning the triplets(σmin, umin, vmin) of min-imum singular value and left and right singular vec-tors); cf. Section 3as well as the original referencesfor details.

An important issue in the design of parallel anddistributed algorithms is load balancing. For the algo-rithms outlined above, we were mostly concerned withthe specific issue of distributing the computationalload equally amongst the processors of a dedicatedCOW in order to achieve high parallel efficiency andspeedup. Consider for example theGRID algorithmwhich requires one to computes(z) (and possibly thecorresponding left and right singular vectors) for ev-ery mesh pointzk ∈ Ωh. If we were to use the intrinsic

MATLAB’s svd function, the runtime fors(z) wouldbe almost independent of the value ofz (except for thefew occasions thatz is on the real line), so load balanc-ing is not an issue. Frequently, however, the problem islarge and sparse so to solve it we need to apply sparseiterative techniques, e.g. the intrinsicMATLAB svdsfunction that is based onARPACK [27]. In that case,performance becomes sensitive to the value ofz andruntimes can vary significantly. Therefore, a load bal-ancing scheme becomes necessary if we are to achievehigh efficiency on a dedicated COW. One possible ap-proach when applying iterative methods in the contextof GRID is to use system-level information obtainedduring execution to distribute the computational loadaccording to resource availability. Using a queue, forinstance, could ensure fully dynamic load distributionat the cost of increased network traffic. An alterna-tive approach is to attempt to exploit the numerics ofthe problem. It has been observed, for example, thatnearby points, sayz and z′, typically require similartimes to computes(z) ands(z′). Therefore, a carefulinterleaved distribution of the points inΩh across theprocessors could lead to better load balance. Similarload balancing issues arise inPsDM and PMoG; cf.[8] for a detailed description of some load balancingpolicies for the algorithms described inSection 3.

Parallelism and communication for our system run-ning on the COW are based on the message pass-ing paradigm, and in particularMPI [30]. MPI ad-dresses concepts and constructs such as point-to-pointmessage passing, collective communications, processgroups and topologies. There exist several commercialand free implementations ofMPI. We usedMPI/Pro[35], as this is currently the only system supportingthe MPI API forCMTM that is a major enabling com-ponent of our environment.

4.1. MATLAB and the Cornell multitask toolbox(CMTM)

MATLAB has evolved into a powerful domain-specific Problem Solving Environment (PSE)[24] forthe development and rapid prototyping of numericalapplications. The key ingredient to its success is thatinteraction withMATLAB is in linear algebra termsand Linear Algebra is admittedly a principal tool forscientific computing.MATLAB’s programming lan-guage is relatively simple and frees the application

Page 7: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 7

developers from a host of programming details,allowing them to focus on the problem to solve.MATLAB was originally conceived and designed as auser-friendly interface to theLINPACK andEISPACKlibraries but today incorporates and provides accessto sophisticated implementations of a large number ofstate-of-the-art numerical methods that address a widerange of problems.MATLAB offers sophisticated visu-alization routines that are necessary for an applicationlike pseudospectra. Because of its open environment,its large user community contributes new codes andother software to public repositories likeNetlib.

A common criticism of high-level rapid prototypingenvironments, likeMATLAB, is that ease of use comesat the cost of computational performance. As stated in[13], for example, “Environments like MATLAB andMathematica have proven to be especially attractivefor rapid prototyping of new algorithms and systemsthat may subsequently be implemented in a more cus-tomized manner for higher performance”. This hasmotivated several efforts to enhance the performanceof these environments. In the case ofMATLAB these ef-forts include the incorporation of “cache aware” linearalgebra primitives based onBLAS-3 andLAPACK aswell as kernels based on novel architecture-dependenttuning such as ATLAS[14] and FFTW[19]; use ofJIT compiler technology for the rapid execution ofMATLAB code[28]; enabling the linking of subrou-tines produced from C or Fortran source code; compil-ers to convertMATLAB applications into stand-aloneC or C++ code. Source-to-source compilation fromMATLAB to a high-level language such as C or Fortranwith mature compiler technology on parallel systemsis one of several possible approaches for enablingMATLAB for parallel processing. There exist severalothers, nicely surveyed in[12]. One of the most pop-ular approaches, judging by the number of efforts, isto engage COW processors in concurrentMATLABsessions and provide access to a message passinglibrary. This approach is followed in the Cornell Mul-titask Toolbox (CMTM) [43]. CMTM was developed atthe Cornell Theory Center and offers anMPI APIfor MATLAB consisting of approximately 40 com-mands. Once the masterMATLAB process has beeninitialized, additionalMATLAB processes are invokedby issuing the command wherenp is the total num-ber of processes. Variablemm er returns any errorcodes. The master process can kill all others via the

commandCMTM provides facilities to write programsunder the SIMD as well as the MIMD paradigms.In particular, by invoking on the master, processdest executes theMATLAB string command. Incasedest is omitted,command is executed by allprocesses. As inMATLAB, CMTM functions take ar-rays as arguments. A nice feature ofCMTM is that italso provides a very simplified interface to severalMPI commands. For example, to broadcast a matrix,say A, from node zero to all other nodes, all nodesexecuteA=MMPI Bcast(A). The system automat-ically distinguishes whether a node is a sender ora receiver and performs the communication. Simi-lar calls exist to perform other procedures requiringcommunication such as blocking send–receive andreductions. For example, in order to compute thedot product of vectorsx andy that are already dis-tributed across the processors, all nodes execute aninstruction d= MMPI Reduce(x′∗ y,MPI ADD)that performs a local dot product in each node andsubsequently accumulates the partial results in vari-abled of the master node.CMTM currently runs onlyon Windows 2000 systems. It would not be difficult,however, to port the environment onto many of theparallelMATLAB systems reviewed in[12] that belongto the message passing category and run Linux, someof which offer moreMPI-like commands thanCMTM;MPITB, for example, offers more than 150[17].

5. PPsGUI

The next concern in building the environment was todesign and build a user-friendly interface for the algo-rithms described inSection 3. We called thisPPsGUIand depict one instance of it inFig. 4 An importantdesign decision was to adopt a hierarchical presenta-tion of the parameters of the pseudospectra computa-tion. The first level that is immediately accessible tothe user provides information regarding (i) the matrixat hand; (ii) the active domain and matrix-based meth-ods (PsDM andsvd in the snapshot ofFig. 4) and (iii)the current number of processors. The user can set thespecific configuration parameters for the componentsof the first level at a second level. For example, push-ing the DETAILS button next to the active Domainmethod selection field (seeFig. 4) triggers the activa-tion of a dialog box that contains all the parameters of

Page 8: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

8 C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

Fig. 4. Graphical tool.

the specific Domain method, such as step length andtotal number of steps forCobra. This strategy ensuresthat the GUI is not overloaded with an overwhelmingnumber of edit dialogs and drop lists at any level. Onthe other hand, we limited the depth of the parame-ter selection hierarchy to two levels, thus facilitating asimple and swift switch between different configura-tions. The results of the computations are graphicallyillustrated in an axis box (seeFig. 4). MATLAB treatsgraphics as objects, having a number of system pre-defined as well as user-defined attributes. InPPsGUIwe exploit this property and assign to each graphi-cally illustrated result a number of attributes such as(i) the domain and matrix-based methods that havebeen used for its computation; (ii) performance indi-cators such as runtimes and speedups; and (iii) objecthandles. The latter are used to serve as identifiers forthe graphical manipulation of the computed results.In particular, the user can select all or part of a pseu-dospectrum boundary that was computed byCobraand mark that it is to be used as input for subsequentcomputations, i.e. inPsDM. PPsGUI provides severalgraphical manipulation capabilities. The user can se-lect a specific area and restart computations zoomingtherein. For example, suppose that we obtain a roughestimate of the pseudospectrum viaPMoG and that wewish to compute it at a specific area of the complexplane in greater detail. To do this withPPsGUI wefirst select the specific area using the mouse and letPsDM take over to compute the contours of interest.

An important goal for the system is to providetransparent use of the COW. All that is required is

that the software resides on a file system shared byall nodes participating in the computation. The toolautomatically decides which and how many of theavailable processors to use.PPsGUI, however, alsoallows an expert user to manually define the paral-lel configuration of his choice. The most importantfeature ofPPsGUI is that the user can specify thatresults computed by one method (e.g.Cobra) willserve as inputs of another (e.g.PsDM). Furthermore,it is possible to use data computed elsewhere and like-wise store its results for future use. The environmentalso incorporates and lets one use, via appropriate se-lections inPPsGUI, many state-of-the-art algorithmsfor computing singular values. So, in addition to thetransfer function framework, one can useMATLAB’sown direct and iterative methodssvd, svds aswell as methods described in[2,23,25,26]. PPsGUIalso allows the superposition of plots as well asthree-dimensional plotting. These allow, for example,the visual comparison or detection of the evolution ofpseudospectra of different matrices. A “recommender”system to help the non-expert select the best combi-nation of algorithms is currently under development.

5.1. Automatic and manual configuration

The automatic detection of the characteristics andconfiguration management of the COW on whichPPsGUI is deployed is a helpful component in anyautomatic or manual decision-making regarding thesetup of the problem solving process. For this purposewe developed theParallelism ConfigurationGUI (PCGUI), a software tool available within theenvironment. Our goals were for the software toenable transparent and user-friendly access to compu-tational resources while also permitting expert usersgreater control. In particular, the user can either man-ually choose a particular configuration, based on theinformation that the tool provides, or let the tool auto-matically decide the configuration at runtime. For eachof the available workstations in the COW, the masterworkstation, on whichPPsGUI was initiated, gathersinformation which includes the number of availableprocessors on the workstation, their type and the clockfrequency, the total amount of RAM memory andthe availability ofMATLAB. All that is needed is thatthe user has an account on all the COW nodes andaccess to a common file system. The communication

Page 9: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 9

Fig. 5. ThePCGUI.

mechanism is implemented as anMPI programcalled find cluster. The central node executesfind cluster using a standard call tompirun.The workstations on whichfind cluster willrun are typically defined in a specialmachines file,every line of which contains the name or IP addressof the workstations. Our environment currently runson Windows 2000 andPCGUI utilizes the registry ofWindows from which it reads information regardingthe present hardware and software resources as wellas the commandnet view which returns all avail-able workstations in the Windows 2000 domain. Itis worth noting, however, that above methodology isgeneral and virtually independent of the particular OS.

Fig. 5 illustratesPCGUI and an instance of its use.Its functionality is summarized as follows:

• The user can request the automatic detection of thefeatures of the COW (AUTO key). On the otherhand, it is possible to manually add a certain work-station (ADD key). All workstations that are eventu-ally available for computation appear in a list abovethe ADD key.

• For each workstation we can illustrate informationabout its hardware and software configuration.

• The user can make a selection of the available work-stations and form a configuration using the SELECTkey. Furthermore, the user can save a configurationor load a previously saved one.

• The CLEAR key resets the tool and the HELP keytriggers a help dialog.

PCGUI is a general tool, which is useful on itsown, and can be used in applications other thanpseudospectra.

5.2. Numerical experiments

We next present a number of experiments withPPsGUI. In the first example we usedCobra to com-pute an initial boundary curve∂Λε(A) with ε = 0.1for a very small matrix, namelykahan(100). Thetotal runtime for two processors was 5.5 s. Then weused the points of this curve as initial points forPsDMto compute five inner curves∂Λεi(A), log10(εi) =−3 − 0.05i, i = 1, . . . , , 5. Using again two proces-sors it required 8 s. Therefore, if we were to use justCobra to compute the inner curves we would needfour processors to top the performance ofPPsGUI.

The next test matrices weregre 1107 (the nu-meral indicates its dimension) from Matrix Market[9]and a random sparse matrix of sizen = 2000. WeusedTRGRID, that is a hybrid algorithm based onGRID and transfer functions[7]. Table 2 illustratesruntimes and corresponding speedups. We witness im-pressive results that exhibit superlinear speedups dueto the amortization of memory load across the work-stations of the cluster. We next usedTRCOBRA, that

Table 2Performance of parallelTRGRID

P

1 2 4 8

Matrix: gre 1107Time (s) 185 80.3 42.4 20.5Speedup – 2.3 4.4 9

Random sparse matrix (n = 2000)Time (s) 77.2 36.2 18.6 10.6Speedup – 2.1 4.2 7.32

Page 10: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

10 C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

Table 3Runtimes (s) and speedups (in parenthesis) for increasing numberof correction points of parallelTRCOBRA for matrix gre 1107and a bidiagonal matrix with random sparse elements and sizen = 40 000

P Points

2 4 8

gre 11071 63 104 1942 40 (1.58) – –4 38.5 (1.65) 39.5 (2.65) –8 38.2 (1.65) 38.8 (2.7) 40 (4.85)

Random matrix1 402 633 11352 244 (1.65) – –4 182 (2.21) 185 (3.42) –8 138 (2.91) 139 (4.56) 142 (8)

is a hybrid built fromCobra and the transfer func-tion framework[3,6]. We used the cluster of the pre-vious example with test matricesgre 1107 and abidiagonal matrix of sizen = 40 000 with randomsparse entries elsewhere[41]. Table 3 lists runtimesand speedups (in parentheses) for various number ofcorrection points at each step ofCobra.

The above experiments clearly illustrate that thereis much to be gained by hybrid methods that arebased on domain and matrix-based algorithms incombination with efficient parallel implementationsrunning on clusters of workstations. To appreciatethese results, we note that using a parallel implemen-tation of the classicalGRID algorithm together with astate-of-the-art general purpose SVD solver[26] forthe large (n = 40 000) sparse matrix of the previousexample on a 50× 50 mesh and an eight-node COW,the pseudospectrum took in excess of 4 h of runtime.

5.3. Towards a Grid PSE

The key motive behind our efforts was to provideaccess to state-of-the-art algorithms that would bringdown the high cost of computing detailed pseudospec-tra of large matrices. The environment presented in thispaper is a powerful and user-friendly system based onoff-the-shelf software and hardware components andaccomplishes this goal. As mathematical models be-come more realistic and numerical simulations moredetailed, however, matrices become larger, resolu-

tions finer and even our algorithms running on COWswill have difficulty producing answers in acceptableamounts of time. In view of recent advances in mid-dleware and network enabled server systems designedfor desktop scientific computing (see, e.g.[20]), weconsider the Grid (cf.[18]) as a next framework for ourenvironment. The fact that several of the algorithmsdescribed inSection 3are amenable to decomposi-tion at large granularities renders the Grid a viableframework for computing pseudospectra. Therefore,our ongoing effort is to evolve the environment intoa Grid-enabled PSE whose components would beselected and triggered as a function of the proper-ties of the problem and the availability of the Grid’sresources.

6. Conclusions

We have described the design of a software environ-ment for the effective computation and presentationof matrix pseudospectra on clusters of workstations.The computational kernel of the environment consistsof several state-of-the-art domain- and matrix-basedmethods. The described environment permits the userto combine algorithms into hybrid methods that appearto be much more effective than algorithms that relysolely on the geometry or matrix numerics. The envi-ronment is built overMATLAB and the Cornell Multi-task Toolbox that enablesMPI-based message passingbetween concurrent instances ofMATLAB. It must benoted that in spite of the fact that our implementationswere necessarily based on specific off-the-shelf com-ponents, one could build a similar tool selecting alter-native methods to enableMATLAB for parallel process-ing or even replaceMATLAB altogether with tools suchasSciLab [16,21]. We have also provided motivationbehind the use of Grid computing for the computationof pseudospectra. The tools described in this paper areexpected to provide important building blocks for thedesign of a Grid-enabled Pseudospectrum PSE that weconsider to be the next goal in our efforts.

References

[1] E.L. Allgower, K. Georg, Continuation and path following,in: Acta Numerica 1993, vol. 2, Cambridge University Press,Cambridge, 1993, pp. 1–64.

Page 11: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 11

[2] J. Baglama, D. Calvetti, L. Reichel, IRBL: an implicitlyrestarted block Lanczos method for large-scale Hermitianeigenproblems, SIAM J. Sci. Comput. 24 (5) (2003) 1650–1677.

[3] C. Bekas, Efficient computation of matrix pseudospectra:algorithms and tools, PhD Thesis, Computer Engineering andInformatics Department, University of Patras, June 2003 (inGreek).

[4] C. Bekas, E. Gallopoulos, Cobra: parallel path followingfor computing the matrix pseudospectrum, Parallel Comput.27 (14) (2001) 1879–1896.

[5] C. Bekas, E. Gallopoulos, Parallel computation of pseudo-spectra by fast descent, Parallel Comput. 28 (2) (2002) 223–242.

[6] C. Bekas, E. Gallopoulos, V. Simoncini, Transfer functionsand path following for computing pseudospectra, in: Procee-dings of the SIAM Conference in Applied Linear Algebra,Extended Abstract, Williamsburg, VA, July 2003.

[7] C. Bekas, E. Kokiopoulou, E. Gallopoulos, E. Simoncini,Parallel computation of pseudospectra using transfer functionson a MATLAB–MPI cluster platform, in: Proceedings of theNinth European PVM/MPI Users’ Group Meeting on RecentAdvances in Parallel Virtual Machine and Message PassingInterface, Lecture Notes in Computer Science, vol. 2474,Springer-Verlag, Berlin, 2002.

[8] C. Bekas, E. Kokiopoulou, I. Koutis, E. Gallopoulos, Towardsthe effective parallel computation of matrix pseudospectra,in: Proceedings of the 15th ACM International Conferenceon Supercomputing (ICS’01), Sorrento, Italy, June 2001,pp. 260–269.

[9] R.F. Boisvert, R. Pozo, K. Remington, R. Barrett, J. Dongarra,The matrix market: a Web repository for test matrix data,in: R.F. Boisvert (Ed.), The Quality of Numerical Software,Assessment and Enhancement, Chapman & Hall, London,1997, pp. 125–137.

[10] T. Braconnier, R.A. McCoy, V. Toumazou, Using the fieldof values for pseudospectra generation, Technical ReportTR/PA/97/28, CERFACS, Toulouse, September 1997.

[11] M. Brühl, A curve tracing algorithm for computing thepseudospectrum, BIT 33 (3) (1996) 441–445.

[12] L.Y. Choy, Matlab∗p 2.0: interactive supercomputingmade practical, Master’s Thesis, Massachusetts Instituteof Technology, Cambridge, MA, September 2002.http://theory.lcs.mit.edu/∼cly/survey.html.

[13] J.J. Dongarra, V. Eijkhout, Numerical linear algebra algori-thms and software, J. Comput. Appl. Math. 123 (2000) 489–514.

[14] J.J. Dongarra, V. Eijkhout, Self-adapting numerical softwarefor next generation applications, Technical Report LapackWorking Note 157, ICL-UT-02-07, University of Tennessee,Knoxville, TN, August 2002.

[15] M. Embree, Convergence of Krylov subspace methods fornon-normal matrices, PhD Thesis, Balliol College, Universityof Oxford, April 2000.

[16] E. Caron, et al., SCILAB to SCILAB//: the OURAGANproject, Parallel Comput. 27 (2001) 1497–1519

[17] J. Fernandez, A. Canas, A.F. Diaz, J. Gonzalez, J. Ortega, A.Prieto, Performance of message-passing MATLAB toolboxes,

in: Proceedings of the Fifth International Conference onHigh Performance Computing for Computational Science(VECPAR 2002), Porto, Portugal, Lecture Notes in ComputerScience, vol. 2565, Springer, Heidelberg, January 2003,pp. 228–241.

[18] I. Foster, C. Kesselman (Eds.), The Grid: Blueprint for a NewComputing Infrastructure, Morgan Kaufmann, San Mateo,1998.

[19] M. Frigo, FFTW: an adaptive software architecture for theFFT, in: Proceedings of the ICASSP Conference, vol. 3, 1998,p. 1381.

[20] NetSolve/GridSolve Web site.http://icl.cs.utk.edu/netsolve.[21] Scilab Group, Introduction to Scilab: User’s Guide, Technical

Report, INRIA-Unité de recherche de Rocquencourt-ProjectMéta2, 1997.http://www-rocq.inria.fr/scilab.

[22] N.J. Higham, The matrix computation toolbox, TechnicalReport, Manchester Centre for Computational Mathematics,2002.http://www.ma.man.uc.uk/∼higham/mctoolbox.

[23] M.E. Hochstenbach, Subspace methods for eigenvalueproblems, PhD Thesis, Department of Mathematics, UtrechtUniversity, May 2003.

[24] E.N. Houstis, J.R. Rice, E. Gallopoulos, R. Bramley(Eds.), Enabling Technologies For Computational Science:Frameworks, Middleware, and Enviroments, KluwerAcademic Publishers, Dordrecht, 2000.

[25] Z. Jia, D. Niu, An implicitly restarted refined bidiagona-lization Lanczos method for computing a partial singularvalue decomposition, SIAM J. Matrix Anal. Appl. 25 (1)2003.

[26] E. Kokiopoulou, C. Bekas, E. Gallopoulos, Computingsmallest singular triplets with implicitly restarted Lanczosbidiagonalization, Appl. Numer. Math. 49 (1) (2004).

[27] R. Lehoucq, D.C. Sorensen, C. Yang, Arpack User’sGuide: Solution of Large-Scale Eigenvalue Problems withImplicitly Restarted Arnoldi Methods, SIAM, Philadelphia,1998.

[28] The MATLAB JIT-Accelerator, Mathworks TechnologyBackgrounder, September 2002.http://www.mathworks.com/company/digest/sept02/accelmatlab.pdf.

[29] D. Mezher, A graphical tool for driving the parallelcomputation of pseudosprectra, in: Proceedings of the 15thACM International Conference on Supercomputing (ICS’01),Sorrento, Italy, June 2001, pp. 270-276.

[30] Message Passing Interface Forum.http://www.mpi-forum.org.[31] Pseudospectra gateway, At the Oxford University site

http://web.comlab.ox.ac.uk/projects/pseudospectra.[32] J.R. Rice, The algorithm selection problem, in: Advances

in Computers, vol. 15, Academic Press, New York, 1976,pp. 65–118.

[33] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM,Philadelphia, 2003.

[34] V. Simoncini, E. Gallopoulos, Transfer functions and resolventnorm approximation of large matrices, Electron. Trans.Numer. Anal. 7 (1998) 190–201.

[35] MPI Software Technology, MPI/Pro.http://www.mpi-softtech.com.

Page 12: The design of a distributed MATLAB-based environment for ...scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/FGCS.pdf · tional cost. For example, the application of definition

12 C. Bekas et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

[36] K.-C. Toh, L.N. Trefethen, Calculation of pseudospectra bythe Arnoldi iteration, SIAM J. Sci. Comput. 17 (1) (1996)1–15.

[37] L.N. Trefethen, Pseudospectra of matrices, in: D.F. Griffiths,G.A. Watson (Eds.), Proceedings of the 14th DundeeConference on Numerical Analysis 1991, Essex, UK,Longman Science and Technical, London, 1991, pp. 234–266.

[38] L.N. Trefethen, Computation of pseudospectra, in: ActaNumerica 1999, vol. 8, Cambridge University Press,Cambridge, 1999, pp. 247–295.

[39] L.N. Trefethen, A.E. Trefethen, S.C. Reddy, T.A. Driscoll,Hydrodynamic stability without eigenvalues, Science 261(1993) 578–584.

[40] T. Wright, Eigtool: a graphical tool for nonsymmetriceigenproblems, December 2002. At the Oxford UniversityComputing Laboratory site http://web.comlab.ox.ac.uk/pseudospectra/eigtool.

[41] T. Wright, L.N. Trefethen, Large-scale computation ofpseudospectra using ARPACK and Eigs, SIAM J. Sci.Comput. 23 (2) (2001) 591–605.

[42] T.G. Wright, Algorithms and software for pseudospectra, PhDThesis, University of Oxford, 2002.

[43] J.A. Zollweg, A. Verma, The Cornell Multitask Toolbox,Directory Services/Software/CMTM. http://www.tc.cornell.edu.

Constantine Bekas is a Post Doctoralassociate at the Computer Science andEngineering Department of the Univer-sity of Minnesota. His research interestsinclude computation of pseudospectra ofvery large matrices, eigenvalue and largesingular value problems and high perfor-mance and parallel computing. He wasa recipient of a Bodossaki FoundationDoctoral Scholarship and a Technical

Chamber of Greece award for excellent academic performance.He received a Computer Engineering Diploma in 1998 (withdistinction), Master’s Diploma in 2001 and PhD in 2003, allfrom the Computer Engineering and Informatics Department ofthe University of Patras, Greece.

Effrosyni Kokiopoulou is a graduatestudent and research associate at theComputer Science and Engineering De-partment of the University of Minnesota.Her research interests include largesingular value problems, data mining,information retrieval and parallel com-puting. She was a recipient of a Bodos-saki Foundation scholarship for graduatestudies and numerous awards from the

Hellenic Scholarship Foundation for excellent performance inundergraduate studies. She received an Engineering Diploma(class valedictorian) in 2002, from the Computer Engineering andInformatics Department of the University of Patras, Greece.

Efstratios Gallopoulos has been pro-fessor at the Department of ComputerEngineering and Informatics since 1996.Before that he held positions of SeniorComputer Scientist at the University ofIllinois at Urbana-Champaign; assistantprofessor at the University of Califor-nia Santa Barbara; visiting researcher atNASA Goddard Space Flight Center. Assenior computer scientist he participated

in research and development of the Cedar vector multiprocessorat the University of Illinois Center for Supercomputing Re-search and Development and in the software development of theGoodyear Aerospace Massively Parallel Processor (MPP). Hehas been member of scientific committees of many internationalconferences; editor of Computing in Science and Engineering(1994–1999) and the International Journal of High Speed Com-puting; co-organizer in 1992 of the NSF Workshop in FutureDirections PSEs for Computational Science and member of theSteering Committee of the European Research Foundation EU-RESCO Conference on Advanced Environments and Tools forHigh Performance Computing. He received his PhD (computerscience) at the University of Illinois at Urbana-Champaign in1984 and his BSc (first class honors) in Mathematics from theImperial College of Science and Technology in 1979.