
Comp. Appl. Math. DOI 10.1007/s40314-013-0028-4

High performance verified computing using C-XSC

Walter Krämer

Received: 13 March 2012 / Accepted: 24 December 2012
© SBMAC - Sociedade Brasileira de Matemática Aplicada e Computacional 2013

Abstract So-called self-validating or self-verifying numerical methods make it possible to prove mathematical statements (existence of a fixed point, of a solution of an ODE, of a zero of a continuous function, of a global minimum within a given range, etc.) using a digital computer. To validate the assertions of the underlying mathematical theorems, only fast finite precision machine arithmetic is used. The results are absolutely rigorous. We report on the accuracy as well as on the efficiency of the C++ class library C-XSC, our well known open source software tool designed to facilitate self-verifying numerical calculations. We focus mainly on solvers for dense and sparse interval linear systems. In recent years, these solvers have been improved significantly with respect to high performance computing within our bilateral Probral project HPVC (see Acknowledgments). As a motivating nontrivial example, in which an efficient solver for large dense interval linear systems is needed in an intermediate step, the computation of a verified functional enclosure of the solution of an integral equation is briefly discussed. The newest version, C-XSC 2.5.1, released on June 9, 2011, allows using C-XSC in multi-threaded environments. The library as well as some further packages not mentioned in this paper are open source and freely available from the web site of the author's research group Scientific Computing/Software Engineering at the University of Wuppertal: http://www2.math.uni-wuppertal.de/org/WRST/index_de.html.

Keywords Verified computing · Self-validating methods · High performance computing · Parallelization · Thread-safety · Sparse methods · C-XSC

Mathematics Subject Classification (2010) Primary 65G20; Secondary 65G30

Communicated by Renata Hax Sander Reiser.

To see/download the latest file release please consult the C-XSC web page http://www.math.uni-wuppertal.de/wrswt/xsc/cxsc_new.html.

W. Krämer (B)
Scientific Computing/Software Engineering, Faculty of Mathematics and Natural Sciences, University of Wuppertal, 42119 Wuppertal, Germany
e-mail: [email protected]



1 Preliminary remarks

This paper is based on a plenary talk the author gave at the conference CNMAC 2010, XXXIII Congresso Nacional de Matemática Aplicada e Computacional, Águas de Lindóia/SP, Brazil, in 2010. The paper summarizes, in particular, newer developments concerning the C-XSC library and high performance verified computing (HPVC). One of the driving forces of these developments was the bilateral Brazilian/German Probral project HPVC (see the Acknowledgments at the end of this paper) funded by CAPES and DFG.

The need for numerical validation is shown by simple examples, and the need for HPVC is demonstrated by the computation of functional enclosures of the solution of an integral equation. Typical (sub)tasks require self-verifying solvers for (dense or sparse) linear interval systems. Thus, the paper concentrates on such solvers, on the time-consuming operations involved (such as dense matrix/matrix multiplications), and on their HPVC realizations. Please note that the development of sparse HPVC solvers in C-XSC is still in progress. The time measurements concerning such solvers are very promising but still preliminary. Details on the final versions of the sparse solvers will be published in Zimmer (2012).

In this paper, we do not discuss good practices when using the newly introduced C-XSC features concerning HPVC (compiler optimization, error-free transformations, using OpenMP, the MPI interface, ...). Such topics are very important for the successful development of one's own highly efficient C-XSC software. Therefore, they are addressed in a separate paper (Krämer et al. 2012).

2 Introduction

Fortunately, numerical computations using floating-point operations are in practice in most cases reliable. Nevertheless, there are situations where the typical tests used by scientists, engineers, and economists (such as performing the same computation twice, e.g. using single precision in the first run and double precision in the second, and supposing the number of figures to which the two results agree to be the number of correct figures in the final result) do not indicate that computed results are totally wrong. The following very simple example demonstrates that this kind of check on the quality of numerical results is not really reliable.

Consider the recursion formula $x_{k+1} = x_k^2$, $k = 0, 1, \ldots$, and suppose that $x_0 = 1 - \varepsilon^2$ with $\varepsilon = 2^{-37}$. We seek $x_{80}$. The following C++ program performs the corresponding computations using IEEE single and IEEE double precision operations. To this end we call the template function f twice. When calling f( float(eps) ) the computations are based on the C++ data type float, i.e., IEEE single precision operations are used. Calling f( double(eps) ) gives the result of the same computation but now using IEEE double precision operations.

#include <iostream>
using namespace std;

template<typename T> T f(T x) {
  // Compute the value (1-x*x)**(2**kMax)
  T xk= 1-x*x;       // x0 = (1-x*x)**(2**0)
  int kMax= 80;
  for (int k=1; k<=kMax; k++) {
    xk= xk*xk;       // x0**(2**k)
  }
  return xk;         // x0**(2**kMax)
}

int main() {
  float p= 1024;
  p= p*p*p*128;      // 2**37
  float eps= 1.0/p;  // no rounding error
  cout << "eps: " << eps << endl << endl;

  cout << "f(eps) using float : "
       << f( float(eps) ) << endl;

  cout << "f(eps) using double: "
       << f( double(eps) ) << endl << endl;
}

Running the program produces the following output:

eps: 7.27596e-12

f(eps) using float : 1.000000
f(eps) using double: 1.000000

The result value 1.0 produced by the single precision computation is equal to the result of the double precision computation. However, drawing the conclusion that f(eps) $= x_{80}$ is close to 1.0 is wrong. Indeed, we find analytically:

With $f(x) = (1 - x^2)^{2^{k_{\max}}}$, $x = \varepsilon = 2^{-37}$, and $k_{\max} = 80$ it holds that
$$\ln f(x) = 2^{80}\,\ln(1 - x^2) = 2^{80}\left(-x^2 - \tfrac{1}{2}(x^2)^2 - \cdots\right) \quad\text{(Taylor series)}$$
$$< 2^{80}\,(-x^2) = 2^{80}\left(-2^{-37}\cdot 2^{-37}\right) = -2^{6} = -64,$$
i.e., $f(x) = \exp(\ln f(x)) < \exp(-64) < 1.603811 \times 10^{-28}$.

Thus, the correct value of $x_{80}$ is smaller than $1.7 \times 10^{-28}$ (far away from the computed value 1.0).

Do we get a better (more useful) result when we use interval operations for machine intervals with double precision bounds? Yes and no. Replacing all operations by interval operations produces an enclosure of the correct mathematical value. The resulting interval contains the correct value $x_{80}$. But this interval must also contain the numerical result computed with double precision operations (interval operations give worst case error bounds with respect to the underlying floating point arithmetic, here double precision). In other words, the resulting interval must contain a numerical value smaller than $1.7 \times 10^{-28}$ as well as the value 1.0; thus, it must contain the interval $[1.7 \times 10^{-28}, 1]$. Of course, this is not very helpful if we are interested in the correct value of $x_{80}$, say, to ten decimal figures.


Nevertheless, a wide result interval (often) signals that the numerical computation is in some sense not stable. The user is warned and may thus attack the problem by reorganizing the computation or modifying the solution method.

Let us now use C-XSC (Hofschuster and Krämer 2004; Klatte et al. 1993) interval data types and interval operations to compute enclosures for $x_{80}$:

#include <iostream>
using namespace std;

#include <l_interval.hpp>  // C-XSC interval data types
using namespace cxsc;

template<typename T> T f(T x);  // body as in the previous listing

int main() {
  float p= 1024;
  p= p*p*p*128;      // 2**37
  float eps= 1.0/p;  // no rounding error

  cout << "f(eps) using ordinary interval: "
       << f( interval(eps) ) << endl;

  stagprec= 2;       // use twofold precision intervals
  cout << "f(eps) using twofold precision: "
       << f( l_interval(eps) ) << endl;
}

This program produces the following output:

f(eps) using ordinary interval: [ 0.000000, 1.000000]
f(eps) using twofold precision: [ 1.6038108905E-28, 1.6038108906E-28]

As explained above, the result [ 0.000000, 1.000000] using ordinary intervals composed of two double precision numbers (C-XSC data type interval) contains the interval $[1.7 \times 10^{-28}, 1]$. Indeed, the large diameter (width) of the resulting interval indicates computational problems. Here it is easy to find a remedy: we use some kind of higher precision interval operations. The corresponding C-XSC data type is l_interval. The integer variable stagprec may be used to control the precision. We use twofold precision to find the sharp enclosure $x_{80} \in$ [ 1.6038108905E-28, 1.6038108906E-28] (compare this result with the estimate based on the Taylor series approach discussed above).

The source code shows how simple it is to use the interval data types in C-XSC. Function and operator overloading allows the common mathematical notation of expressions involving interval types. This is also true for (interval) matrix/vector expressions. Complex interval data types are also available. In addition, there are a lot of advanced features incorporated in C-XSC; please refer to Klatte et al. (1993), Hofschuster and Krämer (2004), and Blomquist et al. (2011).
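To illustrate this notation, the following small program (an illustrative sketch of ours, not taken from the paper; it assumes the standard C-XSC headers imatrix.hpp and ivector.hpp) multiplies an interval matrix by an interval vector using the overloaded operators:

#include <iostream>
#include <ivector.hpp>   // C-XSC interval vectors
#include <imatrix.hpp>   // C-XSC interval matrices
using namespace std;
using namespace cxsc;

int main() {
  imatrix A(2,2);   // 2x2 interval matrix, 1-based indexing by default
  ivector x(2);     // interval vector of length 2
  A[1][1]= interval(0.9,1.1);  A[1][2]= interval(-0.1,0.1);
  A[2][1]= interval(-0.1,0.1); A[2][2]= interval(0.9,1.1);
  x[1]= interval(1.0);
  x[2]= interval(2.0);
  ivector y= A*x;   // enclosure of { A*x : A in [A], x in [x] }
  cout << y << endl;
  return 0;
}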


(Machine) interval computations make it possible to verify, in an automated way, that assumptions/assertions necessary for the validity of mathematical theorems are fulfilled in concrete situations (e.g. in connection with Brouwer's or Schauder's fixed point theorem, that a continuous function f maps a given set into itself; Kulisch and Miranker 1986). This capability makes interval computations a fast (compared to the symbolic manipulations typically done by computer algebra packages) and powerful tool in the field of computer assisted proofs. Thus, verification methods can assist in achieving a mathematically rigorous result. These methods make it possible to guarantee that a mathematical problem, e.g. a set of given differential equations or an integral equation, has a solution and that this solution is unique within the computed (functional) bounds. Numerical verification methods, also called self-validating methods, are constructive, and they allow the user to handle uncertain data with mathematical rigor. The intention of the C-XSC library is to support the user in using and developing efficient numerical verification methods.
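As a minimal illustration of this principle (our own sketch, not from the paper): if the verified enclosure of cos over an interval [x] is a subset of [x], Brouwer's fixed point theorem guarantees a fixed point of cos in [x]. The subset test uses only the interval bounds:

#include <iostream>
#include <interval.hpp>  // C-XSC interval type
#include <imath.hpp>     // verified elementary functions (cos, ...)
using namespace std;
using namespace cxsc;

int main() {
  interval x(0.73, 0.75);  // candidate interval
  interval fx= cos(x);     // verified enclosure of the range of cos over x
  // subset test via the interval bounds (Inf/Sup are C-XSC access functions)
  if (Inf(x) <= Inf(fx) && Sup(fx) <= Sup(x))
    cout << "cos maps " << x << " into itself => fixed point exists" << endl;
  return 0;
}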

The main focus in the early days of C-XSC (Klatte et al. 1993) was to provide the verification functionality in the most convenient and portable way. In recent years the enhancement of efficiency has become more and more important. The main starting point was Rump (1999b), where performance issues of interval operations are addressed and it is shown that midpoint-radius arithmetic allows a very fast implementation using BLAS (Lawson et al. 1979). Meanwhile, highly efficient parallel C-XSC solvers, e.g. for dense linear systems, and first versions of sparse solvers are available (Hölbig et al. 2004; Grimmer and Krämer 2007; Zimmer 2007; Zimmer et al. 2010, 2011; Kolberg 2009; Kolberg et al. 2008a, 2009, 2011; Krämer et al. 2012; Krämer and Zimmer 2009); many thanks especially to Michael Zimmer, University of Wuppertal. This work is still in progress.
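The core mechanism behind such BLAS-based enclosures can be sketched as follows (illustrative plain C++, not C-XSC internals: running the same floating-point loop under two directed rounding modes yields a rigorous enclosure; applied to optimized BLAS calls on midpoint/radius matrices, this is what makes interval linear algebra fast, cf. Rump 1999b). The code must be compiled without value-unsafe optimizations, e.g. with -frounding-math, so that the rounding mode switches take effect:

#include <cfenv>
#include <cstdio>

double dot(const double* a, const double* b, int n) {
  double s = 0.0;
  for (int i = 0; i < n; ++i) s += a[i]*b[i];  // stands in for a BLAS call
  return s;
}

int main() {
  double a[3] = {0.1, 0.2, 0.3}, b[3] = {3.0, 2.0, 1.0};
  std::fesetround(FE_DOWNWARD);
  double lo = dot(a, b, 3);        // rigorous lower bound
  std::fesetround(FE_UPWARD);
  double hi = dot(a, b, 3);        // rigorous upper bound
  std::fesetround(FE_TONEAREST);   // restore the default rounding mode
  std::printf("enclosure: [%.17g, %.17g]\n", lo, hi);
  return 0;
}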

There are several further packages available that extend the functionality of C-XSC considerably. Let us just mention the newest one: it allows arbitrary precision real, real interval, complex, and complex interval calculations. Not only the basic operations are realized; a comprehensive set of real and complex interval functions is also available (Krämer 2011). Like the C-XSC library itself, the additional packages are open source and available via http://www2.math.uni-wuppertal.de/org/WRST/index_de.html.

3 Motivating nontrivial example: computing functional bounds for the solution of an integral equation

We are interested in a rigorous functional enclosure of the solution of an integral equation (operator equation) (Claudio and Dobner 1997; Atkinson and Shampine 2008; Obermaier 2003; Grimmer 2007). We will see that using fine subdivisions of the domain results in the (intermediate) task of solving large dense interval systems of linear equations. To accomplish this task we need an efficient (parallel) solver for such systems. Such solvers are discussed in Sect. 4 below.

To be explicit, let us consider the following task: find a continuous function $y(\cdot)$ such that
$$y(s) - \int_{-2.5}^{2} \underbrace{\big(s + t^2 \sin(t - \arctan(s^2))\,\operatorname{erf}(s + 2t)\big)}_{=:\,k(s,t)}\; y(t)\,dt \overset{!}{=} g(s)$$
with forcing term $g(s) := \sin(\exp(\operatorname{erf}(2s) + 2))$ in the domain $-2.5 \le s \le 2$.


[Figure 1 appears here: four panels (a)-(d), each titled "Integral Equation Solution Enclosure", plotting enclosure bounds for y(s) over the domain -2.5 <= s <= 2; note the different vertical ranges of the panels.]

Fig. 1 Enclosures of y(s) when increasing the number of subdomains

The solution cannot be given in closed form. The numerical method used here, described e.g. in Grimmer (2007), relies (among a lot of other things) on the verified solution of a (large) dense interval linear system.

Our goal is to compute an interval polynomial (or a set of interval polynomials over subdomains, i.e., a piecewise interval polynomial enclosure) containing the true mathematical solution y(s). Figure 1 shows some results graphically (note the different scaling of the y axes). To generate these figures the domain was subdivided into 35, 55, 110, and 300 subdomains, respectively. We plot the ranges of the interval polynomials including the graph of the exact mathematical solution (our algorithm verifies the existence and the uniqueness of such a solution automatically; a computer-assisted proof). When using more and smaller subdomains the enclosures become better and better. It turns out that in case (d) a linear interval system of order 1 500 has to be solved in an intermediate step of the solving process.


Let us briefly comment on the general solution method: let $k : [a,b] \times [a,b] \to \mathbb{R}$ and $g : [a,b] \to \mathbb{R}$ be continuous and let $\lambda \in \mathbb{R}$ be given. We are looking for a continuous function $y : [a,b] \to \mathbb{R}$ such that
$$y(s) - \lambda \int_a^b \underbrace{k(s,t)}_{\text{kernel}}\; y(t)\,dt = \underbrace{g(s)}_{\text{inhomogeneity}}.$$

This is a Fredholm integral equation of the second kind. A kernel $k(s,t) = \sum_{i=0}^{T} a_i(s)\,b_i(t)$ with $\{a_i\}$, $\{b_i\}$ being sets of linearly independent functions is called a degenerate kernel of order $T$. Using Taylor order $T$ to split a general kernel into the sum of a degenerate kernel and a contractive kernel (the remainder part of the two-dimensional Taylor expansion), and using an equidistant partition of $[a,b]$ into $N$ subdomains, results in an intermediate step in which a dense linear interval system of order $N \times (T+1)$ has to be solved. This linear interval system may be (very) large if

• the domain [a, b] is large,
• the kernel k(s, t) is complicated,
• the forcing term g(s) is complicated,
• we are interested in high accuracy,
• we want to solve a system of integral equations from the beginning.

For example, using an equidistant partition into, say, 5 000 subdomains and Taylor order 10 results in a dense system with 55 000 unknowns ($N \times (T+1) = 5\,000 \cdot 11$). The entries of the (interval) matrix are given by the values (enclosures) of definite integrals. The final enclosure produced by the verification algorithm applied to the integral equation is a set of 5 000 interval polynomials (centered at the midpoints of the respective subdomains) of degree 10, each containing the solution y(s) on its subdomain. The interval polynomials over the subdomains are stitched together, resulting in a piecewise interval polynomial function that describes a function tube enclosing y on the complete domain [a, b].
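Schematically (in our notation, not the paper's), the splitting behind this count reads
$$k(s,t) = \underbrace{\sum_{i=0}^{T} a_i(s)\,b_i(t)}_{\text{degenerate part}} + \underbrace{r_T(s,t)}_{\text{contractive remainder}},$$
so each of the $N$ subdomains contributes $T+1$ unknown polynomial coefficients, which yields the stated system order $N \times (T+1)$.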

4 Efficient self-validating linear system solvers

The serial C-XSC solver for the linear (interval) system $[A]x = [b]$ with dense (interval) system matrix $[A]$ and right hand side interval vector $[b]$ is based on the Krawczyk operator. The basic algorithm is described in many papers; a very readable introduction is given in Hammer et al. (1993). We concentrate on its realization in C-XSC (Hofschuster and Krämer 2004) for real interval data:

Let $A_m$ denote the midpoint matrix (center) of the interval matrix $[A]$ and $b_m$ the midpoint vector of the right hand side interval vector $[b]$. Our goal is to compute an interval vector containing the solution set $\Sigma([A],[b]) := \{x \in \mathbb{R}^n \mid \exists A \in [A]\ \exists b \in [b] \text{ with } Ax = b\}$. Interval quantities are bracketed (all C-XSC linear system solvers accept and produce machine representable quantities in infimum-supremum representation).
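For orientation, the enclosure step realized by the algorithm below is the classical Krawczyk-type iteration (standard form, cf. Hammer et al. 1993): with $[Z] := R([b] - [A]\tilde{x})$ and $[C] := I - R[A]$, one iterates
$$[Y]^{(k+1)} := [Z] + [C]\,[Y]^{(k)};$$
if $[Y]^{(k+1)}$ lies in the interior of $[Y]^{(k)}$, then every point matrix $A \in [A]$ is regular and $\Sigma([A],[b]) \subseteq \tilde{x} + [Y]^{(k+1)}$.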

1. Compute an approximate inverse $R$ of $A_m$ using BLAS and LAPACK.
2. Compute an approximate solution $\tilde{x} := R\,b_m$ using BLAS.
3. Repeat
       $\tilde{x} := \tilde{x} + R(b_m - A_m\tilde{x})$ using the DotK algorithm
   until $\tilde{x}$ is accurate enough or the maximum number of iterations is reached.
4. Compute $[Z] := R \diamond ([b] - [A]\tilde{x})$ using the DotK algorithm.


5. Compute $[C] := \diamond(I - R[A])$ using an intermediate midpoint-radius representation of the interval matrix $[A]$ and BLAS.
6. $[Y] := [Z]$.
7. Repeat
       $[Y_{\mathrm{old}}] := \mathrm{blow}([Y], \varepsilon)$
       $[Y] := [Z] + [C] \cdot [Y_{\mathrm{old}}]$ using the DotK algorithm
   until $[Y] \subseteq \mathrm{interior}([Y_{\mathrm{old}}])$ or the maximum number of iteration steps is reached.

If $[Y] \subseteq \mathrm{interior}([Y_{\mathrm{old}}])$, it holds that all point matrices $A \in [A]$ are regular and $\Sigma([A],[b]) \subseteq \tilde{x} + [Y]$.

Table 1 New serial C-XSC solvers compared to the corresponding Intlab solvers (dimension 1 000, condition number about 10^10)

What?            Solver          Real   Interval  Complex         cinterval
Time             Intlab           3.86   5.16     16.00           17.14
                 C-XSC, K = 2     3.96   5.34     15.82           18.88
                 C-XSC, K = 3     4.38   5.65     16.80           19.02
Correct figures  Intlab           6.09   0.93     (6.67, 5.90)    (0.81, 0.05)
                 C-XSC, K = 2    14.22   1.93     (13.63, 12.87)  (1.86, 1.11)
                 C-XSC, K = 3    15.79   1.93     (15.82, 15.78)  (1.86, 1.11)

Using BLAS/LAPACK means calling the BLAS/LAPACK (Demmel 1989; Lawson et al. 1979) routine(s) for the indicated task. The term DotK refers to a fast algorithm, based only on ordinary floating point computations and error free transformations, that produces the value of scalar products of vectors with floating point components as if computed with K-fold precision (Bohlender 2010; Krämer and Zimmer 2009; Ogita et al. 2005; Zimmer et al. 2010). The $\diamond$ symbols emphasize that (machine) enclosures have to be computed. To this end, occasionally the rounding mode has to be manipulated appropriately (Krämer and Zimmer 2009).
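For the reader's convenience, here is the classical TwoSum error-free transformation (Knuth's algorithm), a basic building block of DotK-type methods; this is a minimal sketch, not the C-XSC implementation. For IEEE doubles with rounding to nearest, a + b = x + y holds exactly, where x = fl(a + b) and y is the exact rounding error:

#include <cstdio>

// a + b = x + y exactly; x is the rounded sum, y the rounding error.
inline void TwoSum(double a, double b, double& x, double& y) {
  x= a + b;
  double z= x - a;
  y= (a - (x - z)) + (b - z);
}

int main() {
  double x, y;
  TwoSum(1.0e16, 1.0, x, y);   // the 1.0 is lost in x but recovered in y
  std::printf("x = %.17g, y = %.17g\n", x, y);
  return 0;
}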

C-XSC also allows rectangular (real or complex interval) matrices as system matrices and rectangular (real or complex interval) matrices as right hand sides. Thus, an enclosure of the set of inverse matrices $\{A^{-1} \mid A \in [A]\}$ may be computed by simply specifying the identity matrix of appropriate dimension as the right hand side.

Some time and accuracy measurements for the new serial C-XSC solvers, compared to the similarly sophisticated corresponding Intlab (Rump 1999a) solvers, are summarized in Table 1, taken from Krämer and Zimmer (2009). The presented integer values of K indicate that scalar products are computed in a simulated K-fold precision (see Krämer and Zimmer 2009; Zimmer et al. 2010). Using K equal to two is recommended. The C-XSC solvers are then as fast as the Intlab solvers, but the accuracy of the computed C-XSC results is significantly better.

The serial C-XSC solvers for linear interval systems are supplemented by parallelized versions (Zimmer 2007; Kolberg et al. 2011; Krämer and Zimmer 2009). All solvers use OpenMP (http://www.openmp.org/mp-documents/OpenMP3.0-SummarySpec.pdf/) to allow several threads to improve their efficiency on multicore systems. The parallelized versions are built on MPI (http://www.mcs.anl.gov/research/projects/mpi/; Grimmer and Krämer 2007) and ScaLAPACK (Choi et al. 1996). The underlying C-XSC data distribution for large matrices on parallel systems is the block cyclic storage scheme. To avoid a storage bottleneck, the data of matrices are distributed blockwise to all processing nodes (there is no master node holding a complete point or interval matrix). To show the efficiency of the parallelized solvers we have done some time measurements.


Table 2 (again taken from Krämer and Zimmer 2009) summarizes typical results for matrices with condition number about $10^{10}$ and dimension n = 5 000.

Table 2 Speed-up of the parallel solvers, condition number about 10^10, n = 5 000, P = number of processes

Computed with   P   Real   Interval   Complex   cinterval
DotK, K = 2     1   1.0    1.0        1.0       1.0
                2   1.66   1.76       2.17      2.34
                4   2.65   2.83       3.71      4.19
                8   4.31   4.65       6.42      7.18
DotK, K = 3     1   1.0    1.0        1.0       1.0
                2   1.62   1.77       2.08      2.34
                4   2.69   2.86       3.66      4.21
                8   4.37   4.73       6.35      7.29

The speed-up mainly depends on the ratio between the communication and the computational tasks to be performed. The communication typically involves sending and receiving dotprecision variables (long accumulators, Kulisch 1997) and parts of interval vectors and interval matrices. If the system size increases and/or if the computational part increases (e.g. when going from real interval to complex interval computations), the speed-up becomes better. The entries in the last column of Table 2 show an almost linear speed-up for complex interval systems. For real interval systems (column with heading "Interval" in Table 2) the speed-up is not as good; here the system size is still too small, resulting in a higher communication/computation ratio and/or a higher serial/parallel ratio.
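A rough Amdahl-type model (our illustration, not from the paper) makes this behavior plausible: with serial fraction $\sigma$, parallel fraction $1-\sigma$, and communication overhead $c(P)$, the speed-up on $P$ processes behaves like
$$S(P) \approx \frac{1}{\sigma + (1-\sigma)/P + c(P)};$$
increasing the computational work per process shrinks $\sigma$ and the relative weight of $c(P)$, so $S(P)$ approaches the linear bound $P$.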

Additional time measurements with larger systems, involving up to 100 cores, will be given in Zimmer (2012). They show that for large enough systems linear speed-up is reached within a 90-95 % margin. For example, experiments on two quite different supercomputers, located at the research center Jülich and at the computer center of the KIT in Karlsruhe, produce very similar results: the dimension of the dense real point linear system is n = 25 000, and time measurements using 20 and 50 processors/nodes are performed. On both machines more than 91 % of the expected linear speed-up factors are achieved. For more details and additional timings see Zimmer (2012).

5 Sparse data types and solvers in C-XSC

A very new feature of C-XSC is its set of special data types for sparse matrices (Zimmer et al. 2011). The C-XSC data type for a sparse interval matrix is simatrix and for a sparse complex interval matrix scimatrix (the usual name of a dense C-XSC data type is prefixed by the letter s to indicate sparsity). For sparse matrices the common compressed column storage structure (Gilbert et al. 1992) is used (see the sketch after this paragraph), allowing easy interfacing with other sparse matrix libraries. Mixed expressions built with sparse and dense quantities are allowed; e.g. we can multiply a sparse complex interval matrix C by a dense real matrix A by just writing C*A. In general, the result will be represented using a dense data type. There is also a complete set of assignment operators. These and additional features make it possible to check the numerical results of new algorithms implemented for sparse matrices against traditional algorithms implemented for dense matrices.


Implementing the same interface for corresponding dense and sparse algorithms also allows the use of C++ template programming.
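The compressed column storage scheme mentioned above can be sketched as follows (a minimal sketch with illustrative field names; C-XSC's internal layout may differ):

#include <vector>

// Column j occupies positions p[j] .. p[j+1]-1 of the arrays i and x.
struct ccs_matrix {
  int m, n;               // number of rows / columns
  std::vector<int> p;     // column pointers, length n+1
  std::vector<int> i;     // row indices of the nonzeros, length nnz
  std::vector<double> x;  // nonzero values, length nnz
};

// Example: the 2x2 matrix [[4, 0], [3, 5]] in CCS form.
ccs_matrix example() {
  ccs_matrix A;
  A.m= 2; A.n= 2;
  A.p= {0, 2, 3};         // column 0 holds two entries, column 1 holds one
  A.i= {0, 1, 1};         // row indices of the entries 4, 3, 5
  A.x= {4.0, 3.0, 5.0};   // the nonzero values
  return A;
}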

We now present two very simple but instructive examples (taken from Krämer 2011). Let us create a banded matrix as a sparse data structure (denoted by SA) as well as a dense data structure (denoted by DA). To this end, we first create an auxiliary rectangular matrix A. The number of columns of A is equal to the number of bands of the banded matrix. The elements of each band are stored in a separate column of A. The index of the column of A is equal to the index of the band stored in this column. Index 0 indicates the main diagonal of the banded matrix, -1 its first subdiagonal, -2 its second subdiagonal, +1 the first superdiagonal, and so on.

In the following example, we create a banded matrix with five bands with indices ranging from -3 to +1. The elements of the lowest band (index -3) are all set to 1, the elements of the second subdiagonal (index -2) are all set to 2, and so forth. We store this banded matrix in a sparse data structure SA (C-XSC data type srmatrix) as well as in a dense data structure DA (data type rmatrix). The name srmatrix means sparse real matrix. To store a sparse interval matrix efficiently you can use the sparse C-XSC data type simatrix, and to store a sparse complex interval matrix the data type scimatrix.

To be able to print all matrix elements clearly, we create only a 7×7 sparse matrix with five bands (but changing the source line int dim(7); into int dim(10000000); would produce a sparse matrix SA of dimension ten million in which only the five bands with indices ranging from -3 to +1 contain matrix elements not equal to 0). We also create a corresponding dense matrix DA using a constructor of the class rmatrix that takes a sparse matrix as its actual parameter. Then we check whether both matrices are equal (from the numerical point of view) and we compute their difference. Note that SA-DA is a mixed expression; we subtract the dense object DA from the sparse object SA.

#include <iostream>
#include <srmatrix.hpp>  // sparse real matrices

using namespace std;
using namespace cxsc;

int main() {
  int dim(7);        // matrix dimension

  rmatrix A(dim,5);  // rectangular real matrix to store the 5 bands

  SetLb(A,COL,-3);   // lowest band has index -3
  // index 0 means the main diagonal, -1 the first subdiagonal,
  // +1 the first superdiagonal, and so on
  A[Col(-3)]= 1;     // all elements of lowest band are set to 1
  A[Col(-2)]= 2;
  A[Col(-1)]= 3;
  A[Col( 0)]= 4;     // all elements of diagonal are set to 4
  A[Col( 1)]= 5;     // highest band

  cout << "Auxiliary rectangular matrix A: " << endl << A;
  cout << "Lb(A,COL): " << Lb(A,COL)
       << " Ub(A,COL): " << Ub(A,COL) << endl;

  // create sparse dim-by-dim matrix SA with bands corresponding
  // to the columns of auxiliary matrix A
  srmatrix SA(dim,dim,A);

  cout << "Banded matrix SA (sparse): " << endl << SA;

  // create dense dim-by-dim matrix DA and initialize it with SA
  rmatrix DA(SA);

  // check DA == SA, comparing a dense and a sparse matrix
  cout << boolalpha << "DA == SA: " << (DA == SA) << endl;

  // print SA-DA (should be the zero matrix)
  cout << "SA-DA: " << endl << SA-DA << endl;

  return 0;
}

Running this program produces the following output:

Auxiliary rectangular matrix A:
1.000000 2.000000 3.000000 4.000000 5.000000
1.000000 2.000000 3.000000 4.000000 5.000000
1.000000 2.000000 3.000000 4.000000 5.000000
1.000000 2.000000 3.000000 4.000000 5.000000
1.000000 2.000000 3.000000 4.000000 5.000000
1.000000 2.000000 3.000000 4.000000 5.000000
1.000000 2.000000 3.000000 4.000000 5.000000
Lb(A,COL): -3 Ub(A,COL): 1
Banded matrix SA (sparse):
4.000000 5.000000 0.000000 0.000000 0.000000 0.000000 0.000000
3.000000 4.000000 5.000000 0.000000 0.000000 0.000000 0.000000
2.000000 3.000000 4.000000 5.000000 0.000000 0.000000 0.000000
1.000000 2.000000 3.000000 4.000000 5.000000 0.000000 0.000000
0.000000 1.000000 2.000000 3.000000 4.000000 5.000000 0.000000
0.000000 0.000000 1.000000 2.000000 3.000000 4.000000 5.000000
0.000000 0.000000 0.000000 1.000000 2.000000 3.000000 4.000000
DA == SA: true
SA-DA:
0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000
0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000 -0.000000
0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000 -0.000000
0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000 -0.000000
-0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -0.000000
-0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.000000 -0.000000 -0.000000 0.000000 0.000000 0.000000 0.000000


All results are as expected.

Let us now perform some time measurements. We compare the running times of sparse and dense interval matrix-matrix multiplications using the same banded matrices as in the previous example. Within a main loop we increase the dimension dim of the matrices from 25 to 800.

#include <iostream>
#include <simatrix.hpp>
#include "sys/time.h"

using namespace std;
using namespace cxsc;

// auxiliary function supporting time measurements
inline double getTime() {
  struct timeval _tp;
  gettimeofday(&_tp,0);
  return _tp.tv_sec + _tp.tv_usec / 1000000.0;
}

int main() {
  for (int dim(25); dim <= 800; dim= 2*dim) {
    cout << endl << "dim: " << dim << endl;

    imatrix A(dim,5);  // auxiliary rectangular interval matrix
    SetLb(A,COL,-3);   // index range for bands starts with -3
    A[Col(-3)]= 1;     // all elements of column -3 are set to 1
    A[Col(-2)]= 2;
    A[Col(-1)]= 3;
    A[Col( 0)]= 4;
    A[Col( 1)]= 5;

    // create dim-by-dim matrices with bands defined
    // by the columns of A
    simatrix SA(dim,dim,A);  // SA is a sparse matrix
    imatrix DA(SA);          // DA is the corresponding dense matrix

    double start, time_sparse, time_dense;
    start= getTime();
    SA*= SA;  // matrix-matrix product for sparse matrices
    time_sparse= getTime()-start;

    start= getTime();
    DA*= DA;  // matrix-matrix product for dense matrices
    time_dense= getTime()-start;

    cout << " Acceleration factor due to"
         << " sparse data structure: "
         << time_dense/time_sparse << endl;
  }

  return 0;
}

Here is the output produced when running this program:

dim: 25
 Acceleration factor due to sparse data structure: 5.1745

dim: 50
 Acceleration factor due to sparse data structure: 19.5348

dim: 100
 Acceleration factor due to sparse data structure: 77.6997

dim: 200
 Acceleration factor due to sparse data structure: 310.944

dim: 400
 Acceleration factor due to sparse data structure: 1254.18

dim: 800
 Acceleration factor due to sparse data structure: 7128.67

As expected, the performance of the sparse interval matrix-matrix multiplication is much better than that of the dense one. The acceleration factor grows very quickly with increasing dimension; this is also due to the fact that the relative sparsity of the problem increases with increasing dimension (the dense product costs on the order of dim^3 operations, whereas the product of two matrices with a fixed number of bands costs only on the order of dim operations).

Meanwhile, a small set of (preliminary) C-XSC solvers for sparse linear systems is also available. These solvers are based on freely available ANSI C packages such as CSparse and CXSparse (Davis 2006; http://www.cise.ufl.edu/research/sparse/CXSparse/), UMFPACK (Davis 2004), CHOLMOD (Chen et al. 2009), PBLAS, and PLASMA. The mentioned packages are used, for example, to improve the sparsity pattern by reorderings, to compute LU or Cholesky factorizations, to compute first approximations to the solution, to do some kind of equilibration/scaling (Bradley 2010; Van der Sluis 1969), and so forth. For banded systems, a solver using QR factorizations of sub-matrices with the dimension of the band width of the original system (Krämer et al. 1994) is available. For symmetric positive definite systems, an efficient solver based on an estimation of the smallest eigenvalue (Rump 1995) is at the user's disposal. A very first version of a C-XSC solver for general unsymmetric systems has also been realized (using a bound for the smallest eigenvalue of $AA^T$ with a previously scaled matrix A). Efficient algorithms/implementations using sparse data structures are much, much more involved than corresponding implementations for dense data structures. Complicated indexing and tricky, sophisticated algorithms are often hard to put into practice. The work on sparse C-XSC solvers is still in progress.
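The idea behind the symmetric positive definite solver can be summarized by a standard bound (our formulation, not the paper's): if $\lambda > 0$ is a verified lower bound for $\lambda_{\min}(A)$ of the spd matrix $A$, then for any approximation $\tilde{x}$
$$\|x - \tilde{x}\|_2 = \|A^{-1}(b - A\tilde{x})\|_2 \le \frac{\|b - A\tilde{x}\|_2}{\lambda_{\min}(A)} \le \frac{\|b - A\tilde{x}\|_2}{\lambda},$$
so a rigorously evaluated residual together with the eigenvalue bound yields a verified enclosure without forming an approximate inverse.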

Table 3 summarizes some performance and accuracy measurements for the solver implemented to handle symmetric positive definite matrices, based on Rump (1995, 2006) and Sun (1992). Corresponding results using the Intlab solver (Rump 1999a) are also given. Ten test matrices from the Matrix Market (http://math.nist.gov/MatrixMarket/) with condition numbers ranging from 65 to $10^{13}$ and dimensions ranging from 237 to 90 449 are used. The right hand side vector is always chosen to be the vector of all ones.

Table 3 shows that the computing times of the C-XSC solver and the Intlab solver are in most cases close together, whereas the accuracy of the results computed with the C-XSC solver is always significantly better (the C-XSC results are almost best possible with respect to the IEEE double precision data format used).


Table 3 Time and accuracy measurements for the spd solvers

Matrix     Time Intlab  Time C-XSC  Rel. err. Intlab  Rel. err. C-XSC
nos1       0.004        0.003       8.35 × 10^-7      4.39 × 10^-16
nos7       0.014        0.023       6.86 × 10^-5      4.89 × 10^-16
bcsstk15   0.41         0.41        6.77 × 10^-7      4.66 × 10^-16
bcsstk16   0.84         0.74        9.34 × 10^-11     4.83 × 10^-16
bcsstk17   0.79         1.06        1.95 × 10^-5      4.82 × 10^-16
bcsstk18   0.72         0.67        2.59 × 10^-7      4.79 × 10^-16
bcsstm24   0.03         0.02        7.42 × 10^-10     4.76 × 10^-16
s2rmq4m1   0.54         0.69        5.00 × 10^-3      4.90 × 10^-16
s1rmq4m1   0.51         0.59        1.90 × 10^-6      4.90 × 10^-16
s3dkq4m2   182.5        14.28       3.99 × 10^6       4.90 × 10^-16

Hardware: machine with two Intel Xeon 2.26 GHz processors (Nehalem architecture) and 24 GB of RAM

6 Conclusion

C-XSC is a very extensive and powerful C++ class library. It provides a rich set of basic data types (intervals, multiple precision complex intervals, matrix and vector data types, ...) and features (elementary functions for all kinds of interval data types, one and two dimensional Taylor arithmetic, slopes, automatic differentiation, ...) supporting the development of self-validating numerical methods. A set of problem solving routines (dense and sparse linear interval systems, optimization, ...) is provided. C-XSC is open source and freely available from http://www.math.uni-wuppertal.de/wrswt/xsc/cxsc_new.html. Different additional packages are also freely available in source code (Hofschuster et al. 2008; Popova et al. 2010; Popova and Krämer 2007; Blomquist et al. 2011).

In recent years significant progress has been made in improving the efficiency of the interval linear system solvers supplied by C-XSC, and efficient parallelized versions have been implemented (Krämer et al. 2012). In this respect the bilateral Probral project (see the Acknowledgments below) was very helpful. Please note that all our additional C-XSC based software can also be downloaded from our webpage http://www.math.uni-wuppertal.de/wrswt/xsc/cxsc_new.html. The newest C-XSC version also runs efficiently in multi-threading environments (Zimmer 2011).

Currently, our main focus is on the development of highly efficient solvers for sparse interval linear systems (Zimmer et al. 2011; Zimmer 2012). This is a very demanding task; support is very welcome. Further improvements of the efficiency on multicore systems are also under investigation (Kolberg et al. 2008b; Krämer et al. 2012; Milani et al. 2010; Zimmer 2011, 2012).

Acknowledgments Special thanks to all the members of the research groups working on the joint German-Brazilian project Probral 2008, AZ 415-br-probral/po-D/07/09623, funded by CAPES and DAAD, especially to Gerd Bohlender (KIT, Karlsruhe), Dalcidio Moraes Claudio (PUCRS, Porto Alegre), Gustavo Fernandes (PUCRS), Alfredo Goldman (USP, Sao Paulo), Werner Hofschuster (BUW, Wuppertal), Rudi Klatte (KIT), Mariana Kolberg (PUCRS), and Michael Zimmer (BUW) for a fruitful cooperation in an always very pleasant and stimulating atmosphere. For more information about our Probral project, including links to additional joint papers/preprints, see http://www.math.uni-wuppertal.de/org/WRST/projekte/brasilien/. Also many thanks to Tiarajú Asmuz Diverio (UFRGS, Porto Alegre), Carlos Amaral Hölbig (UPF, Passo Fundo), and Frithjof Blomquist.


References

Atkinson KE, Shampine LF (2008) Algorithm 876: solving Fredholm integral equations of the second kind in Matlab. ACM Trans Math Software 34(4)
Blomquist F, Hofschuster W, Krämer W (2011) C-XSC-Langzahlarithmetiken für reelle und komplexe Intervalle basierend auf den Bibliotheken MPFR und MPFI. Preprint BUW-WRSWT 2011/1, Bergische Universität Wuppertal. http://www2.math.uni-wuppertal.de/org/WRST/preprints/prep_11_1.pdf
Bohlender G (2010) Improving the efficiency of dot product computations using error free transformations in C-XSC (this volume)
Bradley A (2010) Algorithms for the equilibration of matrices and their application to limited-memory quasi-Newton methods. PhD Thesis, Stanford ICME
Chen Y, Davis TA, Hager WW, Rajamanickam S (2009) Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans Math Software 35(3)
Choi J, Demmel J, Dhillon I, Dongarra J, Ostrouchov S, Petitet A, Stanley K, Walker D, Whaley R (1996) ScaLAPACK: a portable linear algebra library for distributed memory computers. Design issues and performance. Comput Phys Comm 97:1-15
Claudio DM, Dobner H-J (1997) Constructive error analysis for linear differential and integral equations. Matemática Aplicada e Computacional
Davis TA (2004) Algorithm 832: UMFPACK, an unsymmetric-pattern multifrontal method. ACM Trans Math Software 30(2):196-199
Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadelphia. Part of the SIAM book series on the Fundamentals of Algorithms
Demmel J (1989) LAPACK: a portable linear algebra library for supercomputers. In: Proceedings of the IEEE Control Systems Society workshop on computer-aided control system design, pp 1-7
Gilbert JR, Moler C, Schreiber R (1992) Sparse matrices in MATLAB: design and implementation. SIAM J Matrix Anal Appl 13(1):333-356
Grimmer M (2007) Selbstverifizierende mathematische Softwarewerkzeuge im High Performance Computing. Dissertation, Universität Wuppertal. http://www.math.uni-wuppertal.de/wrswt/literatur/lit_diss.html
Grimmer M, Krämer W (2007) An MPI extension for verified numerical computations in parallel environments. In: Arabnia et al. (eds) International conference on scientific computing (CSC'07, Worldcomp'07), Las Vegas, pp 111-117
Hammer R, Hocks M, Kulisch U, Ratz D (1993) Numerical toolbox for verified computing I: basic numerical problems. Springer, Berlin
Hölbig C, Krämer W, Diverio T (2004) An accurate and efficient self-verifying solver for systems with banded coefficient matrix. In: Parallel computing: software technology, algorithms, architectures and applications. Elsevier, Amsterdam, pp 283-290
Hofschuster W, Krämer W (2004) C-XSC 2.0: a C++ library for extended scientific computing. In: Numerical software with result verification. Lecture Notes in Computer Science, vol 2991. Springer, Heidelberg, pp 15-35
Hofschuster W, Krämer W, Neher M (2008) C-XSC and closely related software packages. Preprint 2008/3, Universität Wuppertal. Published in: Dagstuhl Seminar Proceedings 08021 - Numerical validation in current hardware architectures. LNCS, vol 5492. Springer, Berlin, pp 68-102
Klatte R, Kulisch U, Lawo C, Rauch M, Wiethoff A (1993) C-XSC: a C++ class library for extended scientific computing. Springer, Heidelberg
Krämer W (2011) C-XSC, a sophisticated environment for reliable computing. In: Ratschan S (ed) Proceedings of the fourth international conference on mathematical aspects of computer and information sciences, Beijing, pp 115-125
Krämer W, Zimmer M (2009) Fast (parallel) dense linear system solvers in C-XSC using error free transformations and BLAS. Lecture Notes in Computer Science, vol 5492. Springer, Berlin, pp 230-249
Krämer W, Kulisch U, Lohner R (1994) Numerical toolbox for verified computing II. Springer Computational Mathematics (draft), pp 36-68. http://www.math.uni-wuppertal.de/wrswt/literatur/tb2.ps.gz
Krämer W, Zimmer M, Hofschuster W (2012) Using C-XSC for high performance verified computing. In: PARA 2010, Reykjavik, Iceland, Part II. LNCS, vol 7134. Springer, Berlin, pp 168-178
Kolberg M (2009) Parallel self-verified solver for dense linear systems. PhD Thesis, PUCRS, Porto Alegre
Kolberg M, Fernandes LG, Claudio D (2008a) Dense linear system: a parallel self-verified solver. Int J Parallel Program 36:412-425
Kolberg M, Cordeiro D, Bohlender G, Fernandes LG, Goldman A (2008b) A multithreaded verified method for solving linear systems in dual-core processors. In: PARA, 9th international workshop on state-of-the-art in scientific and parallel computing. Lecture Notes in Computer Science. Springer, Berlin (to be published)
Kolberg M, Krämer W, Zimmer M (2009) A note on solving problem 7 of the SIAM 100-digit challenge using C-XSC. Lecture Notes in Computer Science, vol 5492. Springer, Berlin, pp 250-261
Kolberg M, Krämer W, Zimmer M (2011) Efficient parallel solvers for large dense systems of linear interval equations. Reliab Comput 15:193-206
Kulisch U (1997) Die fünfte Gleitkommaoperation für top-performance Computer. Universität Karlsruhe, Berichte aus dem Forschungsschwerpunkt Computerarithmetik, Intervallrechnung und numerische Algorithmen mit Ergebnisverifikation
Kulisch U, Miranker W (1986) The arithmetic of the digital computer: a new approach. SIAM Rev 28(1):1-40
Lawson C, Hanson R, Kincaid D, Krogh F (1979) BLAS for Fortran usage. ACM Trans Math Software 5(3)
Link to MPI (Message Passing Interface). http://www.mcs.anl.gov/research/projects/mpi/
Link to PBLAS (Parallel Basic Linear Algebra Subprograms). http://www.netlib.org/scalapack/html/pblas_qref.html/
Milani CR, Kolberg M, Fernandes LG (2010) Solving dense interval linear systems with verified computing on multicore architectures. VECPAR 2010. http://vecpar.fe.up.pt/2010/papers/44.php
Obermaier H (2003) Computerverifikation von Lösungen nichtlinearer Integralgleichungen. Dissertation, Universität Karlsruhe. http://www.math.uni-wuppertal.de/wrswt/literatur/lit_diss.html
Ogita T, Rump SM, Oishi S (2005) Accurate sum and dot product. SIAM J Sci Comput 26(6):1955-1988
PLASMA (Parallel Linear Algebra Software for Multi-core Architectures) user's guide. http://icl.cs.utk.edu/projectsfiles/plasma/pdf/users_guide.pdf
Popova E, Krämer W (2007) Inner and outer bounds for the solution set of parametric linear systems. Preprint 2006/6, Universität Wuppertal, 2006. Published in: J Comput Appl Math 199(2):310-316
Popova E, Kolev L, Krämer W (2010) A solver for complex-valued parametric linear systems. Preprint 2009/6, Universität Wuppertal, 2009. Published in: Serdica J Comput 4(1):123-132
Rump SM (1995) Verified computation of the solution of large sparse linear systems. Z Angew Math Mech (ZAMM) 75:S439-S442
Rump SM (1999a) INTLAB - INTerval LABoratory. In: Developments in reliable computing, pp 77-104
Rump SM (1999b) Fast and parallel interval arithmetic. BIT Numer Math 39(3):534-554
Rump SM (2006) Verification of positive definiteness. BIT Numer Math 46:433-452
Sun J-G (1992) Rounding-error and perturbation bounds for the Cholesky and LDL^T factorizations. Linear Algebra Appl 173:77-97
Van der Sluis A (1969) Condition numbers and equilibration of matrices. Numer Math 14:14-23
Weblink to C-XSC. http://www.math.uni-wuppertal.de/wrswt/xsc/cxsc_new.html
Weblink to CXSparse. http://www.cise.ufl.edu/research/sparse/CXSparse/
Weblink to the Matrix Market. http://math.nist.gov/MatrixMarket/
Weblink to OpenMP. http://www.openmp.org/mp-documents/OpenMP3.0-SummarySpec.pdf/
Zimmer M (2007) Laufzeiteffiziente, parallele Löser für lineare Intervallgleichungssysteme in C-XSC. Master Thesis, University of Wuppertal
Zimmer M (2011) Using C-XSC in a multi-threaded environment. Preprint BUW-WRSWT 2011/2, Universität Wuppertal. http://www2.math.uni-wuppertal.de/org/WRST/preprints/prep_11_2.pdf
Zimmer M (2012) PhD Thesis, University of Wuppertal (to appear)
Zimmer M, Krämer W, Bohlender G, Hofschuster W (2010) Extension of the C-XSC library with scalar products with selectable accuracy. Preprint BUW-WRSWT 2009/4, University of Wuppertal, 2009. Published in: Serdica J Comput 4(3):349-370
Zimmer M, Krämer W, Hofschuster W (2011) Sparse matrices and vectors in C-XSC. Preprint BUW-WRSWT 2009/7, Universität Wuppertal, 2009. Published in: Reliab Comput 14:138-160
