CS 591x – Cluster Computing and Programming Parallel Computers
Parallel Libraries
Parallel Libraries
Recall that so far we have been –
Breaking up (decomposing) our “large” problems into smaller pieces…
Distributing the pieces of the problem to multiple processors
Explicitly moving data among processes through message passing
Parallel Libraries
Note that –
Large scientific and engineering problems often represent data in matrices and vectors
Large scientific and engineering problems make heavy use of linear algebra: linear systems, non-linear systems
Parallel Libraries
MPI is designed to support the development of libraries
Consequently, there are a number of libraries, based on MPI, used to develop parallel software
Some libraries take care of much, or all, of the parallelization
That means…
Parallel Libraries
…you don’t have to… but you still can… if you want… sometimes…
Parallel Libraries
ScaLAPACK – Scalable Linear Algebra PACKage
PETSc – Portable, Extensible Toolkit for Scientific Computation
ScaLAPACK
Built on LAPACK – the Linear Algebra PACKage
powerful, widely used in scientific and engineering computing
but not scalable to distributed-memory parallel computers
LAPACK is built on BLAS – the Basic Linear Algebra Subprograms library
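For reference, a BLAS “subprogram” is a small building block like saxpy (y = a*x + y). A minimal sketch using the C interface (assumes a CBLAS header and library, e.g. from Netlib or ATLAS, are installed):

#include <stdio.h>
#include <cblas.h>

int main(void)
{
    float x[3] = {1.0f, 2.0f, 3.0f};
    float y[3] = {4.0f, 5.0f, 6.0f};

    /* y = 2.0*x + y, on vectors of length 3 with unit stride */
    cblas_saxpy(3, 2.0f, x, 1, y, 1);

    printf("%.1f %.1f %.1f\n", y[0], y[1], y[2]);   /* prints 6.0 9.0 12.0 */
    return 0;
}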
ScaLAPACK
Uses PBLAS – Parallel BLAS
performs local matrix and vector operations in a parallel application
uses BLAS
Uses BLACS – the Basic Linear Algebra Communication Subprograms library
handles interprocess communication for ScaLAPACK
uses MPI (other implementations also exist)
ScaLAPACK
Maps matrices and vectors to a process grid called a BLACS grid
similar to an MPI Cartesian topology
matrices and vectors are decomposed into rectangular blocks and block-cyclically distributed over the BLACS grid (see the sketch below)
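To see what block-cyclic distribution means for local storage, here is a sketch of the bookkeeping each process does to size its local piece. It mirrors the arithmetic ScaLAPACK's NUMROC routine performs (the get_dim helper in the sample below plays the same role); the function name here is made up for illustration:

/* How many rows (or columns) of a global dimension n land on the process
   at coordinate coord in a 1-d slice of the grid with nprocs processes,
   given block size nb?  (Same arithmetic as ScaLAPACK's NUMROC.) */
int local_dim(int n, int nb, int coord, int nprocs)
{
    int nblocks = n / nb;                   /* full blocks in this dimension */
    int local   = (nblocks / nprocs) * nb;  /* whole rounds of the cycle     */
    int extra   = nblocks % nprocs;         /* leftover full blocks          */

    if (coord < extra)
        local += nb;                        /* gets one of the extra blocks  */
    else if (coord == extra)
        local += n % nb;                    /* gets the final partial block  */
    return local;
}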
ScaLAPACK – sample, based on Pacheco pp. 345-350
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
Get_input(p, my_rank, &n, &nproc_rows, &nproc_cols,
          &row_block_size, &col_block_size);
m = n;
Cblacs_get(0, 0, &blacs_grid);    /* build blacs grid */
/* "R": process grid will use row-major order */
Cblacs_gridinit(&blacs_grid, "R", nproc_rows, nproc_cols);
Cblacs_pcoord(blacs_grid, my_rank, &my_proc_row, &my_proc_col);
ScaLAPACK – sample cont.
local_mat_rows = get_dim(m, row_block_size, my_proc_row, nproc_rows);
local_mat_cols = get_dim(n, col_block_size, my_proc_col, nproc_cols);
Allocate(my_rank, "A", &A_local, local_mat_rows*local_mat_cols, 1);
b_local_size = get_dim(m, row_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "b", &b_local, b_local_size, 1);
exact_local_size = get_dim(m, col_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "Exact", &exact_local, exact_local_size, 1);
ScaLAPACK – sample cont.
Build_descript(my_rank, "A", A_descript, m, n, row_block_size, col_block_size, blacs_grid, local_mat_rows);
Build_descript(my_rank, "b", b_descript, m, 1, row_block_size, 1, blacs_grid, b_local_size);
Build_descript(my_rank, "Exact", exact_descript, n, 1, col_block_size, 1, blacs_grid, exact_local_size);
ScaLAPACK – sample cont.
Initialize(p, my_rank, A_local, local_mat_rows, local_mat_cols, exact_local, exact_local_size);
Mat_vect_mult(m, n, A_local, A_descript, exact_local, exact_descript, b_local, b_descript);
Allocate(my_rank, "pivot_list", &pivot_list, local_mat_rows + row_block_size, 0);
MPI_Barrier(MPI_COMM_WORLD);
ScaLAPACK – sample cont.
/* psgesv solves Ax=b; returns the solution in b */
solve(my_rank, n, A_local, A_descript, pivot_list, b_local, b_descript);
…
Cblacs_exit(1);
MPI_Finalize();
…
}
ScaLAPACK – sample cont.
void Mat_vect_mult(int m, int n, float* A_local,
                   int* A_descript, float* x_local, int* x_descript,
                   float* y_local, int* y_descript) {
    char transpose = 'N';
    …
    psgemv(&transpose, &m, &n, &alpha, A_local,
           &first_row_A, &first_col_A, A_descript,
           x_local, &first_row_x, &first_col_x, x_descript, &x_increment,
           &beta, y_local, &first_row_y, &first_col_y, y_descript,
           &y_increment);
}
Crossing Languages – Some Issues
Calling routines from another language, e.g. calling a Fortran subroutine from C
Using n-dimensional arrays – remember row major (C) vs. column major (Fortran)
Passing arguments in routine/function calls – Fortran passes by address, C passes by value
All three issues show up in the sketch below.
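A sketch of all three issues at once, calling the Fortran BLAS routine SGEMV from C: the Fortran symbol typically gets a trailing underscore, every argument is passed by address, and the matrix must be stored column-major. (Name mangling is compiler-dependent; this assumes the common underscore convention.)

#include <stdio.h>

/* Fortran subroutine: lowercase name + trailing underscore, all args by address */
extern void sgemv_(char *trans, int *m, int *n, float *alpha,
                   float *A, int *lda, float *x, int *incx,
                   float *beta, float *y, int *incy);

int main(void)
{
    /* 2x2 matrix stored column-major (Fortran order):
         A = | 1 3 |
             | 2 4 |                                    */
    float A[4]  = {1.0f, 2.0f, 3.0f, 4.0f};
    float x[2]  = {1.0f, 1.0f}, y[2] = {0.0f, 0.0f};
    float alpha = 1.0f, beta = 0.0f;
    int   m = 2, n = 2, lda = 2, inc = 1;
    char  trans = 'N';

    /* y = alpha*A*x + beta*y; note every argument is a pointer */
    sgemv_(&trans, &m, &n, &alpha, A, &lda, x, &inc, &beta, y, &inc);
    printf("%.1f %.1f\n", y[0], y[1]);   /* prints 4.0 6.0 */
    return 0;
}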
PETSc
Portable, Extensible Toolkit for Scientific Computation
Large, powerful
Solves partial differential equations, linear systems, non-linear systems
Handles matrices both dense and sparse
PETSc
PETSc routines return error codes
PETSc provides error-checking macros to help troubleshoot problems, e.g. CHKERRA(errorcode)
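A sketch of the idiom (surrounding declarations assumed; CHKERRQ is the sibling macro used inside functions, while PETSc 2.x used CHKERRA to abort from main):

int ierr;
ierr = VecCreate(PETSC_COMM_WORLD, &x);   CHKERRQ(ierr);  /* check every call */
ierr = VecSetSizes(x, PETSC_DECIDE, n);   CHKERRQ(ierr);  /* on error, the macro
                                                             reports and returns */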
PETSc
Built on top of MPI
Developed primarily for C/C++ (unlike ScaLAPACK); also has a Fortran interface
Dense and sparse matrices use the same interface
PETSc
Includes many non-blocking operations
i.e. any process can update any matrix cell as a non-blocking operation; other work can go on while the update is carried out
Many options available from the command line
PETSc includes many solvers
Solvers can be selected from the command line, so you can change solvers without recompiling
PETSC_DECIDE lets PETSc choose values (such as local sizes) for you – see the sketch below
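A minimal fragment of the last two points (declarations and cleanup omitted): PETSC_DECIDE lets PETSc pick each process's share of a vector, and KSPSetFromOptions() is the call that makes command-line solver options take effect.

VecCreate(PETSC_COMM_WORLD, &x);
VecSetSizes(x, PETSC_DECIDE, n);   /* PETSc decides each process's local size   */
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetFromOptions(ksp);            /* honors -ksp_type, -pc_type, ... at run time */

Running, for example, mpirun -np 4 ./app -ksp_type gmres -pc_type jacobi then switches the solver and preconditioner with no recompilation.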
PETSc
(figures from http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2)
PETSc – sample routines
PetscOptionsGetInt(PETSC_NULL, "-n", &n, &flg);
VecSetType(Vec x, VecType vec_type);
VecCreate(MPI_Comm comm, Vec *x);
VecSetSizes(Vec x, int m, int M);
VecDuplicate(Vec old, Vec *new);
MatCreate(MPI_Comm comm, int m, int n, int M, int N, Mat* A);
MatSetValues(Mat A, int m, int* im, int n, int* in,
             PetscScalar *values, INSERT_VALUES);
PETSc – sample routines
MatAssemblyBegin(Mat A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A, MAT_FINAL_ASSEMBLY);
KSPCreate(MPI_Comm comm, KSP *ksp);
KSPSolve(KSP ksp, Vec b, Vec x);
PetscInitialize(&argc, &argv);
PetscFinalize();
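To show how the routines above fit together, here is an end-to-end sketch that assembles a small tridiagonal system Ax = b and solves it with KSP. It follows the signatures listed in these slides where possible; exact signatures vary across PETSc versions, and error checking and object cleanup are omitted for brevity.

#include "petsc.h"

int main(int argc, char **argv)
{
    Mat A;  Vec x, b;  KSP ksp;
    int i, n = 10, start, end, col[3];
    PetscScalar val[3] = {-1.0, 2.0, -1.0}, one = 1.0;

    PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);
    PetscOptionsGetInt(PETSC_NULL, "-n", &n, PETSC_NULL);  /* flag argument skipped */

    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, n);       /* let PETSc split the vector */
    VecSetFromOptions(x);
    VecDuplicate(x, &b);
    VecSet(b, one);                        /* right-hand side of all ones
                                              (argument order is reversed in
                                              the oldest PETSc releases)   */

    MatCreate(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, &A);
    MatSetFromOptions(A);
    MatGetOwnershipRange(A, &start, &end); /* rows owned by this process */
    for (i = start; i < end; i++) {
        col[0] = i - 1;  col[1] = i;  col[2] = i + 1;
        if (i == 0)
            MatSetValues(A, 1, &i, 2, &col[1], &val[1], INSERT_VALUES);
        else if (i == n - 1)
            MatSetValues(A, 1, &i, 2, col, val, INSERT_VALUES);
        else
            MatSetValues(A, 1, &i, 3, col, val, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);   /* other work could overlap here */
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);  /* flag dropped in newer versions */
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);                   /* solution returned in x */

    PetscFinalize();
    return 0;
}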
BLAS – Basic Linear Algebra Subprograms
http://www.netlib.org/blas/
LAPACK – Linear Algebra PACKage
http://www.netlib.org/lapack/
http://www.netlib.org/lapack/lug/index.html
ScaLAPACK
http://www.netlib.org/scalapack/scalapack_home.html
PETSc
http://www-unix.mcs.anl.gov/petsc/petsc-as/
http://acts.nersc.gov/petsc/
http://www.chuug.org/talks/petsc.pdf
http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/manual.html#Node0