CS 591x – Cluster Computing and Programming Parallel Computers
Parallel Libraries
Parallel Libraries
Recall that so far we have been –
Breaking up (decomposing) our “large” problems into smaller pieces…
Distributing the pieces of the problem to multiple processors
Explicitly moving data among processes through message passing
Parallel Libraries
Note that –
Large scientific and engineering problems often represent data in matrices and vectors
Large scientific and engineering problems make heavy use of linear algebra: linear systems, non-linear systems
Parallel Libraries
MPI is designed to support the development of libraries
Consequently, there are a number of libraries, based on MPI, used to develop parallel software
Some libraries take care of much, or all, of the parallelization
That means…
Parallel Libraries
…you don’t have to… but you still can… if you want… sometimes…
Parallel Libraries
ScaLAPACK – Scalable Linear Algebra PACKage
PETSc – Portable, Extensible Toolkit for Scientific Computation
ScaLAPACK
Built on LAPACK – the Linear Algebra PACKage
powerful, widely used in scientific and engineering computing
but not scalable to distributed-memory parallel computers
LAPACK is built on BLAS – the Basic Linear Algebra Subprograms library
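For reference, a BLAS “subprogram” is a small building block like saxpy (y = a*x + y). A minimal sketch using the C interface (assumes a CBLAS header and library, e.g. from Netlib or ATLAS, are installed):

#include <stdio.h>
#include <cblas.h>

int main(void)
{
    float x[3] = {1.0f, 2.0f, 3.0f};
    float y[3] = {4.0f, 5.0f, 6.0f};

    /* y = 2.0*x + y, on vectors of length 3 with unit stride */
    cblas_saxpy(3, 2.0f, x, 1, y, 1);

    printf("%.1f %.1f %.1f\n", y[0], y[1], y[2]);   /* prints 6.0 9.0 12.0 */
    return 0;
}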
ScaLAPACK
Uses PBLAS – Parallel BLAS
performs local matrix and vector operations in a parallel application
uses BLAS
Uses BLACS – the Basic Linear Algebra Communication Subprograms library
handles interprocess communication for ScaLAPACK
uses MPI (other implementations also exist)
ScaLAPACK
Maps matrices and vectors to a process grid called a BLACS grid
similar to an MPI Cartesian topology
matrices and vectors are decomposed into rectangular blocks and block-cyclically distributed over the BLACS grid (see the sketch below)
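To see what block-cyclic distribution means for local storage, here is a sketch of the bookkeeping each process does to size its local piece. It mirrors the arithmetic ScaLAPACK's NUMROC routine performs (the get_dim helper in the sample below plays the same role); the function name here is made up for illustration:

/* How many rows (or columns) of a global dimension n land on the process
   at coordinate coord in a 1-d slice of the grid with nprocs processes,
   given block size nb?  (Same arithmetic as ScaLAPACK's NUMROC.) */
int local_dim(int n, int nb, int coord, int nprocs)
{
    int nblocks = n / nb;                   /* full blocks in this dimension */
    int local   = (nblocks / nprocs) * nb;  /* whole rounds of the cycle     */
    int extra   = nblocks % nprocs;         /* leftover full blocks          */

    if (coord < extra)
        local += nb;                        /* gets one of the extra blocks  */
    else if (coord == extra)
        local += n % nb;                    /* gets the final partial block  */
    return local;
}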
ScaLAPACK – sample, based on Pacheco pp. 345-350
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
Get_input(p, my_rank, &n, &nproc_rows, &nproc_cols,
          &row_block_size, &col_block_size);
m = n;
Cblacs_get(0, 0, &blacs_grid);    /* build blacs grid */
/* "R": process grid will use row-major order */
Cblacs_gridinit(&blacs_grid, "R", nproc_rows, nproc_cols);
Cblacs_pcoord(blacs_grid, my_rank, &my_proc_row, &my_proc_col);
ScaLAPACK – sample cont.
local_mat_rows = get_dim(m, row_block_size, my_proc_row, nproc_rows);
local_mat_cols = get_dim(n, col_block_size, my_proc_col, nproc_cols);
Allocate(my_rank, "A", &A_local, local_mat_rows*local_mat_cols, 1);
b_local_size = get_dim(m, row_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "b", &b_local, b_local_size, 1);
exact_local_size = get_dim(m, col_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "Exact", &exact_local, exact_local_size, 1);
ScaLAPACK – sample cont.
Build_descript(my_rank, "A", A_descript, m, n, row_block_size, col_block_size, blacs_grid, local_mat_rows);
Build_descript(my_rank, "b", b_descript, m, 1, row_block_size, 1, blacs_grid, b_local_size);
Build_descript(my_rank, "Exact", exact_descript, n, 1, col_block_size, 1, blacs_grid, exact_local_size);
ScaLAPACK – sample cont.
Initialize(p, my_rank, A_local, local_mat_rows, local_mat_cols, exact_local, exact_local_size);
Mat_vect_mult(m, n, A_local, A_descript, exact_local, exact_descript, b_local, b_descript);
Allocate(my_rank, "pivot_list", &pivot_list, local_mat_rows + row_block_size, 0);
MPI_Barrier(MPI_COMM_WORLD);
ScaLAPACK – sample cont.
/* psgesv solves Ax=b; returns the solution in b */
solve(my_rank, n, A_local, A_descript, pivot_list, b_local, b_descript);
…
Cblacs_exit(1);
MPI_Finalize();
…
}
ScaLAPACK – sample cont.
void Mat_vect_mult(int m, int n, float* A_local,
                   int* A_descript, float* x_local, int* x_descript,
                   float* y_local, int* y_descript) {
    char transpose = 'N';
    …
    psgemv(&transpose, &m, &n, &alpha, A_local,
           &first_row_A, &first_col_A, A_descript,
           x_local, &first_row_x, &first_col_x, x_descript, &x_increment,
           &beta, y_local, &first_row_y, &first_col_y, y_descript,
           &y_increment);
}
Crossing Languages – Some Issues
Calling routines from another language, e.g. calling a Fortran subroutine from C
Using n-dimensional arrays – remember row major (C) vs. column major (Fortran)
Passing arguments in routine/function calls – Fortran passes by address, C passes by value
All three issues show up in the sketch below.
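A sketch of all three issues at once, calling the Fortran BLAS routine SGEMV from C: the Fortran symbol typically gets a trailing underscore, every argument is passed by address, and the matrix must be stored column-major. (Name mangling is compiler-dependent; this assumes the common underscore convention.)

#include <stdio.h>

/* Fortran subroutine: lowercase name + trailing underscore, all args by address */
extern void sgemv_(char *trans, int *m, int *n, float *alpha,
                   float *A, int *lda, float *x, int *incx,
                   float *beta, float *y, int *incy);

int main(void)
{
    /* 2x2 matrix stored column-major (Fortran order):
         A = | 1 3 |
             | 2 4 |                                    */
    float A[4]  = {1.0f, 2.0f, 3.0f, 4.0f};
    float x[2]  = {1.0f, 1.0f}, y[2] = {0.0f, 0.0f};
    float alpha = 1.0f, beta = 0.0f;
    int   m = 2, n = 2, lda = 2, inc = 1;
    char  trans = 'N';

    /* y = alpha*A*x + beta*y; note every argument is a pointer */
    sgemv_(&trans, &m, &n, &alpha, A, &lda, x, &inc, &beta, y, &inc);
    printf("%.1f %.1f\n", y[0], y[1]);   /* prints 4.0 6.0 */
    return 0;
}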
PETSc
Portable, Extensible Toolkit for Scientific Computation
Large, powerful
Solves partial differential equations, linear systems, non-linear systems
Handles matrices both dense and sparse
PETSc
PETSc routines return error codes
PETSc provides error-checking macros to help troubleshoot problems, e.g. CHKERRA(errorcode)
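A sketch of the idiom (surrounding declarations assumed; CHKERRQ is the sibling macro used inside functions, while PETSc 2.x used CHKERRA to abort from main):

int ierr;
ierr = VecCreate(PETSC_COMM_WORLD, &x);   CHKERRQ(ierr);  /* check every call */
ierr = VecSetSizes(x, PETSC_DECIDE, n);   CHKERRQ(ierr);  /* on error, the macro
                                                             reports and returns */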
PETSc
Built on top of MPI
Developed primarily for C/C++ (unlike ScaLAPACK); also has a Fortran interface
Dense and sparse matrices use the same interface
PETSc
Includes many non-blocking operations
i.e. any process can update any matrix cell as a non-blocking operation; other work can go on while the update is carried out
Many options available from the command line
PETSc includes many solvers
Solvers can be selected from the command line, so you can change solvers without recompiling
PETSC_DECIDE lets PETSc choose values (such as local sizes) for you – see the sketch below
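A minimal fragment of the last two points (declarations and cleanup omitted): PETSC_DECIDE lets PETSc pick each process's share of a vector, and KSPSetFromOptions() is the call that makes command-line solver options take effect.

VecCreate(PETSC_COMM_WORLD, &x);
VecSetSizes(x, PETSC_DECIDE, n);   /* PETSc decides each process's local size   */
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetFromOptions(ksp);            /* honors -ksp_type, -pc_type, ... at run time */

Running, for example, mpirun -np 4 ./app -ksp_type gmres -pc_type jacobi then switches the solver and preconditioner with no recompilation.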
PETSc
(figures from http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2)
PETSc – sample routines
PetscOptionsGetInt(PETSC_NULL, "-n", &n, &flg);
VecSetType(Vec x, VecType vec_type);
VecCreate(MPI_Comm comm, Vec *x);
VecSetSizes(Vec x, int m, int M);
VecDuplicate(Vec old, Vec *new);
MatCreate(MPI_Comm comm, int m, int n, int M, int N, Mat* A);
MatSetValues(Mat A, int m, int* im, int n, int* in,
             PetscScalar *values, INSERT_VALUES);
PETSc – sample routines
MatAssemblyBegin(Mat A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A, MAT_FINAL_ASSEMBLY);
KSPCreate(MPI_Comm comm, KSP *ksp);
KSPSolve(KSP ksp, Vec b, Vec x);
PetscInitialize(&argc, &argv);
PetscFinalize();
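To show how the routines above fit together, here is an end-to-end sketch that assembles a small tridiagonal system Ax = b and solves it with KSP. It follows the signatures listed in these slides where possible; exact signatures vary across PETSc versions, and error checking and object cleanup are omitted for brevity.

#include "petsc.h"

int main(int argc, char **argv)
{
    Mat A;  Vec x, b;  KSP ksp;
    int i, n = 10, start, end, col[3];
    PetscScalar val[3] = {-1.0, 2.0, -1.0}, one = 1.0;

    PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);
    PetscOptionsGetInt(PETSC_NULL, "-n", &n, PETSC_NULL);  /* flag argument skipped */

    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, n);       /* let PETSc split the vector */
    VecSetFromOptions(x);
    VecDuplicate(x, &b);
    VecSet(b, one);                        /* right-hand side of all ones
                                              (argument order is reversed in
                                              the oldest PETSc releases)   */

    MatCreate(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, &A);
    MatSetFromOptions(A);
    MatGetOwnershipRange(A, &start, &end); /* rows owned by this process */
    for (i = start; i < end; i++) {
        col[0] = i - 1;  col[1] = i;  col[2] = i + 1;
        if (i == 0)
            MatSetValues(A, 1, &i, 2, &col[1], &val[1], INSERT_VALUES);
        else if (i == n - 1)
            MatSetValues(A, 1, &i, 2, col, val, INSERT_VALUES);
        else
            MatSetValues(A, 1, &i, 3, col, val, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);   /* other work could overlap here */
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);  /* flag dropped in newer versions */
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);                   /* solution returned in x */

    PetscFinalize();
    return 0;
}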
BLAS – Basic Linear Algebra Subprograms
http://www.netlib.org/blas/
LAPACK – Linear Algebra PACKage
http://www.netlib.org/lapack/
http://www.netlib.org/lapack/lug/index.html
ScaLAPACK
http://www.netlib.org/scalapack/scalapack_home.html
PETSc
http://www-unix.mcs.anl.gov/petsc/petsc-as/
http://acts.nersc.gov/petsc/
http://www.chuug.org/talks/petsc.pdf
http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/manual.html#Node0