Exercises 7: Sparse Linear Algebra via PETSc
November 14, 2019
Adapted from PRACE PETSc tutorials: http://www.training.prace-ri.eu/tutorials/index.html
Reminders
• Homework 2 due today
• Homework 3 posted; due November 28, 2019 17:20
• No lecture or exercises next week :)
Today
• PETSc tutorial
• Run code examples
• Adapted from tutorial by Václav Hapla, IT4Innovations & Department of Applied Mathematics, VSB - Technical University of Ostrava
Libraries implementing solvers for Sparse LA
• Trilinos (Sandia National Lab), https://trilinos.github.io/
• PETSc, Portable, Extensible Toolkit for Scientific Computation (Argonne National Lab), https://www.mcs.anl.gov/petsc/
• many numerical algorithms (LU, CG, SPMV, …) implemented, tested and ready to use!
PETSc
• building blocks (data structures and routines) for the scalable parallel solution of scientific applications
• allows thinking in terms of high-level objects (matrices) instead of low-level objects (raw arrays)
• written primarily in C, with good Fortran support; can also be called from C++, Python and Java codes
• highly portable
• source code and mailing lists open to anybody
Role of PETSc
"Developing parallel, nontrivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort. PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver, nor a silver bullet."
Barry Smith (PETSc founder)
PETSc components
Parallelism in PETSc
• PETSc is parallelized mostly using MPI
• MPI provides low-level routines to exchange data primitives between processes
• PETSc provides mid-level routines such as
• insert matrix element to arbitrary location
• parallel matrix-vector product
• you can call MPI directly if needed
• same code for sequential and parallel runs
• support for hybrid MPI + {pthreads, OpenMP, CUDA} parallelism
PETSc Interfaces
• Dense lin. algebra: BLAS, LAPACK, Elemental
• Sparse direct lin. sys. solvers: MUMPS, SuperLU, SuperLU_Dist, PaStiX, UMFPACK, LUSOL
• Iterative solvers / multigrid / preconditioners: HYPRE, Trilinos ML, SPAI
• Graph partitioning: ParMetis, Scotch, Party, Chaco
• FFT: FFTW
• ODE: Sundials
• Data exchange: HDF5
• Mathematics packages: MATLAB, Mathematica
PETSc Interfaced By
• TAO - Toolkit for Advanced Optimization
• SLEPc - Scalable Library for Eigenvalue Problems
• fluidity - a finite element/volume fluids code
• OpenFVM - finite volume based CFD solver
• OOFEM - object oriented finite element library
• libMesh - adaptive finite element library
• MOOSE - Multiphysics Object-Oriented Simulation Environment
• deal.II - sophisticated C++ based finite element simulation package
• PHAML - The Parallel Hierarchical Adaptive MultiLevel Project
• Chaste - Cancer, Heart and Soft Tissue Environment
• PetIGA - A framework for high performance Isogeometric Analysis
Datatypes
• PETSc provides its own primitive data types
PetscInt n = 20;
PetscScalar v = -3.5, w = 3.1e9;
PetscReal x = 2.55, y = 1e-9;
• It is better to use them instead of built-in C types
• better portability
• easy switching between real and complex scalars
• easy switching between 32-bit and 64-bit numbers
PETSc Communicators
• communicator = an opaque object of MPI_Comm type that defines process group and synchronization channel
• PETSc built-in communicators:
• PETSC_COMM_SELF – just this process – for serial objects
• PETSC_COMM_WORLD – all processes – for parallel objects
• you can define your own communicators
Error Handling
• PETSc is written in C
• C has no exception mechanism
• instead of throwing an exception, every routine returns an integer error code (PetscErrorCode type)
• the error code is checked by the CHKERRQ macro, which returns from the calling function on error:
PetscErrorCode ierr;
ierr = SomePetscRoutine();CHKERRQ(ierr);
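The return-code convention above can be sketched in plain C without PETSc. This is only an illustration of the pattern: `ErrCode`, `CHECK`, `safe_divide` and `average_per_item` are hypothetical names invented for the sketch, not PETSc identifiers, and the real CHKERRQ does more (it also invokes PETSc's error handler).

```c
#include <assert.h>
#include <stdio.h>

typedef int ErrCode;   /* hypothetical stand-in for PetscErrorCode */

/* Hypothetical analogue of CHKERRQ: on a nonzero code, report where the
   failure was seen and return the code to the caller. */
#define CHECK(err)                                                       \
  do {                                                                   \
    ErrCode err_ = (err);                                                \
    if (err_ != 0) {                                                     \
      fprintf(stderr, "error %d at %s:%d\n", err_, __FILE__, __LINE__);  \
      return err_;                                                       \
    }                                                                    \
  } while (0)

/* A routine that can fail: integer division with a zero-divisor check. */
static ErrCode safe_divide(int a, int b, int *result)
{
  assert(result != NULL);
  if (b == 0) return 1;   /* nonzero means failure */
  *result = a / b;
  return 0;               /* zero means success */
}

/* The error propagates up the call chain without any exception mechanism. */
static ErrCode average_per_item(int total, int count, int *avg)
{
  ErrCode ierr;
  ierr = safe_divide(total, count, avg); CHECK(ierr);
  return 0;
}
```

Calling `average_per_item(10, 0, &avg)` prints a diagnostic to stderr and hands the nonzero code back to the caller, which can check it in the same way.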
Command-line options
• on the command-line
./program -myint 10 -myreal 1e3
• in the program:
PetscInt myint;
PetscReal myreal;
PetscOptionsGetInt(PETSC_NULL,PETSC_NULL,"-myint",&myint,PETSC_NULL);
PetscOptionsGetReal(PETSC_NULL,PETSC_NULL,"-myreal",&myreal,PETSC_NULL);
// now myint=10, myreal=1e3
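To make the lookup concrete, here is a plain-C sketch of what fetching an integer option from the command line involves. `get_int_option` is a hypothetical helper written for this illustration, not a PETSc routine; note that the option name is matched including its leading dash, and that a missing option leaves the variable untouched so a default value survives.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find "name value" in argv and parse value as an integer.
   Returns 1 and stores the value if the option is present, 0 otherwise
   (in which case *value is left unchanged). */
static int get_int_option(int argc, char **argv, const char *name, long *value)
{
  assert(name != NULL && value != NULL);
  for (int i = 1; i + 1 < argc; i++) {
    if (strcmp(argv[i], name) == 0) {
      *value = strtol(argv[i + 1], NULL, 10);
      return 1;
    }
  }
  return 0;
}
```

With the command line `./program -myint 10 -myreal 1e3`, the call `get_int_option(argc, argv, "-myint", &myint)` returns 1 and sets `myint` to 10, while looking up `"myint"` without the dash finds nothing.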
Examples for Today
• Get ex7.tar from Moodle; move to cluster
• Unpack tar file
• On the cluster, load the module that provides PETSc. Type:
module load fenics/2019.1.0-ompi-petsc3.11-gcc8.3
HelloWorld Example
• Open petsc_1_ex.c
• To build, type make (or make ex1)
• To run: (e.g.)
srun -n 2 ./ex1 -myint 5
PETSc Inheritance
• PETSc uses 3-level inheritance
• every object in PETSc is an instance of a class: Vec, Mat, PC, KSP, SNES, …
• all classes inherit from PetscObject
• functions called on objects (methods) are prefixed with a class name:
MatMult(Mat,…)
• a new object is created with a class-specific Create function (constructor):
Mat A;
MatCreate(comm, &A);
• every class is further refined to types specified with SetType:
MatSetType(A, MATSEQAIJ);
PETSc Inheritance
Vectors in PETSc
Vec v; PetscInt m=2, M=8;
VecType type=VECMPI;
MPI_Comm comm=PETSC_COMM_WORLD;
• create: VecCreate(comm, &v);
• layout: VecSetSizes(v,m,M);
• type: VecSetType(v,type);
• all in one: VecCreateMPI(comm,m,M,&v);
• options: VecSetFromOptions(v);
• dealloc: VecDestroy(&v);
Sequential variant:
• communicator: PETSC_COMM_SELF
• type: VECSEQ (VECSTANDARD selects VECSEQ or VECMPI from the communicator size)
• layout: VecSetSizes(v,M,M);
• all in one: VecCreateSeq(comm,M,&v);
Parallel layout
• consider the vector v with local size m, global size M, distributed across 3 processes
• call VecSetSizes(v,m,M) to set the sizes
• set either m or M to PETSC_DECIDE, i.e. let PETSc use the standard layout
• get this standard layout across comm: PetscSplitOwnership(comm,&m,&M)
Query Layout
• local and global sizes: VecGetLocalSize(v,&m) and VecGetSize(v,&M)
• ownership range of the local portion: VecGetOwnershipRange(v,&lo,&hi), where lo is the global index of the first local element and hi is one more than the global index of the last local element
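The layout arithmetic can be sketched in plain C. The split rule used here (every process gets M/size entries, and the first M%size processes get one extra) is assumed to match the standard layout chosen with PETSC_DECIDE; `local_size` and `ownership_range` are hypothetical helper names, and the range is half-open, [lo, hi).

```c
#include <assert.h>

/* Local size on `rank` when M entries are split over `size` processes:
   everyone gets M/size entries, the first M%size ranks get one extra. */
static int local_size(int M, int size, int rank)
{
  assert(size > 0 && rank >= 0 && rank < size && M >= 0);
  return M / size + (rank < M % size ? 1 : 0);
}

/* Half-open ownership range [lo, hi) of `rank`:
   lo is the sum of the local sizes on all lower ranks. */
static void ownership_range(int M, int size, int rank, int *lo, int *hi)
{
  int start = 0;
  for (int r = 0; r < rank; r++) start += local_size(M, size, r);
  *lo = start;
  *hi = start + local_size(M, size, rank);
}
```

For M = 8 on 3 processes this gives local sizes 3, 3, 2 and ownership ranges [0,3), [3,6), [6,8).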
Vector assembly
Vec x;
• Set all entries to constant value: VecSet(x, 1.0);
• Set all entries to 0: VecZeroEntries(x);
• Set an individual element (global indexing!):
PetscInt i = 10;
PetscScalar v = 3.14;
VecSetValue(x, i, v, INSERT_VALUES);
// or
VecSetValues(x, 1, &i, &v, INSERT_VALUES);
• Set multiple entries at once:
PetscInt ii[]={1, 2}; PetscScalar vv[]={2.7, 3.1};
VecSetValues(x, 2, ii, vv, INSERT_VALUES);
• The last argument can be
• INSERT_VALUES replace original value (=)
• ADD_VALUES add the new values to the original values (+=)
• VecSetValues is purely local with no inter-process communication
• values are just locally cached
• before using the vector, call the assembly function pair to exchange values between processes:
VecAssemblyBegin(x);
//optional calculations not involving x
VecAssemblyEnd(x);
• computations can be done while MPI messages are in transit
• this allows overlapping communication and computation
Getting values
Vec x;
• get a copy of 2 local entries of x with global indices ix to an array v:
PetscInt ix[]={10, 20};
PetscScalar v[2];
VecGetValues(x, 2, ix, v);
• get the pointer to the whole local internal array:
PetscScalar *a; VecGetArray(x, &a);
// read and/or modify the array a
VecRestoreArray(x, &a);
Duplicate and copy
• Create another Vec with the same type & layout:
Vec v, w;
VecDuplicate(v, &w);
• Copy the entries from v to w:
VecCopy(v, w);
Example 2
• Open petsc_2_ex.c
Matrices in PETSc
Mat A; PetscInt m=2, n=3, M=8, N=12;
MatType type=MATMPIAIJ;
MPI_Comm comm=PETSC_COMM_WORLD;
• create: MatCreate(comm, &A);
• layout: MatSetSizes(A,m,n,M,N);
• type: MatSetType(A,type);
• prealloc: MatMPIAIJSetPreallocation(A,5,PETSC_NULL,5,PETSC_NULL);
• all in one: MatCreateAIJ(comm,m,n,M,N,5,PETSC_NULL,5,PETSC_NULL,&A); // named MatCreateMPIAIJ before PETSc 3.3
• options: MatSetFromOptions(A);
• dealloc: MatDestroy(&A);
Sequential variants:
• communicator: PETSC_COMM_SELF
• type: MATSEQAIJ (or MATAIJ)
• layout: MatSetSizes(A,M,N,M,N);
• prealloc: MatSeqAIJSetPreallocation(A,5,PETSC_NULL);
Basic Matrix Types
• MATAIJ, MATSEQAIJ, MATMPIAIJ
• basic sparse format, known as compressed sparse row (CSR/CRS) or Yale format
• MATAIJ means MATSEQAIJ with a single-process communicator, and MATMPIAIJ otherwise
• MATBAIJ, MATSEQBAIJ, MATMPIBAIJ
• extensions of the AIJ formats described above
• store matrix elements by fixed-sized dense blocks
• intended especially for use with multicomponent PDEs
• multiple DOFs per mesh node
• MATDENSE, MATSEQDENSE, MATMPIDENSE
• dense matrices
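The compressed row idea behind the AIJ formats can be shown in a few lines of plain C. This is a generic CSR sketch, not PETSc's internal data structure; `CsrMatrix` and `csr_matvec` are names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Compressed sparse row storage: only the stored (nonzero) entries are kept. */
typedef struct {
  int nrows;
  const int *row_ptr;   /* length nrows+1; row i occupies [row_ptr[i], row_ptr[i+1]) */
  const int *col_idx;   /* column index of each stored entry */
  const double *val;    /* value of each stored entry */
} CsrMatrix;

/* y = A * x, touching only the stored entries of A. */
static void csr_matvec(const CsrMatrix *A, const double *x, double *y)
{
  assert(A != NULL && x != NULL && y != NULL);
  for (int i = 0; i < A->nrows; i++) {
    double sum = 0.0;
    for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
      sum += A->val[k] * x[A->col_idx[k]];
    y[i] = sum;
  }
}
```

For the 3x3 matrix with rows (2 0 1), (0 3 0), (4 0 5), the arrays are row_ptr = {0, 2, 3, 5}, col_idx = {0, 2, 1, 0, 2} and val = {2, 1, 3, 4, 5}. The work per product is proportional to the number of stored entries rather than to nrows times ncols.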
Parallel Layout Compatibility
Matrix Assembly
Mat A;
• Set all (allocated) entries to 0:
MatZeroEntries(A);
• Set an individual element (global indexing!):
PetscInt i = 1, j = 2; PetscScalar v = 3.14;
MatSetValue(A, i, j, v, INSERT_VALUES);
// or
MatSetValues(A, 1, &i, 1, &j, &v, INSERT_VALUES);
• Set multiple entries at once:
PetscInt ii[2]={1, 2}, jj[2]={11, 12};
PetscScalar vv[4]={1.3, 2.7, 3.1, 4.5}; // 2x2 block stored row by row
MatSetValues(A, 2, ii, 2, jj, vv, INSERT_VALUES);
• The last argument can be
• INSERT_VALUES replace original value (=)
• ADD_VALUES add the new values to the original values (+=)
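How a logically dense block of values maps onto matrix entries, and the difference between the two modes, can be sketched in plain C on a small dense array. The row-by-row ordering of the value array is assumed here (it is the default orientation in PETSc); `Mode`, `INSERT`, `ADD` and `set_values` are hypothetical stand-ins for this illustration, not PETSc identifiers.

```c
#include <assert.h>

typedef enum { INSERT, ADD } Mode;   /* stand-ins for INSERT_VALUES / ADD_VALUES */

/* Scatter an m x n block `vals` (stored row by row) into the dense,
   ncols-wide, row-major matrix `a`:
   vals[r*n + c] lands at row rows[r], column cols[c]. */
static void set_values(double *a, int ncols,
                       int m, const int *rows,
                       int n, const int *cols,
                       const double *vals, Mode mode)
{
  assert(a && rows && cols && vals);
  for (int r = 0; r < m; r++) {
    for (int c = 0; c < n; c++) {
      double *entry = &a[rows[r] * ncols + cols[c]];
      if (mode == INSERT) *entry = vals[r * n + c];    /* replace (=) */
      else                *entry += vals[r * n + c];   /* accumulate (+=) */
    }
  }
}
```

With ii = {1, 2}, jj = {11, 12} and vv = {1.3, 2.7, 3.1, 4.5}, entry (1,11) receives 1.3, (1,12) receives 2.7, (2,11) receives 3.1 and (2,12) receives 4.5. Repeating the call in ADD mode doubles each of them, which is the behavior finite element assembly relies on when several elements contribute to the same entry.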
• MatSetValues is purely local with no inter-process communication
• values are just locally cached
• before using the matrix, call the assembly function pair to exchange values between processes:
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
//optional calculations not involving A
MatAssemblyEnd( A,MAT_FINAL_ASSEMBLY);
• computations can be done while MPI messages are in transit
• this allows overlapping communication and computation
• cannot mix inserting and adding values
• need to do assembly between
MatSetValues(A, ..., INSERT_VALUES);
MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY);
MatAssemblyEnd( A,MAT_FLUSH_ASSEMBLY);
MatSetValues(A, ..., ADD_VALUES);
• MAT_FINAL_ASSEMBLY – final assembly to make A ready to use
• MAT_FLUSH_ASSEMBLY – cheaper, suffices for INSERT_VALUES/ADD_VALUES interleaving
Getting values
Mat A;
• get a copy of the 3x2 local block of A with global row indices ii and global column indices jj to an array v:
PetscInt ii[]={11, 22, 33}; PetscInt jj[]={12, 24};
PetscScalar v[6];
MatGetValues(A,3,ii,2,jj,v);
• get the row of the matrix A (i = global index of a locally owned row):
PetscInt nnz; const PetscInt *nzi; const PetscScalar *vals;
MatGetRow( A, i, &nnz, &nzi, &vals);
// read the array vals (don't alter!)
MatRestoreRow(A, i, &nnz, &nzi, &vals);
Some other matrix operations
• matrix
MatNorm(A, NORM_INFINITY, &norm); // or NORM_1, NORM_FROBENIUS
MatScale(A, 2.2);
• matrix-vector
MatMult(A,x,y); // y = A*x
MatMultAdd(A,x,y,z); // z = y + A*x
MatMultTranspose(A,x,y); // y = A^T*x
• matrix-matrix
MatMatMult( A,B,MAT_INITIAL_MATRIX,fill,&C); // C = A*B
MatMatTransposeMult(A,B,MAT_INITIAL_MATRIX,fill,&C); // C = A*B^T
MatTransposeMatMult(A,B,MAT_INITIAL_MATRIX,fill,&C); // C = A^T*B
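As a reminder of what the norm types mean, here is a plain-C sketch for a small dense row-major matrix: the 1-norm is the largest absolute column sum and the infinity-norm is the largest absolute row sum (the Frobenius norm, the square root of the sum of squared entries, is left out to keep the sketch free of the math library). The function names are chosen for this illustration only.

```c
#include <assert.h>

static double abs_value(double v) { return v < 0.0 ? -v : v; }

/* 1-norm of an m x n row-major matrix: largest absolute column sum. */
static double norm_1(const double *a, int m, int n)
{
  assert(a && m > 0 && n > 0);
  double best = 0.0;
  for (int j = 0; j < n; j++) {
    double sum = 0.0;
    for (int i = 0; i < m; i++) sum += abs_value(a[i * n + j]);
    if (sum > best) best = sum;
  }
  return best;
}

/* Infinity-norm: largest absolute row sum. */
static double norm_inf(const double *a, int m, int n)
{
  assert(a && m > 0 && n > 0);
  double best = 0.0;
  for (int i = 0; i < m; i++) {
    double sum = 0.0;
    for (int j = 0; j < n; j++) sum += abs_value(a[i * n + j]);
    if (sum > best) best = sum;
  }
  return best;
}
```

For the matrix with rows (1 -2) and (3 4), the column sums are 4 and 6 and the row sums are 3 and 7, so the 1-norm is 6 and the infinity-norm is 7.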
Example 3
• Open petsc_3_ex.c
Linear System Solvers
KSP ksp;
KSPType type = KSPCG;
MPI_Comm comm = PETSC_COMM_WORLD;
// initialize A,b,x
• create: KSPCreate(comm, &ksp);
• type: KSPSetType(ksp, type); //default = KSPGMRES
• options: KSPSetFromOptions(ksp);
• dealloc: KSPDestroy(&ksp);
Solving linear systems
Mat A; Vec b,x;
// initialize A,b,x
KSPSetOperators(ksp, A, A);
KSPSolve(ksp,b,x);
• The first matrix defines the linear system
• The second matrix is used in constructing the preconditioner
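For orientation, this is roughly the kind of iteration a Krylov solver such as the KSPCG type performs. The following plain-C sketch is an unpreconditioned conjugate gradient method for a small dense symmetric positive definite matrix, written for illustration only: it is not PETSc's implementation, `MAX_N` is an arbitrary cap introduced for the sketch, and the stopping test compares squared norms to avoid a square root.

```c
#include <assert.h>

#define MAX_N 16   /* hypothetical cap so the work vectors can live on the stack */

static double dot(int n, const double *u, const double *v)
{
  double s = 0.0;
  for (int i = 0; i < n; i++) s += u[i] * v[i];
  return s;
}

/* y = A*x for a dense row-major n x n matrix. */
static void matvec(int n, const double *A, const double *x, double *y)
{
  for (int i = 0; i < n; i++) {
    y[i] = 0.0;
    for (int j = 0; j < n; j++) y[i] += A[i * n + j] * x[j];
  }
}

/* Unpreconditioned CG for symmetric positive definite A, starting from x = 0.
   Stops when ||r||^2 <= rtol^2 * ||b||^2 or after maxit iterations.
   Returns the number of iterations performed. */
static int cg_solve(int n, const double *A, const double *b, double *x,
                    double rtol, int maxit)
{
  double r[MAX_N], p[MAX_N], Ap[MAX_N];
  assert(n > 0 && n <= MAX_N);

  for (int i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; p[i] = b[i]; }
  double rr = dot(n, r, r);
  const double target = rtol * rtol * dot(n, b, b);

  int it = 0;
  while (it < maxit && rr > target) {
    matvec(n, A, p, Ap);
    double alpha = rr / dot(n, p, Ap);           /* step length along p */
    for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    double rr_new = dot(n, r, r);
    double beta = rr_new / rr;                   /* makes the new direction A-conjugate */
    for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
    rr = rr_new;
    it++;
  }
  return it;
}
```

For A with rows (4 1) and (1 3) and b = (1, 2), the exact solution is (1/11, 7/11), and CG reaches it in two iterations, as expected for a 2x2 system.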
Solve tolerances
KSPSetTolerances(ksp, rtol, atol, dtol, maxit);
• rtol = the relative convergence tolerance
stop if: residual < rtol * norm(RHS)
• atol = absolute conv. tol.
stop if: residual < atol
• dtol = divergence tolerance
stop if: residual > dtol * norm(RHS)
• maxit = maximum number of iterations
• all can be set to PETSC_DEFAULT
• cmd-line: -ksp_rtol 1e-6 -ksp_divtol 1e-15 -ksp_atol 10 -ksp_max_it 2000
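The stopping rules listed above can be written as one small decision function. This plain-C sketch encodes exactly the conditions as stated on the slide; the `Status` names and `check_convergence` are hypothetical, and PETSc's default test may differ in detail (for example in which norm serves as the reference).

```c
#include <assert.h>

/* Hypothetical status codes for this sketch. */
typedef enum {
  ITERATING, CONVERGED_RTOL, CONVERGED_ATOL, DIVERGED_DTOL, DIVERGED_ITS
} Status;

/* Decide whether to stop, given the iteration count, the current residual
   norm and the norm of the right-hand side. */
static Status check_convergence(int it, double rnorm, double bnorm,
                                double rtol, double atol, double dtol, int maxit)
{
  assert(rnorm >= 0.0 && bnorm >= 0.0);
  if (rnorm < rtol * bnorm) return CONVERGED_RTOL;   /* relative tolerance met */
  if (rnorm < atol)         return CONVERGED_ATOL;   /* absolute tolerance met */
  if (rnorm > dtol * bnorm) return DIVERGED_DTOL;    /* residual blew up */
  if (it >= maxit)          return DIVERGED_ITS;     /* iteration budget exhausted */
  return ITERATING;
}
```

A solver loop would call this after every iteration and stop as soon as the result is anything other than ITERATING.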
Preconditioners
• many built-in and interfaced preconditioners
• ILU, block Jacobi, sparse approximate inverse ...
• can be composed
PC pc;
KSPGetPC(ksp, &pc);
PCSetType(pc, PCILU);
Direct solvers
• Direct solvers = special case of iterative solvers
• just one iteration that applies a "perfect preconditioner", i.e. forward & backward substitution with the complete LU (or Cholesky) factors
KSPSetType(ksp, KSPPREONLY);
PCSetType(pc, PCLU); // or PCCHOLESKY
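What "forward & backward substitution" with a complete factorization amounts to can be sketched in plain C for a small dense matrix. This is a textbook LU factorization without pivoting, shown for illustration only; it is not what PETSc or the interfaced sparse direct solvers do internally, and `lu_factor` and `lu_solve` are names chosen for the sketch.

```c
#include <assert.h>

/* In-place LU factorization without pivoting of a row-major n x n matrix.
   Afterwards the strict lower triangle holds L (unit diagonal implied)
   and the upper triangle holds U.
   Returns 0 on success, 1 if a zero pivot is encountered. */
static int lu_factor(int n, double *A)
{
  assert(n > 0 && A);
  for (int k = 0; k < n; k++) {
    if (A[k * n + k] == 0.0) return 1;
    for (int i = k + 1; i < n; i++) {
      A[i * n + k] /= A[k * n + k];                      /* multiplier l_ik */
      for (int j = k + 1; j < n; j++)
        A[i * n + j] -= A[i * n + k] * A[k * n + j];     /* update trailing block */
    }
  }
  return 0;
}

/* Solve L U x = b; b is overwritten with the solution x. */
static void lu_solve(int n, const double *LU, double *b)
{
  assert(LU && b);
  for (int i = 0; i < n; i++)                  /* forward substitution: L y = b */
    for (int j = 0; j < i; j++) b[i] -= LU[i * n + j] * b[j];
  for (int i = n - 1; i >= 0; i--) {           /* backward substitution: U x = y */
    for (int j = i + 1; j < n; j++) b[i] -= LU[i * n + j] * b[j];
    b[i] /= LU[i * n + i];
  }
}
```

For A with rows (4 3) and (6 3) and b = (10, 12), factoring once and substituting gives x = (1, 2). The factorization is the expensive step; each further right-hand side only costs the two substitutions.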
Example 4
• Open petsc_4_ex.c
No deliverable today. :)