Numerical Software Libraries for the Scalable Solution of PDEs Introduction to PETSc Satish Balay, Bill Gropp, Lois C. McInnes, Barry Smith Mathematics

Post on 25-Dec-2015





  • Slide 1
  • Numerical Software Libraries for the Scalable Solution of PDEs: Introduction to PETSc
      Satish Balay, Bill Gropp, Lois C. McInnes, Barry Smith
      Mathematics and Computer Science Division, Argonne National Laboratory
      Full tutorial as presented at the SIAM Parallel Processing Conference, March 1999, available via
  • Slide 2
  • Philosophy
      • Writing hand-parallelized application codes from scratch is extremely difficult and time-consuming.
      • Scalable parallelizing compilers for real application codes are very far in the future.
      • We can ease the development of parallel application codes by developing general-purpose, parallel numerical PDE libraries.
    Caveats
      • Developing parallel, non-trivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort.
      • PETSc is a toolkit that can reduce the development time, but it is neither a black-box PDE solver nor a silver bullet.
  • Slide 3
  • PETSc Concepts
    As illustrated via a complete nonlinear PDE example: flow in a driven cavity
      • How to specify the mathematics of the problem
          • Data objects: vectors, matrices
      • How to solve the problem
          • Solvers: linear, nonlinear, and timestepping (ODE) solvers
      • Parallel computing complications
          • Parallel data layout: structured and unstructured meshes
      • Debugging, profiling, extensibility
  • Slide 4
  • Tutorial Approach
    From the perspective of an application programmer:
      • Beginner (1): basic functionality, intended for use by most programmers
      • Intermediate (2): selecting options, performance evaluation and tuning
      • Advanced (3): user-defined customization of algorithms and data structures
      • Developer (4): advanced customizations, intended primarily for use by library developers
  • Slide 5
  • Incremental Application Improvement
      • Beginner: get the application "up and walking"
      • Intermediate: experiment with options; determine opportunities for improvement
      • Advanced: extend algorithms and/or data structures as needed
      • Developer: consider interface and efficiency issues for integration and interoperability of multiple toolkits
  • Slide 6
  • Component Interactions for Numerical PDEs
    (Diagram. Components shown: Grids, Steering, Optimization, PDE Discretization, Algebraic Solvers, Visualization, Derivative Computation. A legend marks the components that are the PETSc emphasis.)
  • Slide 7
  • CFD on an Unstructured Grid
      • 3D incompressible Euler
      • Tetrahedral grid
      • Up to 11 million unknowns
      • Based on a legacy NASA code, FUN3D, developed by W. K. Anderson
      • Fully implicit steady-state
      • Primary PETSc tools: nonlinear solvers (SNES) and vector scatters (VecScatter)
    Results courtesy of collaborators Dinesh Kaushik and David Keyes, Old Dominion Univ., partially funded by NSF and ASCI level 2 grant
  • Slide 8
  • Scalability for Fixed-Size Problem: 11,047,096 unknowns
    (Plot. One point is labeled as a missing data point.)
  • Slide 9
  • Scalability Study: 600 MHz T3E, 2.8M vertices
    (Plot.)
  • Slide 10
  • Multiphase Flow
      • Oil reservoir simulation: fully implicit, time-dependent
      • First, and only, fully implicit, parallel compositional simulator
      • 3D EOS model (8 DoF per cell)
      • Structured Cartesian mesh
      • Over 4 million cell blocks, 32 million DoF
      • Primary PETSc tools: linear solvers (SLES)
          • Restarted GMRES with Block Jacobi preconditioning
          • Point-block ILU(0) on each processor
      • Over 10.6 gigaflops sustained performance on 128 nodes of an IBM SP; 90+ percent parallel efficiency
    Results courtesy of collaborators Peng Wang and Jason Abate, Univ. of Texas at Austin, partially funded by DOE ER FE/MICS
  • Slide 11
  • PC and SP Comparison: 179,000 unknowns (22,375 cell blocks)
      • PC: Fast Ethernet (100 megabits/second) network of 300 MHz Pentium PCs with 66 MHz bus
      • SP: 128-node IBM SP with 160 MHz Power2super processors and 2 memory cards
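The problem sizes quoted on the multiphase flow slides can be cross-checked with a few lines of arithmetic. The sketch below takes the slide figures at face value (the "over" qualifiers are treated as lower bounds) and assumes that the 8 DoF per cell of the large run also applies to the 22,375-cell comparison run, which is consistent with the 179,000 unknowns quoted. All variable names are illustrative.

```python
# Figures quoted on the slides; "over ..." values are used as lower bounds.
dof_per_cell = 8

cell_blocks_large = 4_000_000
total_dof_large = cell_blocks_large * dof_per_cell  # large run

cell_blocks_small = 22_375
total_dof_small = cell_blocks_small * dof_per_cell  # PC/SP comparison run

sustained_gflops = 10.6
nodes = 128
mflops_per_node = sustained_gflops * 1000.0 / nodes  # average sustained rate per node

print(total_dof_large, total_dof_small, round(mflops_per_node, 1))
```

The per-node figure of roughly 83 megaflops is only an average derived from the two quoted numbers; the slides do not report it directly.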
  • Slide 12
  • Speedup Comparison
    (Plot.)
  • Slide 13
  • The PETSc Programming Model
    Goals
      • Portable, runs everywhere
      • Performance
      • Scalable parallelism
    Approach
      • Distributed memory, shared-nothing
          • Requires only a compiler (single node or processor)
          • Access to data on remote machines through MPI
      • Can still exploit compiler-discovered parallelism on each node (e.g., SMP)
      • Hide within parallel objects the details of the communication
      • User orchestrates communication at a higher abstract level than message passing
  • Slide 14
  • PDE Application Codes
    (Layered diagram of PETSc beneath the application codes. Labels shown:)
      • ODE Integrators; Visualization Interface
      • Nonlinear Solvers, Unconstrained Minimization
      • Linear Solvers: Preconditioners + Krylov Methods
      • Object-Oriented Matrices, Vectors, Indices; Grid Management
      • Profiling Interface
      • Computation and Communication Kernels: MPI, MPI-IO, BLAS, LAPACK
  • Slide 15
  • PETSc Numerical Components
      • Nonlinear Solvers: Newton-based Methods (Line Search, Trust Region), Other
      • Time Steppers: Euler, Backward Euler, Pseudo Time Stepping, Other
      • Krylov Subspace Methods: GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebychev, Other
      • Preconditioners: Additive Schwarz, Block Jacobi, ILU, ICC, LU (Sequential only), Others
      • Matrices: Compressed Sparse Row (AIJ), Blocked Compressed Sparse Row (BAIJ), Block Diagonal (BDIAG), Dense, Other
      • Index Sets: Indices, Block Indices, Stride, Other
      • Vectors
  • Slide 16
  • Mesh Definitions: For Our Purposes
      • Structured
          • Determine neighbor relationships purely from logical I, J, K coordinates
          • PETSc support via DA
      • Semi-Structured
          • No direct PETSc support; could work with tools such as SAMRAI, Overture, Kelp
      • Unstructured
          • Do not explicitly use logical I, J, K coordinates
          • PETSc support via lower-level VecScatter utilities
    One is always free to manage the mesh data as if it is unstructured.
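To make the structured-mesh definition concrete, here is a minimal sketch of finding neighbors purely from logical I, J, K coordinates. It does not use PETSc or its DA object; the function names, the six-point (face-neighbor) stencil, and the I-fastest numbering are assumptions chosen for illustration.

```python
def neighbors(i, j, k, nx, ny, nz):
    """Return the in-bounds face neighbors of logical point (i, j, k)."""
    offsets = ((-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1))
    result = []
    for di, dj, dk in offsets:
        ni, nj, nk = i + di, j + dj, k + dk
        # No connectivity table is consulted; the index arithmetic is enough.
        if 0 <= ni < nx and 0 <= nj < ny and 0 <= nk < nz:
            result.append((ni, nj, nk))
    return result


def global_index(i, j, k, nx, ny):
    """Map (i, j, k) to a single index, with i varying fastest (an assumed ordering)."""
    return i + nx * (j + ny * k)


corner = neighbors(0, 0, 0, 3, 3, 3)    # a corner point has three face neighbors
interior = neighbors(1, 1, 1, 3, 3, 3)  # an interior point has six
```

An unstructured mesh, by contrast, would need an explicit adjacency list, because no such index arithmetic is available.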
  • Slide 17
  • A Freely Available and Supported Research Code
      • Available via
      • Usable in C, C++, and Fortran 77/90 (with minor limitations in Fortran 77/90 due to their syntax)
      • Users manual in PostScript and HTML formats
      • Hyperlinked manual pages for all routines
      • Many tutorial-style examples
      • Support via email:
  • Slide 18
  • True Portability
      • Tightly coupled systems
          • Cray T3D/T3E
          • SGI/Origin
          • IBM SP
          • Convex Exemplar
      • Loosely coupled systems, e.g., networks of workstations
          • Sun OS, Solaris 2.5
          • IBM AIX
          • DEC Alpha
          • HP
          • Linux
          • FreeBSD
          • Windows NT/95
  • Slide 19
  • PETSc History
      • September 1991: begun by Bill and Barry
      • May 1992: first release of PETSc 1.0
      • January 1993: joined by Lois
      • September 1994: began PETSc 2.0
      • June 1995: first release of version 2.0
      • October 1995: joined by Satish
      • Now: over 4,000 downloads of version 2.0
    PETSc Funding and Support
      • Department of Energy, MICS Program
      • National Science Foundation, Multidisciplinary Challenge Program, CISE
  • Slide 20
  • A Complete Example: Driven Cavity Model
      • Velocity-vorticity formulation, with flow driven by lid and/or buoyancy
      • Finite difference discretization with 4 DoF per mesh point
      • Primary PETSc tools: SNES, DA
      • Example code: petsc/src/snes/examples/tutorials/ex8.c
      • Solution components at each mesh point, [u, v, z, T]:
          • velocity: u
          • velocity: v
          • vorticity: z
          • temperature: T
    Application code author: D. E. Keyes
  • Slide 21
  • Driven Cavity Solution Approach
    (Diagram distinguishing PETSc code from user code. Labels shown:)
      • Main Routine
      • Nonlinear Solvers (SNES): solve F(u) = 0
      • Linear Solvers (SLES): PC, KSP
      • Application Initialization, Function Evaluation, Jacobian Evaluation, Post-Processing
      • Markers A, B, C, D identifying parts of the program
  • Slide 22
  • Driven Cavity Program
      • Part A: Data objects (Vec and Mat), Solvers (SNES)
      • Part B: Parallel data layout (DA)
      • Part C: Nonlinear function evaluation
          • ghost point updates
          • local function computation
      • Part D: Jacobian evaluation
          • default colored finite differencing approximation
      • Experimentation
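Parts C and D supply the two ingredients a Newton-type nonlinear solver needs: a function F(u) and a Jacobian approximation. The sketch below shows how a finite-difference Jacobian can drive Newton's method on a tiny two-unknown system. It is a plain-Python illustration, not PETSc's SNES: it builds a dense Jacobian one column at a time (no coloring, which is what makes the approach affordable for sparse problems), uses a small Gaussian-elimination solver in place of a Krylov method, and the test problem `residual` is hypothetical.

```python
def fd_jacobian(f, u, h=1e-7):
    """Approximate dF/du column by column with forward differences."""
    n = len(u)
    f0 = f(u)
    jac = [[0.0] * n for _ in range(n)]
    for col in range(n):
        perturbed = list(u)
        perturbed[col] += h
        fp = f(perturbed)
        for row in range(n):
            jac[row][col] = (fp[row] - f0[row]) / h
    return jac


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def newton(f, u0, tol=1e-10, max_it=50):
    """Iterate u <- u + du, where J(u) du = -F(u), until max|F(u)| < tol."""
    u = list(u0)
    for _ in range(max_it):
        fu = f(u)
        if max(abs(v) for v in fu) < tol:
            break
        du = solve_dense(fd_jacobian(f, u), [-v for v in fu])
        u = [ui + di for ui, di in zip(u, du)]
    return u


def residual(u):
    """Hypothetical test problem: a circle of radius 2 intersected with the line x = y."""
    x, y = u
    return [x * x + y * y - 4.0, x - y]


solution = newton(residual, [1.0, 0.5])  # converges near (sqrt(2), sqrt(2))
```

In the driven cavity code the roles are the same but the scale differs: the function evaluation works on each process's local portion of the grid after a ghost point update, and the linear solve is done iteratively in parallel.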
  • Slide 23
  • Collectivity
      • MPI communicators (MPI_Comm) specify collectivity (processors involved in a computation)
      • All PETSc creation routines for solver and data objects are collective with respect to a communicator, e.g., VecCreate(MPI_Comm comm, int m, int M, Vec *x)
      • Some operations are collective, while others are not, e.g.,
          • collective: VecNorm()
          • not collective: VecGetLocalSize()
      • If a sequence of collective routines is used, they must be called in the same order on each processor
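The distinction between collective and non-collective operations can be seen in a toy model with no MPI at all. In the sketch below, each entry of `local_parts` stands in for the portion of a vector held by one process (an assumption made purely for illustration). A local size can be answered from one process's data alone, whereas a norm needs a contribution from every process, which is why every process must take part in the call.

```python
import math

# Each inner list plays the role of one process's locally owned entries.
local_parts = [[3.0, 4.0], [0.0], [12.0, 0.0, 0.0]]


def local_size(part):
    """Not collective: uses only data already held by one process."""
    return len(part)


def global_norm(parts):
    """Collective in spirit: every process contributes a partial sum of squares."""
    partial_sums = [sum(v * v for v in part) for part in parts]
    # In a real parallel run this sum would be a reduction across processes.
    return math.sqrt(sum(partial_sums))


sizes = [local_size(part) for part in local_parts]
norm = global_norm(local_parts)
```

If one process skipped the reduction, or entered a different collective operation first, the others would wait for a contribution that never arrives, which is the reason for the ordering rule on the slide.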
  • Slide 24
  • Vectors
      • Fundamental objects for storing field solutions, right-hand sides, etc.
      • VecCreateMPI(...,Vec *)
          • MPI_Comm: processors that share the vector
          • number of elements local to this processor
          • total number of elements
      • Each process locally owns a subvector of contiguously numbered global indices
    (Figure: a vector partitioned into subvectors owned by proc 0 through proc 4.)
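The contiguous ownership rule can be sketched in a few lines. The splitting convention used here (spread the remainder over the lowest-numbered processes) is one common choice and is an assumption of this example, as are the function names; the slide only states that each process owns a contiguous range of global indices.

```python
def ownership_ranges(total, nprocs):
    """Split global indices 0..total-1 into contiguous [start, end) ranges, one per process."""
    base, extra = divmod(total, nprocs)
    ranges = []
    start = 0
    for rank in range(nprocs):
        local = base + (1 if rank < extra else 0)  # first `extra` ranks get one more
        ranges.append((start, start + local))
        start += local
    return ranges


def owner_of(index, ranges):
    """Return the rank whose range contains the given global index."""
    for rank, (lo, hi) in enumerate(ranges):
        if lo <= index < hi:
            return rank
    raise IndexError(f"global index {index} is outside the vector")


ranges = ownership_ranges(10, 4)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Because the ranges are contiguous, a process needs only its own start and end to translate between global and local numbering.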
  • Slide 25
  • Sparse Matrices
      • Fundamental objects for storing linear operators (e.g., Jacobians)
      • MatCreateMPIAIJ(...,Mat *)
          • MPI_Comm: processors that share the matrix
          • number of local rows and columns
          • number of global rows and columns
          • optional storage pre-allocation information
      • Each process locally owns a submatrix of contiguously numbered global rows.
    (Figure: a matrix partitioned by rows into submatrices owned by proc 0 through proc 4.)
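For readers unfamiliar with the compressed sparse row layout behind the AIJ format, the following sketch stores a small tridiagonal matrix in three arrays and multiplies it by a vector. It is a sequential, plain-Python illustration with made-up function names, not PETSc's implementation. The per-row nonzero counts it computes are the kind of information that storage pre-allocation relies on.

```python
def to_csr(dense):
    """Convert a dense list-of-rows matrix to (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for col, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_indices.append(col)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        y.append(sum(values[k] * x[col_indices[k]] for k in range(start, end)))
    return y


dense = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
values, col_indices, row_ptr = to_csr(dense)

# Nonzeros per row: known before assembly, this lets storage be allocated once.
nnz_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(len(dense))]

y = csr_matvec(values, col_indices, row_ptr, [1.0, 1.0, 1.0])
```

Since rows are stored one after another, assigning each process a contiguous block of rows amounts to assigning it a contiguous slice of these arrays.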
  • Slide 26
  • Parallel Matrix and Vector Assembly
      • Processes may generate any entries in vectors and matrices
      • Entries need not be generated by the process on which they ultimately will be stored
      • PETSc automatically moves data during assembly if necessary
      • Vector example:
          • VecSetValues(Vec,...)
              • number of entries to insert/add
              • indices of entries
              • values to add
              • mode: [INSERT_VALUES, ADD_VALUES]
          • VecAssemblyBegin(Vec)
          • VecAssemblyEnd(Vec)
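The set-then-assemble pattern can be mimicked with a toy class. This is a single-process simulation under stated assumptions, not PETSc: `ToyVector`, its `stash` list, and the `rank` argument are hypothetical stand-ins for the idea that entries destined for another process are held back until assembly delivers them to their owner.

```python
INSERT_VALUES = "insert"
ADD_VALUES = "add"


class ToyVector:
    """Simulated distributed vector; ranges[r] is the [start, end) owned by rank r."""

    def __init__(self, ranges):
        self.ranges = ranges
        self.local = [[0.0] * (hi - lo) for lo, hi in ranges]
        self.stash = []  # off-process entries waiting for assembly

    def _owner(self, index):
        for rank, (lo, hi) in enumerate(self.ranges):
            if lo <= index < hi:
                return rank
        raise IndexError(index)

    def _apply(self, index, value, mode):
        owner = self._owner(index)
        offset = index - self.ranges[owner][0]
        if mode == ADD_VALUES:
            self.local[owner][offset] += value
        else:
            self.local[owner][offset] = value

    def set_values(self, rank, indices, values, mode):
        """Process `rank` contributes entries, which may belong to any process."""
        for index, value in zip(indices, values):
            if self._owner(index) == rank:
                self._apply(index, value, mode)          # locally owned: store now
            else:
                self.stash.append((index, value, mode))  # remote: defer

    def assemble(self):
        """Deliver stashed entries to their owners (the communication step)."""
        for index, value, mode in self.stash:
            self._apply(index, value, mode)
        self.stash.clear()


v = ToyVector([(0, 2), (2, 4)])
v.set_values(0, [0, 3], [1.0, 5.0], ADD_VALUES)  # rank 0 owns index 0 but not 3
v.set_values(1, [3], [2.0], ADD_VALUES)          # rank 1 owns index 3
before = [part[:] for part in v.local]
v.assemble()
```

Before assembly the contribution from rank 0 to index 3 is still in the stash; afterwards both contributions have been summed on the owning process. Splitting assembly into a begin and an end call, as the slide lists, leaves room to overlap that communication with other work.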
  • Slide 27
  • 27 Matrix Assembly MatSetValues(Mat,) number of rows to insert/add indices of rows an

