ACTS - A Reliable Software Infrastructure for Scientific Computing
Osni Marques
Lawrence Berkeley National Laboratory (LBNL)
oamarques@lbl.gov
UC Berkeley - CS267, 04/20/2006
Outline
• Keeping pace with software and hardware
  • Hardware evolution
  • Performance tuning
  • Software selection
  • What is missing?
• The DOE ACTS Collection Project
  • Goals
  • Current features
  • Lessons learned
IBM BlueGene/L
A computation that took 1 full year to complete in 1980 could be done in ~10 hours in 1992, in ~16 minutes in 1997, in ~27 seconds in 2001 and in ~1.7 seconds today (2006)!
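The run times quoted above can be turned into speedup factors relative to 1980. The sketch below only redoes that arithmetic; it assumes a 365-day year and takes "today" to be 2006, the date of the talk.

```python
# Wall-clock times quoted on the slide for the same computation, in seconds.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year

times = {
    1980: SECONDS_PER_YEAR,
    1992: 10 * 3600,   # ~10 hours
    1997: 16 * 60,     # ~16 minutes
    2001: 27,          # ~27 seconds
    2006: 1.7,         # "today" on the slide, taken as the year of the talk
}

def speedup_since_1980(year):
    """Ratio of the 1980 run time to the run time quoted for `year`."""
    return times[1980] / times[year]

for year in sorted(times):
    print(year, round(speedup_since_1980(year)))
```

With these figures the 1992 machine is roughly 900 times faster than the 1980 one, and the 2006 machine roughly 18 million times faster.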
Challenges in the Development of Scientific Codes
• Research in computational sciences is fundamentally interdisciplinary
• The development of complex simulation codes on high-end computers is not a trivial task
• Productivity
  • Time to the first solution (prototype)
  • Time to solution (production)
  • Other requirements
• Complexity
  • Increasingly sophisticated models
  • Model coupling
  • Interdisciplinarity
• Performance
  • Increasingly complex algorithms
  • Increasingly complex architectures
  • Increasingly demanding applications
• Libraries written in different languages
• Discussions about standardizing interfaces are often sidetracked into implementation issues
• Difficulties managing multiple libraries developed by third-parties
• Need to use more than one language in one application
• The code is long-lived and different pieces evolve at different rates
• Swapping competing implementations of the same idea and testing without modifying the code
• Need to compose an application with some other(s) that were not originally designed to be combined
Automatic Tuning
• For each kernel
  1. Identify and generate a space of algorithms
  2. Search for the fastest one, by running them
• What is a space of algorithms?
  • Depending on kernel and input, may vary
    • instruction mix and order
    • memory access patterns
    • data structures
    • mathematical formulation
• When do we search?
  • Once per kernel and architecture
  • At compile time
  • At run time
  • All of the above
• PHiPAC: www.icsi.berkeley.edu/~bilmes/phipac
• ATLAS: www.netlib.org/atlas
• XBLAS: www.nersc.gov/~xiaoye/XBLAS
• Sparsity: www.cs.berkeley.edu/~yelick/sparsity
• FFTs and Signal Processing
  • FFTW: www.fftw.org
    • Won 1999 Wilkinson Prize for Numerical Software
  • SPIRAL: www.ece.cmu.edu/~spiral
    • Extensions to other transforms, DSPs
  • UHFFT
    • Extensions to higher dimension, parallelism
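The two steps listed for each kernel (generate a space of algorithms, then search it by running the candidates) can be sketched in a few lines. This is a toy illustration, not how PHiPAC, ATLAS or FFTW are implemented: the "space" here is just a handful of block sizes for a blocked matrix multiply, and the names `matmul_blocked` and `autotune` are made up for this example.

```python
import time

def matmul_blocked(A, B, n, block):
    """Multiply two n x n matrices (lists of lists) using square tiles of size `block`."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + block, n)):
                            Ci[j] += aik * Bk[j]
    return C

def autotune(n=48, candidates=(4, 8, 16, 48), repeats=2):
    """Time every candidate block size on this machine and return the fastest."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_blocked(A, B, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    fastest = min(timings, key=timings.get)
    return fastest, timings

best_block, timings = autotune()
print("fastest block size on this machine:", best_block)
```

The winner depends on the machine and on the problem size, which is the reason the search is done by running the candidates rather than by modeling them.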
What About Software Selection?
Example: Ax = b
• Use a direct solver (A=LU) if
  • Time and storage space acceptable
  • Iterative methods don’t converge
  • Many b’s for same A
• Criteria for choosing a direct solver
  • Symmetric positive definite (SPD)
  • Symmetric
  • Symmetric-pattern
  • Unsymmetric
• Row/column ordering schemes available
  • MMD, AMD, ND, graph partitioning
• Hardware
Build a preconditioning matrix K such that Kx=b is much easier to solve than Ax=b and K is somehow “close” to A (incomplete LU decompositions, sparse approximate inverses, polynomial preconditioners, preconditioning by blocks or domains, element-by-element, etc.). See Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods.
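To make the preconditioning idea concrete, here is a minimal sketch that uses the simplest possible choice, K = diag(A) (point Jacobi), inside a Richardson iteration x <- x + K^{-1}(b - Ax). The choice of K, the stopping test and the small test matrix are assumptions made for illustration; production codes would use the preconditioners and Krylov methods provided by the libraries discussed later.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def jacobi_preconditioned_richardson(A, b, tol=1e-10, max_iter=500):
    """Iterate x <- x + K^{-1} (b - A x) with K = diag(A).

    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    for it in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if max(abs(ri) for ri in r) < tol:
            return x, it
        # Solving K z = r is trivial because K is diagonal.
        z = [r[i] / A[i][i] for i in range(n)]
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter

# Small diagonally dominant system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, iterations = jacobi_preconditioned_richardson(A, b)
print(x, iterations)
```

K is "close" to A here only because A is diagonally dominant; for harder matrices the iteration may converge slowly or not at all, which is where the stronger preconditioners listed above come in.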
Components: simple example
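Numerical integration: midpoint
$$\int_a^b f(x)\,dx \approx \frac{b-a}{n} \sum_{j=1}^{n} f\!\left(\frac{x_{j-1}+x_j}{2}\right)$$
Numerical integration: Monte Carlo
$$\int_a^b f(x)\,dx \approx (b-a)\,\frac{1}{N} \sum_{i=1}^{N} f(x_i)$$
Integrands: $f_1(x) = x^2$, $f_2(x) = 2x$, $f_3(x) = \frac{4}{1+x^2}$
[Figures: plots of f(x) over the interval [a, b] for each method]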
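The component idea on this slide (interchangeable integrators applied to interchangeable integrands) can be sketched as follows. The function names, the interval [0, 1], the sample counts and the fixed random seed are assumptions for the sake of a runnable example; the three integrands are the ones shown on the slide.

```python
import math
import random

def midpoint(f, a, b, n):
    """Midpoint rule: (b - a)/n * sum of f at the midpoints of n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (j - 0.5) * h) for j in range(1, n + 1))

def monte_carlo(f, a, b, n_samples, rng):
    """Monte Carlo estimate: (b - a) times the mean of f at uniform random points."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    return (b - a) * total / n_samples

# The three integrands from the slide.
def f1(x):
    return x * x

def f2(x):
    return 2.0 * x

def f3(x):
    return 4.0 / (1.0 + x * x)

rng = random.Random(0)  # fixed seed so that runs are repeatable
for name, f in (("f1", f1), ("f2", f2), ("f3", f3)):
    print(name, midpoint(f, 0.0, 1.0, 1000), monte_carlo(f, 0.0, 1.0, 100_000, rng))
```

On [0, 1] the exact values are 1/3, 1 and pi, so either integrator can be swapped in and checked against the same integrand without touching the other component.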
The DOE ACTS Collection
Goals
• Collection of tools for developing parallel applications
• Extended support for experimental software
• Make ACTS tools available on DOE computers
• Provide technical support (acts-support@nersc.gov)
• Maintain ACTS information center (http://acts.nersc.gov)
• Coordinate efforts with other supercomputing centers
• Enable large scale scientific applications
• Educate and train
High Performance Tools
• portable
• library calls
• robust algorithms
• help code optimization
• More code development in less time
• More simulation in less computer time
Levels of Support
• High
  • Intermediate level
  • Tool expertise
  • Conduct tutorials
• Intermediate
  • Basic level
  • Higher level of support to users of the tool
• Basic
  • Help with installation
  • Basic knowledge of the tools
  • Compilation of user’s reports
http://acts.nersc.gov
Current ACTS Tools and their Functionalities
Category | Tool | Functionalities
Numerical | Aztec | Algorithms for the iterative solution of large sparse linear systems.
Numerical | Hypre | Algorithms for the iterative solution of large sparse linear systems, intuitive grid-centric interfaces, and dynamic configuration of parameters.
Numerical | PETSc | Tools for the solution of PDEs that require solving large-scale, sparse linear and nonlinear systems of equations.
Numerical | OPT++ | Object-oriented nonlinear optimization package.
Numerical | SUNDIALS | Solvers for the solution of systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
Numerical | ScaLAPACK | Library of high performance dense linear algebra routines for distributed-memory message-passing.
Numerical | SuperLU | General-purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations.
Numerical | TAO | Large-scale optimization software, including nonlinear least squares, unconstrained minimization, bound constrained optimization, and general nonlinear optimization.
Code Development | Global Arrays | Library for writing parallel programs that use large arrays distributed across processing nodes and that offers a shared-memory view of distributed arrays.
Code Development | Overture | Object-oriented tools for solving computational fluid dynamics and combustion problems in complex geometries.
Code Execution | CUMULVS | Framework that enables programmers to incorporate fault-tolerance, interactive visualization and computational steering into existing parallel programs.
Code Execution | TAU | Set of tools for analyzing the performance of C, C++, Fortran and Java programs.
Library Development | ATLAS | Tools for the automatic generation of optimized numerical software for modern computer architectures and compilers.
Use of ACTS Tools
Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning), eigenvalue problems solved with ScaLAPACK.
Advanced Computational Research in Fusion (SciDAC Project, PI Mitch Pindzola). Point of contact: Dario Mitnik (Dept. of Physics, Rollins College). Mitnik attended the workshop on the ACTS Collection in September 2000. Since then he has been actively using some of the ACTS tools, in particular ScaLAPACK, for which he has provided insightful feedback. Dario is currently working on the development, testing and support of new scientific simulation codes related to the study of atomic dynamics using time-dependent close coupling lattice and time-independent methods. He reports that this work could not be carried out on sequential machines and that ScaLAPACK is fundamental for the parallelization of these codes.
The international BOOMERanG collaboration announced results of the most detailed measurement of the cosmic microwave background radiation (CMB), which strongly indicated that the universe is flat (Apr. 27, 2000). Likelihood methods implemented in the MADCAP software package, using routines from ScaLAPACK, were used to examine the large dataset generated by BOOMERanG.
Performance of four science-of-scale applications that use ScaLAPACK functionalities on an IBM SP.
Use of ACTS Tools
3D overlapping grid for a submarine produced with Overture’s module ogen.
Model of a "hard" sphere included in a "soft" material, 26 million d.o.f. Unstructured meshes in solid mechanics using Prometheus and PETSc (Adams and Demmel).
Multiphase flow using PETSc, 4 million cell blocks, 32 million DOF, over 10.6 Gflops on an IBM SP (128 nodes), entire simulation runs in less than 30 minutes (Pope, Gropp, Morgan, Sepehrnoori, Smith and Wheeler).
3D incompressible Euler, tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3D (W. K. Anderson), fully implicit steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).
Electronic structure optimization performed with TAO, (UO2)3(CO3)6 (courtesy of deJong).
Molecular dynamics and thermal flow simulation using codes based on Global Arrays. GA has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.
Use of ACTS Tools
Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning), eigenvalue problems solved with ScaLAPACK.
OPT++ is used in protein energy minimization problems (shown here is protein T162 from CASP5, courtesy of Meza, Oliva et al.).
Omega3P is a parallel distributed-memory code intended for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed for the solution of a problem of order 7.5 million with 304 million nonzeros. Finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.
Two ScaLAPACK routines, PZGETRF and PZGETRS, are used for the solution of linear systems in the AORSA code (Batchelor et al.), which is based on spectral algorithms and is intended for the study of electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.
ScaLAPACK: software structure
Layers, from top to bottom:
• Global: ScaLAPACK, PBLAS
• Local: LAPACK, BLACS, BLAS, MPI/PVM/...
  • BLAS and MPI/PVM/... are platform specific
Components:
• BLAS: clarity, modularity, performance and portability. ATLAS can be used here for automatic tuning.
• LAPACK: linear systems, least squares, singular value decomposition, eigenvalues.
• BLACS: communication routines targeting linear algebra operations.
• PBLAS: parallel BLAS.
• MPI/PVM/...: communication layer (message passing).
http://acts.nersc.gov/scalapack
Version 1.7 released in August 2001; recent NSF funding for further development.
PBLAS (Parallel Basic Linear Algebra Subroutines)
• Similar to the BLAS in portability, functionality and naming:
  • Level 1: vector-vector operations
  • Level 2: matrix-vector operations
  • Level 3: matrix-matrix operations
BLAS:  CALL DGEXXX( M, N, A( IA, JA ), LDA, ... )
PBLAS: CALL PDGEXXX( M, N, A, IA, JA, DESCA, ... )
       (DESCA is the array descriptor; see next slides)
• Built atop the BLAS and BLACS
• Provide global view of the matrix operands
[Figure: an M_ x N_ global matrix A with the M x N submatrix A(IA:IA+M-1,JA:JA+N-1) starting at row IA and column JA]
BLACS (Basic Linear Algebra Communication Subroutines)
• A design tool: they are a conceptual aid in design and coding.
• Associate widely recognized mnemonic names with communication operations. This improves:
  • program readability
  • self-documenting quality of the code.
• Promote efficiency by identifying frequently occurring operations of linear algebra which can be optimized on various computers.
BLACS: basics
• Processes are embedded in a two-dimensional grid.
• An operation which involves more than one sender and one receiver is called a scoped operation.
Example: a 3x4 grid (process numbers by process row and process column)
         col 0  col 1  col 2  col 3
row 0      0      1      2      3
row 1      4      5      6      7
row 2      8      9     10     11
Scope | Meaning
Row | All processes in a process row participate.
Column | All processes in a process column participate.
All | All processes in the process grid participate.
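The grid numbering and the three scopes can be sketched in a few lines. This is an illustration of the concept only, assuming the row-major numbering shown in the 3x4 example; `grid_coords` and `scope_members` are hypothetical helper names and are not BLACS routines.

```python
def grid_coords(rank, nprow, npcol):
    """Row-major mapping of a process number to (process row, process column)."""
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank outside the process grid")
    return divmod(rank, npcol)

def scope_members(rank, nprow, npcol, scope):
    """Process numbers that take part in a scoped operation started from `rank`."""
    myrow, mycol = grid_coords(rank, nprow, npcol)
    if scope == "Row":
        return [myrow * npcol + c for c in range(npcol)]
    if scope == "Column":
        return [r * npcol + mycol for r in range(nprow)]
    if scope == "All":
        return list(range(nprow * npcol))
    raise ValueError("scope must be 'Row', 'Column' or 'All'")

# Process 6 in the 3x4 grid sits in process row 1, process column 2.
print(grid_coords(6, 3, 4))
print(scope_members(6, 3, 4, "Row"))
print(scope_members(6, 3, 4, "Column"))
```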
ScaLAPACK: data layouts
• 1D block and cyclic column distributions
• 1D block-cyclic column and 2D block-cyclic distribution
  • 2D block-cyclic used in ScaLAPACK for dense matrices
ScaLAPACK: 2D Block-Cyclic Distribution
5x5 matrix partitioned in 2x2 blocks:
a11 a12 | a13 a14 | a15
a21 a22 | a23 a24 | a25

a31 a32 | a33 a34 | a35
a41 a42 | a43 a44 | a45

a51 a52 | a53 a54 | a55

2x2 process grid point of view (processes 0 and 1 in the first process row, 2 and 3 in the second):
Process 0: a11 a12 a15    Process 1: a13 a14
           a21 a22 a25               a23 a24
           a51 a52 a55               a53 a54

Process 2: a31 a32 a35    Process 3: a33 a34
           a41 a42 a45               a43 a44
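The mapping behind this picture is short enough to write down. The sketch below uses zero-based indices and assumes the first block lands on process (0, 0); `owner` and `local_view` are hypothetical helper names, not ScaLAPACK routines.

```python
def owner(i, j, mb, nb, nprow, npcol):
    """Process coordinates (row, column) that own global entry (i, j), zero-based."""
    return (i // mb) % nprow, (j // nb) % npcol

def local_view(m, n, mb, nb, nprow, npcol, prow, pcol):
    """Labels of the entries of an m x n matrix stored on process (prow, pcol)."""
    rows = [i for i in range(m) if (i // mb) % nprow == prow]
    cols = [j for j in range(n) if (j // nb) % npcol == pcol]
    return [[f"a{i + 1}{j + 1}" for j in cols] for i in rows]

# 5x5 matrix, 2x2 blocks, 2x2 process grid, as on the slide.
for prow in range(2):
    for pcol in range(2):
        print((prow, pcol), local_view(5, 5, 2, 2, 2, 2, prow, pcol))
```

Process (0, 0) ends up with rows 1, 2, 5 and columns 1, 2, 5 of the global matrix, matching the process grid point of view shown above.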
2D Block-Cyclic Distribution
http://acts.nersc.gov/scalapack/hands-on/datadist.html
ScaLAPACK: array descriptors
• Each global data object is assigned an array descriptor
• The array descriptor:
  • Contains information required to establish mapping between a global array entry and its corresponding process and memory location (uses concept of BLACS context).
  • Is differentiated by the DTYPE_ (first entry) in the descriptor.
  • Provides a flexible framework to easily specify additional data distributions or matrix types.
• User must distribute all global arrays prior to the invocation of a ScaLAPACK routine, for example:
  • Each process generates its own submatrix.
  • One processor reads the matrix from a file and sends pieces to other processors (may require message-passing for this).
SUBROUTINE PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )
Array Descriptor for Dense Matrices
DESC_() | Symbolic Name | Scope | Definition
1 | DTYPE_A | (global) | Descriptor type DTYPE_A=1 for dense matrices.
2 | CTXT_A | (global) | BLACS context handle.
3 | M_A | (global) | Number of rows in global array A.
4 | N_A | (global) | Number of columns in global array A.
5 | MB_A | (global) | Blocking factor used to distribute the rows of array A.
6 | NB_A | (global) | Blocking factor used to distribute the columns of array A.
7 | RSRC_A | (global) | Process row over which the first row of the array A is distributed.
8 | CSRC_A | (global) | Process column over which the first column of the array A is distributed.
9 | LLD_A | (local) | Leading dimension of the local array.
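A small sketch of how the nine entries fit together. In a real program the descriptor is filled by ScaLAPACK's DESCINIT and local sizes come from NUMROC; the Python below only mirrors that arithmetic for illustration. `dense_descriptor` is a hypothetical helper, the context value 0 is a placeholder for a real BLACS context handle, and LLD_ is set to the smallest valid value (the number of local rows).

```python
DESC_FIELDS = ("DTYPE_", "CTXT_", "M_", "N_", "MB_", "NB_", "RSRC_", "CSRC_", "LLD_")

def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of a block-cyclically distributed dimension
    of global size n, block size nb, that land on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if mydist < extra:
        count += nb          # this process gets one more full block
    elif mydist == extra:
        count += n % nb      # this process gets the trailing partial block
    return count

def dense_descriptor(ctxt, m, n, mb, nb, rsrc, csrc, myrow, nprow):
    """Build the 9-entry descriptor of a dense matrix as seen from process row myrow."""
    lld = max(1, numroc(m, mb, myrow, rsrc, nprow))
    return dict(zip(DESC_FIELDS, (1, ctxt, m, n, mb, nb, rsrc, csrc, lld)))

# 5x5 matrix, 2x2 blocks, 2x2 process grid: the view from process row 0.
desc = dense_descriptor(ctxt=0, m=5, n=5, mb=2, nb=2, rsrc=0, csrc=0, myrow=0, nprow=2)
print(desc)
```

Every entry except LLD_ is the same on all processes, which is why the table marks LLD_ as local and the rest as global.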
ScaLAPACK: Functionality
$Ax = b$
Operations: Simple Driver, Expert Driver, Factor, Solve, Inversion, Conditioning Estimator, Iterative Refinement
Matrix types: Triangular, SPD, SPD Banded, SPD Tridiagonal, General, General Banded, General Tridiagonal, Least Squares, GQR, GRQ
$Ax = \lambda x$ or $Ax = \lambda Bx$
Operations: Simple Driver, Expert Driver, Reduction, Solution
Matrix types: Symmetric, General, Generalized BSPD, SVD
[Table: the marks (x) showing which operations are available for each matrix type are not recoverable]
On line tutorial: http://acts.nersc.gov/scalapack/hands-on/main.html
Global Arrays (GA) Wrappers
• Simpler than message-passing for many applications
• Complete environment for parallel code development
• Data locality control similar to distributed memory/message passing model
• Compatible with MPI
• Scalable
Distributed Data: data is explicitly associated with each processor; accessing data requires specifying the location of the data on the processor and the processor itself.
Shared Memory: data is in a globally accessible address space; any processor can access data by specifying its location using a global index.
GA: distributed dense arrays that can be accessed through a shared memory-like style.
http://www.emsl.pnl.gov/docs/global/ga.html
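The difference between the three views can be illustrated with a toy, single-process model. This is not the GA API: `ToyGlobalArray` is a made-up class that keeps one Python list per simulated process (the distributed-data view) and exposes `get`/`put` by global index (the shared-memory-like view), assuming a simple contiguous block distribution.

```python
class ToyGlobalArray:
    """Dense 1D array split into contiguous blocks, one block per simulated process."""

    def __init__(self, n, nprocs):
        self.n = n
        self.block = -(-n // nprocs)  # ceiling division: entries per process
        self.local = [
            [0.0] * max(0, min(self.block, n - p * self.block))
            for p in range(nprocs)
        ]

    def locate(self, i):
        """Distributed-data view: (owning process, index within its local block)."""
        if not 0 <= i < self.n:
            raise IndexError("global index out of range")
        return divmod(i, self.block)

    def get(self, lo, hi):
        """Shared-memory-like view: read global entries lo..hi-1, wherever they live."""
        return [self.local[p][k] for p, k in map(self.locate, range(lo, hi))]

    def put(self, lo, values):
        """Write `values` starting at global index lo, wherever the entries live."""
        for offset, value in enumerate(values):
            p, k = self.locate(lo + offset)
            self.local[p][k] = value

ga = ToyGlobalArray(n=10, nprocs=4)
ga.put(2, [1.0, 2.0, 3.0, 4.0])   # spans the blocks of processes 0 and 1
print(ga.locate(3), ga.get(2, 6), ga.local)
```

The caller of `get` and `put` never names a process, yet `locate` still tells a locality-aware code where each entry lives, which is the combination the GA bullets above describe.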
TAU: Tuning and Performance Analysis
• Multi-level performance instrumentation
  • Multi-language automatic source instrumentation
• Flexible and configurable performance measurement
• Widely-ported parallel performance profiling system
  • Computer system architectures and operating systems
  • Different programming languages and compilers
• Support for multiple parallel programming paradigms
  • Multi-threading, message passing, mixed-mode, hybrid
• Support for performance mapping
• Support for object-oriented and generic programming
• Integration in complex software systems and applications
Definitions – Profiling
• Profiling
  • Recording of summary information during execution
    • inclusive, exclusive time, # calls, hardware statistics, …
  • Reflects performance behavior of program entities
    • functions, loops, basic blocks
    • user-defined “semantic” entities
  • Very good for low-cost performance assessment
  • Helps to expose performance bottlenecks and hotspots
  • Implemented through
    • sampling: periodic OS interrupts or hardware counter traps
    • instrumentation: direct insertion of measurement code
Definitions – Tracing
• Tracing
  • Recording of information about significant points (events) during program execution
    • entering/exiting code region (function, loop, block, …)
    • thread/process interactions (e.g., send/receive message)
  • Save information in event record
    • timestamp
    • CPU identifier, thread identifier
    • Event type and event-specific information
  • Event trace is a time-sequenced stream of event records
  • Can be used to reconstruct dynamic program behavior
  • Typically requires code instrumentation
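An event trace, in contrast to a profile, keeps every event. The sketch below records enter/exit events for code regions into a time-sequenced list of event records. It is a toy illustration rather than TAU's trace format: `EventRecord`, `record` and `traced` are made-up names, and the operating system process id stands in for the CPU identifier mentioned above.

```python
import os
import threading
import time
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass
class EventRecord:
    timestamp: float
    process_id: int   # stand-in for the CPU identifier
    thread_id: int
    event_type: str   # "enter" or "exit"
    region: str       # event-specific information: the code region name

trace = []  # the event trace: a time-sequenced stream of event records

def record(event_type, region):
    trace.append(EventRecord(time.perf_counter(), os.getpid(),
                             threading.get_ident(), event_type, region))

@contextmanager
def traced(region):
    """Instrument a code region with enter and exit events."""
    record("enter", region)
    try:
        yield
    finally:
        record("exit", region)

with traced("solve"):
    with traced("factor"):
        sum(i * i for i in range(1000))

for event in trace:
    print(f"{event.timestamp:.6f} {event.event_type:5s} {event.region}")
```

Replaying the records in order reconstructs the nesting of regions (here, "factor" inside "solve"), which is the dynamic behavior a profile's summary numbers cannot show.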
TAU: Example 1 (1/4)
http://acts.nersc.gov/tau/programs/psgesv
TAU: Example 1 (2/4)
…
…
TAU: Example 1 (3/4)
      PROGRAM PSGESVDRIVER
!
!     Example Program solving Ax=b via ScaLAPACK routine PSGESV
!
!     .. Parameters ..
!**** a bunch of things omitted for the sake of space ****
!     .. Executable Statements ..
!
!     INITIALIZE THE PROCESS GRID
!
      integer profiler(2)
      save profiler
      call TAU_PROFILE_INIT()
      call TAU_PROFILE_TIMER(profiler,'PSGESVDRIVER')
      call TAU_PROFILE_START(profiler)
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
!**** a bunch of things omitted for the sake of space ****
      CALL PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, &
                   INFO )
!**** a bunch of things omitted for the sake of space ****
      call TAU_PROFILE_STOP(profiler)
      STOP
      END
psgesvdriver.int.f90
NB. ScaLAPACK routines have not been instrumented and therefore are not shown in the charts.
TAU: Example 2 (1/2)
http://acts.nersc.gov/tau/programs/pdgssvx
tau-multiplecounters-mpi-papi-pdt
TAU: Example 2 (2/2)
PAPI provides access to hardware performance counters (see http://icl.cs.utk.edu/papi for details and contact acts-support@nersc.gov for the corresponding TAU events). In this example we are just measuring FLOPS.
PARAPROF
Who Benefits from these tools?
... More Applications …
http://acts.nersc.gov/AppMat
Enabling sciences and discoveries… with high performance and scalability...
ACTS information center: http://acts.nersc.gov
• Tool descriptions, installation details, examples, etc.
• Agenda, accomplishments, conferences, releases, etc.
• Goals and other relevant information
• Points of contact
• Search engine
• High Performance Tools
  • portable
  • library calls
  • robust algorithms
  • help code optimization
• Scientific Computing Centers
  • Reduce user’s code development time, which adds up to more production runs and faster, more effective scientific research results
  • Overall better system utilization
  • Facilitate the accumulation and distribution of high performance computing expertise
  • Provide better scientific parameters for procurement and characterization of specific user needs
VECPAR 2006, ACTS Workshop 2006
Journals Featuring ACTS Tools
• An Overview of the Advanced CompuTational Software (ACTS) Collection, by T. Drummond and O. Marques.
• SUNDIALS: Suite of Nonlinear and Differential/Algebraic Equation Solvers, by A. Hindmarsh, P. Brown, K. Grant, S. Lee, R. Serban, D. Shumaker and C. Woodward.
• An Overview of SuperLU: Algorithms, Implementation, and User Interface, by X. Li.
• SLEPc: A Scalable and Flexible Toolkit for the Solution of Eigenvalue Problems, by V. Hernandez, J. Roman and V. Vidal.
• An Overview of the Trilinos Project, by M. Heroux, R. Bartlett, V. Howle, R. Hoekstra, J. Hu, T. Kolda, R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist, R. Tuminaro, J. Willenbring, A. Williams and K. Stanley.
• Pursuing Scalability for hypre's Conceptual Interfaces, by R. Falgout, J. Jones and U. Yang.
• A Component Architecture for High-Performance Scientific Computing, by D. Bernholdt, B. Allan, R. Armstrong, F. Bertrand, K. Chiu, T. Dahlgren, K. Damevski, W. Elwasif, T. Epperly, M. Govindaraju, D. Katz, J. Kohl, M. Krishnan, G. Kumfert, J. Larson, S. Lefantzi, M. Lewis, A. Malony, L. McInnes, J. Nieplocha, B. Norris, S. Parker, J. Ray, S. Shende, T. Windus and S. Zhou.
• CUMULVS: Interacting with High-Performance Scientific Simulations, for Visualization, Steering and Fault Tolerance, by J. Kohl, T. Wilde and D. Bernholdt.
• Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit, by J. Nieplocha, B. Palmer, V. Tipparaju, M. Krishnan, H. Trease and E. Aprà.
• The TAU Parallel Performance System, by S. Shende and A. Malony.
• High Performance Remote Memory Access Communication: The ARMCI Approach, by J. Nieplocha, V. Tipparaju, M. Krishnan and D. Panda.
Spring 2006 Issue
September 2005 Issue
ACTS Numerical Tools: Functionality
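Computational Problem | Methodology | Algorithms | Library
Systems of Linear Equations | Direct Methods | LU Factorization | ScaLAPACK (dense), SuperLU (sparse)
Systems of Linear Equations | Direct Methods | Cholesky Factorization | ScaLAPACK
Systems of Linear Equations | Direct Methods | $LDL^T$ (Tridiagonal matrices) | ScaLAPACK
Systems of Linear Equations | Direct Methods | QR Factorization | ScaLAPACK
Systems of Linear Equations | Direct Methods | QR with column pivoting | ScaLAPACK
Systems of Linear Equations | Direct Methods | LQ factorization | ScaLAPACK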
ACTS Numerical Tools: Functionality
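Computational Problem | Methodology | Algorithms | Library
Systems of Linear Equations (cont.) | Iterative Methods | Conjugate Gradient | AztecOO (Trilinos), PETSc
Systems of Linear Equations (cont.) | Iterative Methods | GMRES | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods | CG Squared | AztecOO, PETSc
Systems of Linear Equations (cont.) | Iterative Methods | Bi-CG Stab | AztecOO, PETSc
Systems of Linear Equations (cont.) | Iterative Methods | Quasi-Minimal Residual (QMR) | AztecOO
Systems of Linear Equations (cont.) | Iterative Methods | Transpose Free QMR | AztecOO, PETSc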
ACTS Numerical Tools: Functionality
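Computational Problem | Methodology | Algorithms | Library
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | SYMMLQ | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Preconditioned CG | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Richardson | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Block Jacobi Preconditioner | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Point Jacobi Preconditioner | AztecOO
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Least Squares Polynomials | PETSc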
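ACTS Numerical Tools: Functionality
Computational Problem | Methodology | Algorithms | Library
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | SOR Preconditioning | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Overlapping Additive Schwarz | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Approximate Inverse | Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Sparse LU preconditioner | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Incomplete LU (ILU) preconditioner | AztecOO
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Least Squares Polynomials | PETSc
Systems of Linear Equations (cont.) | MultiGrid (MG) Methods | MG Preconditioner | PETSc, Hypre
Systems of Linear Equations (cont.) | MultiGrid (MG) Methods | Algebraic MG | Hypre
Systems of Linear Equations (cont.) | MultiGrid (MG) Methods | Semi-coarsening | Hypre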
ACTS Numerical Tools: Functionality
Computational Problem | Methodology | Algorithm | Library
Linear Least Squares Problems | Least Squares | $\min_x \lVert b - Ax \rVert_2$ | ScaLAPACK
Linear Least Squares Problems | Minimum Norm Solution | $\min_x \lVert x \rVert_2$ | ScaLAPACK
Linear Least Squares Problems | Minimum Norm Least Squares | $\min_x \lVert x \rVert_2$ and $\min_x \lVert b - Ax \rVert_2$ | ScaLAPACK
Standard Eigenvalue Problem | Symmetric Eigenvalue Problem | $Az = \lambda z$ for $A = A^H$ or $A = A^T$ | ScaLAPACK (dense), SLEPc (sparse)
Singular Value Problem | Singular Value Decomposition | $A = U \Sigma V^T$, $A = U \Sigma V^H$ | ScaLAPACK (dense), SLEPc (sparse)
Generalized Symmetric Definite Eigenproblem | Eigenproblem | $Az = \lambda Bz$, $ABz = \lambda z$, $BAz = \lambda z$ | ScaLAPACK (dense), SLEPc (sparse)
ACTS Numerical Tools: Functionality
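Computational Problem | Methodology | Algorithm | Library
Non-Linear Equations | Newton Based | Line Search | PETSc
Non-Linear Equations | Newton Based | Trust Regions | PETSc
Non-Linear Equations | Newton Based | Pseudo-Transient Continuation | PETSc
Non-Linear Equations | Newton Based | Matrix Free | PETSc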
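ACTS Numerical Tools: Functionality
Computational Problem | Methodology | Algorithm | Library
Non-Linear Optimization | Newton Based | Newton | OPT++, TAO
Non-Linear Optimization | Newton Based | Finite-Difference Newton | OPT++, TAO
Non-Linear Optimization | Newton Based | Quasi-Newton | OPT++, TAO
Non-Linear Optimization | Newton Based | Non-linear Interior Point | OPT++, TAO
Non-Linear Optimization | CG | Standard Non-linear CG | OPT++, TAO
Non-Linear Optimization | CG | Limited Memory BFGS | OPT++
Non-Linear Optimization | CG | Gradient Projections | TAO
Non-Linear Optimization | Direct Search | No derivative information | OPT++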
ACTS Numerical Tools: Functionality
Computational Problem | Methodology | Algorithm | Library
Non-Linear Optimization (cont.) | Semismoothing | Feasible Semismooth | TAO
Non-Linear Optimization (cont.) | Semismoothing | Infeasible Semismooth | TAO
Ordinary Differential Equations | Integration | Adams-Moulton (Variable coefficient forms) | CVODE (SUNDIALS), CVODES
Ordinary Differential Equations | Backward Differentiation Formula | Direct and Iterative Solvers | CVODE, CVODES
Nonlinear Algebraic Equations | Inexact Newton | Line Search | KINSOL (SUNDIALS)
Differential Algebraic Equations | Backward Differentiation Formula | Direct and Iterative Solvers | IDA (SUNDIALS)
ACTS Tools: Functionality
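Computational Problem | Support | Techniques | Library
Writing Parallel Programs | Distributed Arrays | Shared-Memory | Global Arrays
Writing Parallel Programs | Distributed Arrays | Distributed Memory | CUMULVS (viz), Globus (Grid)
Writing Parallel Programs | Distributed Arrays | Grid Generation | OVERTURE
Writing Parallel Programs | Distributed Arrays | Structured Meshes | CHOMBO (AMR), Hypre, OVERTURE, PETSc
Writing Parallel Programs | Distributed Arrays | Semi-Structured Meshes | CHOMBO (AMR), Hypre, OVERTURE
Writing Parallel Programs | Distributed Computing | GRID | Globus
Writing Parallel Programs | Distributed Computing | Remote Steering | CUMULVS
Writing Parallel Programs | Distributed Computing | Coupling | PAWS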
ACTS Tools: Functionality
Computational Problem | Support | Technique | Library
Writing Parallel Programs (cont.) | Distributed Computing | Check-point/restart | CUMULVS
Profiling | Algorithmic Performance | Automatic Instrumentation | PETSc
Profiling | Algorithmic Performance | User Instrumentation | PETSc
Profiling | Execution Performance | Automatic Instrumentation | TAU
Profiling | Execution Performance | User Instrumentation | TAU
Code Optimization | Library Installation | Linear Algebra Tuning | ATLAS
Interoperability | Code Generation | Language | BABEL, CHASM
Interoperability | Code Generation | Components | CCA