Automatic Differentiation: Introduction


Upload: deacon

Post on 21-Jan-2016


Automatic Differentiation: Introduction

• Automatic differentiation (AD) is a technology for transforming a subprogram that computes some function into a subprogram that computes the derivatives of that function

• Derivatives are used in optimization, nonlinear solvers, sensitivity analysis, and uncertainty quantification

• Forward mode of AD is efficient for problems with few independent variables or when only Jacobian-vector products (Jv) are needed

• Reverse mode of AD is efficient for problems with few dependent variables or when only transposed-Jacobian-vector products (J^T v) are needed

• Efficiency of generated code depends on sophistication of underlying compiler analysis and combinatorial algorithms
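
To make the forward-mode bullet concrete, here is a minimal sketch in Python using dual numbers (operator overloading). It is only an illustration, not how source-transformation tools such as ADIFOR, ADIC, or OpenAD/F work internally. The class `Dual`, the helper `jvp`, and the example function `f` are hypothetical names introduced for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its directional derivative (tangent)."""
    val: float
    tan: float

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u v)' = u' v + u v'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    """Sine with the chain rule applied to the tangent."""
    x = _lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)


def f(x1, x2):
    """Hypothetical example: two inputs, two outputs."""
    return [x1 * x2 + sin(x1), 3.0 * x2 * x2]


def jvp(func, point, direction):
    """Jacobian-vector product J(point) * direction in one forward sweep."""
    duals = [Dual(p, d) for p, d in zip(point, direction)]
    return [_lift(out).tan for out in func(*duals)]


# One sweep per input direction yields one Jacobian column.
column_1 = jvp(f, [2.0, 3.0], [1.0, 0.0])
column_2 = jvp(f, [2.0, 3.0], [0.0, 1.0])
```

Each call to `jvp` costs a small constant multiple of one evaluation of `f`, so the full Jacobian costs roughly one sweep per independent variable. That is why forward mode suits problems with few independents.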

AD: Current Capabilities

• Fortran 77: ADIFOR 2.0/3.0
– Robust, mature tool with excellent language coverage
– Excellent compiler analysis
– Efficient forward mode (small number of independents)
– Adequate reverse mode (small number of dependents)

• C/C++: ADIC 2.0
– Semi-mature tool with full C language coverage
– Sophisticated differentiation algorithms
– Efficient forward mode

• Fortran 90: OpenAD/F
– New tool with partial language coverage
– Sophisticated differentiation algorithms
– Accurate and novel compiler analysis
– Innovative templating mechanism
– Efficient forward and reverse modes

AD: Application Highlight

                          Runtime (m:s)   Ratio    Memory
Simulation alone          2:20            1.0      -
Basic adjoint             143:37          61.6     6.87M
Improved checkpointing    141:20          60.6     21.44M
Add compiler analysis     21:51           9.4      3.17M
Finite differences        23 days         14,400   -

Sensitivity of flow through Drake Passage to bottom topography, using MIT shallow water model
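
The gap between the adjoint rows and the finite-difference row above reflects how the two approaches scale. One-sided finite differences need one extra function evaluation per input, while reverse mode obtains the whole gradient of a scalar output from one forward and one reverse sweep. The following Python sketch shows that difference on a toy objective. The names `Var`, `gradient_reverse`, `gradient_fd`, and the objective itself are hypothetical, and the sketch uses no checkpointing or compiler analysis.

```python
import math


class Var:
    """Node recorded during the forward sweep of reverse-mode AD."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent_node, local_partial)
        self.adj = 0.0          # adjoint, filled in by the reverse sweep

    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.val * other.val,
                   ((self, other.val), (other, self.val)))


def vsin(x):
    return Var(math.sin(x.val), ((x, math.cos(x.val)),))


def gradient_reverse(func, point):
    """Gradient of a scalar function: one forward and one reverse sweep."""
    inputs = [Var(p) for p in point]
    out = func(inputs)
    order, seen = [], set()

    def visit(node):
        # Post-order DFS: every node is listed after the nodes it depends on.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)
    out.adj = 1.0
    for node in reversed(order):
        for parent, partial in node.parents:
            parent.adj += partial * node.adj
    return [v.adj for v in inputs]


def gradient_fd(func, point, h=1e-6):
    """One-sided finite differences; also returns the evaluation count."""
    base, calls, grad = func(point), 1, []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += h
        grad.append((func(shifted) - base) / h)
        calls += 1
    return grad, calls


def objective(xs):
    """Toy scalar objective on Var inputs: sum of x * sin(x)."""
    total = xs[0] * vsin(xs[0])
    for x in xs[1:]:
        total = total + x * vsin(x)
    return total


def objective_float(xs):
    """Same objective on plain floats, for the finite-difference path."""
    return sum(x * math.sin(x) for x in xs)
```

For `n` inputs, `gradient_fd` performs `n + 1` evaluations of the objective, whereas `gradient_reverse` performs one recorded evaluation plus one sweep over the recorded graph. The price of reverse mode is the memory for that recorded graph, which is what checkpointing trades against recomputation.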

AD: Future Capabilities

• C/C++: ADIC 2.x
– Enhanced support for C++ (basic templating, operator overloading)

• Fortran 90: OpenAD/F
– Improved language coverage (user-defined types, pointers, etc.)

• Both tools
– New differentiation algorithms
– New checkpointing mechanisms
– Advanced compiler analysis
– Efficient forward and reverse modes
– Integration with CSCAPES coloring algorithms
– Ease of use through integration with PETSc and Zoltan toolkits

Load Balancing: Introduction

Goals:

• Provide software and algorithms for load balancing (partitioning) that can easily be used by parallel applications.

• Load balancing: distribute work evenly among processors while minimizing communication cost. Reduces parallel run time.

• Static load balancing (often called “partitioning”)
– Application computation and communication patterns do not change
– Partition and distribute data once

• Dynamic load balancing
– In dynamic or adaptive applications, computation and communication change over time.
– Load balancing should be invoked at certain intervals.
– Try to reduce data migration (application data to move)

Load Balancing: Current Capabilities

• Zoltan: Software toolkit for parallel data management and load balancing
– Available at http://www.cs.sandia.gov/Zoltan

• Collection of many load-balancing methods
– Geometric: RCB, space filling curves
– Graph and hypergraph partitioning

• Data-structure neutral interface
– Call-back functions
– Single, common interface for many methods

• Allows applications to “plug and play”

• Portable, parallel code (MPI)
– Used in many DOE and Sandia applications
– Can run on thousands of processors
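
As an illustration of the geometric methods listed above, the following is a serial Python sketch of recursive coordinate bisection (RCB). It is not Zoltan's API or implementation. The function name `rcb` and its arguments are assumptions made for this example, which repeatedly cuts the point set along its longest coordinate direction so that the weight on each side is roughly proportional to the number of parts assigned to that side.

```python
def rcb(points, weights, num_parts):
    """Recursive coordinate bisection; returns a part id per point index."""
    assignment = [0] * len(points)

    def split(indices, parts, first_part):
        if parts == 1 or len(indices) <= 1:
            for i in indices:
                assignment[i] = first_part
            return

        def extent(d):
            vals = [points[i][d] for i in indices]
            return max(vals) - min(vals)

        # Cut perpendicular to the coordinate direction with largest extent.
        axis = max(range(len(points[0])), key=extent)
        ordered = sorted(indices, key=lambda i: points[i][axis])

        # Give the left side a share of the weight proportional to its parts.
        left_parts = parts // 2
        target = sum(weights[i] for i in ordered) * left_parts / parts
        running, cut = 0.0, 0
        while (cut < len(ordered) - 1
               and running + weights[ordered[cut]] <= target):
            running += weights[ordered[cut]]
            cut += 1
        cut = max(cut, 1)  # never leave the left side empty

        split(ordered[:cut], left_parts, first_part)
        split(ordered[cut:], parts - left_parts, first_part + left_parts)

    split(list(range(len(points))), num_parts, 0)
    return assignment


# Eight unit-weight points on a 4 x 2 grid, split into four parts.
grid = [(float(x), float(y)) for x in range(4) for y in range(2)]
parts = rcb(grid, [1.0] * len(grid), 4)
```

A geometric method like this needs only coordinates and weights, which fits a data-structure neutral interface: the application supplies them through call-backs and never exposes its own mesh or particle containers.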

Load Balancing: Applications

• Large variety of applications, requirements, data structures.

[Figure: example applications. Panels: Multiphysics simulations; Linear solvers & preconditioners (Ax = b); Adaptive mesh refinement; Crash simulations; Particle methods; Parallel electronics networks (circuit schematic); Cell Modeling]

Load Balancing: Future Capabilities

• Scalable hypergraph partitioning
– Hypergraphs accurately model communication volume
– We aim to improve scalability to thousands of processors

• 2d matrix partitioning
– Reduce communication compared to standard 1d distribution

• Multiconstraint partitioning
– Multi-physics simulation

• Complex objectives partitioning
– E.g., simultaneously balance computation and memory

• Parallel sparse matrix ordering (nested dissection)
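
The claim that hypergraphs accurately model communication volume can be illustrated with a short Python sketch. It assumes the common "connectivity minus one" metric: each hyperedge stands for one data item, its vertices are the tasks that need it, and the item must be sent once to every part that needs it other than the owner's. The function names and the tiny example are hypothetical. The graph-style count uses a clique expansion of each hyperedge for comparison.

```python
from itertools import combinations


def communication_volume(hyperedges, part_of):
    """Connectivity-1 metric: parts touched by each hyperedge, minus one."""
    return sum(len({part_of[v] for v in edge}) - 1 for edge in hyperedges)


def clique_edge_cut(hyperedges, part_of):
    """Graph-model estimate: cut edges after expanding hyperedges to cliques."""
    cut = set()
    for edge in hyperedges:
        for u, v in combinations(sorted(edge), 2):
            if part_of[u] != part_of[v]:
                cut.add((u, v))
    return len(cut)


# One item needed by tasks 0..3; task 0 sits alone in part 0.
hyperedges = [{0, 1, 2, 3}, {2, 3}]
part_of = {0: 0, 1: 1, 2: 1, 3: 1}

volume = communication_volume(hyperedges, part_of)
cut = clique_edge_cut(hyperedges, part_of)
```

In this example the item crosses the part boundary once, which the hypergraph metric reports exactly, while the graph edge cut counts three cut edges for the same single transfer.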

Reordering Transformations: Introduction

• Irregular memory access patterns make performance sensitive to data and iteration orders

• Run-time reordering transformations schedule data accesses and iterations to maximize performance

• Preliminary work on reordering heuristics shows that hypergraph models outperform graph models

• Full sparse tiling: new inspector/executor strategy that exploits inter-iteration locality

RT: Current Capabilities

• Open source package implementing several data and iteration reordering heuristics: Data_N_Comp_Reorder

• Data reordering heuristics
– Breadth first search (graph-based)
– Consecutive packing
– Partitioning (graph-based)
– Breadth first search (hypergraph-based)
– Consecutive packing (hypergraph-based)
– Partitioning (hypergraph-based)

• Iteration reordering heuristics
– Breadth first search (hypergraph-based)
– Lexicographical sorting and various approximations
– Consecutive packing (hypergraph-based)
– Partitioning (hypergraph-based)

• Full sparse tiling implementation for model problems
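
Two of the heuristics named above are simple enough to sketch. The Python below is an illustration only and is not taken from the Data_N_Comp_Reorder package. The function names and the access list are hypothetical. `cpack` is an inspector that renumbers data in first-touch order (consecutive packing), and `lexsort_iterations` orders iterations lexicographically by the relocated data they touch. Sorting each iteration's indices before comparing is an assumption of this sketch.

```python
def cpack(accesses, num_data):
    """Consecutive packing: place data items in order of first access.

    accesses[i] lists the data indices touched by iteration i.
    Returns new_index, where new_index[old] is the packed location.
    """
    new_index = [None] * num_data
    next_slot = 0
    for iteration in accesses:
        for d in iteration:
            if new_index[d] is None:
                new_index[d] = next_slot
                next_slot += 1
    # Data never touched by the loop goes last, in its original order.
    for d in range(num_data):
        if new_index[d] is None:
            new_index[d] = next_slot
            next_slot += 1
    return new_index


def lexsort_iterations(accesses, new_index):
    """Return an iteration order sorted by the relocated indices touched."""
    remapped = [sorted(new_index[d] for d in it) for it in accesses]
    return sorted(range(len(accesses)), key=lambda i: remapped[i])


# Inspector phase: examine the index arrays at run time.
accesses = [(5, 0), (2, 5), (0, 3), (3, 2)]
new_index = cpack(accesses, num_data=6)
order = lexsort_iterations(accesses, new_index)

# Executor phase: run the loop in the new order on relocated data.
executed = [tuple(new_index[d] for d in accesses[i]) for i in order]
```

After the inspector runs, the executor touches locations 0, 1, 2, 3 in nearly increasing order, which is the locality improvement these transformations aim for when the access pattern is only known at run time.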

RT: Application Highlight

• Reordering for a mesh-quality improvement code (FeasNewt – T. Munson)

• Hypergraph-BFS data reordering coupled with Cpack iteration reordering offers best performance

• Reordering leads to performance within 90% of memory bandwidth limit for sparse matvec

[Figure: bar chart for the Hessian, Gradient, and Matmul kernels comparing Original and Reordered performance against Peak and the Memory Bandwidth Limit; vertical axis from 0 to 2500]

RT: Future Capabilities

• New hypergraph-based runtime reordering transformations

• Comparison between hypergraph-based and bipartite graph-based runtime reordering transformations

• Hypergraph partitioners for load balancing modified to work well for reordering transformations

• Hierarchical full sparse tiling for hierarchical parallel systems

Graph Coloring and Matching: Introduction

• Graph coloring deals with partitioning a set of binary-related objects into few groups of “independent” objects

• Sparsity exploitation in computation of Jacobians and Hessians leads to a variety of graph coloring problems. Sources of problem variations:
– Unsymmetric vs symmetric matrix
– Direct vs substitution method
– Uni- vs bi-directional partitioning

Matrix      1d partition          2d partition          Method
Jacobian    Distance-2 coloring   Star bicoloring       Direct
Hessian     Star coloring         NA                    Direct
Jacobian    NA                    Acyclic bicoloring    Subst
Hessian     Acyclic coloring      NA                    Subst

• Matching deals with finding a “large” set of independent edges in a graph

• Variant matching problems occur in load-balancing, process scheduling, linear solvers, preconditioners, etc.

• Orthogonal sources of variation in matching problems:
– Bipartite vs general graphs
– Cardinality vs weighted problems
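
The link between coloring and sparse Jacobian computation (the 1d partition, direct method case) can be sketched in Python. Columns that share no row may receive the same color, and all columns of one color are then recovered from a single Jacobian-vector product. The code is a hedged illustration with hypothetical names. It uses a plain greedy coloring in natural column order, and a dense matrix-vector product stands in for a forward-mode AD sweep.

```python
def color_columns(pattern, n_cols):
    """Greedy coloring so that columns sharing a row get distinct colors.

    pattern[i] is the set of column indices with a nonzero in row i.
    """
    conflicts = [set() for _ in range(n_cols)]
    for row in pattern:
        for c in row:
            conflicts[c].update(row)
    colors = [None] * n_cols
    for c in range(n_cols):
        used = {colors[o] for o in conflicts[c]
                if o != c and colors[o] is not None}
        color = 0
        while color in used:
            color += 1
        colors[c] = color
    return colors


def recover_jacobian(jvp, pattern, colors):
    """Recover all nonzeros with one Jacobian-vector product per color."""
    n_cols = len(colors)
    compressed = []
    for k in range(max(colors) + 1):
        seed = [1.0 if colors[c] == k else 0.0 for c in range(n_cols)]
        compressed.append(jvp(seed))
    # Row i has at most one nonzero among the columns of any single color.
    return {(i, c): compressed[colors[c]][i]
            for i, row in enumerate(pattern) for c in row}


# Tridiagonal 5 x 5 example with made-up entries.
n = 5
J = [[10.0 * i + j + 1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)]
     for i in range(n)]
pattern = [{j for j in range(n) if J[i][j] != 0.0} for i in range(n)]


def dense_jvp(v):
    """Stand-in for a forward-mode AD sweep: computes J * v directly."""
    return [sum(J[i][j] * v[j] for j in range(n)) for i in range(n)]


colors = color_columns(pattern, n)
entries = recover_jacobian(dense_jvp, pattern, colors)
```

For the tridiagonal pattern, three colors suffice regardless of `n`, so three sweeps replace `n`. Fewer colors means fewer sweeps, which is why the quality of the coloring heuristic matters.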

GCM: Current Capabilities

• Coloring

Serial:
– Developed novel (greedy) algorithms for distance-1, distance-2, star and acyclic coloring problems. A package implementing these algorithms and corresponding variant ordering routines is available.

Parallel:
– Developed a scheme for parallelizing greedy coloring algorithms on distributed-memory computers. MPI implementations of distance-1 and distance-2 coloring made available via Zoltan.

• Matching
– Algorithms that compute optimal solutions for matching problems are polynomial in time, but slow and difficult to parallelize.
– High quality approximate solutions can be computed in (near) linear time. Approximation techniques make parallelization easier.
– Developed fast approximation algorithms for several matching problems.
– Efficient implementations of exact matching algorithms available.
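
As a concrete example of an approximation technique for weighted matching, the Python sketch below implements the classic greedy heuristic: scan edges by decreasing weight and keep an edge when both endpoints are still free. Its result is guaranteed to have at least half the optimal weight. This is a textbook method shown for illustration, not necessarily one of the algorithms the slide refers to, and because it sorts the edges it runs in O(m log m) time rather than strictly linear time. The function name and example graph are hypothetical.

```python
def greedy_matching(edges):
    """Half-approximate maximum-weight matching.

    edges is an iterable of (weight, u, v) tuples.
    Returns the list of edges kept in the matching.
    """
    matched = set()
    result = []
    for w, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            result.append((w, u, v))
    return result


# Path a - b - c - d: the optimum takes both outer edges (weight 6).
path = [(3.0, "a", "b"), (4.0, "b", "c"), (3.0, "c", "d")]
matching = greedy_matching(path)
total = sum(w for w, _, _ in matching)
```

On this path the greedy choice takes the middle edge (weight 4) and blocks both outer edges, so it misses the optimum of 6 but stays within the factor-of-two bound.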

GCM: Application Highlights

• Coloring
– Automatic differentiation (sparse Jacobians and Hessians)
– Parallel computation (discovery of concurrency, data migration)
– Frequency allocation
– Register allocation in compilers, etc.

• Matching
– Numerical preprocessing in sparse linear systems:
  • permute a matrix such that its diagonal or block diagonal is heavy.
– Block triangular decomposition in sparse linear systems:
  • decompose a system of equations into smaller sets of systems.
– Graph partitioning:
  • guide the coarsening phase of multilevel graph partitioning methods.

GCM: Future Capabilities

• Develop and implement star and acyclic bicoloring algorithms for Jacobian computation

• Develop parallel algorithms that scale to thousands of processors for the various coloring problems (distance-1, distance-2, star, acyclic)

• Integrate coloring software with automatic differentiation tools

• Develop petascale parallel matching algorithms based on approximation techniques