
Algorithm xxx: ADiGator, a Toolbox for the Algorithmic Differentiation of Mathematical Functions in MATLAB Using Source Transformation via Operator Overloading

Matthew J. Weinstein

Anil V. Rao

University of Florida, Gainesville, FL 32611

A toolbox called ADiGator is described for algorithmically differentiating mathematical functions in MATLAB. ADiGator performs source transformation via operator overloading using forward mode algorithmic differentiation and produces a file that can be evaluated to obtain the derivative of the original function at a numeric value of the input. A convenient by-product of the file generation is the sparsity pattern of the derivative function. Moreover, because both the input and output of the algorithm are source codes, the algorithm may be applied recursively to generate derivatives of any order. A key component of the algorithm is its ability to statically exploit derivative sparsity at the MATLAB operation level in order to improve run-time performance. The algorithm is applied to four different classes of example problems and is shown to produce run-time efficient derivative code. Due to the static nature of the approach, the algorithm is well suited and intended for use with problems requiring many repeated derivative computations.

Categories and Subject Descriptors: G.1.4 [Numerical Analysis]: Automatic Differentiation

General Terms: Automatic Differentiation, Numerical Methods, MATLAB

Additional Key Words and Phrases: algorithmic differentiation, scientific computation, applied mathematics, chain rule, forward mode, overloading, source transformation

ACM Reference Format:

Weinstein, M. J. and Rao, A. V. 2015. Algorithm: ADiGator, a Toolbox for the Algorithmic Differentiation of Mathematical Functions in MATLAB Using Source Transformation via Operator Overloading. ACM Trans. Math. Soft. V, N, Article A (January YYYY), 25 pages.
DOI = 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000

The authors gratefully acknowledge support for this research from the U.S. Office of Naval Research under Grants N00014-11-1-0068 and N00014-15-1-2048, from the U.S. Defense Advanced Research Projects Agency under Contract HR0011-12-C-0011, from the U.S. National Science Foundation under Grants CBET-1404767, DMS-1522629, and CMMI-1563225, and from the U.S. Air Force Research Laboratory under Contract FA8651-08-D-0108/0054. Disclaimer: The views, opinions, and findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. Distribution A. Approved for Public Release; Distribution Unlimited.
Author's addresses: M. J. Weinstein and A. V. Rao, Department of Mechanical and Aerospace Engineering, P.O. Box 116250, University of Florida, Gainesville, FL 32611-6250; e-mail: {mweinstein,anilvrao}@ufl.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© YYYY ACM 1539-9087/YYYY/01-ARTA $15.00
DOI 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000


1. INTRODUCTION

Algorithmic differentiation (AD) is the process of determining accurate derivatives of a function defined by computer programs using the rules of differential calculus [Griewank 2008; 2014]. AD exploits the fact that a user program may be broken into a sequence of elementary operations, and the derivative of the program is obtained by a systematic application of the calculus chain rule. AD can be performed using either the forward or the reverse mode, where the fundamental difference between the two modes is the order in which the chain rule is applied. In the forward mode, the chain rule is applied from the input independent variables of differentiation to the final output dependent variables of the program, while in the reverse mode the chain rule is applied from the final output dependent variables of the program back to the independent variables of differentiation. Forward and reverse mode AD methods are classically implemented using either operator overloading or source transformation. In an operator-overloaded approach, a custom class is constructed and all standard arithmetic operations and mathematical functions are defined to operate on objects of the class. Any object of the custom class typically contains properties that include the function and derivative values of the object at a particular numerical value of the input. Furthermore, when any operation is performed on an object of the class, both function and derivative calculations are executed from within the overloaded operation. In a source transformation approach, a compiler-type software is typically required to transform a user-defined function source code into a derivative source code, where the new program contains derivative statements interleaved with the function statements of the original program. The generated derivative source code may then be evaluated numerically in order to compute the desired derivatives. Run-time efficiency in computing the AD-generated derivative is gained either by performing optimization at the time of transformation or by exploiting derivative sparsity.

In recent years, MATLAB [Mathworks 2014] has become extremely popular as a platform for numerical computing, due largely to its built-in high-level matrix operations and user-friendly interface. The interpreted nature of MATLAB and its high-level language make programming intuitive and debugging easy. The qualities that make MATLAB appealing from a programming standpoint, however, tend to pose problems for AD tools. In the MATLAB language, there exist many ambiguous operators (for example, + and *) which perform different mathematical procedures depending upon the shapes (for example, scalar, vector, matrix, etc.) of the inputs to the operators. Moreover, user variables are not required to be of any fixed size or shape. Thus, the proper mathematical procedure of each ambiguous operator must be determined at run-time by the MATLAB interpreter. This mechanism poses a major problem for both source transformation and operator-overloaded AD tools. Source transformation tools must determine the proper rules of differentiation for all function operations at the time of transformation. Given an ambiguous operation, however, the corresponding differentiation rule is also ambiguous. In order to cope with this uncertainty, MATLAB source transformation AD tools must either determine fixed shapes for all variables or print derivative procedures which behave differently depending upon the meaning of the corresponding ambiguous function operations. Because operator overloading is applied at run-time, operator ambiguity is not an issue when employing an operator-overloaded AD tool. The mechanism that the MATLAB interpreter uses to determine the meanings of ambiguous operators, however, imposes a great deal of run-time overhead on operator-overloaded tools. Several MATLAB AD tools have been developed over the years, including ADMAT [Coleman and Verma 1998a; 1998b], INTLAB [Rump 1999], and MAD [Forth 2006], which rely solely upon operator overloading; ADiMat [Bischof et al. 2002], which relies upon a combination of source transformation and operator overloading; and, most recently, MSAD [Kharche and Forth 2006], which relies solely upon source transformation.

In this paper, a new open-source MATLAB algorithmic differentiation toolbox called ADiGator (Automatic Differentiation by Gators) is described. ADiGator performs source transformation via the non-classical methods of operator overloading and source reading for the forward mode algorithmic differentiation of MATLAB programs. Motivated by the iterative nature of many applications that require numerical derivative computation (for example, nonlinear optimization and ordinary differential equations), a great deal of emphasis is placed upon performing an a priori analysis of the problem at the time of transformation in order to minimize derivative computation run time. Moreover, the algorithm neither relies upon sparse data structures at run-time nor relies on matrix compression in order to exploit derivative sparsity. Instead, an overloaded class is used at transformation-time to determine sparse derivative structures for each MATLAB operation. Simultaneously, the sparse derivative structures are exploited to print run-time efficient derivative procedures to an output source code. The printed derivative procedures may then be evaluated numerically in order to compute the desired derivatives. Finally, it is noted that the previous research given in Patterson et al. [2013] and Weinstein and Rao [2016] focused on the methods upon which the ADiGator tool is based, while this paper focuses on the software implementation of these methods and the utility of the software.

This paper is organized as follows. In Section 2, a row/column/value triplet notation used to represent derivative matrices is introduced. In Section 3, an overview of the implementation of the algorithm is given in order to grant the reader a better understanding of how to efficiently utilize the software, as well as to identify various coding restrictions to which the user must adhere. Key topics such as the overloaded class employed and the handling of flow control are discussed. In Section 4, a discussion is given on the use of overloaded objects to represent cell and structure arrays. In Section 5, a special class of vectorized functions is considered, where the algorithm may be used to transform vectorized function codes into vectorized derivative codes. In Section 6, the user interface to the ADiGator algorithm is described. In Section 7, the algorithm is tested against other well-known MATLAB AD tools on a variety of examples. In Section 8, a discussion is given on the efficiency of the algorithm, and finally, in Section 9, conclusions are drawn.

2. SPARSE DERIVATIVE NOTATIONS

The algorithm of this paper utilizes a row/column/value triplet representation of derivative matrices. In this section, the triplet representation is given for a general matrix function of a vector, $\mathbf{F}(\mathbf{x}) : \mathbb{R}^{n_x} \to \mathbb{R}^{q_f \times r_f}$. The derivative of $\mathbf{F}(\mathbf{x})$ is the three-dimensional object $\partial\mathbf{F}/\partial\mathbf{x} \in \mathbb{R}^{q_f \times r_f \times n_x}$. In order to gain a more tractable two-dimensional derivative representation, we first let $\mathbf{f}(\mathbf{x}) \in \mathbb{R}^{m_f}$ be the one-dimensional transformation of the function $\mathbf{F}(\mathbf{x}) \in \mathbb{R}^{q_f \times r_f}$,

$$\mathbf{f}(\mathbf{x}) = \begin{bmatrix} \mathbf{F}_1(\mathbf{x}) \\ \vdots \\ \mathbf{F}_{r_f}(\mathbf{x}) \end{bmatrix}, \qquad \mathbf{F}_k = \begin{bmatrix} F_{1,k}(\mathbf{x}) \\ \vdots \\ F_{q_f,k}(\mathbf{x}) \end{bmatrix}, \quad (k = 1,\ldots,r_f),$$


where $m_f = q_f r_f$. The unrolled representation of the three-dimensional derivative $\partial\mathbf{F}/\partial\mathbf{x}$ is then given by the two-dimensional Jacobian

$$\frac{\partial\mathbf{f}}{\partial\mathbf{x}} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_{n_x}} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_{n_x}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m_f}}{\partial x_1} & \dfrac{\partial f_{m_f}}{\partial x_2} & \cdots & \dfrac{\partial f_{m_f}}{\partial x_{n_x}} \end{bmatrix} \in \mathbb{R}^{m_f \times n_x}.$$

Assuming the first derivative matrix $\partial\mathbf{f}/\partial\mathbf{x}$ contains $p_x^f \le m_f n_x$ possible non-zero elements (that is, there exist $m_f n_x - p_x^f$ elements of $\partial\mathbf{f}/\partial\mathbf{x}$ which must be zero due to lack of dependence), the row and column locations of the possible non-zero elements of $\partial\mathbf{f}/\partial\mathbf{x}$ are denoted by the index vector pair $(\mathbf{i}_x^f, \mathbf{j}_x^f) \in \mathbb{Z}_+^{p_x^f} \times \mathbb{Z}_+^{p_x^f}$, where

$$\mathbf{i}_x^f = \begin{bmatrix} i_x^f(1) \\ \vdots \\ i_x^f(p_x^f) \end{bmatrix}, \qquad \mathbf{j}_x^f = \begin{bmatrix} j_x^f(1) \\ \vdots \\ j_x^f(p_x^f) \end{bmatrix}$$

correspond to the row and column locations, respectively. In order to ensure uniqueness of the row/column pairs $\left(i_x^f(k), j_x^f(k)\right)$ (where $i_x^f(k)$ and $j_x^f(k)$ refer to the $k$th elements of the vectors $\mathbf{i}_x^f$ and $\mathbf{j}_x^f$, respectively, $k = 1,\ldots,p_x^f$), the following column-major restriction is placed upon the order of the index vectors:

$$i_x^f(1) + m_f\left(j_x^f(1) - 1\right) < i_x^f(2) + m_f\left(j_x^f(2) - 1\right) < \cdots < i_x^f(p_x^f) + m_f\left(j_x^f(p_x^f) - 1\right).$$

Henceforth it shall be assumed that this restriction is always satisfied for row/column index vector pairs of the form $(\mathbf{i}_x^f, \mathbf{j}_x^f)$, though it may not be explicitly stated. To refer to the possible non-zero elements of $\partial\mathbf{f}/\partial\mathbf{x}$, the vector $\mathbf{d}_x^f \in \mathbb{R}^{p_x^f}$ is used such that

$$d_x^f(k) = \frac{\partial f[i_x^f(k)]}{\partial x[j_x^f(k)]}, \quad (k = 1,\ldots,p_x^f),$$

where $d_x^f(k)$ refers to the $k$th element of the vector $\mathbf{d}_x^f$. Using this sparse notation, the Jacobian $\partial\mathbf{f}/\partial\mathbf{x}$ may be fully defined given the row/column/value triplet $(\mathbf{i}_x^f, \mathbf{j}_x^f, \mathbf{d}_x^f) \in \mathbb{Z}_+^{p_x^f} \times \mathbb{Z}_+^{p_x^f} \times \mathbb{R}^{p_x^f}$ together with the dimensions $m_f$ and $n_x$. Moreover, the three-dimensional derivative matrix $\partial\mathbf{F}(\mathbf{x})/\partial\mathbf{x}$ is uniquely defined given the triplet $(\mathbf{i}_x^f, \mathbf{j}_x^f, \mathbf{d}_x^f)$ together with the dimensions $q_f$, $r_f$, and $n_x$.
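To make the triplet notation concrete, the following minimal MATLAB sketch (with hypothetical data; this is not ADiGator code) assembles a full Jacobian from a row/column/value triplet. ADiGator itself operates directly on the value vector and never forms this matrix during differentiation.

    % Assemble the mf-by-nx Jacobian of f from the triplet (if_x, jf_x, df_x).
    mf = 3;  nx = 2;                      % hypothetical dimensions
    if_x = [1; 3];                        % row locations of possible non-zeros
    jf_x = [1; 2];                        % column locations (column-major ordered)
    df_x = [0.5; -2.0];                   % non-zero derivative values
    J = sparse(if_x, jf_x, df_x, mf, nx); % sparse mf-by-nx Jacobian
    disp(full(J))                         % [0.5 0; 0 0; 0 -2.0]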

3. OVERVIEW OF THE ADIGATOR ALGORITHM

Without loss of generality, consider a function $\mathbf{f}(\mathbf{v}(\mathbf{x}))$, where $\mathbf{f} : \mathbb{R}^{m_v} \to \mathbb{R}^{m_f}$ and $\partial\mathbf{v}/\partial\mathbf{x}$ is defined by the triplet $(\mathbf{i}_x^v, \mathbf{j}_x^v, \mathbf{d}_x^v) \in \mathbb{Z}_+^{p_x^v} \times \mathbb{Z}_+^{p_x^v} \times \mathbb{R}^{p_x^v}$. Assume now that $\mathbf{f}(\cdot)$ has been coded as a MATLAB function, F, where the function F takes $\mathbf{v} \in \mathbb{R}^{m_v}$ as its input and returns $\mathbf{f} \in \mathbb{R}^{m_f}$ as its output. Given the MATLAB function F, together with the index vector pair $(\mathbf{i}_x^v, \mathbf{j}_x^v)$ and the dimensions $m_v$ and $n_x$, the ADiGator algorithm determines the index vector pair $(\mathbf{i}_x^f, \mathbf{j}_x^f)$ and the dimension $m_f$. Moreover, a MATLAB derivative function, DF, is generated such that DF takes $\mathbf{v}$ and $\mathbf{d}_x^v$ as its inputs and returns $\mathbf{f}$ and $\mathbf{d}_x^f$ as its outputs. In order to do so, the algorithm uses a process which we have termed source transformation via operator overloading. This process begins by transforming the original user-defined source code into an intermediate source code where


the new source code is augmented to contain calls to ADiGator-specific transformation routines while preserving the original code's mathematical operations. The forward mode of AD (which is used by ADiGator) is then effected by performing three passes on the intermediate program. On the first pass, a record of all operations, variables, and flow control statements is built. On the second pass, derivative sparsity patterns are propagated, and overloaded unions are performed where code branches join.³ On the third and final pass, derivative sparsity patterns are again propagated forward, while the procedures required to compute the output non-zero derivatives are printed to the derivative program. For a more detailed description of the method, the reader is referred to Weinstein and Rao [2016] and Patterson et al. [2013].
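As an illustration of this input/output convention, consider the following hedged sketch. The function f and the file layouts here are hypothetical; actual ADiGator-generated code contains additional bookkeeping and naming conventions.

    % --- file F.m: hypothetical user function, mv = 2, mf = 2 ---
    function f = F(v)
    f = [v(1)*v(2); sin(v(2))];
    end

    % --- file DF.m: sketch of a corresponding derivative function ---
    % Taking v = x (identity seed), dv_x = [1; 1]. The possible non-zeros
    % of df/dx sit at (1,1), (1,2), and (2,2), in column-major order.
    function [f, df_x] = DF(v, dv_x)
    f    = [v(1)*v(2); sin(v(2))];
    df_x = [v(2)*dv_x(1);          % d f1 / d x1
            v(1)*dv_x(2);          % d f1 / d x2
            cos(v(2))*dv_x(2)];    % d f2 / d x2
    end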

3.1. User Source to Intermediate Source Transformations

The first step in the ADiGator algorithm is to transform the user-defined source code into an intermediate source code. This process is applied to the user-provided main function, as well as any user-defined external functions (or sub-functions) which it calls. For each function contained within the user-defined program, a corresponding intermediate function, adigatortempfunc#, is created, where # is a unique integer identifying the function. The initial transformation process is carried out by reading the user-defined function, line by line, and searching for keywords. The algorithm looks for the following code behaviors and routines: variable assignments, flow control, external function calls, global variables, code comments, and error statements. If the user-defined source code contains any statements that are not listed above (with the exception of operations defined in the overloaded library), then the transformation will produce an error stating that the algorithm cannot process the statement.

3.2. Overloaded Operations

Once the user-defined program has been transformed to the intermediate program, the forward mode of AD is effected by performing multiple passes on the intermediate program. In the presence of flow control, three passes (parsing, overmapping, and printing) are required; otherwise, only two (parsing and printing) are required. In each pass, all overloaded objects are tracked by assigning each object a unique integer id value. In the parsing evaluation, information similar to conventional data flow graphs and control flow graphs is obtained by propagating overloaded objects with unique id fields. In the overmapping evaluation, forward mode AD is used to propagate derivative sparsity patterns, and overloaded unions are performed in areas where flow control branches join. In the printing evaluation, each basic block of function code is evaluated on its set of overmapped input objects. In this final pass, the overloaded operations perform two tasks: propagating derivative sparsity patterns and printing the procedures required to compute the non-zero derivatives at each link in the forward chain rule. In this section we briefly introduce the overloaded cada class, the manner in which it is used to exploit sparsity at transformation-time, a specific type of known numeric objects, and the manner in which the overloaded class handles logical references/assignments.

3.2.1. The Overloaded cada Class. The overloaded class is introduced by first considering a variable $\mathbf{Y}(\mathbf{x}) \in \mathbb{R}^{q_y \times r_y}$, where $\mathbf{Y}(\mathbf{x})$ is assigned to the identifier 'Y' in the user's code. It is then assumed that there exist some elements of $\mathbf{Y}(\mathbf{x})$ which are identically zero for any $\mathbf{x} \in \mathbb{R}^{n_x}$. These elements are identified by the strictly increasing index vector $\mathbf{i}^y \in \mathbb{Z}_+^{p_f}$, $0 \le p_f \le m_y$, where

$$y[i^y(k)] = 0, \quad \forall\,\mathbf{x} \in \mathbb{R}^{n_x} \quad (k = 1,\ldots,p_f),$$

³This second pass is only required if there exists flow control in the user-defined program.


and $\mathbf{y}(\mathbf{x})$ is the unrolled column-major vector representation of $\mathbf{Y}(\mathbf{x})$. It is then assumed that the possible non-zero elements of the unrolled Jacobian, $\partial\mathbf{y}/\partial\mathbf{x} \in \mathbb{R}^{m_y \times n_x}$ ($m_y = q_y r_y$), are defined by the row/column/value triplet $(\mathbf{i}_x^y, \mathbf{j}_x^y, \mathbf{d}_x^y) \in \mathbb{Z}_+^{p_x^y} \times \mathbb{Z}_+^{p_x^y} \times \mathbb{R}^{p_x^y}$. The corresponding overloaded object, denoted Y, would then have the following function and derivative properties:

    Function                      Derivative
    name:     'Y.f'               name:   'Y.dx'
    size:     (q_y, r_y)          nzlocs: (i^y_x, j^y_x)
    zerolocs: i^y

Assuming that the object Y is instantiated during the printing pass, procedures will have been printed to the derivative file such that, upon evaluation of the derivative file, Y.f and Y.dx will be assigned the values of $\mathbf{Y}$ and $\mathbf{d}_x^y$, respectively. It is important to stress that the values of $(q_y, r_y)$, $\mathbf{i}^y$, and $(\mathbf{i}_x^y, \mathbf{j}_x^y)$ are all assumed to be fixed at the time of derivative file generation. Moreover, by adhering to the assumption that these values are fixed, all overloaded operations must result in objects with fixed sizes and fixed derivative sparsity patterns (with the single exception to this rule given in Section 3.2.4). It is also noted that all user objects are assumed to be scalars, vectors, or matrices. Thus, while MATLAB allows the use of n-dimensional arrays, the ADiGator algorithm may only be used with two-dimensional arrays.

3.2.2. Exploiting Sparsity at the Operation Level. Holding to the assumption that all input sizes and sparsity patterns are fixed, any files that are generated by the algorithm are only valid for a single input size and derivative sparsity pattern. Fixing this information allows the algorithm to accurately propagate derivative sparsity patterns during the generation of derivative files. Moreover, rather than relying on compression techniques to exploit sparsity of the program as a whole, sparsity is exploited at every link in the forward chain rule. Typically this is achieved by only applying the chain rule to vectors of non-zero derivatives (for example, $\mathbf{d}_x^y$). To illustrate this point, we consider the simple function line

    W = sin(Y);

The chain rule for the corresponding operation $\mathbf{W}(\mathbf{x}) = \sin(\mathbf{Y}(\mathbf{x}))$ is then given by

$$\frac{\partial\mathbf{w}}{\partial\mathbf{x}} = \begin{bmatrix} \cos(y_1) & 0 & \cdots & 0 \\ 0 & \cos(y_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \cos(y_{m_y}) \end{bmatrix} \frac{\partial\mathbf{y}}{\partial\mathbf{x}}, \qquad (1)$$

where $\mathbf{w} \in \mathbb{R}^{m_y}$ is the unrolled column-major vector representation of $\mathbf{W}$. Given $(\mathbf{i}_x^y, \mathbf{j}_x^y) \in \mathbb{Z}_+^{p_x^y} \times \mathbb{Z}_+^{p_x^y}$, Eq. (1) may be carried out sparsely by the procedure

$$d_x^w(k) = \cos\left(y[i_x^y(k)]\right) d_x^y(k), \quad k = 1,\ldots,p_x^y. \qquad (2)$$

Moreover, the index vector pair which identifies the possible non-zero locations of $\partial\mathbf{w}/\partial\mathbf{x}$ is identical to that of $\partial\mathbf{y}/\partial\mathbf{x}$. During the printing evaluation, the overloaded sin routine would have access to $(\mathbf{i}_x^y, \mathbf{j}_x^y)$ and print the procedures of Eq. (2) to the derivative file as the MATLAB procedure

    W.dx = cos(Y(Index1)).*Y.dx;

where the variable Index1 would be assigned the value of the index vector $\mathbf{i}_x^y$ and written to memory at the time the derivative procedure is printed. Thus, sparsity is exploited at transformation time, such that the chain rule is carried out at run-time


by only operating on vectors of non-zero derivatives. Similar derivative procedures are printed for all array operations (for instance, sqrt, log, +, .*).

The case where the chain rule is not simply applied to vectors of non-zero derivatives at run-time is that of matrix operations (for example, summation, matrix multiplication, etc.). In general, the inner derivative matrices of such operations contain rows with more than one non-zero value. Thus, the chain rule may not, in general, be carried out by performing element-wise array multiplications on vectors. Derivative sparsity, however, may still be exploited for such operations. For instance, consider the matrix operation $\mathbf{Z}(\mathbf{x}) = \mathbf{A}\mathbf{Y}(\mathbf{x})$, $\mathbf{A} \in \mathbb{R}^{q_z \times q_y}$, where $\mathbf{A}$ is a constant matrix. The chain rule associated with this operation is given as

$$\frac{\partial\mathbf{Z}}{\partial x_k} = \mathbf{A}\frac{\partial\mathbf{Y}}{\partial x_k}, \quad (k = 1,\ldots,n_x).$$

Suppose now that

$$\mathbf{B} \equiv \begin{bmatrix} \dfrac{\partial\mathbf{Y}}{\partial x_1} & \cdots & \dfrac{\partial\mathbf{Y}}{\partial x_{n_x}} \end{bmatrix} \in \mathbb{R}^{q_y \times r_y n_x}.$$

Then

$$\mathbf{C} \equiv \mathbf{A}\mathbf{B} = \begin{bmatrix} \mathbf{A}\dfrac{\partial\mathbf{Y}}{\partial x_1} & \cdots & \mathbf{A}\dfrac{\partial\mathbf{Y}}{\partial x_{n_x}} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial\mathbf{Z}}{\partial x_1} & \cdots & \dfrac{\partial\mathbf{Z}}{\partial x_{n_x}} \end{bmatrix} \in \mathbb{R}^{q_z \times r_y n_x}, \qquad (3)$$

where the matrices $\mathbf{B}$ and $\mathbf{C}$ have the same column-major linear indices as $\partial\mathbf{y}/\partial\mathbf{x}$ and $\partial\mathbf{z}/\partial\mathbf{x}$, respectively. Now consider that, given the index vector pair $(\mathbf{i}_x^y, \mathbf{j}_x^y)$, the sparsity pattern of $\mathbf{B}(\mathbf{x})$ is known. Moreover, if there exist any columns of $\mathbf{B}$ which are known to be zero, then the matrix multiplication of Eq. (3) performs redundant computations on columns whose entries are all zero. We now allow the strictly increasing index vector $\mathbf{k}_x^y \in \mathbb{Z}_+^{s_x^y}$, $s_x^y \le r_y n_x$, to denote the columns of $\mathbf{B}$ which are not zero, and let

$$\mathbf{D} \equiv \begin{bmatrix} \mathbf{B}[k_x^y(1)] & \cdots & \mathbf{B}[k_x^y(s_x^y)] \end{bmatrix} \in \mathbb{R}^{q_y \times s_x^y}$$

be the collection of possibly non-zero columns of $\mathbf{B}$. All of the elements of $\mathbf{d}_x^z$ must then be contained within the matrix

$$\mathbf{A}\mathbf{D} = \begin{bmatrix} \mathbf{C}[k_x^y(1)] & \cdots & \mathbf{C}[k_x^y(s_x^y)] \end{bmatrix} \in \mathbb{R}^{q_z \times s_x^y}. \qquad (4)$$

The ADiGator algorithm takes advantage of this fact and produces derivative procedures corresponding to Eq. (4), where the resulting values of $\mathbf{d}_x^z$ may be referenced directly from the result of Eq. (4).
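The following minimal MATLAB sketch (hypothetical dimensions and data; not taken from an ADiGator-generated file) illustrates the procedure of Eq. (4): only the possibly non-zero columns of the inner derivative matrix participate in the run-time multiplication.

    qy = 3; nx = 4; qz = 2;          % hypothetical dimensions (ry = 1)
    A  = randn(qz, qy);              % constant matrix in Z(x) = A*Y(x)
    ky = [1; 3];                     % non-zero columns of B = [dY/dx1 ... dY/dxnx]
    D  = randn(qy, numel(ky));       % the possibly non-zero columns of B
    AD = A*D;                        % Eq. (4): 2 columns multiplied instead of 4
    Index2 = (1:numel(AD))';         % reference indices, fixed at transformation time
    dz = AD(Index2);                 % non-zero derivative values of z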

3.2.3. Known Numeric Objects. A common error that occurs when using operator overloading in MATLAB is

    'Conversion to double from someclass not possible.'

This typically occurs when attempting to perform a subscript-index assignment such as y(i) = x, where x is overloaded and y is of the double class. In order to avoid this error and to properly track all variables in the intermediate program, the ADiGator algorithm ensures that all active variables in the intermediate program are overloaded. Moreover, immediately after a numeric variable (double, logical, etc.) is created, it is transformed into a "known numeric object", whose only relevant properties are its stored numeric value, string name, and id. The numeric value is then assumed to be fixed. As a direct consequence, all operations performed in the intermediate program are forced to be overloaded. At times, this consequence may be adverse, as redundant auxiliary computations may be printed to the derivative file. Moreover, in the worst case, one of the operations in question may not have an overloaded routine written, and thus produce an error.


3.2.4. Logical References and Assignments. As stated in Section 3.2.1, the algorithm only allows operations which result in variables of a fixed size (given a fixed-dimensional input). It is often the case, however, that one wishes to perform operations on only certain elements of a vector, where the element locations are determined by the values of the entries of the vector. In order to allow for such instances, the algorithm permits unknown logical array references under the condition that, if a logical index reference is performed, the result of the logical reference must be assigned to a variable via a logical index assignment. Moreover, the same logical index variable must be used for both the reference and the assignment, as in the sketch below. This method of handling logical array references and assignments allows all variables of the derivative program to be of a fixed dimension, yet can result in some unnecessary computation.
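A hedged sketch of the allowed pattern (hypothetical data; not taken from the user's guide):

    % Allowed: the SAME logical index is used for both the reference and
    % the assignment, so y keeps a fixed size even though the referenced
    % locations depend on run-time values.
    y   = randn(5,1);
    ind = y > 0;              % logical index, unknown until run-time
    y(ind) = exp(y(ind));     % logical reference assigned back via ind

    % Disallowed: the size of z would depend on run-time values.
    % z = y(ind);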

3.3. Handling of Flow Control

The ADiGator algorithm handles flow control by performing overloaded unions where code fragments join. Namely, the unions are performed on the exit of conditional if/elseif/else statements, on the entrance of for loop statements, and on both the entrance and exit of user-defined external functions and while loops. While this approach allows the software to differentiate routines containing flow control and to transcribe the flow control to the derivative program, it does introduce some limitations and adverse effects. The negative consequences of this approach are summarized as follows. The software is limited in that it cannot differentiate programs which recursively call the same routines, nor is the software entirely robust to different instances of the same variable changing sizes. It is also the case that sparsity cannot always be fully exploited in the presence of flow control. Furthermore, derivative file generation times will be proportional to the number of loop iterations in a program, as loops must be unrolled for purposes of analysis (see the sketch below). For a more in-depth analysis of the methods used to differentiate programs containing flow control, the reader is referred to Weinstein and Rao [2016].
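For example (a hypothetical user function, not drawn from the paper), a loop such as the one below is unrolled at transformation time, so its analysis cost grows with n even though the generated derivative code retains the loop:

    % Hypothetical user function with flow control: each of the n-1 loop
    % iterations is analyzed separately during derivative file generation.
    function y = cumprodlike(x)
    n = numel(x);
    y = x;
    for k = 2:n
        y(k) = y(k)*y(k-1);
    end
    end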

4. OVERLOADED CELL AND STRUCTURE ARRAYS

In Section 3 it was assumed that all user variables in the originating program are of class double. In the intermediate program, all such objects are effectively replaced by objects of the cada class [Patterson et al. 2013], where each cada object is tracked by a unique id value. It is sometimes the case, however, that a user code contains cell and/or structure arrays whose elements are objects of the double class. In the intermediate program, it is then desirable to track the outermost cell and/or structure arrays, rather than each of the objects of which the array is composed. To this end, all cell and structure arrays are replaced with objects of the cadastruct class during the overloaded analysis. Each cadastruct object is then assigned a unique id value, assuming it does not correspond to a scalar structure. In the event that a scalar structure is built, each of the fields of the scalar structure is treated as a unique variable. The cadastruct objects are themselves made to contain objects of the cada class; however, the embedded cada objects are not tracked (assuming the object does not correspond to a scalar structure). Handling cell and structure arrays in this manner allows the algorithm to perform overloaded unions of cell and structure arrays and to print loop iteration-dependent cell/structure array references and assignments.

5. VECTORIZATION OF THE CADA CLASS

In this section, the differentiation of a special class of vectorized functions is considered, where we define a vectorized function as any function of the form $\mathbf{F} : \mathbb{R}^{n_x \times N} \to \mathbb{R}^{m_f \times N}$ which performs the vector-valued function $\mathbf{f} : \mathbb{R}^{n_x} \to \mathbb{R}^{m_f}$ on each column of its


input. That is,

$$\mathbf{F}(\mathbf{X}) = \begin{bmatrix} \mathbf{f}(\mathbf{X}_1) & \mathbf{f}(\mathbf{X}_2) & \cdots & \mathbf{f}(\mathbf{X}_N) \end{bmatrix} \in \mathbb{R}^{m_f \times N}, \qquad (5)$$

where $\mathbf{X}_k \in \mathbb{R}^{n_x}$, $k = 1,\ldots,N$, and

$$\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_N \end{bmatrix} \in \mathbb{R}^{n_x \times N}.$$

It is stressed that the vectorized functions of this section are not limited to a single operation, but rather may be coded as a sequence of operations. Similar to array operations, vectorized functions have a sparse block-diagonal Jacobian structure due to the fact that

$$\frac{\partial F_{l,i}}{\partial X_{j,k}} = 0, \quad \forall\, i \neq k, \quad l = 1,\ldots,m_f, \quad j = 1,\ldots,n_x.$$

Allowing

$$\mathbf{X}^\dagger = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_N \end{bmatrix} \in \mathbb{R}^{n_x N}, \qquad \mathbf{F}_k = \mathbf{f}(\mathbf{X}_k) \in \mathbb{R}^{m_f},$$

and

$$\mathbf{F}^\dagger(\mathbf{X}) = \begin{bmatrix} \mathbf{F}_1 \\ \mathbf{F}_2 \\ \vdots \\ \mathbf{F}_N \end{bmatrix} \in \mathbb{R}^{m_f N},$$

the two-dimensional Jacobian $\partial\mathbf{F}^\dagger/\partial\mathbf{X}^\dagger$ is given by the block-diagonal matrix

$$\frac{\partial\mathbf{F}^\dagger}{\partial\mathbf{X}^\dagger} = \begin{bmatrix} \dfrac{\partial\mathbf{F}_1}{\partial\mathbf{X}_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \dfrac{\partial\mathbf{F}_2}{\partial\mathbf{X}_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \dfrac{\partial\mathbf{F}_N}{\partial\mathbf{X}_N} \end{bmatrix} \in \mathbb{R}^{m_f N \times n_x N}, \qquad (6)$$

where

$$\frac{\partial\mathbf{F}_i}{\partial\mathbf{X}_i} = \begin{bmatrix} \dfrac{\partial F_{1,i}}{\partial X_{1,i}} & \dfrac{\partial F_{1,i}}{\partial X_{2,i}} & \cdots & \dfrac{\partial F_{1,i}}{\partial X_{n_x,i}} \\ \dfrac{\partial F_{2,i}}{\partial X_{1,i}} & \dfrac{\partial F_{2,i}}{\partial X_{2,i}} & \cdots & \dfrac{\partial F_{2,i}}{\partial X_{n_x,i}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial F_{m_f,i}}{\partial X_{1,i}} & \dfrac{\partial F_{m_f,i}}{\partial X_{2,i}} & \cdots & \dfrac{\partial F_{m_f,i}}{\partial X_{n_x,i}} \end{bmatrix} \in \mathbb{R}^{m_f \times n_x}, \quad i = 1,\ldots,N. \qquad (7)$$

Such functions commonly occur when utilizing collocation methods [Ascher et al. 1995] to obtain numerical solutions of ordinary differential equations, partial differential equations, or integral equations. In such cases, the goal is to obtain the values of $\mathbf{X} \in \mathbb{R}^{n_x \times N}$ which solve the equation

$$\mathbf{c}(\mathbf{F}(\mathbf{X}), \mathbf{X}) = \mathbf{0} \in \mathbb{R}^{m_c}, \qquad (8)$$

where $\mathbf{F}(\mathbf{X})$ is of the form of Eq. (5). Now, one could apply AD directly to Eq. (8); however, it is often more efficient to instead apply AD separately to the function $\mathbf{F}(\mathbf{X})$, where the specific structure of Eq. (6) may be exploited. The results may then be used to compute the derivatives of Eq. (8).
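As a hedged illustration (a hypothetical function, not from the paper), a vectorized function of the form of Eq. (5) is typically coded with element-wise array operations so that it is valid for any column dimension N:

    % Vectorized function F : R^(2 x N) -> R^(2 x N), applying
    % f([x1; x2]) = [x1*x2; sin(x2)] to each of the N columns of X.
    function F = vecfun(X)
    x1 = X(1,:);             % first row across all N columns
    x2 = X(2,:);             % second row
    F  = [x1.*x2; sin(x2)];  % element-wise operations preserve Eq. (5) form
    end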


Due to the block-diagonal structure of Eq. (6), the vectorized problem has an inherently compressible Jacobian with a maximum column dimension of $n_x$. This compression may be performed via the pre-defined Curtis-Powell-Reid seed matrix

$$\mathbf{S} = \begin{bmatrix} \mathbf{I}_{n_x} \\ \mathbf{I}_{n_x} \\ \vdots \\ \mathbf{I}_{n_x} \end{bmatrix} \in \mathbb{R}^{n_x N \times n_x}, \qquad (9)$$

where $\mathbf{I}_{n_x}$ is the $n_x \times n_x$ identity matrix. The ADiGator algorithm in the vectorized mode does not, however, rely upon matrix compression, but rather utilizes the fact that the structure of the Jacobian of Eq. (6) is determined by the structure of the Jacobian of Eq. (7). To exhibit this point, the row/column pairs of the derivative of $\mathbf{f}$ with respect to its input are now denoted by $(\mathbf{i}_x^f, \mathbf{j}_x^f) \in \mathbb{Z}_+^{p_x^f} \times \mathbb{Z}_+^{p_x^f}$. The $N$ derivative matrices, $\partial\mathbf{F}_i/\partial\mathbf{X}_i$, may then be represented by the row/column/value triplets $(\mathbf{i}_x^f, \mathbf{j}_x^f, \mathbf{d}_{X_i}^{F_i}) \in \mathbb{Z}_+^{p_x^f} \times \mathbb{Z}_+^{p_x^f} \times \mathbb{R}^{p_x^f}$ together with the dimensions $m_f$ and $n_x$. All possible non-zero derivatives of $\partial\mathbf{F}/\partial\mathbf{X}$ are then given by

$$\mathbf{D}_X^F = \begin{bmatrix} \mathbf{d}_{X_1}^{F_1} & \mathbf{d}_{X_2}^{F_2} & \cdots & \mathbf{d}_{X_N}^{F_N} \end{bmatrix} \in \mathbb{R}^{p_x^f \times N}. \qquad (10)$$

Furthermore, $\partial\mathbf{F}/\partial\mathbf{X}$ may be fully defined given the vectorized row/column/value triplet $(\mathbf{i}_x^f, \mathbf{j}_x^f, \mathbf{D}_X^F) \in \mathbb{Z}_+^{p_x^f} \times \mathbb{Z}_+^{p_x^f} \times \mathbb{R}^{p_x^f \times N}$, together with the dimensions $n_x$, $m_f$, and $N$. Thus, in order to print derivative procedures of a vectorized function as defined in Eq. (5), it is only required to propagate row/column index vector pairs $(\mathbf{i}_x^f, \mathbf{j}_x^f) \in \mathbb{Z}_+^{p_x^f} \times \mathbb{Z}_+^{p_x^f}$ corresponding to the non-vectorized problem, and to print procedures that compute the vectorized non-zero derivatives, $\mathbf{D}_X^F \in \mathbb{R}^{p_x^f \times N}$.

In order to identify vectorized cada objects, all vectorized cada instances are made to have a value of Inf located in the size field corresponding to the vectorized dimension. Then, at each vectorized cada operation, sparsity patterns of the non-vectorized problem are propagated (that is, $(\mathbf{i}_x^f, \mathbf{j}_x^f)$) and procedures are printed to the derivative file to compute the vectorized function and vectorized derivative values (that is, $\mathbf{F}$ and $\mathbf{D}_X^F$). It is then the case that any operations performed on a vectorized cada object must be of the form given in Eq. (5).

Here it is noted that, given a fixed value of $N$, the non-vectorized mode may easily be used to print the procedures required to compute the non-zero derivatives of $\mathbf{F}(\mathbf{X})$. Typically the derivative files generated by the vectorized and non-vectorized modes will perform the exact same floating-point operations at run-time. One may then question the advantages of utilizing the vectorized mode, particularly when more work is required of the user in order to separate out vectorized functions. The advantages of the vectorized mode are as follows:

(1) Derivative files are vectorized. Typically, functions of the form of Eq. (5) are coded such that the value of N may be any positive integer. By utilizing the vectorized mode, the derivative files are generated such that N may likewise be any positive integer. In contrast, any files generated using the non-vectorized mode are only valid for fixed input sizes. Allowing the dimension N to change is particularly helpful when using collocation methods together with a process known as mesh refinement [Betts 2009], because in such instances the problem of Eq. (8) must often be re-solved for different values of N.

(2) Transformation time is reduced. By taking advantage of the fact that the sparsity of the vectorized problem (that is, $\mathbf{F}(\mathbf{X})$) is determined entirely by the sparsity of the non-vectorized problem (that is, $\mathbf{f}(\mathbf{x})$), sparsity propagation costs are greatly reduced when using the vectorized mode rather than the non-vectorized mode.

(3) Run-time overhead is reduced. In order to exploit sparsity, the algorithm prints derivative procedures which perform many subscript index references and assignments at run-time. These reference and assignment operations incur run-time penalties proportional to the length of the reference/assignment index vectors [Menon and Pingali 1999]. Moreover, the lengths of the reference and assignment indices used are proportional to the number of non-zero derivatives at each link in the chain rule. When printing derivative procedures in the vectorized mode, however, the ':' character is used as a reference to all elements in the vectorized dimension. Thus, the lengths of the required index vectors are proportional to the number of non-zero derivatives of the non-vectorized problem (that is, $\partial\mathbf{f}/\partial\mathbf{x}$), rather than the vectorized problem (that is, $\partial\mathbf{F}/\partial\mathbf{X}$). Indexing reference/assignment run-time overheads are therefore reduced by an order of N when using the vectorized mode rather than the non-vectorized mode.

6. USER INTERFACE TO ADIGATOR

The computation of derivatives using the ADiGator package is carried out in a multi-step process. First, the user must code their function as a MATLAB program which conforms to the restrictions discussed in Section 3. The user must then fix information pertaining to the inputs of the program (that is, input variable sizes and derivative sparsity patterns). The ADiGator algorithm is then called to transform the user-defined function program into a derivative program, where the derivative program is only valid for the fixed input information. The ADiGator tool is then no longer used, and the generated derivative program may be evaluated on non-overloaded objects to compute the desired derivatives.

In order to begin the transformation process, the ADiGator algorithm must create overloaded objects of the form discussed in Section 3.2.1. Thus, the user must provide certain information for each input to their program. Assuming temporarily that all user inputs to the original function program are of the double class, all user inputs must fall into one of three categories:

• Derivative inputs. Derivative inputs are any inputs which are a function of the variable of differentiation. Derivative inputs must have a fixed size and a fixed derivative sparsity pattern.

• Known numeric inputs. Known numeric inputs are any inputs whose values are fixed and known. These inputs will be transformed into the known numeric objects discussed in Section 3.2.3.

• Unknown auxiliary inputs. Unknown auxiliary inputs are any inputs which are neither a function of the variable of differentiation nor of a fixed value. It is required, however, that unknown auxiliary inputs have a fixed size.

For each of the user-defined input variables, the user must identify the category to which the input belongs and create an ADiGator input variable. Under the condition that a user-defined program takes a structure or cell as an input, the corresponding ADiGator input variable is made to be a structure or cell where each cell/structure element corresponding to an object of the double class must be identified as one of the three input types. The ADiGator input variables are thus made to contain all fixed input information and are passed to the ADiGator transformation algorithm. The ADiGator transformation is then carried out using the adigator command, which requires the created ADiGator input variables, the name of the main function file of the user-defined program, and the name to be given to the generated derivative file. The generated derivative program then has the same input structure as the originating program, with the exception that derivative inputs must be replaced by structures containing derivative input function values and non-zero derivative values. Moreover, the generated derivative program returns, for each output variable, the function values, possible non-zero derivative values, and locations of the possible non-zero derivatives. The user interface thus allows a great deal of flexibility in the user-defined function program input/output scheme. Moreover, the user is granted the ability to use any desired input seed matrices. For more information on how to use the ADiGator package, the reader is referred to the user's guide and examples which accompany the code.
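The following hedged sketch outlines the workflow; the function names follow the user's guide, but exact signatures and output field names should be checked against the version in use, and myfun/myfun_dx are hypothetical names.

    % Create ADiGator input variables for a program f = myfun(x, a):
    x = adigatorCreateDerivInput([5 1], 'x');  % derivative input: 5x1, w.r.t. x
    a = adigatorCreateAuxInput([5 1]);         % unknown auxiliary input: 5x1
    adigator('myfun', {x, a}, 'myfun_dx');     % transform myfun.m -> myfun_dx.m

    % Evaluate the generated file on numeric (non-overloaded) inputs:
    xin.f  = rand(5,1);        % function value of the derivative input
    xin.dx = ones(5,1);        % non-zero derivatives of x (identity seed)
    out = myfun_dx(xin, rand(5,1));
    % out.f holds the function value and out.dx the non-zero derivative
    % values; the generated file also reports the non-zero locations.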

7. EXAMPLES

In this section, the ADiGator tool is tested by solving four different classes of problems. In Section 7.1, the developed algorithm is used to integrate an ordinary differential equation with a large sparse Jacobian. In Section 7.2, a set of three fixed-dimension nonlinear systems of equations problems is investigated, and in Section 7.3, a large sparse unconstrained minimization problem is presented. Lastly, in Section 7.4, the vectorized mode of ADiGator is showcased by solving the large-scale nonlinear programming problem that arises from the discretization of an optimal control problem. For each of the tested problems, comparisons are drawn against methods of finite-differencing, the well-known MATLAB AD tools ADiMat version 0.6.0, INTLAB version 6, and MAD version 1.4, and, when available, hand-coded derivative files. All computations were performed on an Apple Mac Pro with Mac OS-X 10.9.2 (Mavericks) and a 2 × 2.4 GHz Quad-Core Intel Xeon processor with 24 GB 1066 MHz DDR3 RAM using MATLAB version R2014a. All reported times were computed using MATLAB's tic/toc routines and averaging over 1000 iterations.

7.1. Stiff Ordinary Differential Equation

In this section the well-known Burgers' equation is solved using a moving mesh technique as presented in Huang et al. [1994]. The form of Burgers' equation used for this example is given by

$$\dot{u} = \alpha \frac{\partial^2 u}{\partial y^2} - \frac{\partial}{\partial y}\left(\frac{u^2}{2}\right), \quad 0 < y < 1, \quad t > 0, \quad \alpha = 10^{-4}, \qquad (11)$$

with boundary conditions and initial conditions

$$u(0,t) = u(1,t) = 0, \quad t > 0, \qquad u(y,0) = \sin(2\pi y) + \tfrac{1}{2}\sin(\pi y), \quad 0 \le y \le 1. \qquad (12)$$

The partial differential equation (PDE) of Eq. (11) is then transformed into an ordinary differential equation (ODE) via a central difference discretization together with the moving mesh PDE, MMPDE6 (with $\tau = 10^{-3}$), and spatial smoothing is performed with parameters $\gamma = 2$ and $p = 2$. The result of the discretization is then a stiff ODE of the form

$$\mathbf{M}(t, \mathbf{x})\,\dot{\mathbf{x}} = \mathbf{f}(t, \mathbf{x}), \qquad (13)$$

where $\mathbf{M} : \mathbb{R} \times \mathbb{R}^{n_x} \to \mathbb{R}^{n_x \times n_x}$ is a mass-matrix function and $\mathbf{f} : \mathbb{R} \times \mathbb{R}^{n_x} \to \mathbb{R}^{n_x}$ is the ODE function. This problem is given as an example problem for the MATLAB ODE suite and is solved with the stiff ODE solver ode15s [Shampine and Reichelt 1997], which allows the user to supply the Jacobian $\partial\mathbf{f}/\partial\mathbf{x}$.
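A hedged sketch of this usage follows; every name here (f, mass, x0, burgersode_dx, iJ, jJ) is hypothetical, with burgersode_dx standing in for an ADiGator-generated derivative file and (iJ, jJ) for the non-zero locations reported at transformation time.

    % Supply an ADiGator-computed Jacobian to ode15s through odeset.
    jacfun = @(t,x) adigatorJac(t, x, iJ, jJ);
    opts   = odeset('Mass', @mass, 'Jacobian', jacfun);
    [t, x] = ode15s(@f, [0 2], x0, opts);

    function J = adigatorJac(t, x, iJ, jJ)
    xin.f  = x;                   % function value of the input
    xin.dx = ones(numel(x), 1);   % identity seed on x
    out = burgersode_dx(t, xin);  % hypothetical generated derivative file
    J = sparse(iJ, jJ, out.dx, numel(x), numel(x));  % assemble sparse df/dx
    end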

Prior to actually solving the ODE, a study is performed on the efficiency of differentiation of the function $\mathbf{f}(t,\mathbf{x})$ for varying values of $n_x$, where the code for the function $\mathbf{f}(t,\mathbf{x})$ has been taken verbatim from the MATLAB example file burgersode. The Jacobian $\partial\mathbf{f}/\partial\mathbf{x}$ is inherently sparse and compressible, where a Curtis-Powell-Reid seed


matrix $\mathbf{S} \in \mathbb{Z}_+^{n_x \times 18}$ may be found for all $n_x \ge 18$. Thus, the Jacobian $\partial\mathbf{f}/\partial\mathbf{x}$ becomes increasingly more sparse as the dimension $n_x$ is increased. A test was first performed by applying the AD tools ADiGator, ADiMat, INTLAB, and MAD to the function code for $\mathbf{f}(t,\mathbf{x})$. It was found, however, that all tested AD tools perform quite poorly, particularly when compared to the theoretical efficiency of a sparse finite-difference.⁴ The reason for the poor performance is that the code used to compute $\mathbf{f}$ contains four different explicit loops, each of which runs for $n_x/2 - 2$ iterations and performs scalar operations. When dealing with the explicit loops, all tested AD tools incur a great deal of run-time overhead penalties. In order to quantify these run-time overheads, the function file which computes $\mathbf{f}$ was modified such that all loops (and scalar operations within the loops) were replaced by the proper corresponding array operations and vector reference/assignment index operations.⁵ A test was then performed by applying AD to the resulting modified file. The results obtained by applying AD to both the original and modified files are given in Fig. 1. Results were obtained using ADiGator in the default mode, ADiMat in the scalar compressed forward mode, INTLAB's gradient class, and MAD in the compressed forward mode. Within this figure it is seen that all tested AD tools greatly benefit from the removal of the loop statements. Moreover, it is seen that the ADiGator tool performs relatively well compared to a theoretical sparse finite difference. To further investigate the handling of explicit loops, absolute function CPU times and ADiGator file generation times are given in Table I. Within this table, it is seen that the reason the original Burgers' ODE function file is written with loops is that it is slightly more efficient than when the loops are removed. It is, however, also seen that when using the ADiGator tool to generate derivative files, the cost of the transformation of the original code containing loops increases immensely as the value of $n_x$ increases. This increase in cost is due to the fact that the ADiGator tool effectively unrolls loops for the purpose of analysis, and thus must perform a number of overloaded operations proportional to the value of $n_x$. When applying the ADiGator tool to the file containing no explicit loops, however, the number of required overloaded operations stays constant for all values of $n_x$. From this analysis, it is clear that explicit loops should largely be avoided when using any of the tested AD tools. Moreover, it is clear that the efficiency of applying AD to a MATLAB function is not necessarily proportional to the efficiency of the original function.
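As a hedged illustration of this modification (not the actual burgersode code), a scalar loop and its equivalent array form are shown below; under overloading, the first form costs one set of overloaded operations per iteration, while the second costs a fixed number regardless of n.

    n = 8; a = 1e-4; u = rand(n,1); f = zeros(n,1);

    % Scalar loop: overloaded scalar operations on every iteration.
    for i = 2:n-1
        f(i) = a*(u(i+1) - 2*u(i) + u(i-1));
    end

    % Equivalent array form: a fixed number of array operations.
    i = 2:n-1;
    f(i) = a*(u(i+1) - 2*u(i) + u(i-1));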

Table I: Burgers' ODE function CPU and ADiGator generation CPU times.

    n_x:              32      64      128     256     512     1024    2048
    Function file computation time (ms)
      with loops:     0.2046  0.2120  0.2255  0.2449  0.2895  0.3890  0.5615
      without loops:  0.2122  0.2190  0.2337  0.2524  0.2973  0.3967  0.5736
    ADiGator derivative file generation time (s)
      with loops:     2.410   4.205   7.846   15.173  30.130  62.754  137.557
      without loops:  0.682   0.647   0.658   0.666   0.670   0.724   0.834

⁴Forward difference approximations with respect to n variables may be accomplished by n + 1 function evaluations, one for the perturbation of each variable and one for the function at the input value. When utilizing matrix compression, one may replace n with the second dimension of the seed matrix S.
⁵This process of replacing loops with array operations is often referred to as "vectorization". In this paper the term "vectorized" has already been used to refer to a specific class of functions in Section 5. Thus, in order to avoid any confusion, use of the term "vectorization" is avoided when referring to functions whose loops have been replaced.


[Figure 1 appears here in the original; only its caption information is reproduced. Panels: (a) With Explicit Loops, (b) Without Explicit Loops; axes: log₂(n_x) versus log₂(CPU(∂f/∂x)/CPU(f)); series: ADiGator, ADiMat, INTLAB, MAD, Forward Difference.]

Fig. 1: Burgers' ODE Jacobian-to-function CPU ratios. All ratios are presented in binary logarithm format. (a) Ratios obtained by differentiating the original implementation of f containing explicit loops. (b) Ratios obtained by differentiating the modified implementation of f containing no explicit loops. In both cases, the dashed line represents the theoretical ratio for a sparse forward difference, given by log₂(CPU(∂f/∂x)/CPU(f)) = log₂(19) ≈ 4.25.

The efficiency of the ADiGator tool is now investigated by solving the ODE and comparing solution times obtained by supplying the Jacobian via the ADiGator tool versus supplying the Jacobian sparsity pattern and allowing ode15s to use the numjac finite-difference tool to compute the required derivatives. It is important to note that the


numjac finite-difference tool was specifically designed for use with the MATLAB ODE suite, where a key component of the algorithm is to choose perturbation step-sizes at one point based on data collected from previous time steps [Shampine and Reichelt 1997]. Moreover, it is known that the algorithm of ode15s is not extremely reliant upon precise Jacobian computations, and thus the numjac algorithm is not required to compute extremely accurate Jacobian approximations [Shampine and Reichelt 1997]. For these reasons, it is expected that when using numjac in conjunction with ode15s, Jacobian-to-function CPU ratios should be near the theoretical values shown in Fig. 1. In order to present the best-case scenarios, tests were performed by supplying ode15s with the more efficient function file containing loop statements. When the ADiGator tool was used, Jacobians were supplied by the files generated by differentiating the function whose loops had been removed. In both cases, the ODE solver was supplied with the mass matrix, the mass matrix derivative sparsity pattern, and the Jacobian sparsity pattern. Moreover, absolute and relative tolerances were set equal to $10^{-5}$ and $10^{-4}$, respectively, and the ODE was integrated on the interval $t = [0, 2]$. Test results may be seen in Table II, where it is seen that the ODE may be solved more efficiently when using numjac for all test cases except $n_x = 2048$. It is also seen that the numbers of Jacobian evaluations required when using either finite-differences or AD are roughly equivalent. Thus, the ode15s algorithm, in this case, is largely unaffected by supplying a more accurate Jacobian.

Table II: Burgers' ODE solution times.

    n_x:         32     64     128    256    512     1024    2048
    ODE solve time (s)
      ADiGator:  1.471  1.392  2.112  4.061  10.472  36.386  139.813
      numjac:    1.383  1.284  1.958  3.838  9.705   32.847  140.129
    Number of Jacobian evaluations
      ADiGator:  98     92     126    197    305     495     774
      numjac:    92     92     128    194    306     497     743

7.2. Fixed Dimension Nonlinear Systems of Equations

In this section, analysis is performed on a set of fixed-dimension nonlinear systems of equations problems taken from the MINPACK-2 problem set [Averick et al. 1991]. While originally coded in Fortran, the implementations used for the tests of this section were obtained from Lenton [2005]. The specific problems chosen for analysis are those of the "combustion of propane fuel" (CPF), "human heart dipole" (HHD), and "coating thickness standardization" (CTS). The CPF and HHD problems represent systems of nonlinear equations $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ ($n = 11$ and $n = 8$, respectively) where it is desired to find $\mathbf{x}^*$ such that $\mathbf{f}(\mathbf{x}^*) = \mathbf{0}$. The CTS problem represents a system of nonlinear equations $\mathbf{f} : \mathbb{R}^{134} \to \mathbb{R}^{252}$ where it is desired to find $\mathbf{x}^*$ which minimizes $\mathbf{f}(\mathbf{x})$ in the least-squares sense. The standard methods used to solve such problems are based upon Newton iterations and thus require iterative Jacobian computations.

Prior to solving the nonlinear problems, a test is first performed to gauge the effi-ciency of the Jacobian computation compared to the other well-known MATLAB ADtools. The implementation of Lenton [2005] provides hand-coded Jacobian files whichalso provide a convenient base-line for computation efficiency. For each of the prob-lems the ADiGator tool was tested against the ADiMat tool in the scalar compressedforward mode, the INTLAB tool’s gradient class, the MAD tool in the compressed for-ward mode, and the hand-coded Jacobian as provided by Lenton [2005]. Moreover, it is

ACM Transactions on Mathematical Software, Vol. V, No. N, Article A, Publication date: January YYYY.

Page 16: A Algorithm xxx: ADiGator, a Toolbox for the Algorithmic ... · Algorithm: ADiGator, a Toolbox for the Algorithmic Differentiation of Mathematical Functions in MATLAB A:3 loading,

A:16 M. J. Weinstein, and A. V. Rao

noted that the Jacobians of the CPF and HHD functions are incompressible while theJacobian of the CTS function is compressible with a column dimension of six. Thus, forthe CPF and HHD tests, the ADiMat and MAD tools are essentially used in the fullmodes. The resulting Jacobian to function CPU ratios are given in Table III togetherwith the theoretical ratio for a sparse finite difference (sfd). From Table III it is seenthat the ADiGator algorithm performs relatively better on the sparser CPF and CTSfunctions (whose Jacobians contain 43.8% and 2.61% non-zero entries, respectively)than on the denser HHD problem (whose Jacobian contains 81.25% non-zero entries).Moreover, it is seen that, on the incompressible CPF problem, the ADiGator algorithmperforms more efficiently than a theoretical sparse finite difference. Furthermore, inthe case of the compressible CTS problem, the ADiGator tool performs more efficientlythan the hand-coded Jacobian file.

Table III: Jacobian to function CPU ratios for CPF, HHD, and CTS problems.

Problem:      CPF    HHD    CTS
Jacobian to Function CPU Ratios, CPU(∂f/∂x)/CPU(f)
  ADiGator:   8.0    21.3   7.3
  ADiMat:     197.0  226.3  56.3
  INTLAB:     298.5  436.5  85.9
  MAD:        474.8  582.6  189.8
  hand:       1.3    1.2    11.3
  sfd:        12.0   9.0    7.0

Next, the three test problems were solved using the MATLAB optimization toolbox functions fsolve (for the CPF and HHD nonlinear root-finding problems) and lsqnonlin (for the CTS nonlinear least-squares problem). The problems were tested by supplying the MATLAB solvers with the Jacobian files generated by the ADiGator algorithm and by simply supplying the Jacobian sparsity patterns and allowing the optimization toolbox to perform sparse finite-differences. Default tolerances of the optimization toolbox were used. The results of the test are shown in Table IV, which gives the solution times, the number of required iterations, and the ADiGator file generation times. From this table, it is seen that the CPF, HHD, and CTS problems solve in roughly the same amount of time whether the Jacobian is supplied via the ADiGator-generated files or via the MATLAB sparse finite-differencing routine. It is also noted that the time required to generate the ADiGator derivative files is actually greater than the time required to solve the problems. For this class of problems, however, the dimensions of the inputs are fixed, and thus the ADiGator derivative files must only be generated a single time.
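As a hedged illustration, the sketch below shows the two fsolve configurations used for the root-finding problems (the lsqnonlin setup is analogous); cpffun, cpffun_jac, JacPat, and x0 are hypothetical placeholders, where cpffun_jac is assumed to wrap the ADiGator-generated file so that it returns both the residual and its Jacobian.

    % Case 1: user-supplied Jacobian from the ADiGator-generated file.
    % cpffun_jac is a hypothetical wrapper returning [f, J] at x.
    optsAD = optimset('Jacobian', 'on');
    xAD = fsolve(@cpffun_jac, x0, optsAD);

    % Case 2: sparse finite-differencing from the sparsity pattern only.
    optsFD = optimset('JacobPattern', JacPat);
    xFD = fsolve(@cpffun, x0, optsFD);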

Table IV: Solution times for fixed dimension nonlinear systems.

Problem:      CPF    HHD    CTS
Solution Time (s)
  ADiGator:   0.192  0.100  0.094
  sfd:        0.212  0.111  0.091
Number of Iterations
  ADiGator:   96     38     5
  sfd:        91     38     5
ADiGator File Generation Time (s)
              0.429  0.422  0.291


7.3. Large-Scale Unconstrained Minimization

In this section, the 2-D Ginzburg-Landau (GL2) minimization problem from the MINPACK-2 test suite [Averick et al. 1991] is tested. The problem is to minimize the Gibbs free energy in the discretized Ginzburg-Landau superconductivity equations. The objective, f, is given by

    f = \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} \left[ -|v_{i,j}|^2 + \frac{1}{2}|v_{i,j}|^4 + \phi_{i,j}\left(v, a^{(1)}, a^{(2)}\right) \right],    (14)

where v ∈ C^{(n_x+1)×(n_y+1)} and (a^{(1)}, a^{(2)}) ∈ R^{(n_x+1)×(n_y+1)} × R^{(n_x+1)×(n_y+1)} are discrete approximations to the order parameter V : R^2 → C and vector potential A : R^2 → R^2 at the equally spaced grid points ((k−1)h_x, (l−1)h_y), 1 ≤ k ≤ n_x+1, 1 ≤ l ≤ n_y+1. Periodicity conditions are used to express the problem in terms of the variables v_{k,l}, a^{(1)}_{k,l}, and a^{(2)}_{k,l} for 1 ≤ k ≤ n_x+1, 1 ≤ l ≤ n_y+1. Moreover, both the real and imaginary components of v are treated as variables; thus, the problem has 4 n_x n_y variables. For the study conducted in this section, n = 4 n_x n_y with n_x = n_y. Moreover, the Ginzburg-Landau constant, κ, and the number of vortices, n_v, were set to κ = 5 and n_v = 8. The code used for the tests of this section was obtained from Lenton [2005], which also contains a hand-coded gradient file.⁶ For the remainder of this section, the objective function will be denoted by f, where f : R^n → R, and the gradient function will be denoted by g, where g : R^n → R^n.

⁶ The files obtained from Lenton [2005] unpacked the decision vector by projecting into a three-dimensional array. The code was slightly modified to project only to a two-dimensional array in order to allow for use with the ADiGator tool.

In order to test the efficiency of the ADiGator tool at both the first and second derivative levels, both the objective and gradient functions, f and g, were differentiated. Thus, three different tests were performed by computing (1) the first derivative of the objective, ∂f/∂x; (2) the first derivative of the gradient, ∂g/∂x; and (3) the second derivative of the objective, ∂²f/∂x², where ∂g/∂x = ∂²f/∂x². The aforementioned derivatives were computed using the ADiGator, ADiMat, INTLAB, and MAD tools, and results are given in Table V. Additionally, Table V provides the theoretical derivative-to-function CPU ratios that would be required if a finite difference were used, along with the derivative-to-function ratio of the hand-coded gradient file. The results presented in Table V were obtained as follows. For the gradient computation, ∂f/∂x, the tested AD tools were applied to the objective function, where ADiMat was used in the reverse scalar mode, and INTLAB and MAD were used in the sparse forward modes. Additionally, the hand-coded gradient g was evaluated in order to compute the hand-coded ratios, and the ratio given for a finite difference is equal to n+1. For the Jacobian computation, ∂g/∂x, the tested AD tools were applied to the hand-coded gradient function, where ADiMat was used in the forward compressed scalar mode, INTLAB was used in the sparse forward mode, and MAD was used in the compressed forward mode. The ratios given for a sparse finite difference are (c+1) times the hand-coded gradient ratios, where c is the number of Hessian colors provided in the table. For the Hessian computation, ∂²f/∂x², the tested AD tools were applied again to the objective function, where ADiMat was used in the compressed forward over scalar reverse mode (with operator overloading for the forward computation, t1rev option of admHessian), INTLAB was used in the sparse second-order forward mode, and MAD was used in the compressed forward mode over the sparse forward mode. The ratios given for a finite difference are equal to (n+1)(c+1), the number of function evaluations required to approximate the Hessian via a central difference. As witnessed from Table V, the ADiGator tool performs quite well at run-time compared to the other methods. While the hand-coded gradient may be evaluated faster than the ADiGator-generated gradient file, the ADiGator-generated file is, at worst, only five times slower, and is generated automatically.
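A sketch of how the three derivative files might be generated is given below. The file names glfun (objective) and glgrad (hand-coded gradient) are hypothetical, and the calls assume the adigator, adigatorCreateDerivInput, and adigatorCreateAuxInput entry points as described in the ADiGator documentation; consult the toolbox for the exact syntax.

    % Hedged sketch: generating df/dx, dg/dx, and d2f/dx2 files.
    n = 1024;                                   % example dimension
    x = adigatorCreateDerivInput([n 1], 'x');   % input with identity seed

    adigator('glfun',  {x}, 'glfun_x');         % (1) objective gradient df/dx
    adigator('glgrad', {x}, 'glgrad_x');        % (2) gradient Jacobian dg/dx

    % (3) The generated file is itself MATLAB source and may in turn be
    % differentiated; its input is a structure with fields .f and .dx.
    x2.f  = x;
    x2.dx = adigatorCreateAuxInput([n 1]);      % unknown-valued seed entries
    adigator('glfun_x', {x2}, 'glfun_xx');      % second derivative d2f/dx2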

Table V: Derivative to function CPU ratios for the 2-D Ginzburg-Landau problem for increasing values of n, where n = 4 n_x n_y and n_x = n_y.

n:            16     64      256     1024     4096      16384
Ratios CPU(∂f/∂x)/CPU(f)
  ADiGator:   6.3    6.3     7.0     9.6      10.9      12.0
  ADiMat:     86.9   84.6    80.9    68.7     52.9      21.3
  INTLAB:     67.9   67.1    65.4    60.1     57.7      41.0
  MAD:        123.6  121.2   118.3   112.9    142.3     240.9
  fd:         17.0   65.0    257.0   1025.0   4097.0    16385.0
  hand:       3.8    4.2     4.2     3.8      3.8       2.5
Ratios CPU(∂g/∂x)/CPU(f)
  ADiGator:   33.0   38.0    39.1    39.3     49.6      50.4
  ADiMat:     632.5  853.1   935.6   902.1    731.4     420.4
  INTLAB:     518.7  530.4   514.7   460.0    414.1     249.1
  MAD:        896.2  876.9   838.9   724.3    579.8     267.8
  sfd:        64.9   87.3    100.5   99.8     95.6      66.2
Ratios CPU(∂²f/∂x²)/CPU(f)
  ADiGator:   9.7    10.7    13.1    20.5     45.4      62.9
  ADiMat:     944.5  926.5   889.2   819.9    727.4     393.0
  INTLAB:     102.4  102.3   138.4   2102.4   47260.0   -
  MAD:        531.1  527.5   584.3   1947.6   19713.8   -
  fd:         289.0  1365.0  6168.0  26650.0  102425.0  426010.0
Hessian Information
  # colors:   16     20      23      25       24        25
  % non-zero: 62.50  19.53   4.88    1.22     0.31      0.08

As seen in Table V, the files generated by ADiGator are quite efficient at run-time. The derivative files generated by ADiGator are, however, only valid for a fixed dimension; thus, one cannot disregard file generation times. In order to investigate the efficiency of the ADiGator transformation routine, absolute derivative file generation times together with absolute objective file evaluation times are given in Table VI. This table shows that the costs of generating the objective gradient file, ∂f/∂x, and the gradient Jacobian file, ∂g/∂x, are relatively small, while the cost of generating the objective Hessian file becomes quite expensive at n = 16384. Simply reporting the file generation times, however, does not fully put into perspective the trade-off between file generation costs and run-time efficiency gains. In order to do so, a “cost of derivative computation” metric is formed, based on the number of Hessian evaluations required to solve the GL2 minimization problem. To this end, the GL2 problem was solved using the MATLAB unconstrained minimization solver, fmincon, in the full-Newton mode with ADiGator-supplied derivatives, and the number of required Hessian computations was recorded. Using the data of Tables V and VI, the metric was computed as the total time to perform the k required Hessian computations, using all of the tested AD tools. The results from this computation are given in Table VII, where two costs are given for using the ADiGator tool, one of which takes into account the time required to generate the derivative files. Due to the relatively low number of required Hessian evaluations, it is seen that the ADiGator tool is not always the best option when one factors in the file generation time. That being said, for this test example, files which compute the objective gradient and Hessian sparsity pattern are readily available, and thus the time required to generate them (automatically or otherwise) is not considered. Moreover, when using the ADiGator tool, one obtains the Hessian sparsity pattern and an objective gradient file as a direct result of the Hessian file generation.
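As a worked example of the metric, consider the AD over AD case at n = 16384: with k = 26 Hessian evaluations, the Table V ratio of 62.9, and the Table VI objective evaluation time of 1.2611 ms, the cost is 26 × 62.9 × 1.2611 ms ≈ 2.06 s; adding the Table VI generation times of the gradient file (0.90 s) and the Hessian file (37.75 s) yields the corresponding ADiGator∗ entry of 40.71 s in Table VII.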

Table VI: ADiGator file generation times and objective function evaluation times for the 2-D Ginzburg-Landau problem. Shown is the time required for ADiGator to perform the transformations: objective function f to objective gradient function ∂f/∂x, gradient function g to Hessian function ∂g/∂x, and gradient function ∂f/∂x to Hessian function ∂²f/∂x².

n:            16      64      256     1024    4096    16384
ADiGator File Generation Time (s)
  ∂f/∂x:      0.51    0.51    0.52    0.53    0.58    0.90
  ∂g/∂x:      2.44    2.51    2.51    2.57    2.89    4.33
  ∂²f/∂x²:    2.12    2.13    2.23    2.33    4.85    37.75
Objective Function Evaluation Time (ms)
  f:          0.2795  0.2821  0.2968  0.3364  0.4722  1.2611

7.4. Large-Scale Nonlinear Programming

Consider the following nonlinear program (NLP) that arises from the discretization of a scaled version of the minimum time-to-climb (of a supersonic aircraft) optimal control problem described in Darby et al. [2011]. The problem is discretized using the multiple-interval formulation of the Legendre-Gauss-Radau (LGR) orthogonal collocation method as described in Garg et al. [2010]. This problem was studied in Weinstein and Rao [2016] and is revisited in this paper as a means of investigating the use of the vectorized mode of the ADiGator algorithm. The problem is to determine the values of the vectorized variable X ∈ R^{4×N},

    X = \begin{bmatrix} Y \\ U \end{bmatrix}, \qquad Y \in \mathbb{R}^{3 \times N}, \quad U \in \mathbb{R}^{1 \times N},    (15)

the support points s ∈ R^3, and the parameter β, which minimize the cost function

J = β (16)

subject to the nonlinear algebraic constraints

    C(X, s, \beta) = [\,Y \;\; s\,]\,D^{\mathsf{T}} - \frac{\beta}{2}\,F(X) = 0 \in \mathbb{R}^{3 \times N}    (17)

and simple bounds

    X_{\min} \le X \le X_{\max}, \quad s_{\min} \le s \le s_{\max}, \quad \beta_{\min} \le \beta \le \beta_{\max}.    (18)

The matrix D ∈ R^{N×(N+1)} in Eq. (17) is the LGR differentiation matrix [Garg et al. 2010], and the function F(X) : R^{4×N} → R^{3×N} in Eq. (17) is the vectorized function

    F(X) = [\, f(X_1) \;\; f(X_2) \;\; \cdots \;\; f(X_N) \,],    (19)


Table VII: Cost of differentiation for the 2-D Ginzburg-Landau problem. A “cost of differentiation” metric is given as the time required to perform k Hessian evaluations, where k is the number of Hessian evaluations required to solve the GL2 optimization problem using fmincon with a trust-region algorithm and function tolerance of sqrt(ε_machine). Results are presented for three cases of computing the Hessian: sparse finite-differences over AD, AD over the hand-coded gradient, and AD over AD. Times listed for ADiGator∗ are the times required to compute k Hessians plus the time required to generate the derivative files, as given in Table VI.

n:            16    64    256    1024   4096    16384
# Hes Eval:   9     11    26     21     24      26
# colors:     16    20    23     25     24      25
Cost of Differentiation: sfd over AD (s)
  ADiGator:   0.27  0.41  1.29   1.76   3.08    10.23
  ADiGator∗:  0.78  0.92  1.81   2.29   3.66    11.13
  ADiMat:     3.71  5.51  14.99  12.63  15.00   18.19
  INTLAB:     2.90  4.37  12.11  11.04  16.36   34.94
  MAD:        5.29  7.90  21.90  20.73  40.33   205.34
  hand:       0.16  0.27  0.78   0.70   1.08    2.17
Cost of Differentiation: AD over Hand-Coded (s)
  ADiGator:   0.08  0.12  0.30   0.28   0.56    1.65
  ADiGator∗:  2.52  2.63  2.81   2.85   3.45    5.98
  ADiMat:     1.59  2.65  7.22   6.37   8.29    13.79
  INTLAB:     1.30  1.65  3.97   3.25   4.69    8.17
  MAD:        2.25  2.72  6.47   5.12   6.57    8.78
Cost of Differentiation: AD over AD (s)
  ADiGator:   0.02  0.03  0.10   0.14   0.51    2.06
  ADiGator∗:  2.65  2.68  2.85   3.01   5.94    40.71
  ADiMat:     2.38  2.87  6.86   5.79   8.24    12.88
  INTLAB:     0.26  0.32  1.07   14.85  535.61  †
  MAD:        1.34  1.64  4.51   13.76  223.42  †

∗ Includes the cost of ADiGator file generation.
† Could not be computed due to memory overruns.

where X_i ∈ R^4 refers to the ith column of X and f : R^4 → R^3 is defined as

    f_1(x) = x_2 \sin x_3,
    f_2(x) = \frac{1}{c_1}\left[ \zeta\big(T(x_1), x_1, x_2\big) - \theta\big(T(x_1), \rho(x_1), x_1, x_2\big) \right] - c_2 \sin x_3,    (20)
    f_3(x) = \frac{c_2}{x_2}\left( x_4 - \cos x_3 \right).

The functions ζ(T, x_1, x_2) and θ(T, ρ, x_1, x_2) respectively compute coefficients of thrust and drag, as described in Seywald et al. [1994]. Moreover, the functions T(h) and ρ(h) are modified from the smooth functions of Darby et al. [2011] to the following piecewise continuous functions taken from NOAA [1976]:

    \rho(h) = \frac{c_3\, p(h)}{T(h)},    (21)


where

    \big(T(h),\, p(h)\big) = \begin{cases} \left( c_4 - c_5 h,\;\; c_6 \left[ T(h)/c_7 \right]^{c_8} \right), & h < 11, \\ \left( c_9,\;\; c_{10}\, e^{\,c_{11} - c_{12} h} \right), & \text{otherwise}. \end{cases}    (22)

Unlike the implementation considered in Weinstein and Rao [2016], Eq. (22) is implemented as a sequence of logical reference and assignment operations, as sketched below.
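As a hedged illustration of such an implementation, the sketch below evaluates Eq. (22) over a (possibly vectorized) altitude array using logical reference and assignment; the constants c4 through c12 are placeholders for the problem’s numeric values.

    % Sketch: Eq. (22) via logical reference/assignment, the form that is
    % differentiated through ADiGator's overloaded indexing operations.
    function [T, p] = atmosphere(h, c4, c5, c6, c7, c8, c9, c10, c11, c12)
    T = zeros(size(h));
    p = zeros(size(h));
    lo = h < 11;                          % lower-altitude branch of Eq. (22)
    T(lo)  = c4 - c5*h(lo);
    p(lo)  = c6*(T(lo)/c7).^c8;
    T(~lo) = c9;                          % upper-altitude branch of Eq. (22)
    p(~lo) = c10*exp(c11 - c12*h(~lo));
    end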

In the first portion of the analysis of this problem, the NLP of Eqs. (16)–(18) is solved for increasing values of N. The number of LGR points in each interval is fixed to four and the number of mesh intervals, K, is varied; the number of LGR points is thus N = 4K. The NLP is first solved on an initial mesh with K = 4 intervals. The number of mesh intervals is then doubled sequentially, where the solution of the previous NLP is used to generate an initial guess for the next NLP. Mesh intervals are equally spaced for all values of K = 4, 8, 16, 32, 64, 128, 256, 512, 1024, and the NLP is solved with the NLP solver IPOPT [Biegler and Zavala 2008; Waechter and Biegler 2006] in both the quasi-Newton (first-derivative) and full-Newton (second-derivative) modes. Moreover, derivatives of the NLP were computed via ADiGator using two different approaches. In the first, non-vectorized, approach, the ADiGator tool is applied directly to the function which computes C(X, s, β) of Eq. (17) in order to compute the constraint Jacobian and the Lagrangian Hessian. In the second, vectorized, approach, the ADiGator tool is applied in the vectorized mode to the function which computes F(X). The ADiGator-computed derivatives of F(X) are then used to construct the NLP constraint Jacobian and Lagrangian Hessian (using discretization separability as described in Betts [2009]), as sketched below. Results of the tests are shown in Table VIII. In the presented table, solution times are broken into two categories: initialization time and NLP solve time. When using the non-vectorized approach, the initialization time is the time required to generate derivative files prior to solving each NLP. When using the vectorized approach, derivative files must only be generated a single time (shown as mesh # 0). The initialization time required of the vectorized approach at each subsequent mesh iteration is the time required to compute the derivative of the linear portion of C (that is, [Y s]Dᵀ) plus the time required to determine sparsity patterns of the constraint Jacobian and Lagrangian Hessian, given sparsity patterns of ∂f/∂x and ∂²f/∂x². It is seen in Table VIII, for both quasi-Newton and full-Newton solutions, that the use of the vectorized mode reduces both initialization times and NLP solve times. The reason for the reduction in initialization times is fairly straightforward: when using the vectorized mode, derivative files are valid for any value of N and thus must only be generated a single time. The reason for the reduction in run times is two-fold. First, by separating the nonlinear portion of C from the linear portion, the first derivatives of the linear portion must only be computed a single time for each mesh, rather than at run-time. Second, as discussed in Section 5, run-time indexing overheads are effectively reduced by an order of N when using the vectorized mode over the non-vectorized mode. This reduction of run-time indexing overheads is greatly emphasized in the results from the full-Newton mode, where many more indexing operations are required at the second derivative level than at the first.
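To illustrate the separability argument (a sketch under assumed names, not the paper’s code): the Jacobian of the vectorized function F(X) is block diagonal, with one 3×4 block ∂f(X_i)/∂X_i per LGR point. Assuming a 3×4×N array Jf of per-point Jacobians recovered from the vectorized derivative file, and variables ordered column-wise as x = X(:), the sparse Jacobian may be assembled as follows.

    % Block-diagonal Jacobian of vec(F) with respect to vec(X).
    rows = repmat(reshape(1:3*N, 3, 1, N), [1 4 1]);  % row of each entry
    cols = repmat(reshape(1:4*N, 1, 4, N), [3 1 1]);  % column of each entry
    dFdx = sparse(rows(:), cols(:), Jf(:), 3*N, 4*N);
    % The constraint Jacobian of Eq. (17) then adds the constant derivative
    % of the linear term [Y s]*D' (computed once per mesh) to -(beta/2)*dFdx.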

In the next part of the analysis of this example, the vectorized function F(X) was differentiated for increasing values of N using a variety of well-known MATLAB AD tools. At the first-derivative level, the ADiGator tool was used in the vectorized and non-vectorized modes, ADiMat was used in the compressed scalar forward mode, INTLAB was used in the sparse forward mode, and MAD was used in the compressed forward mode. At the second-derivative level, the ADiGator tool was again used in the vectorized and non-vectorized modes, ADiMat was used in the compressed forward over scalar reverse mode (with operator overloading for the forward computation), INTLAB was used in the sparse second-order forward mode, and MAD was used in the compressed forward over compressed forward mode. Results are shown in Fig. 2. At the second-derivative level, the ADiMat tool was used to compute ∂²(λᵀF†)/∂X², where λ ∈ R^{3N} and F† ∈ R^{3N} is the one-dimensional transformation of F. All other tools were used entirely in the forward mode, and thus were simply used to compute ∂²F/∂X². The computation time CPU(∂²(λᵀF†)/∂X²) was then computed as the time required to compute ∂²F/∂X² plus the time required to pre-multiply ∂²F†/∂X² by λᵀ. It is noted that, where relevant, the number of colors used for both the first and second derivatives is always equal to four, as the vectorized problem has the convenient pre-defined seed matrix given in Eq. (9). As seen in Fig. 2, the ratios for all tested tools tend to decrease as the value of N is increased. This increase in efficiency is due to a reduction in the relevance of run-time overheads (very apparent for the operator-overloaded INTLAB and MAD tools at the first-derivative level), together with the fact that the derivatives become sparser as N is increased (while the number of colors remains constant). When comparing the results obtained from using ADiGator in the vectorized mode versus the non-vectorized mode, it is seen that the ratios diverge as the dimension of the problem is increased. These differences in computation times are due strictly to run-time overheads associated with reference and assignment indexing.


Table VIII: NLP solution times for the minimum time-to-climb problem using IPOPT and ADiGator. The NLP is solved for increasing values of K using IPOPT in both the quasi-Newton and full-Newton modes. For both modes, the ADiGator tool is used in two different ways. In the non-vectorized case (labeled non-vect), ADiGator is applied directly to the functions of the NLP. In the vectorized case (labeled vect), ADiGator is applied in the vectorized mode to the function F(X).

mesh #:          0     1     2     3     4     5     6     7     8     9      Total
K:               -     4     8     16    32    64    128   256   512   1024   -
Quasi-Newton
  # jac eval:    -     63    63    93    92    92    71    62    145   52     733
  Initialization Time (s)
    non-vect:    -     1.18  1.17  1.18  1.18  1.19  1.23  1.35  1.62  2.27   12.4
    vect:        0.97  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.01   1.0
  NLP Solve Time (s)
    non-vect:    -     0.59  0.61  0.96  1.12  1.45  1.65  2.37  9.41  7.00   25.2
    vect:        -     0.56  0.51  0.82  0.95  1.21  1.35  1.95  7.82  5.80   21.0
Full-Newton
  # jac eval:    -     41    14    17    17    18    18    20    20    23     188
  # hes eval:    -     40    13    16    16    17    17    19    19    22     179
  Initialization Time (s)
    non-vect:    -     5.37  5.38  5.43  5.47  5.64  6.06  7.40  11.09 24.04  75.9
    vect:        4.59  0.00  0.00  0.00  0.00  0.01  0.03  0.13  0.46  1.78   7.0
  NLP Solve Time (s)
    non-vect:    -     0.86  0.40  0.48  0.52  0.66  0.89  1.48  2.50  5.73   13.5
    vect:        -     0.85  0.23  0.30  0.32  0.41  0.54  0.90  1.51  3.33   8.4


8. DISCUSSION

The presented algorithm has been designed to perform a great deal of analysis at transformation-time in order to generate the most efficient derivative files possible. The results of Section 7 show these ADiGator-generated derivative files to be quite efficient at run-time when compared to other well-known MATLAB AD tools. The presented results also show, however, that transformation times can become quite large as problem dimensions increase. This is particularly the case when dealing with functions containing explicit loops (for example, Burgers’ ODE) or those which perform matrix operations (for example, the GL2 minimization problem). Even so, the ADiGator algorithm is well suited for applications requiring many repeated derivative computations, where the cost of file generation becomes less significant as the number of required derivative computations is increased.


[Figure: two plots comparing ADiGator (vect), ADiGator (non-vect), ADiMat, INTLAB, and MAD. (a) log2(CPU(∂F/∂X)/CPU(F)) vs. log2(N). (b) log2(CPU(∂²(λᵀF†)/∂X²)/CPU(F)) vs. log2(N).]

Fig. 2: Jacobian and Hessian to function CPU ratios for the minimum time-to-climb vectorized function. All ratios given in binary logarithm format.


The fact that the method has been implemented in the MATLAB language is both an advantage and a hindrance. Due to MATLAB’s high-level array and matrix operations, the corresponding overloaded operations are granted a great deal of information at transformation-time. The overloaded operations can thus print derivative procedures that are optimal for the sequence of elementary operations which the high-level operation performs, rather than printing derivative procedures that are optimal for each of the individual elementary operations. In order to exploit derivative sparsity, however, the overloaded operations print derivative procedures which typically operate only on vectors of non-zero derivatives. Moreover, the derivative procedures rely heavily on index array reference and assignment operations (for example, a(ind), where ind is a vector of integers). Due to the interpreted nature of the MATLAB language, such reference and assignment operations are penalized at run-time by MATLAB’s array bounds checking mechanism, where the penalty is proportional to the length of the index vector (for example, ind) [Menon and Pingali 1999]. Moreover, the lengths of the used index vectors are proportional to the number of non-zero derivatives at each link in the chain rule. Thus, as problem sizes increase, so do the derivative run-time penalties associated with the array bounds checking mechanism. The increase in run-time overheads is manifested in the results of Fig. 1 and Table V of Sections 7.1 and 7.3, respectively. For both problems, the Jacobians become relatively sparser as the problem dimensions are increased. One would thus expect the Jacobian to function CPU ratios to decrease as problem dimensions increase. The witnessed behavior of the ADiGator tool is, however, the opposite, and is attributed to the relative increase in run-time overhead due to indexing operations. When studying the vectorized problem of Section 7.4, the indexing run-time overheads may actually be quantified as the difference in run times between the vectorized and non-vectorized generated files. From the results of Fig. 2, it is seen that, at small values of N, the differences in run times are negligible. At N = 4096, however, the non-vectorized ADiGator-generated first and second derivative files spent at least 32% and 42%, respectively, of the computation time performing array bounds checks arising from indexing operations.
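To make the indexing pattern concrete, the fragment below shows the general shape of such a statement (illustrative only, not taken from an actual generated file): the non-zero derivatives are propagated through a precomputed integer index vector, and the bounds-check cost grows with the length of that vector.

    % Illustrative shape of a sparse derivative statement: only the
    % non-zero derivatives of x propagate to y, selected by ind.
    y.f  = sin(x.f);
    y.dx = cos(x.f(ind)).*x.dx;    % run-time cost scales with length(ind)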

9. CONCLUSIONS

A toolbox called ADiGator has been described for algorithmically differentiating mathematical functions in MATLAB. ADiGator statically exploits sparsity at each link in the chain rule in order to produce run-time efficient derivative files; it does not require any a priori knowledge of derivative sparsity, but instead determines derivative sparsity as a direct result of the transformation process. The algorithm is described in detail and is applied to four examples of varying complexity. It is found that the derivative files produced by ADiGator are quite efficient at run-time when compared to those of other well-known AD tools. The generated derivative files are, however, valid only for fixed dimensional inputs, and thus the cost of file generation cannot be overlooked. Finally, it is concluded that the ADiGator tool is well suited for applications requiring many repeated derivative computations.

REFERENCES

ASCHER, U., MATTHEIJ, R., AND RUSSELL, R. 1995. Numerical Solution of Boundary Value Problems for Ordinary Differential Equations. Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania.

AVERICK, B. M., CARTER, R. G., AND MORÉ, J. J. 1991. The MINPACK-2 test problem collection.

BETTS, J. T. 2009. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, 2nd ed. SIAM Press, Philadelphia.

BIEGLER, L. T. AND ZAVALA, V. M. 2008. Large-scale nonlinear programming using IPOPT: An integrating framework for enterprise-wide optimization. Computers and Chemical Engineering 33, 3 (March), 575–582.

BISCHOF, C. H., BÜCKER, H. M., LANG, B., RASCH, A., AND VEHRESCHILD, A. 2002. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002). IEEE Computer Society, Los Alamitos, CA, USA, 65–72.

COLEMAN, T. F. AND VERMA, A. 1998a. ADMAT: An Automatic Differentiation Toolbox for MATLAB. Technical Report. Computer Science Department, Cornell University.

COLEMAN, T. F. AND VERMA, A. 1998b. The efficient computation of sparse Jacobian matrices using automatic differentiation. SIAM Journal on Scientific Computing 19, 4, 1210–1233.

DARBY, C. L., HAGER, W. W., AND RAO, A. V. 2011. Direct trajectory optimization using a variable low-order adaptive pseudospectral method. Journal of Spacecraft and Rockets 48, 3 (May–June), 433–445.

FORTH, S. A. 2006. An efficient overloaded implementation of forward mode automatic differentiation in MATLAB. ACM Transactions on Mathematical Software 32, 2 (April–June), 195–222.

GARG, D., PATTERSON, M. A., HAGER, W. W., RAO, A. V., BENSON, D. A., AND HUNTINGTON, G. T. 2010. A unified framework for the numerical solution of optimal control problems using pseudospectral methods. Automatica 46, 11 (November), 1843–1851.

GRIEWANK, A. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Frontiers in Applied Mathematics. SIAM Press, Philadelphia, Pennsylvania.

GRIEWANK, A. 2014. Automatic differentiation. In The Princeton Companion to Applied Mathematics, Nicholas Higham, Ed. Princeton University Press.

HUANG, W., REN, Y., AND RUSSELL, R. D. 1994. Moving mesh methods based on moving mesh partial differential equations. Journal of Computational Physics 113, 279–290.

KHARCHE, R. V. AND FORTH, S. A. 2006. Source transformation for MATLAB automatic differentiation. In Computational Science – ICCS 2006, Lecture Notes in Computer Science, V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, Eds. Vol. 3994. Springer, Heidelberg, Germany, 558–565.

LENTON, K. 2005. An efficient, validated implementation of the MINPACK-2 test problem collection in MATLAB. M.S. thesis, Cranfield University (Shrivenham Campus), Applied Mathematics & Operational Research Group, Engineering Systems Department, RMCS Shrivenham, Swindon SN6 8LA, UK.

MATHWORKS. 2014. MATLAB Version R2014b. The MathWorks Inc., Natick, Massachusetts.

MENON, V. AND PINGALI, K. 1999. A case for source-level transformations in MATLAB. SIGPLAN Notices 35, 1 (Dec.), 53–65.

NOAA. 1976. U.S. Standard Atmosphere, 1976. National Oceanic and Atmospheric Administration, Washington, D.C.

PATTERSON, M. A., WEINSTEIN, M. J., AND RAO, A. V. 2013. An efficient overloaded method for computing derivatives of mathematical functions in MATLAB. ACM Transactions on Mathematical Software 39, 3 (July), 17:1–17:36.

RUMP, S. M. 1999. INTLAB – INTerval LABoratory. In Developments in Reliable Computing, T. Csendes, Ed. Kluwer Academic Publishers, Dordrecht, Germany, 77–104.

SEYWALD, H., CLIFF, E. M., AND WELL, K. H. 1994. Range optimal trajectories for an aircraft flying in the vertical plane. Journal of Guidance, Control, and Dynamics 17, 2, 389–398.

SHAMPINE, L. F. AND REICHELT, M. W. 1997. The MATLAB ODE suite. SIAM Journal on Scientific Computing 18, 1, 1–22.

WAECHTER, A. AND BIEGLER, L. T. 2006. On the implementation of a primal-dual interior-point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming 106, 1 (March), 25–57.

WEINSTEIN, M. J. AND RAO, A. V. 2016. A source transformation via operator overloading method for the automatic differentiation of mathematical functions in MATLAB. ACM Transactions on Mathematical Software 42, 2, 11:1–11:44.
