
Compilation aspects of Automatic Differentiation by examples

Christèle Faure∗

Presented at "Les séminaires du projet A3" in March 2000

1 Introduction

This document describes challenging problems general enough to serve as target applications for the compilation community. This tentative description is rather naïve in the vocabulary used as well as in the examples given. From our point of view, Automatic Differentiation could be an ideal application domain for compilation techniques.

But one must stay really close to the final objective of Automatic Differentiation, which is the extraction of derivative information from an original code. From a practical point of view, there is no point in computing really complicated information if the process of generating the derivative code cannot take advantage of it. Another key point is that AD tools must handle really large codes (100,000 lines), so precise analysis can only be performed on small parts of the original code.

We try to give samples (written in Fortran 77) for all the points which are developed in this document. These examples come from various applications treated with Odyssée, but are general enough to be representative of AD problems.

2 AD basics

AD technology is based on the use of the chain rule to differentiate a program as the composition of elementary functions. Each elementary function is implemented within a statement which is easily differentiated, as shown in Section 2.1. From this, a straight-line program is also derived in a straightforward manner, as shown in Section 2.2. The only thing to be dealt with is the overwriting of variables in reverse mode. A general program can be differentiated if the control flow is managed. In reverse mode, the management of the control is made difficult by the necessity to go back in the computation, as shown in Section 2.3.

2.1 One assignment

The assignment A shown in Figure 1(a) can be seen as the mathematical elementary function f whose input and output domains are R^2 and R. In order to build up the composition of the elementary functions corresponding to a sequence of statements, one has to extend the input and output domains: to R^3 and R^3 in our example. This leads us to consider all the variables involved in the computation as potential inputs and outputs.

∗Email : [email protected]


The product of the Jacobian matrix of A by some direction (dX, dY, dZ)_i can easily be built. Figure 2 shows two equivalent representations of this computation using mathematical notation: Figure 2(a) shows the matrix computation and Figure 2(b) shows the equivalent scalar computation, where the subscripts i, o indicate input and output respectively.

The product of the transposed Jacobian matrix of A by the direction in the dual space (dX*, dY*, dZ*)_i can also easily be built. Figure 3 shows two equivalent representations of this computation in the same manner as Figure 2.

From the two scalar computations shown in Figure 2(b) and Figure 3(b), one can simply derive the corresponding tangent and cotangent straight-line programs. We introduce the i and o subscripts because a mathematical variable can be set but can never be modified in place. On the contrary, computer variables are memory locations and can therefore be modified in place. In order to optimise the code in terms of the number of intermediate variables, the X_i and X_o variables are identified and denoted by X. The statement dXo = dXi is then transformed into dX = dX and discarded. Figures 1(b) and 1(c) show respectively the optimised tangent code and the optimised cotangent code of A.
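As a sanity check, the tangent and cotangent codes of Figure 1 can be compared through the dot-product identity (J d) . e = d . (J^T e). The following is an illustrative Python transcription (the function names are ours, not part of Odyssée):

```python
# Illustrative Python translation of the optimised tangent and cotangent
# codes of Figure 1 for the assignment A: Z = X * Y**2.

def a_tangent(x, y, dx, dy):
    # A': dZ = Y**2 * dX + 2*X*Y*dY
    return y**2 * dx + 2 * x * y * dy

def a_cotangent(x, y, dzb):
    # A*: dX = dX + Y**2*dZ ; dY = dY + 2*X*Y*dZ ; dZ = 0.
    dxb = y**2 * dzb
    dyb = 2 * x * y * dzb
    return dxb, dyb

# Dot-product test: (J d) . e == d . (J^T e) for any direction d and dual e.
x, y = 3.0, 2.0
dx, dy = 0.5, -1.0           # input direction d
e = 4.0                      # output dual direction e
lhs = a_tangent(x, y, dx, dy) * e
dxb, dyb = a_cotangent(x, y, e)
rhs = dx * dxb + dy * dyb
assert abs(lhs - rhs) < 1e-12
```

This identity is the standard consistency check between a tangent and a cotangent code: it must hold for every choice of d and e.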

2.2 Straight line programs

Let us consider two new statements B (Example 4(a)) and C (Example 5(a)). They are both differentiated using the same method as for A, as shown in Example 4 and Example 5.

The sequence A;B (Example 6(a)) can be differentiated in a straightforward way using the derivatives of A and B because both computations are independent. The tangent code (A;B)' (Example 6(b)) of A;B is simply the sequence A';B' computed with the initial values X0 and Y0 of X and Y. The cotangent code (A;B)* is B*;A* computed with the initial values of X and Y. The only constraint here is to run B after B' and B*.

On the contrary, the sequence B;A introduces a dependency between B and A because Y is computed in B and used in A. Let us denote by Y0 the initial value of Y and by Y1 its value after the execution of B: B' and B* must be evaluated on Y=Y0 whereas A' and A* must be evaluated on Y=Y1. This dependency constraint on the original variables leads to the generation of (B;A)' as shown in Example 7(b): as a result, statement B is executed between B' and A'. This is the general method for generating the tangent code: the derivative statement is executed before the original one. In reverse mode a new constraint is added: if statement B is executed before A in the original code, B* must be executed after A*. As a result some storage (or recomputation) must be performed to have the correct value of Y at the right time. If one stores the two necessary values Y0 and Y1, the generated code is Example 7(c). The general method for generating cotangent codes is to execute first the forward part (the original function plus the storage of the modified necessary values) and then the backward part (the derivative function plus the retrieval of the modified necessary values).
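The forward-sweep/backward-sweep scheme of Example 7(c) can be sketched in Python (an illustration of the scheme, not the paper's Fortran; the stack stands for the trajectory storage):

```python
# Hedged Python sketch of the cotangent of the sequence B;A (Example 7(c)):
# the forward sweep stores the successive values of Y, the backward sweep
# restores them before each adjoint statement.

def ba_cotangent(x, y, dxb, dyb, dzb):
    stack = []
    # forward sweep: original code plus storage of the modified values
    stack.append(y)              # Y0
    y = y**3 * x**2              # B
    stack.append(y)              # Y1
    z = x * y**2                 # A
    # backward sweep: retrieval plus adjoint statements
    y = stack.pop()              # Y = Y1
    dxb += y**2 * dzb            # A*
    dyb += 2 * x * y * dzb
    dzb = 0.0
    y = stack.pop()              # Y = Y0
    dxb += 2 * x * y**3 * dyb    # B*
    dyb = 3 * x**2 * y**2 * dyb
    return z, dxb, dyb, dzb

# Seeding dZ* = 1 recovers the gradient of Z at (x, y) = (2, 1.5).
z, dxb, dyb, dzb = ba_cotangent(2.0, 1.5, 0.0, 0.0, 1.0)
assert abs(dxb - 911.25) < 1e-9 and abs(dyb - 1458.0) < 1e-9
```

Note that the adjoint of A must indeed see Y1 while the adjoint of B must see Y0; popping the values in reverse order enforces this.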

The generation methods illustrated in Example 7 are the standard ones: they are certainly correct but suboptimal in some cases.

2.3 Complex statements

The control used in the original source is reproduced in the derivative code. For example, the loop of Example 8(a) is differentiated into Example 8(b) in direct mode: the loop is reproduced and the body A is replaced by A'. In reverse mode, the same loop can be differentiated using various methods: Example 8(c) or Example 8(d). In reverse mode, the number of scalar values to be stored is quite different from one example to the other.
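The store-all strategy for reverse-mode loop differentiation can be sketched as follows, taking for A the assignment Y = Y**2 (our own toy body, chosen for checkability):

```python
# Minimal sketch of reverse-mode loop differentiation: the forward loop
# stores the successive values of Y on a stack, the reversed loop runs
# the adjoint body A* on the restored values.

def loop_cotangent(y, n, dyb):
    stack = []
    for _ in range(n):        # forward: original loop plus trajectory storage
        stack.append(y)
        y = y**2              # A
    for _ in range(n):        # backward: loop reproduced in reverse order
        yi = stack.pop()
        dyb = 2 * yi * dyb    # A*
    return y, dyb

# Y = ((2**2)**2)**2 = 256 and dY/dY0 = 8 * Y0**7 = 1024.
assert loop_cotangent(2.0, 3, 1.0) == (256.0, 1024.0)
```

The stack here plays the role of the trajectory of Example 8(c); Example 8(d) would additionally record which block was executed at each step.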

The same is performed for if-then-else and do-while commands, etc. As for straight-line program differentiation, these strategies could certainly be refined to get optimal (in storage or recomputation) derivative codes. For example, in reverse mode one problem is: what is the smallest amount of control values to be stored to be able to follow the call-graph backward?

3 AD internals

In this document, we are only looking at AD as a compile-time technique. Any such tool is based on the composition of phases, which are transformation phases as well as analysis phases. In this way AD can be seen as a specific source transformation. The main difference with other general transformations (translation from one language to another, optimisation, ...) is that the generated code does not only compute the original values but some others which have a mathematical meaning. Another crucial difference is that the analyses to be performed are not local but global over the whole program.

In the following sections, we briefly describe the main phases necessary to differentiate a code.

3.1 Pre-process

This phase consists of a standardisation of the original code before differentiation. The transformations performed at this level depend on the treated language as well as on the limitations of the system.

Within Odyssée this phase consists in:

• rewriting the non-differentiable operators abs, min into if-statements,

• in-lining of the function statements,

• splitting of expressions into equivalent binary expressions by introducing intermediate variables.

3.2 AD dependency analysis

During this phase, the system makes an inter-procedural analysis of the program, which results in a dependency graph between the input/output variables of each unit. Only variable names are considered, even if the dependencies between components of arrays would help.

The information required to differentiate a routine is: at each point in the program, which variables are (possibly) impacted by the active inputs chosen by the user. From this the system is able to extract two pieces of information:

1. which variable must be associated with a derivative at each point in the program,

2. which variable is modified by each sub-program of the program.

The first piece of information depends on the active variables chosen by the user, whereas the second one depends only on the original code.

Within Odyssée, two passes are used to generate the whole information. During the first pass (bottom-up in the call-graph) the read/write dependencies are computed. During the second pass (top-down in the call-graph) the activity flag is propagated from the inputs of the top sub-program to all the other sub-programs.
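These two passes can be sketched on a toy call-graph (the data structures and unit names are ours; a real tool works on the Fortran units):

```python
# Toy sketch of the two-pass AD dependency analysis: a bottom-up pass
# collects read/write sets per unit, a top-down pass propagates the
# activity flag from the chosen active inputs.

# Each unit: a list of (written_var, read_vars) statements plus its callees.
units = {
    "leaf": {"stmts": [("y", {"x"})], "calls": []},
    "top":  {"stmts": [("z", {"y"}), ("k", {"c"})], "calls": ["leaf"]},
}

def read_write(unit):
    # bottom-up pass: read/write sets, including those of the callees
    reads, writes = set(), set()
    for callee in units[unit]["calls"]:
        r, w = read_write(callee)
        reads |= r
        writes |= w
    for written, read in units[unit]["stmts"]:
        reads |= read
        writes.add(written)
    return reads, writes

def propagate_activity(unit, active):
    # top-down pass: a written variable becomes active if it reads one
    for callee in units[unit]["calls"]:
        propagate_activity(callee, active)
    for written, read in units[unit]["stmts"]:
        if read & active:
            active.add(written)
    return active

assert read_write("top") == ({"x", "y", "c"}, {"y", "z", "k"})
# z depends on y, which depends on the active input x; k does not.
assert propagate_activity("top", {"x"}) == {"x", "y", "z"}
```

The first pass depends only on the original code; the second one depends on the active inputs chosen by the user, exactly as described above.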


3.3 Trajectory analysis

This phase computes the information necessary to generate a code that stores all the modified variables, so as to reproduce the execution by only restoring these variables. Such a code is absolutely necessary for the reverse mode of AD.

Within the present version of Odyssée, all the variables modified by the original code are stored, as well as the context of all sub-programs called. The storage analysis is then equivalent to the AD dependency analysis if one considers all the inputs as active.

3.4 Generation of the derivative code

This phase aims at generating the derivative code, which computes the derivatives as well as the original values. At this level, the unit-by-unit differentiation of the program is done according to the following two rules. First, each sub-program of the program is differentiated with respect to its own input variables, and not with respect to the input variables of the head unit. Secondly, the differentiation is maximal in the sense that a sub-program is differentiated with respect to the maximal set of input variables appearing in any call statement. In this way, only one derivative sub-program is generated for each unit of the original program. The methods used to differentiate a sub-program within Odyssée are described by examples in [FP98]. Other methods which should be automated within AD tools are described in [Fau99].

3.5 Post-process

This phase aims at simplifying the resulting code. The algebraic expressions are simplified in the same way as in Computer Algebra Systems, except that the arguments of sums and products cannot be reordered modulo associativity and commutativity.

4 Norm verification

As described in the following sections, some defects within the original code that are accepted by compilers lead to great trouble when using the reverse mode. The main reason is that the status (read, written) of an adjoint variable is the transposed status of the corresponding original variable.

If compilers were able to check for these problems, AD would be a lot more straightforward.

4.1 Aliasing through calls to sub-programs

The derivative in reverse mode of one assignment differs depending on whether the assigned variable is overwritten. Let us consider the two instructions Example 9(a) and Example 9(b): the first one induces no overwriting whereas the second one overwrites Y. Example 10(a) shows the cotangent code when the assigned variable y is not overwritten; Example 10(b) shows the corresponding derivative code when y is overwritten. This dependency is easy to detect for scalar variables, and a little bit more difficult for arrays, but can still be handled.

If the same problem appears through a call to a sub-program, it becomes really difficult.

Let us look at the following example named cumul(X,Y,A,B,N), which computes ∀i ∈ [1..N], Y(i) = X(i) + A(i)*B(i), as shown in Example 11(a). The cotangent code of cumul with respect to x, a, b is shown in Example 11(b).

If cumul is called with X and Y pointing to the same memory zone, as with call cumul(Y,Y,A,B,N), the corresponding call to the cotangent code, call cumulcl(Y,Y,A,B,N,YCCL,YCCL,ACCL,BCCL), will not compute the correct derivatives. It will compute YCCL(i) correctly but reset it to zero afterwards. When this problem arises in a large code, it is really difficult to detect.
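The failure mode can be reproduced in a few lines of Python, with lists standing in for the Fortran arrays (a hedged illustration, not Odyssée's generated code):

```python
# Sketch of the aliasing problem of Section 4.1: calling the cotangent of
# cumul with X and Y in the same memory zone resets the computed adjoint
# to zero.

def cumulcl(x, y, a, b, n, xccl, yccl, accl, bccl):
    # cotangent of: do i = 1, n : Y(i) = X(i) + A(i)*B(i)
    for i in range(n):
        accl[i] += yccl[i] * b[i]
        bccl[i] += yccl[i] * a[i]
        xccl[i] += yccl[i]
        yccl[i] = 0.0

# Unaliased call: the adjoint of X is correct.
xccl, yccl = [0.0], [1.0]
cumulcl([2.0], [0.0], [3.0], [4.0], 1, xccl, yccl, [0.0], [0.0])
assert xccl == [1.0]

# Aliased call (X and Y share storage, hence XCCL and YCCL do too):
# the increment lands in the shared array and is then reset to zero.
shared = [1.0]
cumulcl([2.0], [2.0], [3.0], [4.0], 1, shared, shared, [0.0], [0.0])
assert shared == [0.0]
```

In Fortran the aliasing is invisible inside cumulcl; here Python's reference semantics make the shared storage explicit.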

One must notice that this aliasing case has no effect on the derivatives generated in direct mode. Essentially, the direct mode will translate the aliasing pattern from the original variables to the derivative ones.

The first problem is to detect if this case happens in the code to be differentiated.

AD tools have chosen to generate a unique derivative for each original sub-program. In this case, there is no way to generate a cotangent code that is correct for any aliasing pattern. The only possibility is then to introduce intermediate variables in the original code or in the derivative code.

A second possibility would be to generate clones of the original aliased sub-program and to call them at the correct points in the original code. Then the automatic differentiation would be standard.

A third possibility is to generate a derivative with one entry for each aliasing pattern. The derivativecode would test the aliasing pattern and use the correct entry.

4.2 Implicit aliasing

The code Example 14(a) is to be differentiated with respect to x, a, b. In this piece of code, the variables y2, y3 are modified even though this does not appear clearly in the source. As a result, only y1 depends on x, which leads to a two-level problem:

1. no derivative is associated with y2, y3,

2. the values of y2, y3 are neither stored nor restored.

The first problem leads to wrong tangent and cotangent derivatives. Example 15(a) shows that no derivative is associated with y3. The correct tangent code Example 15(b) can only be generated if the system knows that y3 depends on x, a, b. The same holds in reverse mode, even if it is less obvious.

The second problem only influences the cotangent derivatives: y2 is neither set before the call to cumul nor reset before the call to cumulcl, as shown in Example 16(a). The correct version Example 16(b) introduces the necessary trajectory management statements as well as the supplementary derivative computation.

These problems are really difficult to isolate within a large derivative code. Is it possible to detect such an error on the original code?

5 Derivative code optimisation

5.1 Selection of output derivatives

Until now, within Odyssée one is able to generate the derivative of a sub-program P (the whole sub-tree from the routine P in the call tree) with respect to some of its inputs I. In general, what is really needed is the derivative of some outputs O of P with respect to these inputs I. The code generated by Odyssée is not optimal because it computes the derivatives of all the outputs instead of just the subset O. This problem can be solved by slicing the original code, or the generated code.

Example 17 shows the original code as well as the standard tangent code with respect to X1 generated by Odyssée. Let us consider a first case in which only the derivatives of Z with respect to X1 are needed. The automatically generated code Example 17(b) is clearly not optimal.

Example 18 shows the original code Example 17(a) sliced with respect to the output variable Z in Example 18(a) and then differentiated with respect to X1 in Example 18(b). This result is better than the initial derivative, but can still be optimised because the computation of Z is not required any more. On the contrary, if the tangent code Example 17(b) is sliced with respect to ZTTL, then the derivative code Example 19(a) is minimal.

Let us call pre-slicing the combination of slicing w.r.t. the original variables followed by automatic differentiation, and post-slicing the combination of automatic differentiation followed by slicing w.r.t. the derivative variables. In general, the optimal method is post-slicing. On some examples, it is even the only way to minimise the computation. Let us consider a second case in which only the derivatives of K with respect to X1 are required. Pre-slicing leads to no gain, whereas post-slicing results in Example 19(b), where the computations of Y and K have been removed.
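Post-slicing can be illustrated with a toy backward slicer over (left-hand side, read variables) pairs (our own construction, not the algorithm of an existing slicer):

```python
# Toy backward slicer: starting from the derivative output of interest,
# keep only the statements whose left-hand side is (transitively) needed.

def slice_program(stmts, target):
    # stmts: ordered list of (lhs, set_of_rhs_vars)
    needed = {target}
    kept = []
    for lhs, rhs in reversed(stmts):
        if lhs in needed:
            kept.append((lhs, rhs))
            needed.discard(lhs)   # this definition kills the need for lhs
            needed |= rhs         # ... and creates needs for what it reads
    return list(reversed(kept))

# Tangent-code sketch: each original statement preceded by its derivative.
tangent = [
    ("ytl", {"x1tl"}), ("y", {"x1"}),
    ("ztl", {"ytl", "y"}), ("z", {"y"}),
    ("ktl", {"x2"}), ("k", {"x2"}),
]
# Post-slicing w.r.t. ztl keeps the derivative chain but drops z and k.
assert slice_program(tangent, "ztl") == [
    ("ytl", {"x1tl"}), ("y", {"x1"}), ("ztl", {"ytl", "y"}),
]
```

Note how the original statement computing z is dropped: this is exactly the observation above that slicing w.r.t. ZTTL removes the no-longer-needed computation of Z.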

The main drawback of post-slicing is that it is applied to a generated code that can be a lot larger than the original one, and the slicing process would certainly be expensive. For example, the number of variables is doubled (without considering the store variables) and the number of lines is multiplied by 2 in direct mode and by n in reverse mode (where n is the number of active variables in the right-hand side of the assignment). In reverse mode, a second problem arises, as original and cotangent statements can be far from one another.

A specific algorithm for slicing the derivative while it is generated would be the best solution. Such a technique should predict the "to be sliced" statements and tell the system not to generate them in the derivative code.

5.2 Trajectory management optimisation

In reverse mode, the generated code has to store and retrieve a large amount of values. We have shown that on a large example these push/pop operations on scalar values take 1/3 of the total execution time (see [Fau98]).

The problem is the same for the direct and reverse parts, with transposed constraints. The constraints due to the chosen adjoining strategy are not studied here but certainly have to be taken into account. For clarity reasons, we only consider the direct part of the calculation: the original code plus the trajectory storage.

The standard strategy is to store the values of all the modified variables before modification; the more precise this analysis is, the smaller the trajectory is. Another remark is that users will run larger and larger test cases. As a conclusion, one can say that the trajectory analysis has to be more and more precise to minimise the trajectory size, but this does not exclude trajectory management optimisation.

The first idea is that the scalar pop and push commands could be gathered to use array pop and push commands. This can be viewed as a buffering of the values before the whole buffer is pushed, except that a variable-sized buffer is used. If one push of n scalar values is cheaper in time than n pushes of a scalar value, the derivative code will be optimised. One can notice that one or zero pop statements are added to each original statement in a standard cotangent code. It is possible to let the push statements float (like LaTeX figures) along the original code to gather some of them.
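The buffering idea can be sketched as follows (the BufferedTape class and its API are ours, purely illustrative):

```python
# Sketch of buffered trajectory storage: scalar pushes are gathered into
# a buffer and flushed as one array push, trading n scalar trajectory
# moves for a single larger one.

class BufferedTape:
    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.buffer = []
        self.storage = []        # the pushed buffers (the trajectory)
        self.pushes = 0          # number of actual trajectory moves

    def push(self, value):
        self.buffer.append(value)
        if len(self.buffer) == self.bufsize:
            self.flush()

    def flush(self):
        # one array push replaces up to bufsize scalar pushes
        if self.buffer:
            self.storage.append(self.buffer)
            self.buffer = []
            self.pushes += 1

    def pop(self):
        # retrieval for the backward sweep, in reverse (LIFO) order
        if not self.buffer:
            self.buffer = self.storage.pop()
        return self.buffer.pop()

tape = BufferedTape(bufsize=10)
for i in range(100):
    tape.push(float(i))
tape.flush()
assert tape.pushes == 10     # 10 array moves instead of 100 scalar ones
assert tape.pop() == 99.0    # values come back in reverse order
```

With a buffer of size k, the number of trajectory moves drops from n to roughly n/k, which is the gain discussed above.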

Let us call trajectory management ratio the ratio between the number of trajectory moves (push, pop) and the number of original operations (+, *, sin, ...). In Example 20(a) each iteration of the loop pushes one scalar value and computes one scalar value. This is certainly penalising because the trajectory management ratio n/n is high; in Example 20(b) the trajectory management ratio is 1/n. In the same way, Example 20(c) can be optimised into Example 20(d).

The second idea is that the original computation could be run on one processor while the push statements are executed on a second processor.

5.3 General checkpointing

Suppose you want the adjoint code of a loop of length n, Example 21(a), that modifies some variable Y. You can either store the n values of Y as in Example 21(b), or recompute them from the initial value as in Example 21(c). Using the first solution you compute F only n times but store the n values of Y, whereas using the second solution you perform n + n(n + 1)/2 computations of F but store only one value of Y.

A trade-off between execution time and memory requirements is called checkpointing. It consists in storing some values of Y in checkpoints and recomputing the other values from these checkpoints, as in Example 21(d).

If you have N < n checkpoints to reverse a loop with n steps, it is possible to compute their optimal repartition, which minimises the recomputation of F: the next_check function chooses the optimal value. This optimal schedule is described in different papers [GPRS96, Gri92] and uses the hypothesis that each iteration has the same execution cost t, and that the size of the values Y to checkpoint is the same.
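For illustration, here is a simplified uniform-checkpointing reversal (not the optimal schedule of [GPRS96, Gri92]) that counts the evaluations of F:

```python
# Simplified checkpointing sketch for reversing y_{i+1} = F(y_i): stored
# states are reused directly, the others are recomputed from the nearest
# preceding checkpoint.

def reverse_with_checkpoints(F, y0, n, every):
    checkpoints = {0: y0}
    y, evals = y0, 0
    # forward sweep: run the loop, storing every `every`-th state of Y
    for i in range(n):
        y = F(y)
        evals += 1
        if (i + 1) % every == 0:
            checkpoints[i + 1] = y
    states = []
    # backward sweep: the adjoint of step i needs y_i, recomputed from
    # the nearest preceding checkpoint
    for i in reversed(range(n)):
        base = max(k for k in checkpoints if k <= i)
        yi = checkpoints[base]
        for _ in range(i - base):
            yi = F(yi)
            evals += 1
        states.append(yi)
    return states, evals

# y -> y**2, four iterations from y0 = 2: two extra evaluations of F buy
# a smaller set of stored states.
F = lambda y: y * y
states, evals = reverse_with_checkpoints(F, 2.0, 4, every=2)
assert states == [256.0, 16.0, 4.0, 2.0]
assert evals == 6
```

With every=1 this degenerates to the store-all strategy (n evaluations); with a single checkpoint at 0 it degenerates to the recompute-all strategy.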

Until now, the optimal checkpointing method has only been applied to explicit loops. If one wants to use it in a more general way, two problems arise:

1. the original execution has to be split into pieces of equal measure,

2. the code that recomputes the original function from one checkpoint to another must be written.

What is the smallest set of variables to be stored at one checkpoint to be able to recompute the original function up to the next checkpoint?

5.4 Iterative directional derivatives

Let us consider the computation of two directional derivatives with the same original input values but two different input directions. If these directions are independent, they can be computed at the same time and an n-directional tangent or cotangent code can be used. If they are dependent, the only solution is to use two 1-directional tangent (Example 22(a)) or cotangent (Example 22(b)) codes. In this case, it is clear that the original function is computed twice.

We are studying ways to share the execution of the function between two executions [Gri00] of a 1-directional tangent or cotangent code. A first solution, Example 23(c), is to execute once the original code Example 23(a), which stores some costly original values, and then apply several times a fast version, Example 23(b), of the standard 1-directional tangent. A second solution, Example 24(c), is to execute first a 1-directional tangent code Example 24(a), which stores some costly original or derivative values, and then apply several times a fast version, Example 24(b), of the standard 1-directional tangent. Both solutions can be merged by implementing a modified version of the original code where both costly original and derivative values are stored.

The optimised iterative multi-directional tangent code, Example 23(c) or Example 24(c), is more efficient than the standard one if the trade-off between storage and recomputation is good. Is there any way to choose automatically the values to be stored so as to be sure of the gain?


5.5 Really used array dimension

Many implementations of numerical codes must be run for various problem sizes (the size of the grid, for example). To do so, a maximal size is chosen at compile time to allocate all the arrays, and the real size is given at runtime.

This very standard method, Example 25(a) and Example 25(b), has a great influence on the efficiency of the cotangent code, as shown in Example 25(c). The ndim components of X are stored and restored whereas only n should be. A first solution for us is to ask the user to declare the really used size of the arrays. But this could lead to errors if this property is not verified all along the source code. If the system could detect such a property, it would help.

6 AD enhancement

6.1 AD dependency propagation

For this problem, we did not find any example within our experiments, but it will arise when coupled codes have to be differentiated. The dependency of Z on Y is difficult to detect in Example 26: the dependency comes from the write of Y to and the read of Z from the same file 'output.data'. As a result, if one differentiates sub0 with respect to X, the variable Z must become active. The present version of Odyssée generates Example 27, which computes no derivative for Z. The correct tangent code is presented in Example 28.

To solve this problem, one can think of considering all read/write instructions to be active, but then the activity propagation would be really under-optimal. The generated code would compute and store a lot of zero derivatives.

Is it possible to detect the dependency of Z on Y?

6.2 Completely overwritten arrays

At a glance at Example 29, one can see that each component of vector X is set to 224 in sub1. This is not detected by the Odyssée dependency analysis.

To compute the directional derivative he wants, the user has to set the input direction XTTL to the desired value before calling sub0tl. But the automatically generated code Example 30 resets the input direction XTTL to zero. As a result, all the computed derivatives ZTTL, KTTL are equal to zero. This is mathematically correct: the function sub0 is constant with respect to X and its derivative is zero.

On a large code, this leads to a really strange behaviour of the derivative code because the direction may be reset to zero in the middle of the code. The system should raise a warning saying that X is not an input variable of sub2 and that sub2tl would have a constant derivative. Then the user differentiates the program from compute instead of sub0 and obtains the derivative values he wants. To raise this warning, the system must be sure that the whole array X is overwritten.

6.3 Array dimension

Array dimensions can be declared to be * or 1 anywhere in the source code. This is a fictitious size used just to tell the compiler that the variable is an array.


The AD system has to store all the components of X before the call to sub2 and restore them before the call to sub2cl. To generate a correct derivative, it needs to know the declared size n even though it could use the used size imax*jmax. One must notice that the part of the code loaded by the system is in general not the complete code but only a part of it.

Is it possible to propagate the declared size of each array through the call-graph? If so, the system could raise warnings or generate special comments to tell the user that some size declarations are missing.

6.4 Inter-procedural analysis

Example 32 shows an original program where a sub-program is passed as an argument to sub1. The current version of Odyssée does not treat such a case and the tangent code automatically generated, Example 33, is not correct. The correct code for this example should be Example 34.

Another case that Odyssée does not treat is recursion, which is accepted by some f77 compilers.

Both problems are certainly not difficult to sort out in AD tools because they must already be treated within compilers.

6.5 Sensitivity of the generated code to numerical errors

There is a measure function that checks the coincidence between the tangent and cotangent codes. We have used this measure to compare, on one example, the cotangent code written by hand and the automatically generated one (using Odyssée). It appears that for the same tangent code, the fit is much better for the hand-written cotangent code than for the automatically generated one.

We think that the sensitivity of the generated code to numerical errors should be investigated.

References

[Fau98] C. Faure. Le gradient de THYC3D par Odyssée. Rapport de recherche 3519, INRIA, October 1998.

[Fau99] C. Faure. Adjoining strategies for multi-layered programs. Rapport de recherche 3781, INRIA, 1999.

[FP98] C. Faure and Y. Papegay. Odyssée User's Guide. Version 1.7. Rapport technique 0224, INRIA, September 1998.

[GPRS96] J. Grimm, L. Pottier, and N. Rostaing-Schmidt. Optimal time and minimum space-time product for reversing a certain class of programs. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational Differentiation: Applications, Techniques, and Tools, pages 95–106. SIAM, 1996.

[Gri92] A. Griewank. Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation. Optimization Methods and Software, 1:35–54, 1992.

[Gri00] A. Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, 2000.


Z = X * Y**2
(a) A

dZ = Y**2 * dX + 2*X*Y*dY
(b) A' = J_F * d

dX = dX + Y**2*dZ
dY = dY + 2*X*Y*dZ
dZ = 0.
(c) A* = J_F^T * d*

Figure 1: Derivatives of the assignment A

(dX, dY, dZ)_o := [ 1    0    0
                    0    1    0
                    Y^2  2XY  0 ] (dX, dY, dZ)_i
(a)

dX_o := dX_i
dY_o := dY_i
dZ_o := Y^2 dX_i + 2XY dY_i
(b)

Figure 2: Product of the Jacobian matrix of A by the direction (dX, dY, dZ)i

(dX*, dY*, dZ*)_o := [ 1  0  Y^2
                       0  1  2XY
                       0  0  0   ] (dX*, dY*, dZ*)_i
(a)

dX*_o := dX*_i + Y^2 dZ*_i
dY*_o := dY*_i + 2XY dZ*_i
dZ*_o := 0.
(b)

Figure 3: Product of the transposed Jacobian matrix of A by the dual “direction” (dX?, dY ?, dZ?)i


Y = Y**3 * X**2
(a) B

dY = 3*Y**2*dY * X**2 + 2*Y**3*X*dX
(b) B'

dX = dX + 2*X*Y**3 * dY
dY = 3*X**2*Y**2 * dY
(c) B*

Figure 4: Derivatives of the assignment B

Y = X * Y**2
(a) C

dY = Y**2 * dX + 2*X*Y*dY
(b) C'

dX = dX + Y**2*dY
dY = 2*X*Y*dY
(c) C*

Figure 5: Derivatives of the assignment C

Z = X * Y**2
Y = Y**3 * X**2
(a) A; B

dZ = Y**2 * dX + 2*X*Y*dY
dY = 3*Y**2*dY * X**2 + 2*Y**3*X*dX
Z = X * Y**2
Y = Y**3 * X**2
(b) (A; B)'

dX = dX + 2*X*Y**3 * dY
dY = 3*X**2*Y**2 * dY
dX = dX + Y**2*dZ
dY = dY + 2*X*Y*dZ
dZ = 0.
Z = X * Y**2
Y = Y**3 * X**2
(c) (A; B)*

Figure 6: Derivatives of the sequence A; B

Y = Y**3 * X**2
Z = X * Y**2
(a) B; A

dY = 3*Y**2*dY * X**2 + 2*Y**3*X*dX
Y = Y**3 * X**2
dZ = Y**2 * dX + 2*X*Y*dY
Z = X * Y**2
(b) (B; A)'

Y0 = Y
Y = Y**3 * X**2
Y1 = Y
Z = X * Y**2
Y = Y1
dX = dX + Y**2*dZ
dY = dY + 2*X*Y*dZ
dZ = 0.
Y = Y0
dX = dX + 2*X*Y**3 * dY
dY = 3*X**2*Y**2 * dY
(c) (B; A)*

Figure 7: Derivatives of the sequence B; A
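The store/restore scheme of Figure 7(c) can be checked numerically: with the adjoint statements of B taken from Figure 4(c), the composition B; A computes z = x**5 * y**6, so the reverse sweep must return the gradient (5*x**4*y**6, 6*x**5*y**5). A Python transcription written for this check (not generated code):

```python
def grad_BA(x, y, zb=1.0):
    # forward sweep with the two saves Y0, Y1 of Figure 7(c)
    y0 = y
    y = y**3 * x**2            # B
    y1 = y
    z = x * y**2               # A
    # reverse sweep, seeded with the adjoint zb of z
    xb, yb = 0.0, 0.0
    y = y1                     # restore the value of Y used by A
    xb = xb + y**2 * zb
    yb = yb + 2*x*y * zb
    zb = 0.0
    y = y0                     # restore the value of Y used by B
    xb = xb + 2*x*y**3 * yb    # adjoint of B (Figure 4(c))
    yb = 3*x**2*y**2 * yb
    return z, xb, yb

x, y = 2.0, 1.5
z, xb, yb = grad_BA(x, y)
# z = x**5 * y**6, so dz/dx = 5*x**4*y**6 and dz/dy = 6*x**5*y**5
assert abs(z - x**5 * y**6) < 1e-9
assert abs(xb - 5*x**4*y**6) < 1e-9
assert abs(yb - 6*x**5*y**5) < 1e-9
```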

do i=e1, e2, s
  A
end do
(a)

do i=e1, e2, s
  A'
end do
(b)

save1 = e1
save2 = e2
save3 = s
do i=e1, e2, s
  A
end do
e1 = save1
e2 = save2
s = save3
do i=e1+k*s, e1, -s
  A*
end do
(c)

trace_idx = 1
trace(trace_idx) = 1
do i=1, e, s
  trace_idx = trace_idx + 1
  trace(trace_idx) = 2
  A
end do

do i=trace_max, 1, -1
  block = trace(i)
  if (block.eq.2) then
    A*
  end if
end do
(d)

Figure 8: Do loop differentiation


y = x + a*b
(a) No overwriting

y = y + a*b
(b) Overwriting of Y

Figure 9: Source code of Case 4

accl = accl + yccl*b
bccl = bccl + yccl*a
xccl = xccl + yccl
yccl = 0.
(a) No overwriting

accl = accl + yccl*b
bccl = bccl + yccl*a
yccl = yccl
(b) Overwriting of Y

Figure 10: Cotangent of Case 4(a) and 4(b) with respect to x, a, b

subroutine cumul (x,y,a,b,n)
integer n
real x(n),y(n),a(n),b(n)
do i=1, n
  y(i) = x(i) + a(i)*b(i)
end do
end
(a) Source code of cumul

subroutine cumulcl (x,y,a,b,n,xccl,yccl,accl,bccl)
do i=1, n
  xccl(i) = xccl(i) + yccl(i)
  accl(i) = accl(i) + yccl(i)*b(i)
  bccl(i) = bccl(i) + yccl(i)*a(i)
  yccl(i) = 0.
end do
end
(b) Cotangent code of cumul

Figure 11: Cotangent of cumul with respect to y, a, b

call cumul (X,Y,A,B,N)
(a) No overwriting

call cumul (Y,Y,A,B,N)
(b) Overwriting of Y

Figure 12: Source code of Case 5

call cumulcl (X,Y,A,B,N,XCCL,YCCL,ACCL,BCCL)
(a) No overwriting

call cumulcl (Y,Y,A,B,N,XCCL,YCCL,ACCL,BCCL)
(b) Overwriting of Y

Figure 13: Cotangent of Case 5(a) and 5(b) with respect to x, a, b


subroutine sub0 (x,a,b)
integer ijkm
parameter (ijkm=10)
common /total/y1, y2, y3
real y1(ijkm), y2(ijkm), y3(ijkm)
real a(3*ijkm), b(3*ijkm), x(3*ijkm)
a(1) = y2(4)**2
call cumul (x,y1,a,b,3*ijkm)
b(1) = y3(4)**2
end
(a) Main subprogram

Figure 14: Source code of Case 5

subroutine sub0tl (x, a, b, xttl, attl, bttl)
attl(1) = 2*y2ttl(4)*y2(4)
a(1) = y2(4)**2
call cumultl(x, y1, a, b, 3*ijkm,
             xttl, y1ttl, attl, bttl)
bttl(1) = 0.
b(1) = y3(4)**2
end
(a) Incorrect code

subroutine sub0tl (x, a, b, xttl, attl, bttl)
attl(1) = 2*y2ttl(4)*y2(4)
a(1) = y2(4)**2
call cumultl(x, y1, a, b, 3*ijkm,
             xttl, y1ttl, attl, bttl)
bttl(1) = 2*y3(4)*y3ttl(4)
b(1) = y3(4)**2
end
(b) Correct code

Figure 15: Incorrect and correct tangent codes of Case 5 with respect to y2, x, a, b

subroutine sub0cl (x, a, b, xccl, accl, bccl)
save1 = a(1)
a(1) = y2(4)**2
DO nnn1 = 1, ijkm
  save2(nnn1) = y1(nnn1)
end DO
call cumul(x, y1, a, b, 3*ijkm)
save3 = b(1)
b(1) = y3(4)**2
b(1) = save3
bccl(1) = 0.
DO nnn1 = ijkm, 1, -1
  y1(nnn1) = save2(nnn1)
end DO
call cumulcl(x, y1, a, b, 3*ijkm,
             xccl, y1ccl, accl, bccl)
a(1) = save1
y2ccl(4) = y2ccl(4)+accl(1)*(2*y2(4))
accl(1) = 0.
end
(a) Incorrect code

subroutine sub0cl (x, a, b, xccl, accl, bccl)
save1 = a(1)
a(1) = y2(4)**2
DO nnn1 = 1, 3*ijkm
  save2(nnn1) = y1(nnn1)
end DO
call cumul(x, y1, a, b, 3*ijkm)
save3 = b(1)
b(1) = y3(4)**2
b(1) = save3
y3ccl(4) = y3ccl(4) + 2*y3(4)*bccl(1)
bccl(1) = 0.
DO nnn1 = 1, 3*ijkm
  y1(nnn1) = save2(nnn1)
end DO
call cumulcl(x, y1, a, b, 3*ijkm,
             xccl, y1ccl, accl, bccl)
a(1) = save1
y2ccl(4) = y2ccl(4)+accl(1)*(2*y2(4))
accl(1) = 0.
end
(b) Correct code

Figure 16: Incorrect and correct cotangent codes of Case 5 with respect to y2, x, a, b


Y = alpha * X1 + X2
Z = beta * X1 * X2
K = Y + Z**2
(a) Original code

YTTL = alpha * X1TTL
Y = alpha * X1 + X2
ZTTL = beta * X2 * X1TTL
Z = beta * X1 * X2
KTTL = YTTL + 2*Z*ZTTL
K = Y + Z**2
(b) Standard tangent code

Figure 17: Original and standard tangent codes of Problem 2 w.r.t. X1

Z = beta * X1 * X2
(a) Original code

ZTTL = beta * X2 * X1TTL
Z = beta * X1 * X2
(b) Tangent code

Figure 18: Slicing of the original code with respect to Z

ZTTL = beta * X2 * X1TTL
(a) with respect to ZTTL

YTTL = alpha * X1TTL
ZTTL = beta * X2 * X1TTL
Z = beta * X1 * X2
KTTL = YTTL + 2*Z*ZTTL
(b) with respect to KTTL

Figure 19: Slicing of the standard tangent code

do i=1, n
  call push_scalar (x(i),1)
  x(i) = x(i)**2
end do
(a) Standard

call push_buffer (x(1),n)
do i=1, n
  x(i) = x(i)**2
end do
(b) Optimised

call push_scalar (z,1)
z = y1
call push_scalar (y1,1)
y1 = x**2
call push_scalar (y2,1)
y2 = a*x
call push_scalar (y3,1)
y3 = b
(c) Standard

call push_scalar (z,1)
z = y1
call push_buffer (y1,1,y2,1,y3,1)
y1 = x**2
y2 = a*x
y3 = b
(d) Optimised

Figure 20: Standard and optimised direct parts


do i=1, n
  Y = F(Y)
end do
(a) Original

do i=1, n
  S(i) = Y
  Y = F(Y)
end do
do i=n, 1, -1
  Y = S(i)
  YCCL = F'(Y,YCCL)
end do
(b) All storage

S = Y
do i=1, n
  Y = F(Y)
end do
do i=n, 1, -1
  Y = S
  do j=1, i-1
    Y = F(Y)
  end do
  YCCL = F'(Y,YCCL)
end do
(c) All recomputation

push_check (Y,0)
do i=1, n
  Y = F(Y)
end do
do i=n, 1, -1
  pop_check (Y,p)
  if (p.eq.n-1) then
    YCCL = F'(Y,YCCL)
  else
    next_check(p,n,q)
    do j=1, q
      Y = F(Y)
    end do
    push_check (Y,q)
  end if
end do
(d) Checkpoints

Figure 21: Loop cotangent code
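The storage/recomputation trade-off of Figure 21(b) and 21(c) can be illustrated in Python; the iteration body F and its cotangent Fp below are placeholder examples chosen for this sketch, not taken from any application:

```python
import math

def F(y):
    # placeholder iteration body (stands for the body of the loop)
    return math.sin(y) + 0.1 * y

def Fp(y, yccl):
    # cotangent of F: maps the adjoint of F(y) to the adjoint of y
    return (math.cos(y) + 0.1) * yccl

def grad_all_storage(y, n):
    # Figure 21(b): store every intermediate state, then sweep backwards
    s = []
    for _ in range(n):
        s.append(y)
        y = F(y)
    yccl = 1.0
    for i in reversed(range(n)):
        yccl = Fp(s[i], yccl)
    return yccl

def grad_all_recompute(y0, n):
    # Figure 21(c): store only y0 and recompute each needed state
    yccl = 1.0
    for i in reversed(range(n)):
        y = y0
        for _ in range(i):
            y = F(y)
        yccl = Fp(y, yccl)
    return yccl

g1 = grad_all_storage(0.3, 40)
g2 = grad_all_recompute(0.3, 40)
assert abs(g1 - g2) < 1e-12
```

The all-storage sweep keeps n states and evaluates F n times; the all-recomputation sweep keeps a single state but evaluates F about n**2/2 times. The checkpointing scheme of Figure 21(d) trades between these two extremes, and [Gri92] shows how to make both costs grow only logarithmically in n.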


subroutine sub0tl (x,y,xttl,yttl)
lttl = xttl
l = x+2
yttl = cos(l)*lttl
y = sin(l)
end
(a)

call compute_direction (dx1)
call sub0tl (x,y,dx1,dy1)
call change_direction (dx1,dy1,dx2)
call sub0tl (x,y,dx2,dy2)
(b)

Figure 22: Standard iterative direction computation

subroutine sub0_store (x,y)
l = x+2
s2 = sin(l)
push (s2)
y = s2
end
(a)

subroutine sub0tl_fast (x,y,xttl,yttl)
lttl = xttl
l = x+2
yttl = cos(l)*lttl
pop (s2)
y = s2
end
(b)

call sub0_store (x,y)
call compute_direction (dx1)
call sub0tl_fast (x,y,dx1,dy1)
call change_direction (dx1,dy1,dx2)
call sub0tl_fast (x,y,dx2,dy2)
(c)

Figure 23: Optimised iterative direction computation: First solution

subroutine sub0tl_store (x,y,xttl,yttl)
lttl = xttl
l = x+2
s1 = cos(l)
push (s1)
yttl = s1*lttl
s2 = sin(l)
push (s2)
y = s2
end
(a)

subroutine sub0tl_fast (x,y,xttl,yttl)
lttl = xttl
l = x+2
yttl = cos(l)*lttl
pop (s2)
y = s2
end
(b)

call compute_direction (dx1)
call sub0tl_store (x,y,dx1,dy1)
call change_direction (dx1,dy1,dx2)
call sub0tl_fast (x,y,dx2,dy2)
(c)

Figure 24: Optimised iterative direction computation: Second solution


subroutine sub1 (n,X,Y)
parameter (ndim = 100000)
real X(ndim), Y
do i=1, n
  X(i) = X(i)**2
end do
Y = 0
do i=1, n
  Y = Y+X(i)
end do
end
(a)

subroutine sub0 (n,X,Y)
parameter (ndim = 100000)
real X(ndim), Y
call sub1 (10,X,Y)
end
(b)

subroutine sub0cl (n,X,Y)
parameter (ndim = 100000)
real X(ndim), Y
real XCCL(ndim), YCCL
call setsave (X,ndim)
call sub1(10,X,Y)
call getsave (X,ndim)
call sub1cl (10,X,Y,XCCL,YCCL)
end
(c)

Figure 25: Really used array size


subroutine sub0 (X,Y)
C Computes Y from X
CALL compute (X,Y)
CALL writefile (Y,'output.data')
CALL readfile (Z,'output.data')
C Computes K from Z
CALL compute (Z,K)
end
(a) Main sub-program

subroutine writefile (X,filename)
open (1,file=filename)
write (1,*) X
close(1)
end

subroutine readfile (X,filename)
open (1,file=filename)
read (1,*) X
close(1)
end
(b) Auxiliary sub-programs

Figure 26: Source code of Case 1

subroutine sub0tl (X,Y,XTTL,YTTL)
C Computes Y from X and YTTL from X, XTTL
CALL computetl (X,Y,XTTL,YTTL)
CALL writefiletl (Y,'output.data',YTTL)
CALL readfile (Z,'output.data')
C Computes K from Z
CALL compute (Z,K)
end
(a)

subroutine writefiletl (X,filename,XTTL)
open (1,file=filename)
write (1,*) X
close(1)
end
(b)

Figure 27: Incorrect tangent of Case 1

subroutine sub0tl (X,Y,XTTL,YTTL)
C Computes Y from X and YTTL from X, XTTL
CALL computetl (X,Y,XTTL,YTTL)
CALL writefiletl (Y,'output.data',YTTL)
CALL readfiletl (Z,'output.data',ZTTL)
C Computes K from Z and KTTL from Z, ZTTL
CALL computetl (Z,K,ZTTL,KTTL)
end
(a)

subroutine writefiletl (X,filename,XTTL)
open (1,file=filename)
write (1,*) X, XTTL
close(1)
end

subroutine readfiletl (X,filename,XTTL)
open (1,file=filename)
read (1,*) X, XTTL
close(1)
end
(b)

Figure 28: Correct tangent of Case 1


subroutine sub0 (X,Y)
REAL X(IC,JC,1), Y, Z
INTEGER K, J, I
CALL sub1 (X,224)
C Computes Z from X
CALL compute (X,Z)
end
(a) Main sub-program

subroutine sub1 (X,Y)
REAL X(IC,JC,1), Y
INTEGER K, J, I
DO K=1,KC
  DO J=1,JC
    DO I=1,IC
      X(I,J,K)=Y
    END DO
  END DO
END DO
end
(b) Aux. sub-program

Figure 29: Original code of Case 3

subroutine sub0tl (X,Y,XTTL,YTTL)
REAL X(IC,JC,1), Y, Z
REAL XTTL(IC,JC,1), YTTL, ZTTL
INTEGER K, J, I
VAL = 224
VALTTL = 0.
CALL sub1tl (X,VAL,XTTL,VALTTL)
C Computes Z from X
C and ZTTL from X and XTTL
CALL computetl (X,Z,XTTL,ZTTL)
end
(a)

subroutine sub1tl (X,Y,XTTL,YTTL)
REAL X(IC,JC,1), Y
REAL XTTL(IC,JC,1), YTTL
INTEGER K, J, I
DO K=1,KC
  DO J=1,JC
    DO I=1,IC
      XTTL(I,J,K)=YTTL
      X(I,J,K)=Y
    END DO
  END DO
END DO
end
(b)

Figure 30: Tangent code of Case 3 with respect to X

subroutine sub0 (X)
common /size/n,imax,jmax
real X(n), Y(n)
call sub1(X)
end
(a)

subroutine sub1 (X,Y)
real X(1), Y(1)
call sub2(X,Y)
end
(b)

subroutine sub2 (X,Y)
common /size/n,imax,jmax
real X(*)
real Y(*)
do i=1, imax*jmax
  X(i) = X(i)*Y(i)
end do
end
(c)

Figure 31: Variable array dimension


function sub0 (x,y,z)
real x(n), y(n), z(n), sub2
external sub2
call sub1 (sub2,x,y,sub0,n)
end
(a)

function sub2 (x,y)
real x, y, sub2
sub2 = x + y
end
(b)

subroutine sub1 (fun,x,y,z,n)
real x(n), y(n), z(n)
do i=1, n
  z(i) = fun (x(i),y(i))
end do
end
(c)

Figure 32: Original code

subroutine sub0tl (x, y, z, n,
                   xttl, yttl, zttl)
integer n
real x(n), y(n), z(n)
real xttl(n), yttl(n), zttl(n)
call sub1tl(sub2, x, y, z, n,
            xttl, yttl, zttl)
end
(a)

subroutine sub1tl (fun, x, y, z, n,
                   xttl, yttl, zttl)
real x(n), y(n), z(n)
real xttl(n), yttl(n), zttl(n)
do i = 1, n
  call funtl(x(i), y(i), z(i),
             xttl(i), yttl(i), zttl(i))
end do
end
(b)

Figure 33: Incorrect tangent code

function sub0tl (x,y,xttl,yttl)
integer n
real x(n), y(n), z(n), sub2
external sub2
real xttl(n), yttl(n), sub2tl
external sub2tl
call sub1tl (sub2,x,y,z,n,
             sub2tl,xttl,yttl,zttl)
end
(a)

subroutine sub2tl (x,y,z,xttl,yttl,zttl)
real x, y, z
zttl = xttl + yttl
z = x + y
end
(b)

subroutine sub1tl (fun,x,y,z,n,
                   funtl,xttl,yttl,zttl)
integer n
real x(n), y(n), z(n)
real xttl(n), yttl(n), zttl(n)
do i=1, n
  call funtl(x(i),y(i),z(i),
             xttl(i),yttl(i),zttl(i))
end do
end
(c)

Figure 34: Correct tangent code
