Low-Memory Tour Reversal in Directed Graphs
Uwe Naumann, Viktor Mosenkis, Elmar Peise
LuFG Informatik 12, RWTH Aachen University, Germany
EuroAD 9, INRIA, Nov. 2009
Contents
◮ Motivation: Derivative Code Compilers (dcc)
◮ Motivating Example
◮ Control Flow Reversal Problem
◮ Tour Reversal Problem
◮ Loop Compression
◮ Offline Algorithm
◮ Online Algorithm
◮ Implementation
◮ Test Results
◮ Conclusion and Outlook
Motivation: Derivative Code Compilers (dcc)
F : Rⁿ → Rᵐ ⇒ Tokens ⇒ AST ⇒ AAST ⇒ F′

tangent-linear:
Ḟ(x, ẋ) ≡ ∇F(x) · ẋ,   ẋ ∈ Rⁿ

adjoint:
F̄(x, ȳ) ≡ ∇F(x)ᵀ · ȳ,   ȳ ∈ Rᵐ

second-order tangent-linear:
F̈(x, ẋ₁, ẋ₂) ≡ ⟨∇²F(x), ẋ₁, ẋ₂⟩,   ẋ₁, ẋ₂ ∈ Rⁿ

second-order adjoint:
Ḟ̄(x, ẋ, ȳ) ≡ ⟨∇²F(x), ẋ, ȳ⟩,   ẋ ∈ Rⁿ, ȳ ∈ Rᵐ

... and higher-order tangent-linear and adjoint codes.
dcc
◮ AD by source code transformation for subset of C
◮ started off as teaching tool
◮ higher-order derivative codes by reapplication
◮ joint call tree reversal
◮ additional overloading mode
◮ main component of AC-SAMMM (see outlook)
◮ there are others (Tapenade, TAC, ADIC) ... :-(
Case Study (Perhaps Later ...)
Use dcc to generate
◮ TLC
◮ ADC
◮ SOTLC
◮ SOADC
◮ FoR
◮ RoF
◮ RoR
◮ TOTLC (FoFoR)
◮ TOADC (FoFoR)
for

y = f(x) = ∏_{i=0}^{n−1} x_i   (Speelpenning).
Motivating Example
F : Rⁿ → R,   y = F(x) = ∏_{i=0}^{n−1} x_i
void F(int n, float *x, float &y) {
  int i = 0;
  while (i < n) {
    if (i == 0)
      y = x[i];
    else
      y = y * x[i];
    i = i + 1;
  }
}

e.g. n = 3:

y = x[0]; y = y * x[1]; y = y * x[2];

xa[2] = y * ya; ya = x[2] * ya; xa[1] = y * ya; ya = x[1] * ya; xa[0] = ya;
Motivating Example (cont’d)
void Fa(int n, float *x, float *xa, float &y, float ya) {
  int i = 0;
  while (i < n) {
    if (i == 0) {
      push(cs, 1); push(fds, y); y = x[i];
    } else {
      push(cs, 2); push(fds, y); y = y * x[i];
    }
    push(cs, 3); push(ids, i); i = i + 1;
  }
  int bb;
  while (pop(cs, bb))
    switch (bb) {
      case 1: { y = pop(fds); xa[i] = ya; break; }
      case 2: { y = pop(fds); xa[i] = y * ya; ya = x[i] * ya; break; }
      case 3: { i = pop(ids); break; }
    }
}
Motivating Example (cont’d)
int i = 0;
while (i < n) {
  if (i == 0) {
    push(cs, 1); push(fds, y); y = x[i];
  } else {
    push(cs, 2); push(fds, y); y = y * x[i];
  }
  push(cs, 3); push(ids, i); i = i + 1;
}
control stack (cs):      1, 2, 3, 2, 3, 2, 3, . . . , 2, 3
float data stack (fds):  y0, y1, y2, y3, y4, . . .
integer data stack (ids): 0, 1, 2, 3, 4, . . .   or   0, 1, 1, 1, 1, . . . if increments are stored
Control Flow Reversal (CFR)
Given: y = F(x) as conditional SAC:

for (j = n; j < n + p + m; j++)
  if (v_{j−1} > 0) v_j = φ_{j−n}(v_i)_{i≺j}

v_{−1} = x_0; v_0 = x_1
if (v_0 > 0) v_1 = v_{−1} · v_0 else ...
if (v_1 > 0) v_2 = sin(v_1) else ...
if (v_2 > 0) v_3 = v_{−1} · v_2 else ...
if (v_3 > 0) v_4 = v_3 / v_0 else ...
if (v_4 > 0) v_5 = cos(v_3) else ...
y_0 = v_4; y_1 = v_5

[Figure: DAG G = (V, E) over vertices −1, 0, 1, 2, 3, 4, 5]

Wanted: predecessors of all vertices in reverse order, requiring evaluation of the conditions in reverse order, i.e. the values of v_4, v_3, v_2, v_1, v_0.
CFR is NP-complete
... by reduction from DAG Reversal
... from Fixed Cost DAG Reversal
◮ unit cost for storage and recomputation
◮ fixed total cost |V|
◮ minimize storage

... from Vertex Cover, enabling recomputation at unit cost from stored arguments.

◮ U. Naumann: DAG Reversal is NP-complete. Journal of Discrete Algorithms, Volume 7, Issue 4, December 2009, Pages 402-410.
Tour Reversal
1: ...
   for (i = 0; i < 2*n; i++) {
2:   if (i % 2) { ... }
3:   else { ... }
   }
4: ...

[Figure: control flow graph over basic blocks 1-4 with branch edges labeled 0 and 1]
... by enumeration of basic blocks
1, 2, 3, . . . , 2, 3, 4
... by counting loop traversals and flagging branches
0, 1, . . . , 0, 1, 2n (reducible control flow only!)
Tour Reversal Problem
Use minimal memory to reverse a tour (i_1, i_2, . . . , i_l) in a directed graph G = (V, E).
Loop Compression
Enumerate Basic Blocks
◮ uncompressed tour: 1, 2, 3, . . . , 2, 3, 4 (2n + 2 integers)
◮ compressed tour: 1, 2, 3, [n, 2], 4 (6 integers)

Count loops and flag branches

◮ uncompressed tour: 0, 1, 0, 1, 0, 1, . . . , 0, 1, 2n (2n + 1 integers)
◮ compressed tour: 0, 1, [n, 2], 2n (5 integers)
1: ...
   for (i = 0; i < 2*n; i++) {
2:   if (i % 2) { ... }
3:   else { ... }
   }
4: ...
Offline Algorithm
Dynamic programming compresses according to
f(d_{i,j}) = 1                                        if i = j,
f(d_{i,j}) = min_{i ≤ s < j} f(d_{i,s} ∘ d_{s+1,j})   otherwise,

where d_{i,j} is the optimally compressed subtour from the i-th to the j-th index and

f(d_{i,j}) := |N ∩ V| (loop elements not counted).
Problems
◮ need to see the whole tour in order to find an optimal solution
◮ slow (O(n³) for a tour of length n)
Online Algorithm
1, 2, 3, . . . , 2, 3, 4 becomes
push(1): 1
push(2): 1, 2
push(3): 1, 2, 3
push(2): 1, 2, 3, 2
push(3): 1, 2, 3, [2, 2]
push(2): 1, 2, 3, [2, 2], 2
push(3): 1, 2, 3, [3, 2]
...
last push: 1, 2, 3, [n, 2], 4
pop: 1, 2, 3, [n, 2]
pop: 1, 2, 3, [n − 1, 2], 2

◮ push
  ◮ pushes an element onto the top of the stack
  ◮ compresses loops within a window of predefined size
◮ pop
  ◮ unrolls one instance of the last loop if needed
  ◮ returns the element on top of the stack and removes it
Test Results
Problem   uncompr. stack size   compr. stack size   compr. rate
burger    3432 MB               144 Byte            25013889
bratu     4576 MB               168 Byte            28572380
ljc       3692 MB               704 KByte           5500

Problem   Window size   Runtime rate
burger    51            ≈ 2.5
bratu     51            ≈ 2.8
ljc       51            ≈ 5.5
ljc       11            ≈ 3.3
Implementation
◮ C++ front-end
  ◮ distribution: ccistack.{hpp,cpp}
  ◮ class ccistack
  ◮ void ccistack::push(int) and int ccistack::pop()
◮ C front-end
  ◮ distribution: cistack.{h,c}
  ◮ void init(struct cistack *k, int MaxStackSize, int WindowSize) to allocate
  ◮ void finalize(struct cistack *k) to free
  ◮ void push(struct cistack *k, int i)
  ◮ int pop(struct cistack *k) and int empty(struct cistack *k)
◮ Fortran front-end
  ◮ uses C front-end
C++ Front-End
#include <fstream>
using namespace std;
#include "cistack.h"

int main(int argc, char* argv[]) {
  ifstream infile(argv[1]);
  ofstream outfile(argv[2]);
  cistack cs(100, 21);
  int v;
  while (!infile.eof()) {
    infile >> v;
    if (!infile.eof()) cs.push(v);
  }
  while (!cs.empty())
    outfile << cs.pop() << endl;
  return 0;
}
C Front-End
#include "cistack.h"
#include <stdio.h>

int main(int argc, char* argv[]) {
  struct cistack cs;
  FILE* infile = fopen(argv[1], "r+");
  FILE* outfile = fopen(argv[2], "w+");
  init(&cs, 100, 4);
  int v;
  while (fscanf(infile, "%i", &v) > 0) push(&cs, v);
  while (!empty(&cs)) fprintf(outfile, "%i\n", pop(&cs));
  finalize(&cs);
  fclose(outfile);
  fclose(infile);
}
Conclusion (I)
◮ powerful control flow reversal tool ready to use for developers of adjoint compiler technology and/or debugging/profiling tools
◮ paper (almost) ready for submission
Outlook
◮ compression of integer data stack
◮ compression of float data stack
Conclusion (II)
You need an (adjoint) derivative code compiler if

◮ finite differences cannot be trusted
◮ finite differences or exact forward sensitivities are too expensive
◮ you are unable to build and solve the adjoint system manually

You need to invest 3, 6, 18, or 36 (wo)man months for a sustained

  runtime of adjoint / runtime of original simulation

ratio of 50, 20, < 10, or < 4, respectively.
Further Ongoing Activities at STCE
◮ CompAD-III (J. Riehme and D. Gendler, with UHerts and NAG)
◮ Adjoint error correction in ICON (J. Riehme and K. Leppkes, with MPI-M)
◮ Model reliability analysis in Sisyphe/Telemac (J. Riehme, with BAW)
◮ Data assimilation in JURASSIC (E. Varnik and M. Forster, with ICG-I at FZJ)
◮ dcc and the AaChen platform for Structured Automatic Manipulation of Mathematical Models (AC-SAMMM) (M. Forster and B. Gendler, with AVT)
◮ Toward adjoint MPI (M. Schanen, with ANL and INRIA)
◮ Pushing TBR (with ANL and INRIA)
◮ Call tree reversal (H. Lakhdar and X. Jin)
Further Ongoing Activities at STCE (cont.)
◮ Elimination techniques in linearized DAGs (V. Mosenkis)
◮ Higher-order adjoints for uncertainty quantification (M. Beckers, with UHerts)
◮ Adjoint subgradients for McCormick relaxations (M. Beckers, V. Mosenkis, and M. Maier, with MechEng at MIT)
◮ What color is the non-constant part of your Jacobian? (E. Varnik and L. Razik)
◮ Shared-memory parallelism in tape-based adjoints (K. Leppkesand J. Riehme)
◮ Submitted: Hybrid AD for C/C++ (ADOL-C + dcc,D. Gendler, with UPaderborn)
The AaChen platform for Structured Automatic Manipulation of Mathematical Models

[Diagram: AC-SAMMM workflow. Mathematical models from the diverse outside world (Modelica, gPROMS, ...) are brought to a descriptive C−+ model, refined into imperative C− submodels, and manipulated under a dcc configuration (tailored AD, symbolic AD, model refinement) into standard, derivative, and custom models, whose solutions feed tools such as DyOS and Mexa.]
Global Optimization using McCormick Relaxations
[Plot: original function together with its natural interval extension underestimator, convex underestimator, affine underestimator, and the global upper bound]

(adjoint) subgradients by the NAG Fortran compiler, based on

� A. Mitsos, B. Chachuat, and P. I. Barton: McCormick-Based Relaxations of Algorithms. SIAM Journal on Optimization, 2009.

(C. Corbett, M. Beckers, V. Mosenkis, M. Maier)
Uncertainty Quantification / Robust Optimization

e.g. min_{µx} (µy + √(σ²y)) where y = F(x) and w.l.o.g. F : R → R, using

◮ approximate mean

  µy = F(µx) + (F″(µx) / 2) · σ²x

◮ approximate variance

  σ²y = F′(µx)² σ²x + F′(µx) F″(µx) Sx σ³x + (1/4) (F″(µx))² (Kx − 1) σ⁴x

for given initial mean µx, variance σ²x, skewness Sx, and kurtosis Kx of x ∈ R. Second-order method (e.g. Newton) requires derivatives up to fourth order. (M. Beckers)
Adjoint (Total Model) Error Correction
... in ICON GCM: ICOsahedral Non-hydrostatic General Circulation Model developed by the Max Planck Institute for Meteorology and Deutscher Wetterdienst.

(Discrete) adjoint by the NAG Fortran Compiler. (J. Riehme)
(References to) theoretical foundations e.g. in
� R. Rannacher, F.-T. Suttmeier: A posteriori error control in finite element methods via duality techniques: Application to perfect plasticity. Computational Mechanics, 1998.

� M.B. Giles, N.A. Pierce and E. Süli: Progress in adjoint error correction for integral functionals. Computing and Visualization in Science, 2004.
Adjoint Error Correction (Linear Case)
Given: x_f = F(A, x_s) after solution of A · x_f = x_s and the value of the linear objective J = J(g, x_f) = ⟨g, x_f⟩.

Wanted: corrected objective J = ⟨g, x^c_f⟩ − ⟨x̄^c_s, A · x^c_f − x_s⟩.

Solution: solve Aᵀ · x̄_s = x̄_f = g, NOT using the discrete adjoint code

  x_f = A⁻¹ · x_s
  J = ⟨g, x_f⟩
  x̄_f = g · J̄ = g
  x̄_s = A⁻ᵀ · x̄_f

produced by a derivative code compiler.
Adjoint Error Correction (Nonlinear Case)
Given: x_f = F(x_s) after solution of N(x) = 0 and nonlinear objective J = J(x_f) with gradient g ≡ ∂J/∂x_f.

Wanted: corrected objective J = J(x^c_f) − ⟨x̄^c_s, N(x^c_f)⟩.

Solution to the adjoint system, preferably using the discrete adjoint code

  x_f = F(x_s)
  J = J(x_f)
  x̄_f = g · J̄ = g
  x̄_s = F′(x_s)ᵀ · x̄_f

produced by a derivative code compiler.
Objective: Potential Energy w.r.t. Height Field

rotating earth; all water except at poles; objective: potential energy somewhere in the middle of the Sahara ... (F. Rauser, P. Korn)

Test Case 3: Unsteady Solid Body Rotation from

� Läuter: Unsteady analytical solutions of the spherical shallow water equations. Journal of Computational Physics, 2005.