Low-Memory Tour Reversal in Directed Graphs
Uwe Naumann, Viktor Mosenkis, Elmar Peise
LuFG Informatik 12, RWTH Aachen University, Germany
EuroAD 9, INRIA, Nov. 2009
Contents
◮ Motivation: Derivative Code Compilers (dcc)
◮ Motivating Example
◮ Control Flow Reversal Problem
◮ Tour Reversal Problem
◮ Loop Compression
◮ Offline Algorithm
◮ Online Algorithm
◮ Implementation
◮ Test Results
◮ Conclusion and Outlook
Motivation: Derivative Code Compilers (dcc)
F : Rⁿ → Rᵐ ⇒ Tokens ⇒ AST ⇒ AAST ⇒ F′

tangent-linear:
Ḟ(x, ẋ) ≡ ∇F(x) · ẋ,   ẋ ∈ Rⁿ

adjoint:
F̄(x, ȳ) ≡ ∇F(x)ᵀ · ȳ,   ȳ ∈ Rᵐ

second-order tangent-linear:
F̈(x, ẋ₁, ẋ₂) ≡ ⟨∇²F(x), ẋ₁, ẋ₂⟩,   ẋ₁, ẋ₂ ∈ Rⁿ

second-order adjoint:
Ḟ̄(x, ẋ, ȳ) ≡ ⟨∇²F(x), ẋ, ȳ⟩,   ẋ ∈ Rⁿ, ȳ ∈ Rᵐ

... and higher-order tangent-linear and adjoint codes.
dcc
◮ AD by source code transformation for subset of C
◮ started off as teaching tool
◮ higher-order derivative codes by reapplication
◮ joint call tree reversal
◮ additional overloading mode
◮ main component of AC-SAMMM (see outlook)
◮ there are others (Tapenade, TAC, ADIC) ... :-(
Case Study (Perhaps Later ...)
Use dcc to generate
◮ TLC
◮ ADC
◮ SOTLC
◮ SOADC
◮ FoR
◮ RoF
◮ RoR
◮ TOTLC (FoFoR)
◮ TOADC (FoFoR)
for

y = f(x) = ∏_{i=0}^{n−1} x_i   (Speelpenning).
Motivating Example
F : Rⁿ → R,   y = F(x) = ∏_{i=0}^{n−1} x_i
void F(int n, float *x, float &y) {
  int i = 0;
  while (i < n) {
    if (i == 0)
      y = x[i];
    else
      y = y * x[i];
    i = i + 1;
  }
}

e.g. n = 3:

y = x[0]; y = y * x[1]; y = y * x[2];

xa[2] = y * ya; ya = x[2] * ya; xa[1] = y * ya; ya = x[1] * ya; xa[0] = ya;
Motivating Example (cont’d)
void Fa(int n, float *x, float *xa, float &y, float ya) {
  int i = 0;
  while (i < n) {
    if (i == 0) {
      push(cs, 1); push(fds, y); y = x[i];
    } else {
      push(cs, 2); push(fds, y); y = y * x[i];
    }
    push(cs, 3); push(ids, i); i = i + 1;
  }
  int bb;
  while (pop(cs, bb))
    switch (bb) {
      case 1: { y = pop(fds); xa[i] = ya; break; }
      case 2: { y = pop(fds); xa[i] = y * ya; ya = x[i] * ya; break; }
      case 3: { i = pop(ids); break; }
    }
}
Motivating Example (cont’d)
int i = 0;
while (i < n) {
  if (i == 0) {
    push(cs, 1); push(fds, y); y = x[i];
  } else {
    push(cs, 2); push(fds, y); y = y * x[i];
  }
  push(cs, 3); push(ids, i); i = i + 1;
}
control stack (cs):      1, 2, 3, 2, 3, 2, 3, . . . , 2, 3
float data stack (fds):  y0, y1, y2, y3, y4, . . .
integer data stack (ids): 0, 1, 2, 3, 4, . . .   or   0, 1, 1, 1, 1, . . . if increments are stored
Control Flow Reversal (CFR)
Given: y = F(x) as conditional SAC:

for (j = n; j < n + p + m; j++)
  if (v_{j−1} > 0) v_j = φ_{j−n}(v_i)_{i≺j}

v_{−1} = x_0; v_0 = x_1
if (v_0 > 0) v_1 = v_{−1} · v_0 else ...
if (v_1 > 0) v_2 = sin(v_1) else ...
if (v_2 > 0) v_3 = v_{−1} · v_2 else ...
if (v_3 > 0) v_4 = v_3 / v_0 else ...
if (v_4 > 0) v_5 = cos(v_3) else ...
y_0 = v_4; y_1 = v_5

[Figure: DAG G = (V, E) over vertices −1, 0, 1, 2, 3, 4, 5]

Wanted: predecessors of all vertices in reverse order, requiring evaluation of the conditions in reverse order, i.e. the values of v_4, v_3, v_2, v_1, v_0.
CFR is NP-complete
... by reduction from DAG Reversal
... from Fixed Cost DAG Reversal
◮ unit cost for storage and recomputation
◮ fixed total cost |V|
◮ minimize storage

... from Vertex Cover, enabling recomputation at unit cost from stored arguments.

◮ U. Naumann: DAG Reversal is NP-complete. Journal of Discrete Algorithms, Volume 7, Issue 4, December 2009, Pages 402-410.
Tour Reversal
1: ...
   for (i = 0; i < 2*n; i++) {
2:   if (i % 2) { ... }
3:   else { ... }
   }
4: ...

[Figure: control flow graph over basic blocks 1-4 with branch edges labeled 0 and 1]
... by enumeration of basic blocks
1, 2, 3, . . . , 2, 3, 4
... by counting loop traversals and flagging branches
0, 1, . . . , 0, 1, 2n (reducible control flow only!)
Tour Reversal Problem
Use minimal memory to reverse a tour (i_1, i_2, . . . , i_l) in a directed graph G = (V, E).
Loop Compression
Enumerate Basic Blocks
◮ uncompressed tour: 1, 2, 3, . . . , 2, 3, 4 (2n + 2 integers)
◮ compressed tour: 1, 2, 3, [n, 2], 4 (6 integers)

Count loops and flag branches

◮ uncompressed tour: 0, 1, 0, 1, 0, 1, . . . , 0, 1, 2n (2n + 1 integers)
◮ compressed tour: 0, 1, [n, 2], 2n (5 integers)
1: ...
   for (i = 0; i < 2*n; i++) {
2:   if (i % 2) { ... }
3:   else { ... }
   }
4: ...
Offline Algorithm
Dynamic programming compresses according to
f(d_{i,j}) = 1                                        if i = j,
f(d_{i,j}) = min_{i ≤ s < j} f(d_{i,s} ∘ d_{s+1,j})   otherwise,

where d_{i,j} is the optimally compressed subtour from the i-th to the j-th index and

f(d_{i,j}) := |N ∩ V| (loop elements not counted).
Problems
◮ need to see the whole tour in order to find an optimal solution
◮ slow (O(n³) for a tour of length n)
Online Algorithm
1, 2, 3, . . . , 2, 3, 4 becomes
push(1): 1
push(2): 1, 2
push(3): 1, 2, 3
push(2): 1, 2, 3, 2
push(3): 1, 2, 3, [2, 2]
push(2): 1, 2, 3, [2, 2], 2
push(3): 1, 2, 3, [3, 2]
...
last push: 1, 2, 3, [n, 2], 4
pop: 1, 2, 3, [n, 2]
pop: 1, 2, 3, [n − 1, 2], 2

◮ push
  ◮ pushes an element onto the top of the stack
  ◮ compresses loops within a window of predefined size
◮ pop
  ◮ unrolls one instance of the last loop if needed
  ◮ returns the element on top of the stack and removes it
Test Results
Problem   uncompr. stack size   compr. stack size   compr. rate
burger    3432 MB               144 Byte            25013889
bratu     4576 MB               168 Byte            28572380
ljc       3692 MB               704 KByte           5500

Problem   Window size   Runtime rate
burger    51            ≈ 2.5
bratu     51            ≈ 2.8
ljc       51            ≈ 5.5
ljc       11            ≈ 3.3
Implementation
◮ C++ front-end
  ◮ distribution: ccistack.{hpp,cpp}
  ◮ class ccistack
  ◮ void ccistack::push(int) and int ccistack::pop()
◮ C front-end
  ◮ distribution: cistack.{h,c}
  ◮ void init(struct cistack *k, int MaxStackSize, int WindowSize) to allocate
  ◮ void finalize(struct cistack *k) to free
  ◮ void push(struct cistack *k, int i)
  ◮ int pop(struct cistack *k) and int empty(struct cistack *k)
◮ Fortran front-end
  ◮ uses C front-end
C++ Front-End
#include <fstream>
using namespace std;
#include "cistack.h"

int main(int argc, char* argv[]) {
  ifstream infile(argv[1]);
  ofstream outfile(argv[2]);
  cistack cs(100, 21);
  int v;
  while (!infile.eof()) {
    infile >> v;
    if (!infile.eof()) cs.push(v);
  }
  while (!cs.empty())
    outfile << cs.pop() << endl;
  return 0;
}
C Front-End
#include "cistack.h"
#include <stdio.h>

int main(int argc, char* argv[]) {
  struct cistack cs;
  FILE* infile = fopen(argv[1], "r+");
  FILE* outfile = fopen(argv[2], "w+");
  init(&cs, 100, 4);
  int v;
  while (fscanf(infile, "%i", &v) > 0) push(&cs, v);
  while (!empty(&cs)) fprintf(outfile, "%i\n", pop(&cs));
  finalize(&cs);
  fclose(outfile);
  fclose(infile);
}
Conclusion (I)
◮ powerful control flow reversal tool ready to use for developers of adjoint compiler technology and/or debugging/profiling tools
◮ paper (almost) ready for submission
Outlook
◮ compression of integer data stack
◮ compression of float data stack
Conclusion (II)
You need an (adjoint) derivative code compiler if

◮ finite differences cannot be trusted
◮ finite differences or exact forward sensitivities are too expensive
◮ you are unable to build and solve the adjoint system manually

You need to invest 3, 6, 18, or 36 (wo)man months for a sustained

  runtime of adjoint / runtime of original simulation

ratio of 50, 20, < 10, or < 4, respectively.
Further Ongoing Activities at STCE
◮ CompAD-III (J. Riehme and D. Gendler, with UHerts and NAG)
◮ Adjoint error correction in ICON (J. Riehme and K. Leppkes, with MPI-M)
◮ Model reliability analysis in Sisyphe/Telemac (J. Riehme, with BAW)
◮ Data assimilation in JURASSIC (E. Varnik and M. Forster, with ICG-I at FZJ)
◮ dcc and the AaChen platform for Structured Automatic Manipulation of Mathematical Models (AC-SAMMM) (M. Forster and B. Gendler, with AVT)
◮ Toward adjoint MPI (M. Schanen, with ANL and INRIA)
◮ Pushing TBR (with ANL and INRIA)
◮ Call tree reversal (H. Lakhdar and X. Jin)
Further Ongoing Activities at STCE (cont.)
◮ Elimination techniques in linearized DAGs (V. Mosenkis)
◮ Higher-order adjoints for uncertainty quantification (M. Beckers, with UHerts)
◮ Adjoint subgradients for McCormick relaxations (M. Beckers, V. Mosenkis, and M. Maier, with MechEng at MIT)
◮ What color is the non-constant part of your Jacobian? (E. Varnik and L. Razik)
◮ Shared-memory parallelism in tape-based adjoints (K. Leppkesand J. Riehme)
◮ Submitted: Hybrid AD for C/C++ (ADOL-C + dcc,D. Gendler, with UPaderborn)
The AaChen platform for Structured Automatic Manipulation of Mathematical Models

[Diagram: AC-SAMMM workflow. Mathematical models from the diverse outside world (Modelica, gPROMS, ...) are brought to a descriptive C−+ model, refined into imperative C− submodels, and manipulated under a dcc configuration (tailored AD, symbolic AD, model refinement) into standard, derivative, and custom models, whose solutions feed tools such as DyOS and Mexa.]
Global Optimization using McCormick Relaxations
[Plot: original function together with its natural interval extension underestimator, convex underestimator, affine underestimator, and the global upper bound]

(adjoint) subgradients by the NAG Fortran compiler, based on

� A. Mitsos, B. Chachuat, and P. I. Barton: McCormick-Based Relaxations of Algorithms. SIAM Journal on Optimization, 2009.

(C. Corbett, M. Beckers, V. Mosenkis, M. Maier)
Uncertainty Quantification / Robust Optimization

e.g. min_{µx} (µy + √(σ²y)) where y = F(x) and w.l.o.g. F : R → R, using

◮ approximate mean

  µy = F(µx) + (F″(µx) / 2) · σ²x

◮ approximate variance

  σ²y = F′(µx)² σ²x + F′(µx) F″(µx) Sx σ³x + (1/4) (F″(µx))² (Kx − 1) σ⁴x

for given initial mean µx, variance σ²x, skewness Sx, and kurtosis Kx of x ∈ R. Second-order method (e.g. Newton) requires derivatives up to fourth order. (M. Beckers)
Adjoint (Total Model) Error Correction
... in ICON GCM: ICOsahedral Non-hydrostatic General Circulation Model developed by the Max Planck Institute for Meteorology and Deutscher Wetterdienst.

(Discrete) adjoint by the NAG Fortran Compiler. (J. Riehme)
(References to) theoretical foundations e.g. in
� R. Rannacher, F.-T. Suttmeier: A posteriori error control in finite element methods via duality techniques: Application to perfect plasticity. Computational Mechanics, 1998.

� M.B. Giles, N.A. Pierce and E. Süli: Progress in adjoint error correction for integral functionals. Computing and Visualization in Science, 2004.
Adjoint Error Correction (Linear Case)
Given: x_f = F(A, x_s) after solution of A · x_f = x_s and the value of the linear objective J = J(g, x_f) = ⟨g, x_f⟩.

Wanted: corrected objective J = ⟨g, x^c_f⟩ − ⟨x̄^c_s, A · x^c_f − x_s⟩.

Solution: solve Aᵀ · x̄_s = x̄_f = g, NOT using the discrete adjoint code

  x_f = A⁻¹ · x_s
  J = ⟨g, x_f⟩
  x̄_f = g · J̄ = g
  x̄_s = A⁻ᵀ · x̄_f

produced by a derivative code compiler.
Adjoint Error Correction (Nonlinear Case)
Given: x_f = F(x_s) after solution of N(x) = 0 and nonlinear objective J = J(x_f) with gradient g ≡ ∂J/∂x_f.

Wanted: corrected objective J = J(x^c_f) − ⟨x̄^c_s, N(x^c_f)⟩.

Solution to the adjoint system, preferably using the discrete adjoint code

  x_f = F(x_s)
  J = J(x_f)
  x̄_f = g · J̄ = g
  x̄_s = F′(x_s)ᵀ · x̄_f

produced by a derivative code compiler.
Objective: Potential Energy w.r.t. Height Field

rotating earth; all water except at poles; objective: potential energy somewhere in the middle of the Sahara ... (F. Rauser, P. Korn)

Test Case 3: Unsteady Solid Body Rotation from

� Läuter: Unsteady analytical solutions of the spherical shallow water equations. Journal of Computational Physics, 2005.