
Big Iron Chef Episode:

The Laplace Transform versus Parareal

Craig C. Douglas (University of Wyoming and the King Abdullah University of Science & Technology (KAUST))

with

Dongwoo Sheen and Imbunm Kim (Seoul National University)

Hyoseop Lee (Alcatel-Lucent Bell Labs – Seoul)

Samir Karaa (Sultan Qaboos University)

The research is based on work partially supported by AFOSR, NRF, NSF, KAUST, and the Seoul R & D Program.

Scaling Up to Massive Parallelism

Airplane alternate energy example (A-380):

• One Rolls Royce engine is equivalent to ~75,000 standard blow dryers.
• 4 dirty or 300,000 green power sources?
  o How much would batteries for 300,000 blow dryers weigh to fly an A-380 from Sydney to LAX?
    Answer: Too much to get the plane off the ground. (Open research area in batteries.)
  o Now think about Peta/Exa-scale computing…

Time Evolution Problems

Differential Equations:

– Ordinary: $u' = f(u)$
– Parabolic: $\dfrac{du}{dt} = L(u) + f$

Classical methods based on time stepping:

$t_0,\ t_1,\ \ldots,\ t_{N_t-1},\ t_{N_t}$

Computational Complexity Example

Suppose we have Ns spatial points and Nt time steps, and we solve a parabolic equation using backward Euler combined with multigrid. Then the cost of solving the problem has two cases:

Serial: O(Nt ⋅ Ns)
Parallel: O(Nt ⋅ log2 Ns)

The Big Question

Can we robustly solve parts of the problem later in time before fully approximating the solution earlier in time using something similar to a traditional numerical algorithm for solving partial differential equations?

For a long time the answer was thought to be no… (cf. [A. Deshpande, S. Malhotra, C. C. Douglas, and M. H. Schultz, A rigorous analysis of time domain parallelism. Parallel Algorithms and Applications, 6 (1995), pp. 53-62].)

Parareal Papers

See also [M.J. Gander and S. Vandewalle, Analysis of the parareal time-parallel time-integration method, SIAM J. Sci. Comput., 29 (2007), pp. 556–578].

Parareal Algorithm

Consider the ODE $u' = f(u)$ starting from an initial condition $u_1 = u(t_1)$. Use two time propagation operators of the form

• $G(t_2, t_1, u_1)$ is a rough approximation of $u(t_2)$.
• $F(t_2, t_1, u_1)$ is a more accurate approximation of $u(t_2)$.

For example,

Parareal starts with a coarse initial guess $U_n^0$ at the time points $t_1, t_2, \ldots, t_N$ and computes $U_n^k$ for $k = 1, 2, \ldots$ by a series of correction iterations.

Parareal Implementation

Loop over

    for k = 0, 1, ...,                (iteration numbers)
      for n = 0, 1, ..., N−1,         (time steps)
        U_{n+1}^{k+1} = G(t_{n+1}, t_n, U_n^{k+1}) + F(t_{n+1}, t_n, U_n^k) − G(t_{n+1}, t_n, U_n^k);

Comments:

• Dominant part of the computation (F) is embarrassingly parallel in time.
• About five lines of Matlab to experiment with Parareal.
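The loop can be tried out in a few lines of Python as well. The following is a minimal sketch, not the authors' code: the names `parareal`, `backward_euler`, `coarse_G`, and `fine_F` are made up for illustration, the test problem u' = −u on [0, 2] is an assumption, and both propagators are backward Euler (1 step for G, 100 substeps for F). It checks one property of the iteration: after N iterations the result agrees with the serial fine solution up to rounding.

```python
def parareal(G, F, u0, ts, num_iters):
    """Return a list U where U[k][n] approximates u(ts[n]) after k iterations."""
    N = len(ts) - 1
    # Iteration 0: a sequential coarse sweep provides the initial guess.
    first = [u0]
    for n in range(N):
        first.append(G(ts[n + 1], ts[n], first[n]))
    U = [first]
    for k in range(num_iters):
        old = U[k]
        # These evaluations use only the previous iterate, so they are
        # independent across n; this is the part that can run in parallel.
        fine = [F(ts[n + 1], ts[n], old[n]) for n in range(N)]
        coarse = [G(ts[n + 1], ts[n], old[n]) for n in range(N)]
        new = [u0]
        for n in range(N):  # cheap sequential correction sweep
            new.append(G(ts[n + 1], ts[n], new[n]) + fine[n] - coarse[n])
        U.append(new)
    return U


def backward_euler(t_next, t_now, u, steps):
    """Backward Euler for u' = -u using `steps` equal substeps."""
    dt = (t_next - t_now) / steps
    for _ in range(steps):
        u = u / (1.0 + dt)
    return u


def coarse_G(t_next, t_now, u):
    return backward_euler(t_next, t_now, u, 1)


def fine_F(t_next, t_now, u):
    return backward_euler(t_next, t_now, u, 100)


N = 10
ts = [2.0 * n / N for n in range(N + 1)]
U = parareal(coarse_G, fine_F, 1.0, ts, N)

# Reference: the fine propagator applied serially.
serial = [1.0]
for n in range(N):
    serial.append(fine_F(ts[n + 1], ts[n], serial[n]))

# Maximum difference from the serial fine solution after each iteration.
errors = [max(abs(a - b) for a, b in zip(U[k], serial)) for k in range(N + 1)]
print(errors)
```

In a real run the list comprehension that builds `fine` would be distributed over the processor cores; here it is serial for simplicity.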

Parareal Update Pattern

About Convergence

• When converged, we have F-propagator accuracy at each $t_n$.
• Convergence is guaranteed after N iterations on $t_0, t_1, \ldots, t_N$.
• Is this it??? (Remember the Big Question?)

Typical Theorem for Parareal

Theorem (LMT 2001): For $u' = -au$ and $u(0) = u_0$, let $F(t_{n+1}, t_n, U_n^k)$ denote the exact solution at $t_{n+1}$ and $G(t_{n+1}, t_n, U_n^k)$ be the backward Euler approximation with time step $\Delta T$. Then

$$\max_{1 \le n \le N} \left| u(t_n) - U_n^k \right| \le C_k\, \Delta T^{k+1}.$$

Some Interpretations of Parareal

• Just a solver for the F-equations if Parareal iterates until convergence.
• A new time integrator if the number of iterations is fixed in advance.
• Is it related to an already known time integration method from the dark ages (BG era) of the printed document library?

BG = before Google

Multiple Shooting Methods

For N intervals for solving $u' = f(u)$, $u(0) = u_0$, $t \in [0, 1]$, define

$$U_{n+1}^{k+1} = u_n(t_{n+1}, U_n^k) + \frac{\partial u_n}{\partial U_n}(t_{n+1}, U_n^k)\left(U_n^{k+1} - U_n^k\right).$$

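Theorem: If in the multiple shooting method, $u_n(t_{n+1}, U_n^k) \approx F(t_{n+1}, t_n, U_n^k)$ and

$$\frac{\partial u_n}{\partial U_n}(t_{n+1}, U_n^k)\left(U_n^{k+1} - U_n^k\right) \approx G(t_{n+1}, t_n, U_n^{k+1}) - G(t_{n+1}, t_n, U_n^k),$$

then the multiple shooting and Parareal methods coincide.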

Interpretation and Commentary

Interpretation: Parareal = multiple shooting with a coarse approximate Jacobian.

Comment: Different approximations for the $\dfrac{\partial u_n}{\partial U_n}\left(U_n^{k+1} - U_n^k\right)$ term lead to different time-parallel algorithms. See [H.B. Keller, Numerical Solution of Two-Point Boundary Value Problems (CBMS-NSF Regional Conference Series in Applied Mathematics), SIAM, 1976].

Speedup

Let

$$S = \frac{T_S}{T_P} = \frac{N t_F}{N t_G + K\left(N t_G + (N/P)\, t_F\right)} \approx \frac{P}{K},$$

where

P = Number of processor cores
K = Number of iterations
N = Number of time steps
$t_G$, $t_F$ = 1 step cost of the G and F propagators.

Limited speedup if K is large. Useless if P = 1.
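To get a feel for the formula, it can be evaluated directly. The sketch below is illustrative only: the function name `parareal_speedup` and the cost values are hypothetical. The approximation S ≈ P/K corresponds to a coarse cost t_G that is negligible next to t_F/P.

```python
def parareal_speedup(N, P, K, t_G, t_F):
    """Speedup S = T_S / T_P from the speedup formula."""
    serial_time = N * t_F
    parallel_time = N * t_G + K * (N * t_G + (N / P) * t_F)
    return serial_time / parallel_time


# Hypothetical costs, chosen only for illustration.
for K in (2, 5, 10):
    print(K, parareal_speedup(N=1000, P=100, K=K, t_G=0.01, t_F=1.0))
```

With a free coarse propagator (t_G = 0) the function returns exactly P/K. With P = 1 it returns less than 1, which is the "useless" case.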

Where Is Parareal Useful?

Fluid, structure, molecular dynamics, … problems.

Extensions developed in recent years:

1. Combined with multilevel in time, space
2. Domain decomposition in space
3. Subspace filtering
4. Combined with waveform relaxation
5. Combined with Kalman/stochastic filtering


Simple Computational Examples

1. For $u(t_0) = 1$, $t \in [0, 30]$, $\Delta T = 1$, and $\Delta t = 0.01$,

$$u'(t) = -u(t) + \sin(t).$$

Use the trapezoidal rule. The initial error is ~1. With K = 13, the error is ~$10^{-14}$.
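A rough sketch of this experiment follows. It rests on several assumptions that the slide does not state: t_0 = 0, G is one trapezoidal step of size ΔT, F is 100 trapezoidal steps of size Δt, the initial guess is a coarse sweep, and the error is measured against the serial fine solution. The printed numbers therefore need not match the slide's; in particular the starting error is smaller than ~1 with this initial guess.

```python
import math


def trapezoid(t_next, t_now, u, steps):
    """Trapezoidal rule for u' = -u + sin(t) using `steps` equal substeps."""
    h = (t_next - t_now) / steps
    t = t_now
    for _ in range(steps):
        forcing = 0.5 * h * (math.sin(t) + math.sin(t + h))
        u = ((1.0 - 0.5 * h) * u + forcing) / (1.0 + 0.5 * h)
        t += h
    return u


def G(t_next, t_now, u):  # coarse: one step of size Delta T = 1
    return trapezoid(t_next, t_now, u, 1)


def F(t_next, t_now, u):  # fine: 100 steps of size Delta t = 0.01
    return trapezoid(t_next, t_now, u, 100)


N, K = 30, 13
ts = [float(n) for n in range(N + 1)]

reference = [1.0]  # serial fine solution
for n in range(N):
    reference.append(F(ts[n + 1], ts[n], reference[n]))

U = [1.0]  # initial guess from a coarse sweep
for n in range(N):
    U.append(G(ts[n + 1], ts[n], U[n]))

errors = [max(abs(a - b) for a, b in zip(U, reference))]
for k in range(K):
    fine = [F(ts[n + 1], ts[n], U[n]) for n in range(N)]    # parallel part
    coarse = [G(ts[n + 1], ts[n], U[n]) for n in range(N)]
    new = [1.0]
    for n in range(N):  # sequential coarse correction
        new.append(G(ts[n + 1], ts[n], new[n]) + fine[n] - coarse[n])
    U = new
    errors.append(max(abs(a - b) for a, b in zip(U, reference)))

for k, err in enumerate(errors):
    print(k, err)
```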


2. Brusselator problem: For $x(0) = 0$, $y(0) = 1$, $t \in [0, 12]$, $T = 12$, $\Delta T = T/32$, $\Delta t = T/320$,

$$\dot{x} = 1 + x^2 y - 4x \quad \text{and} \quad \dot{y} = 3x - x^2 y.$$

Use a 4th order Runge-Kutta scheme. With K = 7, the error is ~$10^{-12}$.

See [M.J. Gander and S. Vandewalle, Analysis of the parareal time-parallel time-integration method, SIAM J. Sci. Comput., 29 (2007), pp. 556–578].

Laplace Transform (LT)

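We solve

$$\frac{\partial u}{\partial t} + Au = f, \quad t \in (0, T], \quad \text{starting from } u(0) = u_0.$$

Given some $z \in \mathbb{C}$ and a function $u(\cdot, t)$, the Laplace transform in time is given by

$$\hat{u}(\cdot, z) \equiv \mathcal{L}[u](z) = \int_0^\infty u(\cdot, t)\, e^{-zt}\, dt.$$

We are left solving by any reasonable elliptic solver the transformed problem

$$\hat{u}(\cdot, z) = (zI + A)^{-1}\left(u_0(\cdot) + \hat{f}(\cdot, z)\right).$$

We assume that for some $C_S \in \mathbb{R}^+$ the real parts of the singular points of $u_0(\cdot) + \hat{f}(\cdot, z)$ are $\le C_S$. Let the integral contour be a straight line parallel to the imaginary axis,

$$\Gamma \equiv \left\{ z \in \mathbb{C} \mid z(\omega) = \alpha + i\omega,\ \alpha \ge C_S,\ \omega \in (-\infty, \infty) \right\}.$$

The Laplace inversion formula is

$$u(\cdot, t) = \frac{1}{2\pi i} \int_\Gamma \hat{u}(\cdot, z)\, e^{zt}\, dz.$$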

When the points $z \in \Gamma$ have negative real parts, the discretization error in evaluating the integrals for $u(\cdot, t)$ is significantly reduced for all $t > 0$. We deform $\Gamma$ toward the left half plane, keeping all singularities to its left, using a hyperbola contour

$$\Gamma = \left\{ z \in \mathbb{C} \mid z(\omega) = \zeta(\omega) + i s \omega,\ \omega \in (-\infty, \infty),\ \zeta(\omega) = \gamma - \sqrt{\omega^2 + \upsilon^2} \right\}.$$

In essence, the hyperbola contour must be kept away from the spectrum of $-A$ and the singular points of $\hat{f}(z)$.

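Define $\psi(\omega) = \tanh\left(\frac{\tau\omega}{2}\right) : (-\infty, \infty) \to (-1, 1)$. Then

$$\begin{aligned}
u(t) &= \frac{1}{2\pi i} \int_\Gamma e^{zt}\, \hat{u}(z)\, dz \\
&= \frac{1}{2\pi i} \int_{-\infty}^{\infty} e^{\{\zeta(\omega) + i s \omega\} t}\, \hat{u}\big(\zeta(\omega) + i s \omega\big)\, \{\zeta'(\omega) + i s\}\, d\omega \\
&= \frac{1}{2\pi i} \int_{-1}^{1} e^{\{\zeta(\psi^{-1}(y)) + i s \psi^{-1}(y)\} t}\, \hat{u}\big(\zeta(\psi^{-1}(y)) + i s \psi^{-1}(y)\big)\, \{\zeta'(\psi^{-1}(y)) + i s\}\, \frac{d\psi^{-1}}{dy}(y)\, dy.
\end{aligned}$$

Use the trapezoidal rule for discretization.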

Higher Order Compact Finite Differences

Restrict to $A = -a\Delta$ in 2D with $f \equiv 0$ and unit square domain. We can easily construct 4th and 6th order 9 point discrete stencils. Define

$$\sigma_s = u_{j,k+1} + u_{j+1,k} + u_{j,k-1} + u_{j-1,k},$$

$$\sigma_c = u_{j+1,k+1} + u_{j+1,k-1} + u_{j-1,k-1} + u_{j-1,k+1},$$

$$\psi_s = (u_0)_{j,k+1} + (u_0)_{j+1,k} + (u_0)_{j,k-1} + (u_0)_{j-1,k}, \text{ and}$$

$$\psi_c = (u_0)_{j+1,k+1} + (u_0)_{j+1,k-1} + (u_0)_{j-1,k-1} + (u_0)_{j-1,k+1}.$$

Then the stencils have the form

$$A_0 u_{j,k} + A_s \sigma_s + A_c \sigma_c = B_0 (u_0)_{j,k} + B_s \psi_s + B_c \psi_c,$$

for $j, k = 1, \ldots, N_x$ and $u_{j,k} = 0$ if $jk(j - N_x)(k - N_x) = 0$.

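4th order Dirichlet problem:

$$A_0 = \frac{10a}{3} + h^2 z\left(1 + \frac{h^2 z}{12a}\right), \quad A_s = -\frac{2a}{3}, \quad A_c = -\frac{a}{6},$$

$$B_0 = h^2\left(\frac{2}{3} + \frac{h^2 z}{12a}\right), \quad B_s = \frac{h^2}{12}, \quad \text{and} \quad B_c = 0.$$

Neumann problem: modify stencil and $B_c$.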
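A quick way to sanity-check the 4th order Dirichlet coefficients is to insert a known transformed solution and watch how the stencil residual shrinks with h. The sketch below assumes the test function u0 = sin(2πx) sin(πy) with a = 1/(5π²), so that (z + 1) û = u0; the function name `stencil_residual`, the sample point (0.25, 0.5), and the value z = 1 + 2i are arbitrary illustrative choices. Since the stencil is written in a form scaled by h², a 4th order scheme should give a residual of order h⁶, that is, a ratio near 64 when h is halved.

```python
import math


def stencil_residual(h, z, x=0.25, y=0.5):
    """|A0*u + As*sigma_s + Ac*sigma_c - (B0*u0 + Bs*psi_s)| at the point (x, y)."""
    a = 1.0 / (5.0 * math.pi ** 2)

    def u0(px, py):
        return math.sin(2.0 * math.pi * px) * math.sin(math.pi * py)

    def u_hat(px, py):
        # -a * Laplacian maps u0 to 5*pi^2*a*u0 = u0, so (z + 1) * u_hat = u0.
        return u0(px, py) / (z + 1.0)

    def sides(f):
        return f(x, y + h) + f(x + h, y) + f(x, y - h) + f(x - h, y)

    def corners(f):
        return (f(x + h, y + h) + f(x + h, y - h)
                + f(x - h, y - h) + f(x - h, y + h))

    A0 = 10.0 * a / 3.0 + h ** 2 * z * (1.0 + h ** 2 * z / (12.0 * a))
    As = -2.0 * a / 3.0
    Ac = -a / 6.0
    B0 = h ** 2 * (2.0 / 3.0 + h ** 2 * z / (12.0 * a))
    Bs = h ** 2 / 12.0
    lhs = A0 * u_hat(x, y) + As * sides(u_hat) + Ac * corners(u_hat)
    rhs = B0 * u0(x, y) + Bs * sides(u0)  # Bc = 0
    return abs(lhs - rhs)


z = complex(1.0, 2.0)
r_coarse = stencil_residual(1.0 / 40.0, z)
r_fine = stencil_residual(1.0 / 80.0, z)
ratio = r_coarse / r_fine
print(r_coarse, r_fine, ratio)
```

This only tests consistency of the stencil at one point; it says nothing about the accuracy of a full solve.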

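6th order Dirichlet problem:

$$A_0 = \frac{10a}{3} + h^2\left(\frac{46z}{45} + \frac{h^2 z^2}{12a} + \frac{h^4 z^3}{360a^2}\right), \quad A_s = -\left(\frac{2a}{3} + \frac{h^2 z}{90}\right),$$

$$A_c = -\left(\frac{a}{6} - \frac{h^2 z}{180}\right), \quad \text{and} \quad B_0 = h^2\left(1 + \frac{h^2 z}{12a} + \frac{h^4 z^2}{360a^2}\right).$$

The right hand side is given by

$$B_0 (u_0)_{j,k} + \left(\frac{h^4}{12} + \frac{h^6 z}{360a}\right)\left(u_{0xx} + u_{0yy}\right) + \frac{h^6}{360}\left(4u_{0xxyy} + u_{0xxxx} + u_{0yyyy}\right).$$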

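Parabolic Example

Consider

$$u_t - \frac{u_{xx} + u_{yy}}{5\pi^2} = 0, \quad u(x, y, 0) = e\,\sin(2\pi x)\sin(\pi y)$$

with exact solution $u(x, y, t) = e^{1-t}\sin(2\pi x)\sin(\pi y)$. The Laplace transformed problem is given by

$$z\hat{u} - \frac{\hat{u}_{xx} + \hat{u}_{yy}}{5\pi^2} = e\,\sin(2\pi x)\sin(\pi y) \ \text{in } [0,1]^2, \quad \hat{u} = 0 \ \text{on } \partial[0,1]^2,$$

where

$$\Gamma = \left\{ z \in \mathbb{C} \mid z(\omega) = \zeta(\omega) + i s \omega,\ \omega \in (-\infty, \infty),\ \zeta(\omega) = \gamma - \sqrt{\omega^2 + \upsilon^2},\ \omega(y) = \frac{2}{\tau}\tanh^{-1} y = \frac{1}{\tau}\log\frac{1+y}{1-y} \right\}.$$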

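Using [J.A.C. Weideman and L.N. Trefethen, Parabolic and hyperbolic contours for computing the Bromwich integral, Math. Comp., 76 (2007), pp. 1341–1356], we get

$$\alpha = 1.1721, \quad a(\alpha) = \cosh^{-1}\left(\frac{2\alpha}{(4\alpha - \pi)\sin\alpha}\right), \quad \gamma = \frac{4\pi\alpha - \pi^2}{a(\alpha)} \cdot \frac{N_z}{t},$$

$$\upsilon = \gamma\sin(\alpha), \quad s = \cot(\alpha), \quad \text{and} \quad \tau = \frac{\log(2N_z - 1)}{\gamma\sin(\alpha)\,\sinh\left(a(\alpha)\,\frac{N_z - 1}{N_z}\right)}.$$

Since we know $\alpha$, we get $\gamma = 134.8$, $\upsilon = 124.2$, $s = 0.4213$, and $\tau = 0.02633$. Holy cow! ☺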
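The quoted numbers can be reproduced in a few lines, and the contour can be tried on a scalar model problem. The sketch below is not the authors' code, and the following are assumptions: N_z = 30 and t = 1 (inferred, because they reproduce the quoted values), s computed as cot(α) (which matches s = 0.4213), and the model transform û(z) = 1/(z + 1), whose inverse is e^{−t}. For simplicity the quadrature uses equally spaced nodes in the variable x with ω = υ sinh(x), in the style of Weideman and Trefethen, rather than the tanh map ψ, so τ is computed but not used in the inversion. The names `hyperbola_parameters` and `invert` are hypothetical.

```python
import cmath
import math

ALPHA = 1.1721


def hyperbola_parameters(n_z, t, alpha=ALPHA):
    """Contour parameters a, gamma, upsilon, s, tau for n_z nodes and time t."""
    a = math.acosh(2.0 * alpha / ((4.0 * alpha - math.pi) * math.sin(alpha)))
    gamma = (4.0 * math.pi * alpha - math.pi ** 2) / a * n_z / t
    upsilon = gamma * math.sin(alpha)
    s = 1.0 / math.tan(alpha)  # cot(alpha); assumption that matches s = 0.4213
    tau = math.log(2 * n_z - 1) / (upsilon * math.sinh(a * (n_z - 1) / n_z))
    return a, gamma, upsilon, s, tau


def invert(u_hat, t, n_z):
    """Trapezoidal rule on z(omega) = zeta(omega) + i*s*omega, omega = upsilon*sinh(x)."""
    a, gamma, upsilon, s, _ = hyperbola_parameters(n_z, t)
    h = a / n_z
    total = 0j
    for k in range(-n_z, n_z + 1):
        x = k * h
        omega = upsilon * math.sinh(x)
        root = math.sqrt(omega ** 2 + upsilon ** 2)
        z = complex(gamma - root, s * omega)
        # dz/dx = (zeta'(omega) + i*s) * d(omega)/dx
        dz_dx = complex(-omega / root, s) * upsilon * math.cosh(x)
        total += cmath.exp(z * t) * u_hat(z) * dz_dx
    return (h * total / (2j * math.pi)).real


a, gamma, upsilon, s, tau = hyperbola_parameters(n_z=30, t=1.0)
print(gamma, upsilon, s, tau)

approx = invert(lambda z: 1.0 / (z + 1.0), t=1.0, n_z=30)
print(approx, math.exp(-1.0))
```

In a PDE setting each evaluation of `u_hat(z)` would be one elliptic solve, and the solves for different nodes are independent of each other.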

Numerical Experiments for Example

We use $N_z = 30$, $N_x \in \{10, 20, 40, 80\}$, and $z \in \Gamma$. MADPACK is used as the solver in the LT code [C.C. Douglas, Madpack: a family of abstract multigrid or multilevel solvers, Comput. Appl. Math., 14 (1995), pp. 3–20].

4th order compact scheme

4th order high order ADI method (Karaa code): 100, 500, and 1,000 time steps (one solve per step).

Laplace transform takes $N_z = 30$ solves only. LT is the clear winner. ☺

6th order compact scheme

Laplace transform versus Parareal (6th order)

44 Intel Xeon quad core processor cluster with 1 Gb/sec Ethernet, $N_z = 32$, and $N_x = 160$.

Small Parallel Computer Clusters

Choose $N_z$ points on the hyperbola contour. For $P = C_{N_z} N_z$ processing cores, where $C_{N_z}$ is small, the Laplace transform is the obvious choice (if applicable), particularly if $C_{N_z} = 1$.

Reasoning: Parareal may use too many iterations, whereas we know we have Nz-fold parallelism trivially with the Laplace transform (and more with parallel solvers).

Conclusions

• There are at least two families of robust algorithms to create parallel in both time and space (parabolic) PDE solvers for parallel computers.
  o Other methods? Sinc methods
• For one processing core, neither is appropriate.
• For a small number of computing cores, the Laplace transform is clearly the better choice if it is applicable.
  o Solve Laplace transformed problems in parallel to get highly parallel solver.
• For a large number of processing cores, Parareal is clearly the method of choice to try first.
• Big Question answer: No clear answer

References

• A. Deshpande, S. Malhotra, C.C. Douglas, and M.H. Schultz, A rigorous analysis of time domain parallelism, Parallel Algorithms and Applications, 6 (1995), pp. 53–62.

• C.C. Douglas, Madpack: a family of abstract multigrid or multilevel solvers, Comput. Appl. Math., 14 (1995), pp. 3–20.

• C.C. Douglas, I. Kim, H. Lee, and D. Sheen, Higher-order schemes for the Laplace transformation method for parabolic problems, Comput. Visual Sci., 14 (2011), pp. 39–47.

• M.J. Gander and S. Vandewalle, Analysis of the parareal time-parallel time-integration method, SIAM J. Sci. Comput., 29 (2007), pp. 556–578.

• S. Karaa and J. Zhang, High order ADI method for solving unsteady convection-diffusion problems, J. Comput. Phys., 198 (2004), pp. 1–9.

• H.B. Keller, Numerical Solution of Two-Point Boundary Value Problems (CBMS-NSF Regional Conference Series in Applied Mathematics), SIAM, 1976.

• J.L. Lions, Y. Maday, and G. Turinici, A parareal in time discretization of PDE’s, C.R. Acad. Sci. Paris Ser. I Math., 332 (2001), pp. 661–668.

• J.A.C. Weideman and L.N. Trefethen, Parabolic and hyperbolic contours for computing the Bromwich integral, Math. Comp., 76 (2007), pp. 1341–1356.
