class19_pde-i-handout.pdf

7/29/2019 class19_pde-I-handout.pdf

1/78

Partial Differential Equations in HPC, I

M. D. Jones, Ph.D.

Center for Computational ResearchUniversity at Buffalo

State University of New York

High Performance Computing I, 2012

M. D. Jones, Ph.D. (CCR/UB) Partial Differential Equations in HPC, I HPC-I Fall 2012 1 / 84


2/78

Introduction Types

Types of PDEs

Recapitulate - types of partial differential equations (2nd order, 2D):

Axx + 2Bxy + Cyy + Dx + Ey + F = 0

Three major classes:Hyperbolic (often time dependent, e.g., wave equation)

Elliptic (steady state, e.g., Poissons Equation)

Parabolic (often time dependent, e.g. diffusion equation)

(nonlinear when coefficients depend on )



3/78

Introduction Hyperbolic

Hyperbolic Problems

Hyperbolic problems:

det(Z) = A B

B C < 0

Examples: Wave equation

Explicit time steps

Solution at point depends on neighboring points at prior time step



4/78

Introduction Elliptic

Elliptic Problems

Elliptic Problems:

Z =

A B

B C

= positive definite

Examples: Laplace equation, Poisson Equation

Steady-state problems

Solution at points depends on all other points

Locality is tougher than hyperbolic problems



5/78

Introduction Parabolic

Parabolic Problems

Parabolic Problems:

det(Z) = A B

B C

= 0

Examples: Heat equation, Diffusion equation

Needs an elliptic solve at each time step


I d i P b li


6/78

Introduction Parabolic

Elements, Differences, Discretizations ...

There are many forms of discretization that can be used on these

systems:

Finite Elements

Finite Differences

Adaptive combinations of the above

for our purposes, though, we can illustrate many of the basic ideas

using a simple one-dimensional example on a regular mesh.


B d V l P bl Si l Elli ti E l i 1D


7/78

Boundary Value Problems Simple Elliptic Example in 1D

Simple Elliptic Example in 1D

For starters we consider the following boundary value problem:

d2

dx2= f(x), x [0, 1],

with initial conditions:

d/dx(0) = 0, (1) = 0.

The plus side of this example is that we have an integral general

solution:

(x) =

1

x

dt

t

0

f(s)ds




8/78


Mesh Approach

Now we apply a regular mesh, i.e. a uniform one defined by N points

uniformly distributed over the domain:

xi = (i 1)x, x = h= 1/N.

Using a mid-point formula for the derivative:

2hd/dx(x) = (x + h/2) (x h/2),

Applying this central difference formula twice yields

(x h) + 2(x) (x + h) = h2d2/dx2(x),

or in another form

n1 + 2n n+1 = h2

n,




9/78


So now our original boundary value problem can be expressed in finite

differences as

n1 + 2n n+1 = h2fn

and this is now easily reduced to a matrix problem, of course. Butbefore we go there, we can generalize the above to use an arbitrary

mesh, rather than uniform ...




10/78


Finite Differences on an Arbitrary Mesh

Now consider a (possibly) non-uniform mesh, discretized according to0 = x0 < x1 < ... < xN+1 = 1. The discretized boundary value problemthen becomes

2

xn+1 xn1

n+1 n

xn+1 xn

n n1

xn xn1 = f(xn)

Let hn = xn xn1, and we can scale by (hn+1 hn)/2:

hn+1 hn2

f(xn) = n+1 n

hn+1+

n n1hn

,

=

1

hn+1+

1

hn

n

1

hn+1n+1

1

hnn1. (1)




11/78


Eq. 1 is a symmetric tridiagonal matrix, which can be shown to bepositive definite. The i-th row (1 < i < N) has entries:

ai,i1 = 1

hiai,i =

1

hi+

1

hi+1ai,i+1 =

1

hi+1

and the resulting matrix equation looks like:

A = F

and we can subject this matrix equation to our full arsenal of matrix

solution methods.




12/78


Finite Elements for the Boundary Value Problem

Using a mesh-based discretization is not the only way, of course. Thefinite element method uses piecewise-linear hat functions, ui which

are unity at the i-th node, xi, and zero elsewhere. The mathematical

basis is the variational formulation:

10

(x)u

i(x)dx =1

0f(x)ui(x)dx,

and an approximation scheme for the general integral, e.g. the

trapezoidal rule:

xi+1

xi

f(x)dxx

2(f(xi) + f(xi+1)) .


Boundary Value Problems Matrix Factorization


13/78

y

Matrix Factorization

This formulation reduces to the same form as our finite difference one,

and can also be recast for an arbitrary mesh.

Regardless of whether we used finite elements or finite differences,

our simple 1D example reduced to the simple tridiagonal form:

A = F,

where ...




14/78

y

the tridiagonal matrix A is given by:

A =

1 1 0 0 . . . 01 2 1 0 . . . 0

0 1 2 1 . . . 0.

..

.

..

.

..

.

..

.

..

.

..0 . . . 1 2 1 00 . . . 0 1 2 10 . . . 0 0 1 2

and F1 = h2f(x1)/2, Fi = h2f(xi).




15/78

Direct Solution

The tridiagonal matrix A can be factored very simply into a product of

lower (L) and upper (LT) bidiagonal matrices:

L =

1 0 0 0 . . . 01 1 0 0 . . . 0

0 1 1 0 . . . 0...

......

......

...

0 . . . 1 1 0 00 . . . 0 1 1 0

0 . . . 0 0 1 1




16/78

LT =

1 1 0 0 . . . 00 1 1 0 . . . 00 0 1 1 . . . 0...

......

......

...

0 . . . 0 1 1 00 . . . 0 0 1 10 . . . 0 0 0 1

note that LT is just the transpose of L.




17/78

So this starts to look quite familiar to our old friend, LU decomposition.

We introduce the auxiliary vector Y which solves by forward

substitution:

LY = F,

and then the solution is given by backward solution

LT = Y,




18/78

The pseudo-code for this forward and backward solution is given by

the following:

y ( 1 ) = f ( 1 )do i =2,N

y ( i ) = y ( i 1) + f ( i )

end dop h i ( N) = y (N )do i =N1,1,1

p h i ( i ) = p h i ( i 1) + y ( i )end do


Boundary Value Problems Parallel Direct Triangular Solver


19/78

Parallel Prefix for Tridiagonal Systems

Let us consider the solution of a general tridiagonal system given by:

A =

a1 b1 0 0 . . . 0c1 a2 b2 0 . . . 0

0 c2 a3 b3 . . . 0...

......

.... . .

...

which we factor:

A = LU




20/78

and the factors L and U are lower and upper bidiagonal matrices,

respectively,

L =

1 0 0 0 . . . 0l1 1 0 0 . . . 00 l2 1 0 . . . 0...

..

.

..

.

..

.

. ..

..

.

U =

d1 e1 0 0 . . . 00 d2 e2 0 . . . 00 0 d3 e3 . . . 0...

.

.....

.

.... .

.

..




21/78

Matching up terms in Aij = (LU)ij yields the following three equations:

ci1 = li1di1 (i, i 1) (2)

ai = li1ei1 + di (i, i) (3)

bi = ei (i 1, i) (4)

From Eq. 2 we obtain li1 = ci1/di1 and substituting into Eq. 3yields a recurrence relation for di:

di = aici1ei1

di1= ai +

fidi1

,

where fi = ci1ei1.




22/78

The parallel prefix method is then used for solving the recurrence

relation for the di by defining di = pi/qi, and expressing in matrix form:

piqi

= ai + fiqi1/pi1 =

ai fi1 0

pi1qi1

= Mi

pi1qi1

= MiMi1 . . . M1

p0q0

(5)

where f1 = 0, p0 = 1, and q0 = 0. In other words, we can write therecurrence relation as the direct product of a sequence of 2x2 matrix

multiplications.



P ll l P fi


23/78

Parallel Prefix

The parallel prefix method is then used (in parallel) to quickly evaluatethe product in Eq. 5 by using a binary tree structure to quickly evaluate

the individual products. The following figure illustrates this process

(albeit for only eight terms): 7M1M 2M 3M 4M 5M 6M M 8

4M3M x 5M 6Mx 7M 8Mx1M 2Mx

1M 2M 3M 4Mxx x 5M 6M 7M 8Mxx x

1M 2M 3M 4Mxx x 5M 6M 7M 8Mxx xx

Note that this procedure is quite generally valid for any associative

operation (i.e. (A B)C = A (BC)), and can be done in onlyO(log2N) steps.



Parallel Efficiency


24/78

Parallel Efficiency

The time taken for this parallel method is then given by:

(P) 2c1n

P+ 2c2 log2 P,

where c1 is the time taken for 1 Flop, and c2 that for one step of

parallel prefix using 2x2 matrices. The parallel efficiency is then just

1

E(P) 2 +

2Plog2 P

n,

where = c2/c1.

E(P) 1

2 + 2,

with n Plog2 P.


Iteration, Iteration, Iteration, ... Back to Jacobi

Recapitulate


25/78

Recapitulate

Recall that in solving the matrix equation Ax = b Jacobis iterativemethod consisted of repeatedly updating the values of x based on their

value in the previous iteration:

x(k)i = 1Ai,i

bij=i

Ai,jx(k1)j ,

we will come back to convergence properties of this relation a bit later.

Now, however, we have a matrix A that has been determined by our

choice of a simple stencil for the derivatives in our PDE, which is verysparse, making for a large reduction in the workload.



Iterative Techniques


26/78

Iterative Techniques

Now we consider, for our simple 1D boundary value problem used inthe last section, iterative methods for solution. The simplest scheme is

Jacobi iteration. For the general mesh in 1D,

k+1n =hn

hn+1 + hn

kn+1 +hn+1

hn+1 + hn

kn1 +hnhn+1

2

f(xn),

whereas for our even simpler uniform mesh:

k+11 = k2 + (h

2/2)f(x1),

k+1n =

kn+1 + kn1 + h2f(xn)

/2. n [2, N]

Recall that xN+1 = 1, and (1) = 0.



Parallelization Strategies for Jacobi


27/78

Parallelization Strategies for Jacobi

First, in the classical Jacobi, each iteration involves a matrix

multiplication, plus a vector addition. As long as the solution vectors k

and k+1 are kept distinct, there is a high degree of data parallelism

(i.e. well suited to OpenMP or similar shared memory API).Each iteration takes O(N) work, but typically O(N2 log N) iterations arerequired to obtain the same accuracy as the original discrete equations

(in fact we can solve such a tridiagonal system in O(N) work in theserial case).


Iteration, Iteration, Iteration, ... Gauss-Seidel

Gauss Seidel


28/78

Gauss-Seidel

The Gauss-Seidel method is a variation on Jacobi that makes use of

the updated solution values as they become available, i.e.

k+11 = k2 + (h

2/2)f(x1),

k+1n =

kn+1 + k+1n1 + h

2f(xn)

/2. n [2, N]

(where n is increasing as we proceed). Gauss-Seidel has advantages

- it converges faster than Jacobi, but we have introduced some rather

severe data dependencies at the same time.



Parallel Gauss-Seidel


29/78

Parallel Gauss-Seidel

Gauss-Seidel is made more difficult to parallelize by the datadependencies introduced by the use of the updated solution vectordata on prior grid points. We can break the dependencies byintroducing a tile, N = rP,

k+1

1= k

2+ (h2/2)f(x

1),

k+1ri+1 =1

2

kri+2 +

kri + h

2f(xri+1)

i [1, P 1]

k+1ri+n

=1

2

kri+n+1 +

k+1ri+n1

+ h2f(xri+n)

n [2, r] i [0, P 1]

In this way we get P parallel tasks worth of work. Pseudo-code forMPI-based version of this approach follows ...




30/78

do i t e r = 1 , N _ i t e r a t i on si f ( myID == 0 ) p h i ( 0 ) = p h i ( 1 )i f ( myiD == N_procs1) p h i ( 1 +N / N _p ro cs ) = 0 ! u pp er BC

do i =1,N/Pu ( i ) = 0 .5( u ( i + 1 )+ u ( i 1) ) + f ( i ) ! n ot e h=1

end doSENDRECV( u ( 0 ) to / from myID1 )SENDRECV( u(1 +N/P ) to / from myID+1 )

end do

in practice this method is never used, though (well, ok, as a smoother

in multigrid, sometimes).


Iteration, Iteration, Iteration, ... Reordering Gauss-Seidel

Reordering


31/78

Reordering

So what is done, in practice, for Gauss-Seidel, is to reorder theequations. For example using odd/even combinations:

k+11 = k2 + (h

2/2)f(x1),

k+1

2n1=

1

2k

2n2+ k

2n+ h2f(x

2n1) n [2, N/2]

k+12n =1

2

k+12n1 +

k+12n+1 + h

2f(x2n1)

n [1, N/2]

and this looks very much (it is) like a pair of Jacobi updates. This is

referred to as coloring the index sets, and the above is an example ofthe red-black coloring scheme (looks like a chess board). Generally

you can use more than two colors, albeit with increasing complexity.



Red-Black Ordering in Gauss-Seidel


32/78

Red Black Ordering in Gauss Seidel

The red-black scheme is illustrated in the following examples (this is in2D, where it looks a bit more pretty). Gray squares have been updated.

Left figure is simple Gauss-Seidel, while right is a classic red-black.

Note the 5-point stencil illustrating the data dependence at a particular

grid point.



Parallel Example


33/78

Parallel Example

Updating our example, a two-color scheme is used, but in a slightly

unorthodox way, in which the values phi(k) are one color, and

remaining values phi(1) ...phi(k-1) the other color. This is doneto facilitate the use of message-passing parallelism, and to better

overlap computation with communication.




34/78

k = N /P

t a g _ l ef t = 1i f (myID < Nprocs1) SEND ( p h i ( k ) t o myID +1 , t a g _ l e f t )

do i t e r =1 , N i t e r si f (myID > 0 ) RECV( p h i (0 ) f ro m myID1, t a g _ l ef t )i f ( m yID == 0 ) p h i ( 0 ) = p h i ( 1 )p h i ( 1 ) = 0 . 5 ( p hi ( 0 ) + p hi ( 2 ) ) + f ( 1 )t a g _ ri g h t = 2 i t e ri f (myID > 0 ) SEND( p h i (1 ) t o myID1, t a g _ ri g h t )

do i = 2 , k

1p h i ( i ) = 0 . 5( p h i ( i 1) + p h i ( i + 1) ) + f ( i )end doi f (myID < Nprocs1) RECV ( p h i ( k + 1 ) f r om myID + 1 , t a g _ r i g h t )i f (myID == Nprocs1) p h i ( k + 1) = 0p hi ( k ) = 0 .5 ( p h i ( k + 1 ) + p h i ( k1) ) + f ( k )t a g _ l e f t = 2 i t e r + 1i f (myID < Nprocs1) SEND( p h i ( k ) t o myID +1 , t a g _ l e f t )

end do


Multigrid Methods Multigrid Basics

Multigrid Basics


35/78

u t g d as cs

Multigrid is easy to conceptualize, but considerably more tedious in its

details. In a nutshell:

Jacobi or Gauss-Seidel forms the basic building block

Iterations are halted while problem is transferred to a coarser

mesh

Recursively applied



Multigrid, in a Nutshell


36/78

g ,

For simplicity, consider once again, our linear elliptic problem,

L = f

and if we discretize using a uniform length, h,

Lhh = fh

To this point we have addressed several methods of solving for the

exact solution to the discretized equations (directly solving the matrix

equations as well as relaxation methods like Jacobi/Gauss-Seidel).




37/78

Now, let h be an approximate solution to the discretized equation, andvh = h h. Then we can define a defect or residual as

dh = Lhh fh,

using which our error can be written as

Lhvh = Lhh Lhh = dh




38/78

We have been finding vh by approximating Lh, Lh:

Jacobi: Lh = diagonal part of LhGauss-Seidel: Lh = lower triangular part of Lh:

i =

N

j=i=1

Lijj fi

/Lii

In multigrid, what we are instead going to do is coarsen,

Lh = LH

(and typically H = 2h).




39/78

Using this two-grid approximation,

LHvH = dH

and we need to handle both the fine->coarse (sometimes called

restriction or injection) transition:

dH = Rdh,

and the coarse->fine (prolongationor interpolation)

vh = PvH

and our new solution is just newh = h + vh.




40/78

In summary, our two-grid cycle looks like:

Compute defect on fine grid: dh = Lhh fh

Restrict defect: dH = Rdh

Solve on coarse grid for correction: LHvH = dHInterpolate correction to fine grid: vh = PvH

Compute next approximation, newh = h + vh




41/78

The next level of refinement is that we smooth out higher-frequency

errors, for which relaxation schemes (Jacobi/Gauss-Seidel) are well

suited. In the context of our two-grid scheme, we pre-smooth on the

initial fine grid, and post-smooth the corrected solution.




42/78

Putting the pieces together, the next idea is to develop a full multigrid

by applying the scheme recursively down to some trivially simple

(either by direct or relaxation methods) coarse grid. This is known as a

cycle, but let us first develop a bit more mathematical formalism.



Multigrid Notation


43/78

Some definitions are in order:Let K > 1 define the multigrid levels

Level-zero Grid is our starting coarse grid:

xi = (i 1)x, x = 1/N, 1 i N + 1

Next level, the level-one grid, comes from subdividing level-zero:

xN+2i = xi i [1, N + 1]

xN+2i+1 =1

2 (xi + xi+1) i [1, N]



Multigrid Notation (contd)


44/78

Level-k grid (k is the number of points in level k, 0 = N + 1, andNk is the number of points in all previous levels N0 0,N1 = N + 1):

xNk+2i1 = xNk1+i i [1, k1]

xNk+2i =12

(xNk1+i + xNk1+i+1) i [1, k1 1]

and induction leads to

k = 2k1 1Nk+1 = Nk + k




45/78

As an example, consider the following figure:

5=33

4=17

3=9

2=5

1=3

0=2

which shows 5 levels of a one-dimensional multigrid.



Multigrid Solution


46/78

Let Nk1+1, ..., Nk denote the solution values at level k 1. Thesevalues define a piecewise linear function on the mesh for the k 1-thlevel,

k1(x) =

ki=1

Nk1+iuk1i (x),

where the uk1i (x) basis functions are defined as in the finite elementcase - unity only for the i-th node at level k 1.




47/78

Since the grids are nested, k1(x) is also piecewise linear at level k,for which we can interpolate:

Nk+2i1 = Nk1+i i [1, k1]

Nk+2i =1

2

Nk1+i + Nk1+i+1

i [1, k1 1]

If UNk = 0, then UNk+1 = 0.

It is also necessary to coarsen, or transfer a solution to a coarser grid.

Inverting the coarse-fine refinement is a reasonable choice:

Nk1+i =1

2 Nk+2i1 +1

4

nk+2i2 + Nk+2i

i [1, k1]


Multigrid Methods V-Cycle

Applying Jacobi/Gauss-Seidel


48/78

The Jacobi or Gauss-Seidel method can be applied on any level ofmultigrid. Previously we had (for Gauss-Seidel):

1 2 +1

2h2f(x1)

n1

2

n+1 + n1 + h

2fn

n [2, N]

and now for multigrid:

Nk+1 Nk+2 + FNk+1

Nk+n1

2

Nk+n+1 +Nk+n1 + FNk+n

n [1, k]

where FNk+1 = h2kf(xNk+1)/2, FNk+i = h

2kf(xNk+i).



Applying Jacobi/Gauss-Seidel (contd)


49/78

The Gauss-Seidel/Jacobi iterations are not used for the sake of

convergence, but rather for smoothing the solution, in the sense of

smoothing out error between the approximate and full solution.




50/78

For the k-th level iteration we can write the solution using the

shorthand notation (note this is the converged solution):

Akk = Fk,

after doing, say, mk smoothing steps we write the residual error as

Rk

= Fk

Ak

k

,

where

RNk+1 FNk+1 Nk+1 + Nk+2

RNk+n FNk+n 2Nk+n + Nk+n+1 + Nk+n1 n [2, k]




51/78

The error E = , where is the exact solution of A = F,satisfies the residual equation,

AkEk = Rk.

The important result is

k = k + Ek.

The recursive part comes in solving the residual equation, which is

done on a coarser mesh,

RNk1+i =1

2

RNk+2i1 +1

4RNk+2i2 + RNk+2i i [1, k1]

and so on back to level-zero.




52/78

Now, see why it is called the V-cycle? You go back up the grids

(fine-to-coarse) then back again (the residuals are added back to

using fine-to-coarse steps), resulting in a classic V if you plot thelevels.

1

2

3

4

5

execution time

level


Multigrid Methods Full Multigrid

Full Multigrid


53/78

Repeating the V-cycle on a given level will converge to O(x2) inO(| log x|) iterations

Now that we have all of the pieces, we can outline the full multigrid

V-cycle

Solve Akk = Fk at level 0, 0 = (1, ..., N+1)

1 initially approximated by coarse-to-fine transfer of 0

Level-one proceeds (smoothing, solve level-zero residual

equation, coarse-to-fine interpolation again, more smoothing) r

times

Level-two proceeds as in level-one.

By doing r repetitions at each level, we get the Full Multigrid V-cycle.



Full Multigrid (contd)


54/78

A picture is worth a few hundred words, at least:

E

S

E E

S S

S

S

S

S

E

S

S

S

S

S

E

S

S

S

S

S

S

E

S

S

S

E

execution time

level

where S denotes smoothing, and E the exact solution on the coarsest

grid (illustrated is 4-level, 2-cycle, i.e. K=4, r=2).



Full Multigrid (contd)


55/78

Without proof, it can be shown that full multigrid converges to O(x2)in O(1) iterations. For large enough r, each k-th level iteration willsolve to an accuracy O(x2k). In practice, r = 10 is typically sufficient.


Multigrid Methods Parallel Multigrid

Multigrid in Parallel


56/78

The good news is that multigrid easily parallelizes in basically the

same way as Jacobi/Gauss-Seidel. We have basically two parts:

1 Smoothing, which is indeed Jacobi/Gauss-Seidel, and can be

parallelized in exactly the same fashion, with little modification

2 Inter-grid transfers (refinement & un-refinement is one way of

looking at these interpolations) - these are relatively trivial to deal

with, as only elements at the ends of the arrays need to be

transferred



Multigrid Scaling


57/78

Now lets examine the scaling behavior of multigrid.

Basic V-cycle, we estimate the time as

V K1k=0

c1N2k

P,

where c1 is the time spent doing basic floating-point operations

during smoothing.

Communication is involved just at exchanging data at the ends of

each interval at each step in the V-cycle,

comm latK,



Multigrid Scaling (contd)


58/78

Solving on the coarse grid at level K, for which there are N2K

unknowns. We need to collect the right-hand sides at each

processor, proportional to N2K

. Calling the constant ofproportionality c2,

K c2N2K



Multigrid Scaling, contd


59/78

Putting the pieces together, the overall time is

(P) c1N2P

+ latK + c2N2K

If we equate the first and last terms, P = 2K+1/, where = c2/c1. Inthat case,

(P) 4c1NP

+ lat log2 (P/2) ,

and the efficiency

E1(P) 4 +Plog2 (P/2)

c1N

E(P) 1

4

Plog2 (P/2)

c1N,




60/78

which implies (finally!) that the full multigrid is scalable if

Plog2 P N, and

E(P)

1

4 + /c1



MG Illustrated


61/78

Flow through square duct with 90

bend, LR = Line Relaxation (Gauss-Seidel), PR = PointRelaxation (Gauss), DADI = (diagonalized) Alternating Direction Implicit, (S)MG =

(Single)Multi-Grid. Work units normalzied to DADI-SG iterations.

L. Yuan, "Comparison of Implicit Multigrid Schemes for Three-Dimensional Incompressible Flows," J. Comp. Phys. 177, 134-155

(2002).


Multidimensional Problems Moving to Multiple Dimensions

Moving to Multiple Dimensions


62/78

Yes, in fact there are problems requiring more than one dimension ...

However, what we have done thus far in treating a one dimensional

elliptic problem with mesh methods is easily transferable to 2D and3D.

Lets look at our elliptic problem in 2D ...


Multidimensional Problems 2D Example

2D Example


63/78

We start with

2

x2

2

y2= f(x, y), x, y [0, 1]

and for simplicity we take strictly Dirichlet boundary conditions, = 0on the boundary. Again define a regular mesh, xi = (i 1)x,yj = (j 1)y. Again for simplicity we will take N mesh points in eachdirection. Using finite differences or finite elements (and we could

easily extend to an arbitrary mesh) results in a simple system of linear

equations ...




64/78

For a regular tensor product (tensor product of one-dimensional

bases) mesh,

A =

1

h2

BN IN 0 0 . . . 0IN BN IN 0 . . . 0

0 IN BN IN . . . 0...

.. .

.. .

.. .

.. .

.

..0 . . . IN BN IN 00 . . . 0 IN BN IN0 . . . 0 0 IN BN

where IN is just the NN identity matrix, and ...




65/78

BN =

4 1 0 0 . . . 01 4 1 0 . . . 0

0 1 4 1 . . . 0...

. . .. . .

. . .. . .

...

0 . . . 1 4 1 00 . . . 0 1 4 10 . . . 0 0 1 4

and note that the numerical entries in BN are for 2D, using two-point

central differences.



66/78

Initial Value Problems Basics

Forward to the Future


67/78

Initial value problems can be a bit more difficult to handle with scalableparallel algorithms. In a similar vein to our simple example to illustrate

boundary value problems, we can illustrate initial value problems using

a simple physical problem - the massless pendulum, which obeys:

d2u

dt2= f(u),

with initial conditions u(0) = a0, u

(0) = a1. In this context u is theangle that the pendulum makes with respect to the force of gravity, and

f(u) = mgsin(u).



Finite Differences Redux


68/78

Applying our usual central difference scheme using a two-point formula

yields:

un+1 = 2un un1 f(un),

where = t2

. we can make this a linear problem by taking the smallangle approximation

sin(u) u+ . . . u


69/78

In the linear problem (small oscillations in the case of our pendulum):

d2u

dt2= mgu

and the difference equations become a simple recursion relation:

un+1 = (2 mg)un un1

with starting values

u0 = a,

u1 = a0 a1t,

which will allow for the solution for all n 0.



Matrix/Linear Form


70/78

We can write the linear form using a very simple matrix-vectorequation:

LU = b

or

1 0 0 0 0 . . .T 1 0 0 0 . . .1 T 1 0 0 . . .0 1 T 1 0 . . .0 0 1 T 1 . . ....

.

.

....

.

.

....

. . .

u1u2u3u4u5...

=

(1 mg)a0 + a1ta0

000

.

.

.

where T = (mg 2). For this set of linear equations we can bringour usual bag of tricks in terms of scalable parallel algorithms.



Nonlinearity


71/78

If we do not make the linear approximation in our simple example, we

do not have such an easy (or at least well traveled) path to follow.

Using a similar matrix-vector notation,

F(U) = L0U + mg(U) b = 0 (6)

where L0 is of similar form as before, but with the terms removed (i.e.diagonal elements are 2). The (U) term can also be written as amatrix,

(U)ij = i1,j sin(ui1),

where ij is the Kronecker delta-function, and this term contains all ofour nonlinearities.


Initial Value Problems Newton-Raphson Method

Newton-Raphson Method


72/78

The Newton-Raphson method is one of the most powerful in terms of

solving nonlinear problems. Basically we want to solve Eq. 6,

F(U) = 0

using a root-finding method, i.e. when F(U) crosses the U axis. So for

a given guess Un we drop a tangent line to F and find where that linecrosses the axis:

0 F(Un)

Un+1 Un= F

(Un),

or

Un+1 = Un F(Un

)F

(Un).



Convergence of Newton-Raphson


73/78

Recall that Newton-Raphson has some very nice convergence

properties, provided that you start close to a solution:

|Un+1 Un| C|Un Un1|2

namely the number of correct figures doubles with each iteration.



Multidimensional Newton-Raphson

Extending the previous relations to multiple dimensions is


74/78

Extending the previous relations to multiple dimensions is

straightforward: we replace F

(Un) by the Jacobian matrix of all partial

derivatives of F with respect to vector U:

JF(U)ij =Fi(U)

uj,

and then solve:

JF(Un)V = F(Un),

Un+1 = Un + V.

and repeat until convergence. In our simple pendulum case, we have

J(U)ij = i1,j cos(ui1).


Initial Value Problems Starting Newton-Raphson

Starting Newton-Raphson

So if Newton Raphson converges quickly given a good starting


75/78

So if Newton-Raphson converges quickly given a good starting

approximation, how do we ensure that we have a good starting point?

Generally there is not a good answer to that question, but oneapproach is to use a simplified approximation, f to the nonlinear

function f, and then solve

d2u

dt2 = f(u),

along with the original initial conditions. In our pendulum example, we

could have used f(u) = u instead of f(u) = sin(u). This starting pointgives us a residual,

R(u) =d2u

dt2 f(u) = f(u) f(u).


Initial Value Problems Starting Newton-Raphson

and if we view v as a correction to u we can write:


76/78

and if we view v as a correction to u, we can write:

d2

vdt2

= vf(u) R(u)

using a simple Taylor expansion of f(u+ v) f(u) + vf

(u) + O(v2).Or in other terms:

d2v

dt2= vf

(u)

f(u) f(u)

.

The size of v is zero at the start (thanks to the initial conditions), so its

size can be tracked as the calculation of u proceeds, and provides a

way of monitoring the calculation error.


Initial Value Problems Other Methods

Other Methods


77/78

It is worth mentioning other techniques for initial value problems as

well:

Multigrid in the time variable - start coarse, and them improve.

See, for example:

L. Baffico et. al., Parallel-in-time Molecular-Dynamics

Simulations, Phys. Rev. E 66, 057701 (2002)Wave-form relaxation for a system of ODEs, using domain

decomposition rather than parallelize over the time domain. See,

for example:

J. A. McCammon et. al., Ordinary Differential Equations of

Molecular Dynamics, Comput. Math. Appl. 28, 319 (1994).


Initial Value Problems Other Methods

And Then?


78/78

Of course, this has been a mere sampling of parallel methods for

partial differential equations. Hopefully it has given you a feel for at

least the general principles involved.

In the next discussion of PDEs, we will focus more on available HPC

solution libraries and specialized methods.


class19_pde-i-handout.pdf

Documents