Fast finite volume methods for FPDEs

MURI Webinar, May 16 2016

Hong Wang

Department of Mathematics, University of South Carolina

[email protected]

Partially supported by ARO MURI grant W911NF-15-1-0562 and NSF grant DMS-1216923


Acknowledgements

T. Basu, Occidental College

H. Chen, Shandong Normal University

A. Cheng, N. Du & X. Zhang, Shandong University

W. Cheung, X. Guo, C. Wang & S. Yang, University of South Carolina

H. Fu, China University of Petroleum

J. Jia, Fudan University

Z. Li, D. Yang & S. Zhu, East China Normal University

Y. Ren, Qilu University of Technology

H. Tian, Ocean University of China


Conservative FDEs (del-Castillo-Negrete et al. 2004; Ervin & Roop 2005)

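\[
\begin{aligned}
&-D\Bigl(K(x)\bigl(\theta\,{}^{C,l}_{0}D_x^{1-\beta}u - (1-\theta)\,{}^{C,r}_{x}D_1^{1-\beta}u\bigr)\Bigr) = f(x), \quad x \in (0,1),\\
&u(0) = u_l, \quad u(1) = u_r, \quad 0 < \beta < 1, \quad 0 \le \theta \le 1,
\end{aligned}
\tag{1}
\]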

derived from a local mass balance + a fractional Fick’s law.

θ is the weight of forward versus backward transition probability.

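The left and right fractional integrals and the Caputo and Riemann-Liouville fractional derivatives are defined by

\[
\begin{aligned}
{}_0I_x^{\beta}u(x) &= {}_0D_x^{-\beta}u(x) := \int_0^x \frac{(x-s)^{\beta-1}u(s)}{\Gamma(\beta)}\,ds,\\
{}_xI_1^{\beta}u(x) &= {}_xD_1^{-\beta}u(x) := \int_x^1 \frac{(s-x)^{\beta-1}u(s)}{\Gamma(\beta)}\,ds,\\
{}^{C,l}_{0}D_x^{1-\beta}u &:= {}_0I_x^{\beta}Du, \qquad {}^{C,r}_{x}D_1^{1-\beta}u := -{}_xI_1^{\beta}Du,\\
{}^{R,l}_{0}D_x^{1-\beta}u &:= D\,{}_0I_x^{\beta}u, \qquad {}^{R,r}_{x}D_1^{1-\beta}u := -D\,{}_xI_1^{\beta}u.
\end{aligned}
\tag{2}
\]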


A finite volume method (FVM) for conservative FDE (1) with ul = ur = 0

Conservative and non-conservative FDEs are not equivalent.

Finite element/volume methods are suited for conservative FDEs.

Finite difference methods are suited for nonconservative FDEs.

In many applications, local mass conservation is crucial.

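A finite volume scheme naturally has second-order accuracy in space, without the Richardson extrapolation needed in finite difference methods.

Let $u = \sum_{j=1}^N u_j\varphi_j$, $\mathbf{u} := [u_1, u_2, \ldots, u_N]^T$, $\mathbf{f} := [f_1, f_2, \ldots, f_N]^T$, $A := [A_{i,j}]_{i,j=1}^N$. Integrating (1) over $(x_{i-1/2}, x_{i+1/2})$ yields

\[
\begin{aligned}
A\mathbf{u} &= \mathbf{f}, \qquad f_i := \int_{x_{i-1/2}}^{x_{i+1/2}} f(x)\,dx, \quad 1 \le i, j \le N,\\
A_{i,j} &:= \Bigl[K(x)\bigl(\theta\,{}^{C,l}_{0}D_x^{1-\beta}\varphi_j - (1-\theta)\,{}^{C,r}_{x}D_1^{1-\beta}\varphi_j\bigr)\Bigr]_{x=x_{i+1/2}}^{x=x_{i-1/2}}.
\end{aligned}
\tag{3}
\]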


An efficient storage of A and a fast matrix-vector multiplication of Av

Theorem

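\[
A = \gamma(\beta)\bigl(K^- T_L^{\beta,N} + K^+ T_R^{\beta,N}\bigr), \qquad K^{\pm} := \operatorname{diag}\bigl(K(x_{i\pm 1/2})\bigr)_{i=1}^N, \tag{4}
\]

where $T_L^{\beta,N}$ and $T_R^{\beta,N}$ are Toeplitz matrices of order N.

K(x) appears inside the (first-order) derivative in (1), yet the decomposition (4) still holds, and so do optimal storage and fast matrix-vector multiplication.

For problem (1), the condition number is $\kappa(A) = O(h^{-(2-\beta)})$.

The number of Krylov iterations is $O(h^{-(1-\beta/2)}) = O(N^{1-\beta/2})$, that is, on the order of $\sqrt{\kappa(A)}$, leading to an overall computational complexity of $O(N^{2-\beta/2}\log N)$.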

This calls for an effective and efficient preconditioner.
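The fast matrix-vector product behind (4) can be sketched as follows. This is a minimal illustration of the standard circulant-embedding technique for a generic Toeplitz matrix, not the talk's implementation; the function name and the way the matrix is passed in (first column `col`, first row `row`) are assumptions made for this example.

```python
import numpy as np

def toeplitz_matvec(col, row, v):
    """Multiply the n-by-n Toeplitz matrix T by v in O(n log n) operations.

    col is the first column of T and row its first row (col[0] == row[0]).
    """
    n = len(v)
    # First column of the 2n-by-2n circulant matrix that contains T
    # in its leading n-by-n block.
    c = np.concatenate([col, [0.0], row[:0:-1]])
    # A circulant matrix-vector product is a circular convolution,
    # which the FFT diagonalizes.
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([v, np.zeros(n)])))
    return w[:n].real
```

Under decomposition (4), a product Av would then reduce to two such Toeplitz products followed by diagonal scalings with K^- and K^+, so only the O(N) generating entries of the two Toeplitz matrices need to be stored.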


A preconditioned iterative solver for (1) with θ = 1/2 (W. & Du 2013)

Theorem

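$M := T_L^{\beta,N} + T_R^{\beta,N}$ is a symmetric positive-definite Toeplitz matrix.

Outline of a (perturbation) proof: Let $K_0 := \operatorname{diag}\bigl(K(x_i)\bigr)_{i=1}^N$. Then

\[
\begin{aligned}
\gamma(\beta)^{-1}K_0^{-1}A
&= K_0^{-1}K^- T_L^{\beta,N} + K_0^{-1}K^+ T_R^{\beta,N}\\
&= K_0^{-1}\bigl[K_0 + (K^- - K_0)\bigr]T_L^{\beta,N} + K_0^{-1}\bigl[K_0 + (K^+ - K_0)\bigr]T_R^{\beta,N}\\
&= M + K_0^{-1}\bigl[(K^- - K_0)T_L^{\beta,N} + (K^+ - K_0)T_R^{\beta,N}\bigr]\\
&= M + O(h).
\end{aligned}
\tag{5}
\]

M is a good preconditioner for the finite volume scheme (3), rewritten as

\[
\bigl(K_0^{-1}K^- T_L^{\beta,N} + K_0^{-1}K^+ T_R^{\beta,N}\bigr)\mathbf{u} = \gamma(\beta)^{-1}K_0^{-1}\mathbf{f}. \tag{6}
\]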
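The O(h) size of the diagonal perturbations in (5) can be checked numerically. The sketch below assumes a uniform mesh with h = 1/(N + 1) and nodes x_i = ih, and uses the hypothetical smooth coefficient K(x) = 1 + x; neither choice is taken from the talk. It measures how far the diagonal entries of K_0^{-1}K^- and K_0^{-1}K^+ are from 1.

```python
import numpy as np

def relative_perturbation(N):
    """Largest deviation of the diagonals of K0^{-1} K^- and K0^{-1} K^+ from 1."""
    h = 1.0 / (N + 1)                 # assumed uniform mesh size
    x = h * np.arange(1, N + 1)       # assumed interior nodes x_i = i h
    K = lambda s: 1.0 + s             # hypothetical smooth diffusivity
    K_minus, K_0, K_plus = K(x - h / 2), K(x), K(x + h / 2)
    return max(np.max(np.abs(K_minus / K_0 - 1.0)),
               np.max(np.abs(K_plus / K_0 - 1.0)))
```

For this K, each deviation equals (h/2)/(1 + x_i), so halving h roughly halves the perturbation, in line with the M + O(h) statement.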


An example run by a preconditioned fast FVM

The data in (1): β = 0.2, θ = 0.5, K(x) = Γ(1.2)(1 + x), u_l = u_r = 0.

The true solution is u(x) = x^2 (1 − x)^2; f is computed accordingly.

Gaussian elimination (Gauss) and CGS:

N      ‖u − u_G‖_L∞     CPU (s)      ‖u − u_C‖_L∞     CPU (s)      Itr. #
2^5    2.018 × 10^-4    0.000        2.018 × 10^-4    0.000        32
2^6    5.157 × 10^-5    0.000        5.157 × 10^-5    0.000        65
2^7    1.294 × 10^-5    0.000        1.294 × 10^-5    0.016        128
2^8    3.214 × 10^-6    0.047        3.214 × 10^-6    0.141        217
2^9    7.893 × 10^-7    0.500        7.893 × 10^-7    3.359        599
2^10   1.887 × 10^-7    7.797        1.886 × 10^-7    2 m 2 s      1,110
2^11   4.030 × 10^-8    2 m 38 s     4.047 × 10^-8    21 m 13 s    2,624
2^12   6.227 × 10^-9    24 m 29 s    7.468 × 10^-8    4 h 19 m     7,576
2^13   5.783 × 10^-9    3 h 27 m     N/A              > 2 days     > 20,000

Fast CGS (FCGS) and preconditioned fast CGS (PFCGS):

N      ‖u − u_F‖_L∞     CPU (s)      Itr. #    ‖u − u_S‖_L∞     CPU (s)    Itr. #
2^5    2.018 × 10^-4    0.000        32        2.018 × 10^-4    0.000      6
2^6    5.157 × 10^-5    0.016        63        5.157 × 10^-5    0.000      5
2^7    1.294 × 10^-5    0.031        128       1.294 × 10^-5    0.000      5
2^8    3.214 × 10^-6    0.125        248       3.214 × 10^-6    0.006      5
2^9    7.893 × 10^-7    0.578        576       7.893 × 10^-7    0.016      5
2^10   1.886 × 10^-7    2.281        1,078     1.887 × 10^-7    0.047      5
2^11   4.037 × 10^-8    9.953        1,997     4.038 × 10^-8    0.078      5
2^12   1.587 × 10^-8    57.27        5,130     6.194 × 10^-9    0.188      5
2^13   2.372 × 10^-8    2 m 52 s     7,410     4.345 × 10^-9    0.391      5


Observations

Use the numerical solutions by Gaussian elimination as a benchmark:

The conjugate gradient squared (CGS) method diverges, due to a significant amount of round-off error.

The fast CGS (FCGS) reduces the CPU time significantly, as the operations per iteration are reduced from O(N^2) to O(N log N).

  The number of iterations is still $O(N^{1-\beta/2})$.
  It is less accurate than Gaussian elimination on fine meshes due to round-off errors.

The preconditioner M is optimal, so the preconditioned FCGS (PFCGS) has an overall computational cost of O(N log^2 N).

  It significantly reduces round-off errors.
  It generates more accurate solutions than Gaussian elimination.
  It further reduces CPU time.


Regularity of the boundary-value problem of FDEs

Error estimates were proved for numerical methods for FDEs, under the assumption that the true solution is smooth.

For integer-order elliptic or parabolic PDEs, smooth data (and domain, for multi-D problems) ⟹ smooth solution.

$u(x) = (x^{2-\beta} - x^{1-\beta})/\Gamma(3-\beta) \notin W^{1,1/\beta}(0,1)$ is the solution of

\[
D\bigl({}_0D_x^{-\beta}Du\bigr) = 1, \quad x \in (0,1), \qquad u(0) = u(1) = 0. \tag{7}
\]

In particular, $u \notin H^1(0,1)$ for 1/2 ≤ β ≤ 1.

For FDEs, smooth data does not ensure smooth solutions.

  No conditions in the literature ensure smooth solutions to FDEs.
  The Nitsche-lifting based proof of optimal-order L^2 error estimates in the literature does not hold even for constant K > 0.
  What conditions ensure that high-order methods ⟹ high-order convergence rates?
  Solutions may have boundary layers and other singularities, which need to be resolved numerically.


An FVM on a gridded mesh (Jia et al., 2014)

Solutions to FDEs with smooth data and domain may have boundary layers, so a uniform mesh is not effective.

  Finite difference methods are out of the question, as Grünwald-Letnikov derivatives are inherently defined on uniform meshes.
  Riemann-Liouville and Caputo derivatives offer such flexibility.

Because of the nonlocal nature of FDEs, a numerical scheme discretized on an arbitrarily adaptively refined mesh

  offers great flexibility and effective approximation properties,
  offers possible advantages in its theoretical analysis,
  destroys the structure of its stiffness matrix and hence its efficiency.

Motivation: balancing flexibility and efficiency.


The structure of the stiffness matrix

We assume a geometrically refined mesh towards the left endpoint.

Theorem

The matrix A can be decomposed as

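\[
A = \frac{1}{\Gamma(\beta+1)}\Bigl[\operatorname{diag}(K^-)\bigl(\gamma Q_l + (1-\gamma)Q_r\bigr) - \operatorname{diag}(K^+)\bigl(\gamma P_l + (1-\gamma)P_r\bigr)\Bigr]\operatorname{diag}\bigl(h_i^{\beta-1}\bigr)_{i=1}^m.
\]

  $P_l$, $P_r$, $Q_l$ and $Q_r$ are Toeplitz.
  Compared with the uniform-mesh case, A carries an additional diagonal matrix multiplier that reflects the impact of the mesh sizes.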


Numerical experiments of a one-sided FDE on a gridded mesh

Consider (1) with K = 1, f = 0, β = 0.98, θ = 1, ul = 0, ur = 1, i.e.,

\[
D\bigl({}_0D_x^{-\beta}Du\bigr) = 0, \quad x \in (0,1), \qquad u(0) = 0, \quad u(1) = 1.
\]

Its solution is $u(x) = x^{1-\beta}$ for x ∈ (0, 1).
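A short check of this solution, using only definition (2), the substitution s = xt, and the Beta integral $\int_0^1 (1-t)^{\beta-1}t^{-\beta}\,dt = \Gamma(\beta)\Gamma(1-\beta)$ (a worked step added here for the reader, not part of the original slide):

\[
\begin{aligned}
Du(x) &= (1-\beta)\,x^{-\beta},\\
{}_0I_x^{\beta}Du(x) &= \frac{1-\beta}{\Gamma(\beta)}\int_0^x (x-s)^{\beta-1}s^{-\beta}\,ds
 = \frac{1-\beta}{\Gamma(\beta)}\int_0^1 (1-t)^{\beta-1}t^{-\beta}\,dt
 = (1-\beta)\Gamma(1-\beta) = \Gamma(2-\beta).
\end{aligned}
\]

The fractional flux ${}_0I_x^{\beta}Du$ is therefore constant, so its derivative vanishes, and u(0) = 0, u(1) = 1 hold by inspection. Note that Du is unbounded as x → 0+, which is the boundary layer the mesh has to resolve.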

Method   N       CPU        # of iterations
Gauss    256     0.640 s
         512     5.567 s
         1024    59 s
CGS      256     2.978 s    256
         512     29 s       512
         1024    403 s      1024
FCGS     256     0.073 s    256
         512     0.139 s    512
         1024    0.391 s    1024


Figure: First row: numerical solutions on a uniform mesh with n = 256, 512, 1024; second row: numerical solutions on a geometrically refined mesh with n = 48, 64, 96. Each panel plots the numerical solution against the exact solution for x ∈ [0, 1].

An FVM on a locally refined composite mesh (Jia & W. 2016)

Solutions to FDEs with smooth data and domain may have boundary layers. Numerical solution of FDEs

  with a uniform mesh is not effective,
  with a gridded mesh may resolve the boundary layers, but does not necessarily provide an accurate global approximation.

We propose to use a composite mesh that consists of

  a uniform mesh in most of the domain,
  a gridded mesh in the cells near the (left) boundary.

The key issue is the structure of the stiffness matrix:

\[
A = \begin{pmatrix} A_{l,l} & A_{l,r}\\ A_{r,l} & A_{r,r} \end{pmatrix}. \tag{8}
\]

  $A_{r,r}$, corresponding to the uniform mesh, has a Toeplitz-like structure.
  $A_{l,l}$, corresponding to the gridded mesh, has a Toeplitz-like structure with an extra right diagonal multiplier.
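One way to picture such a composite mesh is sketched below. The construction is an assumption made for illustration (the slides do not spell out the node positions): the interval [0, 1] is divided into n uniform cells of size h = 1/n, and the first cell [0, h] is split into m cells whose sizes halve toward x = 0. The function name `composite_mesh` is hypothetical.

```python
import numpy as np

def composite_mesh(n, m):
    """Nodes of a composite mesh on [0, 1].

    n uniform cells of size h = 1/n, with the first cell [0, h] split into
    m geometrically shrinking cells (refinement ratio 2 is an assumption).
    """
    h = 1.0 / n
    # Refined nodes h*2^{-(m-1)}, ..., h/4, h/2 accumulate toward x = 0.
    refined = h * 2.0 ** -np.arange(m - 1, 0, -1)
    uniform = h * np.arange(1, n + 1)
    return np.concatenate([[0.0], refined, uniform])
```

For example, `composite_mesh(4, 3)` produces n + m nodes, with the smallest cell next to the left boundary where the solution is least regular.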


The structure of the off-diagonal submatrices in the stiffness matrix

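The off-diagonal submatrices $A_{l,r}$ and $A_{r,l}$

  are full due to the nonlocal nature of FDEs,
  are not Toeplitz-like.

Theorem

\[
\begin{aligned}
A_{l,r} &= \frac{(1-\gamma)h^{\beta-1}}{\Gamma(\beta+1)}\bigl(\operatorname{diag}(K_l^-)E - \operatorname{diag}(K_l^+)D\bigr),\\
A_{r,l} &= \frac{\gamma}{\Gamma(\beta+1)}\bigl(\operatorname{diag}(K_r^-)H - \operatorname{diag}(K_r^+)G\bigr)\operatorname{diag}\bigl(h_i^{\beta-1}\bigr)_{i=1}^m.
\end{aligned}
\]

Typical entries, shown here for D and G, are of the form

\[
\begin{aligned}
d_{i,j} &= 2\bigl(j+1-3\cdot 2^{i-m-1}\bigr)^{\beta} - \bigl(j-3\cdot 2^{i-m-1}\bigr)^{\beta} - \bigl(j+2-3\cdot 2^{i-m-1}\bigr)^{\beta},\\
g_{i,j} &= \Bigl[2^{m-j+1}\bigl(i+\tfrac32\bigr)-1\Bigr]^{\beta} - \tfrac32\Bigl[2^{m-j+1}\bigl(i+\tfrac32\bigr)-2\Bigr]^{\beta} + \tfrac12\Bigl[2^{m-j+1}\bigl(i+\tfrac32\bigr)-4\Bigr]^{\beta}.
\end{aligned}
\]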


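Using a fractional binomial expansion, we have

\[
\begin{aligned}
D \approx{}& -2\binom{\beta}{2}[1, 1, \ldots, 1]^T\Bigl[\frac{1}{2^{2-\beta}}, \frac{1}{3^{2-\beta}}, \ldots, \frac{1}{(n-1)^{2-\beta}}\Bigr]\\
& -2\binom{\beta}{4}[1, 1, \ldots, 1]^T\Bigl[\frac{1}{2^{4-\beta}}, \frac{1}{3^{4-\beta}}, \ldots, \frac{1}{(n-1)^{4-\beta}}\Bigr]\\
& +18\binom{\beta}{3}\bigl[2^{-m}, 2^{-m+1}, \ldots, 2^{-1}\bigr]^T\Bigl[\frac{1}{2^{3-\beta}}, \frac{1}{3^{3-\beta}}, \ldots, \frac{1}{(n-1)^{3-\beta}}\Bigr]\\
& -108\binom{\beta}{4}\bigl[2^{-2m}, 2^{-2m+2}, \ldots, 2^{-2}\bigr]^T\Bigl[\frac{1}{2^{4-\beta}}, \frac{1}{3^{4-\beta}}, \ldots, \frac{1}{(n-1)^{4-\beta}}\Bigr].
\end{aligned}
\]

  The matrices can be approximated by a finite sum of low-rank matrices.
  The matrix-vector multiplication can be performed in O(N) operations.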
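The O(N) cost follows because each rank-one term u v^T can be applied as u (v · x). The sketch below illustrates this for the four terms of the expansion of D as written above; the helper names (`frac_binom`, `d_factors`, `lowrank_matvec`) are hypothetical, and the accuracy of the truncated expansion itself is not assessed here.

```python
import numpy as np

def frac_binom(beta, k):
    """Generalized binomial coefficient beta*(beta-1)*...*(beta-k+1)/k!."""
    out = 1.0
    for i in range(k):
        out *= (beta - i) / (i + 1)
    return out

def d_factors(beta, m, n):
    """Left and right vectors of the four rank-one terms approximating D."""
    j = np.arange(2, n, dtype=float)      # 2, 3, ..., n-1
    ones = np.ones(m)
    g1 = 2.0 ** np.arange(-m, 0)          # 2^{-m}, ..., 2^{-1}
    g2 = 4.0 ** np.arange(-m, 0)          # 2^{-2m}, ..., 2^{-2}
    lefts = [-2 * frac_binom(beta, 2) * ones,
             -2 * frac_binom(beta, 4) * ones,
             18 * frac_binom(beta, 3) * g1,
             -108 * frac_binom(beta, 4) * g2]
    rights = [j ** (beta - 2), j ** (beta - 4), j ** (beta - 3), j ** (beta - 4)]
    return lefts, rights

def lowrank_matvec(lefts, rights, x):
    """Apply sum_k lefts[k] rights[k]^T to x without forming the matrix."""
    return sum(u * (v @ x) for u, v in zip(lefts, rights))
```

Each term costs one dot product and one scaled vector, so the total work is proportional to the number of terms times the vector lengths rather than to the number of matrix entries.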


A block-diagonal preconditioner

We developed a preconditioner based on T. Chan's circulant preconditioner $C_n$, which minimizes $\|A - C_n\|_F$ over all circulant matrices.

We define a block-diagonal-circulant-block preconditioner M for A:

\[
M := \begin{pmatrix} M_1 & 0\\ 0 & M_2 \end{pmatrix}. \tag{9}
\]

  $M_1$ is a preconditioner for $A_{l,l}$.
  $M_2$ is a preconditioner for $A_{r,r}$.
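For a plain Toeplitz matrix, T. Chan's circulant approximation has a closed form: each circulant entry is the average of the Toeplitz entries on the corresponding wrapped diagonal. The sketch below shows that formula and an FFT-based solve with the resulting circulant. It is a generic illustration under the assumption of a pure Toeplitz block; the diagonal multipliers present in A_{l,l} and A_{r,r} are not handled, and the function names are hypothetical.

```python
import numpy as np

def chan_circulant(col, row):
    """First column of T. Chan's optimal circulant approximation of a Toeplitz matrix.

    col is the first column (t_0, t_1, ..., t_{n-1}) and row the first row
    (t_0, t_{-1}, ..., t_{1-n}) of the Toeplitz matrix.
    """
    n = len(col)
    k = np.arange(n)
    # t_neg[k] = t_{k-n} for k >= 1 (the k = 0 entry is unused because of the factor k).
    t_neg = np.concatenate([[0.0], row[:0:-1]])
    return ((n - k) * col + k * t_neg) / n

def circulant_solve(c, r):
    """Solve C y = r for the circulant matrix C with first column c, via FFT."""
    return np.fft.ifft(np.fft.fft(r) / np.fft.fft(c)).real
```

Applying the preconditioner then costs O(n log n) per block, the same order as one fast matrix-vector product.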


Numerical experiments of a one-sided FDE on a composite mesh

Consider (1) with K = 1, f = 0, θ = 1, β = 0.9, ul = 0, ur = 1, i.e.,

\[
D\bigl({}_0D_x^{-\beta}Du\bigr) = 0, \quad x \in (0,1), \qquad u(0) = 0, \quad u(1) = 1.
\]

Its solution is $u(x) = x^{1-\beta}$ for x ∈ (0, 1).

n       ‖u_n − u‖          ‖u_{n,m} − u‖               ‖u_{n,m} − u‖
128     4.3546 × 10^-1     2.6805 × 10^-1, m = 7       2.0315 × 10^-1, m = 11
256     4.0630 × 10^-1     2.3336 × 10^-1, m = 8       1.3403 × 10^-1, m = 16
512     3.7909 × 10^-1     2.0315 × 10^-1, m = 9       8.2504 × 10^-2, m = 22
1024    3.5370 × 10^-1     1.7685 × 10^-1, m = 10      3.8488 × 10^-2, m = 32
8192    2.8730 × 10^-1     1.6668 × 10^-1, m = 13      N/A


Figure: First row: numerical solutions on a uniform mesh with n = 256, 8192; second row: numerical solutions on a composite mesh with n = 256 and m = 8, 16. Each panel plots the numerical solution against the exact solution for x ∈ [0, 1].


Numerical experiments of a two-sided FDE on a locally refined composite mesh

Consider (1) with K = 1, θ = 0.5, β = 0.95, u_l = 0, u_r = 1, and (with γ denoting the weight θ)

\[
f(x) = \frac{(1-\gamma)(1-\beta)}{\Gamma(\beta)\,x\,(1-x)^{1-\beta}}, \qquad u(x) = x^{1-\beta}, \quad x \in (0,1).
\]

Method   m      n       Error              Iterations
Gauss    2^3    2^8     1.4379 × 10^-1
         2^4    2^9     1.0491 × 10^-1
         2^5    2^10    5.8194 × 10^-2
CGS      2^3    2^8     1.4379 × 10^-2     48
         2^4    2^9     1.0491 × 10^-1     77
         2^5    2^10    5.8194 × 10^-2     142
FCGS     2^3    2^8     1.4379 × 10^-1     48
         2^4    2^9     1.0491 × 10^-1     78
         2^5    2^10    5.8194 × 10^-2     150
PFCGS    2^3    2^8     1.4379 × 10^-1     9
         2^4    2^9     1.0491 × 10^-1     13
         2^5    2^10    5.8194 × 10^-2     16


Table: Numerical results on a uniform mesh

Method   n       Error              Iterations   CPU
Gauss    2^8     1.8827 × 10^-1                  0.01 s
         2^9     1.8206 × 10^-1                  0.01 s
         2^10    1.7596 × 10^-1                  0.05 s
         2^11    1.7002 × 10^-1                  0.25 s
         2^12    1.6425 × 10^-1                  1.25 s
         2^13    1.5867 × 10^-1                  9.76 s
         2^14    1.5327 × 10^-1                  97 s
CGS      2^8     1.8827 × 10^-1     46           0.01 s
         2^9     1.8206 × 10^-1     66           0.01 s
         2^10    1.7596 × 10^-1     94           0.18 s
         2^11    1.7002 × 10^-1     133          0.86 s
         2^12    1.6425 × 10^-1     188          4.94 s
         2^13    1.5867 × 10^-1     266          30.78 s
         2^14    1.5327 × 10^-1     379          187 s
FCGS     2^8     1.8827 × 10^-1     46           0.05 s
         2^9     1.8206 × 10^-1     66           0.16 s
         2^10    1.7596 × 10^-1     94           0.29 s
         2^11    1.7002 × 10^-1     133          1.16 s
         2^12    1.6425 × 10^-1     188          2.00 s
         2^13    1.5867 × 10^-1     266          12 s
         2^14    1.5327 × 10^-1     379          27 s
PFCGS    2^8     1.8827 × 10^-1     8            0.02 s
         2^9     1.8206 × 10^-1     8            0.02 s
         2^10    1.7596 × 10^-1     9            0.05 s
         2^11    1.7002 × 10^-1     10           0.09 s
         2^12    1.6425 × 10^-1     10           0.14 s
         2^13    1.5867 × 10^-1     10           0.66 s
         2^14    1.5327 × 10^-1     11           1.00 s


An FVM for a two-dimensional space-fractional FPDE

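\[
\begin{aligned}
&-D_x\Bigl(K_x(x,y)\bigl(\gamma_x\,{}_0I_x^{\alpha} + (1-\gamma_x)\,{}_xI_1^{\alpha}\bigr)D_xu\Bigr)
 - D_y\Bigl(K_y(x,y)\bigl(\gamma_y\,{}_0I_y^{\beta} + (1-\gamma_y)\,{}_yI_1^{\beta}\bigr)D_yu\Bigr) = f(x,y), \quad (x,y) \in \Omega,\\
&u(x,y) = 0, \quad (x,y) \in \partial\Omega.
\end{aligned}
\tag{10}
\]

Ω = (0, 1)^2 is a square domain.

A homogeneous Dirichlet boundary condition is assumed.

\[
A = A^x + A^y = \bigl(A^x_{l,j}\bigr)_{l,j=1}^{N_y} + \bigl(A^y_{l,j}\bigr)_{l,j=1}^{N_y}. \tag{11}
\]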


Structure of the stiffness matrix A

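Theorem

$A^x$ has a block tridiagonal structure, and each of its blocks has the form

\[
\begin{aligned}
A^x_{l,l} &= \frac{h_y}{\Gamma(\alpha+1)h_x^{1-\alpha}}\Bigl[\operatorname{diag}(K_l^{x,-})\bigl(\gamma_x D_{L,-}^{(\alpha,N_x)} + (1-\gamma_x)D_{R,-}^{(\alpha,N_x)}\bigr)
 - \operatorname{diag}(K_l^{x,+})\bigl(\gamma_x D_{L,+}^{(\alpha,N_x)} + (1-\gamma_x)D_{R,+}^{(\alpha,N_x)}\bigr)\Bigr], \quad 1 \le l \le N_y,\\
A^x_{l,l-1} &= \frac{h_y}{\Gamma(\alpha+1)h_x^{1-\alpha}}\Bigl[\operatorname{diag}(K_l^{x,-})\bigl(\gamma_x L_{L,-}^{(\alpha,N_x)} + (1-\gamma_x)L_{R,-}^{(\alpha,N_x)}\bigr)
 - \operatorname{diag}(K_l^{x,+})\bigl(\gamma_x L_{L,+}^{(\alpha,N_x)} + (1-\gamma_x)L_{R,+}^{(\alpha,N_x)}\bigr)\Bigr], \quad 2 \le l \le N_y,\\
A^x_{l,l+1} &= \frac{h_y}{\Gamma(\alpha+1)h_x^{1-\alpha}}\Bigl[\operatorname{diag}(K_l^{x,-})\bigl(\gamma_x U_{L,-}^{(\alpha,N_x)} + (1-\gamma_x)U_{R,-}^{(\alpha,N_x)}\bigr)
 - \operatorname{diag}(K_l^{x,+})\bigl(\gamma_x U_{L,+}^{(\alpha,N_x)} + (1-\gamma_x)U_{R,+}^{(\alpha,N_x)}\bigr)\Bigr], \quad 1 \le l \le N_y - 1.
\end{aligned}
\]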


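Theorem

$A^y$ is a full block matrix, and each of its blocks is tridiagonal. Furthermore, $A^y$ has the form

\[
\begin{aligned}
A^y ={}& \frac{h_x}{\Gamma(\beta+1)h_y^{1-\beta}}\operatorname{diag}(K_k^{y,-})\Bigl[\bigl(\gamma_y L_{L,-}^{(\beta,N_y)} + (1-\gamma_y)L_{R,-}^{(\beta,N_y)}\bigr)\otimes I_-^{(N_x)}\\
&\quad + \bigl(\gamma_y D_{L,-}^{(\beta,N_y)} + (1-\gamma_y)D_{R,-}^{(\beta,N_y-1)}\bigr)\otimes I^{(N_x)}
 + \bigl(\gamma_y U_{L,-}^{(\beta,N_y)} + (1-\gamma_y)U_{R,-}^{(\beta,N_y)}\bigr)\otimes I_+^{(N_x)}\Bigr]\\
&- \frac{h_x}{\Gamma(\beta+1)h_y^{1-\beta}}\operatorname{diag}(K_k^{y,+})\Bigl[\bigl(\gamma_y L_{L,+}^{(\beta,N_y)} + (1-\gamma_y)L_{R,+}^{(\beta,N_y)}\bigr)\otimes I_-^{(N_x)}\\
&\quad + \bigl(\gamma_y D_{L,+}^{(\beta,N_y)} + (1-\gamma_y)D_{R,+}^{(\beta,N_y-1)}\bigr)\otimes I^{(N_x)}
 + \bigl(\gamma_y U_{L,+}^{(\beta,N_y)} + (1-\gamma_y)U_{R,+}^{(\beta,N_y)}\bigr)\otimes I_+^{(N_x)}\Bigr].
\end{aligned}
\]

Theorem

Av can be evaluated in O(N log N) operations, and A can be stored in O(N) memory.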
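The Kronecker structure is what keeps the product cheap: a term of the form (T ⊗ I) v never needs the Kronecker product to be formed. The sketch below shows the reshaping identity for a generic N_y-by-N_y matrix T, with the unknowns ordered line by line in y; the function name and the ordering are assumptions made for this illustration.

```python
import numpy as np

def kron_identity_matvec(T, v, Nx):
    """Compute (T kron I_Nx) v without forming the Kronecker product."""
    Ny = T.shape[0]
    V = v.reshape(Ny, Nx)          # row l holds the Nx unknowns on grid line y_l
    return (T @ V).reshape(-1)     # T acts across grid lines, independently for each x_i
```

In a fast solver, the dense product `T @ V` would in turn be replaced by an FFT-based Toeplitz product applied to each column of V.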


An example experiment of a 2D FPDE by a preconditioned fast FVM

γx = γy = 0.5, α = β = 0.8, and Kx = Ky = 1.

The solution is u(x, y) := 256 x^2 (1 − x)^2 y^2 (1 − y)^2.

The right-hand side is calculated accordingly.


Method   Nx = Ny   ‖u_h − u‖_L2       # of iter.   CPU
Gauss    2^5       2.705367E-3                     46 s
         2^6       6.793973E-4                     1 h 2 m
         2^7       1.694831E-4                     7 d 17 h
         2^8       out of memory
CGS      2^5       2.705367E-3        25           6.04 s
         2^6       6.793973E-4        40           3 m 5 s
         2^7       1.694831E-4        70           2 h 34 m
         2^8       out of memory
FCGS     2^5       2.705367E-3        24           0.48 s
         2^6       6.793973E-4        37           1.53 s
         2^7       1.694831E-4        60           12 s
         2^8       4.216027E-5        92           49 s
         2^9       divergent
PFCGS    2^5       2.705367E-3        11           0.28 s
         2^6       6.793973E-4        12           0.57 s
         2^7       1.694831E-4        13           2.95 s
         2^8       4.216027E-5        16           9.74 s
         2^9       1.047953E-5        18           54 s
         2^10      2.605420E-6        21           4 m 37 s
         2^11      6.481977E-7        25           27 m 32 s
         2^12      1.610818E-7        33           1 h 38 m


Numerical results of a fast FVM on a 2D locally refined composite mesh

A fast FVM can be derived for an FPDE on a 2D locally refined composite mesh, similarly to what we did in 1D.

Mesh              mx = my   nx = ny   Error              Iterations   CPU
Uniform                     2^8       1.1810 × 10^-1     56           1 min 55 s
                            2^9       1.0652 × 10^-1     84           14 min 41 s
                            2^10      9.6038 × 10^-2     126          2 h 13 min
                            2^11      8.6568 × 10^-2     188          1 day 4 h
Locally refined   2^2       2^5       1.1324 × 10^-1     21           5.2 s
                  2^2       2^6       1.0211 × 10^-1     26           16 s
                  2^3       2^5       7.2598 × 10^-2     63           16 s
                  2^3       2^6       6.5423 × 10^-2     65           42 s
                  2^4       2^5       3.4491 × 10^-2     679          3 min 32 s
                  2^4       2^6       3.1085 × 10^-2     647          7 min 2 s

Table: Comparison of Gauss, CGS and FCGS

Method   mx = my   nx = ny   Error              Iterations   CPU
Gauss    2^2       2^5       1.1324 × 10^-1                  1 min 37 s
         2^3       2^6       6.5434 × 10^-2                  2 h 16 min
         2^4       2^7       2.8015 × 10^-2                  16 days 13 h
CGS      2^2       2^5       1.1324 × 10^-1     21           4 min 20 s
         2^3       2^6       6.5409 × 10^-2     62           3 h 51 min
         2^4       2^7       3.8433 × 10^-2     505          12 days 13 h
FCGS     2^2       2^5       1.1324 × 10^-1     21           5 s
         2^3       2^6       6.5423 × 10^-2     65           41 s
         2^4       2^7       2.8099 × 10^-2     607          17 min 45 s


A space-fractional FPDE on a two-dimensional convex domain

\[
\begin{aligned}
&-D_x\Bigl(K_x(x,y)\bigl(\gamma_x\,{}_{a_1(y)}I_x^{\alpha} + (1-\gamma_x)\,{}_xI_{b_1(y)}^{\alpha}\bigr)D_xu\Bigr)
 - D_y\Bigl(K_y(x,y)\bigl(\gamma_y\,{}_{a_2(x)}I_y^{\beta} + (1-\gamma_y)\,{}_yI_{b_2(x)}^{\beta}\bigr)D_yu\Bigr) = f(x,y), \quad (x,y) \in \Omega_s,\\
&u(x,y) = 0, \quad (x,y) \in \partial\Omega_s.
\end{aligned}
\tag{12}
\]

For an FPDE on a two-dimensional convex domain $\Omega_s$,

  the lower (or upper) limits of the left (or right) fractional integrals are no longer constant.
  Because of the nonlocal nature of the FPDEs and the variable limits of the fractional derivatives, the stiffness matrix of the corresponding FVM is not Toeplitz-like, in general.
  It is not clear how to develop a fast FVM in this case.

Assume that problem (12) can be extended to a rectangular domain

\[
\Omega := (a_1, b_1)\times(a_2, b_2) \supset \Omega_s.
\]


A volume-penalized fast FVM

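A volume-penalized boundary-value problem of the FPDE on Ω is

\[
\begin{aligned}
&-D_x\Bigl(K_x(x,y)\bigl(\gamma_x\,{}_{a_1}I_x^{\alpha} + (1-\gamma_x)\,{}_xI_{b_1}^{\alpha}\bigr)D_xu_\eta\Bigr)
 - D_y\Bigl(K_y(x,y)\bigl(\gamma_y\,{}_{a_2}I_y^{\beta} + (1-\gamma_y)\,{}_yI_{b_2}^{\beta}\bigr)D_yu_\eta\Bigr)\\
&\qquad + \frac{1 - \mathbf{1}_{\Omega_s}(x,y)}{\eta}\,u_\eta = f(x,y), \quad (x,y) \in \Omega,\\
&u_\eta(x,y) = 0, \quad (x,y) \in \partial\Omega.
\end{aligned}
\tag{13}
\]

  All the fractional derivatives are now defined on (the rectangular) Ω.
  Compared to its integer-order cousin, all (the limits of) the fractional derivatives are changed!

\[
\lim_{\eta\to 0^+} u_\eta(x,y) = 0, \quad (x,y) \in \Omega\setminus\Omega_s.
\]

  The extended fractional derivatives are anticipated to converge to the original fractional derivatives.
  The fast FVM developed for FPDEs on rectangular domains can apply!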
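In a discretization, the penalty term in (13) only adds a diagonal contribution, so the Toeplitz-like structure of the rest of the matrix is untouched. The sketch below evaluates the coefficient (1 − 1_{Ω_s})/η at cell centers of a uniform grid for the unit disk; evaluating at cell centers, the grid layout, and the function name are assumptions made for this illustration.

```python
import numpy as np

def penalty_coefficients(N, eta, a=-1.0, b=1.0):
    """Cell-centered values of (1 - 1_{Omega_s}) / eta on (a, b)^2.

    Omega_s is the unit disk {(x, y): 1 - x^2 - y^2 > 0}.
    """
    h = (b - a) / N
    centers = a + h * (np.arange(N) + 0.5)
    X, Y = np.meshgrid(centers, centers, indexing="ij")
    inside = (1.0 - X**2 - Y**2) > 0.0
    # Zero inside the physical domain, 1/eta in the fictitious part of the rectangle.
    return np.where(inside, 0.0, 1.0 / eta)
```

As η decreases, the diagonal entries outside Ω_s grow like 1/η, which is consistent with the larger iteration counts reported for small η in the results.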


Numerical results of a fast FVM on a unit disk Ω_s = {(x, y) : 1 − x^2 − y^2 > 0}

K_x = K_y = 0.005, γ_x = γ_y = 0.5, u(x, y) = (1 − x^2 − y^2)^2;

f computed accordingly, Ω = (−1, 1) × (−1, 1).

We measure the L^2 errors of the numerical solutions in Ω_s and use linear regression to fit the convergence rates

\[
\|u_h - u\|_{L^2(\Omega_s)} \le M h^{\kappa}.
\]

We measure the L^2 norms of the numerical solutions in Ω\Ω_s and fit the convergence rates

\[
\|u_h\|_{L^2(\Omega\setminus\Omega_s)} \le M h^{\kappa}.
\]

We present the number of iterations and the CPU time consumed by the fast FVM.


α = β = 0.1

η          Nx = Ny      ‖u_h − u‖_L2(Ωs)     ‖u_h‖_L2(Ω\Ωs)       # of iter.   CPU
η = 20     2^4          5.105588E-3          4.611015E-4          27           0.23 s
           2^5          1.165746E-3          2.779546E-4          50           0.79 s
           2^6          3.326281E-4          9.079025E-5          101          3.24 s
           2^7          9.642932E-5          1.405125E-5          160          16 s
           2^8          2.508560E-5          7.478186E-6          307          1 m 12 s
           2^9          6.907164E-6          2.467309E-6          592          14 m 17 s
           conv. rate   M = 50, κ = 1.88     M = 50, κ = 1.60
η = 1      2^4          5.106374E-3          4.589053E-4          31           0.25 s
           2^5          1.166439E-3          2.731328E-4          50           0.79 s
           2^6          3.317735E-4          8.890816E-5          100          3.31 s
           2^7          9.640066E-5          1.396566E-5          185          18 s
           2^8          2.500944E-5          7.381211E-6          308          1 m 12 s
η = 0.1    2^4          5.169564E-3          3.297783E-4          29           0.38 s
           2^5          1.209662E-3          1.567743E-4          56           1.51 s
           2^6          3.178022E-4          4.382610E-5          93           5.99 s
           2^7          9.424197E-5          9.725284E-6          169          20 s
           2^8          2.327820E-5          4.231561E-6          325          2 m 7 s
η = 0.01   2^4          5.317458E-3          1.289428E-5          225          2.23 s
           2^5          1.299251E-3          1.360030E-5          191          3.89 s
           2^6          3.271512E-4          5.886561E-6          189          7.94 s
           2^7          9.106917E-5          2.067034E-6          191          25 s
           2^8          2.310850E-5          8.687672E-7          350          1 m 57 s
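The fitted rate for the α = β = 0.1, η = 20 errors in Ω_s can be reproduced approximately with a least-squares fit in log-log coordinates. The sketch below assumes h = 2/N on Ω = (−1, 1)^2, which is not stated on the slide; the slope κ does not depend on that convention, whereas the fitted constant does, so only κ is examined here.

```python
import numpy as np

def fit_rate(h, err):
    """Least-squares fit of err ~ M * h**kappa; returns (kappa, M)."""
    kappa, log_M = np.polyfit(np.log(h), np.log(err), 1)
    return kappa, np.exp(log_M)

# Errors in Omega_s for alpha = beta = 0.1, eta = 20, Nx = Ny = 2^4, ..., 2^9.
N = 2.0 ** np.arange(4, 10)
h = 2.0 / N                         # assumed mesh size on (-1, 1)^2
err = np.array([5.105588e-3, 1.165746e-3, 3.326281e-4,
                9.642932e-5, 2.508560e-5, 6.907164e-6])
kappa, M = fit_rate(h, err)
```

The resulting slope is close to the value κ = 1.88 reported in the table.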



α = β = 0.9

η          Nx = Ny      ‖u_h − u‖_L2(Ωs)     ‖u_h‖_L2(Ω\Ωs)       # of iter.   CPU
η = 20     2^4          9.362172E-3          1.362098E-3          11           0.12 s
           2^5          2.519421E-3          2.851351E-4          16           0.28 s
           2^6          6.595694E-4          5.736933E-5          22           1.09 s
           2^7          1.692336E-4          1.094364E-5          33           3.28 s
           2^8          4.309593E-5          2.227257E-6          49           11 s
η = 1      2^4          9.340856E-3          1.006460E-3          12           0.13 s
           2^5          2.508966E-3          2.386773E-4          18           0.34 s
           2^6          6.569196E-4          5.093327E-5          23           0.78 s
           2^7          1.687507E-4          1.019236E-5          34           4.22 s
           2^8          4.299552E-5          2.116632E-6          49           11.50 s
η = 0.1    2^4          9.316244E-3          4.396209E-5          50           0.34 s
           2^5          2.482261E-3          2.186055E-5          82           1.30 s
           2^6          6.471300E-4          7.664450E-6          84           3.05 s
           2^7          1.661195E-4          2.535404E-6          75           7.22 s
           2^8          4.232844E-5          7.805887E-7          77           18 s
η = 0.01   2^4          9.315837E-3          4.568987E-7          120          0.95 s
           2^5          2.480667E-3          2.483698E-7          961          15 s
           2^6          6.461560E-4          9.835563E-8          330          11 s
           2^7          1.656546E-4          3.973004E-8          380          40 s
           2^8          4.213543E-5          1.675305E-8          447          2 m 1 s



2D-Conservative FPDEs (Ervin & Roop 2007; Meerschaert et al 2006)

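\[
\begin{aligned}
-\int_0^{2\pi}\bigl(D_\theta\,K\,I_\theta^{\beta}D_\theta u(x,y)\bigr)P(d\theta) &= f(x,y), && \text{in } \Omega \subset \mathbb{R}^2,\\
u &= 0, && \text{on } \partial\Omega.
\end{aligned}
\tag{14}
\]

P(dθ) is a probability measure on [0, 2π),

$D_\theta$ is the differential operator in the direction of θ,

\[
D_\theta u(x,y) := \Bigl(\cos\theta\,\frac{\partial}{\partial x} + \sin\theta\,\frac{\partial}{\partial y}\Bigr)u(x,y),
\]

and $I_\theta^{\beta}$, with 0 < β < 1, represents the βth-order fractional integral operator in the direction of θ, given by

\[
I_\theta^{\beta}u(x,y) := \int_0^{\infty}\frac{s^{\beta-1}}{\Gamma(\beta)}\,u(x - s\cos\theta,\, y - s\sin\theta)\,ds.
\]

If P(dθ) is atomic with atoms 0, π/2, π, 3π/2, then (14) reduces to the usual coordinate form.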


A Galerkin weak formulation and its well-posedness (Ervin & Roop 2007)

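Galerkin formulation: given $f \in H^{-(1-\beta/2)}(\Omega)$, seek $u \in H_0^{1-\beta/2}(\Omega)$ such that

\[
B(u,v) := \int_0^{2\pi}\Bigl[\int_\Omega K\,I_\theta^{\beta}D_\theta u\;D_\theta v\,dx\,dy\Bigr]P(d\theta) = \langle f, v\rangle, \quad \forall\, v \in H_0^{1-\beta/2}(\Omega). \tag{15}
\]

Theorem

B(·, ·) is coercive and continuous on $H_0^{1-\beta/2}(\Omega)\times H_0^{1-\beta/2}(\Omega)$. Hence, the Galerkin weak formulation (15) has a unique solution. Moreover,

\[
\|u\|_{H^{1-\beta/2}(\Omega)} \le C\,\|f\|_{H^{-(1-\beta/2)}(\Omega)}.
\]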


A Galerkin finite element method

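Let $h_1 := 1/(N_1+1)$, $h_2 := 1/(N_2+1)$, $x_i := ih_1$, and $y_j := jh_2$.

Let ψ(ξ) = 1 − |ξ| for ξ ∈ [−1, 1] and 0 elsewhere. Let

\[
\begin{aligned}
\varphi_{i,j}(x,y) &:= \psi\Bigl(\frac{x-x_i}{h_1}\Bigr)\psi\Bigl(\frac{y-y_j}{h_2}\Bigr), \quad 1 \le i \le N_1,\ 1 \le j \le N_2,\\
u_h(x,y) &= \sum_{j'=1}^{N_2}\sum_{i'=1}^{N_1}u_{i',j'}\,\varphi_{i',j'}(x,y), \quad (x,y) \in \Omega.
\end{aligned}
\]

A bilinear finite element scheme: for $i = 1, \ldots, N_1$ and $j = 1, \ldots, N_2$,

\[
\sum_{j'=1}^{N_2}\sum_{i'=1}^{N_1}B\bigl(\varphi_{i',j'}, \varphi_{i,j}\bigr)u_{i',j'} = \bigl(f, \varphi_{i,j}\bigr)_{L^2} =: f_{i,j}. \tag{16}
\]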


A matrix form of the finite element scheme

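Let $N := N_1N_2$, $A = [A_{m,n}]_{m,n=1}^N$, and

\[
\begin{aligned}
\mathbf{u} &:= \bigl[u_{1,1}, \ldots, u_{N_1,1}, u_{1,2}, \ldots, u_{N_1,2}, \ldots, u_{1,N_2}, \ldots, u_{N_1,N_2}\bigr]^T,\\
\mathbf{f} &:= \bigl[f_{1,1}, \ldots, f_{N_1,1}, f_{1,2}, \ldots, f_{N_1,2}, \ldots, f_{1,N_2}, \ldots, f_{N_1,N_2}\bigr]^T.
\end{aligned}
\]

Let $A_{m,n} := B\bigl(\varphi_{i',j'}, \varphi_{i,j}\bigr)$ with

\[
\begin{aligned}
m &= (j-1)N_1 + i, && 1 \le i \le N_1,\ 1 \le j \le N_2,\\
n &= (j'-1)N_1 + i', && 1 \le i' \le N_1,\ 1 \le j' \le N_2.
\end{aligned}
\tag{17}
\]

The finite element scheme (16) can be expressed in a matrix form

\[
A\mathbf{u} = \mathbf{f}. \tag{18}
\]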


Features of the finite element scheme

Features of numerical methods for coordinate-form FPDEs

  A is dense: the number of nonzero entries in each row is $O(N_1 + N_2)$, which → ∞ as N → ∞.
  The number of nonzero entries in each row divided by the total number of entries in the same row is $O((N_1+N_2)/N) = O(N^{-1/2})$.
  A has a tensor-product structure.

Features of the finite element method for full FPDEs

  A is full.
  A has a complicated structure, as it couples the nodes in all the directions!
  It does not seem feasible to exploit a tensor-product structure of A.
  We instead exploit the translation-invariance property of A.


Translation invariant structure of A

Theorem

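Let the indices $(i_1, j_1)$, $(i_1', j_1')$, $(i_2, j_2)$, and $(i_2', j_2')$ be related by

\[
i_1' - i_1 = i_2' - i_2, \qquad j_1' - j_1 = j_2' - j_2. \tag{19}
\]

Then the following translation-invariance property holds:

\[
\begin{aligned}
&\int_0^{2\pi}\Bigl[\int_\Omega K\,D_\theta^{-\beta}D_\theta\varphi_{i_1',j_1'}(x,y)\,D_\theta\varphi_{i_1,j_1}(x,y)\,dx\,dy\Bigr]P(d\theta)\\
&\quad= \int_0^{2\pi}\Bigl[\int_\Omega K\,D_\theta^{-\beta}D_\theta\varphi_{i_2',j_2'}(x,y)\,D_\theta\varphi_{i_2,j_2}(x,y)\,dx\,dy\Bigr]P(d\theta).
\end{aligned}
\tag{20}
\]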


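Figure: Illustration of the translation invariance. The sketch shows the cells Ω_{i1,j1}, Ω_{i2,j2}, Ω_{i1',j1'}, Ω_{i2',j2'}, the points (ξ, η), (x, y), (ξ', η'), (x', y'), and the equal offsets s1, s2 between the two pairs of cells.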

Theorem

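The stiffness matrix A is an $N_2$-by-$N_2$ block-Toeplitz matrix

\[
A = \begin{pmatrix}
T_0 & T_1 & \cdots & T_{N_2-2} & T_{N_2-1}\\
T_{-1} & T_0 & T_1 & \ddots & T_{N_2-2}\\
\vdots & \ddots & \ddots & \ddots & \vdots\\
T_{2-N_2} & \ddots & T_{-1} & T_0 & T_1\\
T_{1-N_2} & T_{2-N_2} & \cdots & T_{-1} & T_0
\end{pmatrix}. \tag{21}
\]

Each block $T_j$ is an $N_1$-by-$N_1$ Toeplitz matrix

\[
T_j = \begin{pmatrix}
t_{0,j} & t_{1,j} & \cdots & t_{N_1-2,j} & t_{N_1-1,j}\\
t_{-1,j} & t_{0,j} & t_{1,j} & \ddots & t_{N_1-2,j}\\
\vdots & \ddots & \ddots & \ddots & \vdots\\
t_{2-N_1,j} & \ddots & t_{-1,j} & t_{0,j} & t_{1,j}\\
t_{1-N_1,j} & t_{2-N_1,j} & \cdots & t_{-1,j} & t_{0,j}
\end{pmatrix}. \tag{22}
\]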


Impact of the theorem

Av can be evaluated in O(N log N) operations by embedding A into a 4N-by-4N block-circulant-circulant-block matrix.

For coordinate FPDEs, $A^y$ is block-Toeplitz-circulant-block and can be embedded into a 2N-by-2N block-circulant-circulant-block matrix.

A is generated by O(N) parameters.

  A requires only O(N) memory to store.
  Unlike in finite difference methods, evaluating the entries of A is very expensive.
  Only O(N) (in contrast to N^2) entries of A need to be evaluated, a significant reduction of CPU time.

A block-circulant-circulant-block preconditioner can be developed.
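The O(N log N) product for the block-Toeplitz-Toeplitz-block matrix in (21)-(22) can be sketched with two-dimensional FFTs. The code below is a generic illustration, not the talk's Fortran implementation: the storage layout of the generating parameters (array `t` with `t[q + N2 - 1, p + N1 - 1]` holding t_{p,q}) and the function name are assumptions, while the ordering of the unknowns follows (17).

```python
import numpy as np

def bttb_matvec(t, v, N1, N2):
    """Multiply the block-Toeplitz-Toeplitz-block matrix A by v using 2D FFTs.

    t[q + N2 - 1, p + N1 - 1] stores t_{p,q} for |p| < N1 and |q| < N2, so the
    entry of A coupling node (i, j) (row) with node (i', j') (column) is
    t_{i'-i, j'-j}, as in (21)-(22).
    """
    p = np.arange(-(N1 - 1), N1)
    q = np.arange(-(N2 - 1), N2)
    # Kernel of the equivalent circular convolution on a 2*N2-by-2*N1 grid
    # (this corresponds to the 4N-by-4N circulant embedding).
    G = np.zeros((2 * N2, 2 * N1))
    G[np.ix_((-q) % (2 * N2), (-p) % (2 * N1))] = t
    # Zero-pad the vector, reshaped so that row j holds u_{1,j}, ..., u_{N1,j}.
    V = np.zeros((2 * N2, 2 * N1))
    V[:N2, :N1] = v.reshape(N2, N1)
    W = np.fft.ifft2(np.fft.fft2(G) * np.fft.fft2(V)).real
    return W[:N2, :N1].reshape(-1)
```

Only the (2N_1 − 1)(2N_2 − 1) generating parameters are stored, which matches the O(N) memory count above.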


Numerical experiments

A 4-point (2 points in x or y) Gauss-Legendre quadrature is used to evaluate the entries of A and the right-hand side.

The finite element scheme is solved by the fast conjugate gradient squared (FCGS), the preconditioned fast CGS (PFCGS), and Gaussian elimination (Gauss) solvers.

These solvers were implemented using Compaq Visual Fortran 6.6 on a ThinkPad T410 laptop.


An example run for a coordinate FPDE

β = 0.5, Ki := 1 + sin 2θi for i = 1, 2, 3, 4.

u = x^2 (1 − x)^2 y^2 (1 − y)^2; f is calculated accordingly.

Table: The convergence rates of the Gauss, FCGS, and PFCGS solutions

N1 = N2   Gauss ‖u − u_h‖_L2(Ω)   FCGS ‖u − u_h‖_L2(Ω)   PFCGS ‖u − u_h‖_L2(Ω)   Conv. rate
2^3       3.487 × 10^-5           3.487 × 10^-5          3.487 × 10^-5
2^4       8.876 × 10^-6           8.876 × 10^-6          8.876 × 10^-6           1.97
2^5       2.097 × 10^-6           2.097 × 10^-6          2.097 × 10^-6           2.08
2^6       4.759 × 10^-7           4.759 × 10^-7          4.759 × 10^-7           2.14
2^7       N/A                     1.055 × 10^-7          1.056 × 10^-7           2.17
2^8       N/A                     2.307 × 10^-8          2.311 × 10^-8           2.19
2^9       N/A                     4.999 × 10^-9          5.003 × 10^-9           2.21
2^10      N/A                     1.079 × 10^-9          1.078 × 10^-9           2.21


Table: The CPU time of the FCGS, PFCGS, and Gauss solvers, and of evaluating the full matrix A versus only its O(N) generating entries

N1 = N2   full A CPU   O(N) entries CPU   Gauss CPU   FCGS CPU    FCGS Itr. #   PFCGS CPU   PFCGS Itr. #
2^3       0.91 s       0.05 s             0.00 s      0.00 s      5             0.00 s      4
2^4       14 s         0.20 s             0.05 s      0.00 s      9             0.00 s      6
2^5       3 m 47 s     0.83 s             19 s        0.05 s      15            0.05 s      7
2^6       1 h 2 m      3.48 s             25 m 6 s    0.45 s      28            0.19 s      10
2^7       N/A          14 s               N/A         3.44 s      52            0.94 s      11
2^8       N/A          55 s               N/A         35 s        94            6.73 s      15
2^9       N/A          3 m 37 s           N/A         4 m 49 s    170           44 s        21
2^10      N/A          14 m 39 s          N/A         35 m 43 s   300           4 m 13 s    29


Thank You

for Your Attention!

