Fast finite volume methods for FPDEs
MURI Webinar, May 16 2016
Hong Wang
Department of Mathematics, University of South Carolina
Partially supported by ARO MURI grant W911NF-15-1-0562 and NSF grant DMS-1216923
Hong Wang, Department of Mathematics, University of South Carolina, May 16, 2016
Acknowledgements
T. Basu, Occidental College
H. Chen, Shandong Normal University
A. Cheng, N. Du & X. Zhang, Shandong University
W. Cheung, X. Guo, C. Wang & S. Yang, University of South Carolina
H. Fu, China University of Petroleum
J. Jia, Fudan University
Z. Li, D. Yang & S. Zhu, East China Normal University
Y. Ren, Qilu University of Technology
H. Tian, Ocean University of China
Conservative FDEs (del-Castillo-Negrete et al. 2004; Ervin & Roop 2005)
$$-D\Big(K(x)\big(\theta\,{}^{C,l}_{0}D^{1-\beta}_{x}u-(1-\theta)\,{}^{C,r}_{x}D^{1-\beta}_{1}u\big)\Big)=f(x),\quad x\in(0,1),$$
$$u(0)=u_l,\quad u(1)=u_r,\quad 0<\beta<1,\quad 0\le\theta\le 1. \tag{1}$$
Equation (1) is derived from a local mass balance and a fractional Fick's law.
θ is the weight of forward versus backward transition probability.
The left and right fractional integrals, and the Caputo and Riemann-Liouville fractional derivatives, are defined by
$$ {}_0I^{\beta}_{x}u(x)={}_0D^{-\beta}_{x}u(x):=\int_0^x\frac{(x-s)^{\beta-1}u(s)}{\Gamma(\beta)}\,ds,$$
$$ {}_xI^{\beta}_{1}u(x)={}_xD^{-\beta}_{1}u(x):=\int_x^1\frac{(s-x)^{\beta-1}u(s)}{\Gamma(\beta)}\,ds,$$
$$ {}^{C,l}_{0}D^{1-\beta}_{x}u:={}_0I^{\beta}_{x}Du,\qquad {}^{C,r}_{x}D^{1-\beta}_{1}u:=-{}_xI^{\beta}_{1}Du,$$
$$ {}^{R,l}_{0}D^{1-\beta}_{x}u:=D\,{}_0I^{\beta}_{x}u,\qquad {}^{R,r}_{x}D^{1-\beta}_{1}u:=-D\,{}_xI^{\beta}_{1}u. \tag{2}$$
A finite volume method (FVM) for conservative FDE (1) with ul = ur = 0
Conservative and non-conservative FDEs are not equivalent.
Finite element/volume methods are suited for conservative FDEs.
Finite difference methods are suited for nonconservative FDEs.
In many applications, local mass conservation is crucial.
A finite-volume scheme naturally has second-order accuracy in space, without a Richardson extrapolation as in finite difference methods.
Let $u=\sum_{j=1}^{N}u_j\phi_j$, $\mathbf{u}:=[u_1,u_2,\ldots,u_N]^T$, $\mathbf{f}:=[f_1,f_2,\ldots,f_N]^T$, and $A:=[A_{i,j}]_{i,j=1}^{N}$. Integrating (1) over $(x_{i-1/2},x_{i+1/2})$ yields
$$A\mathbf{u}=\mathbf{f},\qquad f_i:=\int_{x_{i-1/2}}^{x_{i+1/2}}f(x)\,dx,\qquad 1\le i,j\le N,$$
$$A_{i,j}:=\Big[K(x)\big(\theta\,{}^{C,l}_{0}D^{1-\beta}_{x}\phi_j-(1-\theta)\,{}^{C,r}_{x}D^{1-\beta}_{1}\phi_j\big)\Big]_{x=x_{i+1/2}}^{x=x_{i-1/2}}. \tag{3}$$
An efficient storage of A and a fast matrix-vector multiplication of Av
Theorem
$$A=\gamma(\beta)\big(K^{-}T_L^{\beta,N}+K^{+}T_R^{\beta,N}\big),\qquad K^{\pm}:=\operatorname{diag}\big(K(x_{i\pm 1/2})\big)_{i=1}^{N}, \tag{4}$$
where $T_L^{\beta,N}$ and $T_R^{\beta,N}$ are Toeplitz matrices of order $N$.
K(x) appears inside the (first-order) derivative in (1), but (4), and hence optimal storage and fast matrix-vector multiplication, still hold.
For problem (1), the condition number $\kappa(A)=O(h^{-(2-\beta)})$.
The number of Krylov iterations is $O(h^{-(1-\beta/2)})=O(N^{1-\beta/2})$, leading to an overall computational complexity of $O(N^{2-\beta/2}\log N)$.
This calls for an effective and efficient preconditioner.
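The fast matrix-vector product behind (4) can be sketched as follows. This is a minimal illustration, assuming NumPy; the function names `toeplitz_matvec` and `apply_A` are hypothetical, and the Toeplitz entries are left as inputs because the talk does not list them. Each Toeplitz matrix is embedded into a circulant matrix of twice the size, which the FFT diagonalizes.

```python
import numpy as np

def toeplitz_matvec(col, row, v):
    """Return T @ v for the Toeplitz matrix with first column `col` and first row `row`."""
    n = len(v)
    # First column of the 2n-by-2n circulant that contains T in its top-left block.
    c = np.concatenate([col, [0.0], row[:0:-1]])
    v_padded = np.concatenate([v, np.zeros(n)])
    # A circulant matrix is diagonalized by the FFT, so the product costs O(n log n).
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_padded)).real[:n]

def apply_A(gamma, k_minus, k_plus, tl, tr, v):
    """Apply A = gamma * (K^- T_L + K^+ T_R) from O(N) stored numbers.

    tl and tr are (first column, first row) pairs; k_minus and k_plus hold
    the diagonals of K^- and K^+ (illustrative inputs, not the talk's data).
    """
    return gamma * (k_minus * toeplitz_matvec(*tl, v) + k_plus * toeplitz_matvec(*tr, v))
```

Only the first row and column of each Toeplitz factor and the two diagonals are stored, which is the O(N) storage the theorem refers to.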
A preconditioned iterative solver for (1) with θ = 1/2 (W. & Du 2013)
Theorem
$M:=T_L^{\beta,N}+T_R^{\beta,N}$ is a symmetric and positive-definite Toeplitz matrix.
Outline of (a perturbation) proof: Let $K_0:=\operatorname{diag}\big(K(x_i)\big)_{i=1}^{N}$.
$$\begin{aligned}
\gamma(\beta)^{-1}K_0^{-1}A
&=K_0^{-1}K^{-}T_L^{\beta,N}+K_0^{-1}K^{+}T_R^{\beta,N}\\
&=K_0^{-1}\big[K_0+(K^{-}-K_0)\big]T_L^{\beta,N}+K_0^{-1}\big[K_0+(K^{+}-K_0)\big]T_R^{\beta,N}\\
&=M+K_0^{-1}\big[(K^{-}-K_0)T_L^{\beta,N}+(K^{+}-K_0)T_R^{\beta,N}\big]=M+O(h).
\end{aligned} \tag{5}$$
M is a good preconditioner for the finite volume scheme (3):
$$\big(K_0^{-1}K^{-}T_L^{\beta,N}+K_0^{-1}K^{+}T_R^{\beta,N}\big)\mathbf{u}=\gamma(\beta)^{-1}K_0^{-1}\mathbf{f}. \tag{6}$$
An example run by a preconditioned fast FVM
The data in (1): β = 0.2, θ = 0.5, K(x) = Γ(1.2)(1 + x), u_l = u_r = 0.
The true solution is $u(x)=x^2(1-x)^2$; f is computed accordingly.

| N | Gauss ‖u − u_G‖_{L∞} | Gauss CPU (s) | CGS ‖u − u_C‖_{L∞} | CGS CPU (s) | CGS Itr. # |
|---|---|---|---|---|---|
| 2^5 | 2.018 × 10^-4 | 0.000 | 2.018 × 10^-4 | 0.000 | 32 |
| 2^6 | 5.157 × 10^-5 | 0.000 | 5.157 × 10^-5 | 0.000 | 65 |
| 2^7 | 1.294 × 10^-5 | 0.000 | 1.294 × 10^-5 | 0.016 | 128 |
| 2^8 | 3.214 × 10^-6 | 0.047 | 3.214 × 10^-6 | 0.141 | 217 |
| 2^9 | 7.893 × 10^-7 | 0.500 | 7.893 × 10^-7 | 3.359 | 599 |
| 2^10 | 1.887 × 10^-7 | 7.797 | 1.886 × 10^-7 | 2 m 2 s | 1,110 |
| 2^11 | 4.030 × 10^-8 | 2 m 38 s | 4.047 × 10^-8 | 21 m 13 s | 2,624 |
| 2^12 | 6.227 × 10^-9 | 24 m 29 s | 7.468 × 10^-8 | 4 h 19 m | 7,576 |
| 2^13 | 5.783 × 10^-9 | 3 h 27 m | N/A | > 2 days | > 20,000 |

| N | FCGS ‖u − u_F‖_{L∞} | FCGS CPU (s) | FCGS Itr. # | PFCGS ‖u − u_S‖_{L∞} | PFCGS CPU (s) | PFCGS Itr. # |
|---|---|---|---|---|---|---|
| 2^5 | 2.018 × 10^-4 | 0.000 | 32 | 2.018 × 10^-4 | 0.000 | 6 |
| 2^6 | 5.157 × 10^-5 | 0.016 | 63 | 5.157 × 10^-5 | 0.000 | 5 |
| 2^7 | 1.294 × 10^-5 | 0.031 | 128 | 1.294 × 10^-5 | 0.000 | 5 |
| 2^8 | 3.214 × 10^-6 | 0.125 | 248 | 3.214 × 10^-6 | 0.006 | 5 |
| 2^9 | 7.893 × 10^-7 | 0.578 | 576 | 7.893 × 10^-7 | 0.016 | 5 |
| 2^10 | 1.886 × 10^-7 | 2.281 | 1,078 | 1.887 × 10^-7 | 0.047 | 5 |
| 2^11 | 4.037 × 10^-8 | 9.953 | 1,997 | 4.038 × 10^-8 | 0.078 | 5 |
| 2^12 | 1.587 × 10^-8 | 57.27 | 5,130 | 6.194 × 10^-9 | 0.188 | 5 |
| 2^13 | 2.372 × 10^-8 | 2 m 52 s | 7,410 | 4.345 × 10^-9 | 0.391 | 5 |
Observations
Use the numerical solutions by Gaussian elimination as a benchmark:
The conjugate gradient squared (CGS) method diverges, due to a significant amount of round-off error.
The fast CGS (FCGS) reduces the CPU time significantly, as the operations per iteration are reduced from $O(N^2)$ to $O(N\log N)$.
The number of iterations is still $O(N^{1-\beta/2})$.
It is less accurate than Gaussian elimination on fine meshes due to round-off errors.
The preconditioner M is optimal, so the preconditioned FCGS (PFCGS) has an overall computational cost of $O(N\log^2 N)$.
It significantly reduces round-off errors.
It generates more accurate solutions than Gaussian elimination.
It further reduces CPU time.
Regularity of the boundary-value problem of FDEs
Error estimates were proved for numerical methods for FDEs, under the assumption that the true solution is smooth.
For integer-order elliptic or parabolic PDEs, smooth data (and domain for multi-D problems) $\Longrightarrow$ smooth solution.
$u(x)=(x^{2-\beta}-x^{1-\beta})/\Gamma(3-\beta)\notin W^{1,1/\beta}(0,1)$ is the solution of
$$D\big({}_0D_x^{-\beta}Du\big)=1,\quad x\in(0,1),\qquad u(0)=u(1)=0. \tag{7}$$
In particular, $u\notin H^1(0,1)$ for $1/2\le\beta<1$.
For FDEs, smooth data does not ensure smooth solutions.
No conditions in the literature ensure smooth solutions to FDEs.
The Nitsche-lifting based proof of optimal-order $L^2$ error estimates in the literature does not hold even for constant K > 0.
What conditions ensure that high-order methods $\Longrightarrow$ high-order convergence rates?
Solutions may have boundary layers and other singularities, which need to be resolved numerically.
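The claim about (7) can be checked by hand. The short derivation below is a sketch that uses only the definition of the left fractional integral in (2) and the standard power rule ${}_0I^{\beta}_x x^{\mu}=\frac{\Gamma(\mu+1)}{\Gamma(\mu+\beta+1)}x^{\mu+\beta}$ for $\mu>-1$.

```latex
\begin{aligned}
Du(x) &= \frac{(2-\beta)\,x^{1-\beta}-(1-\beta)\,x^{-\beta}}{\Gamma(3-\beta)},\\
{}_0I^{\beta}_x Du(x)
  &= \frac{(2-\beta)\,\Gamma(2-\beta)\,x-(1-\beta)\,\Gamma(1-\beta)}{\Gamma(3-\beta)}
   = x-\frac{1}{2-\beta},\\
D\big({}_0I^{\beta}_x Du\big)(x) &= 1,\qquad u(0)=u(1)=0 .
\end{aligned}
% Regularity: Du behaves like x^{-\beta} near x = 0, and
% \int_0^1 x^{-\beta p}\,dx < \infty  iff  \beta p < 1.
% Hence Du \notin L^{1/\beta}(0,1), and Du \in L^2(0,1) only when \beta < 1/2.
```

The data (right-hand side 1, zero boundary values) are as smooth as possible, yet the derivative of the solution is unbounded at the left endpoint.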
An FVM on a gridded mesh (Jia et al., 2014)
Solutions to FDEs with smooth data and domain may have boundary layers, so a uniform mesh is not effective.
Finite-difference methods are out of the question, as Grünwald-Letnikov derivatives are inherently defined on uniform meshes.
Riemann-Liouville and Caputo derivatives offer such flexibility.
Because of the nonlocal nature of FDEs, a numerical scheme discretized on an arbitrarily adaptively refined mesh
offers great flexibility and effective approximation properties,
offers possible advantages for its theoretical analysis,
destroys the structure of its stiffness matrix and so its efficiency.
Motivation: balancing flexibility and efficiency.
The structure of the stiffness matrix
We assume a geometrically refined mesh towards the left endpoint.
Theorem
The matrix A can be decomposed as
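$$A=\frac{1}{\Gamma(\beta+1)}\Big[\operatorname{diag}(K^{-})\big(\gamma Q_l+(1-\gamma)Q_r\big)-\operatorname{diag}(K^{+})\big(\gamma P_l+(1-\gamma)P_r\big)\Big]\operatorname{diag}\big(h_i^{\beta-1}\big)_{i=1}^{m}.$$
$P_l$, $P_r$, $Q_l$ and $Q_r$ are Toeplitz.
Compared with the uniform-mesh case, A carries an additional diagonal matrix multiplier that reflects the impact of the mesh sizes.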
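To make the diagonal factor concrete, here is a small sketch of a geometrically refined mesh and the corresponding scaling $\operatorname{diag}(h_i^{\beta-1})$. The refinement ratio of 2 and the node layout are assumptions chosen for illustration, and `geometric_mesh` and `mesh_scaling` are hypothetical helper names; the talk does not spell out the mesh.

```python
import numpy as np

def geometric_mesh(m):
    """Nodes 0 < 2^-(m-1) < ... < 1/2 < 1 with m cells, refined toward x = 0 (assumed ratio 2)."""
    return np.concatenate([[0.0], 2.0 ** np.arange(-(m - 1), 1)])

def mesh_scaling(nodes, beta):
    """Return the cell sizes h_i and the diagonal entries h_i^(beta - 1)."""
    h = np.diff(nodes)
    return h, h ** (beta - 1.0)

nodes = geometric_mesh(6)
h, scale = mesh_scaling(nodes, beta=0.98)
print(nodes)
print(scale)
```

Since the scaling is diagonal, applying it to a vector is an O(m) elementwise product, so it does not affect the cost of the Toeplitz products.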
Numerical experiments of a one-sided FDE on a gridded mesh
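Consider (1) with K = 1, f = 0, β = 0.98, θ = 1, u_l = 0, u_r = 1, i.e.,
$$D\big({}_0D_x^{-\beta}Du\big)=0,\quad x\in(0,1),\qquad u(0)=0,\quad u(1)=1.$$
Its solution is $u(x)=x^{1-\beta}$ for $x\in(0,1)$.

| Solver | N | CPU | # of iterations |
|---|---|---|---|
| Gauss | 256 | 0.640 s | |
| Gauss | 512 | 5.567 s | |
| Gauss | 1024 | 59 s | |
| CGS | 256 | 2.978 s | 256 |
| CGS | 512 | 29 s | 512 |
| CGS | 1024 | 403 s | 1024 |
| FCGS | 256 | 0.073 s | 256 |
| FCGS | 512 | 0.139 s | 512 |
| FCGS | 1024 | 0.391 s | 1024 |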
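The severity of the boundary layer in this test can be seen from the exact solution alone. Since $Du=(1-\beta)x^{-\beta}$ and ${}_0I^{\beta}_x x^{-\beta}=\Gamma(1-\beta)$ is constant, $u=x^{1-\beta}$ indeed satisfies the equation. The sketch below (plain Python, with the hypothetical helper `exact_solution`) evaluates u at the first interior node of each uniform mesh used above; for N = 1024 the value is roughly 0.87, so almost the whole rise from 0 to 1 happens inside the first cell.

```python
def exact_solution(x, beta):
    """Exact solution u(x) = x^(1 - beta) of the one-sided test problem."""
    return x ** (1.0 - beta)

beta = 0.98
for n in (256, 512, 1024):
    h = 1.0 / n
    # Value of u at the first interior node x = h of a uniform mesh with n cells.
    print(n, exact_solution(h, beta))
```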
Figure: First row: numerical solutions on a uniform mesh of n = 256, 512, 1024; second row: numerical solutions on a geometrically refined mesh of n = 48, 64, 96. Each panel plots the numerical solution against the exact solution on [0, 1].
An FVM on a locally refined composite mesh (Jia & W. 2016)
Solutions to FDEs with smooth data and domain may have boundary layers. Numerical solution of FDEs
with a uniform mesh is not effective,
with a gridded mesh may resolve the boundary layers, but does not necessarily provide an accurate global approximation.
We propose to use a composite mesh that consists of
a uniform mesh in most of the domain,
a gridded mesh in the cells near the (left) boundary.
The key issue is the structure of the stiffness matrix:
$$A=\begin{bmatrix}A_{l,l}&A_{l,r}\\A_{r,l}&A_{r,r}\end{bmatrix}. \tag{8}$$
$A_{r,r}$, corresponding to the uniform mesh, has a Toeplitz-like structure.
$A_{l,l}$, corresponding to the gridded mesh, has a Toeplitz-like structure with an extra right diagonal multiplier.
The structure of the off-diagonal submatrices in the stiffness matrix
The off-diagonal submatrices $A_{l,r}$ and $A_{r,l}$
are full due to the nonlocal nature of FDEs,
are not Toeplitz-like.
Theorem
$$A_{l,r}=\frac{(1-\gamma)h^{\beta-1}}{\Gamma(\beta+1)}\big(\operatorname{diag}(K_l^{-})E-\operatorname{diag}(K_l^{+})D\big),$$
$$A_{r,l}=\frac{\gamma}{\Gamma(\beta+1)}\big(\operatorname{diag}(K_r^{-})H-\operatorname{diag}(K_r^{+})G\big)\operatorname{diag}\big(h_i^{\beta-1}\big)_{i=1}^{m}.$$
The typical entries of D and G are of the form
$$d_{i,j}=2\big(j+1-3\cdot2^{i-m-1}\big)^{\beta}-\big(j-3\cdot2^{i-m-1}\big)^{\beta}-\big(j+2-3\cdot2^{i-m-1}\big)^{\beta},$$
$$g_{i,j}=\Big[2^{m-j+1}\Big(i+\frac32\Big)-1\Big]^{\beta}-\frac32\Big[2^{m-j+1}\Big(i+\frac32\Big)-2\Big]^{\beta}+\frac12\Big[2^{m-j+1}\Big(i+\frac32\Big)-4\Big]^{\beta}.$$
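Using a fractional binomial expansion, we have
$$\begin{aligned}
D\approx{}&-2\binom{\beta}{2}[1,1,\ldots,1]^T\Big[\frac{1}{2^{2-\beta}},\frac{1}{3^{2-\beta}},\ldots,\frac{1}{(n-1)^{2-\beta}}\Big]\\
&-2\binom{\beta}{4}[1,1,\ldots,1]^T\Big[\frac{1}{2^{4-\beta}},\frac{1}{3^{4-\beta}},\ldots,\frac{1}{(n-1)^{4-\beta}}\Big]\\
&+18\binom{\beta}{3}\big[2^{-m},2^{-m+1},\ldots,2^{-1}\big]^T\Big[\frac{1}{2^{3-\beta}},\frac{1}{3^{3-\beta}},\ldots,\frac{1}{(n-1)^{3-\beta}}\Big]\\
&-108\binom{\beta}{4}\big[2^{-2m},2^{-2m+2},\ldots,2^{-2}\big]^T\Big[\frac{1}{2^{4-\beta}},\frac{1}{3^{4-\beta}},\ldots,\frac{1}{(n-1)^{4-\beta}}\Big].
\end{aligned}$$
The matrices can be approximated by a finite sum of low-rank matrices.
The matrix-vector multiplication can be performed in O(N) operations.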
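The O(N) cost follows because a sum of rank-one matrices $\sum_k a_k b_k^T$ can be applied without ever forming it. The sketch below assumes NumPy; `low_rank_matvec` and `leading_term` are hypothetical names, and only the first term of the expansion of D is built, with sizes m and n chosen by the caller.

```python
import numpy as np

def low_rank_matvec(left_vectors, right_vectors, x):
    """Compute (sum_k a_k b_k^T) x as sum_k a_k (b_k . x), never forming the matrix."""
    y = np.zeros(left_vectors[0].shape[0])
    for a, b in zip(left_vectors, right_vectors):
        y += a * np.dot(b, x)  # one dot product plus one scaled vector addition
    return y

def leading_term(beta, m, n):
    """First rank-one term of the expansion: -2*C(beta,2) * ones * [1/j^(2-beta)], j = 2..n-1."""
    coeff = -2.0 * (beta * (beta - 1.0) / 2.0)  # generalized binomial coefficient C(beta, 2)
    a = coeff * np.ones(m)
    b = 1.0 / np.arange(2, n) ** (2.0 - beta)
    return a, b
```

With r retained terms the cost is O(r(m + n)), compared with O(mn) for the dense product.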
A block-diagonal preconditioner
We developed a preconditioner based on T. Chan's circulant preconditioner $C_n$, which minimizes $\|A-C_n\|_F$ over all circulant matrices.
We define a block-diagonal-circulant-block preconditioner M for A
$$M:=\begin{bmatrix}M_1&0\\0&M_2\end{bmatrix} \tag{9}$$
$M_1$ is a preconditioner for $A_{l,l}$.
$M_2$ is a preconditioner for $A_{r,r}$.
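For a plain Toeplitz matrix, T. Chan's optimal circulant has a closed form: its k-th wrapped diagonal is the average of the Toeplitz entries on that diagonal, $c_k=\big((n-k)\,t_k+k\,t_{k-n}\big)/n$. The sketch below assumes NumPy and uses hypothetical function names; it only illustrates this building block, not the talk's treatment of the variable coefficient or of the diagonal mesh multiplier.

```python
import numpy as np

def chan_circulant_first_column(col, row):
    """First column of the circulant minimizing ||T - C||_F for Toeplitz T.

    col[k] = t_k (first column of T) and row[k] = t_{-k} (first row of T).
    """
    n = len(col)
    k = np.arange(n)
    # t_neg[k] = t_{k-n} = row[n-k] for k = 1..n-1; the k = 0 entry is unused.
    t_neg = np.concatenate([[0.0], row[:0:-1]])
    return ((n - k) * col + k * t_neg) / n

def circulant_solve(c, r):
    """Solve C z = r for the circulant C with first column c, using FFTs."""
    return np.fft.ifft(np.fft.fft(r) / np.fft.fft(c)).real
```

Applying the inverse of the circulant costs two FFTs and one inverse FFT per Krylov iteration, the same order as the fast matrix-vector product.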
Numerical experiments of a one-sided FDE on a composite mesh
Consider (1) with K = 1, f = 0, θ = 1, β = 0.9, u_l = 0, u_r = 1, i.e.,
$$D\big({}_0D_x^{-\beta}Du\big)=0,\quad x\in(0,1),\qquad u(0)=0,\quad u(1)=1.$$
Its solution is $u(x)=x^{1-\beta}$ for $x\in(0,1)$.

| n | ‖u_n − u‖ | ‖u_{n,m} − u‖ | ‖u_{n,m} − u‖ |
|---|---|---|---|
| 128 | 4.3546 × 10^-1 | 2.6805 × 10^-1, m = 7 | 2.0315 × 10^-1, m = 11 |
| 256 | 4.0630 × 10^-1 | 2.3336 × 10^-1, m = 8 | 1.3403 × 10^-1, m = 16 |
| 512 | 3.7909 × 10^-1 | 2.0315 × 10^-1, m = 9 | 8.2504 × 10^-2, m = 22 |
| 1024 | 3.5370 × 10^-1 | 1.7685 × 10^-1, m = 10 | 3.8488 × 10^-2, m = 32 |
| 8192 | 2.8730 × 10^-1 | 1.6668 × 10^-1, m = 13 | N/A |
Figure: First row: numerical solutions on a uniform mesh of n = 256, 8192; second row: numerical solutions on a composite mesh with n = 256 and m = 8, 16. Each panel plots the numerical solution against the exact solution on [0, 1].
Numerical experiments of a two-sided FDE on a locally refined composite mesh
Consider (1) with K = 1, θ = 0.5, β = 0.95, u_l = 0, u_r = 1,
$$f(x)=\frac{(1-\gamma)(1-\beta)}{\Gamma(\beta)\,x\,(1-x)^{1-\beta}},\qquad u(x)=x^{1-\beta},\qquad x\in(0,1).$$

| Solver | m | n | Error | Iterations |
|---|---|---|---|---|
| Gauss | 2^3 | 2^8 | 1.4379 × 10^-1 | |
| Gauss | 2^4 | 2^9 | 1.0491 × 10^-1 | |
| Gauss | 2^5 | 2^10 | 5.8194 × 10^-2 | |
| CGS | 2^3 | 2^8 | 1.4379 × 10^-2 | 48 |
| CGS | 2^4 | 2^9 | 1.0491 × 10^-1 | 77 |
| CGS | 2^5 | 2^10 | 5.8194 × 10^-2 | 142 |
| FCGS | 2^3 | 2^8 | 1.4379 × 10^-1 | 48 |
| FCGS | 2^4 | 2^9 | 1.0491 × 10^-1 | 78 |
| FCGS | 2^5 | 2^10 | 5.8194 × 10^-2 | 150 |
| PFCGS | 2^3 | 2^8 | 1.4379 × 10^-1 | 9 |
| PFCGS | 2^4 | 2^9 | 1.0491 × 10^-1 | 13 |
| PFCGS | 2^5 | 2^10 | 5.8194 × 10^-2 | 16 |
Table: Numerical results on a uniform mesh
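| Solver | n | Error | Iterations | CPU |
|---|---|---|---|---|
| Gauss | 2^8 | 1.8827 × 10^-1 | | 0.01 s |
| Gauss | 2^9 | 1.8206 × 10^-1 | | 0.01 s |
| Gauss | 2^10 | 1.7596 × 10^-1 | | 0.05 s |
| Gauss | 2^11 | 1.7002 × 10^-1 | | 0.25 s |
| Gauss | 2^12 | 1.6425 × 10^-1 | | 1.25 s |
| Gauss | 2^13 | 1.5867 × 10^-1 | | 9.76 s |
| Gauss | 2^14 | 1.5327 × 10^-1 | | 97 s |
| CGS | 2^8 | 1.8827 × 10^-1 | 46 | 0.01 s |
| CGS | 2^9 | 1.8206 × 10^-1 | 66 | 0.01 s |
| CGS | 2^10 | 1.7596 × 10^-1 | 94 | 0.18 s |
| CGS | 2^11 | 1.7002 × 10^-1 | 133 | 0.86 s |
| CGS | 2^12 | 1.6425 × 10^-1 | 188 | 4.94 s |
| CGS | 2^13 | 1.5867 × 10^-1 | 266 | 30.78 s |
| CGS | 2^14 | 1.5327 × 10^-1 | 379 | 187 s |
| FCGS | 2^8 | 1.8827 × 10^-1 | 46 | 0.05 s |
| FCGS | 2^9 | 1.8206 × 10^-1 | 66 | 0.16 s |
| FCGS | 2^10 | 1.7596 × 10^-1 | 94 | 0.29 s |
| FCGS | 2^11 | 1.7002 × 10^-1 | 133 | 1.16 s |
| FCGS | 2^12 | 1.6425 × 10^-1 | 188 | 2.00 s |
| FCGS | 2^13 | 1.5867 × 10^-1 | 266 | 12 s |
| FCGS | 2^14 | 1.5327 × 10^-1 | 379 | 27 s |
| PFCGS | 2^8 | 1.8827 × 10^-1 | 8 | 0.02 s |
| PFCGS | 2^9 | 1.8206 × 10^-1 | 8 | 0.02 s |
| PFCGS | 2^10 | 1.7596 × 10^-1 | 9 | 0.05 s |
| PFCGS | 2^11 | 1.7002 × 10^-1 | 10 | 0.09 s |
| PFCGS | 2^12 | 1.6425 × 10^-1 | 10 | 0.14 s |
| PFCGS | 2^13 | 1.5867 × 10^-1 | 10 | 0.66 s |
| PFCGS | 2^14 | 1.5327 × 10^-1 | 11 | 1.00 s |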
An FVM for a two-dimensional space-fractional FPDE
$$\begin{aligned}
&-D_x\Big(K_x(x,y)\big(\gamma_x\,{}_0I^{\alpha}_x+(1-\gamma_x)\,{}_xI^{\alpha}_1\big)D_xu\Big)\\
&-D_y\Big(K_y(x,y)\big(\gamma_y\,{}_0I^{\beta}_y+(1-\gamma_y)\,{}_yI^{\beta}_1\big)D_yu\Big)=f(x,y),\quad(x,y)\in\Omega,\\
&u(x,y)=0,\quad(x,y)\in\partial\Omega.
\end{aligned} \tag{10}$$
$\Omega=(0,1)^2$ is a square domain.
A homogeneous Dirichlet boundary condition is assumed.
$$A=A^x+A^y=\big(A^x_{l,j}\big)_{l,j=1}^{N_y}+\big(A^y_{l,j}\big)_{l,j=1}^{N_y} \tag{11}$$
Structure of the stiffness matrix A
Theorem
$A^x$ has a block tridiagonal structure, and each of its diagonal blocks has the form
$$A^x_{l,l}=\frac{h_y}{\Gamma(\alpha+1)h_x^{1-\alpha}}\Big[\operatorname{diag}(K^{x,-}_l)\big(\gamma_xD^{(\alpha,N_x)}_{L,-}+(1-\gamma_x)D^{(\alpha,N_x)}_{R,-}\big)-\operatorname{diag}(K^{x,+}_l)\big(\gamma_xD^{(\alpha,N_x)}_{L,+}+(1-\gamma_x)D^{(\alpha,N_x)}_{R,+}\big)\Big],\quad 1\le l\le N_y,$$
$$A^x_{l,l-1}=\frac{h_y}{\Gamma(\alpha+1)h_x^{1-\alpha}}\Big[\operatorname{diag}(K^{x,-}_l)\big(\gamma_xL^{(\alpha,N_x)}_{L,-}+(1-\gamma_x)L^{(\alpha,N_x)}_{R,-}\big)-\operatorname{diag}(K^{x,+}_l)\big(\gamma_xL^{(\alpha,N_x)}_{L,+}+(1-\gamma_x)L^{(\alpha,N_x)}_{R,+}\big)\Big],\quad 2\le l\le N_y,$$
$$A^x_{l,l+1}=\frac{h_y}{\Gamma(\alpha+1)h_x^{1-\alpha}}\Big[\operatorname{diag}(K^{x,-}_l)\big(\gamma_xU^{(\alpha,N_x)}_{L,-}+(1-\gamma_x)U^{(\alpha,N_x)}_{R,-}\big)-\operatorname{diag}(K^{x,+}_l)\big(\gamma_xU^{(\alpha,N_x)}_{L,+}+(1-\gamma_x)U^{(\alpha,N_x)}_{R,+}\big)\Big],\quad 1\le l\le N_y-1.$$
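Theorem
$A^y$ is a full block matrix, and each of its blocks is tridiagonal. Furthermore, $A^y$ has the form
$$\begin{aligned}
A^y={}&\frac{h_x}{\Gamma(\beta+1)h_y^{1-\beta}}\operatorname{diag}(K^{y,-}_k)\Big[\big(\gamma_yL^{(\beta,N_y)}_{L,-}+(1-\gamma_y)L^{(\beta,N_y)}_{R,-}\big)\otimes I^{(N_x)}_{-}\\
&\quad+\big(\gamma_yD^{(\beta,N_y)}_{L,-}+(1-\gamma_y)D^{(\beta,N_y-1)}_{R,-}\big)\otimes I^{(N_x)}
+\big(\gamma_yU^{(\beta,N_y)}_{L,-}+(1-\gamma_y)U^{(\beta,N_y)}_{R,-}\big)\otimes I^{(N_x)}_{+}\Big]\\
&-\frac{h_x}{\Gamma(\beta+1)h_y^{1-\beta}}\operatorname{diag}(K^{y,+}_k)\Big[\big(\gamma_yL^{(\beta,N_y)}_{L,+}+(1-\gamma_y)L^{(\beta,N_y)}_{R,+}\big)\otimes I^{(N_x)}_{-}\\
&\quad+\big(\gamma_yD^{(\beta,N_y)}_{L,+}+(1-\gamma_y)D^{(\beta,N_y-1)}_{R,+}\big)\otimes I^{(N_x)}
+\big(\gamma_yU^{(\beta,N_y)}_{L,+}+(1-\gamma_y)U^{(\beta,N_y)}_{R,+}\big)\otimes I^{(N_x)}_{+}\Big].
\end{aligned}$$
Theorem
$A\mathbf{v}$ can be evaluated in $O(N\log N)$ operations, and A can be stored in $O(N)$ memory.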
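The Kronecker-product terms in $A^y$ never need to be assembled. A minimal sketch, assuming NumPy and unknowns ordered with the x-index running fastest: for any $N_y\times N_y$ matrix T and $N_x\times N_x$ matrix S, $(T\otimes S)\mathbf{v}$ equals the row-major flattening of $T\,V\,S^T$, where V is $\mathbf{v}$ reshaped to $N_y\times N_x$. The name `kron_matvec` is hypothetical, and the dense product with T below stands in for an FFT-based Toeplitz product.

```python
import numpy as np

def kron_matvec(T, S, v):
    """Return (T kron S) @ v without forming the Kronecker product.

    v is ordered so that the x-index runs fastest: v[l * Nx + i].
    """
    ny, nx = T.shape[0], S.shape[0]
    V = v.reshape(ny, nx)          # row l holds the unknowns on grid line y_l
    return (T @ V @ S.T).reshape(-1)
```

When S is an identity or a shifted identity, the multiplication by S reduces to a copy or a shift of columns, and the cost is dominated by the N_x products with T.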
An example experiment of a 2D FPDE by a preconditioned fast FVM
γx = γy = 0.5, α = β = 0.8, and Kx = Ky = 1.
The solution $u(x,y):=256\,x^2(1-x)^2y^2(1-y)^2$.
The right-hand side is calculated accordingly.
| Solver | Nx = Ny | ‖u_h − u‖_{L2} | # of iter. | CPU |
|---|---|---|---|---|
| Gauss | 2^5 | 2.705367E-3 | | 46s |
| Gauss | 2^6 | 6.793973E-4 | | 1h 2m |
| Gauss | 2^7 | 1.694831E-4 | | 7d 17h |
| Gauss | 2^8 | out of memory | | |
| CGS | 2^5 | 2.705367E-3 | 25 | 6.04s |
| CGS | 2^6 | 6.793973E-4 | 40 | 3m 5s |
| CGS | 2^7 | 1.694831E-4 | 70 | 2h 34m |
| CGS | 2^8 | out of memory | | |
| FCGS | 2^5 | 2.705367E-3 | 24 | 0.48s |
| FCGS | 2^6 | 6.793973E-4 | 37 | 1.53s |
| FCGS | 2^7 | 1.694831E-4 | 60 | 12s |
| FCGS | 2^8 | 4.216027E-5 | 92 | 49s |
| FCGS | 2^9 | divergent | | |
| PFCGS | 2^5 | 2.705367E-3 | 11 | 0.28s |
| PFCGS | 2^6 | 6.793973E-4 | 12 | 0.57s |
| PFCGS | 2^7 | 1.694831E-4 | 13 | 2.95s |
| PFCGS | 2^8 | 4.216027E-5 | 16 | 9.74s |
| PFCGS | 2^9 | 1.047953E-5 | 18 | 54s |
| PFCGS | 2^10 | 2.605420E-6 | 21 | 4m 37s |
| PFCGS | 2^11 | 6.481977E-7 | 25 | 27m 32s |
| PFCGS | 2^12 | 1.610818E-7 | 33 | 1h 38m |
Numerical results of a fast FVM on a 2D locally refined composite mesh
A fast FVM can be derived for an FPDE on a 2D locally refined composite mesh, similarly to what we did in 1D.

| Mesh | mx = my | nx = ny | Error | Iterations | CPU |
|---|---|---|---|---|---|
| Uniform mesh | | 2^8 | 1.1810 × 10^-1 | 56 | 1 min 55 s |
| Uniform mesh | | 2^9 | 1.0652 × 10^-1 | 84 | 14 min 41 s |
| Uniform mesh | | 2^10 | 9.6038 × 10^-2 | 126 | 2 h 13 min |
| Uniform mesh | | 2^11 | 8.6568 × 10^-2 | 188 | 1 day 4 h |
| Locally refined mesh | 2^2 | 2^5 | 1.1324 × 10^-1 | 21 | 5.2 s |
| Locally refined mesh | 2^2 | 2^6 | 1.0211 × 10^-1 | 26 | 16 s |
| Locally refined mesh | 2^3 | 2^5 | 7.2598 × 10^-2 | 63 | 16 s |
| Locally refined mesh | 2^3 | 2^6 | 6.5423 × 10^-2 | 65 | 42 s |
| Locally refined mesh | 2^4 | 2^5 | 3.4491 × 10^-2 | 679 | 3 min 32 s |
| Locally refined mesh | 2^4 | 2^6 | 3.1085 × 10^-2 | 647 | 7 min 2 s |
Table: Comparison of Gauss, CGS and FCGS
| Solver | mx = my | nx = ny | Error | Iterations | CPU |
|---|---|---|---|---|---|
| Gauss | 2^2 | 2^5 | 1.1324 × 10^-1 | | 1 min 37 s |
| Gauss | 2^3 | 2^6 | 6.5434 × 10^-2 | | 2 h 16 min |
| Gauss | 2^4 | 2^7 | 2.8015 × 10^-2 | | 16 days 13 h |
| CGS | 2^2 | 2^5 | 1.1324 × 10^-1 | 21 | 4 min 20 s |
| CGS | 2^3 | 2^6 | 6.5409 × 10^-2 | 62 | 3 h 51 min |
| CGS | 2^4 | 2^7 | 3.8433 × 10^-2 | 505 | 12 days 13 h |
| FCGS | 2^2 | 2^5 | 1.1324 × 10^-1 | 21 | 5 s |
| FCGS | 2^3 | 2^6 | 6.5423 × 10^-2 | 65 | 41 s |
| FCGS | 2^4 | 2^7 | 2.8099 × 10^-2 | 607 | 17 min 45 s |
A space-fractional FPDE on a two-dimensional convex domain
$$\begin{aligned}
&-D_x\Big(K_x(x,y)\big(\gamma_x\,{}_{a_1(y)}I^{\alpha}_x+(1-\gamma_x)\,{}_xI^{\alpha}_{b_1(y)}\big)D_xu\Big)\\
&-D_y\Big(K_y(x,y)\big(\gamma_y\,{}_{a_2(x)}I^{\beta}_y+(1-\gamma_y)\,{}_yI^{\beta}_{b_2(x)}\big)D_yu\Big)=f(x,y),\quad(x,y)\in\Omega_s,\\
&u(x,y)=0,\quad(x,y)\in\partial\Omega_s.
\end{aligned} \tag{12}$$
For an FPDE on a two-dimensional convex domain $\Omega_s$,
the lower (or upper) limits of the left (or right) fractional integrals are no longer constant.
Because of the nonlocal nature of the FPDEs and the variable limits of the fractional derivatives, the stiffness matrix of the corresponding FVM is not Toeplitz-like, in general.
It is not clear how to develop a fast FVM in this case.
Assume that problem (12) can be extended to a rectangular domain
$$\Omega:=(a_1,b_1)\times(a_2,b_2)\supset\Omega_s.$$
A volume-penalized fast FVM
A volume-penalized boundary-value problem of the FPDE on Ω is
$$\begin{aligned}
&-D_x\Big(K_x(x,y)\big(\gamma_x\,{}_{a_1}I^{\alpha}_x+(1-\gamma_x)\,{}_xI^{\alpha}_{b_1}\big)D_xu_\eta\Big)\\
&-D_y\Big(K_y(x,y)\big(\gamma_y\,{}_{a_2}I^{\beta}_y+(1-\gamma_y)\,{}_yI^{\beta}_{b_2}\big)D_yu_\eta\Big)
+\frac{1-\mathbf{1}_{\Omega_s}(x,y)}{\eta}\,u_\eta=f(x,y),\quad(x,y)\in\Omega,\\
&u_\eta(x,y)=0,\quad(x,y)\in\partial\Omega.
\end{aligned} \tag{13}$$
All the fractional derivatives are now defined on (the rectangular) Ω.
Compared to its integer-order cousin, all (the limits of) the fractional derivatives are changed!
$$\lim_{\eta\to0^+}u_\eta(x,y)=0,\quad(x,y)\in\Omega\setminus\Omega_s.$$
The extended fractional derivatives are anticipated to converge to the original fractional derivatives.
The fast FVM developed for FPDEs on rectangular domains can apply!
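The penalty term in (13) only adds a diagonal contribution to the discrete operator, which is why the fast rectangular-domain solver carries over. A minimal sketch of assembling the coefficient $(1-\mathbf{1}_{\Omega_s})/\eta$ on a tensor grid, assuming NumPy; the function name `penalty_coefficients`, the grid, and the disk used as the example domain are illustrative choices.

```python
import numpy as np

def penalty_coefficients(x, y, inside, eta):
    """Return (1 - indicator of Omega_s) / eta sampled on the grid x-by-y.

    `inside(X, Y)` is a user-supplied boolean test for membership in Omega_s.
    """
    X, Y = np.meshgrid(x, y, indexing="ij")
    indicator = inside(X, Y).astype(float)
    return (1.0 - indicator) / eta

# Example: a disk of radius 1 embedded in the square (-1, 1)^2.
x = np.linspace(-1.0, 1.0, 5)
coeff = penalty_coefficients(x, x, lambda X, Y: X**2 + Y**2 < 1.0, eta=0.01)
print(coeff)
```

The coefficient vanishes inside the physical domain and equals 1/η outside, so a small η forces the discrete solution toward zero in the fictitious region.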
Numerical results of a fast FVM on a unit disk $\Omega_s=\{(x,y):1-x^2-y^2>0\}$
$K_x=K_y=0.005$, $\gamma_x=\gamma_y=0.5$, $u(x,y)=(1-x^2-y^2)^2$;
f computed accordingly, $\Omega=(-1,1)\times(-1,1)$.
We measure the $L^2$ errors of the numerical solutions in $\Omega_s$ and use linear regression to fit the convergence rates
$$\|u_h-u\|_{L^2(\Omega_s)}\le Mh^{\kappa}.$$
We measure the $L^2$ norms of the numerical solutions in $\Omega\setminus\Omega_s$ and fit the convergence rates
$$\|u_h\|_{L^2(\Omega\setminus\Omega_s)}\le Mh^{\kappa}.$$
We present the number of iterations and the CPU time consumed by the fast FVM.
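A rate fit of this kind can be sketched as an ordinary least-squares line in log-log coordinates. This is an assumption about the procedure, since the talk does not state exactly how M and κ were fitted; `fit_rate` is a hypothetical helper, and the sample data are the $L^2(\Omega_s)$ errors reported in this talk for α = β = 0.1, η = 20, with h = 2/N on Ω = (−1, 1)².

```python
import numpy as np

def fit_rate(h, err):
    """Least-squares fit of log(err) = log(M) + kappa * log(h); returns (M, kappa)."""
    kappa, log_m = np.polyfit(np.log(h), np.log(err), 1)
    return np.exp(log_m), kappa

N = 2.0 ** np.arange(4, 10)          # Nx = Ny = 2^4, ..., 2^9
h = 2.0 / N                          # mesh size on (-1, 1)
err = np.array([5.105588e-3, 1.165746e-3, 3.326281e-4,
                9.642932e-5, 2.508560e-5, 6.907164e-6])
M, kappa = fit_rate(h, err)
print(M, kappa)
```

The fitted exponent comes out close to the reported rate of about 1.9; small differences are expected if the original fit constrained M or used a different subset of meshes.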
α = β = 0.1

| η | Nx = Ny | ‖u_h − u‖_{L2(Ωs)} | ‖u_h‖_{L2(Ω\Ωs)} | # of iter. | CPU |
|---|---|---|---|---|---|
| 20 | 2^4 | 5.105588E-3 | 4.611015E-4 | 27 | 0.23s |
| 20 | 2^5 | 1.165746E-3 | 2.779546E-4 | 50 | 0.79s |
| 20 | 2^6 | 3.326281E-4 | 9.079025E-5 | 101 | 3.24s |
| 20 | 2^7 | 9.642932E-5 | 1.405125E-5 | 160 | 16s |
| 20 | 2^8 | 2.508560E-5 | 7.478186E-6 | 307 | 1m 12s |
| 20 | 2^9 | 6.907164E-6 | 2.467309E-6 | 592 | 14m 17s |
| 20 | conv. rate | M = 50, κ = 1.88 | M = 50, κ = 1.60 | | |
| 1 | 2^4 | 5.106374E-3 | 4.589053E-4 | 31 | 0.25s |
| 1 | 2^5 | 1.166439E-3 | 2.731328E-4 | 50 | 0.79s |
| 1 | 2^6 | 3.317735E-4 | 8.890816E-5 | 100 | 3.31s |
| 1 | 2^7 | 9.640066E-5 | 1.396566E-5 | 185 | 18s |
| 1 | 2^8 | 2.500944E-5 | 7.381211E-6 | 308 | 1m 12s |
| 1 | 2^9 | 6.865761E-6 | 2.408850E-6 | 591 | 16m 8s |
| 1 | conv. rate | M = 50, κ = 1.89 | M = 50, κ = 1.60 | | |
| 0.1 | 2^4 | 5.169564E-3 | 3.297783E-4 | 29 | 0.38s |
| 0.1 | 2^5 | 1.209662E-3 | 1.567743E-4 | 56 | 1.51s |
| 0.1 | 2^6 | 3.178022E-4 | 4.382610E-5 | 93 | 5.99s |
| 0.1 | 2^7 | 9.424197E-5 | 9.725284E-6 | 169 | 20s |
| 0.1 | 2^8 | 2.327820E-5 | 4.231561E-6 | 325 | 2m 7s |
| 0.1 | 2^9 | 6.149905E-6 | 1.130637E-6 | 625 | 22m 12s |
| 0.1 | conv. rate | M = 50, κ = 1.93 | M = 50, κ = 1.68 | | |
| 0.01 | 2^4 | 5.317458E-3 | 1.289428E-5 | 225 | 2.23s |
| 0.01 | 2^5 | 1.299251E-3 | 1.360030E-5 | 191 | 3.89s |
| 0.01 | 2^6 | 3.271512E-4 | 5.886561E-6 | 189 | 7.94s |
| 0.01 | 2^7 | 9.106917E-5 | 2.067034E-6 | 191 | 25s |
| 0.01 | 2^8 | 2.310850E-5 | 8.687672E-7 | 350 | 1m 57s |
| 0.01 | 2^9 | 6.124672E-6 | 2.414062E-7 | 621 | 20m 25s |
| 0.01 | conv. rate | M = 50, κ = 1.95 | M = 50, κ = 1.20 | | |
α = β = 0.9

| η | Nx = Ny | ‖u_h − u‖_{L2(Ωs)} | ‖u_h‖_{L2(Ω\Ωs)} | # of iter. | CPU |
|---|---|---|---|---|---|
| 20 | 2^4 | 9.362172E-3 | 1.362098E-3 | 11 | 0.12s |
| 20 | 2^5 | 2.519421E-3 | 2.851351E-4 | 16 | 0.28s |
| 20 | 2^6 | 6.595694E-4 | 5.736933E-5 | 22 | 1.09s |
| 20 | 2^7 | 1.692336E-4 | 1.094364E-5 | 33 | 3.28s |
| 20 | 2^8 | 4.309593E-5 | 2.227257E-6 | 49 | 11s |
| 20 | 2^9 | 1.088034E-5 | 3.846701E-7 | 71 | 1m 30s |
| 20 | conv. rate | M = 50, κ = 1.95 | M = 50, κ = 2.35 | | |
| 1 | 2^4 | 9.340856E-3 | 1.006460E-3 | 12 | 0.13s |
| 1 | 2^5 | 2.508966E-3 | 2.386773E-4 | 18 | 0.34s |
| 1 | 2^6 | 6.569196E-4 | 5.093327E-5 | 23 | 0.78s |
| 1 | 2^7 | 1.687507E-4 | 1.019236E-5 | 34 | 4.22s |
| 1 | 2^8 | 4.299552E-5 | 2.116632E-6 | 49 | 11.50s |
| 1 | 2^9 | 1.086598E-5 | 3.724706E-7 | 72 | 1m 32s |
| 1 | conv. rate | M = 50, κ = 1.95 | M = 50, κ = 2.28 | | |
| 0.1 | 2^4 | 9.316244E-3 | 4.396209E-5 | 50 | 0.34s |
| 0.1 | 2^5 | 2.482261E-3 | 2.186055E-5 | 82 | 1.30s |
| 0.1 | 2^6 | 6.471300E-4 | 7.664450E-6 | 84 | 3.05s |
| 0.1 | 2^7 | 1.661195E-4 | 2.535404E-6 | 75 | 7.22s |
| 0.1 | 2^8 | 4.232844E-5 | 7.805887E-7 | 77 | 18s |
| 0.1 | 2^9 | 1.074239E-5 | 1.901050E-7 | 83 | 2m 29s |
| 0.1 | conv. rate | M = 50, κ = 1.95 | M = 50, κ = 1.58 | | |
| 0.01 | 2^4 | 9.315837E-3 | 4.568987E-7 | 120 | 0.95s |
| 0.01 | 2^5 | 2.480667E-3 | 2.483698E-7 | 961 | 15s |
| 0.01 | 2^6 | 6.461560E-4 | 9.835563E-8 | 330 | 11s |
| 0.01 | 2^7 | 1.656546E-4 | 3.973004E-8 | 380 | 40s |
| 0.01 | 2^8 | 4.213543E-5 | 1.675305E-8 | 447 | 2m 1s |
| 0.01 | 2^9 | 1.068342E-5 | 5.988083E-9 | 522 | 13m 12s |
| 0.01 | conv. rate | M = 50, κ = 1.96 | M = 50, κ = 1.26 | | |
2D-Conservative FPDEs (Ervin & Roop 2007; Meerschaert et al 2006)
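$$-\int_0^{2\pi}\Big(D_\theta\,K\,I^{\beta}_{\theta}D_\theta u(x,y)\Big)P(d\theta)=f(x,y)\ \text{ in }\Omega\subset\mathbb{R}^2,\qquad u=0\ \text{ on }\partial\Omega. \tag{14}$$
$P(d\theta)$ is a probability measure on $[0,2\pi)$,
$D_\theta$ is the differential operator in the direction of θ
$$D_\theta u(x,y):=\Big(\cos\theta\frac{\partial}{\partial x}+\sin\theta\frac{\partial}{\partial y}\Big)u(x,y),$$
and $I^{\beta}_{\theta}$, with $0<\beta<1$, represents the βth order fractional integral operator in the direction of θ given by
$$I^{\beta}_{\theta}u(x,y):=\int_0^{\infty}\frac{s^{\beta-1}}{\Gamma(\beta)}\,u(x-s\cos\theta,\,y-s\sin\theta)\,ds.$$
If $P(d\theta)$ is atomic with atoms 0, π/2, π, 3π/2, then (14) reduces to the usual coordinate form.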
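The reduction to the coordinate form can be written out explicitly. The sketch below assumes weights $p_1,\dots,p_4$ (hypothetical notation) on the atoms $0,\pi/2,\pi,3\pi/2$ and a function u extended by zero outside Ω.

```latex
\begin{aligned}
&D_0=\partial_x,\quad D_{\pi}=-\partial_x,\quad D_{\pi/2}=\partial_y,\quad D_{3\pi/2}=-\partial_y,\\
&I^{\beta}_{0}u(x,y)=\int_{-\infty}^{x}\frac{(x-\xi)^{\beta-1}}{\Gamma(\beta)}\,u(\xi,y)\,d\xi,\qquad
 I^{\beta}_{\pi}u(x,y)=\int_{x}^{\infty}\frac{(\xi-x)^{\beta-1}}{\Gamma(\beta)}\,u(\xi,y)\,d\xi,\\
&D_{\pi}\,K\,I^{\beta}_{\pi}D_{\pi}u=(-\partial_x)\,K\,I^{\beta}_{\pi}(-\partial_x u)=\partial_x\big(K\,I^{\beta}_{\pi}\partial_x u\big),\\
&-\partial_x\Big(K\big(p_1I^{\beta}_{0}+p_3I^{\beta}_{\pi}\big)\partial_x u\Big)
 -\partial_y\Big(K\big(p_2I^{\beta}_{\pi/2}+p_4I^{\beta}_{3\pi/2}\big)\partial_y u\Big)=f .
\end{aligned}
```

The two sign changes in each backward direction cancel, so the directions 0 and π combine into left- and right-sided fractional integrals in x, and π/2 and 3π/2 do the same in y.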
A Galerkin weak formulation and its well-posedness (Ervin & Roop 2007)
Galerkin formulation: given $f\in H^{-(1-\beta/2)}(\Omega)$, seek $u\in H_0^{1-\beta/2}(\Omega)$ such that
$$B(u,v):=\int_0^{2\pi}\Big[\int_\Omega K\,I^{\beta}_{\theta}D_\theta u\,D_\theta v\,dx\,dy\Big]P(d\theta)=\langle f,v\rangle,\qquad\forall\,v\in H_0^{1-\beta/2}(\Omega). \tag{15}$$
Theorem
$B(\cdot,\cdot)$ is coercive and continuous on $H_0^{1-\beta/2}(\Omega)\times H_0^{1-\beta/2}(\Omega)$. Hence, the Galerkin weak formulation (15) has a unique solution. Moreover,
$$\|u\|_{H^{1-\beta/2}(\Omega)}\le C\|f\|_{H^{-(1-\beta/2)}(\Omega)}.$$
A Galerkin finite element method
Let $h_1:=1/(N_1+1)$, $h_2:=1/(N_2+1)$, $x_i:=ih_1$, and $y_j:=jh_2$.
Let $\psi(\xi)=1-|\xi|$ for $\xi\in[-1,1]$ and 0 elsewhere. Let
$$\phi_{i,j}(x,y):=\psi\Big(\frac{x-x_i}{h_1}\Big)\psi\Big(\frac{y-y_j}{h_2}\Big),\qquad 1\le i\le N_1,\ 1\le j\le N_2,$$
$$u_h(x,y)=\sum_{j'=1}^{N_2}\sum_{i'=1}^{N_1}u_{i',j'}\phi_{i',j'}(x,y),\qquad(x,y)\in\Omega.$$
A bilinear finite element scheme for $i=1,\ldots,N_1$ and $j=1,\ldots,N_2$:
$$\sum_{j'=1}^{N_2}\sum_{i'=1}^{N_1}B\big(\phi_{i',j'},\phi_{i,j}\big)u_{i',j'}=\big(f,\phi_{i,j}\big)_{L^2}=:f_{i,j}. \tag{16}$$
A matrix form of the finite element scheme
Let $N:=N_1N_2$, $A=\big[A_{m,n}\big]_{m,n=1}^{N}$, and
$$\mathbf{u}:=\big[u_{1,1},\ldots,u_{N_1,1},u_{1,2},\ldots,u_{N_1,2},\ldots,u_{1,N_2},\ldots,u_{N_1,N_2}\big]^T,$$
$$\mathbf{f}:=\big[f_{1,1},\ldots,f_{N_1,1},f_{1,2},\ldots,f_{N_1,2},\ldots,f_{1,N_2},\ldots,f_{N_1,N_2}\big]^T.$$
Let $A_{m,n}:=B\big(\phi_{i',j'},\phi_{i,j}\big)$ with
$$m=(j-1)N_1+i,\quad 1\le i\le N_1,\ 1\le j\le N_2,\qquad n=(j'-1)N_1+i',\quad 1\le i'\le N_1,\ 1\le j'\le N_2. \tag{17}$$
The finite element scheme (16) can be expressed in a matrix form
$$A\mathbf{u}=\mathbf{f}. \tag{18}$$
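The index map (17) and its inverse are easy to get wrong by one, so a tiny sketch may help. It uses the 1-based indices of the slides; the function names `global_index` and `local_indices` are hypothetical.

```python
def global_index(i, j, n1):
    """Map the node (i, j), 1 <= i <= N1, 1 <= j <= N2, to m = (j - 1) * N1 + i."""
    return (j - 1) * n1 + i

def local_indices(m, n1):
    """Invert the map: recover (i, j) from the 1-based global index m."""
    j, i = divmod(m - 1, n1)
    return i + 1, j + 1

# With N1 = 3 the nodes of the first grid line y_1 come first, then those of y_2, and so on.
print([global_index(i, 2, 3) for i in (1, 2, 3)])
```

In this ordering the x-index runs fastest, which is what makes each block $T_j$ of the stiffness matrix an $N_1\times N_1$ matrix.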
Features of the finite element scheme
Features of numerical methods for coordinate-form FPDEs
A is dense: the number of nonzero entries in each row is $O(N_1+N_2)$, which $\to\infty$ as $N\to\infty$.
The number of nonzero entries in each row divided by the total number of entries in the same row is $O((N_1+N_2)/N)=O(N^{-1/2})$.
A has a tensor product structure.
Features of the finite element method for full FPDEs
A is full.
A has a complicated structure, as it couples the nodes in all directions!
It does not seem feasible to exploit a tensor-product structure of A.
We instead exploit the translation invariance property of A.
Translation invariant structure of A
Theorem
Let the indices $(i_1,j_1)$, $(i_1',j_1')$, $(i_2,j_2)$, and $(i_2',j_2')$ be related by
$$i_1'-i_1=i_2'-i_2,\qquad j_1'-j_1=j_2'-j_2. \tag{19}$$
Then the following translation-invariance property holds
$$\int_0^{2\pi}\Big[\int_\Omega K\,D^{-\beta}_{\theta}D_\theta\phi_{i_1',j_1'}(x,y)\,D_\theta\phi_{i_1,j_1}(x,y)\,dx\,dy\Big]P(d\theta)
=\int_0^{2\pi}\Big[\int_\Omega K\,D^{-\beta}_{\theta}D_\theta\phi_{i_2',j_2'}(x,y)\,D_\theta\phi_{i_2,j_2}(x,y)\,dx\,dy\Big]P(d\theta). \tag{20}$$
Figure: Illustration of the translation invariance (cells $\Omega_{i_1,j_1}$, $\Omega_{i_1',j_1'}$, $\Omega_{i_2,j_2}$, $\Omega_{i_2',j_2'}$; points (ξ, η), (x, y), (ξ′, η′), (x′, y′); offsets $s_1$, $s_2$).
Theorem
The stiffness matrix A is an $N_2$-by-$N_2$ block-Toeplitz matrix
$$A=\begin{bmatrix}
T_0&T_1&\cdots&T_{N_2-2}&T_{N_2-1}\\
T_{-1}&T_0&T_1&\ddots&T_{N_2-2}\\
\vdots&\ddots&\ddots&\ddots&\vdots\\
T_{2-N_2}&\ddots&T_{-1}&T_0&T_1\\
T_{1-N_2}&T_{2-N_2}&\cdots&T_{-1}&T_0
\end{bmatrix}. \tag{21}$$
Each block $T_j$ is an $N_1$-by-$N_1$ Toeplitz matrix
$$T_j=\begin{bmatrix}
t_{0,j}&t_{1,j}&\cdots&t_{N_1-2,j}&t_{N_1-1,j}\\
t_{-1,j}&t_{0,j}&t_{1,j}&\ddots&t_{N_1-2,j}\\
\vdots&\ddots&\ddots&\ddots&\vdots\\
t_{2-N_1,j}&\ddots&t_{-1,j}&t_{0,j}&t_{1,j}\\
t_{1-N_1,j}&t_{2-N_1,j}&\cdots&t_{-1,j}&t_{0,j}
\end{bmatrix}. \tag{22}$$
Impact of the theorem
$A\mathbf{v}$ can be evaluated in $O(N\log N)$ operations, by embedding A into a 4N-by-4N block-circulant-circulant-block matrix.
For coordinate FPDEs, $A^y$ is block-Toeplitz-circulant-block and can be embedded into a 2N-by-2N block-circulant-circulant-block matrix.
A is generated by O(N) parameters.
A requires only O(N) memory to store.
Unlike finite difference methods, the evaluation of A is very expensive.
Only O(N) (in contrast to $N^2$) entries of A need to be evaluated, a significant reduction of CPU time.
A block-circulant-circulant-block preconditioner can be developed.
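The 4N-by-4N embedding corresponds to a two-dimensional FFT on a grid of size $2N_1\times2N_2$. The sketch below assumes NumPy and a generator array `t` holding the $(2N_1-1)(2N_2-1)$ numbers $t_{p,q}$ of (21) and (22); the function name `bttb_matvec` and the storage layout are illustrative choices, not the talk's implementation.

```python
import numpy as np

def bttb_matvec(t, v):
    """Multiply the block-Toeplitz-Toeplitz-block matrix by a vector via 2D FFTs.

    t[p + N1 - 1, q + N2 - 1] = t_{p,q}, the entry coupling nodes with offsets
    p = i' - i and q = j' - j; v[i, j] holds the unknown at node (i + 1, j + 1).
    """
    n1, n2 = v.shape
    # Kernel of the equivalent circular convolution on a (2 N1)-by-(2 N2) grid.
    c = np.zeros((2 * n1, 2 * n2))
    for a in range(-(n1 - 1), n1):
        for b in range(-(n2 - 1), n2):
            c[a % (2 * n1), b % (2 * n2)] = t[-a + n1 - 1, -b + n2 - 1]
    v_padded = np.zeros((2 * n1, 2 * n2))
    v_padded[:n1, :n2] = v
    y = np.fft.ifft2(np.fft.fft2(c) * np.fft.fft2(v_padded)).real
    return y[:n1, :n2]
```

Only the generator array is stored, which matches the O(N) memory and O(N) entry evaluations noted above, and the product costs O(N log N) through the FFTs.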
Numerical experiments
A 4-point (2 points in x or y) Gauss-Legendre quadrature is used to evaluate entries of A and the right-hand side.
The finite element scheme is solved by the fast conjugate gradient squared (FCGS), the preconditioned fast CGS (PFCGS), and Gaussian elimination (Gauss) solvers.
These solvers were implemented using Compaq Visual Fortran 6.6 on a ThinkPad T410 Laptop.
An example run for a coordinate FPDE
β = 0.5, $K_i:=1+\sin2\theta_i$ for i = 1, 2, 3, 4.
$u=x^2(1-x)^2y^2(1-y)^2$, f is calculated accordingly.
Table: The convergence rates of the Gauss, FCGS, and PFCGS solutions

| N1 = N2 | Gauss ‖u − u_h‖_{L2(Ω)} | FCGS ‖u − u_h‖_{L2(Ω)} | PFCGS ‖u − u_h‖_{L2(Ω)} | Conv. Rate |
|---|---|---|---|---|
| 2^3 | 3.487 × 10^-5 | 3.487 × 10^-5 | 3.487 × 10^-5 | |
| 2^4 | 8.876 × 10^-6 | 8.876 × 10^-6 | 8.876 × 10^-6 | 1.97 |
| 2^5 | 2.097 × 10^-6 | 2.097 × 10^-6 | 2.097 × 10^-6 | 2.08 |
| 2^6 | 4.759 × 10^-7 | 4.759 × 10^-7 | 4.759 × 10^-7 | 2.14 |
| 2^7 | N/A | 1.055 × 10^-7 | 1.056 × 10^-7 | 2.17 |
| 2^8 | N/A | 2.307 × 10^-8 | 2.311 × 10^-8 | 2.19 |
| 2^9 | N/A | 4.999 × 10^-9 | 5.003 × 10^-9 | 2.21 |
| 2^10 | N/A | 1.079 × 10^-9 | 1.078 × 10^-9 | 2.21 |
Table: The CPU time of the FCGS, PFCGS, and Gauss
| N1 = N2 | Full A CPU | O(N) entries CPU | Gauss CPU | FCGS CPU | FCGS Itr. # | PFCGS CPU | PFCGS Itr. # |
|---|---|---|---|---|---|---|---|
| 2^3 | 0.91s | 0.05s | 0.00s | 0.00s | 5 | 0.00s | 4 |
| 2^4 | 14s | 0.20s | 0.05s | 0.00s | 9 | 0.00s | 6 |
| 2^5 | 3m47s | 0.83s | 19s | 0.05s | 15 | 0.05s | 7 |
| 2^6 | 1h2m | 3.48s | 25m6s | 0.45s | 28 | 0.19s | 10 |
| 2^7 | N/A | 14s | N/A | 3.44s | 52 | 0.94s | 11 |
| 2^8 | N/A | 55s | N/A | 35s | 94 | 6.73s | 15 |
| 2^9 | N/A | 3m37s | N/A | 4m49s | 170 | 44s | 21 |
| 2^10 | N/A | 14m39s | N/A | 35m43s | 300 | 4m13s | 29 |
Thank You
for Your Attention!