a new parallel domain-decomposed chebyshev...

12
1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric and Oceanic Modeling Hung-Chi Kuo 1 , Yung-Chieh Chang 2 , Yu-Ming Tsai 1 , Ming-Chih Lai 2 , Yu-Heng Tseng 1 , and Hao-Tien Chiang 1 1 Department of Atmospheric Sciences, National Taiwan University, Taipei, Taiwan 2 Department of Applied Mathematics, National Chiao Tung University, Hsinchu, Taiwan Manuscript submitted 19 October 2010 A new domain-decomposed Chebyshev collocation method is developed for regional spectral modeling. The main advantage of Chebyshev spectral method lies in its fast transformation and the convergence rate of Chebyshev polynomials depends only on the smoothness of the expanded function rather than the boundary conditions. In addition to the better convergence in its serial im- plementation, the domain-decomposition approach further facilitates an efficient parallel implementation. The boundary conditions for the individual sub-domains are exchanged through one grid interval overlapping. This novel approach is validated using the standard 1-D advection equation and inviscid Burgers’ equation. It is further applied to the vortex formation and propagation prob- lem using the 2-D fully nonlinear shallow water equations. Multi-dimensional domain-decompositions can be easily extended. The results suggest this new approach retains the advantages of spectral method with high accuracy and exponential error convergence. The domain-decomposition also reduces the spectral operation counts for each sub-domain. Mass and the quadratic quantities such as kinetic energy and enstrophy are all conserved. As a result, the parallel domain-decomposition Chebyshev method shall serve as an efficient alternative for atmospheric and oceanic modeling. 1. Introduction The advance of spectral transform method (Orszag 1970) and Fast Fourier Transform (FFT) brought the success- ful global atmospheric modeling using spectral methods (e.g., Bourke et al. 1977; Machenhauer 1979). Com- paring with finite difference and finite element (volume) methods, the global spectral models can eliminate the singularity of poles and preserve high accuracy and ef- ficiency due to the ”exponential convergence” property and the easy implementation of semi-implicit method. The spectral methods also offer the discrete conserva- tions of kinetic energy and enstrophy, which are impor- tant for two-dimensional turbulence. In addition to the popularity of spectral method for global models, the fi- nite difference and finite element (volume) methods have been used commonly to deal with limited areas in atmo- spheric or oceanic model. The main barrier of spectral methods for its regional scale application is the treatment of time-dependent boundary conditions. Fulton and Schubert (1987a, hereinafter FS87a) and Fulton and Schubert (1987b, hereinafter FS87b) pro- posed a Chebyshev spectral method for the limited area (regional) atmospheric modeling. The method success- fully handles the time dependent boundary conditions while retaining the advantage of spectral method. The appeal of Chebyshev spectral methods in the regional at- mospheric modeling is its efficiency and accuracy due to the fast transform, and the exponential convergence rate for the chosen basis function. The convergence rate of Chebyshev method depends mainly on the smoothness of the expanded function, rather than the boundary condi- tions (FS87a; Kuo and Williams 1992; Kuo and Williams 1998). Moreover, fast Chebyshev transform can be im- plemented via the aids of FFT (FS87a). There are two types of projections for Chebyshev spectral modeling. In the Chebyshev tau approxima- To whom correspondence should be addressed. Y.-H. Tseng Department of Atmospheric Sciences, National Taiwan University, Taipei 106, Taiwan. E-mail: [email protected] Journal of Advances in Modeling Earth Systems – Discussion

Upload: others

Post on 14-Aug-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

1

A New Parallel Domain-Decomposed Chebyshev Collocation Methodfor Atmospheric and Oceanic Modeling

Hung-Chi Kuo 1, Yung-Chieh Chang 2, Yu-Ming Tsai 1, Ming-Chih Lai 2, Yu-HengTseng 1, and Hao-Tien Chiang 1

1Department of Atmospheric Sciences, National Taiwan University, Taipei, Taiwan2Department of Applied Mathematics, National Chiao Tung University, Hsinchu, Taiwan

Manuscript submitted 19 October 2010

A new domain-decomposed Chebyshev collocation method is developed for regional spectral modeling. The main advantageof Chebyshev spectral method lies in its fast transformation and the convergence rate of Chebyshev polynomials depends only onthe smoothness of the expanded function rather than the boundary conditions. In addition to the better convergence in its serial im-plementation, the domain-decomposition approach further facilitates an efficient parallel implementation. The boundary conditionsfor the individual sub-domains are exchanged through one grid interval overlapping. This novel approach is validated using thestandard 1-D advection equation and inviscid Burgers’ equation. It is further applied to the vortex formation and propagation prob-lem using the 2-D fully nonlinear shallow water equations. Multi-dimensional domain-decompositions can be easily extended. Theresults suggest this new approach retains the advantages of spectral method with high accuracy and exponential error convergence.The domain-decomposition also reduces the spectral operation counts for each sub-domain. Mass and the quadratic quantities suchas kinetic energy and enstrophy are all conserved. As a result, the parallel domain-decomposition Chebyshev method shall serve asan efficient alternative for atmospheric and oceanic modeling.

1. Introduction

The advance of spectral transform method (Orszag 1970)and Fast Fourier Transform (FFT) brought the success-ful global atmospheric modeling using spectral methods(e.g., Bourke et al. 1977; Machenhauer 1979). Com-paring with finite difference and finite element (volume)methods, the global spectral models can eliminate thesingularity of poles and preserve high accuracy and ef-ficiency due to the ”exponential convergence” propertyand the easy implementation of semi-implicit method.The spectral methods also offer the discrete conserva-tions of kinetic energy and enstrophy, which are impor-tant for two-dimensional turbulence. In addition to thepopularity of spectral method for global models, the fi-nite difference and finite element (volume) methods havebeen used commonly to deal with limited areas in atmo-spheric or oceanic model. The main barrier of spectralmethods for its regional scale application is the treatment

of time-dependent boundary conditions.

Fulton and Schubert (1987a, hereinafter FS87a) andFulton and Schubert (1987b, hereinafter FS87b) pro-posed a Chebyshev spectral method for the limited area(regional) atmospheric modeling. The method success-fully handles the time dependent boundary conditionswhile retaining the advantage of spectral method. Theappeal of Chebyshev spectral methods in the regional at-mospheric modeling is its efficiency and accuracy due tothe fast transform, and the exponential convergence ratefor the chosen basis function. The convergence rate ofChebyshev method depends mainly on the smoothness ofthe expanded function, rather than the boundary condi-tions (FS87a; Kuo and Williams 1992; Kuo and Williams1998). Moreover, fast Chebyshev transform can be im-plemented via the aids of FFT (FS87a).

There are two types of projections for Chebyshevspectral modeling. In the Chebyshev tau approxima-

To whom correspondence should be addressed.Y.-H. TsengDepartment of Atmospheric Sciences, National Taiwan University, Taipei 106, Taiwan. E-mail: [email protected]

Journal of Advances in Modeling Earth Systems – Discussion

Page 2: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

2 Kuo et al.

tion, the Chebyshev basis functions need not individuallysatisfy the boundary conditions; however, these bound-ary conditions are satisfied by the whole truncated se-ries. The other is called collocation projection, whichrequires the residual to vanish on the collocation pointsand the boundary conditions are implemented in phys-ical space. Kuo and Schubert (1988) studied the sta-bility of cloud-topped boundary layer with a Fourier-Chebyshev tau method based Boussinesq nonhydrostaticconvective model. The method handles the strong gra-dients of temperature and moisture which are critical inboundary layer convection modeling.

The idea of maintaining the exponential convergenceproperty with domain decomposition for spectral meth-ods was initiated by Macaraeg and Streett (1986). Theyused the ”Patched” method, in which each block spansa single interface line between a pair of adjoin domains,to handle the domain interface. Kopriva (1986, 1989)further proposed another composite-grid approach whichmaps the hyperbolic equations in the complicated ge-ometries to several squared sub-domains. The spec-tral method is performed within each domain. The ad-vantages of domain decomposition for Chebyshev spec-tral method was given in Kopriva (1986) while Kopriva(1989) introduced both ”patched” and ”overset” meth-ods for computing hyperbolic equations on complicatedgeometry. A decade later, Kopriva and Kolias (1996) de-veloped a conservative staggered-grid Chebyshev multi-domain method for compressible flows with patchedgrids. Kopriva (1996) found that within each subdomain,the polynomial order can be determined independentlyregardless of its neighbors. Chebyshev pseudospectral(collocation) method with domain decomposition is alsoapplied to deal with viscous flow calculation in orderto reducing the Gibbs phenomena over an entire do-main (Yang and Shizgal 1994). These Chebyshev-typespectral approaches with domain decomposition solveddifferent problems such as hyperbolic equations (Ko-priva 1986; Kopriva 1989), viscous flow (Yang and Shiz-gal 1994), and compressible flows (Kopriva and Kolias1996; Kopriva 1996). However, none has directly ap-plied to the atmospheric modeling.

The spectral method with domain decomposition canbe regarded as a variation of ”spectral element method”(Washington and Parkinson, 2005). One of the typicalspectral element method is the discontinuous Galerkinmethod with the Legendre polynomials (Cockburn 2003;Iskandarani et al. 1995; Ramachandran et al. 2005). Themajor advantage of the discontinuous Galerkin method

is that much of the computation is local to an elementand the conservation property is maintained. It is alsosuitable to simulate complex domain geometry. How-ever, the computational cost is much expensive due tothe spectral element and its parallel implementation re-quires better load balance and arrangement. Rather, theunique benefit of Chebyshev spectral method lies in itsfast transformation (FS87a), which is readily availableand efficient. In addition, Chebyshev spectral methoduses the non-uniform grid arrangement with finer res-olution near the boundary. The domain decompositionenhances the homogeneity of the fine resolution as thenumber of sub-domains increases, which shall providemore flexible alternative for further adaptive grid refine-ment. Thus, the domain-decompositionapproach add theChebyshev spectral method to the list of spectral elementmethods.

The main objective of this study is to develop andimplement a new parallel Chebyshev collocation methodwith domain decomposition which may be suited in theatmospheric and oceanic modeling. This is unique for re-gional spectral models since no periodic boundary con-straint is required. The multi-dimensional domain de-composition provides the most flexible combination andeffective performance for the large-scale atmosphericand oceanic modeling. The spectral operation countsfor derivative computation are even reduced for eachsub-domain. Follow the suggestion of FS87a, we haveadopted the advective form for better accuracy. Since thedomain decomposition is the most efficient way in par-allel computing to improve the computational efficiency,we simply employed Message Passing Interface (MPI)in this study. The multi-domain calculations are per-formed using different computing processes simultane-ously, which reduces a significant amount of computingtime and relieves the memory requirement for a large-scale system from the advantage of parallel computing(Smith et al. 1996).

Section 2 introduces the Chebyshev collocationmethod. The parallel domain-decomposed Chebyshevcollocation method is proposed in section 3. Sec-tion 4 investigates the accuracy of the parallel domain-decomposed Chebyshev method using advection and in-viscid Burgers’ equations. The characteristics are dis-cussed in terms of the exponential convergence of error.Section 5 presents the application of vortex formationand propagation using the 2-D nonlinear shallow waterequations. Finally, concluding remarks are given in sec-tion 6.

JAMES-D

Page 3: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

Parallel Domain-Decomposed Chebyshev Collocation Method 3

2. Methods

Spectral methods seek solutions in terms of a series ofknown basis functions. The essence of choosing basisfunctions is based on the property of ”completeness.”Namely, the solution can be represented by a set of func-tions. Practical consideration of the basis functions forspectral methods is orthogonality and efficient calcula-tion of their projection or inner product. The ”orthogo-nality” of basis functions for spectral methods is practi-cally important in the computation since the coefficientsare independent. The eigenfunctions of Sturm-Liouvilleequation are often chosen to be the basis functions. Foratmospheric spectral modeling with spherical harmonics,Chebyshev and Fourier series are often used in atmo-spheric spectral models. These series allow fast trans-formation to find the spectral coefficients with great ac-curacy and efficiency. On the other hand, the typicalprojection operators to find the spectral coefficients areGalerkin and tau methods (Hesthaven et al. 2007). Inthe Galerkin method, the basis functions need to satisfythe prescribed boundary conditions while the tau methodis more flexible, in which the basis functions are not re-quired to satisfy the boundary condition.

Here, we discussed the convergence property of theseries expansion based on the Sturm-Liouville equation.More detailed analysis can be found in Lanczos (1956),Gottlieb and Orszag (1977), and FS87a. Consider theSturm-Liouville equation in limited domain [a, b],

Lφ(x) = −[p(x)φ′(x)]′ + q(x)φ(x) = λw(x)φ(x)(2.1)

where L is a linear operator involving space derivatives,with determined functions p(x), q(x) and w(x). Theprime represents the differentiation with respect to x.Eq. (2.1) has infinite and countable sets of solutionsφ(x)∞n=0, corresponding to their eigenvalues λ∞

n=0. Theeigenfunctions φn form a complete and orthogonal basisunder the inner product

(φi, φj)w =

∫ b

a

φi(x)φj(x)w(x)dx = δij (2.2)

where δij is the Kronecker’s delta function. Therefore,any smooth function u(x) can be expanded by the basis{φn}∞n=0 with appropriate coefficients as

u(x) =

∞∑n=0

unφn(x) (2.3)

where

un = (u, φn)w. (2.4)

To estimate the magnitude of un, we substitute φn inun = (u, φn)w from Eq. (2.1), i.e.,

φn(x) = λ−1n w−1(x)Lφn(x)

= λ−1n w−1(x){−[p(x)φ′n(x)]

′ + q(x)φn(x)}.(2.5)

Then we get un = λ−1n (u,w−1Lφn)w and the integra-

tion by parts twice gives,

λ−1n (u,w−1Lφn)w = λ−1

n {∫ b

a

u(x)w−1(x){−[p(x)φ′n(x)]′ + q(x)φn(x)}w(x)dx}

= λ−1n [B(u, φn) + (v, φn)w]

(2.6)

where B(u, φn) = p(x)[u′(x)φn(x) − u(x)φ′n(x)]|x=bx=a

and v = w−1Lu.

The Chebyshev polynomials are the solutions ofChebyshev differential equation which is a special caseof the Sturm-Liouville equation with p(x) = (1−x2)1/2,q(x) = 0 andw(x) = (1−x2)−1/2. Taking the case hav-ing the domain [−1, 1] and p(−1) = p(1) = 0, whichmeans B(u, φn) equals to 0 no matter what boundedfunction u is, we can perform integration by parts forthe λ−1

n [(v, φn)w] term repeatedly as long as the func-

tion is smooth enough after each integration. Since the(v, φn)w term is bounded regardless of n and the asymp-totic behavior of eigenvalues λn = O(n2) and eigen-vectors φn(x) = O(1), φ′n(x) = O(n) as n → ∞(Cournat and Hilbert,1953), we get un < O(n−m) ifu is m times differentiable. This indicates the exponen-tial convergence by definition, i.e., the convergence rateof Chebyshev series only depends on the smoothness ofthe expanded function regardless of the boundaries.

This is different from the Fourier series where

Journal of Advances in Modeling Earth Systems – Discussion

Page 4: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

4 Kuo et al.

p(x) = 1 �= 0 and the function u must be periodicand smooth enough to maintain the exponential conver-gence property. Usually, u does not satisfy the periodiccondition in reality and this causes slower convergencerate (B(u, φn) = O(n) and un = O(n−1)). Further-more, given the boundary conditions u(a) = u(b) = 0,the convergence rates must be B(u, φn) = O(1) andun = O(n−2). The slow convergence rate is caused bythe reflection of Gibbs phenomenon since these bound-ary conditions do not satisfy the expansion function φn.

These unique properties motivate the Chebyshevpolynomials Tn(x) to be the appropriate (and flexi-ble) basis functions for boundary independent problems.There are n zeros of the Chebyshev polynomials Tn de-fined on [−1, 1] by

Tn(x) = cos (nθ), where cos θ = x. (2.7)

The Chebyshev collocation points are

xj = cos(θj), with θj =(j + 1/2)π

n, 0 ≤ j ≤ n− 1.

(2.8)〈 , 〉 denotes the inner product

〈f, g〉 =∫ 1

−1

f(x)g(x)√1− x2

dx (2.9)

and the Chebyshev polynomials have the orthogonalproperty

〈Tm, Tn〉 = π

2cnδmn, cn =

{2 , n = 01 , n > 0.

(2.10)

Note that Chebyshev spectral method is known as thebest method to reduce the point-wise error rather thanany other polynomial projection methods. It can alsoeliminate the wildly oscillation close to the boundary, socalled Runge phenomena (Hesthaven et al. 2007). If afunction is expanded by Chebyshev series

ψ(x) =∞∑n=0

ψnTn(x), (2.11)

we can obtain the spectral coefficients ψn by the relation

ψn =2

πcn〈ψ, Tn〉, n = 0, 1, ... (2.12)

Hence, in order to approximate the solution of a functionφ(x), the truncated Chebyshev series can be written as

φN (x, t) =

N∑n=0

φnTn(x). (2.13)

Eq. (2.11) and (2.13) can be calculated efficiently by theFast Chebyshev Transform. Similar to the Fast FourierTransform, the Fast Chebyshev Transform reduces theNdegree of freedom transformation to O(N logN) opera-tions instead of O(N 2) (FS87a). This advantage favorsits potential application in the large-scale atmosphericmodeling.

3. The Parallel Domain Decomposition Ap-proach

The development of parallel domain decomposition ap-proach (e.g., Smith et al. 1996) for Chebyshev collo-cation method is unique and novel in the atmosphericand oceanic modeling. The computational saving is sig-nificant not only due to the distributed computing foreach domain, but also due to the relaxation for time con-straint (i.e., allowing largerΔt), as mentioned in Kopriva(1986). For a limited domain with a specified degreeof freedom N , the time step Δt is mainly constrainedby the Courant-Friedrichs-Lewy (CFL) criteria. For thenon-uniform Chebyshev grid, the minimal Δx is propor-tional to L/N 2, where L is the length of domain, N isthe degree of freedom. Assuming Δts and Δtm are theideal time steps for certain numerical methods in a sin-gle domain andm domain-decomposed sub-domains, re-spectively, the proportionality of time step to the minimalΔx leads to

Δts ∝ L

N2, (3.1)

and, similarly,

Δtm ∝Lm

(Nm )2= m

L

N2(3.2)

where L/m is the length of each sub-domain and theminimal Δx in sub-domains is proportional to 1/( N

m)2

(each sub-domain includes onlyN/m Chebyshev grids).This implies that a m time larger time step can be usedwhen several sub-domains are employed. Ideally, thisshould result in am2 speed-up in the distributed platformif no communication cost is taken into account. Appar-ently, the main advantage of domain decomposition alsofavors the use of Chebyshev method.

Minimal communication between sub-domains is re-quired for an efficient atmospheric or oceanic model.A typical one-grid width overlapped boundaries is usedhere for the data exchange between sub-domains. Fig. 1shows the one-dimensional schematic diagram for the

JAMES-D

Page 5: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

Parallel Domain-Decomposed Chebyshev Collocation Method 5

data exchange along the overlapped boundaries. Thelength of each sub-domain is

La =L

ND − 1/2(ND − 1)(1− cos(NaπN ))

(3.3)

where L is the length of the whole domain,Na+1 is thenumber of overlapped grids between sub-domains withNa ≥ 1,ND is the number of sub-domains, andN is thedegree of freedom of each sub-domain.

Note that the grids setting match well at the over-lapped boundaries. Assuming xMN to be the N th

Chebyshev collocation point at M th sub-domain, we as-sign the data value u(xMNa

) to u(x(M+1)N ) and assignu(x(M+1)(N−Na)

) to u(xM0), respectively.

4. Test Problems

In order to show the accuracy and convergence of thenew domain-decomposed Chebyshev approach, we firstconsider the 1-D linear advection equation in the domain[−1, 1]

∂u

∂t+∂u

∂x= 0. (4.1)

The initial and boundary conditions are given by the an-alytical solution

u(x, t) = A exp[−(x− x0 − t

h)2] (4.2)

where h = 0.2, x0 = −0.5 and A = h−1/2(π/2)−1/4

are chosen as those in FS87a. Here, we also com-pare our results with the fourth-order finite differencescheme (FD4).

Fig. 2 compares the analytical solution and thenumerical approximation by Chebyshev collocationmethod and FD4 at time t = 1.0. The domain is di-vided into two sub-domains with one grid width over-lapping, e.g., two overlapped grids, as shown in Fig. 1.The degree of freedom of each sub-domain is now 12.Fig. 2a shows that the approximation of Chebyshev col-location method with domain decomposition is almostidentical to the analytic solution in 1-D advection prob-lem without numerical dispersion. Fig. 2b further showsthe L2 error of the numerical results of Chebyshev collo-cation method with single domain, double domain, andFD4 method with analytical solution at t = 1.0. As ex-pected, the absolute accuracy is slightly higher in the sin-gle domain than the double domains since the advectedGaussian function is centered at x = 0.5, where the dou-ble domains have coarser resolution. However, the error

convergence in the spectral solutions are uniformly de-creasing on the order of 10(−N/4) as N approaches 64,while the error in FD4 decreases on the order of N . Theexponential convergence of the spectral method remainsregardless of the number of domains.

Next, we consider the inviscid Burgers’ equation inthe limited domain [−1, 1]

∂u

∂t+ u(x)

∂u

∂x= 0 (4.3)

with the initial condition u(x, 0) = u − tan−1(x − x0).The boundary conditions are given by the general solu-tion of Eq. (4.3). Thus, the analytical solution gives

u = u− tan−1(x− ut− x0). (4.4)

Furthermore, the time of scale-collapse can be obtainedby differentiating Eq. (4.4) with respect to x

∂u

∂x= − 1− t∂u∂x

1 + (x− x0 − ut)2.

Assuming x = x0 + ut and u = u, this gives

(∂u

∂x)x=x0+ut = −(1− t(

∂u

∂x)x=x0+ut) (4.5)

and then

(∂u

∂x)x=x0+ut =

1

t− 1→ −∞ as t→ 1. (4.6)

Thus, the time of scale-collapse of u is 1 with the posi-tion of scale-collapse at x0 + u. Given the appropriateu and x0, the analytical solution of Eq. (4.3) at certainx and t can be found numerically for desired accuracy.Fig. 3a shows the numerical results compared with theanalytical solution for the case u = 0.5 and x0 = 0 (thescale-collapse occurs at x0 = 0). The general patterncan be obtained by the numerical solution with only 32total degree of freedom. Fig. 3c shows the correspondingerror distribution in the physical space. Note that the er-ror in double domains is larger than that in single domainsince the scale collapse (sharp front) locates near slightlycoarser collocated points. On the other hand, for anothercase u = 0.5 and x0 = −0.5, the scale collapse happensnear the center of the simulation domain (Fig. 3b), betterresults are achieved for double domains since the singledomain has coarser collocated points there, as expected.

Fig. 4 shows the convergence rates for the case ofx0 = 0 and u = 0.5. The time scale T shows theduration from the integration. Since the scale-collapsetime of u is 1, the worst convergence rates should hap-pen as the time approaching to T = 1. The black

Journal of Advances in Modeling Earth Systems – Discussion

Page 6: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

6 Kuo et al.

Figure 1: One-dimensional schematic diagram of the information exchange at the overlapped boundaries.

Figure 2: (a) Comparison between the analytical solution (Exact) and numerical results for the linear advection equa-tion. Label ”D” represents the Chebyshev collocation method with double domains. The result with the fourth-orderfinite difference scheme (FD4) is also superimposed. (b) The L 2 error for FD4 and Chebyshev collocation methodusing single domain (S) and double domain (D), respectively.

lines indicate different reference slopes on the log plot(mn : log y = nN/24). The convergence rates at thecase with double domains are consistently better than theresults at the case with a single domain since the casewith double domains has a higher resolution near thecentral area. This confirms that the magnitude of erroris indeed determined by the density of collocated grids.Different combination of domain decomposition shouldprovides enough flexibility for adaptive grid refinementand better resolution when needed.

5. 2-D Nonlinear Shallow Water Model

Since the set of primitive equations for the atmospherecould be projected on the vertical normal mode basedon the normal mode analysis, solving the atmosphericprimitive equations is equivalent to solving the set ofshallow water equations (Fulton and Schubert 1985).Therefore, we use the vortex formation and propagationproblem based on shallow water equations in a limited-area (FS87b) to show the advantage of the new approach.

We use the Collocation method (rather than Tau method)due to the ease of implementation with domain decompo-sition approach. The 2D nonlinear shallow water equa-tions are

∂u

∂t+ u

∂u

∂x+ v

∂u

∂y− fv +

∂h

∂x= 0

∂v

∂t+ u

∂v

∂x+ v

∂v

∂y+ fu+

∂h

∂y= 0

∂h

∂t+ u

∂h

∂x+ v

∂h

∂y+ (h+ h)(

∂u

∂x+∂v

∂y) = Q(x, y, t)

(5.1)

where u and v are the velocity components along x andy directions, respectively, f is the Coriolis parameter,Q(x, y, t) is the time varying outer forcing, h is the basicstate of geopotential, and h is the total height deviationfrom the basic state h.

The collocated system of shallow water equations(5.1), following FS87b, can be written as

JAMES-D

Page 7: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

Parallel Domain-Decomposed Chebyshev Collocation Method 7

(a) (b)

(c) (d)

Figure 3: Analytical solution and numerical results in single domain and double domains for inviscid Burgers’ equa-tion with (a) x0 = 0 and u = 0.5 and (b) x0 = −0.5 and u = 0.5, respectively. The error distribution in the physicalspace for (c) x0 = 0 and u = 0.5 and (d) x0 = −0.5 and u = 0.5, respectively.

dujkdt

+ ujku(1,0)jk + vjku

(0,1)jk − f vjk + h

(1,0)jk = 0

dvjkdt

+ ujkv(1,0)jk + vjk v

(0,1)jk + fujk + h

(0,1)jk = 0

dhjkdt

+ ujkh(1,0)jk + v

(0,1)jk + (h+ hjk)(u

(1,0)jk + v

(0,1)jk ) = Qjk

(5.2)

where the subscript jk denotes the variable at the col-located point (xj , yk), and the superscript (i.e., (1, 0) or(0, 1)) represents its derivative along x or y direction,respectively. For each time integration, these variablesare transformed into the spectral space, where the deriva-tives are calculated, and then transformed back to physi-cal space. There are twelve one-dimensional Chebyshevtransforms at each time step, which are more efficientthan the τ method.

We consider the 2-D shallow water equation on the

β−plane (f = f0 + βy, where f0 and β are both con-stants) with gh = c2 is 2500 m2/s2. The domain hasthe size [xa, xb] × [ya, yb] = [−2000km, 2000km] ×[−2000 km, 2000 km]. The center y = 0 located at30◦N (f0 = 2Ω sin π

6 and β = 2ΩR cos π

6 , where Ω isthe earth rotating rate and R is the radius of earth). Anadditional forcing term, Q(x, y, t), is added to simulatethe 2-D vortex dynamics, such as the existence of an ide-alized hurricane:

Journal of Advances in Modeling Earth Systems – Discussion

Page 8: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

8 Kuo et al.

Figure 4: Convergence rate of the inviscid Burgers’ equation with a single domain and double domains, respectively.The scale-collapse occurs at x = 0 with u = 0.5. The convergence gets worse when T is increasing, as expected.Label S represents a single domain case, and label D represents double domains case. The black lines indicate differentreference slopes on the log plot (mn : log y = nN/24).

Q(x, y, t) = 4 q0 exp[−(x− xcx0

)2 − (y − ycy0

)2]t2t−30 e−2t/t0 (5.3)

where q0 = 6250 m2s−2, t0 = 21600 s, e-folding width x0 = y0 = 200 km and centered at(xc, yc) = (1000 km,−1000 km). Note that the maxi-mum Q(x, y, t) occurs when t = t0. The initial condi-tion is then given as

u(x, y, 0) = −Ucos

(πy − yayb − ya

), v(x, y, 0) = 0 (5.4)

whereU = 7.5ms−1. The Q-forcing causes the vortex tovary with time and advected away from the original lo-cation. We make h in geostrophic balance on β−plane,i.e.,

∂h

∂y(x, y, 0) = −(f0 + βy)u. (5.5)

If we set Q(x, y, t) = 0 and Eq. (5.5) is hold for h,the system is in geostrophic balance state on a β−planecontinuously. Regarding to the boundary conditions, theperiodic overset condition is used along x direction for

one grid width at x = xa and x = xb. The wall bound-ary condition is then applied along the other direction aty = ya and y = yb, i.e., v = 0 at y = ya and y = yb.

In our simulation, the vortex formation by the Q-forcing is located at (1000 km,−1000 km). All modelsettings are identical to that in FS87b. The vortex isdrifted from the easterly background flow to the westerlyand recurved due to the β effects and the surroundingvorticity gradient.

The Chebyshev collocation method is used to cal-culate all derivatives and the fourth-order Runge-Kuttamethod is used for the time integration. For the do-main decomposition experiments, we exchange the dataalong x direction initially, followed by y direction. Thesimulation results are on regular grid points via Cheby-shev interpolation. Fig. 5 shows the resulting geopoten-tial field h/c (shown as contours) on 72, 144, and 216hours, respectively. These results agree well with those

JAMES-D

Page 9: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

Parallel Domain-Decomposed Chebyshev Collocation Method 9

Figure 5: The geopotential field h/c for shallow water model with single domain, DD 2× 2, and DD 4× 4 on 72, 144,and 216 hours later, respectively.

in FS87b, and there is almost no difference between sin-gle domain, Domain-decomposed 2 × 2 domains (DD2×2), and Domain-decomposed4×4 domains (DD 4×4)simulations. The vortex propagates smoothly across thesub-domain boundaries without noticeable oscillations.Fig. 6 shows further analysis of the axisymmetric veloc-ity and vorticity on 72 and 216 hours later, respectively.All results are almost identical within the round-off error.

To analyze the convergent rate, we use the single do-

main calculation with the degree of freedom N = 192as the reference state. Fig. 7 compares L2 error be-tween the case using a single domain and the case usingdouble sub-domains, respectively (degree of freedom isN = 192 for both cases). Only the results of day 2 (48hours) and day 4 (96 hours) are shown for generality.These results confirm that the exponential convergenceholds regardless of the number of sub-domains (conver-gence order of 10−4 in these calculations). However, the

Journal of Advances in Modeling Earth Systems – Discussion

Page 10: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

10 Kuo et al.

Table 1: The comparison of computational time, speed-up, and efficiency for different numbers of MPI task in 2-Dnonlinear shallow water simulation (The degree of freedom is N = 192).

Number of Processors Time (seconds) Speed-up Efficiency1 552.59 1 14 89.80 6.15 1.548 18.76 29.44 3.6816 39.97 13.83 0.86

accuracy of the domain decomposed case is slightly lessthan the single domain results due to the difference in thenon-uniform grid spacing.

The parallel performance for the 2-D shallow wa-ter equations model are summarized in Table 1 with thespeed-up defined as SU = T1/Tp, and the efficiencySU/n; T1 and Tp are the cpu time for single domainand p sub-domains with domain-decomposition, respec-tively, and n is the number of MPI tasks. The bench-mark test is performed on a Linux cluster, in which eachcomputing node has two quad-cores processors. Thecommunication is based on the infiniband connection.Note that the efficiency is great than 1 and increases withthe increasing number of MPI tasks for the number ofMPI tasks ≤ 8. This leads to a major advantage of thedomain-decomposed Chebyshev method. Note that it re-quires O(N 2) operations to take derivatives on N de-gree of freedom for the typical Chebyshev collocationmethod (using a 1-D calculation as an example). Foreach subdomain, the degree of freedom can be reducedto N/m, where m is the number of sub-domains. Thisleads to O(N 2/m2) operations simultaneously for eachsub-domain and increases the efficiency for a small num-

ber of MPI tasks. The communication overhead beginsto dominate when the number of MPI tasks further in-creases (e.g., n = 16). This mainly results from moredata exchange across the computing nodes.

Table 2 shows the conservation property of mass, ki-netic energy, and enstrophy in the whole domain afterday 3, 6 and 9, respectively, based on the case of DD4× 4 over a single domain. The results suggest the con-servation of the first and second moments is reasonablyretained using the new domain decomposition approach.

Some weak oscillations are observed in the geopo-tential field (h/c) in all simulations as noted in FS87b.Since it is caused by the energy cascade to unresolvedscales and non-exact boundary conditions, simply in-creasing model resolution will not completely eliminatethe oscillations. Using a small amount of dissipation isone alternative to represent the energy cascade processand reduce the effects from the non-exact boundary data.In FS87b and in our single domain simulation, Cheby-shev models could run quite well without any dissipation.This property still holds for the domain decompositionsimulations.

Table 2: The conservation property of mass, kinetic energy, and enstrophy after day 3, 6 and 9.

Mass Kinetic Energy Enstrophy72 hours 99.99% 100.00% 99.94%144 hours 100.00% 99.92% 99.72%216 hours 99.99% 99.99% 97.32%

6. Concluding Remarks

We have introduced the new Chebyshev collocationmethod with domain decomposition for the atmosphericand oceanic modeling. One grid width overlapping isimposed between the sub-domains. Owing to the uniqueproperty of Chebyshev grids arrangement and the CFL

relation between Δx and Δt, the constraint of Δt withChebyshev method in a single domain can be furtherrelaxed in the domain decomposition approach. Us-ing the new domain-decomposed Chebyshev collocationmethod, we show the exponential convergence in 1-Dlinear advection benchmark problem. In a more strict

JAMES-D

Page 11: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

Parallel Domain-Decomposed Chebyshev Collocation Method 11

(a) (b)

(c) (d)

Figure 6: The comparison of axisymmetric velocity (a, b) and vorticity (c, d) of the vortex with single domain, DD2× 2, and DD 4× 4 on Day 3 and Day 9, respectively.

test of inviscid Burgers’ equation, the model can beintegrated successfully until the shock formation. Weshow that the domain decomposition approach in generalyields accuracy comparable to that of the single domaincalculation. In a more realistic atmospheric modelingwith a 2-D nonlinear shallow water model, the domaindecomposed Chebyshev method results also compare fa-vorably to the single domain spectral method results withan L2 error with respect to a very high resolution modelrun on the order of 10−4. We have also examined thetime-splitting method (Wicker and Skamarock 2002) inour 2D shallow water model. Since two-third of deriva-tive terms in (5.1) need to be calculated in the small grav-ity wave time step, the time-splitting method is not effi-cient in this case, and the additional complication may in-troduce more model instability. As the climate modelingcommunity relies heavily on the parallel supercomput-ers and platforms (i.e., a large amount of nodes and pro-cessors), similar approach such as the spectral elementmethod or discontinuous Galerkin type method may be-come more viable.

Acknowledgments. We are grateful to Wayne Schu-bert for many helpful discussions. This research wassupported by the National Science Council of Taiwanthrough Grants NSC-096-2917-I-002-129.

References

Bourke, W., B. McAvaney, K. Puri, and R. Thurling, 1977:Global modeling of atmospheric modeling of atmosphericflow by spectral methods. Methods in ComputationalPhysics, 17, 267–324.

Cockburn, B., 2003: Discontinuous Galerkin methods. Z.Angew. Math. Mech., 83, 731–754.

Courant, R. and D. Hilbert, 1953: Methods of MathematicalPhysics. Vol. 1. Wiley-Interscience.

Fulton, S. R. and W. H. Schubert, 1985: Vertical normal modetransforms: Theory and application. Mon. Wea. Rev., 113,647-658.

Fulton, S. R. and W. H. Schubert, 1987a: Chebyshev spectralmethods for limited-area models. part I: Model problemanalysis. Mon. Wea. Rev., 115, 1940–1953.

Journal of Advances in Modeling Earth Systems – Discussion

Page 12: A New Parallel Domain-Decomposed Chebyshev …kelvin.as.ntu.edu.tw/.../Publication/45.Chebyshev_2010.pdf1 A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric

12 Kuo et al.

SD

Figure 7: The L2 error between the case using a single domain and the case using double sub-domains, respectively(degree of freedom isN = 192 for both cases). Label S represents the case using single domain, and label D representsthe case using double domains.

Fulton, S. R. and W. H. Schubert, 1987b: Chebyshev spectralmethods for limited-area models. part II: Shallow watermodel. Mon. Wea. Rev., 115, 1954–1965.

Gottlieb, D. and S. A. Orszag, 1977: Numerical Analysis ofSpectral Methods. NSF-CBMS Monogr.

Hesthaven, J. S., S. Gottlieb, and D. Gottlieb, 2007: Spectralmethods for time-dependent problems. Cambridge Uni-versity Press, 273 pp.

Iskandarani, M., D. B. Haidvogel, and J. P. Boyd, 1995: Astaggered spectral element model with application to theoceanic shallow water equations. Int. J. Num. Meth. Flu-ids, 20, 393–414.

Kopriva, D. A., 1986: A spectral multidomain method for thesolution of hyperbolic systems. Appl. Numer. Math., 2,221–241.

Kopriva, D. A., 1989: Computation of hyperbolic equations oncomplicated domains with patched and overset chebyshevgrids. SIAM J. Sci. Stat. Comput., 10 (1), 120–132.

Kopriva, D. A., 1996: A conservative staggered-grid Cheby-shev multidomain method for compressible flows. II. asemi-structured method. J. Comput. Phys., 128, 475–488.

Kopriva, D. A. and J. H. Kolias, 1996: A conservativestaggered-grid Chebyshev multidomain method for com-pressible flows. J. Comput. Phys., 125, 244–261.

Kuo, H.-C., and W. H. Schubert, 1988: Stability of cloud-topped boundary layers. Quart. J. Roy. Meteor. Soc.,114, 887-916.

Kuo, H.-C., and R. T. Williams, 1992: Boundary effects in re-gional spectral models. Mon. Wea. Rev., 120, 2986-2992.

Kuo, H.-C., and R. T. Williams, 1998: Scale dependent accu-racy in regional spectral methods. Mon. Wea. Rev., 126,

2640-2647.

Lanczos, C., 1956: Applied Analysis. Prentice-Hall.

Macaraeg, M. G. and C. L. Streett, 1986: Improvements inspectral collocation discretization through a multiple do-main technique. Appl. Numer. Math., 2, 95–108.

Machenhauer, B., 1979: The spectral method. NumericalMethods used in Atmospheric Models, 2 (17), 121–275.

Orszag, S., 1970: Transform method for calculation of vector-coupled sums: Application to the spectral form of the vor-ticity equation. J. Atmos. Sci., 27, 890–895.

Ramachandran, D. N., S. J. Thomas, and R. D. Loft, 2005:A discontinuous Galerkin transport scheme on the cubedsphere. Mon. Wea. Rev., 133, 814–828.

Shonkwiler, R. W. and L. Lefton, 2006: An introduction to par-allel and vector scientific computing, Cambridge Univer-sity Press, 288 pp.

Smith, B. F., P. E. Bjorstad, and W. D. Gropp, 1996: Do-main decomposition: parallel multilevel methods for el-liptic partial differential equations. Cambridge UniversityPress, 224 pp.

Washington, W. M. and C. L. Parkinson, 2005: An introduc-tion to three-dimensional climate modeling. UniversityScience Books, 353 pp.

Wicker, L. J. and W. C. Skamarock, 2002: Time splitting meth-ods for elastic models using forward time schemes, Mon.Wea. Rev., 130, 2088–2097.

Yang, H. H. and B. Shizgal, 1994: Chebyshev pseudospec-tral multi-domain technique for viscous flow calculation.Comput. Methods Appl. Mech. Engrg., 118, 47–61.

JAMES-D