hp- version discontinuous galerkin methods for hyperbolic conservation...

........:.::::-::;-;:;#I' •••

•

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, VOl. 38. 3889-3908 (1995)

hp- VERSION DISCONTINUOUS GALERKIN METHODS FORHYPERBOLIC CONSERVATION LAWS:A PARALLEL ADAPTIVE STRATEGY

KIM S. BEY

Mail Stop 396. NASA Lllngle.l' Research Center, Hampton, VA 23681. U.S.A.

AIlANI PATRA AND J. TINSLEY ODEN

Tile Texas Insrilll/e for COlllpurarional and Applied M arhematics. University of Texas at Ausrin. Austin. TX 78712, U.S.A.

SUMMARY

This paper describes a parallel algorithm based on discontinuous hp-finite element approximations of linear,scalar, hyperbolic conservation laws. The paper focuseson the development of an elTcctiveparallel adaptivestrategy for such problems. Numerical experimeOlssuggest that these techniques are highly parallelizableand exponentially convergent, thereby yielding cflicien.:yIllany times superior to conventional schemes forhyperbolic problems.

KEY WORDS: hp-methods: parallel computing: hyperbolic equations: adaptive methods

I. INTRODUCTION

This paper presents new results on the development of parallel lip-adaptive discontinuousGaJerkin methods for hyperbolic conservation laws. Whilc these mcthods are applicable togeneral hyperbolic systems of conservation laws. the present study focuses on a model class ofscalar linear hyperbolic conscrvation laws for which it is possible to develop concrete mathemat-ical rcsults such as error estimates and convergence criteria.

The notion of discontinuous Galerkin methods for hyperbolic problems originated in theclassical work of Lesaint and Raviart t over two decades ago. Johnson and Pitkaranta2 general-ized the theory of discontinuous Galerkin methods by introducing mesh-dependent norms andwere able to derive a priori error estimates in such norms for linear hyperbolic problems.Recently, Bey and Oden3 extended thc theory to lip-finite element approximations and deriveda priori and a posteriori error estimates for a class of linear hyperbolic conservation laws. InReference 3, very high accuracies and convergence rates were observed in applying such schemesto representative test problems. The present paper provides further extension of the work inwhich parallel, adaptive hp-strategies are presented. It is shown that the use of discontinuousshape functions of arbitrary order leads to a methodology well suited to parallel architectures andcapable of delivering very high accuracies. High accuracy is accomplished by exploiting localapproximation features that enable the use of II-refinements in regions where the solutionis of low regularity and p-enrichment in regions where the solution is smooth. In the workpresented here, a priori and a posteriori error estimates are used to design an adaptive

CCC 0029-5989/95/223889-20© 1995 by John Wiley & Sons, Ltd.

Received 13 December 1994Revised 14 February 1995

3890 K. S. BEY, A. PATRA AND J. T. ODEN

..

stratcgy that automatically produces IIp-meshcs which deliver solutions with a specified level oferror.

The discontinuous Galerkin method is described in the following section along with a prioricrror estimates which establish rates of convergence of the method with respect to meshrefinements and enrichment of the spectral order of the finite element approximation. A posterioriestimates are reviewed in Section 3 and a 3-step lip-adaptive strategy is presented in Section 4. InSection 4, the possibility of specifying a target error in a suitable norm and automatically refiningthe mesh or locally enriching the spectral order of the approximation so as to obtain that error isprescnted. The parallel implementation of these methods is discussed in Section 5. The issue ofload balancing for lip-meshes is addressed in Section 5 and a special algorithm is presented: theRLBBO (Rccursive Load- Balanced Bisection of an Ordering). Several applications are discussedin Scction 6. It is shown for model problems that the parallel lip-adaptive discontinuous Galerkinschemes are highly efficient and can deliver results with efficiencies scveral orders of magnitudefastcr than most conventional approaches.

2. THE DISCONTINUOUS GALERKIN METHOD

Discontinuous Galerkin methods are valid for non-linear systems of hyperbolic conservationlaws, for example, the Euler equations of gas dynamics. The discussion in this paper focuses onlincar hyperbolic conservation laws which model simple non-dissipative transport processessuch as convective heat transfer or mass transport. Consider the following hyperbolic conserva-tion law

p. Vl/ + all = f in n c flil

II' n l/ = II' n{/ on r_(1)

(2)

whcre It denotes thc quantity that is conserved, for example mass, in a constant unit velocity fieldwith a velocity vector denoted by P = (fJl, fJ2)T. Thc flux across the boundary of the domain isIll/·n. wherc n denotes the unit normal vector pointing outward to the domain boundaryan. L = {x E an I p. n(x) < O} denotes thc inflow boundary. a = a(x) is a bounded measurablcfunction on n such that 0 < ao ~ (/(x), f E L 2 (n), and gEL 2 (f _).While this is the simplest ofhyperbolic conservation laws, solutions to (1) may contain discontinuities along characteristiclines x(s) defined by ax/as = p. Discontinuities arise from discontinuous inflow boundary data.Solutions to (1) belong to the space of functions V (n) = {v E L 2 (n) Ivp E L2 (n)} where vp = p. Vv,

The starting point for the discontinuous Galerkin methods is to develop an appropriate weakformulation of (1) defined on a partition of n into elements, denoted by 9h' Here the elementsK E 9h are genera) quadrilaterals of diameter 11K, The space of admissible solutions is extended tothe partition using the broken space V (9h) = nK,,~. V(K). The standard conventions in finiteclemcnt meshing are assumed to be in force: 9h is a family of partitions Yh and each element Kof 9h is the image of an invertible map F K of a master element K = [ -1, lY The parti-tions f?}h E Fh are regular and, in the present study, it is sufficient to take F K as affine maps.For each partition 9h, approximate solutions are sought in the subspace Vp(f?}h) ={v E U(n)1 VIK 0 Fi I E QPK(K) where QP'(K) denotes the space of functions formed by tensorproducts of Legendre polynomials of degree PK on the master element K. Note that thepolynomial degree, PK. may vary over different elements in the mesh and that functionsvC E Vp(f?}h) are discontinuous across element interfaces. The approximation properties of suchspaces are typified by local interpolation estimates of the following type (see References 4 and 5): ifII E H'(K), there exists a constant C, independent of 11K = diam(K) and PK (the minimal order of

IIp·DISCONTINUOUS GALERKIN

the polynomial shape functions for K), and a polynomial IV of degrce p", such that

I min(p. + 1.5)I"III1-wll,.,,:::;;C 5-' 111111•. ,,; r=O,1

hwhere 11'11,.1( denotes the usual Sobolev norm.

The following notation is used for functions v E V(.9h):

v± = lim v(x ± £~)<_0

3891

(3)

(4)

The discontinuous Galerkin method applied to (I) is written in the following abstract form:

Find IiE Vp(.9h) sllch that

.-.....,-.---............-...... ---- .

L 8,,(1/, v) = L L,,(v) for every V E Vp(.9h)

"efP'. "e~.where (see Reference 3 or 17)

AAdor(A AA II"A) ( II")(A+ '_A+)Bdll, v) = IIfJ+all,/I+-:-zVfJ + I +-:-z II -II.V oK.\'-.p" I. p"

( II,,) (A+ A++ I + -z II. V )aK -"Ir.p"

A lief ( A II" A) ( 11,,) ALdv)= J,1'+-,VfJ + 1+, (y,V),'KJlr.p"" p"

The error in the discontinuous Galerkin solution satisfies the following a priori estimate.3

(5)

(6)

(7)

T l1eorem 1. Lee II E H 5 (0) be a sollie ion to ( I). let Ii be a soilition to (5), and let (3) hold. T lien ehereexists a positive constant C, independent of Ill:' pI(, and II, sllch tllar tile approximation error,e = II - 14,satisfies the following estimate:

{ [ 11~µ[ (II" ) 2 J}tIIIe IIIhP.fJ :::;; C L -r;; max 1, -:1 II II II•. KKe~. PK p"

wllere µI( = min(PK + 1, s) -!. \'K = s - I, and

(8)

(9)

The a priori estimate (8) establishes convergence of the method and is useful for predicting howthe error in numerical solutions behaves with h-refinement or p-enrichment. Unfortunately, itslIsefulness in assessing the accuracy of a given numerical solution is limited since the estimateinvolves unknown constants and the exact solution.

3892 K. S. BEY. A. PATRA AND J. T. ODEN

3. A POSTERIORI ERROR ESTIMATION

A posteriori error estimatcs used here are based on extensions of the crror residual of methodAinsworth and Oden.6 Element error indicators arc computed by solving a suitably constructedlocal problem with the element residual as data. These local indicators arc used in the adaptivestrategy to assess the accuracy of the solution in an elemcnt. Morcover, they contribute toa global error estimate which is accurate enough to provide a reliable assessment of the quality ofthe approximate solution. Detailed dcrivation of the a posteriori estimatc and detailed numericalresults demonstrating its effectiveness for hyperbolic problcms can be found in Bey and Oden3

The local problem is constructed to result in an upper bound on the error. Let t/1K be thesolution to the following local problem:

where

U deC "K -Adt/1K.VK) = -,(P'Vt/1K,P,VVK)K +a(t/1K.vdKPK

and ii > 0 is a constant. The solution to the local problem. measured in the norm

(10)

(II)

( 12)

serves as an element error indicator in the adaptive stralegy. The global error estimate is given by

(13)

~._.....~....._., ......._.--......~..__ ..- ...... -.

The Solulion to the local problem (10) providcs an upper bound on thc global error in thefollowing sense.3

Lemma I. Let t/1 E V (9\) be the soll/tion to tile following problem:

L A~('ftK,v)= L B(e,v), V'veV(&'h)Kfi? KE?

Tllen there exists a positive COl/stant k sl/cll tllat

(14)

( 15)

4. THE hp-ADAPTIVE STRATEGY

The hp-adaptive strategy used here is a generalization of the 3-step slrategy developed by OdenPatra, and Feng.7.8 The strategy in References 7 and 8 was developed for a large class of ellipticproblems and, in several applications, has been shown to yield exponential rates of convergencewith respect to both CPU time and the number of unknowns.8 The lip-adaptive strategy is basedon the assumption that a reliable a posteriori error estimate is available for determining the errorin the approximate solution and that an a priori error estimate is available for determining howchanges in the mesh parameters affect the solution accuracy. The goal of the lip-strategy is todeliver a solution with a specified error in only three steps:

lip-DISCONTINUOUS GALERKIN 3893

( 16)

--..- -

(i) Construct an initial mesh &0 containing N(f!}o) elements. The elements in '~o may haveuniform PI( = Po and essentially uniform III( :::: 110' Solve the problem of interest on Vp.(&o)and estimate the error.

(ii) Construct a mesh &1 by subdividing each element in &0 into the number of elementsrequired to equally distribute the error and reduce it to a specified level. Solve the problemon Vp.(9t) and estimate the error.

(iii) Estimate the local regularity of the solution and divide the mcsh into two sets of elements:one set contains the elements in regions where the solution is of low regularity and theother set contains the remaining elements in the mesh where the solution is smooth.Reduce and equidistribute the error in each set separately. For clements in rcgions wherethe solution is smooth, enrich the approximation space by increasing Pl(. Subdivide theelements in regions where the solution is of low regularity. Solve the problem on the newspace Vp, (&2) and estimate the error.

If the estimated error in the solution is larger than the specified error after the third step.repeat steps (ii) and (iii) until the desired error is attained.

As will be seen. some special features of this approach are nceded for hyperbolic problems. Fordiscontinuous solutions, p-enrichments in step (iii) are confined to elements in regions where thesolution is smooth, since higher-order e1cments at discontinuities may result in oscillatorysolutions. Moreover. p-enrichment of elcments in regions where the solution is of low regularitydoes not improve the accuracy of the approximation. as indicated by the a priori estimate (8). Therate of convergence with respect to p in equation (8) depends on s, a measure of the solutionregularity. Our approach thus does not involve a flux-limiting post processing to controloscillations. It relies on the above adaptive strategy that automatically cmploys low-orderelement approximations in regions of low local regularity. The II-refinement in step (iii) isnecessary if the error in low-regularity regions (e.g. near discontinuities) after step (ii) exceeds thefinal target error specified for step (iii).

The data structure used for the resulting lip-meshes is based on the work of Demkowicz et al,9for continuous finite element approximations, Refinements are achieved by splitting an element inthe initial mesh into four elements. The data structure permits I-irregular meshes as illustrated inFigure 1. For the case where the elements are or different spectral order, the higher spectral orderis imposed upon the common edge. HO\vever, these two properties are necessary for maintainingcontinuity of the finite element approximation and are not needed when using the discontinuousGalerkin approximation.

4.1. Predicting a new mesh

In this section, additional assumptions and the use of the a priori estimate (8) to predict thestructure of a new mesh are described. First, recall that the norm used in the a priori estimate (8)includes jump terms on the element inflow boundaries:

{ L [~lIe/lll; + lIell~+ «e+ - e-»tK.\L + «e»:Knan]}tKefl'. PK

{ [h2P, (J) ] }tI( II( 2

~ C L 2", max l,~ lIull"KKefl'. PK PK

while the norm induced by the upper bound local problem contains no element boundary terms.The norm induced by the upper bound local problem with ii = 1 is used to measure the error, and

3894

Element spectralorders ( P K)

K. S. HEY. A. PATRA AND J. T. ODEN

,

I-irregular mesh

igher order element edgedominates lower order neighbor

Figure I. Non-unifom hp·mesh

it is assumed that all the e1emcnts in the meshes at each step in the adaptive procedure are suchthat max (L 11,;/ p~) = 1. Comparing (9) and the error norm defincd in (13), it is clear that (8)implies that

(17)

'" ..- ~.--.-'<~:.~''''''-:'''.''.lO"._.

In designing thc 3-step adaptive strategy, the a posteriori error estimate is assumed to bea rcasonable approximation to the actual error and is used to replace the true error on theleft-hand side 0[( 17) and the resulting incquality is treated as an equality. This may not be a goodassumption for coarse meshes and rough solutions, but it provides some means for predicting thestructure of thc ncw mesh.

Since the adaptive procedure is based on either refinement or enrichment of the initial mcsh,the term C 11/111;" rcmains constant throughout the adaptive process and can be calculated fromthe estimated error on the initial mesh. Let 00 denote the estimated error for the solution!iDE Vpo([jilo) in step (i). Then,

(18)

or at the element level

v.Po {} .

A" = Ii;; 0.1\111\

(19)

4.2. An h-refinement step

Let Y/T denote the target error to be achieved by the 3-step strategy. We specify 'TT asa percentage of the solution measured in the same norm as the error. For the II-step, we seeka partition which delivers an intermediate target error 'Th = (1.Y/T where (1. is some constant chosenso that 'tT < 'Th' The partition f!IJ, is constructed by subdividing each element K Ef!lJo into11K elements such that the error 81 = 'Ih (II 140 IIA" + 00) is distributed equally among the elementsin f!IJ t. Following (18), we have

(20)

K = l. ... ,N(90)

IIp-DISCONTINUOUS GALER K IN

where

For a mesh which achieves an equally distributed error.

0I.L = 0IN( "IJ . L = I N(CiIl;7".l ' .. ., .7" d

where

IV (,"oJ

N(9.) = L "KK= I

Combining (20)-(22) yields

I 2µ. 2µ.+I_ lK AK .

11K - 2,'. or N(9d·Po I

Equations (23) and (24) are solved iteratively to determine "K for each elcment K E .dJ'o.

Remarks.

3895

(21)

(22)

(23)

(24)

iii:i:::::•..~_-:-:-~

(i) A value of 11K < I signals that de-refinement is needed to equi-distribute the error.Although not implemented in this work. de-rcfinement significantly decreases thc com-putational cost of the ovcrall proccss.

(ii) The largest local errors will occur in elcmcnts which contain a discontinuity. Thcseelements will receive the highest level of refinement.

(iii) For case in implemcntation. the refinement of an element K in thc initial mesh fJ>o islimited to two levels.

(iv) The parameters µK = min (Po + !'r - !) and VK = r - I are global constants dictated bythe regularity of the exact solution It E V(Q). Formally, (8) is not valid when the solutioncontains a discontinuity on the interior of an element. However, numerical expcrimentsconfirm that the rate of convergence of the error is µK = t when the solution containsa discontinuity which is not aligncd with the element interfaces, in agreement withlong-standing error estimates for hyperbolic conservation laws.lo

4.3. A p-enrichme1lt step

Let Oh denote the estimated error in the solution 171 E Vpo(91) obtained in step (ii). Treating thea priori estimate as an equality,

(25)

(26)

which gives the constants AK on 91:

'·E

AK = Ph~EOu : K = I, ... ,N (9.lK

The error is reduced by constructing a distribution of polynomial orders, PI, where thepolynomial order of each element in &'1 is selected to equally distribute the target error. Setting


whcrc for an equally distributed error

II a,[·UT.K=-

Combining (26)-(28) yields the new value PK for each element in ·qPl:

"" P~"8uJN«(jJI)PK =

OT

(27)

(28)

(29)

(30)

For smooth solutions, the parameter 11K in (27) depends on PI( which is unknown at this point.I-Jere we use the value which is actually associated with Po. More discussion on the parameters11K and \'1( is given in Section 4.4.

For discontinuous solutions. p-enrichments are confined to regions where the solution issmooth since increasing p at discontinuities may result in oscillatory solutions and does notimprove the accuracy of the approximation. The local regularity of the solution on each elementK E .qPo is estimated by computing the rate of convergence, ilK' of the local error in steps (i) and (ii):

log e _- log ,j'"n. 02" . = 0.1( L.I.= I h.I. K = I N(tJ,IJ )II I( I" . . . , .7 ()

IK10ghK-log-~

From the a priori estimate (8), the expectcd rate of convergence is Po + t if the solution is smoothin K e f7/Jo. To prohibit p-enrichments in discontinuous regions, we simply set PI. = Po,L = I, ... ,11K if ilK < Po + t for the parent elcment. The contribution of these clements to theglobal error therefore remains fixed in the p-step of the adaptive strategy. If this contributionexceeds the target 0T' then an additional /r-step is required before thc p-stcp to achieve thetarget error.

4.4. An hp-step as an alternative to the p-step

For smooth solutions or for solutions which contain mild irregularitics, the p-step should beadequate to reduce the global error to the specified level. For problems with strong discontinui-ties, however, the local crror at discontinuities may be large and significantly dominate the globalerror obtained after the h-step, particularly in the current implementation where a maximum of2 levels of /r-refinement are permitted.

An alternative to the p-step of the adaptive strategy is an lip-step where h-refinement isperformed at the discontinuity and p-enrichment is performed in smooth regions. In this case. thetarget error is specified as a reduction factor for the error in disconti!luous regions and for thesmooth region individually. Let IJD denote the normalized error in discontinuous regions andIJs denote the normalized error in smooth regions. Then the target error for the /rp-step is specifiedby the reduction factors aD and as as

(31)

IIp-DISCONTINUOUS GAlERKIN 3897

The criterion PK < Po + t, where JiK is given by (30), is used to distinguish discontinuous regionsfrom smooth regions. Equations (23) and (24), with N (&10) replaced by the number of elements inthe discontinuous region, are used to determine the number of elements required to reduce theerror in discontinuous regions to av,/o. Equation (29), with N (&11) replaced by the number ofelements in the smooth region, is used to determine the values of PI( required to reduce the error inthe smooth region to asy/s. This approach can be referred to as an hlJp-adaptive strategy.

4.5. Selection of the parameters

Formally, the parameters JlK and I'K depend on the global regularity of the solution. Howeverthe rate of convergence of the local error depends on the local regularity of the solution. Forpiecewise continuous solutions, the rate of convergence of the local error varies greatly betweenthe smooth regions of the solution and some small neighbourhood around discontinuities. Usingglobal values of these parameters based on an irregular solution can result in over-refinement ofsmooth regions, while using global values based on smooth solutions can result in under-refinement of discontinuities. Here we use local valucs of µK and VK which arc initially computedfor a uniform h-refinemcnt and a uniform p-enrichment of a coarse mesh. Local values of µK arethcn re-computed after the II-step using (30), and thcn used in (27) for the 1'- or IIp·step. Whilethcre is little theoretical justification for using local values, numerical results indicate that theapproach works quite well for solutions with discontinuities.

Selection of the reduction factor 11. used to determine the intermediate target crror for the II-stcpof the adaptivc strategy is important in obtaining an optimal mesh. Specifying a value of a whichgives an intermediatc error which is closer to the target error than it is to the initial error willresult in meshes with mostly II-refinement. Specifying a valuc of 11. which givcs an intermediatetarget error which is closer to the initial crror than it is to the target crror leads to meshes withlittle h-refinement and elements with large values of PK' Numerical expcriments for ellipticproblcms suggest that the optimal choice is 11. = I + 1>(1/0 - I/·rl/Y/T(µ + v). where JI and v are therates of convergence of the global error. 8

5. PARALLEL IMPLEMENTATION

With the aim of solving more general problems in mind, the linear model problem is solved ina way that is easily extendible to the non-linear case, takes full advantage of the discontinuousapproximation, and is amenable to parallel computations: this is achieved by solving the time-dependent conservation law for the steady-state solution. Let u"+ I = '/(', t,,+ I) wheret,,+ I = (n + l)Llt and Llt is the time-step increment. Assuming the solution at time t" is known, thenthe solution to (1) can be obtained using, for example, the forward Euler time marching scheme:

(u"+ I, v)" = (u", v)1( + M[LI((v) - Bdu", v)]. '<::IvE Vp(9"h), '<::IK E &lh (32)

The time-marching versions of the discontinuous Galerkin methods, such as (32), fall naturallyinto the class of single program multiple data (SPMD) parallel applications. Given the solution attime level t", the solution is advanced to time level t,,+ I by solving a (p" + 1) x (pI( + I) system oflinear equations for the (pI( + 1)2 degrees of freedom for every element K in the partition. Theonly coupling between elements arises in the evaluation of the boundary integrals in (6) where thesolution along common edges of neighbouring elements is needed. The evolution of the solutioncan be performed on all the clements simultaneously, once this information is available. Hence,the only information that each partition needs is the solution for elements on the interface ofneighbouring partitions.

3898 K. S. BEY, A. PATRA ANO J. T. OOEN

The primary issue in a parallel implemcntation of discontinuous Galerkin methods is tobalance the workload among the available processors whilc minimizing the communicationbetwcen processors, thereby optimizing the utilization of the multi-processor cnvironment. Fora machine with P processors, this is accomplished by dividing the partition into P subdomainsand assigning the elements contained in a particular subdomain to a particular processor. Tominimize communications. the interface of the subdomain boundaries should have as smalla measure as possible. Moreover, the local nature of the discontinuous Galerkin formulationrequires communication of the solution only along subdomain boundaries at each time step.Since computations arc performed at an element level. communications arc overlapped with thecomputation to further minimize the penalty of communicating between processors. This overlapis achieved by processing the interface elements first and using asynchronous communicationprimitives.

Thc parallel implementation dcscribed here is targeted for the Intel iPSe 860* computer whichis a distributed memory machine with 32 processors arranged in a hypercube architecture.llowever. many of the ideas employed here are general. and applicable to other multi-processormachines.

5. J. Domain decomposition for hp meshes

Most domain decomposition methods have been developed and analyzed for II-type mesheswhcrc the number of dcgrees of freedom, and hence the computational ell'ort, is the same for everyclemcnt in the mesh. For these types of meshes, equally distributing the elements among theavailable processors will usually result in a balanced load.

The most succcssful domain decomposition methods in this situation are based on recursivebisectioning of eithcr the coordinates or an ordering of the elemcnts, In thc rccursivc bisectioningof thc physical domain, trial scparators dcllnc possiblc subdomain conllgurations. The selectionof a particular separator as a subdomain interface is based on the resulting load balance as wcll asinterface sizc. Vavasis11 has obtained theoretical bounds on the achievable load balance andintcrfacc size for such partitionings based on nodal coordinates. Onc disadvantage of thisapproach is that it can be computationally expensive for multi pic space dimensions.

Recursive biscctioning methods based on an ordering of the elements havc a computationaladvantage since the bisectioning is performed on a one-dimensional list of elements. regardless ofthe spatial dimension of the domain. One difficulty with this approach is constructing an orderingwhich preserves the locality of the elements in the mesh. A locality-preserving ordering isnecessary to avoid multiply-connected or disconnected subdomains and to minimize interfacesize. Pot hen el a/.12 construct such an ordering by using the second eigenvector of the Laplacianmatrix associated with the graph of the mesh. However, the computational cost of deriving theordering is quite high. We use here an ordering scheme introduced by Patra and Oden.13

For lip meshes, in which the number of degrees of freedom (and hence the computational effort)vary from element to element. the domain decomposition must include some measure of thecomputational work for each element. Here we investigate a load-based recursive bisection of anordcringt3 which seeks to balance the load using some local measure of the computational work.In this study, empirical clement computational time data are uscd as a measure of computationalload.

• The use of lrademarks or names of manufacturers in lhis report is for accurale reponing' and does not constitute anotlicial endorscll1enl. either expressed or implied. of such products or manufacturers by the National Aeronautics andSpace Administration

hp-DISCONTINUOUS GALER KIN 3899

5.2. Recursive load-based bisection of an ordering

The recursive load-board bisection of an ordering (RLBBO) method is based on recursivebisectioning of an ordering of the elements. The ordering is based on classical results of Peallo 14

and Hilbert 15 concerning a class of continuous mappings of the unit interval onto a unithypercube. The significance of the Peano-Hilbert results is that one can always construct a spacefilling curve that connects a set of points in II-dimensional space and that uniquely maps themonto the unit interval. Applying this mapping to the set of points given by the element centroids,results in an ordcring of the elements defined by its location on the unit interval. Completedetails of the Peano-Hilbert ordering and proof that it is locality-preserving can be found inReference 13.

An algorithm for this method follows:

II.: ~1\-+ Un. where U t is the unit interval and U. is the unit hypercubcDc: Jist of elements in subdomain III,: numbcr of trial separator surfacesiii: quality index for a trial separator

I. Create an ordering of the elements by mapping the centroids of the elements onto a Peano-Hilbcrt curve. (See Rereren~e 13)

2. Let /1' be the distance of the centroid of elemcnt K along the space filling curve.3. Compute maximum and minimum of fK, tm .. and trnin'4. Compute II, trial separator levels as

/j=t. +110 .. -/,mill minII,

5. For each /i compute a quality of interfacc index gj

(dofld' )lJj = abs ~ - 1 . dof,o, + dofintcr

Ol,,~hl

Replace dof by error or other load estimate as appropriate.6. Choose as interface tint the t, that corresponds to lowest qj7. For K = 1 to the total number of elements

If tK ~ tin' thenDl +-- DI u{K}elseD2 +-- D2u{K}end ifend forAt this stage, the original domain has been split into two.

8. Apply 1-7 recursively on each of the generated subdomains.

5.3. DYllamic load smootllillY

For very complex adapted lip meshes, the static load balancing algorithms outlined in theprevious section may not balance the load well, especially if the subdomain sizes become verysmall, due to the unavailability of any good a priori load measure. To improve the load balance,we use a simple partition smoothing technique. After performing a static decomposition, we carry


out a few steps (normally 5) of the time integration and monitor the actual time spent in elementsolution computations on each processor. Since asynchronous communications are used, thismonitored time overlooks communication waits; however, it provides a good measure of theactual load imbalance. To improve the load balance while still maintaining the minimuminterface propertics of the subdomains, we carry out a few passes of the following procedure:

For subdomains i= I to No in this partitioning doFor all interfaces [i.j] in this subdomain do

If computatioll time ill process or i > 1·2 x computation time ill processor jthen assign elements belonging to partition i and interface [i,j] to partition j

end for:end for:

The obvious drawback in this proccdure is that unless the interface elements comprise a smallenough part of the total partition. migrating them may actually make the load imbalance worse.Thus the requiremcnt for this procedure to be effective is that the subdomain size (i.e. the problemsize) be large enough and scale with the number of processors to maintain partitioning elTIciency.Such a requirement in a parallel computing application is common.

Preliminary testing of this procedure appears to indicate reasonable improvement over thebasic static partitioning. More sophisticated strategies (see for example the tiling approach usedby Devinet6) could be used to migrate only a portion of the elements on the interface to migrate,however. this would require an element level measure and significantly complicate the issue ofestimating load.

5.4. Communicatio/Cs

Communication between processors on distributcd memory machincs can significantly affcctthe overall performance of the parallel implementation, particularly if a processor must wait torcceive information from another processor before proceeding with a calculation. To compute thesolution at time level tn+ Ion a particular subdomain, the solution at tn is needed from interfaccelements on neighbouring subdomains. Using synchronous communication strategies, that is,requesting information from neighbouring processors at the time that it is needed, leads tounnecessary waiting by all processors. Moreover, communication conflicts are likely to occursince two-way communication is requircd across interior subdoluain boundaries.

Asynchronous communications are used to minimize this wait time. In this type of communica-tion scheme, a request for information is posted before it is actually required and information ismade available to all processors that will require it as soon as it is computed. In more concreteterms this means that parts of the solution vector corresponding to interface elements are updatedon all processors as soon as they are available and not when they are actualIy required. Theuncoupled eIement-by-element nature of the discontinuous Galerkin computations alIows for thistype of asynchronous communications which completely overlap communication with computa-tions, i.e. a processor computes the solution on another element while the communicationsrequired for previous element solution are being carried out. This overlap is maximized by firstcomputing the solution on the partition interface elements, and then processing those elements onthe interior while the interface element information is simultaneously being communicated to thenecessary processors. Hence the communication time which is usually a major bottleneck inscalable paralIel computations is almost eliminated.

hp·DISCONTI NlJOUS GALER KIN

6. NUMERICAL EXAMPLES

3901

The discontinuous Galerkin method is used to solve several examples to assess the hp-adaptivestrategy described in the preceding sections and to illustrate the parallel performance. Numericalresults verifying the a priori estimate and dcmonstrating the etrectiveness of a posteriori estimatecan be found in References 3 and 17.

6.1. Example 1

We solve the linear model problem (I) with the following data:

(i) n = ( -\, I) x ( -\, I),(ii) P = (1-0. O·O)T.

(iii) a(x) = 1,0,

{

3c-S(l+y» if y<O.(iv) g = _3c-S(l +).2) otherwise.

The source term f in (I) is chosen so that the exact solution is the discontinuous function.

{3e-SIX'''''') ify<O

u(x. V) = 3 -Six'+)") h .. - e at erWlse (33)

:.::'...... "-.:i:0i2

The discontinuity is aligned with element interfaces at y = 0 to illustratc the advantage of usinga discontinuous method to capture discontinuities. particularly if the adaptive scheme includessome shock fitting which aligns the grid with thc discontinuity.

The problem is solved using a variety of uniform mcshes with II-refinements, p-enrichments,and the lip-adaptive strategy with no special treatment at the shock. The error histories for twohp-adaptive solutions with different initial meshes are listed in Tables I and I\. For both cases, the

Table I. Example I-Error history for an adaptive Irp solution startingfrom a uniform 4 x 4 mesh, p = 2

Target AchievedAdaptive normalized normalized ElTcctivitystep crror error index

Initial 4 x 4 mesh, p = 2 - 0'091 1055h-refinement 0,05 0·031 1·073p-en richmcn t 0,005 0·0029 1-424

Table II. Example I-Error history for an adaptive hp solution startingwith a uniform 8 x 8 mesh, p = 1

Target AchievedAdaptive normalized normalized Effectivitystep error error index

Initial 8 x 8 mesh, p = I - 0'154 0,998h-refinement 0,075 0,033 0,996p-enrichmcn t 0,005 0,0055 0·901

3902 K. S. HEY. A. PATRA AND J. T. ODEN

10·

p-enrichmant ~4x4

exact

estimate

10'Number of unknowns

10'

Figure 2. Example I-rates of convergence of the error with respect to the total number of unknowns

.......... -- ' ..-":. ...:.~-:-.,":"'~

PK

@:.1 3,", 211

Figure 3. Example I-spectral order of adaptive hp-enriched mesh

target error was achieved at each step in the adaptive process. Recall that the target error isa global quantity obtained as the square root of the sum of the squares of the element errorindicators (13). Therefore, the error in a single element cannot exceed the target error. Also listedin the tables is the effectivity index for the error estimate. An effectivity index, the ratio of theestimated error to the exact error, close to unity indicates the error estimate is reliable andprovides a good approximation to the actual error. The effectivity indices in Table I are greaterthan unity, indicating that the actual error is less than the estimated error while the effectivityindices in Table II show thai the actual error is slightly less than th'e estimated error. For bothcases, however, the estimated error is suflkiently close to the actual error to result in an effectiveadaptive strategy.

The rate of convergence of the estimaled and exact error is compared in Figure 2. The exacterror (indicated by a solid line in the figure) and the estimated error (indicated by a dashed line) are

hp-()ISCONTI NlJOUS GALERK IN 3903

element errorindicator

I3.5E-33.1&32.6E-32.2E-31.8E-3I.3E-38.8E-44.4E-4

Figure 4. Example l---estirnated error on adaptive hp-mesh

in close agreement. indicating the reliability of the estimate. Note that with thc discontinuityaligned with element interfaces. the error behaves as if the solution is smooth: that is, algebraicrates of convergence are achicved with respect to mesh refinements. and cxponential rates ofconvergence are achicved with respect to p-enrichments. Whcn the discontinuity is aligned withthe element interfaces, the most significant crror rcduction with fewest dcgrccs of frcedom resultsby specifying a target error for the II-step which is closer to lhe initial error than to thc finaltargelerror. This is verified by the twO curves corrcsponding 10 the lip-adaptive solutions in Figure 2.The curves corresponding to the two lip-adaptive solutions in Figure 2 exhibit supcr-linear ratesof convergencc. Thc final lip-adapted mesh is shown in Figure 3 and the rcsultingerror dislribution is shown in Figure 4. Notc that the crror is nearly equi-distributed among theclements in the mesh.

6.2. Example 2

The following data is used in (1):

(i) Q = ( - L I)x ( -1. 1)

(ii) P = ( f, fY(iii) a(x) = 1·0. {5e - U·+y2] + 3e-(1 +(y-t)2J, x = -1

(IV) g(x,y)= "':"'1_8e-Sllx-t)2+tJ. y=-I

The source term fin (I) is chosen so that the exact solution is a function which is discontinuousalong the domain diagonal given by

and shown in Figure 5.

if y> xotherwise

(34)

3904 K. S. IlEY. A. PATRA AND J. T. OOEN

u

8 6.31

7 4.67

6 3.03

5 1.39

4 ·0.263 ·1.90

2 ·3.54I -5.18

Figure 5. Exact solution 10 Example 2

30 ~ no. of interior clementsc-- no. of interface elcmcnls

25

g. 20"0~0. ,.<Il 15

10"

10 15 20 25number of processors

30

Figure 6. Parallel speedup ror a unirorm 32 x 32 mesh or p == I elements

The local nature of the discontinuous Galerkin method results in nearly optimal parallelefficiency for uniform meshes as illustrated in Figure 6. The parallel speedup, defined here as theratio of the CPU time on one processor to the CPU time on 11 processors, using a uniform 32 x 32mesh of p = 1 elements is shown in Figure 6. The departure from ideal speedup as the number ofprocessors increases is due to inter-processor communication. This is verified by the domaindecomposition data listed in Table Ill. The RLBBO partitioning scheme balances the load in thesense that the elements are equally distributed among the available processors. However. as thenumber of processors increase, the communication patterns become more varied due to changesin the number of interface elements on each processor. The different situations that occur arelisted in the third column of Table III. The fourth column of Table III shows that the ratio of thenumber of interior elements to the number of interface elements decreases as the number ofprocessors increase. As this ratio decreases. there is an increasing likelihood that interior elementsmust wait to receive information from neighbouring subdomains.

An alternate measure of parallel efficiency, scalability, is shown in Table IV. Here the workloadper processor remains fixed, that is, the problem size is increased as the number of processors

t'

l'p·DISCONTINUOUS GALERKIN

Table III. RLBBO domain decomposition data for a uniform 32 x 32 mesh ofp = I elements

No. of interior elemcntsNo. of No. of elements No. of intcrfaceprocessors per processors elemcnts No. of interface elements

1 1024 02 512 32 15·04 256 31 7·268 128 38/23 2'37/4'57

16 64 30/22/15 1'13/1'91/3'2732 32 20/18/14/11 0'6/078/1'29/1'91

Tablc IV. Parallel cfficiency of thc discontinuous Galerkin mcthod foruniform meshes

Number of Mesh No. of clcmcnls Solution Totalprocessors p=1 per processor CPU time CPU time

I 4x4 16 175 20·34 8x8 16 20'1 25'216 16x 16 16 230 30·5

2 8x8 32 32'2 37,68 16x 16 32 35,9 43,'32 32 x 32 32 39-3 56·2

3905

. .. ..... .., .~:.~~.::1'~~

increascs. The CPU times listed in Table IV arc nearly constant for a constant workload,indicating that the parallel implcmentation is scalable.

This problcm was solved using the three-step adaptive strategy with the lip-step described inSection 4.4 as the final step in the adaptive process. The parameters CXD = 0·8 and CXs = 0·6 wereused in (31) to specify the target error. The error history for the entire adaptive process is listed inTable V. Note that the target error is nearly achieved at each step in the process. The effectivityindex for the error estimator, also listed in Table V, indicates that the global error is under-estimated for this example. This underestimation is primarily due to the underestimation of thelocal error indicator near the discontinuity.

The adaptive hp-mesh. shown in Figure 7. consists of 526 elements ranging from p = 1 to P = 5for a total of 2879 degrees of freedom. The parallel speedup for this mesh is shown in Figure 8.The RLBBO domain decomposition strategy with the dynamic smoothing described in Section5.2 was used to balance the load. A measure of the effectiveness of the load balancing, the ratioof the average CPU time for all the processors to the maximum processor CPU time is alsoshown in Figure 8. Since asynchronous communications are used in the parallel implementation,these CPU times also include any communication wait time. A ratio of unity indicates a perfectload balance with no communication wait time. Note that the departure from ideal speedupcoincides with decrease in the effectiveness of the load balancing scheme. Clearly, more sophisti-cated migration strategies are needed to improve the load balance when more processors areused.

3906 K. S. BEY, A. PATRA AND J. T. ODEN

Table V. Example 2-Error history for an adaptive hp solution starting from a uniform 8 x 8 mesh. p = 1

Target AchievedAdaptive normalized normalized Effectivitystep error error index

Initial 8 x 8 element p = ! mesh - 0'126 0,73lI·step 0,05 tl059 0,64IIp·step 0,04 0049 0·61

=::;;;:;;::'::7.:-te"

• ~

I i

Figure 7. Adaplive lip-mesh for Example 2

5432I

16

I I

speedup

6

average CPU timeI' = maximum CPU time

6 11number of processors

Figurc 8. Parallcl speedup for the adaplive hp-mesh

16

h".DISCONTINUOUS GAlERKIN

7. CONCLUDING REMARKS

3907

.............. · ..c ..

The development of a parallel lip-adaptive discontinuous Galerkin method for hyperbolicconservation laws is presented in this work. The emphasis of the work is on a model class of linearhyperbolic conservation laws for which it is possible to develop a priori error estimates andreliable a posteriori estimates which provide bounds on the actual error. These estimates areobtained using a mesh-dependent norm which reflects the dependence of the error on the localclement size and the local order of the approximation.3.17

The hp-adaplive strategy is designed to dcliver solutions to a specified error level in an efficientway. This is accomplished using a three-step procedure in which the a posterior; estimate is usedto determine the error in the solution at a particular adaptive step and the a priori estimate is usedto predict the mesh required to deliver a solution with thc specified level of error. The lip-adaptivestrategy makes further use of the a priori estimate to provide detection of discontinuities in thesolution thereby identifying regions where II-refinement and p-enrichment are appropriate.

A parallel implementation of the discontinuous Galerkin method is presented which takes fulladvantage of the local character of the method and includes a special load-balancing strategy forhp-meshes, Recursive Load-Based Bisectioning of an Ordering. This parallel implementationresults in nearly optimal speedups when the ratio of interior clements to subdomain interfaceelements is sufficiently large. The load-balancing strategy, however. becomes less effective as thenumber of processors is increased.

Numerical experiments demonstrate the effectiveness of the a posteriori estimates in providingreliable estimates of thc actual error in the numcrical solution. The numerical examples alsoiIlustrate the ability of the lip-adaptive strategy to provide super-linear convergence rates withrcspect to the number of unknowns in the problem and to deliver solutions with the specifiederror level. even for discontinuous solutions.

ACt<NOWI.EDGEMENTS

The authors Abani Palra and 1. Tinsley Odcn acknowledge support of this work by ARPA underContracl number DABT63-92-C-0042 and ARO under Contract number DAAH04-93-G-OI09.

REFERENCES

\. P. lesaint and P. A. Raviart. 'On a finite element method for solving the ncutron transport cquation·. in C. dc Boor.(cd.). Mathematical Aspects of Finite ElemellCs in Partial Differenrial Equatiolls, Acadcmic Press. New York, 1974,pp. 89-123.

2. C. Johnson and J. Pitkaranta. 'An analysis of the discontinuous Galcrkin method for a scalar hyperbolic equation'.Math. Comput .• 46. 1-26 (1986).

3. K. S. Bey and J. T. adcn. 'hp-version discontinuous Galerkin methods for hyperbolic conservation laws'. Compue.Merhods Appl, Mech. Eng. (a special issue on hp finite elcment methods) to appear.

4. P. G. Cia riel, The Fillite Elelllenr Method for Elliptic Problems, North-Holland, Amsterdam. 1978.5. \. Babuska and M. Suri, 'The hp-vcrsion of the finite elcment method with quasiunifonn meshcs', MilCh. Modeling

Nll/ner. Alia/.. 21,199-238 (1987).6. M. Ainsworth and J. T. Odcn. 'A unified approach to a posteriori crror estimation using element residual methods',

Numerische Mathematik, 65, 23-50 (1993).7. J. T. aden. A. Patra and Y. S. Fcng, 'An hp adaptivc strategy'. in A. K. Noor (cd.), Adaptive, Mllltilevel. and

Hierarchical Compurariollal Strategies. Vol. 157. AMD, ASME. N.Y" 1992, pp. 23-46. .8. J. T. Odcn and A. Patra. 'A parallel adaptive strategy for lip finite element computations'. Compllt. Met/rods Appl.

Mech, Eng .. 121,449-470 (1995).9. L. Demkowicz, J. T. Oden, W. Rachowicz and O. Hardy, Toward a universal/r-p adaptive finite elemcnt strategy.

Part I. Constrained approximation and data structurc'. Compile. Methods Appl. Ml'ch. Ellg .• 77, 79-lt2 (1989).10. R. Sanders, 'On thc convergence of monotone finite difference schcmes with variable spatial differencing', Math.

Compllt., 40, 91-106 (1983).


11. S. A. Vavasis. 'Automatic domain partitioning inthrcc dimensions'. SI AM J. Scielllific Srmisr. Compl/I., 12.950-970(1991).

12. A. Pothen, H. Simon. and K. P. liou, 'Partitioning sparse matrices with eigenvectors of graphs', SIAM J. MatrixAnal. Appl .. 11.430-452 (1990).

13. A. Palra and 1. T. Oden. 'Problem decomposition for adaptive hp finitc elcment methods', Comput. Stmcl. Eng.accepted.

14. G. Peano. 'Sur Une Courbe. que Remplil Toute une Aire Plane', fllarh. AnnCllen. 36, 157-160 (1980).15. D. Hilbert, 'Uber dic Stetige Abbildung einer Linie auf ein F1achenstuck', Marh. Annalen, 38 (1881).16. K. D. Devine. 'An adaptive hp-finite element method with dynamic load balancing for the solution of hyperbolic

conservation laws on massively parallel computers', Ph.f). f)is.~t'rtatioll. Rcnsselacr Polytechnic Institute, Troy.New York, August 1994.

17. K. S. Bcy. 'An hp adaptive discontinuous Galerkin method for hyperbolic conservation laws'. PI/.D. Dissl.'rratitm,University of Texas-Austin, Austin, Texas, May 1994.

hp- version discontinuous galerkin methods for hyperbolic conservation...

Documents