
Page 1: Journal 59 Sema

7/30/2019 Journal 59 Sema

http://slidepdf.com/reader/full/journal-59-sema 1/79

SeMA JOURNAL

NUMBER 59

July 2012

contents

Weak-renormalized solutions for a system that models non-isothermal solidification, by E. Fernández-Cara, C. Vaz, p. 5

An asymptotic study to explain the role of active transport in models with countercurrent exchangers, by M. Tournus, p. 19

A Hölder Continuous Nowhere Improvable Function with Derivative Singular Distribution, by N. I. Katzourakis, p. 37

A modular lattice Boltzmann solver for GPU computing processors, by M. Astorino, J. Becerra Sagredo, A. Quarteroni, p. 53


Executive Committee of SeMA

President: Pablo Pedregal Tercero

Vice-president: Rosa María Donat Beneito

Secretary: Julio Moro Carreño

EC Members: Lluís Alsedà i Soler, Sergio Amat Plata, Mari Paz Calvo Cabrero, Francisco Ortegón Gallego, Carlos Parés Madroñal, Luis Rández García, Luis Vega González.

SeMA Journal

Editor-in-Chief: Enrique Fernández-Cara

Editorial Board: Grégoire Allaire, Carme Calderer, Carlos Conca, Amadeu Delshams, Martin J. Gander, Francisco Guillén-González, Vivette Girault, Arieh Iserles, José M. Mazón, Pablo Pedregal, Ireneo Peral, Benoît Perthame, Alfio Quarteroni, Chi-Wang Shu, Daniel B. Szyld, Luis Vega, Enrique Zuazua.

Editorial Staff: Sergio Amat, Carlos Angosto, Sonia Busquier, María J. Moncayo, Alberto Murillo.

Web page of SeMA

http://www.sema.org.es/

e-mail

[email protected]

Editorial office: Dpto. de Matemática Aplicada y Estadística, Univ. Politécnica de Cartagena, Paseo de Alfonso XIII, 52, 30203 Cartagena (Murcia), Spain. [email protected]

ISSN 1575-9822.

AS-1442-2002.


SeMA Journal

no. 59 (2012), 5–18

WEAK-RENORMALIZED SOLUTIONS FOR A SYSTEM THAT MODELS

NON-ISOTHERMAL SOLIDIFICATION

ENRIQUE FERNÁNDEZ-CARA∗, CRISTINA VAZ†

∗Departamento EDAN, University of Sevilla, Sevilla, Spain. †Departamento de Matemática, Federal University of Pará, PA, Brazil.

[email protected],[email protected]

Abstract

The aim of this paper is to prove the existence of weak-renormalized solutions to a system of Navier-Stokes-Boussinesq type. This system may be regarded as a modified version of the non-isothermal solidification problem with melt convection. This goal is achieved in the two-dimensional case; in three space dimensions, however, some deep and nontrivial difficulties arise.

Key words: Parabolic PDEs, renormalized solutions, solidification models, Navier-Stokes equations.

AMS subject classifications: 35K60, 35Q30, 76D05, 80A22

1 Introduction

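This paper deals with the nonlinear system

$$\varphi_t - \xi^2\Delta\varphi + u\cdot\nabla\varphi = \varphi(\varphi-1)(1-2\varphi) + \theta \quad \text{in } Q, \tag{1}$$

$$\theta_t - \operatorname{div}\big(\kappa(\varphi,\theta)\nabla\theta\big) + u\cdot\nabla\theta = \nu(\varphi,\theta)\,D(u):D(u) \quad \text{in } Q, \tag{2}$$

$$u_t - \operatorname{div}\big(\nu(\varphi,\theta)D(u)\big) + (u\cdot\nabla)u + \nabla p = f, \quad \operatorname{div} u = 0 \quad \text{in } Q, \tag{3}$$

$$\varphi = 0, \quad \theta = 0, \quad u = 0 \quad \text{on } \Sigma, \tag{4}$$

$$\varphi(x,0) = \varphi_0(x), \quad \theta(x,0) = \theta_0(x), \quad u(x,0) = u_0(x) \quad \text{in } \Omega, \tag{5}$$

where $\Omega \subset \mathbb{R}^N$ ($N = 2$ or $N = 3$) is a bounded open domain with a $C^2$ boundary, $T > 0$ is given, $Q = \Omega\times(0,T)$ denotes the space-time cylinder and $\Sigma = \partial\Omega\times(0,T)$ is its lateral surface.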

The structure of this system is typical of non-isothermal solidification problems with melt convection [1, 3, 10]; in this particular context, (1) is called the phase-field

Received: August 31, 2011. Accepted: June 5, 2012.

The first author has been partially supported by DGES-MICINN, Grant MTM2010-15592.


6 E. Fernandez-Cara, C. Vaz

equation; it is essentially the equation found in [10], with an added advection term. The other equations are standard and straightforward consequences of the usual physical balance laws (energy, linear momentum and mass).

The unknowns are the phase-field function $\varphi$, the temperature $\theta$, the velocity field $u$ and the hydrostatic pressure $p$; $\xi$ is a positive constant related to the width of the transition layers; $\kappa$ and $\nu$ are strictly positive functions of $\varphi$ and $\theta$, to be viewed respectively as the heat diffusion coefficient and the kinematic viscosity of the fluid; $f$ is an external force field; $D(u)$ is the deformation tensor, i.e.

$$D(u) = \tfrac12\left(\nabla u + \nabla u^{T}\right),$$

and $\varphi_0$, $\theta_0$ and $u_0$ are given functions.

Throughout this paper, we will denote by C or M generic constants depending

only on known quantities, which will be indicated frequently.

Phase-field models for solidification processes have received a great deal of attention during the last two decades; see for example [10, 11, 20, 3, 1]. In these works, many situations and many different hypotheses have been considered, in particular the possibility that the molten material moves during solidification. In our case, the molten material is assumed to behave as an incompressible fluid with variable viscosity. The resulting system can thus be viewed as a generalization of the models considered in those papers.

We will consider the (simplified) case where the latent heat in the energy equation is very small and can be neglected. Notice that equation (2) needs special treatment because of its nonlinear right-hand side, which in general only belongs to $L^1(Q)$, since $D(u)$ only belongs to $L^2(Q)^{N\times N}$. For this reason, we will use a notion of renormalized solution adapted to our setting.

Renormalized solutions to PDEs were first introduced by DiPerna and P.-L. Lions [13, 12] in the context of Boltzmann-like equations. They have since been considered in other situations; let us mention in particular the contributions of Blanchard, Boccardo, Murat and their co-workers in the framework of second-order elliptic and parabolic PDEs; see [6, 7, 4, 5] and the references therein, and also [19] and [8] for related results.

In order to solve (1)–(5), we will use regularization techniques, truncations,

appropriate estimates and the compactness of approximate solutions.

This paper is organized as follows.

In Section 2, we fix the notation, introduce some function spaces and recall certain interpolation and embedding results. We also state the hypotheses, introduce the concept of weak-renormalized solution adapted to our context and state the main result of the paper.

In Section 3, we investigate the solvability of some auxiliary problems.

Section 4 presents the proof of the existence result for two-dimensional flows; it is split into three steps, namely the formulation and resolution of regularized problems, the derivation of estimates, and the passage to the limit.


Renormalized solutions and non-isothermal solidification models 7

2 Preliminaries

2.1 Notation and spaces

For any $q \ge 1$, we denote by $L^q(\Omega)$ the standard Lebesgue space, with norm $\|\cdot\|_{q,\Omega}$. For any nonnegative integer $m$, $W^{m,q}(\Omega)$ is the standard Sobolev space, with norm $\|\cdot\|_{m,q,\Omega}$. The space $W^{m,q}_0(\Omega)$ is the closure, with respect to $\|\cdot\|_{m,q,\Omega}$, of the space $C^\infty_0(\Omega)$ of $C^\infty$ functions with compact support in $\Omega$. We refer for instance to [14] for more details on these spaces.

The following result from [21] will be used below:

$$\iint_Q |v|^{\tau}\,dx\,dt \le C\,\|v\|^{pq/N}_{L^\infty(0,T;L^p(\Omega))} \iint_Q |\nabla v|^q\,dx\,dt \tag{6}$$

for every $v \in L^q(0,T;W^{1,q}_0(\Omega)) \cap L^\infty(0,T;L^p(\Omega))$, with $p, q \ge 1$ and $\tau = q(N+p)/N$.

For the analysis of the motion equation (3), we will need other function spaces. Let us set

$$\mathcal{V} = \{ v \in C^\infty_0(\Omega)^N : \operatorname{div} v = 0 \},$$

and let us denote by $H$ and $V$ the closures of $\mathcal{V}$ in $L^2(\Omega)^N$ and $H^1_0(\Omega)^N$, respectively. Then $H$ and $V$ are Hilbert spaces for the corresponding norms, and one has

$$H = \{ v \in L^2(\Omega)^N : \operatorname{div} v = 0 \text{ in } \Omega,\ v\cdot n = 0 \text{ on } \partial\Omega \}, \qquad V = \{ v \in H^1_0(\Omega)^N : \operatorname{div} v = 0 \text{ in } \Omega \}.$$

The general properties of these spaces can be found, for instance, in [22].

In the sequel, we will use the following truncation function: for any positive real number $R$, we set

$$T_R(s) = s \ \text{ if } |s| \le R, \qquad T_R(s) = R\,\operatorname{sign}(s) \ \text{ if } |s| > R,$$

where $\operatorname{sign}(s) = 0$ if $s = 0$ and $\operatorname{sign}(s) = s/|s|$ if $s \ne 0$. Since $T_R$ is Lipschitz continuous, for any $v \in W^{1,q}_0(\Omega)$ one has $T_R(v) \in W^{1,q}_0(\Omega)$, and the chain rule holds for $T_R(v)$, that is,

$$\nabla T_R(v) = T_R'(v)\,\nabla v \quad \text{a.e. in } \Omega.$$
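The truncation $T_R$ and the chain rule above can be checked numerically. The following is a minimal Python/NumPy sketch, assuming a one-dimensional sample function $v(x) = 3\sin(2\pi x)$ on $[0,1]$ chosen only for illustration; the helpers `truncate`, `truncate_derivative` and `chain_rule_defect` are hypothetical names, not part of the paper. It compares a finite-difference derivative of $T_R(v)$ with $T_R'(v)\,v'$ away from the level set $\{|v| = R\}$, where $T_R$ is not differentiable.

```python
import numpy as np

# Illustrative sketch (hypothetical helpers, not from the paper).


def truncate(s, R):
    """T_R(s): identity on [-R, R] and R*sign(s) outside (elementwise)."""
    return np.clip(s, -R, R)


def truncate_derivative(s, R):
    """T_R'(s) = 1 for |s| < R and 0 for |s| > R; the value on |s| = R is
    irrelevant a.e., and it is set to 0 here."""
    return (np.abs(s) < R).astype(float)


def chain_rule_defect(R=2.0, n=2001, margin=0.05):
    """Max discrepancy between d/dx T_R(v) (central differences) and
    T_R'(v) v' for the sample function v(x) = 3 sin(2 pi x) on [0, 1]."""
    x = np.linspace(0.0, 1.0, n)
    v = 3.0 * np.sin(2.0 * np.pi * x)
    dv = 6.0 * np.pi * np.cos(2.0 * np.pi * x)
    lhs = np.gradient(truncate(v, R), x)
    rhs = truncate_derivative(v, R) * dv
    # Skip grid points close to the kink |v| = R and the two endpoints,
    # where np.gradient switches to one-sided differences.
    mask = np.abs(np.abs(v) - R) > margin
    mask[[0, -1]] = False
    return np.max(np.abs(lhs - rhs)[mask])


print(truncate(np.array([-5.0, -1.0, 0.0, 2.0, 7.0]), 3.0))
print(chain_rule_defect())
```

The defect is only the discretization error of central differences; on the set where $|v| > R$ both sides vanish identically, which is the discrete counterpart of $T_R'(v) = 0$ there.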

We will also have to consider the following set:

$$\mathcal{L}(0,T,\Omega) = \Big\{ v \in L^\infty(0,T;L^1(\Omega)) : T_R(v) \in L^2(0,T;H^1_0(\Omega))\ \forall R > 0,\ \lim_{n\to+\infty} \frac{1}{n}\iint_{A_n(v)} |\nabla v|^2\,dx\,dt = 0 \Big\}.$$

Here and in the sequel, $A_n(v)$ stands for the set

$$A_n(v) = \{ (x,t) \in Q : n \le |v(x,t)| \le 2n \}.$$

We will make use of the following lemma, due to Boccardo and Gallouët (see [7]; see also [18]):


Lemma 1. Assume that $v \in L^\infty(0,T;L^1(\Omega))$, $T_R(v) \in L^2(0,T;H^1_0(\Omega))$ for all $R > 0$, and that there exists $M > 0$ such that

$$\|v\|_{L^\infty(0,T;L^1(\Omega))} \le M \quad\text{and}\quad \iint_Q |\nabla T_R(v)|^2\,dx\,dt \le MR \quad \forall R > 0.$$

Then, for all $1 < q < (N+2)/(N+1)$, one has

$$v \in L^q(0,T;W^{1,q}_0(\Omega)) \quad\text{and}\quad \|v\|_{L^q(0,T;W^{1,q}_0(\Omega))} \le C(q)\,M.$$

2.2 Hypotheses and main result

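Throughout this work, we will assume that the following hypotheses hold:

(H) $f \in L^2(Q)^N$, $\varphi_0 \in L^2(\Omega)$, $u_0 \in H$, $\theta_0 \in L^1(\Omega)$, $\nu, \kappa \in C^0(\mathbb{R}\times\mathbb{R})$, $0 < \nu_1 \le \nu \le \nu_2$ and $0 < \kappa_1 \le \kappa \le \kappa_2$.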

We introduce now the definition of weak-renormalized solution to (1)–(5):

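Definition 1. We say that $(\varphi,\theta,u)$ is a (weak-renormalized) solution to (1)–(5) if the following conditions are satisfied:

1. $u \in L^\infty(0,T;H) \cap L^2(0,T;V)$, $\varphi \in L^\infty(0,T;L^2(\Omega)) \cap L^2(0,T;H^1_0(\Omega)) \cap L^4(Q)$ and $\theta \in \mathcal{L}(0,T,\Omega)$.

2. $\varphi$ solves (1) in the usual weak sense and $\varphi|_{t=0} = \varphi_0$.

3. $u$ solves (3) in the usual weak sense (together with some $p \in \mathcal{D}'(Q)$) and $u|_{t=0} = u_0$.

4. For any $\beta \in W^{2,\infty}(\mathbb{R})$ such that $\operatorname{Supp}\beta'$ is compact and for any $\eta \in C^1([0,T];H^1_0(\Omega)) \cap L^\infty(Q)$ such that $\eta|_{t=T} = 0$, we have

$$
\begin{aligned}
&-\iint_Q \beta(\theta)\,\eta_t\,dx\,dt + \iint_Q \kappa(\varphi,\theta)\nabla\beta(\theta)\cdot\nabla\eta\,dx\,dt + \iint_Q \kappa(\varphi,\theta)\nabla\theta\cdot\nabla\beta'(\theta)\,\eta\,dx\,dt + \iint_Q \big(u\cdot\nabla\beta(\theta)\big)\,\eta\,dx\,dt\\
&\qquad = \iint_Q \beta'(\theta)\,\nu(\varphi,\theta)\,D(u):D(u)\,\eta\,dx\,dt + \int_\Omega \beta(\theta_0)\,\eta(x,0)\,dx.
\end{aligned}
\tag{7}
$$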

We can now state the main result of this paper:

Theorem 2. Assume that $N = 2$ and that (H) holds. Then there exists at least one solution to (1)–(5).


3 Some auxiliary problems

In order to prove theorem 2, it is convenient to first consider and solve some auxiliary

problems.

Let $\{\rho_\epsilon\}$ be a regularizing sequence in $\mathbb{R}^N$. For any $\epsilon > 0$ and any $v \in H$, we will denote by $R_\epsilon v$ the function

$$R_\epsilon v := \rho_\epsilon * \tilde v.$$

Here, $\tilde v$ is the extension of $v$ by zero to the whole of $\mathbb{R}^N$.
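To make the regularization $R_\epsilon v = \rho_\epsilon * \tilde v$ concrete, here is a minimal one-dimensional scalar sketch in Python/NumPy. It assumes the standard bump kernel $\rho(y) = c\,e^{-1/(1-y^2)}$ on $(-1,1)$ (one common choice of regularizing sequence; the paper does not specify one), a uniform grid and extension by zero outside the grid; the helpers `bump` and `mollify` are hypothetical names. It only illustrates the smoothing and the extension by zero; the fact that $R_\epsilon v$ is divergence-free is a genuinely multi-dimensional property not captured here.

```python
import numpy as np

# Illustrative 1-D sketch of rho_eps * (v extended by zero); not from the paper.


def bump(y):
    """Unnormalized C^infinity bump function supported in [-1, 1]."""
    out = np.zeros_like(y, dtype=float)
    inside = np.abs(y) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - y[inside] ** 2))
    return out


def mollify(v, x, eps):
    """Approximate (rho_eps * v~)(x) on the uniform grid x, where v~ is v extended
    by zero outside the grid and rho_eps(y) = rho(y / eps) / eps has unit mass."""
    h = x[1] - x[0]
    k = int(np.ceil(eps / h))
    y = h * np.arange(-k, k + 1)          # kernel nodes covering [-eps, eps]
    rho = bump(y / eps)
    rho /= rho.sum() * h                  # normalize so that the discrete mass is 1
    # mode="same" treats values beyond the grid as zero: extension by zero
    return h * np.convolve(v, rho, mode="same")


# v = 1 on Omega = (0, 1), viewed inside the larger window [-0.5, 1.5]
x = np.linspace(-0.5, 1.5, 2001)
v = ((x > 0.0) & (x < 1.0)).astype(float)
w = mollify(v, x, eps=0.1)
```

In this example, `w` equals 1 at points farther than $\epsilon$ from $\partial\Omega$ inside $\Omega$, vanishes farther than $\epsilon$ outside, and passes smoothly through values near $1/2$ at the boundary, while never exceeding $\max v$.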

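Recall that $R_\epsilon v \in C^\infty(\mathbb{R}^N)^N$, $\nabla\cdot(R_\epsilon v) = 0$ in $\Omega$ and, in particular,

$$\|R_\epsilon v\|_{m,q,\Omega} \le C(m,q,\epsilon)\,\|v\|_{2,\Omega}$$

for all $m$ and $q$. The first auxiliary problem is the following:

$$u_t - \operatorname{div}\big(m(x,t)D(u)\big) + \big((R_\epsilon u)\cdot\nabla\big)u + \nabla p = f, \quad \operatorname{div} u = 0 \quad \text{in } Q, \tag{8}$$

$$u = 0 \quad \text{on } \Sigma, \tag{9}$$

$$u(x,0) = u_0(x) \quad \text{in } \Omega. \tag{10}$$

Here, we assume that

$$f \in L^2(Q)^N, \quad u_0 \in H, \quad m \in L^\infty(Q), \quad 0 < \nu_1 \le m(x,t) \le \nu_2 \ \text{a.e.} \tag{11}$$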

The existence and uniqueness of a weak solution to (8)–(10) can be proved via a Galerkin method, for instance as in [17] or [22] for the classical Navier-Stokes equations. In this way, one obtains the following:

Proposition 3. Let the assumptions (11) be satisfied. Then there exists exactly one solution to (8)–(10), with

$$u \in L^2(0,T;V) \cap C^0([0,T];H), \qquad u_t \in L^2(0,T;V').$$

Furthermore, one has

$$\|u\|_{L^2(0,T;V)} + \|u\|_{L^\infty(0,T;H)} + \|u_t\|_{L^\sigma(0,T;V')} \le C, \qquad \|u_t\|_{L^2(0,T;V')} \le C(\epsilon),$$

where $\sigma = 2$ if $N = 2$ and $\sigma = 4/3$ if $N = 3$, and $C$ (resp. $C(\epsilon)$) depends on $\Omega$, $T$, $\|f\|_{L^2(Q)}$, $\|u_0\|_H$, $\nu_1$ and $\nu_2$ (resp. on these data and $\epsilon$).

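Next, we consider a second auxiliary problem, closely related to the phase-field equation in our original system:

$$\varphi_t - \xi^2\Delta\varphi + u\cdot\nabla\varphi = \varphi(\varphi-1)(1-2\varphi) + h \quad \text{in } Q, \tag{12}$$

$$\varphi = 0 \quad \text{on } \Sigma, \tag{13}$$

$$\varphi(x,0) = \varphi_0(x) \quad \text{in } \Omega, \tag{14}$$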


where

$$h \in L^1(0,T;L^2(\Omega)), \quad u \in L^2(0,T;V) \cap L^\infty(0,T;H), \quad \varphi_0 \in L^2(\Omega). \tag{15}$$

The following result can also be proved by a Galerkin-compactness method.

Proposition 4. Let the assumptions (15) be satisfied. Then there exists a unique solution to (12)–(14), with

$$\varphi \in L^2(0,T;H^1_0(\Omega)) \cap C^0_w([0,T];L^2(\Omega)) \cap L^4(Q), \qquad \varphi_t \in L^1(0,T;L^2(\Omega)) + L^\sigma(0,T;H^{-1}(\Omega)), \tag{16}$$

and the norms in these spaces are bounded by a constant that only depends on $\Omega$, $T$, $\|h\|_{L^1(0,T;L^2(\Omega))}$, $\|u\|_{L^2(0,T;V)} + \|u\|_{L^\infty(0,T;H)}$ and $\|\varphi_0\|_{2,\Omega}$. If $N = 2$, one also has $\varphi \in C^0([0,T];L^2(\Omega))$.

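Sketch of the proof: Let us first explain how the existence of $\varphi$ can be established. Let us denote by $\varphi^m : [0,T_m) \to H^1_0(\Omega)$ the approximations obtained from a standard Galerkin scheme whose basis functions are the eigenfunctions of the Dirichlet Laplacian in $\Omega$. In principle, $\varphi^m$ is only locally defined, i.e. we may have $T_m < T$. Setting $H(z) := z(z-1)(2z-1) \equiv 2z^3 - 3z^2 + z$ (so that the right-hand side of (12) is $-H(\varphi) + h$), it is clear that

$$\frac12\frac{d}{dt}\|\varphi^m\|^2_{2,\Omega} + \xi^2\|\nabla\varphi^m\|^2_{2,\Omega} + \int_\Omega H(\varphi^m)\,\varphi^m\,dx = (h,\varphi^m)_{2,\Omega}$$

for all $0 \le t < T_m$; the advection term gives no contribution, since $\operatorname{div} u = 0$ in $\Omega$ and $u = 0$ on $\partial\Omega$. Since $H(z)z \equiv 2z^4 - 3z^3 + z^2$, we have $H(z)z \ge z^4 - C$ for all $z$. After integration in time, we get the following in $[0,T_m)$: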

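$$\|\varphi^m(t)\|^2_{2,\Omega} + \xi^2\int_0^t \|\nabla\varphi^m(s)\|^2_{2,\Omega}\,ds + \int_0^t \|\varphi^m(s)\|^4_{4,\Omega}\,ds \le \|\varphi_0\|^2_{2,\Omega} + 2\,\|h\|^2_{L^1(0,T;L^2(\Omega))} + \frac12\sup_{[0,T_m)}\|\varphi^m(s)\|^2_{2,\Omega} + C,$$

where $C$ only depends on $\Omega$ and $T$. Consequently, $T_m = T$, the $\varphi^m$ are globally defined and, furthermore,

$$\varphi^m \in \text{bounded set in } L^2(0,T;H^1_0(\Omega)) \cap L^\infty(0,T;L^2(\Omega)) \cap L^4(Q). \tag{17}$$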
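The elementary bound $H(z)z \ge z^4 - C$ used above can be made explicit: $H(z)z - z^4 = z^4 - 3z^3 + z^2$, whose critical points are $z = 0$, $z = 1/4$ and $z = 2$, so its global minimum is $-4$, attained at $z = 2$, and $C = 4$ works. The short Python/NumPy check below confirms this on a grid; the grid bounds are an arbitrary choice (outside $[-10, 10]$ the quartic term clearly dominates).

```python
import numpy as np


def H(z):
    # H(z) = z (z - 1)(2z - 1); the right-hand side of (12) is -H(phi) + h
    return z * (z - 1.0) * (2.0 * z - 1.0)


z = np.linspace(-10.0, 10.0, 200001)
gap = H(z) * z - z**4                 # equals z^4 - 3 z^3 + z^2
i = np.argmin(gap)
print(f"min of H(z) z - z^4 on the grid: {gap[i]:.6f} at z = {z[i]:.4f}")
```

The printed minimum should be about $-4$, located near $z = 2$.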

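On the other hand, since $\varphi^m_t(t)$ can be written as the orthogonal projection of $(\xi^2\Delta\varphi^m - u\cdot\nabla\varphi^m - H(\varphi^m) + h)(t)$ on the space spanned by the first $m$ eigenfunctions, one has

$$\|\varphi^m_t\|_{-1,2,\Omega} \le \|\xi^2\Delta\varphi^m - u\cdot\nabla\varphi^m - H(\varphi^m) + h\|_{-1,2,\Omega} \le C\big(\|\nabla\varphi^m\|_{2,\Omega} + \|u\cdot\nabla\varphi^m\|_{-1,2,\Omega} + \|\varphi^m\|^3_{4,\Omega} + \|h\|_{2,\Omega}\big)$$

in $(0,T)$, whence it is easy to deduce that

$$\varphi^m_t - h \in \text{bounded set in } L^\sigma(0,T;H^{-1}(\Omega)). \tag{18}$$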


From (17) and (18), using standard arguments, we can extract a subsequence that converges to a solution to (12)–(14) satisfying (16).
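The Galerkin construction can be illustrated numerically in the simplest possible setting. The Python/NumPy sketch below is a hypothetical toy version, not the authors' scheme (the paper uses the Galerkin method only as a theoretical tool): it assumes $N = 1$, $\Omega = (0,\pi)$, $u = 0$, a time-independent source $h$, the orthonormal Dirichlet eigenfunctions $e_k(x) = \sqrt{2/\pi}\,\sin(kx)$ with eigenvalues $k^2$, trapezoidal quadrature for the projections and a semi-implicit Euler step (diffusion implicit, $H$ explicit). The names `galerkin_phase_field` and `project` are illustrative. It records $\|\varphi^m(t)\|_{2,\Omega}$ so that the uniform bound behind (17) can be observed.

```python
import numpy as np

# Toy 1-D Galerkin sketch (assumptions: u = 0, h independent of t); not from the paper.


def H(z):
    # H(z) = z (z - 1)(2z - 1); the right-hand side of (12) is -H(phi) + h
    return z * (z - 1.0) * (2.0 * z - 1.0)


def galerkin_phase_field(phi0, h, m=16, xi=0.5, T=1.0, dt=1e-3, nx=400):
    """Galerkin approximation of phi_t - xi^2 phi_xx = -H(phi) + h on (0, pi),
    phi(0) = phi(pi) = 0, with m Dirichlet eigenfunctions as basis.
    phi0 and h are callables of x. Returns the grid, phi^m(T) and the
    L^2 norms ||phi^m(t_n)|| after each time step."""
    x = np.linspace(0.0, np.pi, nx + 1)
    w = np.full(x.size, np.pi / nx)        # trapezoidal quadrature weights
    w[0] = w[-1] = 0.5 * np.pi / nx
    k = np.arange(1, m + 1)
    E = np.sqrt(2.0 / np.pi) * np.sin(np.outer(k, x))   # e_k(x_j), shape (m, nx + 1)

    def project(g):
        # Coefficients (g, e_k) of the L^2-orthogonal projection onto span{e_1, ..., e_m}
        return E @ (w * g)

    c = project(phi0(x))
    hk = project(h(x))
    norms = []
    for _ in range(int(round(T / dt))):
        phi = c @ E                                            # phi^m on the grid
        c = (c + dt * (hk - project(H(phi)))) / (1.0 + dt * xi**2 * k**2)
        norms.append(np.linalg.norm(c))                        # ||phi^m||_{L^2} by Parseval
    return x, c @ E, np.array(norms)


x, phi_T, norms = galerkin_phase_field(np.sin, lambda s: 0.2 * np.ones_like(s))
print(f"max ||phi^m(t)|| over [0, T]: {norms.max():.4f}")
```

Treating the diffusion implicitly removes the step-size restriction coming from the largest eigenvalue $m^2$, while the explicit cubic term is harmless here because the approximations stay bounded.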

The uniqueness of $\varphi$ can be obtained by applying Gronwall's lemma. More precisely, let us assume (for instance) that $N = 3$, let $\varphi_1$ and $\varphi_2$ be two solutions to (12)–(14) satisfying (16) and let us set $\varphi := \varphi_1 - \varphi_2$. Notice that

$$
\begin{aligned}
\big(H(z_1) - H(z_2)\big)(z_1 - z_2) &= 2(z_1^3 - z_2^3)(z_1 - z_2) - 3(z_1^2 - z_2^2)(z_1 - z_2) + (z_1 - z_2)^2\\
&\ge -3(z_1^2 - z_2^2)(z_1 - z_2) = -3(z_1 + z_2)(z_1 - z_2)^2
\end{aligned}
$$

for all $z_1, z_2 \in \mathbb{R}$. Then, one has:

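$$
\begin{aligned}
\frac12\frac{d}{dt}\|\varphi\|^2_{2,\Omega} + \xi^2\|\nabla\varphi\|^2_{2,\Omega} &= -\int_\Omega \big(H(\varphi_1) - H(\varphi_2)\big)\,\varphi\,dx \le 3\int_\Omega \big(|\varphi_1| + |\varphi_2|\big)|\varphi|^2\,dx\\
&\le C\big(\|\varphi_1\|_{3,\Omega} + \|\varphi_2\|_{3,\Omega}\big)\,\|\varphi\|_{2,\Omega}\,\|\varphi\|_{6,\Omega}\\
&\le C\big(\|\varphi_1\|^2_{3,\Omega} + \|\varphi_2\|^2_{3,\Omega}\big)\,\|\varphi\|^2_{2,\Omega} + \frac{\xi^2}{2}\|\nabla\varphi\|^2_{2,\Omega}
\end{aligned}
$$

in $(0,T)$. Since $\|\varphi_i\|^2_{3,\Omega} \le C\|\varphi_i\|_{2,\Omega}\|\nabla\varphi_i\|_{2,\Omega}$ and $\varphi_i \in L^\infty(0,T;L^2(\Omega)) \cap L^2(0,T;H^1_0(\Omega))$, we see that, for some $F \in L^2(0,T)$, one has

$$\frac{d}{dt}\|\varphi\|^2_{2,\Omega} + \xi^2\|\nabla\varphi\|^2_{2,\Omega} \le F(t)\,\|\varphi\|^2_{2,\Omega}.$$

Since $\varphi(0) = 0$, this inequality and Gronwall's lemma imply $\varphi \equiv 0$, whence we necessarily have $\varphi_1 \equiv \varphi_2$.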
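The algebraic inequality $(H(z_1)-H(z_2))(z_1-z_2) \ge -3(z_1+z_2)(z_1-z_2)^2$ holds because the slack equals $\big(2(z_1^2+z_1z_2+z_2^2)+1\big)(z_1-z_2)^2 \ge 0$. A quick randomized Python/NumPy check of this identity is given below; the sampling range and seed are arbitrary choices.

```python
import numpy as np


def H(z):
    return 2.0 * z**3 - 3.0 * z**2 + z


rng = np.random.default_rng(seed=0)
z1, z2 = rng.uniform(-10.0, 10.0, size=(2, 1_000_000))

lhs = (H(z1) - H(z2)) * (z1 - z2)
rhs = -3.0 * (z1 + z2) * (z1 - z2) ** 2
slack = lhs - rhs
# Closed form of the slack; nonnegative because z1^2 + z1 z2 + z2^2 >= 0
expected = (2.0 * (z1**2 + z1 * z2 + z2**2) + 1.0) * (z1 - z2) ** 2
print(np.max(np.abs(slack - expected)), slack.min())
```

The first printed number is pure rounding error, and the second is nonnegative up to rounding.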

4 Proof of theorem 2

4.1 An auxiliary regularized problem

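We begin by introducing some notation. For any $\epsilon > 0$, we set:

(i) $\theta_{0\epsilon} = T_{1/\epsilon}(\theta_0)$.

(ii) $g_\epsilon = T_{1/\epsilon}\big(\nu(\varphi_\epsilon,\theta_\epsilon)D(u_\epsilon):D(u_\epsilon)\big)$.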

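We will begin the proof with no restriction on the dimension ($N = 2$ or $N = 3$). We consider the following regularized version of (1)–(5):

$$\varphi_{\epsilon,t} - \xi^2\Delta\varphi_\epsilon + u_\epsilon\cdot\nabla\varphi_\epsilon = \varphi_\epsilon(\varphi_\epsilon - 1)(1 - 2\varphi_\epsilon) + \theta_\epsilon \quad \text{in } Q, \tag{19}$$

$$\varphi_\epsilon = 0 \ \text{on } \Sigma, \qquad \varphi_\epsilon(x,0) = \varphi_0(x) \ \text{in } \Omega, \tag{20}$$

$$\theta_{\epsilon,t} - \operatorname{div}\big(\kappa(\varphi_\epsilon,\theta_\epsilon)\nabla\theta_\epsilon\big) + u_\epsilon\cdot\nabla\theta_\epsilon = g_\epsilon \quad \text{in } Q, \tag{21}$$

$$\theta_\epsilon = 0 \ \text{on } \Sigma, \qquad \theta_\epsilon(x,0) = \theta_{0\epsilon}(x) \ \text{in } \Omega, \tag{22}$$

$$u_{\epsilon,t} - \operatorname{div}\big(\nu(\varphi_\epsilon,\theta_\epsilon)D(u_\epsilon)\big) + \big((R_\epsilon u_\epsilon)\cdot\nabla\big)u_\epsilon + \nabla p_\epsilon = f, \quad \operatorname{div} u_\epsilon = 0 \quad \text{in } Q, \tag{23}$$

$$u_\epsilon = 0 \ \text{on } \Sigma, \qquad u_\epsilon(x,0) = u_0(x) \ \text{in } \Omega. \tag{24}$$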


We then have the following existence result:

Proposition 5. Let the assumptions (H) be fulfilled. Then, for each $\epsilon > 0$, there exists at least one solution $(\varphi_\epsilon,\theta_\epsilon,u_\epsilon)$ to (19)–(24), with

$$\varphi_\epsilon, \theta_\epsilon \in L^2(0,T;H^1_0(\Omega)) \cap C^0_w([0,T];L^2(\Omega)), \qquad u_\epsilon \in L^2(0,T;V) \cap C^0_w([0,T];H).$$

Proof: The proof follows from a standard application of Schauder's (or Leray-Schauder's) fixed-point theorem.

Let us consider the mapping $\Lambda_\epsilon$ that associates to each $(\varphi,\theta) \in L^1(Q)^2$, first, the unique solution $u_\epsilon$ to (23)–(24) with $\nu(\varphi_\epsilon,\theta_\epsilon)$ replaced by $\nu(\varphi,\theta)$; then, the unique solution $\theta_\epsilon$ to (21)–(22) with $g_\epsilon$ replaced by $T_{1/\epsilon}(\nu(\varphi,\theta)D(u_\epsilon):D(u_\epsilon))$ and $\kappa(\varphi_\epsilon,\theta_\epsilon)$ replaced by $\kappa(\varphi,\theta)$; finally, the unique solution $\varphi_\epsilon$ to (19)–(20).

In view of the results in Sections 2 and 3, $\Lambda_\epsilon : L^1(Q)^2 \to L^1(Q)^2$ is well defined.

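Furthermore, it is continuous. Indeed, let $(\varphi,\theta)$ and $(\varphi^n,\theta^n)$ be given in $L^1(Q)^2$, let us set

$$(\hat\varphi,\hat\theta) = \Lambda_\epsilon(\varphi,\theta), \qquad (\hat\varphi^n,\hat\theta^n) = \Lambda_\epsilon(\varphi^n,\theta^n)$$

and let us assume that $(\varphi^n,\theta^n) \to (\varphi,\theta)$ strongly in $L^1(Q)^2$. Then:

1. $\nu(\varphi^n,\theta^n) \to \nu(\varphi,\theta)$ strongly in all the spaces $L^p(Q)$ with $1 \le p < +\infty$;

2. the associated $u^n$ converge strongly in $L^2(0,T;V)$;

3. $\hat\theta^n \to \hat\theta$ and $\hat\theta^n_t \to \hat\theta_t$ strongly in $L^2(0,T;H^1_0(\Omega))$ and $L^2(0,T;H^{-1}(\Omega))$, respectively;

4. finally, $\hat\varphi^n \to \hat\varphi$ and $\hat\varphi^n_t \to \hat\varphi_t$ strongly in $L^2(0,T;H^1_0(\Omega))$ and $L^2(0,T;H^{-1}(\Omega))$, respectively.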

The first of these assertions is evident. The third and fourth are consequences of the usual energy estimates for parabolic equations. The second one can be justified as follows. First, from the estimates in proposition 3, it is clear that $u^n$ converges weakly in $L^2(0,T;V)$ to $u$, the solution associated to $(\varphi,\theta)$. Secondly, taking into account that $u_t$ and the $u^n_t$ belong to $L^2(0,T;V')$, we find the energy identities

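$$\frac12\|u(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi,\theta)\,|D(u)|^2\,dx\,dt = \iint_Q f\cdot u\,dx\,dt + \frac12\|u_0\|^2_{2,\Omega}$$

and

$$\frac12\|u^n(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi^n,\theta^n)\,|D(u^n)|^2\,dx\,dt = \iint_Q f\cdot u^n\,dx\,dt + \frac12\|u_0\|^2_{2,\Omega}$$

for all $n \ge 1$. Consequently,

$$\lim_{n\to+\infty}\Big[\frac12\|u^n(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi^n,\theta^n)\,|D(u^n)|^2\,dx\,dt\Big] = \frac12\|u(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi,\theta)\,|D(u)|^2\,dx\,dt.$$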


But this yields the strong convergence of $(u^n(T), \nu(\varphi^n,\theta^n)^{1/2}D(u^n))$ in the product space $H \times L^2(Q)^{N\times N}$. Since $\nu(\varphi^n,\theta^n)$ converges a.e. and is uniformly bounded from above and from below, we deduce that $D(u^n)$ also converges strongly in $L^2(Q)^{N\times N}$. In view of Korn's inequality, this is equivalent to the strong convergence of $\nabla u^n$ in the same space, that is, the strong convergence of $u^n$ in $L^2(0,T;V)$.
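The passage from convergence of the energies to strong convergence is the usual Hilbert-space argument, written out here for convenience. It takes for granted, as the argument above implicitly does, the weak convergences $u^n(T) \rightharpoonup u(T)$ in $H$ and $\nu(\varphi^n,\theta^n)^{1/2}D(u^n) \rightharpoonup \nu(\varphi,\theta)^{1/2}D(u)$ in $L^2(Q)^{N\times N}$; the symbols $a_n$, $b_n$, $w_n$, $w$ are generic notation introduced only for this sketch, with $a_n = \tfrac12\|u^n(T)\|^2_{2,\Omega}$ and $b_n = \|\nu(\varphi^n,\theta^n)^{1/2}D(u^n)\|^2_{L^2(Q)}$.

```latex
% Step 1 (splitting): if a_n + b_n -> a + b, liminf a_n >= a and liminf b_n >= b
% (weak lower semicontinuity of the norms), then
\limsup_{n} a_n \le \lim_{n} (a_n + b_n) - \liminf_{n} b_n \le (a + b) - b = a ,
\qquad\text{so } a_n \to a, \text{ and likewise } b_n \to b .

% Step 2: weak convergence plus convergence of the norms gives strong convergence
% in any Hilbert space X (here X = H and X = L^2(Q)^{N \times N}):
\|w_n - w\|_X^2 = \|w_n\|_X^2 - 2\,(w_n, w)_X + \|w\|_X^2
  \longrightarrow \|w\|_X^2 - 2\,\|w\|_X^2 + \|w\|_X^2 = 0 .
```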

Notice that $\Lambda_\epsilon$ maps the whole space $L^1(Q)^2$ into a compact set. Indeed, let us set

$$W = \{ \phi \in L^2(0,T;H^1_0(\Omega)) : \phi_t \in L^2(0,T;H^{-1}(\Omega)) \}.$$

Recall that, endowed with its natural norm, $W$ is a Hilbert space and the embedding $W \hookrightarrow L^1(Q)$ is compact. Let $(\varphi,\theta)$ be given in $L^1(Q)^2$ and let us set $(\hat\varphi,\hat\theta) = \Lambda_\epsilon(\varphi,\theta)$. Then the assumptions on $\nu$ and $\kappa$ in (H) and the fact that $\epsilon$ is fixed yield uniform bounds of the norms of $\hat\varphi$ and $\hat\theta$ in $W$. This means that $(\hat\varphi,\hat\theta)$ belongs to a fixed compact set of $L^1(Q)^2$.

Consequently, we can apply Schauder's theorem to $\Lambda_\epsilon$ and deduce that this mapping possesses at least one fixed point. This provides a solution to (19)–(24) and ends the proof.

4.2 Some a priori estimates

We will now deduce some a priori estimates for the solutions to (19)–(24), uniform with respect to $\epsilon$.

To this end, we start by applying proposition 3 to (23)–(24), and we obtain:

$$\|u_\epsilon\|_{L^2(0,T;V)} + \|u_\epsilon\|_{L^\infty(0,T;H)} + \|u_{\epsilon,t}\|_{L^\sigma(0,T;V')} \le C. \tag{25}$$

A first consequence is that the $g_\epsilon = T_{1/\epsilon}(\nu(\varphi_\epsilon,\theta_\epsilon)D(u_\epsilon):D(u_\epsilon))$ are uniformly bounded in $L^1(Q)$.

In view of the results in [2], the following estimates hold for $\theta_\epsilon$:

$$\theta_\epsilon \in \text{bounded set in } L^\infty(0,T;L^1(\Omega)) \cap L^1(0,T;L^p(\Omega)), \quad \text{for all } 1 < p < +\infty \text{ if } N = 2 \text{ and all } 1 < p < 3 \text{ if } N = 3. \tag{26}$$

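Furthermore, arguing as in [6], we see that there exists $M$ such that

$$\iint_Q |\nabla T_R(\theta_\epsilon)|^2\,dx\,dt \le MR \quad\text{and}\quad \frac1n\iint_{\{n \le |\theta_\epsilon| \le 2n\}} |\nabla\theta_\epsilon|^2\,dx\,dt \le M \tag{27}$$

for all $R > 0$ and $n \ge 1$.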

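From lemma 1, we get:

$$\theta_\epsilon \in \text{bounded set in } L^q(0,T;W^{1,q}_0(\Omega)) \quad \forall\, 1 < q < \frac{N+2}{N+1}. \tag{28}$$

Combining (26), (28) and the embedding result (6) (with $p = 1$), we deduce that

$$\theta_\epsilon \in \text{bounded set in } L^\tau(Q) \quad \forall\, 1 < \tau < \frac{N+2}{N}. \tag{29}$$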
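The exponent range in (29) is plain arithmetic on (6): with $p = 1$ (from the $L^\infty(0,T;L^1(\Omega))$ bound in (26)) one gets $\tau = q(N+1)/N$, and letting $q$ approach the upper limit $(N+2)/(N+1)$ from (28) gives $(N+2)/N$. The Python sketch below checks this with exact rational arithmetic; `tau` and `exponent_limits` are hypothetical helper names, and constants are not tracked.

```python
from fractions import Fraction as F


def tau(q, p, N):
    """Integrability exponent tau = q (N + p) / N from the embedding (6)."""
    return q * (N + p) / N


def exponent_limits(N):
    """Upper limits of q in (28) and of tau in (29) for space dimension N."""
    q_sup = F(N + 2, N + 1)
    return q_sup, tau(q_sup, 1, N)   # p = 1: theta_eps is bounded in L^inf(0,T;L^1)


for N in (2, 3):
    q_sup, tau_sup = exponent_limits(N)
    print(f"N = {N}: q < {q_sup}, tau < {tau_sup}")
```

This reports $q < 4/3$, $\tau < 2$ for $N = 2$ and $q < 5/4$, $\tau < 5/3$ for $N = 3$, in agreement with (28) and (29).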


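On the other hand, from the PDE satisfied by $\theta_\epsilon$, the fact that $g_\epsilon$ is uniformly bounded in $L^1(Q)$, (25) and (26), one has:

$$\theta_{\epsilon,t} \in \text{bounded set in } L^1(0,T;W^{-1,a}(\Omega)) \quad \forall\, 1 < a < \bar a, \tag{30}$$

where $\bar a = 4/3$ if $N = 2$ and $\bar a = 6/5$ if $N = 3$. Indeed, from the usual interpolation results, it is clear from (26) that

$$\theta_\epsilon \in \text{bounded set in } L^r(0,T;L^s(\Omega)) \quad \forall\, 1 < r < +\infty,\ \forall\, 1 < s < s(r),$$

where $s(r) = r/(r-1)$ if $N = 2$ and $s(r) = 3r/(3r-2)$ if $N = 3$. On the other hand, (25) implies

$$u_\epsilon \in \text{bounded set in } L^\rho(0,T;L^\sigma(\Omega)) \quad \forall\, 2 < \rho < +\infty,\ \forall\, 1 < \sigma < \sigma(\rho),$$

where $\sigma(\rho) = 2\rho/(\rho-2)$ if $N = 2$ and $\sigma(\rho) = 6\rho/(3\rho-4)$ if $N = 3$. Consequently, we see from Hölder's inequality (with $1/r + 1/\rho = 1$ in time) that

$$u_\epsilon\cdot\nabla\theta_\epsilon = \nabla\cdot(\theta_\epsilon u_\epsilon) \in \text{bounded set in } L^1(0,T;W^{-1,a}(\Omega)) \quad \forall\, 1 < a < \hat a, \tag{31}$$

where $\hat a = 2$ if $N = 2$ and $\hat a = 6/5$ if $N = 3$.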

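We also have

$$\nabla\cdot\big(\kappa(\varphi_\epsilon,\theta_\epsilon)\nabla\theta_\epsilon\big) \in \text{bounded set in } L^q(0,T;W^{-1,q}(\Omega)) \quad \forall\, 1 < q < \frac{N+2}{N+1} \tag{32}$$

and

$$L^1(\Omega) \hookrightarrow W^{-1,a}(\Omega) \quad \forall\, 1 < a < \frac{N}{N-1},$$

whence

$$g_\epsilon \in \text{bounded set in } L^1(0,T;W^{-1,a}(\Omega)) \quad \forall\, 1 < a < \frac{N}{N-1}. \tag{33}$$

Taking (31), (32) and (33) together, we find (30).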
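The threshold $\bar a$ in (30) is the smallest of the three limits coming from (31), (32) and (33). The Python sketch below reproduces this bookkeeping with exact fractions, using the interpolation formulas for $s(r)$ and $\sigma(\rho)$ stated above and the Hölder pairing $1/r + 1/\rho = 1$ in time. The helper names are hypothetical, and $r = 3/2$ (so that $\rho = 3 > 2$) is an arbitrary admissible choice: the resulting space exponent does not depend on $r$.

```python
from fractions import Fraction as F


def transport_limit(N, r=F(3, 2)):
    """Limit exponent in (31): space exponent of theta_eps * u_eps when theta_eps
    is in L^r(L^s) and u_eps in L^rho(L^sigma), with 1/r + 1/rho = 1."""
    rho = 1 / (1 - 1 / r)                 # conjugate time exponent; needs rho > 2
    if N == 2:
        s, sig = r / (r - 1), 2 * rho / (rho - 2)
    elif N == 3:
        s, sig = 3 * r / (3 * r - 2), 6 * rho / (3 * rho - 4)
    else:
        raise ValueError("only N = 2 or N = 3 is considered")
    return 1 / (1 / s + 1 / sig)          # Hölder: 1/a = 1/s + 1/sigma


def a_bar(N):
    """Threshold in (30): the most restrictive of the limits in (31), (32), (33)."""
    return min(transport_limit(N), F(N + 2, N + 1), F(N, N - 1))


for N in (2, 3):
    print(f"N = {N}: limit in (31) = {transport_limit(N)}, a-bar = {a_bar(N)}")
```

For $N = 2$ the binding constraint is the diffusion term (32), giving $4/3$; for $N = 3$ it is the transport term (31), giving $6/5$.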

Furthermore, since we have (26), we can use proposition 4 with $h = \theta_\epsilon$. This yields:

$$\varphi_\epsilon \in \text{bounded set in } L^2(0,T;H^1_0(\Omega)) \cap L^\infty(0,T;L^2(\Omega)), \qquad \varphi_{\epsilon,t} - \theta_\epsilon \in \text{bounded set in } L^\sigma(0,T;H^{-1}(\Omega)). \tag{34}$$

Some consequences of these estimates are the following:

• $\varphi_\epsilon \in$ compact set in $L^2(Q)$ (because one has (34) and (29), and the embedding $H^1_0(\Omega) \hookrightarrow L^2(\Omega)$ is compact).

• $\theta_\epsilon \in$ compact set in $L^q(0,T;L^b(\Omega))$ for all $1 < q < \frac{N+2}{N+1}$ and $1 < b < \frac{Nq}{N-q}$ (because one has (28) and (30), and the embedding $W^{1,q}_0(\Omega) \hookrightarrow L^b(\Omega)$ is compact).


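• $u_\epsilon \in$ compact set in $L^2(0,T;H)$ (because one has (25) and the embedding $V \hookrightarrow H$ is also compact).

Therefore, at least for a subsequence, we have:

$$\varphi_\epsilon \to \varphi \ \text{weakly in } L^2(0,T;H^1_0(\Omega)), \text{ strongly in } L^2(Q) \text{ and a.e.}, \tag{35}$$

$$\theta_\epsilon \to \theta \ \text{weakly in } L^q(0,T;W^{1,q}_0(\Omega)), \text{ strongly in } L^q(0,T;L^b(\Omega)) \text{ and a.e.}, \tag{36}$$

$$u_\epsilon \to u \ \text{weakly in } L^2(0,T;V), \text{ strongly in } L^2(Q)^N \text{ and a.e.}, \tag{37}$$

$$\varphi_{\epsilon,t} \to \varphi_t \ \text{weakly in } L^\sigma(0,T;H^{-1}(\Omega)), \tag{38}$$

$$u_{\epsilon,t} \to u_t \ \text{weakly in } L^\sigma(0,T;V'), \tag{39}$$

for all $1 < q < (N+2)/(N+1)$ and $1 < b < Nq/(N-q)$.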

4.3 Passage to the limit and conclusions

The convergence properties (35)–(39) are enough to pass to the limit in the equations and initial conditions satisfied by $u_\epsilon$ and $\varphi_\epsilon$; this is well known. We will now show that $\theta$ solves (2) in the renormalized sense. It is precisely here that we have to assume $N = 2$.

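Since $N = 2$, we have $u_t \in L^2(0,T;V')$ (see (39), where $\sigma = 2$) and, therefore,

$$\frac12\|u(t_2)\|^2_{2,\Omega} - \frac12\|u(t_1)\|^2_{2,\Omega} + \int_{t_1}^{t_2}\!\!\int_\Omega \nu(\varphi,\theta)\,|D(u)|^2\,dx\,dt = \int_{t_1}^{t_2}\!\!\int_\Omega f\cdot u\,dx\,dt$$

for all $t_1, t_2 \in [0,T]$.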

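One of the delicate points of the argument is to prove that $D(u_\epsilon) \to D(u)$ strongly in $L^2(Q)^{2\times 2}$. To this end, we argue as in the proof of proposition 5 (but now letting $\epsilon \to 0^+$).

We first notice that

$$u_\epsilon(T) \to u(T) \ \text{weakly in } H \quad\text{and}\quad \nu(\varphi_\epsilon,\theta_\epsilon)^{1/2}D(u_\epsilon) \to \nu(\varphi,\theta)^{1/2}D(u) \ \text{weakly in } L^2(Q)^{2\times 2}. \tag{40}$$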

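Then, we multiply the regularized motion equation (23) by $u_\epsilon$ and integrate over $\Omega\times(0,T)$. Using Green's formula and the facts that $\operatorname{div} u_\epsilon \equiv 0$ and $\nabla\cdot(R_\epsilon u_\epsilon) = 0$ (so that the convective term gives no contribution), we deduce that

$$\frac12\|u_\epsilon(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi_\epsilon,\theta_\epsilon)\,|D(u_\epsilon)|^2\,dx\,dt = \iint_Q f\cdot u_\epsilon\,dx\,dt + \frac12\|u_0\|^2_{2,\Omega}.$$

From (37), we get

$$\lim_{\epsilon\to 0}\Big[\frac12\|u_\epsilon(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi_\epsilon,\theta_\epsilon)\,|D(u_\epsilon)|^2\,dx\,dt\Big] = \iint_Q f\cdot u\,dx\,dt + \frac12\|u_0\|^2_{2,\Omega}.$$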


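On the other hand, $u$ is a solution of (3), whence

$$\frac12\|u(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi,\theta)\,|D(u)|^2\,dx\,dt = \iint_Q f\cdot u\,dx\,dt + \frac12\|u_0\|^2_{2,\Omega}$$

and

$$\lim_{\epsilon\to 0}\Big[\frac12\|u_\epsilon(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi_\epsilon,\theta_\epsilon)\,|D(u_\epsilon)|^2\,dx\,dt\Big] = \frac12\|u(T)\|^2_{2,\Omega} + \iint_Q \nu(\varphi,\theta)\,|D(u)|^2\,dx\,dt. \tag{41}$$

From (40), (41) and the a.e. convergence of $\varphi_\epsilon$ and $\theta_\epsilon$, the desired strong convergence of $D(u_\epsilon)$ is ensured.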

A consequence is that

$$g_\epsilon \to \nu(\varphi,\theta)\,D(u):D(u) \quad \text{strongly in } L^1(Q). \tag{42}$$

Now, it can be shown that $\theta_\epsilon$ is a Cauchy sequence in $C^0([0,T];L^1(\Omega))$ and, moreover,

$$\lim_{\epsilon\to 0^+}\iint_Q (T-t)\,\big|\nu(\varphi_\epsilon,\theta_\epsilon)\nabla T_R(\theta_\epsilon) - \nu(\varphi,\theta)\nabla T_R(\theta)\big|^2\,dx\,dt = 0$$

for every $R > 0$. In particular, $T_R(\theta_\epsilon)$ converges strongly to $T_R(\theta)$ in $L^2(0,T';H^1_0(\Omega))$ for every $R > 0$ and every $T' < T$. All this follows from (26), (27) and (42), but is not immediate; for more details, we refer for instance to [18, Appendix E].

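This shows that there exists a subsequence, still indexed by $\epsilon$, such that the following holds for any $\beta \in W^{2,\infty}(\mathbb{R})$ with $\operatorname{Supp}\beta' \subset [-R,R]$:

$$\theta_\epsilon \to \theta \ \text{and}\ \beta(\theta_\epsilon) \to \beta(\theta) \ \text{weakly in } L^q(0,T;W^{1,q}_0(\Omega)) \cap L^\tau(Q), \tag{43}$$

$$T_R(\theta_\epsilon) \to T_R(\theta) \ \text{strongly in } L^2(0,T;H^1_0(\Omega)). \tag{44}$$

Furthermore, by multiplying (21) by $\beta'(\theta_\epsilon)$, we also see that

$$\beta(\theta_\epsilon)_t - \operatorname{div}\big(\kappa(\varphi_\epsilon,\theta_\epsilon)\nabla\beta(\theta_\epsilon)\big) + \operatorname{div}\big(u_\epsilon\,\beta(\theta_\epsilon)\big) + \kappa(\varphi_\epsilon,\theta_\epsilon)\,\beta''(\theta_\epsilon)\,|\nabla\theta_\epsilon|^2 = \beta'(\theta_\epsilon)\,g_\epsilon \quad \text{in } Q. \tag{45}$$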
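The formal computation behind (45) is the chain rule applied to each term of (21). For brevity, the sketch below writes $\kappa$ for $\kappa(\varphi_\epsilon,\theta_\epsilon)$, $\theta$ for $\theta_\epsilon$ and $u$ for $u_\epsilon$; it is a pointwise identity for smooth functions, not a justification of the multiplication step itself.

```latex
\beta'(\theta)\,\theta_t = \beta(\theta)_t ,

\beta'(\theta)\,\operatorname{div}(\kappa\nabla\theta)
  = \operatorname{div}\big(\kappa\,\beta'(\theta)\nabla\theta\big) - \kappa\,\nabla\beta'(\theta)\cdot\nabla\theta
  = \operatorname{div}\big(\kappa\nabla\beta(\theta)\big) - \kappa\,\beta''(\theta)\,|\nabla\theta|^2 ,

\beta'(\theta)\,u\cdot\nabla\theta = u\cdot\nabla\beta(\theta)
  = \operatorname{div}\big(u\,\beta(\theta)\big) \quad \text{since } \operatorname{div} u = 0 .
```

Multiplying (21) by $\beta'(\theta_\epsilon)$ and substituting these three identities gives exactly the left-hand side of (45), while the right-hand side becomes $\beta'(\theta_\epsilon)\,g_\epsilon$.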

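Let us multiply (45) by a test function $\eta \in C^1([0,T];H^1_0(\Omega)) \cap L^\infty(Q)$ such that $\eta(t) = 0$ in a neighborhood of $T$, and let us integrate over $Q$. After some usual integrations by parts, using (22) and the properties of $\eta$, we get:

$$
\begin{aligned}
&-\iint_Q \beta(\theta_\epsilon)\,\eta_t\,dx\,dt + \iint_Q \kappa(\varphi_\epsilon,\theta_\epsilon)\nabla\beta(\theta_\epsilon)\cdot\nabla\eta\,dx\,dt + \iint_Q \kappa(\varphi_\epsilon,\theta_\epsilon)\nabla\theta_\epsilon\cdot\nabla\beta'(\theta_\epsilon)\,\eta\,dx\,dt + \iint_Q \big(u_\epsilon\cdot\nabla\beta(\theta_\epsilon)\big)\,\eta\,dx\,dt\\
&\qquad = \iint_Q \beta'(\theta_\epsilon)\,g_\epsilon\,\eta\,dx\,dt + \int_\Omega \beta(\theta_{0\epsilon})\,\eta(x,0)\,dx.
\end{aligned}
$$

Thanks to (42) and (43)–(44), we can let $\epsilon \to 0$ in this identity. This gives (7) for functions $\eta$ of this kind. By a standard density argument, we deduce (7) for all $\eta \in C^1([0,T];H^1_0(\Omega)) \cap L^\infty(Q)$ with $\eta|_{t=T} = 0$.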

This ends the proof of theorem 2.


Remark 1. The existence of weak-renormalized solutions to other related systems has been established in other papers; see for instance [?] and [2]; see also [15] for the case of a viscous, compressible and heat-conducting fluid.

Remark 2. If we neglect convection and omit the transport term $(u\cdot\nabla)u$ in the motion equation (3), the argument used in the proof of theorem 2 remains valid for $N = 3$. On the other hand, the uniqueness of the weak-renormalized solution to (1)–(5) is unknown even when $N = 2$ and the coefficients $\nu$ and $\kappa$ are constant.

Remark 3. It is readily seen that the previous proof of theorem 2 does not work in the case $N = 3$. Indeed, the strong convergence in $L^2(Q)^{3\times 3}$ of the gradients of the approximate velocity fields is out of reach; in fact, this is a major difficulty even for similar approximations of the Navier-Stokes equations. Unfortunately, we do need this convergence to pass to the limit in the equation for $\theta_\epsilon$ if we are looking for a weak-renormalized solution in the sense of definition 1. Hence, in the three-dimensional case, it seems appropriate to reformulate the problem, perhaps in terms of other variables; see [16] for some partial results for three-dimensional flows, and also [9], where a three-dimensional problem close to (1)–(5) with Fourier-Navier (slip) conditions on $u$ has been solved satisfactorily.

References

[1] D. M. Anderson, A phase-field model of solidification with convection, Phys. D 135, 175–194.

[2] A. Attaoui, Weak-renormalized solution for a nonlinear Boussinesq system, Differential Integral Equations 22 (5–6), 465–494.

[3] C. Beckermann, Modeling melt convection in phase-field simulations of solidification, J. Comput. Phys. 154, 468.

[4] D. Blanchard, Existence and uniqueness of a renormalized solution for a fairly general class of nonlinear parabolic problems, J. Differential Equations 177 (2), 331–374.

[5] D. Blanchard, A few results on coupled systems of thermomechanics, in "On the notions of solution to nonlinear elliptic problems: results and developments", Quaderni di Matematica, Vol. 23, 145–182.

[6] D. Blanchard, Truncations and monotonicity methods for parabolic equations, Nonlinear Anal. Theory Methods Appl. 21, No. 10, 725–743.

[7] L. Boccardo, Nonlinear elliptic and parabolic equations involving measure data, J. Funct. Anal. 87, 149–169.

[8] N. Bruyère, Existence et unicité de la solution faible-renormalisée pour un système non linéaire de Boussinesq, C. R. Math. Acad. Sci. Paris 346, No. 9–10, 521–526.


[9] M. Bulíček, A Navier-Stokes-Fourier system for incompressible fluids with temperature dependent material coefficients, Nonlinear Anal. Real World Appl. 10, No. 2, 992–1015.

[10] G. Caginalp, An analysis of a phase field model of a free boundary, Arch. Rational Mech. Anal. 92, 205–245.

[11] G. Caginalp, A derivation and analysis of phase field models of thermal alloy, Ann. Physics 237, 66–107.

[12] R. J. DiPerna, On the Cauchy problem for Boltzmann equations, Ann. of Math. 130, 321–366.

[13] R. J. DiPerna, On the Fokker-Planck-Boltzmann equation, Comm. Math. Phys. 120, 1–23.

[14] L. C. Evans, Partial differential equations, Vol. 19, American Mathematical Society, Providence, RI.

[15] E. Feireisl, On the motion of a viscous, compressible, and heat conducting fluid, Indiana Univ. Math. J. 53, No. 6, 1705–1738.

[16] J. Frehse, Renormalized estimates for solutions to the Navier-Stokes equation, Funct. Approx. Comment. Math. 40, part 1, 11–32.

[17] J.-L. Lions, Quelques méthodes de résolution des problèmes aux limites non linéaires, Dunod, Gauthier-Villars, Paris.

[18] P.-L. Lions, Mathematical topics in fluid mechanics, Vol. 1, Clarendon Press, Oxford.

[19] M. T. G. Montesinos, Renormalized solutions to a nonlinear parabolic-elliptic system, SIAM J. Math. Anal. 36, No. 6, 1991–2003.

[20] C. Morosanu, A generalized phase-field system, J. Math. Anal. Appl. 273, 515–540.

[21] L. Nirenberg, On elliptic partial differential equations, Ann. Scuola Norm. Sup. Pisa 13, 116–162.

[22] R. Temam, Navier-Stokes equations, American Mathematical Society.


SeMA Journal

no. 59 (2012), 19–35

AN ASYMPTOTIC STUDY TO EXPLAIN THE ROLE OF ACTIVE TRANSPORT IN MODELS WITH COUNTERCURRENT EXCHANGERS

MAGALI TOURNUS

UPMC Univ Paris 06, CNRS UMR 7598, Laboratoire Jacques-Louis Lions, F-75005, Paris and

INSERM UMRS872, and CNRS ERL7226, Laboratoire de génomique, physiologie et physiopathologie rénales, Centre de Recherche des Cordeliers, 75006, Paris, France

[email protected]

Abstract

We study a solute concentrating mechanism that can be represented by coupled transport equations with specific boundary conditions. Our motivation for considering this system is the urine concentrating mechanism in nephrons. The model consists of three tubes arranged in a countercurrent manner. Our equations describe a countercurrent exchanger, with a parameter $V$ which quantifies the active transport. In order to understand the role of active transport in the mechanism, we consider the limit $V \to \infty$. We prove that when $V$ goes to infinity, the system converges to a profile which stays uniformly bounded in $V$ and which presents a boundary layer at the border of the domain. The effect is that the solute is concentrated at a specific point in the tubes. When considering urine concentration, this is physiologically optimal because the composition of the final urine is determined at this point.

Mathematics Subject Classification: 34B18, 34E99, 92C30

Key-words: Countercurrent, active transport, asymptotic analysis, boundary layer, urine concentration, kidney physiology.

1 Introduction

Many problems occurring in biology or physiology [14, 9] can be described by transport equations with different propagation velocities that combine to produce specific effects. Typical cases are the propagation of particle waves in neurons [5], or diphasic propagation arising in chemical engineering, which can describe chromatography or distillation [7]. Usually, the stationary states are of interest, and their study constitutes a classical field of analysis (see for instance [13] and the references therein).

Received: November 9, 2011. Accepted: May 31, 2012.


20 M. Tournus

Here, we study a model of a countercurrent exchanger combined with an active transport pump [6]. Countercurrent exchanges across parallel tubes can be used to build up concentration or heat gradients. Our equations come from the modeling of kidney nephrons, in which a concentration gradient is amplified by an active transport pump [16], which plays a fundamental role in urine concentration [11]. In this particular study, we investigate the effects of active transport using a limiting case.

[Figure 1: Tube 1, Tube 2 and Tube 3 between x = 0 and x = L, showing the fluid flow, the solute movements across the walls, and the active pump.]

Figure 1: Representation of the 3-tube architecture in which the fluid circulates. Tubes are water-impermeable but can exchange solutes with the bath. $C_1^0$, $C_2^0$ and $C_3(L) = C_2(L)$

The model consists of a fluid circulating at a constant velocity in three tubes arranged in a countercurrent architecture. The three tubes are bathed in a common bath in which no solute can accumulate. Each tube can exchange solute with the bath, and solute transport across the tube walls is driven by diffusion in all tubes and by an active pump in tube 3. This active pump extracts solute from tube 3 and carries it into the bath, and it is assumed to follow Michaelis-Menten kinetics. We call $V$ the maximum rate achieved by the pump at saturating concentrations, and $C_i(x)$ the solute concentration in tube $i$ at depth $x$. The nonlinearity $V\frac{C_3}{1+C_3}$ represents the effect of active pumps along tube 3. The fluid enters tube 1 with concentration $C_1^0$ and tube 2 with concentration $C_2^0$. The outlets of tubes 1 and 3 are open, and at $x = L$ we have $C_2(L) = C_3(L)$. See Figure 1 for a drawing of the system. The stationary state is of particular interest in renal physiology, given that the kidney acts to preserve homeostasis.


Asymptotic study to the role of active transport with countercurrent exchangers 21

The differential system satisfied by $C_1, C_2, C_3$ is written as

$$\begin{cases}
\dfrac{dC_1}{dx}(x) = \dfrac13\Big(C_1(x) + C_2(x) + C_3(x) + V\dfrac{C_3(x)}{1+C_3(x)}\Big) - C_1(x),\\[2mm]
\dfrac{dC_2}{dx}(x) = \dfrac13\Big(C_1(x) + C_2(x) + C_3(x) + V\dfrac{C_3(x)}{1+C_3(x)}\Big) - C_2(x),\\[2mm]
-\dfrac{dC_3}{dx}(x) = \dfrac13\Big(C_1(x) + C_2(x) + C_3(x) + V\dfrac{C_3(x)}{1+C_3(x)}\Big) - C_3(x) - V\dfrac{C_3(x)}{1+C_3(x)},\\[2mm]
C_1(0) = C_1^0, \quad C_2(0) = C_2^0, \quad C_3(L) = C_2(L).
\end{cases} \tag{1}$$
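To make the structure of (1) concrete, here is a small Python sketch of its right-hand side. The function name `rhs_system1` and the sample states are our own choices for illustration. The sketch checks numerically the identity behind (10): for any state, $\frac{dC_1}{dx} + \frac{dC_2}{dx} - \frac{dC_3}{dx} = 0$, because the bath term appears three times and the pump term cancels.

```python
def rhs_system1(c1, c2, c3, V):
    """Return (dC1/dx, dC2/dx, dC3/dx) prescribed by system (1) at a point x."""
    pump = V * c3 / (1.0 + c3)           # Michaelis-Menten active transport out of tube 3
    bath = (c1 + c2 + c3 + pump) / 3.0   # common bath term shared by the three lines
    d1 = bath - c1
    d2 = bath - c2
    d3 = -(bath - c3 - pump)             # the third line of (1) is written for -dC3/dx
    return d1, d2, d3


if __name__ == "__main__":
    for state in [(2.0, 1.0, 0.5), (0.1, 3.0, 2.0), (1.0, 1.0, 10.0)]:
        for V in (1.0, 100.0):
            d1, d2, d3 = rhs_system1(*state, V)
            # d/dx (C1 + C2 - C3) vanishes up to rounding error
            print(state, V, d1 + d2 - d3)
```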

The specific boundary conditions relate the solutions at different points and make this system more than a mere ordinary differential equation. We call $C^V = (C_1^V, C_2^V, C_3^V)$ the solution of (1). Through the analysis of the system for large values of $V$, we wish to explain why this combination of active pump and boundary conditions (tube arrangement) is so effective at concentrating the solute at $x = L$, where the composition of the final urine is determined. We already know [17] that each $C_i^V$ is continuous and nonnegative on $[0, L]$. The question we want to answer is: how do the solutions of (1) behave when $V$ tends to $\infty$?

Other asymptotic studies have been done for similar systems in the context of hyperbolic relaxation, where a parameter is assumed to be small in comparison to the typical size of the problem [12, 8, 7]. This approach comes from the concept of mean free path in the Boltzmann equation [3]. For example, in [5], the length of the domain is large, and the authors establish the asymptotic behavior of the solution in the limiting case of an infinite domain.

In our case, to answer our question, we prove that $C^V$ converges toward a limit $C = (C_1, C_2, C_3)$ that we calculate. Our analysis uses only direct a priori estimates and weak limits obtained by compact embeddings; it does not use the specific smooth form of the nonlinearity, which makes it very general. We identify the limit as $V \to \infty$ completely, including boundary layers. Compact embeddings give us the convergence of some particular subsequences, but since we show that the limit only depends on the problem data, we are able to prove that the whole sequence converges. The boundary layers come from the particular boundary conditions in the model, which can be seen as reflection conditions and make the problem specific and interesting.

We state our main results in the next section. Sections 3 and 4 are devoted to the proofs of the asymptotic results. Numerical illustrations are given in Section 5.

2 Main results

It is possible to identify completely the profiles of the limiting values of the solutions $C_1, C_2, C_3$ almost everywhere as $V \to \infty$. This is stated in the following theorem.

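Theorem 1 (Asymptotics) Solutions to (1) satisfy

$$C_1^V \xrightarrow[V\to+\infty]{} C_1, \qquad C_2^V \xrightarrow[V\to+\infty]{} C_2, \qquad C_3^V \xrightarrow[V\to+\infty]{} C_3, \qquad \text{in } L^p\ (1 \le p < \infty) \text{ and a.e.}, \tag{2}$$

with

$$C_1(x) = \frac{C_1^0 + C_2^0}{2} + \frac{C_1^0 - C_2^0}{2}e^{-x}, \qquad C_2(x) = \frac{C_1^0 + C_2^0}{2} + \frac{C_2^0 - C_1^0}{2}e^{-x}, \qquad C_3(x) = 0 \ \text{a.e.} \tag{3}$$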
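The limit profiles (3) are explicit. The short Python sketch below (the name `limit_profiles` is ours) evaluates them and checks two properties used in the proofs: $C_i(0) = C_i^0$ and $C_1 + C_2 \equiv C_1^0 + C_2^0$. These formulas describe the limit for $x < L$ only; they do not capture the boundary layer at $x = L$.

```python
import math


def limit_profiles(x, c10, c20):
    """Limit profiles (3) of Theorem 1, valid for 0 <= x < L."""
    mean = 0.5 * (c10 + c20)          # (C_1^0 + C_2^0) / 2
    half_gap = 0.5 * (c10 - c20)      # (C_1^0 - C_2^0) / 2
    c1 = mean + half_gap * math.exp(-x)
    c2 = mean - half_gap * math.exp(-x)
    c3 = 0.0                          # tube 3 is emptied by the pump in the limit
    return c1, c2, c3


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(x, limit_profiles(x, c10=2.0, c20=1.0))
```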

This result is sharp in the sense that a boundary layer occurs, as we will see, so that the convergence does not hold in $L^\infty$. To state our next result, we need to define the quantity

$$M = \operatorname*{ess\,inf}\Big\{\frac{1}{1 + C_3^V(x)} \;;\; x \in [0, L],\ V \in \mathbb{R}_+\Big\}. \tag{4}$$

We prove in the next section that $M > 0$. The second result is more precise and states that $C_3^V$ decreases exponentially fast to zero away from $x = L$. We also describe the boundary layer that appears at $x = L$.

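Theorem 2 (The boundary layer) The limits of the boundary values are

$$C_1^V(L) \xrightarrow[V\to+\infty]{} C_1^0 + C_2^0, \qquad C_2^V(L) = C_3^V(L) \xrightarrow[V\to+\infty]{} C_1^0 + C_2^0 + (C_2^0 - C_1^0)e^{-L}. \tag{5}$$

The behavior of $C_3^V$ for $x \simeq L$ is given by the inequalities

$$C_3^V(x) \le C_3^V(L)\exp\Big(-\frac23 VM(L-x)\Big) + \frac{\overline{K}}{V}\Big(1 - \exp\Big(-\frac23 VM(L-x)\Big)\Big), \tag{6}$$

$$C_3^V(x) \ge C_3^V(L)\exp\Big(-\frac23 V(L-x)\Big) + \frac{\underline{K}}{V}\Big(1 - \exp\Big(-\frac23 V(L-x)\Big)\Big), \tag{7}$$

where $\overline{K}$ and $\underline{K}$ are two constants which do not depend on $V$.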
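The limits (5) can be cross-checked against the profiles (3): across $x = L$, both $C_1$ and $C_2$ should jump by $B/2$, where $B$ is the limit of $C_2^V(L)$, in agreement with the Dirac masses $\frac12 B\delta_L$ that appear later in (28). The Python sketch below (function names are ours) performs this check for arbitrary data.

```python
import math


def boundary_limits(c10, c20, L):
    """Limits (5): returns (lim C1^V(L), lim C2^V(L) = lim C3^V(L))."""
    b_prime = c10 + c20
    b = c10 + c20 + (c20 - c10) * math.exp(-L)
    return b_prime, b


def profiles_before_layer(c10, c20, L):
    """Values of the limit profiles (3) just before the boundary layer, x -> L^-."""
    mean, half_gap = 0.5 * (c10 + c20), 0.5 * (c10 - c20)
    return mean + half_gap * math.exp(-L), mean - half_gap * math.exp(-L)


if __name__ == "__main__":
    c10, c20, L = 2.0, 1.0, 4.0
    b_prime, b = boundary_limits(c10, c20, L)
    c1_left, c2_left = profiles_before_layer(c10, c20, L)
    # each of C1 and C2 jumps by B/2 across the layer
    print(b_prime - c1_left, b - c2_left, b / 2)
```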

The next section is dedicated to the proof of these results.

3 Proof of the asymptotic results (Theorem 1)

First step: Uniform bounds on the solution.

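Lemma 3 There is a constant $K$ depending only on $C_1^0$, $C_2^0$ and $L$, but not on $V$, such that

$$C_1^V(L) \le K, \qquad C_3^V(0) \le C_1^0 + C_2^0, \tag{8}$$

$$\int_0^L C_i^V(x)\,dx \le K, \qquad V\int_0^L C_3^V(x)\,dx \le K, \qquad \int_0^L \Big|\frac{dC_3^V}{dx}(x)\Big|\,dx \le K, \qquad 0 \le C_i^V \le K. \tag{9}$$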


Proof. To prove (8), we sum the three lines of (1) and obtain a quantity which does not depend on $x$,

$$C_1^V(x) + C_2^V(x) - C_3^V(x) =: K(V). \tag{10}$$

Using the boundary values, we find uniform bounds on $K(V)$:

$$K(V) = C_1^0 + C_2^0 - C_3^V(0) \le C_1^0 + C_2^0, \qquad K(V) = C_1^V(L) + C_2^V(L) - C_3^V(L) = C_1^V(L) \ge 0, \tag{11}$$

and thus

$$0 \le K(V) \le C_1^0 + C_2^0. \tag{12}$$

The combination of (11) and (12) proves (8).

Then, we prove the first two bounds in (9). The first equation of (1) can be written

$$\frac{dC_1^V}{dx}(x) + C_1^V(x) = Q^V(x) \ge 0, \tag{13}$$

with

$$Q^V(x) = \frac13\Big(C_1^V(x) + C_2^V(x) + C_3^V(x) + V\frac{C_3^V(x)}{1+C_3^V(x)}\Big).$$

Therefore we also have

$$\frac{d}{dx}\big(C_1^V(x)e^x\big) = Q^V(x)e^x.$$

By integration over $[0, L]$, we obtain

$$\int_0^L Q^V(x)\,dx \le \int_0^L Q^V(x)e^x\,dx = C_1^V(L)e^L - C_1^0 \le (C_1^0 + C_2^0)e^L. \tag{14}$$

Since $Q^V$ is one third of a sum of nonnegative terms, we conclude that

$$\int_0^L V\frac{C_3^V(x)}{1+C_3^V(x)}\,dx, \qquad \int_0^L C_i^V(x)\,dx, \quad i = 1, 2, 3, \tag{15}$$

are uniformly bounded by $3(C_1^0 + C_2^0)e^L$. Then, by injecting (14) and (15) into (13) and into the corresponding equations for $C_2^V$ and $C_3^V$, and because the $C_i^V$ are nonnegative, we obtain a constant $K_1$ depending only on $C_1^0$, $C_2^0$ and $L$ such that

$$\int_0^L \Big|\frac{dC_i^V}{dx}(s)\Big|\,ds \le K_1, \qquad i = 1, 2, 3. \tag{16}$$

We finally prove that the functions $C_i^V$ are uniformly bounded in $V$. We write

$$|C_i^V(x)| = \Big|C_i^V(0) + \int_0^x \frac{dC_i^V}{dx}(s)\,ds\Big| \le |C_i^V(0)| + \int_0^L \Big|\frac{dC_i^V}{dx}(s)\Big|\,ds.$$

Thanks to (16) and (8), we conclude

$$\|C_i^V\|_\infty \le C_1^0 + C_2^0 + K_1. \tag{17}$$

The upper bound (17) on $C_3^V$ gives us $M \ge \frac{1}{1 + C_1^0 + C_2^0 + K_1} > 0$.

Second step: The behavior of $C_i^V$ when $V \to \infty$.

Lemma 4 After extraction of a subsequence,

$$C_1^V \xrightarrow[V\to+\infty]{} C_1, \qquad C_2^V \xrightarrow[V\to+\infty]{} C_2, \qquad C_3^V \xrightarrow[V\to+\infty]{} 0, \qquad \text{in } L^p\ (1 \le p < \infty) \text{ and a.e.}, \tag{18}$$

and $C_1 + C_2 = K_0$ for some constant $K_0$.

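Proof. We know from Lemma 3 that $C_i^V$ is bounded in $BV$; then, using the Rellich-Kondrachov compact embedding [4], after extraction,

$$C_i^V \xrightarrow[V\to+\infty]{} C_i \quad \text{in } L^p\ (1 \le p < \infty) \text{ and a.e.} \tag{19}$$

On the other hand, we have, thanks to (15),

$$\int_0^L \frac{C_3^V(x)}{1+C_3^V(x)}\,dx \xrightarrow[V\to+\infty]{} 0,$$

and thus

$$C_3 \equiv 0 \quad \text{a.e.} \tag{20}$$

Combining (10) with (20), and extracting again so that $K(V) \to K_0$ (possible by (12)), we have $C_1 + C_2 = K_0$ for some constant $K_0$.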

Third step: The behavior of $\frac{dC_3^V}{dx}$. We denote by $\mathcal{M}^1[0, L]$ the set of Radon measures on $[0, L]$, endowed with the weak convergence of measures.

Lemma 5 There exists a nonnegative constant $B$ such that, after extraction,

$$C_3^V(L) \xrightarrow[V\to+\infty]{} B, \qquad \frac{dC_3^V}{dx} \xrightarrow[V\to+\infty]{} B\,\delta_{x=L} \quad \text{in } \mathcal{M}^1[0, L].$$

Proof. The bound (16) implies that $\frac{dC_3^V}{dx}$ is bounded in $L^1[0, L]$; then [2] there exists a Radon measure $\mu \in \mathcal{M}^1[0, L]$ such that, after extraction,

$$\frac{dC_3^V}{dx} \xrightarrow[V\to+\infty]{} \mu \quad \text{in the sense of measures.} \tag{21}$$

For all functions $\varphi \in C^1[0, L]$ such that $\varphi(0) = \varphi(L) = 0$, we have, using (18),

$$\int_0^L \varphi(x)\frac{dC_3^V}{dx}(x)\,dx = -\int_0^L C_3^V(x)\frac{d\varphi}{dx}(x)\,dx \xrightarrow[V\to+\infty]{} 0, \tag{22}$$

which means

$$\mu = 0 \quad \text{on } (0, L). \tag{23}$$

Therefore, we can write, in the sense of measures,

$$\frac{dC_3^V}{dx} \xrightarrow[V\to+\infty]{} \beta\,\delta_{x=L} + \alpha\,\delta_{x=0}. \tag{24}$$

It remains to compute $\alpha$ and $\beta$. To do so, we notice that $C_3^V(L)$ and $C_3^V(0)$ are both bounded real sequences, so there are two nonnegative real numbers $A$, $B$ such that, after extraction,

$$\lim_{V\to\infty} C_3^V(L) = B \ge 0, \qquad \lim_{V\to\infty} C_3^V(0) = A \ge 0. \tag{25}$$

For $\varphi \in C^1[0, L]$, we compute

$$\int_0^L \varphi(x)\frac{dC_3^V}{dx}(x)\,dx = C_3^V(L)\varphi(L) - C_3^V(0)\varphi(0) - \int_0^L C_3^V(x)\frac{d\varphi}{dx}(x)\,dx \xrightarrow[V\to+\infty]{} B\varphi(L) - A\varphi(0),$$

which means

$$\frac{dC_3^V}{dx} \xrightarrow[V\to+\infty]{} B\,\delta_{x=L} - A\,\delta_{x=0} \quad \text{in the sense of measures.} \tag{26}$$

We still have to prove $A = 0$. To do so, we use the third equation of (1), which gives

$$\frac{dC_3^V}{dx} \ge -\frac13\big(C_1^V + C_2^V\big).$$

As we know from Lemma 3 that $C_i^V$ is uniformly bounded from above by $K$, we also have

$$\frac{dC_3^V}{dx} \ge -\frac23 K.$$

Hence, for every nonnegative $\varphi \in C^1[0, L]$ with $\varphi(L) = 0$, passing to the limit gives $-A\varphi(0) \ge -\frac23 K\int_0^L \varphi\,dx$. Choosing $\varphi(0) = 1$ and $\int_0^L \varphi\,dx$ arbitrarily small yields $A \le 0$, which implies $A = 0$.

Fourth step: The limiting equation.

Lemma 6 In the limit $V \to \infty$, we have $C_1(x) + C_2(x) = C_1^0 + C_2^0$ and

$$V\frac{C_3^V}{1+C_3^V} - \frac{C_1^V + C_2^V}{2} \longrightarrow \frac32 B\,\delta_L \quad \text{in the sense of measures.}$$


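Proof. We deduce from (19) and (26), by injecting them into the third line of (1), that

$$V\frac{C_3^V}{1+C_3^V} - \frac{C_1^V + C_2^V}{2} - \frac32 B\,\delta_L \longrightarrow 0 \quad \text{in the sense of measures.} \tag{27}$$

Reinjecting this into the first two lines of (1), we find the limit equations on $C_1$ and $C_2$:

$$\begin{cases}
\dfrac{dC_1}{dx} = -\dfrac12 C_1 + \dfrac12 C_2 + \dfrac12 B\,\delta_L,\\[2mm]
\dfrac{dC_2}{dx} = -\dfrac12 C_2 + \dfrac12 C_1 + \dfrac12 B\,\delta_L,\\[2mm]
C_1(0) = C_1^0, \quad C_2(0) = C_2^0.
\end{cases} \tag{28}$$

Then, summing the two lines,

$$\frac{d(C_1 + C_2)}{dx} = B\,\delta_L, \qquad (C_1 + C_2)(0) = C_1^0 + C_2^0. \tag{29}$$

By integrating this differential equation, we deduce [15] that

$$C_1(x) + C_2(x) = C_1^0 + C_2^0 \quad \text{a.e.}$$

Indeed, since the Dirac mass at $x = L$ is not seen by test functions vanishing at $L$, the weak formulation of (29) gives

$$\forall \varphi \in C^1[0, L] \text{ with } \varphi(L) = 0, \qquad \int_0^L \frac{d\varphi}{dx}(x)\,[C_1 + C_2](x)\,dx + \varphi(0)\,[C_1^0 + C_2^0] = 0. \tag{30}$$

By choosing $\varphi$ such that $\varphi(0) = 0$, we obtain $C_1 + C_2 \equiv \alpha$ a.e. for some constant $\alpha$; then, by choosing any such $\varphi$ with $\varphi(0) \neq 0$, we find $\alpha = C_1^0 + C_2^0$.

The limit equation on $C_i$ then becomes

$$\frac{dC_i}{dx}(x) = -C_i(x) + \frac{C_1^0 + C_2^0}{2} + \frac12 B\,\delta_L(x), \qquad C_i(0) = C_i^0, \qquad i = 1, 2.$$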

Fifth step: Explicit solution for the limit. Using the variation of parameters, we easily compute $C_1$ and $C_2$. We find

$$C_1(x) = \frac{C_1^0 + C_2^0}{2} + \frac{C_1^0 - C_2^0}{2}e^{-x}, \qquad C_2(x) = \frac{C_1^0 + C_2^0}{2} + \frac{C_2^0 - C_1^0}{2}e^{-x}. \tag{31}$$

In particular, $(C_1, C_2, C_3)$ are $C^\infty$ functions.
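For completeness, the variation-of-parameters step can be written out. On $[0, L)$ the Dirac mass at $x = L$ plays no role, and for $i = 1, 2$,

$$C_i(x) = e^{-x}C_i^0 + \int_0^x e^{-(x-s)}\,\frac{C_1^0 + C_2^0}{2}\,ds = \frac{C_1^0 + C_2^0}{2} + \Big(C_i^0 - \frac{C_1^0 + C_2^0}{2}\Big)e^{-x},$$

and the coefficients $C_1^0 - \frac{C_1^0 + C_2^0}{2} = \frac{C_1^0 - C_2^0}{2}$ and $C_2^0 - \frac{C_1^0 + C_2^0}{2} = \frac{C_2^0 - C_1^0}{2}$ give (31).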


4 Proof of Theorem 2

The limiting profiles are $C^\infty$ in $[0, L]$; nevertheless, the Dirac mass at $x = L$ indicates a boundary layer. The derivatives of the profiles for $V = \infty$ are given, in the sense of measures, by

$$\frac{dC_1}{dx} = \frac12\Big((C_2^0 - C_1^0)e^{-x} + B\,\delta_L\Big), \qquad \frac{dC_2}{dx} = \frac12\Big((C_1^0 - C_2^0)e^{-x} + B\,\delta_L\Big), \qquad \frac{dC_3}{dx} = B\,\delta_L. \tag{32}$$

First step: The limiting values of $C^V$ at $x = L$.

Lemma 7

$$C_1^V(L) \xrightarrow[V\to+\infty]{} C_1^0 + C_2^0, \qquad C_2^V(L) = C_3^V(L) \xrightarrow[V\to+\infty]{} C_1^0 + C_2^0 + (C_2^0 - C_1^0)e^{-L}. \tag{33}$$

Proof. We have already defined in Lemma 5

$$B = \lim_{V\to\infty} C_2^V(L) = \lim_{V\to\infty} C_3^V(L).$$

We know that the $C_i^V(L)$ are bounded real numbers, so, after extraction, we can define

$$B' = \lim_{V\to\infty} C_1^V(L).$$

Our first task is to determine $B$. We compute, for all $\varphi \in C^1[0, L]$,

$$\int_0^L \varphi(x)\frac{dC_2^V}{dx}(x)\,dx = \varphi(L)C_2^V(L) - \varphi(0)C_2^V(0) - \int_0^L \frac{d\varphi}{dx}(x)\,C_2^V(x)\,dx,$$

which converges, when $V \to +\infty$, toward

$$\begin{aligned}
&B\varphi(L) - C_2^0\varphi(0) - \int_0^L \frac{d\varphi}{dx}(x)\Big[\frac{C_1^0 + C_2^0}{2} + \frac{C_2^0 - C_1^0}{2}e^{-x}\Big]dx\\
&= B\varphi(L) - C_2^0\varphi(0) - \frac{C_1^0 + C_2^0}{2}\varphi(L) + \frac{C_1^0 + C_2^0}{2}\varphi(0) - \frac{C_2^0 - C_1^0}{2}\Big(e^{-L}\varphi(L) - \varphi(0) + \int_0^L e^{-x}\varphi(x)\,dx\Big)\\
&= B\varphi(L) - \frac{C_1^0 + C_2^0}{2}\varphi(L) - \frac{C_2^0 - C_1^0}{2}e^{-L}\varphi(L) + \frac{C_1^0 - C_2^0}{2}\int_0^L e^{-x}\varphi(x)\,dx.
\end{aligned} \tag{34}$$

On the other hand, thanks to (32),

$$\int_0^L \varphi(x)\frac{dC_2^V}{dx}(x)\,dx \xrightarrow[V\to+\infty]{} \frac{C_1^0 - C_2^0}{2}\int_0^L e^{-x}\varphi(x)\,dx + \frac{B}{2}\varphi(L). \tag{35}$$

By equating (34) and (35), we find

$$B = C_1^0 + C_2^0 + (C_2^0 - C_1^0)e^{-L}, \tag{36}$$

which is the unique limit of $C_2^V(L)$ and $C_3^V(L)$. In particular, $B > 0$. Our second task is to obtain $B'$. We perform the same computation for $C_1^V$. On the one hand, for all $\varphi \in C^1[0, L]$,

$$\int_0^L \varphi(x)\frac{dC_1^V}{dx}(x)\,dx = \varphi(L)C_1^V(L) - \varphi(0)C_1^V(0) - \int_0^L \frac{d\varphi}{dx}(x)\,C_1^V(x)\,dx,$$

which converges, when $V \to +\infty$, toward

$$B'\varphi(L) - \frac{C_1^0 + C_2^0}{2}\varphi(L) + \frac{C_2^0 - C_1^0}{2}e^{-L}\varphi(L) + \frac{C_2^0 - C_1^0}{2}\int_0^L e^{-x}\varphi(x)\,dx, \tag{37}$$

and, on the other hand, $\int_0^L \varphi(x)\frac{dC_1^V}{dx}(x)\,dx$ converges toward

$$\frac{C_2^0 - C_1^0}{2}\int_0^L e^{-x}\varphi(x)\,dx + \frac{B}{2}\varphi(L). \tag{38}$$

This gives us

$$B' = C_1^0 + C_2^0, \tag{39}$$

and ends the proof of Lemma 7.

We proved in passing that the limits of the subsequences we deal with are determined only by the problem data and do not depend on the chosen subsequence. Thus, the whole sequences converge.

Second step: The boundary layer.

Lemma 8 For all $x \in [0, L]$, the following inequalities hold:

$$C_3^V(x) \le C_3^V(L)\exp\Big(-\frac23 VM(L-x)\Big) + \frac{\overline{K}}{V}\Big(1 - \exp\Big(-\frac23 VM(L-x)\Big)\Big), \tag{40}$$

$$C_3^V(x) \ge C_3^V(L)\exp\Big(-\frac23 V(L-x)\Big) + \frac{\underline{K}}{V}\Big(1 - \exp\Big(-\frac23 V(L-x)\Big)\Big), \tag{41}$$

which means that $C_3^V(x)$ relaxes to zero exponentially fast with $V$, away from the boundary layer at $x = L$.


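Proof. We can write the third line of (1) as

$$-\frac{dC_3^V}{dx}(x) + \frac23 V\frac{C_3^V(x)}{1+C_3^V(x)} = \frac13\Big(C_1^V(x) + C_2^V(x) - 2C_3^V(x)\Big). \tag{42}$$

We multiply this equation by the exponential factor

$$F(x) = \exp\Big(-\frac23 V\int_L^x \frac{1}{1+C_3^V(s)}\,ds\Big), \tag{43}$$

$$-\frac{dC_3^V}{dx}(x)F(x) + \frac23 V\frac{C_3^V(x)}{1+C_3^V(x)}F(x) = \frac13\Big(C_1^V(x) + C_2^V(x) - 2C_3^V(x)\Big)F(x), \tag{44}$$

and, since $\frac{dF}{dx} = -\frac23 V\frac{1}{1+C_3^V}F$, we obtain

$$\frac{d}{dx}\big(C_3^V(x)F(x)\big) = -\frac13\Big(C_1^V(x) + C_2^V(x) - 2C_3^V(x)\Big)F(x).$$

Integrating this equation between $x$ and $L$ (note that $F(L) = 1$), we find

$$C_3^V(x)F(x) - C_3^V(L) = \int_x^L \frac13\Big(C_1^V(u) + C_2^V(u) - 2C_3^V(u)\Big)\exp\Big(-\frac23 V\int_L^u \frac{ds}{1+C_3^V(s)}\Big)du.$$

By Lemma 3, the factor $\frac13(C_1^V + C_2^V - 2C_3^V)$ is bounded from above by $K' := \frac23 K$, so

$$C_3^V(x)\exp\Big(\frac23 V\int_x^L \frac{ds}{1+C_3^V(s)}\Big) \le C_3^V(L) + K'\int_x^L \exp\Big(-\frac23 V\int_L^u \frac{ds}{1+C_3^V(s)}\Big)du.$$

We can now complete our calculation. Dividing by $F(x)$ and using $\frac{F(u)}{F(x)} = \exp\big(-\frac23 V\int_x^u \frac{ds}{1+C_3^V(s)}\big)$ for $u \ge x$, we estimate

$$\begin{aligned}
C_3^V(x) &\le C_3^V(L)\exp\Big(-\frac23 V\int_x^L \frac{ds}{1+C_3^V(s)}\Big) + K'\int_x^L \exp\Big(-\frac23 V\int_x^u \frac{ds}{1+C_3^V(s)}\Big)du\\
&\le C_3^V(L)\exp\Big(-\frac23 V\int_x^L \frac{ds}{1+C_3^V(s)}\Big) + K'\int_x^L \exp\Big(-\frac23 VM(u-x)\Big)du\\
&= C_3^V(L)\exp\Big(-\frac23 V\int_x^L \frac{ds}{1+C_3^V(s)}\Big) + \frac{\overline{K}}{V}\Big(1 - \exp\Big(-\frac23 VM(L-x)\Big)\Big),
\end{aligned}$$

with $\overline{K} = \frac{3K'}{2M}$, where we have used that $0 < M \le \frac{1}{1+C_3^V}$. Using this inequality once more in the first term, $\int_x^L \frac{ds}{1+C_3^V(s)} \ge M(L-x)$, proves (40).

We can prove the second part of Lemma 8 in the same way.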


5 Numerical results

5.1 The numerical algorithm

Numerical simulations illustrate the solute concentration mechanism at $x = L$ that is proved in the theoretical results. The system (1) can be seen as the stationary state associated with the following dynamical problem:

$$\begin{cases}
\dfrac{\partial C_1}{\partial t}(x,t) + \dfrac{\partial C_1}{\partial x}(x,t) = \dfrac13\Big(C_1(x,t) + C_2(x,t) + C_3(x,t) + V\dfrac{C_3(x,t)}{1+C_3(x,t)}\Big) - C_1(x,t),\\[2mm]
\dfrac{\partial C_2}{\partial t}(x,t) + \dfrac{\partial C_2}{\partial x}(x,t) = \dfrac13\Big(C_1(x,t) + C_2(x,t) + C_3(x,t) + V\dfrac{C_3(x,t)}{1+C_3(x,t)}\Big) - C_2(x,t),\\[2mm]
\dfrac{\partial C_3}{\partial t}(x,t) - \dfrac{\partial C_3}{\partial x}(x,t) = \dfrac13\Big(C_1(x,t) + C_2(x,t) + C_3(x,t) + V\dfrac{C_3(x,t)}{1+C_3(x,t)}\Big) - C_3(x,t) - V\dfrac{C_3(x,t)}{1+C_3(x,t)},\\[2mm]
C_1(0,t) = C_1^0, \quad C_2(0,t) = C_2^0, \quad C_3(L,t) = C_2(L,t),
\end{cases} \tag{45}$$

which we complete with nonnegative initial concentrations $C_1(x,0)$, $C_2(x,0)$, $C_3(x,0)$ in $BV[0, L]$. We proved in [17] that for every nonnegative initial condition, the system relaxes in $L^1$ toward the unique solution to the stationary system (1). Denoting here by $\mathbf{C} = (C_1, C_2, C_3)$ the solution to (45) and by $\overline{\mathbf{C}} = (\overline{C}_1, \overline{C}_2, \overline{C}_3)$ the solution to (1), this reads

$$\|\mathbf{C}(\cdot,t) - \overline{\mathbf{C}}\|_{L^1} \xrightarrow[t\to\infty]{} 0. \tag{46}$$

To obtain this result, we prove that this is the case when the initial condition of (45) is a sub- or super-solution to (1); then we remark that every initial condition in $BV[0, L]^3$ can be squeezed between a sub- and a super-solution, and we conclude with a comparison principle and a monotonicity argument. In [17], we develop an algorithm to solve (45). This algorithm is based on a finite volume type method [1, 10]. We use a time step $\Delta t$ and a mesh of size $\Delta x = L/N$, with $N$ the number of cells $Q_k = (x_{k-1/2}, x_{k+1/2})$ (so that $x_{1/2} = 0$ and $x_{N+1/2} = L$). The discrete times are denoted by $t^n = n\Delta t$. We discretize the initial states as

$$C^{j,0}_k = \frac{1}{\Delta x}\int_{Q_k} C_j(x,0)\,dx, \qquad j = 1, 2, 3, \quad k = 1, \dots, N. \tag{47}$$


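We call $C^{j,n}_k$ the discrete solution at time $t^n$ in tube $j$ that approximates (45), for $k = 1, \dots, N$. We use the scheme

$$\begin{aligned}
C^{1,n+1}_k &= C^{1,n}_k - \frac{\Delta t}{\Delta x}\big(C^{1,n}_k - C^{1,n}_{k-1}\big) + \Delta t\Big[\frac13\Big(C^{1,n}_k + C^{2,n}_k + C^{3,n}_k + V\frac{C^{3,n}_k}{1+C^{3,n}_k}\Big) - C^{1,n}_k\Big],\\
C^{2,n+1}_k &= C^{2,n}_k - \frac{\Delta t}{\Delta x}\big(C^{2,n}_k - C^{2,n}_{k-1}\big) + \Delta t\Big[\frac13\Big(C^{1,n}_k + C^{2,n}_k + C^{3,n}_k + V\frac{C^{3,n}_k}{1+C^{3,n}_k}\Big) - C^{2,n}_k\Big],\\
C^{3,n+1}_k &= C^{3,n}_k + \frac{\Delta t}{\Delta x}\big(C^{3,n}_{k+1} - C^{3,n}_k\big) + \Delta t\Big[\frac13\Big(C^{1,n}_k + C^{2,n}_k + C^{3,n}_k + V\frac{C^{3,n}_k}{1+C^{3,n}_k}\Big) - C^{3,n}_k - V\frac{C^{3,n}_k}{1+C^{3,n}_k}\Big].
\end{aligned} \tag{48}$$

For the boundary conditions, at each time we choose $C^{1,n}_0 = C_1^0$, $C^{2,n}_0 = C_2^0$ and $C^{3,n}_{N+1} = C^{2,n}_N$.

Because this is an explicit scheme, starting from (47), we obtain directly the solution $C^{j,n+1}_k$ at time $t^{n+1}$ from that at time $t^n$.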

Thus, we can reach the solution to (1) by iterating this scheme for n large enough.

The stability condition which ensures the positivity of the scheme is

$$\Delta t \le \frac{3\Delta x}{3 + 2\Delta x(1+V)}, \tag{49}$$

and is detailed in [17]. This CFL condition becomes a severe constraint on $\Delta t$ when $V$ is large. The constraint on $\Delta x$ also depends on $V$, because it depends on the size of the boundary layer. For each $V$, to be accurate enough around the boundary layer point $L$, we discretize the space into $N_V$ cells, and we check a posteriori that this number of cells is large enough, since the analytical results tell us how the solution behaves for large values of $V$.
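One way to see where (49) comes from is to look at the third line of (48), which is the most restrictive one. Writing $P = V\frac{C^{3,n}_k}{1+C^{3,n}_k}$ (a shorthand introduced here) and using $0 \le P \le V C^{3,n}_k$ for nonnegative data, the third line gives

$$C^{3,n+1}_k = \Big(1 - \frac{\Delta t}{\Delta x} - \frac23\Delta t\Big)C^{3,n}_k - \frac23\Delta t\,P + \frac{\Delta t}{\Delta x}C^{3,n}_{k+1} + \frac{\Delta t}{3}\big(C^{1,n}_k + C^{2,n}_k\big) \ge \Big(1 - \Delta t\Big(\frac{1}{\Delta x} + \frac{2(1+V)}{3}\Big)\Big)C^{3,n}_k + \frac{\Delta t}{\Delta x}C^{3,n}_{k+1} + \frac{\Delta t}{3}\big(C^{1,n}_k + C^{2,n}_k\big).$$

All coefficients on the right-hand side are nonnegative, so nonnegativity propagates from $t^n$ to $t^{n+1}$, as soon as

$$\Delta t\Big(\frac{1}{\Delta x} + \frac{2(1+V)}{3}\Big) \le 1 \iff \Delta t \le \frac{3\Delta x}{3 + 2\Delta x(1+V)},$$

which is (49). In the first two lines of (48), the coefficient of $C^{j,n}_k$ is $1 - \frac{\Delta t}{\Delta x} - \frac23\Delta t$, which is nonnegative under the same condition, and the pump term enters with a plus sign. This is only a sketch of the positivity argument; the complete stability analysis is in [17].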

5.2 Concentration profiles for different V

We present in figure 2 concentration profiles for V = 1, V = 10, V = 50 and

V = 100.
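The scheme (48), its boundary conditions and the time step (49) fit in a few lines of NumPy. The sketch below is ours, not the code of [17]: the zero initial data, the inlet values $C_1^0 = 2 > C_2^0 = 1$, the domain length $L = 4$, the safety factor 0.9 applied to (49) and the final time are illustrative assumptions, and the grid is uniform rather than refined to $N_V$ cells around $x = L$.

```python
import numpy as np


def simulate(V, L=4.0, N=200, c10=2.0, c20=1.0, T=100.0, safety=0.9):
    """Iterate the explicit upwind scheme (48) up to time T (illustrative sketch).

    Returns the cell averages C1, C2, C3 (arrays of length N) on the cells
    Q_k = (x_{k-1/2}, x_{k+1/2}), k = 1..N. Initial data are zero (an assumption).
    """
    dx = L / N
    dt = safety * 3.0 * dx / (3.0 + 2.0 * dx * (1.0 + V))  # CFL condition (49)
    n_steps = int(np.ceil(T / dt))
    c1 = np.zeros(N)
    c2 = np.zeros(N)
    c3 = np.zeros(N)
    for _ in range(n_steps):
        pump = V * c3 / (1.0 + c3)             # Michaelis-Menten pump term
        bath = (c1 + c2 + c3 + pump) / 3.0     # common bath term
        # Ghost values implementing C^{1,n}_0, C^{2,n}_0 and C^{3,n}_{N+1} = C^{2,n}_N
        c1_up = np.concatenate(([c10], c1[:-1]))
        c2_up = np.concatenate(([c20], c2[:-1]))
        c3_up = np.concatenate((c3[1:], [c2[-1]]))
        # Upwind differences: tubes 1 and 2 flow toward x = L, tube 3 toward x = 0
        c1, c2, c3 = (
            c1 - dt / dx * (c1 - c1_up) + dt * (bath - c1),
            c2 - dt / dx * (c2 - c2_up) + dt * (bath - c2),
            c3 + dt / dx * (c3_up - c3) + dt * (bath - c3 - pump),
        )
    return c1, c2, c3


if __name__ == "__main__":
    for V in (1.0, 10.0, 50.0):
        c1, c2, c3 = simulate(V)
        mid = len(c3) // 2
        print(f"V={V:5.0f}  C3(mid)={c3[mid]:.4f}  "
              f"C3(last cell)={c3[-1]:.4f}  C1(last cell)={c1[-1]:.4f}")
```

At a discrete steady state, summing the three lines of (48) shows that $C^{1}_k + C^{2}_k - C^{3}_{k+1}$ does not depend on $k$, a discrete counterpart of (10) that can serve as a convergence check.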


[Figure 2 panels: Vm=1, Vm=10, Vm=50 and Vm=100; each panel shows C1, C2 and C3, with both axes ranging from 0 to 4.]

Figure 2: Concentration profiles for V = 1, 10, 50, 100, on a domain of length L = 4.

When the rate $V$ has the same order of magnitude as the other parameters of (1), the concentration barely increases in tubes 2 and 3, and decreases in tube 1. This comes from our choice $C_1^0 > C_2^0$; the opposite would occur in the opposite case. With low values of $V$, the diffusive part of system (1) is dominant and concentrations tend to homogenize along the tubes. If we increase the pump rate by a factor of 10, the concentration approaches zero in tube 3, increases abruptly from $x = 3$, and reaches at $x = 4$ a value greater than $\max(C_1^0, C_2^0)$. The limit profiles and the boundary layer clearly appear for $V \ge 50$.

Illustration of Theorem 2. We want to illustrate that the upper and lower bounds found in Theorem 2 give an accurate description of the qualitative behavior of $C_3$.


[Figure 3 plot: C1, C2 and C3; horizontal axis from 1.99 to 2 (×10^4), vertical axis from 0 to 4.]

Figure 3: Zoom on the interval [0.99L, L] of figure 2 for V = 1000.

[Figure 4 plot: C3, f and g; horizontal axis from 1.99 to 2 (×10^4), vertical axis from 0 to 4.]

Figure 4: The middle curve represents $C_3^{1000}$ on $[0.99L, L]$. The upper curve represents the upper bound $f$ for $C_3$ found in Lemma 8, and the lower curve represents $g$, the lower bound.

Figure 3 displays a zoom of the numerical approximation of $C_3^{1000}$ near $x = L$; we denote by $C_3^{num,1000}(L)$ its value at $x = L$. This approximate value enables us to define $M_{num} = \frac{1}{1 + C_3^{num,1000}(L)}$. To illustrate Lemma 8, we define

$$f(x) = C_3^{num,1000}(L)\exp\Big(-\frac{2\times 1000}{3}M_{num}(L-x)\Big) \quad \text{and} \quad g(x) = C_3^{num,1000}(L)\exp\Big(-\frac{2\times 1000}{3}(L-x)\Big).$$

In Figure 4, we depict $C_3^{1000}$ on the interval $[\frac{99L}{100}, L]$ together with the profiles of the two functions $f$ and $g$ which control $C_3^{1000}$. We observe that $g \le C_3^{num,1000} \le f$, as expected from Theorem 2, and hence that the component $C_3^V$ decreases exponentially to zero away from $x = L$.
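The two envelopes $f$ and $g$ only need the numerical value of $C_3$ at $x = L$. The Python sketch below (names ours) builds them on $[0.99L, L]$. The value $C_3(L) = 2.5$ and $L = 4$ are placeholders, not the data behind Figure 4, and $M_{num}$ is read as $1/(1 + C_3^{num,1000}(L))$. Since $M_{num} < 1$, $f$ decays more slowly than $g$ as $x$ moves away from $L$, so $g \le f$, with equality at $x = L$.

```python
import numpy as np


def envelopes(x, c3_at_L, V, L):
    """Upper envelope f and lower envelope g of Section 5.2 (K/V terms omitted)."""
    m_num = 1.0 / (1.0 + c3_at_L)                 # assumed reading of M_num
    f = c3_at_L * np.exp(-2.0 * V * m_num * (L - x) / 3.0)
    g = c3_at_L * np.exp(-2.0 * V * (L - x) / 3.0)
    return f, g


if __name__ == "__main__":
    L, V, c3_at_L = 4.0, 1000.0, 2.5              # placeholder values
    x = np.linspace(0.99 * L, L, 9)
    f, g = envelopes(x, c3_at_L, V, L)
    for xi, fi, gi in zip(x, f, g):
        print(f"x={xi:.4f}  g={gi:.3e}  f={fi:.3e}")
```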


6 Conclusion

Motivated by renal flows, we have studied a concentration mechanism with an active pump characterized by a parameter $V$. As expected, for $V$ large enough, a large axial solute concentration gradient appears in all tubes. Our analysis shows that the concentrations are uniformly bounded in $V$ for all $x \in [0, L]$, and so are their derivatives, except at $x = L$. In the limit $V = \infty$, the concentration gradient converges to a Dirac mass at $x = L$. We obtain a limit concentration profile in all tubes which presents a boundary layer at $x = L$. In the urine concentrating model, we are mostly interested in the behavior at $x = L$, because it is at this depth that the composition of the final urine is determined. Therefore, our analysis explains why active transport plays a very specific role, which is to increase solute concentration at $x = L$ and only at $x = L$.

Acknowledgements

I thank Benoît Perthame for numerous helpful discussions.

Funding for this study was provided by the program EMERGENCE (EME 0918) of the Université Pierre et Marie Curie (Paris Univ. 6).

References

[1] F. Bouchut. Nonlinear stability of finite volume methods for hyperbolic conservation laws and well-balanced schemes for sources. Birkhäuser Verlag, 2004.

[2] H. Brezis. Functional Analysis. Collection of Applied Mathematics for the Master's Degree. Masson, Paris, 1983. Theory and applications.

[3] C. Cercignani. The Boltzmann equation and its applications, volume 67 of Applied Mathematical Sciences. Springer-Verlag, New York, 1988.

[4] C. M. Dafermos. Hyperbolic conservation laws in continuum physics, volume 325 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third edition, 2010.

[5] A. Friedman and G. Craciun. A model of intracellular transport of particles in an axon. J. Math. Biol., 51(2):217–246, 2005.

[6] B. Hargitay and W. Kuhn. The Multiplication Principle as the Basis for Concentrating Urine in the Kidney (with comments by Bart Hargitay and S. Randall Thomas). J. Am. Soc. Nephrol., 12(7):1566–1586, 2001.

[7] François James. Convergence results for some conservation laws with a reflux boundary condition and a relaxation term arising in chemical engineering. SIAM J. Math. Anal., 29(5):1200–1223 (electronic), 1998.


[8] M. A. Katsoulakis and A. E. Tzavaras. Contractive relaxation systems and the scalar multidimensional conservation law. Comm. Partial Differential Equations, 22(1-2):195–233, 1997.

[9] J. Keener and J. Sneyd. Mathematical physiology. Vol. II: Systems physiology, volume 8/ of Interdisciplinary Applied Mathematics. Springer, New York, second edition, 2009.

[10] R. J. LeVeque. Finite volume methods for hyperbolic problems. Cambridge University Press, 2002.

[11] L. C. Moore and D. J. Marsh. How descending limb of Henle's loop permeability affects hypertonic urine formation. Am. J. Physiol., 239:F57–F71, 1980.

[12] R. Natalini. Convergence to equilibrium for the relaxation approximations of conservation laws. Comm. Pure Appl. Math., 49(8):795–823, 1996.

[13] K. Pakdaman, B. Perthame, and D. Salort. Dynamics of a structured neuron population. Nonlinearity, 23(1):55–75, 2010.

[14] B. Perthame. Transport equations in biology. Frontiers in Mathematics. Birkhäuser Verlag, Basel, 2007.

[15] L. Schwartz. Théorie des distributions. Hermann, Paris, 1966.

[16] J. L. Stephenson. Urinary concentration and dilution: Models. Oxford University Press, New York, 1992.

[17] M. Tournus, A. Edwards, N. Seguin, and B. Perthame. Analysis of a simplified model of the urine concentration mechanism. Preprint, 2011.


SeMA Journal

no. 59 (2012), 37–51

A HÖLDER CONTINUOUS NOWHERE IMPROVABLE FUNCTION WITH DERIVATIVE SINGULAR DISTRIBUTION

NIKOLAOS I. KATZOURAKIS

BCAM - Basque Center for Applied Mathematics, Alameda de Mazarredo 14, E-48009,

Bilbao, Spain

[email protected]

Abstract

We present a class of functions $\mathcal{K}$ in $C^0(\mathbb{R})$ which is a variant of the Knopp class of nowhere differentiable functions. We derive estimates which establish $\mathcal{K} \subseteq C^{0,\alpha}(\mathbb{R})$ for $0 < \alpha < 1$, but no $K \in \mathcal{K}$ is pointwise anywhere improvable to $C^{0,\beta}$ for any $\beta > \alpha$. In particular, all $K$'s are nowhere differentiable, with derivatives that are singular distributions. $\mathcal{K}$ furnishes explicit realizations of the functional analytic result of Berezhnoi [7].

Recently, the author and, simultaneously, others laid the foundations of Vector-Valued Calculus of Variations in $L^\infty$ [17, 18, 19], of $L^\infty$-Extremal Quasiconformal maps [11, 20] and of Optimal Lipschitz Extensions of maps [26]. The "Euler-Lagrange PDE" of Calculus of Variations in $L^\infty$ is the nonlinear nondivergence form Aronsson PDE, which has the $\infty$-Laplacian as a special case. Using $\mathcal{K}$, we construct singular solutions for these PDEs. In the scalar case, we partially answered the open $C^1$ regularity problem of Viscosity Solutions to Aronsson's PDE [16]. In the vector case, the solutions cannot be rigorously interpreted by existing PDE theories, and they justify our new theory of Contact solutions for fully nonlinear systems [21]. Both the validity of the arguments of our new theory and the failure of classical approaches rely on the properties of $\mathcal{K}$.

Key words: Nowhere differentiable continuous functions, Distributional derivatives, Hölder continuity, singular PDE solutions, Calculus of Variations in $L^\infty$, Aronsson PDE System, $\infty$-Laplacian, Viscosity Solutions

1 Introduction

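Let $\alpha \in (0, 1)$ and $\nu \in \mathbb{N}$ be fixed parameters. We define the continuous function $K_{\alpha,\nu} : \mathbb{R} \to [0, 1]$ by

$$K_{\alpha,\nu}(x) := \sum_{k=0}^{\infty} 2^{-2\alpha\nu k}\,\varphi\big(2^{2\nu k}x\big), \tag{1}$$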

Received: August 31, 2011. Accepted: June 5, 2012.


where $\varphi$ is a sawtooth function, given by $\varphi(x) := |x|$ when $x \in [-1, 1]$ and extended to $\mathbb{R}$ as a periodic function by setting $\varphi(x + 2) := \varphi(x)$. Explicitly,

$$\varphi(x) = \sum_{i=-\infty}^{+\infty} |x - 2i|\,\chi_{(2i-1,\,2i+1]}(x). \tag{2}$$
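The definitions (1) and (2) translate directly into code. The Python sketch below (function names ours) evaluates $\varphi$ and a partial sum of (1). The number of terms is an arbitrary truncation; since $0 \le \varphi \le 1$, the neglected tail is at most $\sum_{k \ge n} 2^{-2\alpha\nu k}$. For non-dyadic $x$, floating point cannot resolve $\varphi(2^{2\nu k}x)$ once $2^{2\nu k}|x|$ exceeds about $2^{53}$, so very high-order terms are effectively lost, but their weights are tiny.

```python
def sawtooth(x):
    """phi in (2): the 2-periodic extension of |x| from [-1, 1]."""
    y = x % 2.0                  # Python's % returns a value in [0, 2) for a positive modulus
    return min(y, 2.0 - y)       # |y| on [0, 1], |y - 2| on [1, 2)


def knopp_variant(x, alpha, nu, n_terms=30):
    """Partial sum of (1): sum over k < n_terms of 2^(-2 alpha nu k) * phi(2^(2 nu k) x)."""
    return sum(
        2.0 ** (-2.0 * alpha * nu * k) * sawtooth(2.0 ** (2 * nu * k) * x)
        for k in range(n_terms)
    )


if __name__ == "__main__":
    alpha, nu = 0.5, 2           # the parameters of Fig. 1
    for i in range(9):
        x = 0.25 * i
        print(f"K({x:.2f}) ~ {knopp_variant(x, alpha, nu):.6f}")
```

Plotting `knopp_variant` on a fine grid of $(0, 2)$ gives curves of the type shown in Fig. 1 and Fig. 2.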

Formulas (1), (2) introduce a parametric family in the space of Hölder continuous functions $C^{0,\alpha}(\mathbb{R})$ which are not differentiable at any point of $\mathbb{R}$. The first examples of nowhere differentiable continuous functions, given by Weierstrass, Bolzano and Cellérier, have been followed by numerous functions that are well behaved with respect to continuity but very singular with respect to differentiability. Our example $K_{\alpha,\nu}$ is a variant of the Knopp function [22] (see also [6] and [12]) and relates directly to several other examples in the literature, for example the Takagi-Van der Waerden functions, as well as the McCarthy function [24].

Our explicit class of functions gives a simple realization of the abstract functional analytic result of Berezhnoi [7], who proved that every infinite-dimensional Banach space of functions which enjoy some degree of regularity contains an infinite-dimensional closed subspace of functions that are "nowhere improvable", namely not smoother than the least smooth function in the space.

It is worth noting that examples of continuous nowhere differentiable functions still attract mathematical interest. Recently, Allaart and Kawamura [1] characterized the sets at which “improper” infinite derivatives exist for the Takagi function, while Lewis [23] studied probabilistic aspects of the Katsuura function. For a thorough historical review and an extended list of references, we refer to Thim [27].

Herein we derive precise estimates which establish that $K_{\alpha,\nu}$ lies in the Hölder space $C^{0,\alpha}(\mathbb{R})$ for all $\nu \in \mathbb{N}$, but that, if $\nu$ is sufficiently large ($2\nu > 1/(1-\alpha)$), the function is at no point improvable to a Hölder continuous $C^{0,\beta}$ function for any $\beta \in (\alpha, 1]$. A function $f \in C^0(\mathbb{R})$ is called Hölder continuous $C^{0,\beta}$ at $x \in \mathbb{R}$ if there exist $r, C > 0$ such that

$$|f(y) - f(x)| \le C\,|y - x|^{\beta} \tag{3}$$

for all $y \in [x - r, x + r]$. A function differentiable at $x$ satisfies (3) at $x$ with $\beta = 1$; hence, taking $\beta = 1$, we deduce that $K_{\alpha,\nu}$ is nowhere differentiable: its pointwise derivative exists at no point.

Since BV functions are differentiable almost everywhere while $K_{\alpha,\nu}$ is differentiable nowhere, the distributional derivative $DK_{\alpha,\nu} \in \mathcal{D}'(\mathbb{R})$ is a singular first-order distribution and cannot be realized by a signed measure. In particular, for any $\alpha, \beta \in (0,1)$ with $\beta > \alpha$ and $\nu$ large enough, the $C^{0,\alpha}$ function $K_{\alpha,\nu}$ is neither a BV nor a $C^{0,\beta}$ function, no matter how “close” to the Lipschitz space $C^{0,1}(\mathbb{R})$ it might be.

Fig. 1: Simulation of $K_{\alpha,\nu}$ over $(0,2)$ with $\nu = 2$, $\alpha = 1/2$ (50 terms, Mathematica)

Fig. 2: Simulation of $K_{\alpha,\nu}$ over $(0,2)$ with $\nu = 4$, $\alpha = 1/8$ (50 terms, Mathematica)

Fig. 3: Simulation of $K_{\alpha,\nu}$ over $(0,2)$ with $\nu = 4$, $\alpha = 5/8$ (50 terms, Mathematica)

The need to construct pathological functions bearing the specific properties of $K_{\alpha,\nu}$ originates from the theory of nonlinear partial differential equations, especially the regularity theory of degenerate second-order elliptic partial differential equations and systems, including the celebrated ∞-Laplacian

$$\Delta_\infty u := Du \otimes Du : D^2u = 0, \qquad u : \mathbb{R}^n \to \mathbb{R} \tag{4}$$

(that is, $\Delta_\infty u = D_i u\, D_j u\, D^2_{ij} u$, with the summation convention employed), as well as the more general Aronsson equation

$$A_\infty u := H_p(Du) \otimes H_p(Du) : D^2u = 0, \tag{5}$$

for a Hamiltonian $H \in C^1(\mathbb{R}^n)$ with $H_p(p) := DH(p)$. The Aronsson PDE is the “Euler-Lagrange PDE” of Calculus of Variations in the space $L^\infty$ for the supremal functional

$$E_\infty(u, \Omega) := \operatorname*{ess\,sup}_{\Omega} H(Du), \qquad u \in W^{1,\infty}(\Omega). \tag{6}$$

The ∞-Laplacian corresponds to the model functional $\|Du\|_{L^\infty(\Omega)}$, obtained when we choose the Euclidean norm as Hamiltonian $H$. The name “∞-Laplacian” originates in its first derivation by Aronsson as the limit, as $p \to \infty$, of the p-Laplacian

$$\Delta_p u := \operatorname{Div}\big(|Du|^{p-2} Du\big) = 0. \tag{7}$$

The p-Laplacian is the Euler-Lagrange PDE of the p-Dirichlet functional

$$E_p(u, \Omega) := \int_\Omega |Du|^p, \qquad u \in W^{1,p}(\Omega). \tag{8}$$
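To make the contractions in (4) and (5) concrete, here is a minimal sketch that evaluates $\Delta_\infty u = D_iu\,D_ju\,D^2_{ij}u$ and $A_\infty u$ from a gradient and Hessian supplied at a point. As a test case we assume the classical function $u(x,y) = x^{4/3} - y^{4/3}$, usually attributed to Aronsson, which is ∞-Harmonic for $x, y > 0$; its derivatives are hard-coded by hand, and all function names are our own.

```python
def contract(v, hess):
    """v ⊗ v : hess, i.e. the sum over i, j of v[i] * v[j] * hess[i][j]."""
    n = len(v)
    return sum(v[i] * v[j] * hess[i][j] for i in range(n) for j in range(n))


def infinity_laplacian(grad, hess):
    """Scalar ∞-Laplacian (4): Du ⊗ Du : D²u."""
    return contract(grad, hess)


def aronsson_operator(H_p, grad, hess):
    """Aronsson operator (5): H_p(Du) ⊗ H_p(Du) : D²u for a supplied map H_p."""
    return contract(H_p(grad), hess)


def euclidean_H_p(p):
    """Gradient of the Euclidean norm H(p) = |p|, valid for p != 0."""
    norm = sum(c * c for c in p) ** 0.5
    return [c / norm for c in p]


def aronsson_example(x, y):
    """Hand-computed Du and D²u of u(x, y) = x^(4/3) - y^(4/3), for x, y > 0."""
    grad = [4 / 3 * x ** (1 / 3), -4 / 3 * y ** (1 / 3)]
    hess = [[4 / 9 * x ** (-2 / 3), 0.0],
            [0.0, -4 / 9 * y ** (-2 / 3)]]
    return grad, hess


grad, hess = aronsson_example(0.7, 2.3)
print(infinity_laplacian(grad, hess))                # rounding error only
print(aronsson_operator(euclidean_H_p, grad, hess))  # rounding error only
```

Analytically, the two nonzero terms are $\pm 64/81$ and cancel. For $H(p) = |p|$ one has $A_\infty u = \Delta_\infty u / |Du|^2$ wherever $Du \neq 0$, which is why both printed values vanish together.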

When passing to the limit $p \to \infty$, the divergence structure is lost and, unlike $\Delta_p$, the operator $\Delta_\infty$ is quasilinear but not in divergence form. Hence, the standard weak and distributional solution approaches of modern PDE theory do not apply. In [3] and [4] Aronsson constructed singular solutions to $\Delta_\infty u = 0$, while the general $C^1$ regularity problem related to $\Delta_\infty$ and $A_\infty$ is still open, except in dimension $n = 2$ ([25, 14, 15, 28, 13]). In the Author's work [16], by employing the function of this paper, we gave a partial negative answer to this problem by showing that there exist Hamiltonians for which the Aronsson PDE admits non-$C^1$ solutions.

The general vector case of the ∞-Laplacian for maps is much more intricate, and its study started only recently in [17, 18, 19, 20], where the foundations of Vector-Valued Calculus of Variations in $L^\infty$ and of its “Euler-Lagrange PDE system” have been laid. Related results appeared at the same time in [11, 26]. Capogna and Raich [11], simultaneously but independently, used as Hamiltonian the so-called trace distortion $|Du|^n/\det(Du)$, defined on local diffeomorphisms, and developed a parallel to the Author's approach for Extremal $L^\infty$-Quasiconformal maps. Also, Sheffield and Smart [26] developed the related subject of Vector-Valued Lipschitz Extensions by using as Hamiltonian the operator norm $\|\cdot\|$ on the space $\mathbb{R}^N \otimes \mathbb{R}^n$ of gradients $Du$ of maps $u : \mathbb{R}^n \to \mathbb{R}^N$.

For general maps $u : \mathbb{R}^n \to \mathbb{R}^N$, the ∞-Laplacian in the vector case reads

$$\Delta_\infty u := Du \otimes Du : D^2u + |Du|^2 [Du]^\perp \Delta u = 0. \tag{9}$$

Here $[Du(x)]^\perp$ is the orthogonal projection onto the nullspace $N(Du(x)^\top)$ of the transpose of the gradient, $Du(x)^\top : \mathbb{R}^N \to \mathbb{R}^n$. In index form, (9) reads

$$D_i u_\alpha\, D_j u_\beta\, D^2_{ij} u_\beta + |Du|^2 [Du]^\perp_{\alpha\beta}\, D^2_{ii} u_\beta = 0 \tag{10}$$

and was first derived in [17]. The general Aronsson PDE system corresponding to a rank-one convex Hamiltonian $H \in C^2(\mathbb{R}^N \otimes \mathbb{R}^n)$ is

$$A_\infty u := \Big(H_P \otimes H_P + H\,[H_P]^\perp H_{PP}\Big)(Du) : D^2u = 0. \tag{11}$$

Here $[H_P(Du(x))]^\perp$ is the orthogonal projection onto the nullspace $N(H_P(Du(x))^\top)$ of the operator $H_P(Du(x))^\top : \mathbb{R}^N \to \mathbb{R}^n$. For details we refer to Section 3 and [17].

The vector case of (9) and (11) is extremely difficult. The main reason is the existence of singular “solutions”, constructed by means of the functions of this paper, which show that, under the current state of the art in PDE theory, such systems cannot be studied rigorously and we cannot even interpret their singular solutions appropriately. The problem is similar to that of interpreting the Dirac δ in Quantum Mechanics before measure theory.

Moreover, (9) and (11) are nonlinear, nonmonotone and in nondivergence form, and they have discontinuous coefficients even for $C^\infty$ solutions: the normal projection $[Du]^\perp$ is discontinuous where the rank of $Du$ changes. This is a genuinely vectorial phenomenon, and it happens because there exist smooth ∞-Harmonic maps whose gradient does not have constant rank. Such an example is given by

$$u(x, y) := e^{ix} - e^{iy}, \qquad u : \{|x \pm y| < \pi\} \subseteq \mathbb{R}^2 \to \mathbb{R}^2. \tag{12}$$

Indeed, (12) is ∞-Harmonic on the rhombus and has $\operatorname{rk}(Du) = 1$ on the diagonal $x = y$, but $\operatorname{rk}(Du) = 2$ otherwise, so the projection $[Du]^\perp$ is discontinuous (for more details see [17]).
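The claims about (12) can be checked pointwise with the index form (10). The sketch below is our own illustration: it identifies $\mathbb{C}$ with $\mathbb{R}^2$, hard-codes $Du$ and $D^2u$ of (12), computes $[Du]^\perp$ as the orthogonal projection onto the complement of the range of $Du$ (with a hypothetical rank tolerance `tol`), and evaluates the left-hand side of (10) for $n = N = 2$. On the diagonal the rank drops to 1 and the projection jumps from $0$ to a rank-one projection, while the residual stays at rounding level.

```python
import math


def du_and_d2u(x, y):
    """Du[alpha][i] and D2u[alpha][i][j] of u = e^{ix} - e^{iy} as a map R^2 -> R^2."""
    du = [[-math.sin(x), math.sin(y)],
          [math.cos(x), -math.cos(y)]]
    d2u = [[[-math.cos(x), 0.0], [0.0, math.cos(y)]],
           [[-math.sin(x), 0.0], [0.0, math.sin(y)]]]
    return du, d2u


def normal_projection(du, tol=1e-12):
    """[Du]^perp: orthogonal projection of R^2 onto N(Du^T), plus the rank of Du."""
    det = du[0][0] * du[1][1] - du[0][1] * du[1][0]
    if abs(det) > tol:  # rank 2: the range is all of R^2, so the projection is 0
        return [[0.0, 0.0], [0.0, 0.0]], 2
    cols = [[du[0][i], du[1][i]] for i in range(2)]
    c = max(cols, key=lambda v: v[0] ** 2 + v[1] ** 2)
    n2 = c[0] ** 2 + c[1] ** 2
    if n2 <= tol:  # rank 0: Du = 0
        return [[1.0, 0.0], [0.0, 1.0]], 0
    proj = [[(1.0 if a == b else 0.0) - c[a] * c[b] / n2 for b in range(2)]
            for a in range(2)]
    return proj, 1


def vector_inf_laplacian(du, d2u):
    """Left-hand side of (10) for n = N = 2, returned as a vector indexed by alpha."""
    proj, rank = normal_projection(du)
    norm2 = sum(du[a][i] ** 2 for a in range(2) for i in range(2))
    lap = [d2u[b][0][0] + d2u[b][1][1] for b in range(2)]
    out = []
    for a in range(2):
        t1 = sum(du[a][i] * du[b][j] * d2u[b][i][j]
                 for i in range(2) for j in range(2) for b in range(2))
        t2 = norm2 * sum(proj[a][b] * lap[b] for b in range(2))
        out.append(t1 + t2)
    return out, rank, proj


for point in [(0.3, 0.3), (0.3, 0.301), (-1.0, 0.5)]:
    residual, rank, proj = vector_inf_laplacian(*du_and_d2u(*point))
    print(point, rank, max(abs(r) for r in residual), proj)
```

Off the diagonal $\det Du = \sin(x - y) \neq 0$ on the rhombus, which is why the projection vanishes there; on the diagonal the Laplacian term is multiplied by $\Delta u = e^{iy} - e^{ix} = 0$.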

Three projections on $\mathbb{R}^3$ of the graph of $u(x,y) = e^{ix} - e^{iy}$, its range in $\mathbb{R}^2$ and its “covering sheets”.

In general, ∞-Harmonic maps present a phase separation with a certain hierarchy. On each phase the dimension of the tangent space is constant, and the phases are separated by interfaces on which the rank of $Du$ “jumps” and $[Du]^\perp$ becomes discontinuous.

The related problems of the scalar case remained unsolved for some years and were finally settled in the 1990s with the advent of Viscosity Solutions. However, Viscosity Solutions apply only to scalar PDEs and monotone PDE systems. In [21] we have introduced an appropriate systematic theory which applies to fully nonlinear PDE systems and, in particular, allows us to study (9) and (11) rigorously and effectively. This theory extends Viscosity Solutions to the vector case of systems and is based on the discovery of an extremality principle which applies to maps. Contact Solutions enjoy stability properties similar to those of their scalar counterparts, the Viscosity Solutions, and this feature renders them extremely efficient when trying to prove existence results via approximation.

This paper is organized as follows: in Section 2 we present the basic material about our singular class of functions, and in Section 3 we present some material related to singular PDE solutions.

2 The Singular Function $K$.

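The following theorem lists the properties of $K_{\alpha,\nu}$.

Theorem 1 (i) The function $K_{\alpha,\nu}$ is in $C^{0,\alpha}(\mathbb{R})$ for all $\nu \in \mathbb{N}$. Moreover, we have the uniform bound $0 \le K_{\alpha,\nu} \le 1/(1 - 2^{-2\nu\alpha})$ and, if $x, y \in \mathbb{R}$ with $|x - y| \le 2$, we have the estimate

$$\big|K_{\alpha,\nu}(x) - K_{\alpha,\nu}(y)\big| \le C(\alpha,\nu)\,|x - y|^{\alpha}, \tag{13}$$

where

$$C(\alpha,\nu) := \frac{1}{1 - 2^{-2\nu(1-\alpha)}} + \frac{2}{2^{2\nu(\alpha-1)} - 2^{-2\nu}}. \tag{14}$$

(ii) If $\alpha \in (0,1)$ and $2\nu > 1/(1-\alpha)$, then $K_{\alpha,\nu}$ is pointwise nowhere improvable to $C^{0,\beta}$ on $\mathbb{R}$ for any $\beta \in (\alpha, 1]$. Moreover, for any $x \in \mathbb{R}$ and $m \in \mathbb{N}$, we have

$$\frac{\big|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)\big|}{|t_m(x)|^{\beta}} \ge K(m,\nu,\alpha,\beta), \tag{15}$$

where

$$K(m,\nu,\alpha,\beta) := \frac{2^{\beta-1}\big(2^{2\nu(1-\alpha)} - 2\big)}{2^{2\nu(1-\alpha)} - 1}\Big(2^{2\nu(\beta-\alpha)}\Big)^{m} \tag{16}$$

and $t_m : \mathbb{R} \to \{\pm 2^{-2\nu m-1}\}$ is the step function given by

$$t_m(x) := 2^{-2\nu m-1}\sum_{i=-\infty}^{+\infty}\Big(\chi_{(i,\,i+\frac12]}\big(2^{2\nu m}x\big) - \chi_{(i+\frac12,\,i+1]}\big(2^{2\nu m}x\big)\Big). \tag{17}$$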
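The estimate (13)-(14) can be probed numerically. The sketch below is our own illustration, not part of the proof: it truncates the series (1) after `n_terms` terms in floating point, samples pairs with $x \in [-1,1]$ and $10^{-8} \le |x - y| \le 1$, and compares the largest observed quotient with $C(\alpha,\nu)$ from (14). The helper names, the sample sizes and the random seed are assumptions. Sampling can only bound the true Hölder seminorm from below, so this is a consistency check rather than a verification.

```python
import math
import random


def phi(x):
    """Sawtooth function (2) in floating point: |x| on [-1, 1], 2-periodic."""
    r = math.fmod(x, 2.0)
    if r < 0:
        r += 2.0
    return r if r <= 1.0 else 2.0 - r


def K(x, alpha, nu, n_terms=40):
    """Truncated series (1); the neglected tail is at most
    2^(-2*alpha*nu*n_terms) / (1 - 2^(-2*alpha*nu))."""
    return sum(2.0 ** (-2 * alpha * nu * k) * phi(2.0 ** (2 * nu * k) * x)
               for k in range(n_terms))


def holder_constant(alpha, nu):
    """C(alpha, nu) from (14)."""
    return (1.0 / (1.0 - 2.0 ** (-2 * nu * (1 - alpha)))
            + 2.0 / (2.0 ** (2 * nu * (alpha - 1)) - 2.0 ** (-2 * nu)))


def worst_holder_ratio(alpha, nu, samples=3000, seed=0):
    """Largest sampled |K(x) - K(y)| / |x - y|^alpha with 1e-8 <= |x - y| <= 1."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        h = 10.0 ** rng.uniform(-8.0, 0.0)
        y = x + rng.choice((-1.0, 1.0)) * h
        ratio = abs(K(x, alpha, nu) - K(y, alpha, nu)) / abs(x - y) ** alpha
        worst = max(worst, ratio)
    return worst


for alpha, nu in [(0.5, 2), (0.25, 3)]:
    print(alpha, nu, worst_holder_ratio(alpha, nu), holder_constant(alpha, nu))
```

For $\alpha = 1/2$, $\nu = 2$ formula (14) gives $C = 4/3 + 32/3 = 12$; the sampled quotients stay well below this crude constant.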

As noted earlier, the derivatives of nowhere differentiable functions are genuine distributions rather than measures, since BV functions are necessarily differentiable almost everywhere. As a corollary of the previous theorem, we provide a lower bound on the $L^1$ norms of the difference quotients which establishes this fact without employing the fine properties of BV functions.

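Proposition 2 If $\alpha \in (0,1)$ and $2\nu > 1/(1-\alpha)$, then for any $M \ge 1$ and $m \in \mathbb{N}$ we have the following lower bounds in $L^1_{\mathrm{loc}}(\mathbb{R})$ for the difference quotients:

$$\frac{1}{2M}\int_{-M}^{+M} \frac{\big|K_{\alpha,\nu}(x + 2^{-2\nu m-1}) - K_{\alpha,\nu}(x)\big|}{2^{-2\nu m-1}}\,dx \ge \frac14\,K(m,\nu,\alpha,1). \tag{18}$$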

The previously obtained estimates readily imply the following result.

Corollary 3 For any $\alpha \in (0,1)$ and $\nu \in \mathbb{N}$ with $2\nu > 1/(1-\alpha)$, $x \in \mathbb{R}$, $\beta \in (\alpha, 1]$ and $M \ge 1$, we have

$$\limsup_{t \to 0} \frac{\big|K_{\alpha,\nu}(x + t) - K_{\alpha,\nu}(x)\big|}{|t|^{\beta}} = +\infty. \tag{19}$$

Hence, the $C^{0,\alpha}$ function $K_{\alpha,\nu}$ is nowhere improvable to $C^{0,\beta}$. In particular, for $\beta = 1$ the function is pointwise nowhere differentiable on $\mathbb{R}$. Also,

$$\limsup_{t \to 0} \frac{1}{2M}\int_{-M}^{+M} \frac{\big|K_{\alpha,\nu}(x + t) - K_{\alpha,\nu}(x)\big|}{|t|}\,dx = +\infty. \tag{20}$$

Hence, the difference quotients are unbounded in $L^1_{\mathrm{loc}}(\mathbb{R})$ and the distributional derivative of $K_{\alpha,\nu}$ is not a signed measure.

The first part of Corollary 3 is immediate from (15), while the second follows from estimate (18) and the folklore fact that a function in $L^1_{\mathrm{loc}}(\mathbb{R})$ is of Bounded Variation if and only if its difference quotients converge weakly∗ in the sense of measures.
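For the reader's convenience, the step behind the word “immediate” can be spelled out using only (15)-(18); the abbreviation $c_\beta$ for the prefactor in (16) is our own notation:

$$
\begin{aligned}
&2\nu > \tfrac{1}{1-\alpha} \iff 2^{2\nu(1-\alpha)} > 2 \implies c_\beta := \frac{2^{\beta-1}\big(2^{2\nu(1-\alpha)} - 2\big)}{2^{2\nu(1-\alpha)} - 1} > 0,\\
&K(m,\nu,\alpha,\beta) = c_\beta\,\big(2^{2\nu(\beta-\alpha)}\big)^m \longrightarrow +\infty \quad (m \to \infty), \text{ since } \beta > \alpha,\\
&|t_m(x)| = 2^{-2\nu m-1} \longrightarrow 0 \implies \limsup_{t\to 0}\frac{|K_{\alpha,\nu}(x+t) - K_{\alpha,\nu}(x)|}{|t|^\beta} \ge \limsup_{m\to\infty} K(m,\nu,\alpha,\beta) = +\infty,\\
&\text{and, with } t = 2^{-2\nu m-1} \text{ in (18)}: \quad \frac{1}{2M}\int_{-M}^{M}\frac{|K_{\alpha,\nu}(x+t) - K_{\alpha,\nu}(x)|}{|t|}\,dx \ge \tfrac14 K(m,\nu,\alpha,1) \longrightarrow +\infty.
\end{aligned}
$$

In particular, the hypothesis $2\nu > 1/(1-\alpha)$ is exactly what makes the constant in (16) positive.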

Proof of Theorem 1. (i) We begin by observing that (2) implies $0 \le \varphi \le 1$ and hence the bound

$$0 \le K_{\alpha,\nu} \le \sum_{k=0}^{\infty}\big(2^{-2\alpha\nu}\big)^k = \frac{1}{1 - 2^{-2\nu\alpha}}. \tag{21}$$

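Let now $p, q \in \mathbb{N}$ with $p < q$ and $x \in \mathbb{R}$. Again by (2), we have

$$\Big|\sum_{k=0}^{q} 2^{-2\alpha\nu k}\varphi\big(2^{2\nu k}x\big) - \sum_{k=0}^{p} 2^{-2\alpha\nu k}\varphi\big(2^{2\nu k}x\big)\Big| \le \sum_{k=p+1}^{q}\big(2^{-2\alpha\nu}\big)^k, \tag{22}$$

which tends to $0$ as $p, q \to \infty$, uniformly in $x$. By (22), the partial sums converge uniformly, so (1) defines a continuous function: $K_{\alpha,\nu} \in C^0(\mathbb{R})$. Fix now $x, y \in \mathbb{R}$ with $x \ne y$ and choose $t \ge 1$ and $p \in \mathbb{N}$ such that

$$|x|, |y| \le 2^{2\nu-1}\,t \tag{23}$$

and

$$\frac{t}{2^{2\nu p}} \le |y - x| \le \frac{t}{2^{2\nu(p-1)}}. \tag{24}$$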

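Since by (2) $\varphi$ is non-expansive, that is $|\varphi(t) - \varphi(s)| \le |t - s|$, and also $0 \le \varphi \le 1$, we estimate

$$
\begin{aligned}
\frac{|K_{\alpha,\nu}(x) - K_{\alpha,\nu}(y)|}{|x - y|^{\alpha}} &\le |x - y|^{-\alpha}\Big[\sum_{k=0}^{p-1} 2^{-2\alpha\nu k}\big|\varphi(2^{2\nu k}x) - \varphi(2^{2\nu k}y)\big| + 2\sum_{k=p}^{\infty} 2^{-2\alpha\nu k}\Big] &(25)\\
&\le |x - y|^{-\alpha}\Big[\sum_{k=0}^{p-1} 2^{2\nu k(1-\alpha)}\,|x - y| + 2\sum_{k=p}^{\infty} 2^{-2\alpha\nu k}\Big]. &(26)
\end{aligned}
$$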

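Hence, summing the geometric series in (26) and then using the left inequality in (24), which gives $2^{2\nu p}|x - y|/t \ge 1$, we obtain

$$
\begin{aligned}
\frac{|K_{\alpha,\nu}(x) - K_{\alpha,\nu}(y)|}{|x - y|^{\alpha}} &\le |x - y|^{-\alpha}\Big[\sum_{k=0}^{p-1} 2^{2\nu k(1-\alpha)}\,|x - y| + 2\,\frac{2^{-2\nu\alpha p}}{1 - 2^{-2\nu\alpha}}\Big] &(27)\\
&= |x - y|^{-\alpha}\Big[\frac{2^{2\nu p(1-\alpha)} - 1}{2^{2\nu(1-\alpha)} - 1}\,|x - y| + 2\,\frac{2^{-2\nu\alpha p}}{1 - 2^{-2\nu\alpha}}\Big] &(28)\\
&\le |x - y|^{-\alpha}\Big[\frac{2^{2\nu p(1-\alpha)} - 1}{2^{2\nu(1-\alpha)} - 1}\,|x - y| + 2\,\frac{2^{-2\nu\alpha p}}{1 - 2^{-2\nu\alpha}}\,\frac{2^{2\nu p}}{t}\,|x - y|\Big]\\
&\le |x - y|^{1-\alpha}\Big[\frac{2^{2\nu p(1-\alpha)} - 1}{2^{2\nu(1-\alpha)} - 1} + 2\,\frac{2^{2\nu p(1-\alpha)}}{1 - 2^{-2\nu\alpha}}\Big],
\end{aligned}
$$

where in the last step we also used $t \ge 1$.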

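Again by (24), now using its right inequality $|x - y| \le t\,2^{-2\nu(p-1)}$, estimate (28) gives

$$
\begin{aligned}
\frac{|K_{\alpha,\nu}(x) - K_{\alpha,\nu}(y)|}{|x - y|^{\alpha}} &\le \big(t\,2^{-2\nu(p-1)}\big)^{1-\alpha}\,2^{2\nu p(1-\alpha)}\Big[\frac{1 - 2^{-2\nu p(1-\alpha)}}{2^{2\nu(1-\alpha)} - 1} + \frac{2}{1 - 2^{-2\nu\alpha}}\Big] &(29)\\
&= t^{1-\alpha}\,2^{2\nu(1-\alpha)}\Big[\frac{1 - 2^{-2\nu p(1-\alpha)}}{2^{2\nu(1-\alpha)} - 1} + \frac{2}{1 - 2^{-2\nu\alpha}}\Big] &(30)\\
&\le t^{1-\alpha}\,2^{2\nu(1-\alpha)}\Big[\frac{1}{2^{2\nu(1-\alpha)} - 1} + \frac{2}{1 - 2^{-2\nu\alpha}}\Big].
\end{aligned}
$$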

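By estimate (23), we have $t \ge \max\{1, 2^{1-2\nu}|x|, 2^{1-2\nu}|y|\}$. By minimizing the right-hand side of (30) with respect to all such $t$, we obtain

$$\frac{|K_{\alpha,\nu}(x) - K_{\alpha,\nu}(y)|}{|x - y|^{\alpha}} \le \big(\max\{2|x|, 2|y|, 2^{2\nu}\}\big)^{1-\alpha}\Big[\frac{1}{2^{2\nu(1-\alpha)} - 1} + \frac{2}{1 - 2^{-2\nu\alpha}}\Big]. \tag{31}$$

By periodicity of $K_{\alpha,\nu}$, we may further assume that $|x|, |y| \le 1$. Hence, estimate (31) leads directly to (13) and (14).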
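With $|x|, |y| \le 1$ the maximum in (31) equals $2^{2\nu}$, and the algebra behind “leads directly” is the following worked identity, which uses $2^{-2\nu(1-\alpha)}\cdot 2^{-2\nu\alpha} = 2^{-2\nu}$:

$$
\begin{aligned}
2^{2\nu(1-\alpha)}\Big[\frac{1}{2^{2\nu(1-\alpha)} - 1} + \frac{2}{1 - 2^{-2\nu\alpha}}\Big]
&= \frac{1}{1 - 2^{-2\nu(1-\alpha)}} + \frac{2}{2^{-2\nu(1-\alpha)}\big(1 - 2^{-2\nu\alpha}\big)}\\
&= \frac{1}{1 - 2^{-2\nu(1-\alpha)}} + \frac{2}{2^{2\nu(\alpha-1)} - 2^{-2\nu}} = C(\alpha,\nu).
\end{aligned}
$$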

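(ii) Fix $x \in \mathbb{R}$ and $m \in \mathbb{N}$. Let $t_m : \mathbb{R} \to \mathbb{R}$ be the step function given by formula (17), which we reformulate as

$$t_m(x) = \begin{cases} +2^{-2\nu m-1}, & i\,2^{-2\nu m} < x \le i\,2^{-2\nu m} + 2^{-2\nu m-1},\ i \in \mathbb{Z},\\[2pt] -2^{-2\nu m-1}, & i\,2^{-2\nu m} + 2^{-2\nu m-1} < x \le (i+1)\,2^{-2\nu m},\ i \in \mathbb{Z}. \end{cases} \tag{32}$$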

We observe that, since $|t_m(x)| = \frac12\,2^{-2\nu m}$ and

$$\big|2^{2\nu m}(x + t_m(x)) - 2^{2\nu m}x\big| = \frac12, \tag{33}$$

$t_m$ is defined in such a way that no integer lies strictly between $2^{2\nu m}x$ and $2^{2\nu m}(x + t_m(x))$.

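By (1), we can first estimate from below the difference quotient $\big|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)\big|/|t_m(x)|$, that is, the case $\beta = 1$, as

$$
\begin{aligned}
\frac{|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)|}{|t_m(x)|} \ge\ &\Big|\sum_{k=m+1}^{\infty} 2^{-2\alpha\nu k}\,\frac{\varphi(2^{2\nu k}(x + t_m(x))) - \varphi(2^{2\nu k}x)}{t_m(x)} + 2^{-2\alpha\nu m}\,\frac{\varphi(2^{2\nu m}(x + t_m(x))) - \varphi(2^{2\nu m}x)}{t_m(x)}\Big| &(34)\\
&- \sum_{k=0}^{m-1} 2^{-2\alpha\nu k}\,\frac{|\varphi(2^{2\nu k}(x + t_m(x))) - \varphi(2^{2\nu k}x)|}{|t_m(x)|}. &(35)
\end{aligned}
$$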

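We will derive estimate (15) by estimating each term of (34)-(35). First, observe that the sum $\sum_{k=m+1}^{\infty}$ in (34) vanishes, since by (2) $\varphi$ is 2-periodic: indeed, for $k \ge m + 1$, we have

$$\varphi\big(2^{2\nu k}(x + t_m(x))\big) - \varphi\big(2^{2\nu k}x\big) = \varphi\big(2^{2\nu k}x \pm 2^{2\nu(k-m)-1}\big) - \varphi\big(2^{2\nu k}x\big) = \varphi\big(2^{2\nu k}x \pm 2\cdot 2^{2(\nu(k-m)-1)}\big) - \varphi\big(2^{2\nu k}x\big) = 0, \tag{36}$$

the last equality being obvious since $2^{2(\nu(k-m)-1)} \in \mathbb{N}$. Next, the sum $\sum_{k=0}^{m-1}$ in (35) can be estimated as

$$
\begin{aligned}
\sum_{k=0}^{m-1} 2^{-2\alpha\nu k}\,\frac{|\varphi(2^{2\nu k}(x + t_m(x))) - \varphi(2^{2\nu k}x)|}{|t_m(x)|} &\le \sum_{k=0}^{m-1} 2^{-2\alpha\nu k}\,\frac{|2^{2\nu k}(x + t_m(x)) - 2^{2\nu k}x|}{|t_m(x)|} &(37)\\
&= \sum_{k=0}^{m-1} 2^{2\nu(1-\alpha)k} = \frac{1 - 2^{2\nu(1-\alpha)m}}{1 - 2^{2\nu(1-\alpha)}}. &(38)
\end{aligned}
$$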

Finally, by the definition of $t_m$ and the fact that $\varphi$ is piecewise affine with slope $\pm 1$ on the segments between consecutive integers, the remaining middle term of (34) gives

$$
\begin{aligned}
2^{-2\alpha\nu m}\,\frac{|\varphi(2^{2\nu m}(x + t_m(x))) - \varphi(2^{2\nu m}x)|}{|t_m(x)|} &= 2^{-2\alpha\nu m}\,\frac{|2^{2\nu m}(x + t_m(x)) - 2^{2\nu m}x|}{|t_m(x)|} &(39)\\
&= 2^{2\nu(1-\alpha)m}. &(40)
\end{aligned}
$$

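By using (36), (37) and (39), estimate (34)-(35) implies

$$
\begin{aligned}
\frac{|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)|}{|t_m(x)|} &\ge 2^{2\nu(1-\alpha)m} - \frac{1 - 2^{2\nu(1-\alpha)m}}{1 - 2^{2\nu(1-\alpha)}} &(41)\\
&\ge \frac{2^{2\nu(1-\alpha)} - 2}{2^{2\nu(1-\alpha)} - 1}\,\big(2^{2\nu(1-\alpha)}\big)^m,
\end{aligned}
$$

where the last step drops the positive term $1/(2^{2\nu(1-\alpha)} - 1)$. By (41), if $\beta \in (\alpha, 1]$, then, using $|t_m(x)| = 2^{-2\nu m-1}$, we have

$$\frac{|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)|}{|t_m(x)|^{\beta}} \ge 2^{-(2\nu m+1)(1-\beta)}\,\frac{2^{2\nu(1-\alpha)} - 2}{2^{2\nu(1-\alpha)} - 1}\,\big(2^{2\nu(1-\alpha)}\big)^m = K(m,\nu,\alpha,\beta), \tag{42}$$

which is equivalent to (15) and (16).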
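Because of the exact cancellation (36), estimate (15) can be checked with exact rational arithmetic. The sketch below assumes $\alpha = 1/2$ and $\nu = 2$ (so that $2\alpha\nu$ is an integer and $2\nu > 1/(1-\alpha)$); it implements $t_m$ from (32), sums (1) only over $k = 0, \dots, m$, which by (36) gives the exact increment of the full series, and compares the quotient with $K(m,\nu,\alpha,\beta)$ from (16). All helper names are our own.

```python
import math
from fractions import Fraction


def phi(x):
    """Sawtooth function (2) on exact rationals."""
    r = x % 2
    return r if r <= 1 else 2 - r


def K_partial(x, nu, s, n_terms):
    """Partial sum of (1) with weights 2^(-s*k), where s = 2*alpha*nu is an integer."""
    return sum(Fraction(1, 2 ** (s * k)) * phi(2 ** (2 * nu * k) * x)
               for k in range(n_terms))


def t_m(x, nu, m):
    """Step function (32): +2^(-2*nu*m-1) if frac(2^(2*nu*m) x) lies in (0, 1/2], else minus."""
    y = 2 ** (2 * nu * m) * x
    frac = y - math.floor(y)
    h = Fraction(1, 2 ** (2 * nu * m + 1))
    return h if 0 < frac <= Fraction(1, 2) else -h


def lower_bound(m, nu, alpha, beta):
    """K(m, nu, alpha, beta) from (16), in floating point."""
    q = 2.0 ** (2 * nu * (1 - alpha))
    return 2.0 ** (beta - 1) * (q - 2) / (q - 1) * (2.0 ** (2 * nu * (beta - alpha))) ** m


def quotient(x, nu, alpha, m, beta):
    """|K(x + t_m(x)) - K(x)| / |t_m(x)|^beta for the full series (1).

    By (36) the terms with k >= m + 1 cancel exactly, so summing k = 0..m
    already gives the exact increment of the infinite series.
    """
    s = 2 * alpha * nu
    if s != int(s):
        raise ValueError("this sketch assumes 2*alpha*nu is an integer")
    s = int(s)
    x = Fraction(x)
    t = t_m(x, nu, m)
    delta = K_partial(x + t, nu, s, m + 1) - K_partial(x, nu, s, m + 1)
    return float(abs(delta)) / float(abs(t)) ** beta


print(quotient(Fraction(1, 3), 2, 0.5, 1, 1.0), lower_bound(1, 2, 0.5, 1.0))  # 3.0 and about 8/3
```

For these parameters (41) reads $4^m - (4^m - 1)/3 = (2\cdot 4^m + 1)/3 \ge \frac23\, 4^m$, so the bound is sharp up to the dropped term $1/3$.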

Proof of Proposition 2. Let $M \ge 1$. We fix $m \in \mathbb{N}$ and set

$$E_m := \{x \in \mathbb{R} : t_m(x) > 0\}. \tag{43}$$

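By (17), $t_m$ is a Borel measurable function and hence $E_m$ is a Borel set. By integrating (42) with $\beta = 1$ over $(-\frac M2, \frac M2)$, we obtain

$$
\begin{aligned}
M\,\frac{2^{2\nu(1-\alpha)} - 2}{2^{2\nu(1-\alpha)} - 1}\big(2^{2\nu(1-\alpha)}\big)^m &\le \int_{-M/2}^{M/2} \frac{|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)|}{|t_m(x)|}\,dx &(44)\\
&= \int_{(-\frac M2,\frac M2)\cap E_m} \frac{|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)|}{|t_m(x)|}\,dx + \int_{(-\frac M2,\frac M2)\setminus E_m} \frac{|K_{\alpha,\nu}(x + t_m(x)) - K_{\alpha,\nu}(x)|}{|t_m(x)|}\,dx. &(45)
\end{aligned}
$$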

Hence, by (43) and (32), (44)-(45) give

$$
\begin{aligned}
M\,\frac{2^{2\nu(1-\alpha)} - 2}{2^{2\nu(1-\alpha)} - 1}\big(2^{2\nu(1-\alpha)}\big)^m &\le \int_{(-\frac M2,\frac M2)\cap E_m} \frac{|K_{\alpha,\nu}(x + 2^{-2\nu m-1}) - K_{\alpha,\nu}(x)|}{2^{-2\nu m-1}}\,dx &(46)\\
&\quad + \int_{(-\frac M2,\frac M2)\setminus E_m} \frac{|K_{\alpha,\nu}(x - 2^{-2\nu m-1}) - K_{\alpha,\nu}(x)|}{2^{-2\nu m-1}}\,dx. &(47)
\end{aligned}
$$

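By a change of variables in the second integral, we obtain

$$
\begin{aligned}
\frac{2^{2\nu(1-\alpha)} - 2}{2^{2\nu(1-\alpha)} - 1}\big(2^{2\nu(1-\alpha)}\big)^m &\le \frac1M \int_{(-\frac M2,\frac M2)\cap E_m} \frac{|K_{\alpha,\nu}(x + 2^{-2\nu m-1}) - K_{\alpha,\nu}(x)|}{2^{-2\nu m-1}}\,dx &(48)\\
&\quad + \frac1M \int_{((-\frac M2,\frac M2)\setminus E_m) - 2^{-2\nu m-1}} \frac{|K_{\alpha,\nu}(x + 2^{-2\nu m-1}) - K_{\alpha,\nu}(x)|}{2^{-2\nu m-1}}\,dx. &(49)
\end{aligned}
$$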

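Hence, since $M \ge 1$ and $2^{-2\nu m-1} \le \frac12$, both domains of integration are contained in $(-M, M)$, and we conclude

$$\frac{2^{2\nu(1-\alpha)} - 2}{2^{2\nu(1-\alpha)} - 1}\big(2^{2\nu(1-\alpha)}\big)^m \le \frac2M \int_{-M}^{M} \frac{|K_{\alpha,\nu}(x + 2^{-2\nu m-1}) - K_{\alpha,\nu}(x)|}{2^{-2\nu m-1}}\,dx. \tag{50}$$

Since the left-hand side of (50) equals $K(m,\nu,\alpha,1)$, dividing by 4 shows that (50) is equivalent to (18).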
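The bound (18) can be observed on a grid. The following sketch is our own illustration: the helper names, the midpoint rule, the grid size `n_points` and the choice $\alpha = 1/2$, $\nu = 2$, $M = 1$ are assumptions, not taken from the paper. It approximates the averaged difference quotient on $(-M, M)$ and compares it with $\frac14 K(m,\nu,\alpha,1)$. A quadrature value is not a proof; it only illustrates the growth like $(2^{2\nu(1-\alpha)})^m$ predicted by (18).

```python
import math


def phi(x):
    """Sawtooth function (2) in floating point."""
    r = math.fmod(x, 2.0)
    if r < 0:
        r += 2.0
    return r if r <= 1.0 else 2.0 - r


def K(x, alpha, nu, n_terms=40):
    """Truncated series (1) in floating point."""
    return sum(2.0 ** (-2 * alpha * nu * k) * phi(2.0 ** (2 * nu * k) * x)
               for k in range(n_terms))


def averaged_quotient(alpha, nu, m, M=1.0, n_points=2000):
    """Midpoint-rule approximation of the left-hand side of (18)."""
    h = 2.0 ** (-2 * nu * m - 1)
    dx = 2.0 * M / n_points
    total = 0.0
    for j in range(n_points):
        x = -M + (j + 0.5) * dx
        total += abs(K(x + h, alpha, nu) - K(x, alpha, nu)) / h
    return total * dx / (2.0 * M)


def rhs_18(alpha, nu, m):
    """(1/4) K(m, nu, alpha, 1), the right-hand side of (18)."""
    q = 2.0 ** (2 * nu * (1 - alpha))
    return 0.25 * (q - 2) / (q - 1) * q ** m


for m in range(3):
    print(m, averaged_quotient(0.5, 2, m), rhs_18(0.5, 2, m))
```

Roughly half of the grid points lie in $E_m$, where (41) already gives $\frac23\,4^m$ pointwise, so one expects the average to exceed the right-hand side by about a factor of two.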

3 Singular PDE Solutions of the Aronsson System and the ∞-Laplacian.

3.1 Singular Viscosity Solutions of the Aronsson PDE

We recall here a result established in [16] by employing the singular function $K_{\alpha,\nu}$. We proved that, when $n \ge 2$ and $H \in C^1(\mathbb{R}^n)$ is a Hamiltonian such that some level set contains a line segment, the Aronsson equation $D^2u : H_p(Du) \otimes H_p(Du) = 0$ admits explicit entire viscosity solutions. They are superpositions of a linear part plus a Lipschitz continuous singular part, which in general is non-$C^1$ and nowhere twice differentiable. In particular, we supplemented the $C^1$ regularity result of Wang and Yu [28] by deducing that strict level convexity is necessary for $C^1$ regularity of solutions.

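Theorem 4 (cf. [16]) We assume that $H \in C^1(\mathbb{R}^n)$, $n \ge 2$, and that there exists a straight line segment $[a, b] \subseteq \mathbb{R}^n$ along which $H$ is constant. Then, for any $F \in W^{1,\infty}_{\mathrm{loc}}(\mathbb{R})$ satisfying $\|F'\|_{L^\infty(\mathbb{R})} < 1$, the formula

$$u(x) := \frac{b + a}{2}\cdot x + F\Big(\frac{b - a}{2}\cdot x\Big), \qquad x \in \mathbb{R}^n, \tag{51}$$

defines an entire viscosity solution $u \in W^{1,\infty}_{\mathrm{loc}}(\mathbb{R}^n)$ of the Aronsson equation (5).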

Here “·” denotes the inner product. By employing the particular choice $F := \int K_{\alpha,\nu}$ and variants of it, we provided the following partial answer to the regularity problem:

Corollary 5 Strict level convexity of the Hamiltonian $H$ is necessary to obtain $C^1$ and $C^{1,\beta}$ regularity of viscosity solutions to the Aronsson PDE (5) in all dimensions $n \ge 2$.

The idea of the proof of Theorem 4 is the following. Suppose first that $u$ is smooth. Then formula (51) is devised in such a way that the image of the gradient $Du$ is contained in the segment $[a, b]$. Hence, $H(Du)$ is constant, because $Du(\mathbb{R}^n)$ is contained in a level set of $H$. By rewriting Aronsson's PDE with contracted derivatives, we get

$$A_\infty u = H_p(Du)\cdot D\big(H(Du)\big) = 0. \tag{52}$$

Hence, (51) defines a solution of Aronsson's PDE. However, the previous argument fails when we choose as $F$ the primitive of $K_{\alpha,\nu}$, since $u$ is then twice differentiable nowhere. For the general case of viscosity solutions, one can obtain the result by the calculus of the so-called semijets, which are the pointwise generalized derivatives of viscosity solutions. Alternatively, one can obtain Theorem 4 from the stability of viscosity solutions under limits, which states that local uniform limits of viscosity solutions are viscosity solutions.
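For smooth $F$ the same conclusion can be checked directly from the structure of (51), which makes the role of the flat segment explicit. The following worked computation assumes $F \in C^2(\mathbb{R})$ with $\|F'\|_{L^\infty} < 1$ and writes $s$ for the argument of $F$:

$$
\begin{aligned}
&s := \tfrac{b-a}{2}\cdot x, \qquad Du(x) = \tfrac{b+a}{2} + F'(s)\,\tfrac{b-a}{2} = a + \tfrac{1 + F'(s)}{2}\,(b - a) \in (a, b), \qquad D^2u(x) = F''(s)\,\tfrac{b-a}{2}\otimes\tfrac{b-a}{2},\\
&H \text{ constant on } [a,b] \implies H_p(p)\cdot(b - a) = \tfrac{d}{d\lambda}H\big(a + \lambda(b - a)\big) = 0 \quad \text{for } p = a + \lambda(b - a),\ \lambda \in (0,1),\\
&A_\infty u = H_p(Du)\otimes H_p(Du) : D^2u = F''(s)\Big(H_p(Du)\cdot\tfrac{b-a}{2}\Big)^2 = 0.
\end{aligned}
$$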

3.2 Singular ∞-Harmonic Local Diffeomorphisms

Now we follow [17] and recall further constructions of singular solutions. Let $K \in C^0(\mathbb{R})$ and define $u : \mathbb{R}^2 \to \mathbb{R}^2$ by

$$u(x, y) := \int_0^x e^{iK(t)}\,dt + i\int_0^y e^{iK(s)}\,ds. \tag{53}$$

Formula (53) defines a two-dimensional ∞-Harmonic map, which is singular if $K \notin C^1(\mathbb{R})$.

Proposition 6 (cf. [17, 21]) Suppose $\|K\|_{C^0(\mathbb{R})} < \frac{\pi}{4}$ and let $u$ be given by (53). Then $u$ is a $C^1(\mathbb{R}^2)^2$ local diffeomorphism and an everywhere solution of the PDE system (9) with contracted derivatives, that is, of

$$Du\,D\Big(\frac12|Du|^2\Big) + |Du|^2[Du]^\perp \operatorname{Div}(Du) = 0. \tag{54}$$

In the special case where $u$ is smooth, the main idea is that $u$ is the sum of two unit-speed curves in $\mathbb{R}^2$ in separated variables, and as such the Euclidean (Frobenius) matrix norm $|Du| = \sqrt{D_i u_\alpha D_i u_\alpha}$ of the gradient is constant. Moreover, in codimension zero the orthogonal projection $[Du]^\perp$ vanishes identically. Hence, $u$ is a planar ∞-Harmonic map. The partial derivatives $e^{iK(x)}$ and $ie^{iK(y)}$ are linearly independent everywhere, since $\det Du(x,y) = \cos\big(K(x) - K(y)\big) > 0$ when $|K(x) - K(y)| < \pi/2$. Thus the map is both an immersion and a submersion, and by the inverse function theorem it is a local diffeomorphism.
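The smooth-case argument can be checked pointwise. The sketch below assumes a hypothetical smooth choice $K(t) = 0.6\sin t$ (so that $\|K\|_{C^0} = 0.6 < \pi/4$), writes down $Du$ and $D^2u$ for (53) by hand, and confirms that $\det Du = \cos(K(x) - K(y)) > 0$, $|Du|^2 = 2$ and $Du \otimes Du : D^2u = 0$ at a few points. It does not, and cannot, treat the nonsmooth choice $K = \frac14 K_{\alpha,\nu}$, for which $D^2u$ does not exist.

```python
import math


def K_smooth(t):
    """Hypothetical smooth K with sup |K| = 0.6 < pi/4."""
    return 0.6 * math.sin(t)


def dK_smooth(t):
    """Derivative of K_smooth."""
    return 0.6 * math.cos(t)


def jacobian(x, y):
    """Du[alpha][i] for (53): d/dx u = e^{iK(x)} and d/dy u = i e^{iK(y)}, as vectors in R^2."""
    kx, ky = K_smooth(x), K_smooth(y)
    return [[math.cos(kx), -math.sin(ky)],
            [math.sin(kx), math.cos(ky)]]


def hessian(x, y):
    """D2u[alpha][i][j]; the mixed derivatives vanish because the variables separate."""
    kx, ky = K_smooth(x), K_smooth(y)
    ax, ay = dK_smooth(x), dK_smooth(y)
    return [[[-ax * math.sin(kx), 0.0], [0.0, -ay * math.cos(ky)]],
            [[ax * math.cos(kx), 0.0], [0.0, -ay * math.sin(ky)]]]


def check_point(x, y):
    """Return det Du, |Du|^2 and the vector Du ⊗ Du : D²u at (x, y)."""
    du, d2u = jacobian(x, y), hessian(x, y)
    det = du[0][0] * du[1][1] - du[0][1] * du[1][0]
    norm2 = sum(du[a][i] ** 2 for a in range(2) for i in range(2))
    # The projection term of (10) is absent here because rank Du = 2 everywhere.
    residual = [sum(du[a][i] * du[b][j] * d2u[b][i][j]
                    for i in range(2) for j in range(2) for b in range(2))
                for a in range(2)]
    return det, norm2, residual


for point in [(0.0, 0.0), (1.3, -2.1), (4.0, 0.5)]:
    print(point, check_point(*point))
```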

In the case where $K$ is nonsmooth, and in particular for $K := \frac14 K_{\alpha,\nu}$, we obtain a $C^{1,\alpha}(\mathbb{R}^2)^2$ nowhere improvable ∞-Harmonic local diffeomorphism which cannot be rigorously justified as a solution, since the previous argument fails: there appear contracted derivatives which cannot be expanded, and also products of distributions with nonsmooth functions, which are not well defined.

As we prove in [17, 21], (53) cannot be rigorously justified as a solution classically, strongly, weakly, distributionally or in any other existing sense. However, $u$ is a Contact Solution of the ∞-Laplacian for sufficiently large exponents ($\alpha > \frac12$). This latter property of our class of functions is essential for the validity of the novel calculus of our Vectorial Contact Jets (which are the pointwise weak derivatives of the vector case) in the “weak” theory of Contact Solutions, which applies to general fully nonlinear PDE systems.

3.3 Singular Aronsson Maps

Again in [17], we have proved that, under the assumption that some level set of $H$ contains a straight line segment of rank-one matrices on which the gradient $H_P$ is constant, there exist singular Aronsson maps, that is, singular solutions of (11). More precisely, given $a, b \in \mathbb{R}^n$, $\eta \in \mathbb{R}^N$ and $f \in C^1(\mathbb{R})$, we define $u : \mathbb{R}^n \to \mathbb{R}^N$ by

$$u(x) := \Big(x^\top \frac{b + a}{2}\Big)\eta + f\Big(x^\top \frac{b - a}{2}\Big)\eta. \tag{55}$$

Proposition 7 (cf. [17, 21]) Let $H \in C^2(\mathbb{R}^N \otimes \mathbb{R}^n)$ and suppose that there exist $a, b \in \mathbb{R}^n$, $\eta \in \mathbb{R}^N$, $c \in \mathbb{R}$ and $C \in \mathbb{R}^N \otimes \mathbb{R}^n$ such that

$$[\eta \otimes a, \eta \otimes b] \subseteq \{H = c\} \cap \{H_P = C\}. \tag{56}$$

Let $u$ be given by (55) and suppose $\|f'\|_{C^0(\mathbb{R})} < 1$. Then $u$ is an everywhere solution in $C^1(\mathbb{R}^n)^N$ of the contracted Aronsson PDE system (11), that is, of

$$H_P(Du)\,D\big(H(Du)\big) + H(Du)\,[H_P(Du)]^\perp \operatorname{Div}\big(H_P(Du)\big) = 0. \tag{57}$$

We refrain from presenting more details in this case, since the main ideas are similar to those of the previous cases.

If $N > 1$, (11) is a quasilinear nonmonotone system in nondivergence form and generally does not possess classical, strong, weak, measure-valued, distributional or viscosity solutions. In the paper [21] we introduce our new PDE theory and, among other things, prove that (53) and (55) are appropriately interpreted solutions.

We emphasize that the specific properties of our class of singular functions imply both the validity of the tools of the Viscosity-Contact Solution theories and the impossibility of rigorously interpreting these solutions by means of existing PDE theories in the vector case. In the scalar case, they furnish a regularity counterexample to an important open problem. In the brand new vector case, which is still in its infancy, they render the rigorous study of the “Euler-Lagrange PDEs” of Vector-Valued Calculus of Variations in $L^\infty$ almost impossible without an efficient PDE approach like the one proposed in [21].

Acknowledgement. A primitive version of Sections 1 and 2 of this work was written when the Author was a doctoral student at the Department of Mathematics, University of Athens, Greece, and was financially supported by the Grant Heraclitus II - Strengthening human research potential through the implementation of doctoral research.

References

[1] P. C. Allaart, K. Kawamura, The improper infinite derivatives of Takagi's nowhere differentiable function, J. Math. Anal. Appl. 372 (2010), 656-665.

[2] G. Aronsson, On the partial differential equation $u_x^2u_{xx} + 2u_xu_yu_{xy} + u_y^2u_{yy} = 0$, Arkiv för Mat. 7 (1968), 395-425.

[3] G. Aronsson, On Certain Singular Solutions of the Partial Differential Equation $u_x^2u_{xx} + 2u_xu_yu_{xy} + u_y^2u_{yy} = 0$, Manuscripta Math. 47 (1984), no. 1-3, 133-151.

[4] G. Aronsson, Construction of Singular Solutions to the p-Harmonic Equation and its Limit Equation for $p = \infty$, Manuscripta Math. 56 (1986), 135-158.

[5] E. N. Barron, L. C. Evans, R. Jensen, The Infinity Laplacian, Aronsson's Equation and their Generalizations, Trans. Amer. Math. Soc. 360 (2008), no. 1; published electronically on July 25, 2007.

[6] A. Bauche, S. Dubuc, A Unified approach for nondifferentiable functions, J. Math. Anal. Appl. 182 (1994), 134-142.

[7] E. I. Berezhnoi, A Subspace of Hölder Space Consisting Only of Nonsmoothest Functions, Mathematical Notes 74 (2003), no. 3, 316-325.

[8] T. Bhattacharya, On the Behaviour of ∞-Harmonic Functions Near Isolated Points, Nonlinear Analysis 58 (2004), 333-349.

[9] P. Billingsley, Van Der Waerden's Continuous Nowhere Differentiable Function, Amer. Math. Monthly 89 (1982), no. 9, 691.

[10] J. B. Brown, G. Kozlowski, Smooth Interpolation, Hölder Continuity and the Takagi-Van der Waerden Function, Amer. Math. Monthly 110 (2003), 142-147.

[11] L. Capogna, A. Raich, An Aronsson type approach to extremal quasiconformal mappings, J. Differential Equations (2012), to appear.

[12] F. S. Cater, Remarks on a Function without Unilateral Derivatives, J. Math. Anal. Appl. 182 (1994), 718-721.

[13] M. G. Crandall, A Visit with the ∞-Laplacian, in: Calculus of Variations and Non-Linear Partial Differential Equations, Springer Lecture Notes in Mathematics 1927, CIME, Cetraro, Italy, 2005.

[14] L. C. Evans, O. Savin, $C^{1,\alpha}$ Regularity for Infinity Harmonic Functions in Two Dimensions, Calc. Var. 32 (2008), 325-347.

[15] R. Jain, B. R. Nagaraj, $C^{1,1/3}$ Regularity in the Dirichlet Problem for $\Delta_\infty$, Calc. Var. 32 (2008), 325-347; Computer and Mathematics with Applications 53 (2007), 277-394.

[16] N. Katzourakis, Explicit Singular Viscosity Solutions of the Aronsson Equation, Comptes Rendus Mathematique, C. R. Acad. Sci. Paris, Ser. I 349 (2011), 1173-1176.

[17] N. Katzourakis, $L^\infty$ Variational Problems for Maps and the Aronsson PDE System, J. Differential Equations (2012), to appear.

[18] N. Katzourakis, ∞-Minimal Submanifolds, ArXiv preprint, 2012.

[19] N. Katzourakis, On the Structure of ∞-Harmonic Maps, ArXiv preprint, 2012.

[20] N. Katzourakis, Extremal ∞-Quasiconformal Immersions, manuscript, 2012.

[21] N. Katzourakis, Contact Solutions for Nonlinear Systems of Partial Differential Equations, manuscript, 2012.

[22] K. Knopp, Ein einfaches Verfahren zur Bildung stetiger nirgends differenzierbarer Funktionen, Math. Z. 2 (1918), 1-26.

[23] T. M. Lewis, A probabilistic property of Katsuura's continuous nowhere differentiable function, J. Math. Anal. Appl. 353 (2009), 224-231.

[24] J. McCarthy, An everywhere continuous nowhere differentiable function, Amer. Math. Monthly 60 (1953), 709.

[25] O. Savin, $C^1$ Regularity for Infinity Harmonic Functions in Two Dimensions, Arch. Rational Mech. Anal. 176 (2005), 351-361.

[26] S. Sheffield, C. K. Smart, Vector Valued Optimal Lipschitz Extensions, Comm. Pure Appl. Math. 65 (2012), no. 1, 128-154.

[27] J. Thim, Continuous nowhere differentiable functions, Master's thesis, Luleå University of Technology, December 2003, available at http://epubl.ltu.se/1402-1617/2003/320/index-en.html.

[28] C. Wang, Y. Yu, $C^1$ Regularity of the Aronsson Equation in $\mathbb{R}^2$, Ann. Inst. H. Poincaré, AN 25 (2008), 659-678.


SeMA Journal no. 59 (2012), 53–78

A MODULAR LATTICE BOLTZMANN SOLVER FOR GPU COMPUTING PROCESSORS

M. ASTORINO∗, J. BECERRA SAGREDO∗, A. QUARTERONI†

∗CMCS, Chair of Modelling and Scientific Computing, MATHICSE, Mathematics Institute of Computational Science and Engineering, École Polytechnique Fédérale de Lausanne, Station 8, CH-1015 Lausanne, Switzerland

†MOX, Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133 Milano, Italy

†Corresponding author.

Abstract

During the past two decades, the lattice Boltzmann method (LBM) has been increasingly acknowledged as a valuable alternative to classical numerical techniques (e.g. finite elements, finite volumes, etc.) in fluid dynamics. A distinguishing feature of LBM is undoubtedly its highly parallelizable data structure. In this work we present a general parallel LBM framework for graphics processing units (GPUs) characterized by a high degree of modularity and memory efficiency, while still preserving very good computational performance. After recalling the essential programming principles of the CUDA C language for GPUs, we provide the details of the implementation. The data structure presented here takes into account the intrinsic properties of the Gauss-Hermite quadrature rules (on which the LBM is based) to guarantee a unique and flexible framework for two- and three-dimensional problems. In addition, a careful implementation of a memory-efficient formulation of the LBM algorithm allows us to limit the high memory consumption that typically affects this computational method. Numerical examples in two and three dimensions illustrate the reliability and the performance of the code.

Keywords: lattice Boltzmann method, GPU programming, CUDA, parallel computing, Navier-Stokes equations

Received: October 17, 2011. Accepted: May 28, 2012.

1 Introduction

The lattice Boltzmann method (LBM) is a kinetic-based approach for the numerical simulation of fluid dynamics problems [37]. In the last two decades, this method has proven to be quite efficient in the simulation of various transport phenomena (see [1] for a review) and can now be considered a valuable complementary approach to more classical and consolidated techniques (e.g. finite differences, finite elements or finite volumes) [8, 28, 36, 20, 23]. Unlike these techniques, which directly stem from a continuum mechanics formulation of flow problems, LBM adopts a bottom-up approach in which the macroscopic behavior is described by modeling the interactions of moving particles at a mesoscopic level and then retrieving the macroscopic quantities as weighted sums of the corresponding particle distribution functions. Depending on the way the particles propagate (or stream) and interact (or collide), different macroscopic behaviors can be described.
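As a hedged illustration of these weighted sums and of the Gauss-Hermite structure mentioned in the abstract (our own Python sketch, not the authors' CUDA code), the standard D2Q9 and D3Q27 lattices can be generated as tensor products of the one-dimensional three-point Gauss-Hermite rule, rescaled to the velocities $-1, 0, 1$ with weights $1/6, 2/3, 1/6$ and $c_s^2 = 1/3$. The density and velocity are then recovered as weighted sums of a second-order equilibrium. The names `tensor_lattice`, `equilibrium` and `macroscopic` are hypothetical. Lattices such as D3Q19 are not tensor products, so this covers only part of what a general framework must handle.

```python
from itertools import product

# 1D three-point Gauss-Hermite rule in lattice units: nodes -1, 0, +1.
W1 = {-1: 1 / 6, 0: 2 / 3, 1: 1 / 6}
CS2 = 1 / 3  # squared lattice speed of sound for this rule


def tensor_lattice(dim):
    """Velocities and weights of the Dd Q(3^d) lattice as a tensor product of the 1D rule."""
    vel, wts = [], []
    for c in product((-1, 0, 1), repeat=dim):
        w = 1.0
        for ca in c:
            w *= W1[ca]
        vel.append(c)
        wts.append(w)
    return vel, wts


def equilibrium(rho, u, vel, wts):
    """Second-order equilibrium w_i rho [1 + c.u/cs2 + (c.u)^2/(2 cs2^2) - u.u/(2 cs2)]."""
    uu = sum(x * x for x in u)
    feq = []
    for c, w in zip(vel, wts):
        cu = sum(ca * ua for ca, ua in zip(c, u))
        feq.append(w * rho * (1 + cu / CS2 + cu * cu / (2 * CS2 * CS2) - uu / (2 * CS2)))
    return feq


def macroscopic(f, vel):
    """Density rho = sum_i f_i and velocity u with rho u = sum_i f_i c_i."""
    rho = sum(f)
    dim = len(vel[0])
    mom = [sum(fi * c[a] for fi, c in zip(f, vel)) for a in range(dim)]
    return rho, [m / rho for m in mom]


vel, wts = tensor_lattice(2)
print(len(vel), macroscopic(equilibrium(1.2, [0.05, -0.02], vel, wts), vel))
```

The moment conditions $\sum_i w_i = 1$, $\sum_i w_i c_i = 0$ and $\sum_i w_i c_{i\alpha}c_{i\beta} = c_s^2\delta_{\alpha\beta}$, together with vanishing odd third moments, are what make the equilibrium return exactly the prescribed density and velocity.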

From the computational point of view, LBM is recognized to be both computationally expensive and memory demanding [25]. However, the explicit nature of the method and its noteworthy spatial locality of data access (for each lattice node, only nearest-neighbor information is needed) make LBM an excellent candidate for parallel implementations.

With the recent advances in the parallel computing capabilities of contemporary graphics processing units (GPUs), there has been growing interest among computational scientists in using this hardware to accelerate computationally demanding numerical simulations. Nowadays two different programming languages for GPUs are available: the Compute Unified Device Architecture (CUDA C), developed and supported by NVIDIA [10], and the Open Computing Language (OpenCL) [31].

In the framework of LBM, efficient CUDA C implementations have recently been proposed in various works, see e.g. [18, 38] and [39, 3, 4, 29] for two- and three-dimensional fluid problems, respectively. In order to fully benefit from the computational capabilities of GPU devices, most of the mentioned works describe implementations that are highly optimized for a given type of macroscopic problem and lattice structure. As a consequence, they have limited flexibility.

In this work a modular and efficient implementation of the LBM for GPUs is presented. The implementation is based on the CUDA C programming language and offers a flexible structure that allows for easy modification. Our lattice Boltzmann GPU solver has several novelties with respect to similar published work. First and foremost is the modular framework, which exploits the mathematical properties of the Gauss-Hermite quadrature rules to guarantee a general implementation that can be used for any kind of two- or three-dimensional lattice structure. Indeed, most of the works presented in the literature are ad hoc implementations optimized for a specific lattice structure (see references [39, 3, 18, 38, 29] as well as recent works such as [14]).

Besides, the high degree of modularity that we advocate in this paper makes it easy to account for new functions and capabilities (by adding or modifying routines), unlike, e.g., [39, 18, 38], where the whole lattice Boltzmann algorithm (initialization, boundary conditions, streaming and collision) is practically implemented in a single CUDA kernel.

A new memory layout based on three structure-of-arrays arrangements is used to store the unknown particle distribution functions, and their access is controlled by a semi-direct addressing scheme. The resulting code is also optimized in terms of memory requirements by adopting the swapping technique proposed in [24], which halves the memory required for a given lattice size. The proposed implementation exploits the characteristics of the GPU architecture very well while preserving a general formulation of the LBM. Indeed, even though the focus is on the simulation of fluid flows, our general formulation allows the same code to be (re)used for different kinds of macroscopic problems at the expense of a slight reduction in performance.

This paper is organized as follows. In Section 2 we briefly review the LBM and the corresponding discretization, focusing in particular on the model for fluid flows. In Section 3 we provide the basic notions of GPU architectures as well as the essential features of the CUDA C programming language. We present the implementation details and the optimization strategy of our lattice Boltzmann framework in Section 4. Test cases in two and three dimensions are reported in Section 5 in order to validate the code and discuss its performance. To facilitate comparison with other published works (a challenging task since, to the best of our knowledge, no public release of the associated LB codes is available), we provide full information on the data of our test cases, on the graphics card used, and on the GPU performance expressed in Million Lattice node Updates Per Second (MLUPS), a common unit in the LB community. Finally, some concluding remarks are given in Section 6.

2 The lattice Boltzmann method

Historically, the lattice Boltzmann method was derived from lattice gas automata (LGA) [26]. The fundamental idea behind LGA is that the physics of macroscopic phenomena can be retrieved by defining suitable interactions among microscopic particles, which essentially consist of the propagation and the collision of particles lying on a discrete regular grid (the lattice).

Although relying on the same idea, unlike LGA the LBM has its roots in the kinetic theory of the Boltzmann-Maxwell equation [6]. Thanks to the stronger mathematical foundations of kinetic theory, LBM remedies some of the shortcomings of LGA while preserving the same local nature and simplicity.

The lattice Boltzmann equation provides a minimal discrete form of the Boltzmann-Maxwell equation

$$\left(\partial_t + \mathbf{e}\cdot\nabla_{\mathbf{x}} + \mathbf{F}\cdot\nabla_{\mathbf{e}}\right) f(\mathbf{e},\mathbf{x},t) = J(f)(\mathbf{x},t), \tag{1}$$

a conservation equation for the particle distribution function $f(\mathbf{e},\mathbf{x},t)$, so that $f(\mathbf{e},\mathbf{x},t)\,d\mathbf{e}\,d\mathbf{x}$ gives the total mass of particles inside the infinitesimal volume element $d\mathbf{e}\,d\mathbf{x}$ at a fixed time $t$, position $\mathbf{x}$ and velocity $\mathbf{e}$. The quantity $\mathbf{F}$ represents the external force, while the term $J$, also called the collision operator, takes into account the effects of inter-particle collisions.

The classical form of the lattice Boltzmann equation (see [26]) can be retrieved from (1) upon discretization of time and phase space (position and velocity). Following [36], the velocity space is first approximated by projecting the distribution function $f$ onto a Hilbert subspace $\mathcal{H}^N$ spanned by the first $N$ Hermite polynomials, where the order $N$ is dictated by the macroscopic behavior one wants to recover (e.g. $N = 2$ for the Navier-Stokes equations). The resulting discrete velocity equation is then integrated along the characteristics. An approximation with finite differences leads to the following lattice Boltzmann equation

$$f_i(\mathbf{x} + \mathbf{e}_i\,\delta x,\, t + \delta t) - f_i(\mathbf{x}, t) = J_i(f)(\mathbf{x}, t) + \delta t\, F_i \qquad \forall i = 0,\dots,Q, \tag{2}$$

where $\delta x$ and $\delta t$ are respectively the space- and time-step, while $E = \{\mathbf{e}_0,\dots,\mathbf{e}_Q\}$ denotes the discrete velocity set obtained by the projection onto $\mathcal{H}^N$.

Remark 1 Note that in Equation (2) the evolution of the particle distributions $f_i$ is restricted to velocities belonging to the pre-assigned discrete set $E$. In other words, the particles always move on the lattice nodes and only along the links defined by the discrete set of velocities $\mathbf{e}_0,\dots,\mathbf{e}_Q$. For this reason the set of velocities is also commonly referred to as the lattice structure of the model.

On the right-hand side, the quantity $F_i$ is a discrete forcing term approximating the external force $\mathbf{F}$. The form of $F_i$ depends on the particular macroscopic behavior we want to simulate, or equivalently on the choice of the lattice structure. The term $J_i(f)$ defines a general collision operator and represents a discrete approximation of $J$. The most common choice for $J_i(f)$ is undoubtedly the well-known single-relaxation-time Bhatnagar-Gross-Krook (BGK) model [33], where

$$J_i(f) \approx J_i^{BGK}(f) = -\frac{1}{\tau}\left(f_i - f_i^{eq}\right) \qquad \forall i = 0,\dots,Q, \tag{3}$$

$\tau$ being the so-called relaxation time and $f_i^{eq}$ an appropriate discrete approximation of the Maxwell-Boltzmann equilibrium distribution [36]. The BGK model has been used throughout this work; nonetheless, other collision operators have been proposed in the literature. For instance, Equation (3) can be considered a particular approximation of the quasilinear form

$$J_i(f) \approx J_i^{QL}(f) = -A_{ij}\left(f_j - f_j^{eq}\right) \qquad \forall i = 0,\dots,Q, \tag{4}$$

where summation over repeated indices is implied [15]. In the last equation, $A_{ij}$ is the quasilinear scatter matrix defined as

$$A_{ij} = -\left.\frac{\partial J_i(f)}{\partial f_j}\right|_{f^{eq}}.$$

Other choices for $J_i(f)$ are the multiple relaxation times (MRT) model [11], the entropic model [2] and the regularized model [21].
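To make Equations (3) and (4) concrete, the C++ sketch below evaluates both collision operators at a single lattice node. It assumes nine distributions per node (as in D2Q9) and precomputed equilibrium values; the names `bgkOperator`, `quasilinearOperator` and `bgkScatterMatrix` are illustrative and not taken from the authors' code. Choosing the diagonal scatter matrix $A = I/\tau$ reduces the quasilinear form to the BGK model.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Nine distributions per node, as for a D2Q9 lattice (illustrative choice).
constexpr std::size_t kQ1 = 9;
using Dist = std::array<double, kQ1>;
using Matrix = std::array<std::array<double, kQ1>, kQ1>;

// BGK operator, Eq. (3): J_i = -(f_i - f_i^eq) / tau.
Dist bgkOperator(const Dist& f, const Dist& feq, double tau) {
    Dist J{};
    for (std::size_t i = 0; i < kQ1; ++i) J[i] = -(f[i] - feq[i]) / tau;
    return J;
}

// Quasilinear operator, Eq. (4): J_i = -sum_j A_ij (f_j - f_j^eq).
Dist quasilinearOperator(const Dist& f, const Dist& feq, const Matrix& A) {
    Dist J{};
    for (std::size_t i = 0; i < kQ1; ++i)
        for (std::size_t j = 0; j < kQ1; ++j) J[i] -= A[i][j] * (f[j] - feq[j]);
    return J;
}

// Diagonal scatter matrix A = I / tau, for which Eq. (4) reduces to Eq. (3).
Matrix bgkScatterMatrix(double tau) {
    assert(tau > 0.5);  // tau > 1/2 keeps the viscosity of Eq. (7) positive
    Matrix A{};
    for (std::size_t i = 0; i < kQ1; ++i) A[i][i] = 1.0 / tau;
    return A;
}
```

The collision update of Equation (10) then amounts to adding the returned values (and $\delta t\,F_i$) to $f_i$ node by node, which is why this step is purely local.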

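Depending on the choice of the discrete velocity set $E$ and of the discrete collision operator $J_i(f)$, different macroscopic behaviors can be modeled [20]. Let us consider for example the BGK model for isothermal low-Mach-number fluid flows in two and three dimensions. In these cases the equilibrium distribution function in $J^{BGK}(f)$ and the forcing term $F_i$ take respectively the forms

$$f_i^{eq} = \rho\, w_i \left[1 + \frac{1}{c_s^2}\,\mathbf{e}_i\cdot\mathbf{u} + \frac{1}{2c_s^4}\left(\mathbf{e}_i\mathbf{e}_i - c_s^2 I\right):\mathbf{u}\mathbf{u}\right] \qquad \forall i = 0,\dots,Q, \tag{5}$$

$$F_i = \left(1 - \frac{1}{2\tau}\right) w_i \left[\frac{\mathbf{e}_i - \mathbf{u}}{c_s^2} + \frac{\mathbf{e}_i\cdot\mathbf{u}}{c_s^4}\,\mathbf{e}_i\right]\cdot\mathbf{F} \qquad \forall i = 0,\dots,Q, \tag{6}$$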


where $\rho$ and $\mathbf{u}$ are the density and the velocity of the fluid, respectively. The speed of sound $c_s$ is defined as $\beta\,\delta x/\delta t$ and represents the sound-propagation velocity of the model. The weights $w_i$ and $\beta$ are suitable constants that depend on the choice of the discrete velocity set $E$.
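As an illustration of Equations (5) and (6), the C++ sketch below evaluates the equilibrium distribution and the forcing term at one node of a D2Q9 lattice in lattice units ($\delta x = \delta t = 1$, so $c_s^2 = 1/3$, with weights $4/9$, $1/9$ and $1/36$). The velocity ordering, chosen so that $\mathbf{e}_{i+4} = -\mathbf{e}_i$ for $i = 1,\dots,4$, and the function names are assumptions made for the example; the forcing term uses the prefactor $1 - 1/(2\tau)$. The tensor contraction in Equation (5) is expanded as $(\mathbf{e}_i\mathbf{e}_i - c_s^2 I):\mathbf{u}\mathbf{u} = (\mathbf{e}_i\cdot\mathbf{u})^2 - c_s^2|\mathbf{u}|^2$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// D2Q9 lattice in lattice units (dx = dt = 1). Assumed ordering:
// e_{i+4} = -e_i for i = 1..4.
constexpr int kQ1 = 9;
constexpr int EX[kQ1] = {0, 1, 0, 1, -1, -1, 0, -1, 1};
constexpr int EY[kQ1] = {0, 0, 1, 1, 1, 0, -1, -1, -1};
constexpr double W[kQ1] = {4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 36, 1.0 / 36,
                           1.0 / 9, 1.0 / 9, 1.0 / 36, 1.0 / 36};
constexpr double CS2 = 1.0 / 3.0;  // squared speed of sound
using Dist = std::array<double, kQ1>;

// Equilibrium distribution, Eq. (5).
Dist equilibrium(double rho, double ux, double uy) {
    Dist feq{};
    const double uu = ux * ux + uy * uy;
    for (int i = 0; i < kQ1; ++i) {
        const double eu = EX[i] * ux + EY[i] * uy;
        feq[i] = rho * W[i] *
                 (1.0 + eu / CS2 + (eu * eu - CS2 * uu) / (2.0 * CS2 * CS2));
    }
    return feq;
}

// Forcing term, Eq. (6), for a body force (Fx, Fy) and relaxation time tau.
Dist forcing(double ux, double uy, double Fx, double Fy, double tau) {
    Dist Fi{};
    const double pre = 1.0 - 1.0 / (2.0 * tau);
    for (int i = 0; i < kQ1; ++i) {
        const double eu = EX[i] * ux + EY[i] * uy;
        // Components of the vector in square brackets in Eq. (6).
        const double gx = (EX[i] - ux) / CS2 + eu * EX[i] / (CS2 * CS2);
        const double gy = (EY[i] - uy) / CS2 + eu * EY[i] / (CS2 * CS2);
        Fi[i] = pre * W[i] * (gx * Fx + gy * Fy);
    }
    return Fi;
}
```

A quick sanity check is that the equilibrium reproduces the imposed density and momentum, $\sum_i f_i^{eq} = \rho$ and $\sum_i \mathbf{e}_i f_i^{eq} = \rho\mathbf{u}$, while the forcing term carries no mass and a momentum of $(1 - 1/(2\tau))\,\mathbf{F}$.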

In two dimensions, the nine-velocity square lattice structure, also referred to as D2Q9, is undoubtedly the most popular choice for the discrete velocity set $E$ (Figure 1.a). In three dimensions various choices of the discrete velocity set are available; among them we recall the 15-velocity lattice structure (D3Q15), the 19-velocity lattice structure (D3Q19, also shown in Figure 1.b) and the 27-velocity lattice structure (D3Q27). We refer to [1] for more details on these lattice structures and on the associated coefficients (the weights $w_i$ and the speed of sound $c_s$). The lattice structures implemented in our code are given in Appendix 7.

(a) D2Q9 (b) D3Q19

Figure 1: Examples of lattice structures. Dots represent the lattice nodes $\mathbf{x}_i$, $\forall i = 0,\dots,Q$, in the unit cell. The arrows identify the links $\mathbf{e}_i$, $\forall i = 0,\dots,Q$, of the lattice structure; their lengths are given by $\|\mathbf{x}_0 - \mathbf{x}_i\|$, $\forall i = 0,\dots,Q$.

In the limit of low Mach numbers ($u/c_s$ small enough), and assuming small space and time discretization steps, the numerical model defined by Equations (2), (3), (5) and (6) asymptotically recovers the dimensionless Navier-Stokes equations with dynamic shear viscosity¹

$$\nu_d = c_s^2\left(\tau - \frac{1}{2}\right)\delta t. \tag{7}$$
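In practice Equation (7) is usually inverted to select the relaxation time that yields a prescribed viscosity. The C++ sketch below does this, computing $c_s^2 = \beta^2\,\delta x^2/\delta t^2$ from the definition $c_s = \beta\,\delta x/\delta t$; the function names are illustrative, and the example value $\beta = 1/\sqrt{3}$ is the standard one for the D2Q9 lattice.

```cpp
#include <cassert>
#include <cmath>

// Squared speed of sound from c_s = beta * dx / dt.
double soundSpeedSquared(double beta, double dx, double dt) {
    return beta * beta * dx * dx / (dt * dt);
}

// Relaxation time obtained by inverting Eq. (7): tau = nu / (cs^2 dt) + 1/2.
double relaxationTime(double nu, double cs2, double dt) {
    assert(nu > 0.0 && cs2 > 0.0 && dt > 0.0);
    return nu / (cs2 * dt) + 0.5;
}

// Viscosity produced by a given relaxation time, Eq. (7).
double viscosity(double tau, double cs2, double dt) {
    return cs2 * (tau - 0.5) * dt;
}
```

For example, $\nu = 0.1$ in lattice units ($\delta x = \delta t = 1$, $c_s^2 = 1/3$) gives $\tau = 0.8$. Since a positive viscosity requires $\tau > 1/2$, very small viscosities push $\tau$ toward that limit.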

¹The dynamic shear viscosity $\nu_d$ in the dimensionless Navier-Stokes equations corresponds to the quantity $1/Re$, where $Re = u_0 l_0/\nu$ is the Reynolds number, $l_0$ and $u_0$ being respectively the reference length and velocity and $\nu$ the dynamic shear viscosity in physical units.

Note finally that the macroscopic variables, such as density and velocity, can be locally recovered as moments of the distribution functions:

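$$\rho(\mathbf{x},t) = \sum_{i=0}^{Q} f_i(\mathbf{x},t), \tag{8}$$

$$\mathbf{u}(\mathbf{x},t) = \frac{1}{\rho}\left(\sum_{i=0}^{Q} \mathbf{e}_i\, f_i(\mathbf{x},t) + \frac{\delta t}{2}\,\mathbf{F}\right), \tag{9}$$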

and that the fluid pressure scales linearly with the density through the ideal gas law $p = c_s^2\rho + p_0$, $p_0$ being the reference pressure. A comprehensive presentation of these results can be found in [9, 41, 37].
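The moment computations of Equations (8) and (9), together with the ideal gas law, can be sketched for a single D2Q9 node as follows. The velocity ordering and the `Macro` struct are assumptions for illustration; in the authors' code these quantities are computed inside computeMacroAndCollideParticles.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// D2Q9 velocities; assumed ordering with e_{i+4} = -e_i for i = 1..4.
constexpr int kQ1 = 9;
constexpr int EX[kQ1] = {0, 1, 0, 1, -1, -1, 0, -1, 1};
constexpr int EY[kQ1] = {0, 0, 1, 1, 1, 0, -1, -1, -1};

struct Macro {
    double rho, ux, uy, p;
};

// Moments at one node: density (8), velocity with the force correction (9),
// and pressure from the ideal gas law p = cs^2 rho + p0.
Macro computeMacro(const std::array<double, kQ1>& f, double Fx, double Fy,
                   double dt, double cs2, double p0) {
    double rho = 0.0, mx = 0.0, my = 0.0;
    for (int i = 0; i < kQ1; ++i) {
        rho += f[i];
        mx += EX[i] * f[i];
        my += EY[i] * f[i];
    }
    assert(rho > 0.0);
    return {rho, (mx + 0.5 * dt * Fx) / rho, (my + 0.5 * dt * Fy) / rho,
            cs2 * rho + p0};
}
```

Note that the force enters the velocity through the half-step correction $\delta t\,\mathbf{F}/2$, so with $\mathbf{F} = 0$ the velocity reduces to the plain first moment divided by $\rho$.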

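From a computational point of view, Equation (2) can be split into two parts:

$$\text{collision step:}\quad \tilde{f}_i(\mathbf{x},t) = f_i(\mathbf{x},t) + J_i(f)(\mathbf{x},t) + \delta t\,F_i \qquad \forall i = 0,\dots,Q, \tag{10}$$

$$\text{streaming step:}\quad f_i(\mathbf{x} + \mathbf{e}_i\,\delta x,\, t + \delta t) = \tilde{f}_i(\mathbf{x},t) \qquad \forall i = 0,\dots,Q, \tag{11}$$

where $\tilde{f}_i$ denotes the post-collision distribution.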

The collision step is a local update of the distribution functions on each lattice node, while the streaming step moves the data across the lattice. This set of equations is eventually supplemented with appropriate initial and boundary conditions for the distribution functions, for which multiple formulations exist (see [27, 17, 22] and references therein). The treatment and implementation of complex initial and boundary conditions go beyond the scope of this work. For this reason, in the numerical experiments of Section 5 we simply initialize the computations with the equilibrium distribution $f^{eq}$ and adopt the full-way bounce-back rule for solid fixed walls:

$$f_{\bar{i}}(\mathbf{x}, t + \delta t) = f_i(\mathbf{x}, t - \delta t) \qquad \forall i = 0,\dots,Q, \tag{12}$$

$f_{\bar{i}}$ denoting the distribution anti-parallel to $f_i$. Equilibrium distribution boundaries, zero-gradient boundaries and periodic boundaries have also been implemented. An accurate description and analysis of these and other boundary conditions for LBM can be found in [7].

3 An overview of GPUs and NVIDIA CUDA

This section briefly introduces the basics of GPU architecture and CUDA programming in order to provide the reader with an essential understanding of this particular computational framework. We focus on NVIDIA cards since we adopted the CUDA C programming language. However, the more recent OpenCL initiative can be used to program CPUs, GPUs and other devices from different vendors. Furthermore, it is easy to convert a CUDA program into an OpenCL one with minor modifications, although CUDA is still more efficient on NVIDIA GPUs.

3.1 NVIDIA GPU architecture

In Figure 2 we illustrate the most important architectural elements of a GPU device. NVIDIA GPUs are made of several Streaming Multiprocessors (SMs), each of which consists of a certain number of Scalar Processors (SPs). Each SM also contains an instruction unit and three fast-access memories: the shared memory, the constant memory and the texture memory. All these memories have streaming-multiprocessor scope; however, the last two are read-only. The GPU device also has a fourth memory, the device (or global) memory, which is common to all the SMs. Unlike the others, this memory can be accessed by the CPU, and it is characterized by a higher capacity but slower access.

Figure 2: NVIDIA GPU architecture. Source [10].

From a practical point of view, the parallel computing capability of a device is often identified with the number of GPU cores, i.e. the total number of SPs on the device. The number of scalar processors per SM, as well as the number of streaming multiprocessors, depends on the series and model of the device.

Each multiprocessor runs in parallel with the others and is responsible for creating, managing, and dispatching concurrent groups of threads on the associated scalar processors. These groups of threads are called blocks and are actually executed on a multiprocessor in subgroups of 32 threads (what NVIDIA calls a warp). Depending on the number of cores per SM, a different number of clock cycles may be required to complete the operations of a warp. As soon as all the threads in a block terminate, a new block is launched on the vacated multiprocessor, until all the blocks have been executed.

3.2 CUDA programming

The CUDA programming model is built around the multithreaded structure presented above. A CUDA code consists of functions that can be mainly classified in two groups: functions run by the CPU (the host) and functions run by the GPU (the device). Functions running on the device are also called kernels. When a CUDA program on the host CPU invokes a kernel, a grid of threads is generated. This grid is made of blocks and each block is made of threads; a sketch is given in Figure 3.

Figure 3: Threads, blocks and grids in CUDA. Source [10].

The layout of the grid (i.e. the number of blocks and the number of threads per block) is specified at run time, when the kernel is launched. The syntax adopted in the CUDA C programming language for launching a kernel is the following:

    kernelName<<<gridSize, blockSize>>>(inputParameters);

where gridSize and blockSize prescribe respectively the dimensions of the grid and of the block. The blocks in the grid can be organized in a one- or two-dimensional layout, while the threads in a block may have up to three dimensions. Considering for example the grid in Figure 3, we have six blocks organized in a two-dimensional layout and, within each block, a two-dimensional arrangement of twelve threads. The number of blocks per grid and of threads per block may vary according to the kernel and/or the application of interest. It is however recommended to set the number of threads per block to a multiple of the half-warp size in order to avoid idle scalar processors [10]. The maximum number of blocks per grid is 65535 in each dimension, while the maximum number of threads per block varies according to the compute capability of the GPU device.
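The following C++ sketch shows one way to derive the launch parameters of a one-dimensional grid with one thread per lattice node, following the half-warp recommendation and the 65535-block limit quoted above. `LaunchConfig` is a hypothetical stand-in for the gridSize and blockSize arguments of the `<<<...>>>` syntax, and `maxThreadsPerBlock` must be taken from the actual device.

```cpp
#include <cassert>

// Hypothetical stand-in for the two launch arguments in
// kernelName<<<gridSize, blockSize>>>(...), for a one-dimensional grid.
struct LaunchConfig {
    unsigned gridSize;   // number of blocks
    unsigned blockSize;  // threads per block
};

constexpr unsigned kHalfWarp = 16;
constexpr unsigned kMaxBlocksPerDim = 65535;

// Round a requested block size up to a multiple of the half-warp size.
unsigned roundUpToHalfWarp(unsigned threads) {
    return ((threads + kHalfWarp - 1) / kHalfWarp) * kHalfWarp;
}

// One thread per lattice node: enough blocks to cover all nodes.
LaunchConfig makeLaunchConfig(unsigned numNodes, unsigned requestedBlockSize,
                              unsigned maxThreadsPerBlock) {
    const unsigned block = roundUpToHalfWarp(requestedBlockSize);
    assert(block > 0 && block <= maxThreadsPerBlock);
    const unsigned grid = (numNodes + block - 1) / block;  // ceiling division
    assert(grid <= kMaxBlocksPerDim);
    return {grid, block};
}
```

For instance, 1000 nodes with a requested block size of 120 yield blocks of 128 threads and a grid of 8 blocks. Since gridSize times blockSize may exceed the number of nodes, a kernel launched this way would typically compare its global thread index with the node count before touching memory.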

It is worth noticing that the development of a performant CUDA code depends strongly on thread communication and memory access patterns. Threads can efficiently communicate within a block by passing data through the shared memory. However, inter-block communications are much slower, since they rely on global memory; it is therefore important to limit them whenever possible. Optimal memory access patterns are also critical: memory bandwidth may degrade by up to one order of magnitude if accesses are not properly optimized. In order to avoid these dramatic effects on GPU performance, various programming rules are recommended in [10]. Here we summarize a few of them:

• Concurrent reading/writing of threads on the same memory address has to be avoided in order to prevent code serialization.

• Threads should be organized in groups of 16 (half-warp size) so that all the threads within the group perform the very same operations. In this way, reading/writing operations of the group are done in a single memory access. When threads perform different operations within the same group (the so-called thread divergence), performance is reduced.

• Data should be aligned in such a way that their reading/writing can be coalesced into a contiguous, aligned memory access: the N-th thread of a block should access the N-th element at address BaseAddress+N (see Figure 4.a). Index N starts from zero and is local within a block; BaseAddress is the memory address accessed by the zero-th thread. Two examples where this is not respected are given in Figure 4.b and Figure 4.c.

Remark 2 As will be described in the next section, a naive implementation of the streaming step on GPUs may lead to misaligned accesses (Figure 4.c).
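A toy model helps to see why the third rule matters for streaming. The C++ sketch below classifies the element addresses touched by one group of threads into the three cases of Figure 4. Addresses are counted in elements, and the alignment segment is an assumed parameter rather than a hardware constant. A naive streaming update along $+x$ in a structure-of-arrays layout, where thread $n$ writes element $n+1$, comes out as misaligned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Access { Coalesced, NonSequential, Misaligned };

// Toy classifier: thread n must access BaseAddress + n (sequential), and
// BaseAddress must be a multiple of `segment` elements (aligned).
Access classify(const std::vector<std::size_t>& addr, std::size_t segment) {
    assert(!addr.empty() && segment > 0);
    const std::size_t base = addr[0];  // BaseAddress: accessed by thread 0
    for (std::size_t n = 0; n < addr.size(); ++n)
        if (addr[n] != base + n) return Access::NonSequential;
    return base % segment == 0 ? Access::Coalesced : Access::Misaligned;
}

// Addresses touched when thread n accesses base + shift + n; shift = 1 models
// a naive streaming write along +x in a structure-of-arrays array.
std::vector<std::size_t> shiftedAccess(std::size_t base, std::size_t count,
                                       std::size_t shift) {
    std::vector<std::size_t> addr(count);
    for (std::size_t n = 0; n < count; ++n) addr[n] = base + shift + n;
    return addr;
}
```

With a segment of 16 elements, `shiftedAccess(128, 16, 0)` is coalesced, whereas `shiftedAccess(128, 16, 1)` is sequential but misaligned.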

4 A modular and memory efficient GPU framework for LBM

In this section we describe the main features of our GPU framework for the lattice Boltzmann method. Unlike other works (e.g. [39, 18, 29, 38]), which provide a very specific and efficient GPU implementation of the method, here we present a memory-saving framework that offers a high level of generality with very good computational performance.

The code, based on a set of GPU procedures written in CUDA C, adopts a modular structure that allows for easy modification. The proposed data structure exploits the symmetry properties of the Gauss-Hermite quadrature rules in order to be valid for any kind of two- or three-dimensional lattice structure. Moreover, a specific semi-direct addressing scheme is adopted to easily store and access the unknown particle distribution functions. The resulting code is also optimized in terms of memory requirements by adopting a swapping technique that halves the memory required for a given lattice size.

The presentation of the framework is divided into three parts. First, the modular implementation of a basic LBM algorithm is illustrated. Then the data layout and the corresponding CUDA grid structure are described. Finally, the optimizations made on the routines are presented. Algorithmic similarities and novelties with respect to other GPU implementations are pointed out throughout the section.


(a) Coalesced (b) Uncoalesced: non-sequential (c) Uncoalesced: misaligned

Figure 4: Different accesses to GPU memory. The BaseAddress corresponding to the zero-th thread is 122.

LBM algorithm. In a classical LBM algorithm at least six milestone routines can be identified:

• initProblem for problem initialization,
• computeMacro for the computation of macroscopic quantities from the particle distributions,
• collideParticles for particle collision,
• streamParticles for particle streaming,
• applyBCs for the enforcement of boundary conditions,
• exportResults to export results.

In our code these procedures have been implemented according to Algorithm 1.

The routine initProblem implements the initialization procedure. As already mentioned, for the sake of simplicity in this work we consider an initialization based on the values of the equilibrium distribution $f^{eq}$; nonetheless, other approaches exist in the literature (see [5] for a review). The functions computeMacro and collideParticles are implemented within the same routine computeMacroAndCollideParticles, since the macroscopic quantities are needed locally to evaluate the equilibrium distribution (5). For each node, the density and velocity are first computed according to Equations (8)-(9); then the collision step (10) follows. Finally, the procedure streamParticles implements step (11) and the routine applyBCs enforces the various boundary conditions (e.g. Equation (12)).

Algorithm 1: Basic LBM algorithm

    geoData = input(GeometryData);
    physicsData = input(PhysicsData);
    initProblem(geoData, physicsData);
    for time ← 0 to finalTime do
        computeMacroAndCollideParticles();
        streamParticles();
        applyBCs();
        if time = postprocessTime then
            exportResults();
        end
    end
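A host-side driver with the same control flow as Algorithm 1 can be sketched in C++, with callbacks standing in for the kernel wrappers. The `LbmStages` struct and the `runLbm` function are hypothetical; the loop bound is inclusive to match "for time ← 0 to finalTime", and the geometry and physics inputs are assumed to be captured by the initProblem callback.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callbacks; in the real code each would wrap one CUDA kernel.
struct LbmStages {
    std::function<void()> initProblem;
    std::function<void()> computeMacroAndCollideParticles;
    std::function<void()> streamParticles;
    std::function<void()> applyBCs;
    std::function<void(int)> exportResults;
};

// Control flow of Algorithm 1.
void runLbm(const LbmStages& s, int finalTime, int postprocessTime) {
    assert(finalTime >= 0);
    s.initProblem();
    for (int time = 0; time <= finalTime; ++time) {
        s.computeMacroAndCollideParticles();
        s.streamParticles();
        s.applyBCs();
        if (time == postprocessTime) s.exportResults(time);
    }
}
```

Keeping each stage behind its own function is what allows each kernel to choose its own grid layout, as pointed out in Remark 3.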

Each of the routines reported in Algorithm 1 has been implemented as an independent CUDA kernel and wrapped in a C++ function. Algorithm 2 provides an example of the C++ wrapper for the streamParticlesKernel routine. Note that other implementations in the literature propose a single GPU kernel containing all the different routines (e.g. collision, streaming and boundary treatment). This approach makes the best use of the GPU memory resources by limiting communications across global memory; however, its modularity is rather limited.

Remark 3 Higher modularity is not the only advantage of our approach. Indeed, as will become clear from our presentation, the use of separate kernels for different routines allows us to select independently, for each one, the grid layout yielding the best performance.

Algorithm 2: streamParticles wrapper

    void streamParticles() {
        ... // code executed on the host
        streamParticlesKernel<<<gridSize, blockSize>>>(inputParameter1, inputParameter2); // code executed on the device
        ... // code executed on the host
    }


Data layout and CUDA grid organization. The data layout of the particle distributions and the grid structure of the kernels have a substantial impact on the performance of GPU codes. Even though the two are intrinsically related, the data layout generally remains unique throughout the code, while the grid structure may differ among the kernel functions. The flexibility in the selection of the grid structure represents an important advantage of our approach that has not been exploited in other works because of their overly specialized implementations (see e.g. [39, 3, 18, 38, 29]).

In the literature two main data layouts can be identified [40]. The first, conventionally called the "Array-of-Structures" (AoS) arrangement, stores contiguously the Q+1 particle distribution functions of each lattice node (Figure 5.a). The second arrangement, named "Structure-of-Arrays" (SoA), stores contiguously the N lattice nodes for each particle distribution function of the lattice model (Figure 5.b). As pointed out in [40], the two arrangements, AoS and SoA, are computationally optimized for two different steps of the LBM algorithm, respectively collision and streaming. On the one hand, the AoS layout allows for an optimized computation of the macroscopic quantities during the computeMacroAndCollideParticles step. On the other hand, the SoA layout guarantees consecutive memory access during the distribution update in the streamParticles step.

(a) Array-of-Structures (b) Structure-of-Arrays

Figure 5: Example of Array-of-Structures and Structure-of-Arrays layouts. Different colors identify different particle distribution functions (i.e. $f_0, f_1, \dots, f_Q$).
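The difference between the two layouts comes down to the index formula. In the C++ sketch below (function names are illustrative), consecutive nodes, i.e. consecutive threads under a one-thread-per-node mapping, read the same $f_i$ at unit stride with SoA but at stride $Q+1$ with AoS. Only the former matches the BaseAddress+N rule for coalesced access.

```cpp
#include <cassert>
#include <cstddef>

// Linear index of f_i at lattice node `node`, for a lattice of numNodes nodes
// with q1 = Q + 1 distributions per node.

// Array-of-Structures: the q1 distributions of one node are contiguous.
std::size_t aosIndex(std::size_t node, std::size_t i, std::size_t q1) {
    assert(i < q1);
    return node * q1 + i;
}

// Structure-of-Arrays: the numNodes values of one distribution are contiguous.
std::size_t soaIndex(std::size_t node, std::size_t i, std::size_t numNodes) {
    assert(node < numNodes);
    return i * numNodes + node;
}
```

For a D2Q9 lattice ($Q + 1 = 9$), neighboring nodes are 9 elements apart in AoS for a fixed $f_i$, but adjacent in SoA.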

For each of the arrangements, different addressing schemes for accessing the cell variables can be considered. Among the most common are direct addressing and indirect addressing [25]. The former is easier to implement, since the whole computational domain (fluid nodes and boundary nodes) and the associated particle distributions are accessed by enumeration (Figure 6.b). In this first approach an additional phase variable per node is usually needed to apply the correct dynamics to each cell. The latter, more involved, reduces memory consumption but requires an extra algorithmic stage to reconstruct the connectivity matrix [35]. In this second approach, since the natural ordering of the cells is lost anyway, one can take advantage of it and sort the distributions so as to optimize data flow during the computations (see Figure 6.c).

The choice of the layout and of the addressing scheme is essentially based on the specific needs of the final application or user and on the characteristics of the computational architecture employed. In the particular case of GPU architectures, the SoA layout is usually preferred, since it attains better performance by allowing coalesced access to global memory [34, 29].

In this work, in view of the development of a generically performant LBM framework, we propose an arrangement where the single SoA structure of Q + 1 arrays of Figure 5.b is split into three independent SoA structures. The first one is an array of N elements (the N nodes) associated with the rest distribution $f_0$; the other two structures are obtained by splitting the remaining distributions $f_i$, $\forall i \in \{1,\dots,Q\}$, into two groups based on the central symmetry of the lattice structure. Considering for example the D2Q9 lattice in Figure 7, distributions $f_1$ to $f_4$ and $f_5$ to $f_8$ belong to the first and second group, respectively. A similar separation into groups holds for the other lattice structures and is reported in Appendix 7.

(a) Lattice (b) Direct addressing (c) Indirect addressing

Figure 6: Example of different addressing schemes for a lattice where fluid and boundary nodes are filled in white and black, respectively.

The advantages of this new layout are twofold. First, it is very general and admissible for all the lattice structures, since central symmetry is one of the key properties of the Gauss-Hermite quadrature rules. Second, it is particularly well suited for all the computations involving a swap of distributions across opposite directions (e.g. full-way bounce back).

Figure 7: Example of layout for the D2Q9 lattice. Squares and circles identify the groups of distributions belonging to different SoAs.
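The three-array layout and its pairing of opposite directions can be sketched in C++ for D2Q9. The struct below is hypothetical and assumes the ordering $\mathbf{e}_{i+4} = -\mathbf{e}_i$ for $i = 1,\dots,4$ (Appendix 7 may number the velocities differently). Because $f_i$ and its anti-parallel partner sit at the same offset in the two group arrays, full-way bounce back at a node reduces to element-wise swaps between the two arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical three-SoA storage for a D2Q9 lattice with e_{i+4} = -e_i.
struct ThreeSoA {
    std::size_t n;              // number of lattice nodes
    std::vector<float> rest;    // f_0:         n entries
    std::vector<float> groupA;  // f_1 ... f_4: 4n entries
    std::vector<float> groupB;  // f_5 ... f_8: 4n entries, opposites of groupA

    explicit ThreeSoA(std::size_t nodes)
        : n(nodes), rest(nodes, 0.0f), groupA(4 * nodes, 0.0f),
          groupB(4 * nodes, 0.0f) {}

    // f_i at a node; f_i and its opposite share the offset k * n + node.
    float& at(std::size_t i, std::size_t node) {
        assert(i < 9 && node < n);
        if (i == 0) return rest[node];
        if (i <= 4) return groupA[(i - 1) * n + node];
        return groupB[(i - 5) * n + node];
    }

    // Full-way bounce back at one node: exchange each f_i with its opposite.
    void bounceBack(std::size_t node) {
        assert(node < n);
        for (std::size_t k = 0; k < 4; ++k)
            std::swap(groupA[k * n + node], groupB[k * n + node]);
    }
};
```

Within each group array, consecutive nodes remain adjacent, so the SoA property needed for coalesced access is preserved.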

Concerning the addressing scheme, a semi-direct approach is adopted. The main idea behind this formulation is to reconstruct the connectivity matrix only for boundary nodes and to keep direct addressing for the fluid ones. This can be efficiently implemented on GPUs using the stream-compaction algorithm provided by the open-source library Thrust [16].
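On the host, the stream compaction step amounts to filtering node indices by a flag. The C++ sketch below uses a hypothetical `NodeType` flag per node and returns the linear indices of the non-fluid nodes. On the GPU, Thrust's `thrust::copy_if` (with a counting iterator as input and the flag array as stencil) can perform the same filtering in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-node flags; the encoding used by the authors is not given.
enum class NodeType : std::uint8_t { Fluid, Wall, Inlet };

// Stream compaction: keep the linear indices of all non-fluid nodes, so that
// a boundary kernel can run one thread per entry of the compacted list.
std::vector<std::size_t> compactBoundaryNodes(const std::vector<NodeType>& flags) {
    std::vector<std::size_t> boundary;
    for (std::size_t idx = 0; idx < flags.size(); ++idx)
        if (flags[idx] != NodeType::Fluid) boundary.push_back(idx);
    return boundary;
}
```

Boundary kernels can then launch one thread per entry of this list, while collision and streaming keep direct addressing over the whole lattice.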

Even though the total memory consumption is slightly increased (due to the repetition of the boundary information), the proposed approach is advantageous in terms of implementation and performance. Indeed, on the one hand we exploit the simplicity of direct addressing during the collision and streaming steps; on the other hand we improve performance during the enforcement of boundary conditions.

Remark 4 Further advantages of the semi-direct addressing are expected when extending the current LBM framework to moving solids or fluid-structure interaction. In this context the connectivity matrix has to be reconstructed at every time step due to the movement of the solid. A fully indirect approach would require the reconstruction of the connectivity matrix for all the nodes (fluid and boundary). By contrast, our semi-direct approach limits the computation to the solid nodes only, increasing computational efficiency.

Remark 5 Another semi-direct approach, based on an exclusive enumeration of the fluid nodes, has been proposed in [25]; we refer to that work for more details.

The choices of the distribution arrangement and of the addressing scheme have to be followed by an ad hoc grid organization in the GPU kernels. As already observed in Remark 3, the use of independent kernels for the different routines allows for independent grid organizations. In our implementation the grid layouts for the two- and three-dimensional problems are illustrated in Figure 8. For each kernel the number of threads in the grid coincides with the number of lattice nodes involved in the kernel computation (i.e. one thread per lattice node). On the other hand, the number of threads per block and the number of blocks per grid may differ from one kernel to another.

(a) Two-dimensional grid (b) Three-dimensional grid

Figure 8: Grid layout.

Code optimization. The optimization has been carried out with the aim of improving performance and reducing memory consumption.

Concerning the first aspect, the focus has been on the specific improvement of the various routines. As already observed in Section 2, the collision step is completely local and therefore perfectly suited for GPU implementation. On the other hand, as mentioned in Remark 2, a naive implementation of the streaming step may lead to uncoalesced memory accesses, negatively affecting code performance. To avoid these uncoalesced operations, an approach similar to the one proposed in [39] has been adopted. In the streamParticles kernel, the grid has been organized so that one block contains all the $N_x$ nodes along one line in the x-direction; this means that for each block the number of threads is equal to $N_x$. The grid of thread blocks is then defined by the numbers of nodes $N_y$ and $N_z$ along the y- and z-directions ($N_z = 1$ in two dimensions). This layout limits the maximum number of admissible nodes along the x-direction to the maximum number of threads per block supported by the device (e.g. 512 or 1024), but it has the main advantage of allowing a coalesced read and write of the distributions that propagate perpendicularly to the x-direction. The remaining distributions (those propagating to cells with different x-indices) are first buffered into shared memory and then transferred to the correct cell. Note that, unlike for global memory, an uncoalesced filling of the shared memory does not affect performance. Finally, by surrounding the lattice with a layer of ghost cells, it has been possible to completely avoid in the same kernel the use of conditional statements, which are typically needed to select the streaming directions on boundary nodes. We remark that conditional statements may lead to thread divergence and therefore to a reduction of performance.
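The line-based streaming strategy can be emulated on the host to check the index arithmetic. In the C++ sketch below, which is an illustration under stated assumptions and not the authors' kernel, each interior line along x plays the role of a CUDA block with one thread per node, a small buffer plays the role of shared memory, and a ghost layer absorbs the writes that leave the interior. The D2Q9 ordering and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// D2Q9 velocities; assumed ordering with e_{i+4} = -e_i for i = 1..4.
constexpr int kQ1 = 9;
constexpr int EX[kQ1] = {0, 1, 0, 1, -1, -1, 0, -1, 1};
constexpr int EY[kQ1] = {0, 0, 1, 1, 1, 0, -1, -1, -1};

// nx x ny interior nodes plus one ghost layer on each side, stored as one SoA:
// f_i at (x, y) lives at f[i * cells + y * px + x].
struct PaddedLattice {
    int nx, ny, px, py;
    std::size_t cells;
    std::vector<float> f;
    PaddedLattice(int nx_, int ny_)
        : nx(nx_), ny(ny_), px(nx_ + 2), py(ny_ + 2),
          cells(static_cast<std::size_t>(px) * static_cast<std::size_t>(py)),
          f(kQ1 * cells, 0.0f) {}
    std::size_t index(int i, int x, int y) const {
        assert(x >= 0 && x < px && y >= 0 && y < py);
        return static_cast<std::size_t>(i) * cells + static_cast<std::size_t>(y * px + x);
    }
    float at(int i, int x, int y) const { return f[index(i, x, y)]; }
    float& at(int i, int x, int y) { return f[index(i, x, y)]; }
};

// Host emulation of push streaming, Eq. (11). Ghost cells receive the values
// leaving the interior, so no per-node boundary test is needed.
void emulateStreamParticles(const PaddedLattice& src, PaddedLattice& dst) {
    assert(src.nx == dst.nx && src.ny == dst.ny);
    std::vector<float> line(static_cast<std::size_t>(src.px));  // "shared memory"
    for (int i = 0; i < kQ1; ++i) {
        for (int y = 1; y <= src.ny; ++y) {  // one "block" per x-line
            const int yd = y + EY[i];        // destination row, maybe a ghost row
            // The branch depends only on i, so all threads of a block agree.
            if (EX[i] == 0) {
                // Perpendicular to x: direct, coalesced copy.
                for (int x = 1; x <= src.nx; ++x) dst.at(i, x, yd) = src.at(i, x, y);
            } else {
                // Shifted fill of the buffer, then contiguous write-back.
                std::fill(line.begin(), line.end(), 0.0f);
                for (int x = 1; x <= src.nx; ++x) line[x + EX[i]] = src.at(i, x, y);
                for (int x = 0; x < src.px; ++x) dst.at(i, x, yd) = line[x];
            }
        }
    }
}
```

After this step the ghost cells hold the distributions that left the domain; a boundary routine (periodic wrapping or bounce back, for instance) would then fold them back into the interior.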

Remark 6 The additional layer of ghost cells also represents a key element for the use of domain decomposition techniques in multicore environments [32]. In particular, together with the semi-direct addressing scheme, it is extremely useful in view of the forthcoming implementation on hybrid CPU-GPU environments (see also [12]).

The reduction of memory requirements is the second aspect that we have considered. As already mentioned, the lattice Boltzmann method is a highly memory-demanding numerical method: a standard implementation stores $2QN_xN_yN_z$ scalar variables, i.e. two copies of the $Q$ distributions $f_i$ at each lattice node. Such a large number of unknowns per node may become critical on GPU devices. For this reason, our framework adopts the swapping technique proposed in [24, 19], which allows lattice Boltzmann simulations to be performed storing only $QN_xN_yN_z$ variables (i.e. only the $f_i$ at each lattice node). Finally, we remark that a different memory access technique with similar memory savings has been proposed and implemented on GPU devices in [3].
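To illustrate how streaming can be carried out with a single set of $Q$ values per node, the Python sketch below implements a simple two-pass variant of the swap idea on a periodic D2Q9 lattice: after collision, opposite populations are swapped locally, and streaming then exchanges $f_{\mathrm{opp}(i)}$ at each node with $f_i$ at its neighbour along $e_i$. It assumes the D2Q9 velocity ordering of the appendix (Section 7), for which $e_{i+4} = -e_i$, and uses periodic wrap-around instead of ghost cells; the function names are hypothetical, and the GPU scheme of [24, 19] may differ in how the passes are fused and ordered. A conventional two-array version is included as a reference.

```python
# D2Q9 velocities in the index order of the appendix, so that e[i + 4] == -e[i].
E = [(0, 0), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1)]
HALF = 4  # (Q - 1) / 2 for Q = 9


def neighbour(x, y, i, nx, ny):
    """Linear index of the node reached from (x, y) along e_i (periodic wrap)."""
    ex, ey = E[i]
    return (x + ex) % nx + nx * ((y + ey) % ny)


def stream_two_arrays(f, nx, ny):
    """Reference push streaming into a second array: 2Q values per node."""
    g = [[0.0] * (nx * ny) for _ in E]
    for i in range(len(E)):
        for y in range(ny):
            for x in range(nx):
                g[i][neighbour(x, y, i, nx, ny)] = f[i][x + nx * y]
    return g


def stream_swap(f, nx, ny):
    """In-place streaming by swaps: only Q values per node are stored."""
    n = nx * ny
    # Pass 1: local swap of opposite populations (in practice fused with collision).
    for k in range(n):
        for i in range(1, HALF + 1):
            f[i][k], f[i + HALF][k] = f[i + HALF][k], f[i][k]
    # Pass 2: exchange f_opp(i) at node k with f_i at the neighbour along e_i.
    # Each stored value takes part in exactly one exchange, so the loop order is free.
    for y in range(ny):
        for x in range(nx):
            k = x + nx * y
            for i in range(1, HALF + 1):
                nb = neighbour(x, y, i, nx, ny)
                f[i + HALF][k], f[i][nb] = f[i][nb], f[i + HALF][k]
    return f
```

Both functions produce the same post-streaming populations, but `stream_swap` works in place on the $Q$ arrays, whereas the reference allocates a second set of $Q$ arrays.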

5 Numerical experiments

In this section we present two numerical experiments, based on a two- and a three-dimensional lid-driven cavity, respectively. The former aims at validating our code, the latter at evaluating its performance. These test cases were chosen to facilitate the comparison with other published work (see e.g. [18, 29, 12]).

Remark 7 Note that the validation of the code is limited to the two-dimensional case; this is sufficient because of the generality of the implementation. Indeed, the same routines are used in all dimensions, and the only difference lies in the adopted lattice structure.


All the simulations are performed on an NVIDIA GeForce GTX 480 graphics card using the CUDA Toolkit 4.0 RC2 [10]. The card has 480 CUDA cores (15 SMs with 32 SPs per multiprocessor), a memory clock rate of 1848 MHz, 1.6 GB of global memory and 32768 registers per block. Although the compute capability of this card (2.0) allows computations in both single- and double-precision floating point, here we present only single-precision computations, which are sufficient for our validation purposes and more significant for the comparison with previous works. Note, however, that in real-world applications the use of double-precision floating point is recommended, despite its higher computational cost, to avoid any possible loss of accuracy.

5.1 Two-dimensional lid driven cavity

The two-dimensional lid-driven cavity is one of the most popular validation problems for fluid flow simulations. In this test, the fluid is contained in a unit square domain with Dirichlet boundary conditions on all sides: three sides are stationary, while the top side moves with a unit tangential velocity (Figure 9).

Figure 9: Two-dimensional sketch of the lid driven cavity problem.

Two flow conditions have been simulated, at Reynolds numbers Re = 100 and Re = 1000. In both cases the D2Q9 lattice structure has been adopted. Given the Reynolds number, the relaxation time that recovers the desired macroscopic dynamics is readily computed from Equation (7). Choosing a time step $\delta t = \delta x^2$ for both simulations, with space step $\delta x = 1/128$ in the first case (i.e. 128 nodes per unit length) and $\delta x = 1/256$ in the second (i.e. 256 nodes per unit length), the relaxation times are $\tau = 0.53$ for Re = 100 and $\tau = 0.503$ for Re = 1000.
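Equation (7) is not reproduced in this section. Assuming it is the standard BGK relation $\nu = c_s^2(\tau - 1/2)\,\delta x^2/\delta t$, with $c_s^2 = 1/3$ for D2Q9 and $\nu = UL/\mathrm{Re}$ for unit lid speed $U$ and unit cavity side $L$, the following Python sketch reproduces the quoted relaxation times. The helper names are hypothetical.

```python
def relaxation_time(re, dx, dt, u_lid=1.0, length=1.0, cs2=1.0 / 3.0):
    """BGK relaxation time for a target Reynolds number (assumed form of Eq. (7))."""
    nu = u_lid * length / re                 # kinematic viscosity from Re = U L / nu
    return 0.5 + nu * dt / (cs2 * dx * dx)   # invert nu = cs2 (tau - 1/2) dx^2 / dt


def lattice_lid_speed(dx, dt, u_lid=1.0):
    """Lid velocity expressed in lattice units, U dt / dx."""
    return u_lid * dt / dx


for re, n in [(100, 128), (1000, 256)]:
    dx = 1.0 / n
    dt = dx * dx                             # diffusive scaling, dt = dx^2
    print(f"Re={re}: tau={relaxation_time(re, dx, dt):.4f}, "
          f"u_lattice={lattice_lid_speed(dx, dt):.5f}")
```

This prints `Re=100: tau=0.5300, u_lattice=0.00781` and `Re=1000: tau=0.5030, u_lattice=0.00391`. Because $\delta t = \delta x^2$, the factor $\delta t/\delta x^2$ equals one, so under this assumption $\tau = 1/2 + 3/\mathrm{Re}$ regardless of resolution; refining the lattice instead lowers the lid speed in lattice units (and hence the lattice Mach number).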

The results of the two simulations are reported in Figure 10. The velocity profiles along the horizontal and vertical midsections have been compared with those obtained by a finite difference discretization of the incompressible Navier-Stokes equations in their vorticity-stream function formulation [13]. Figure 11 shows that the profiles match well, which validates both the capability of the lattice Boltzmann method and the correctness of our implementation.

Remark 8 Note that the number of nodes used in [13] for the test case at Re = 1000 (129 per unit length) is smaller than the one used here. We chose a finer discretization because it usually improves both the accuracy and the stability of the method, especially when first-order accurate boundary conditions (like the ones used here) are adopted for high-Re simulations. For more details on stability issues, we refer to [22, 7], where the stability of the LBM has been assessed with respect to the Reynolds number for different types of boundary conditions.

(a) Re=100 (b) Re=1000

Figure 10: Velocity magnitude and streamlines for two flow conditions.

5.2 Three-dimensional lid driven cavity

In this section the three-dimensional lid-driven cavity test is used to investigate the performance of our LBM code. As in the two-dimensional problem, the fluid flow in a cubic domain is driven by a unit tangential velocity imposed on one of the six boundary faces, while homogeneous Dirichlet conditions are imposed on all the other boundaries. Among the lattice structures that can be adopted in three dimensions (see Appendix 7), here we consider the D3Q15 and the D3Q19. An example of the results obtained with the D3Q15 is given in Figure 12; similar results can be obtained with the D3Q19.

To evaluate the performance, two different tests have been analyzed. In the first, we considered a unit cubic domain discretized with regular lattices whose number of nodes ranges from $64^3$ to $256^3$. In the second, the domain has equal lengths in the y- and z-directions, each uniformly discretized with 128 nodes, while the discretization (and length) in the x-direction varies between 128 and 1024 nodes. For both tests, three distinct grid layouts have been considered, based on 64, 128 and 256 threads per block, respectively.

(a) Re=100

(b) Re=1000

Figure 11: Comparison between the velocity profiles given by our LBM code and those reported by Ghia et al. in [13] for the two-dimensional lid driven cavity problem.

Remark 9 Note that changing the number of threads per block affects the performance of all the kernels in the code except the one of the streamParticles routine: for the latter, the number of threads per block is fixed and equal to the number of nodes in the x-direction.

Figure 12: Velocity magnitude for the three-dimensional lid driven cavity at Re = 300.

The performance results, reported in Figure 13 for the first test and in Figure 14 for the second, are expressed in Million Lattice node Updates Per Second (MLUPS), i.e. the number of lattice node updates (one collision and one streaming step per node), in millions, performed in one second.
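As a worked illustration of the metric, the Python sketch below computes MLUPS from a node count, a number of time steps and a wall-clock time, and converts an MLUPS figure into the memory bandwidth it implies under a simple assumed traffic model in which every update reads and writes each of the $Q$ single-precision populations exactly once (auxiliary arrays such as addressing or flag fields are ignored). The helper names and the traffic model are assumptions, not measurements from this work.

```python
def mlups(nodes, time_steps, seconds):
    """Million lattice node updates per second."""
    return nodes * time_steps / (seconds * 1e6)


def bytes_per_update(q, bytes_per_value=4):
    """Assumed minimal traffic: read and write each of the Q populations once."""
    return 2 * q * bytes_per_value


def implied_bandwidth_gb_s(mlups_value, q, bytes_per_value=4):
    """Memory bandwidth (GB/s) implied by an MLUPS figure under the model above."""
    return mlups_value * 1e6 * bytes_per_update(q, bytes_per_value) / 1e9


# Peak figures quoted in the text for the GTX 480 runs.
for name, q, peak in [("D3Q15", 15, 496), ("D3Q19", 19, 375)]:
    print(f"{name}: {implied_bandwidth_gb_s(peak, q):.1f} GB/s")
```

This prints `D3Q15: 59.5 GB/s` and `D3Q19: 57.0 GB/s`. Under this simplified model both peaks correspond to a similar sustained bandwidth, which suggests why the structure with fewer unknowns per node achieves the higher MLUPS.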

As expected, in both tests the D3Q15 lattice structure performs better than the D3Q19, because it requires a smaller set of unknowns per node. For both lattice structures, the peak performance is reached in the second test when at least 512 nodes are used in the x-direction (i.e. when 512 threads per block are used in the kernel of the streamParticles routine). The D3Q15 and the D3Q19 structures attain a maximum of 496 MLUPS and 375 MLUPS, respectively. Concerning the choice of the grid layout for the remaining kernels, it is interesting to observe that the best performance is reached with more than 128 threads per block. It is also worth remarking that the adopted memory-saving implementation allows the use of very fine lattices: up to 16 million nodes for the D3Q15 and up to 8 million for the D3Q19. To the best of the authors' knowledge, this has not yet been achieved with other single-GPU implementations.
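The effect of storing $Q$ instead of $2Q$ populations per node on the admissible lattice size can be checked with simple arithmetic. The Python sketch below assumes single precision (4 bytes per value), takes the nominal 1.6 GB of global memory quoted earlier as $1.6\times 10^9$ bytes, and ignores auxiliary arrays (addressing, boundary flags, macroscopic fields), so it only gives optimistic upper bounds; the helper names are hypothetical.

```python
GLOBAL_MEMORY_BYTES = 1_600_000_000  # nominal 1.6 GB of the GTX 480, as quoted in the text


def lattice_bytes(nodes, q, copies=1, bytes_per_value=4):
    """Bytes for `copies` sets of Q populations at every node."""
    return nodes * q * copies * bytes_per_value


def max_nodes(q, copies=1, capacity=GLOBAL_MEMORY_BYTES, bytes_per_value=4):
    """Largest node count whose populations fit in `capacity` (no auxiliary data)."""
    return capacity // (q * copies * bytes_per_value)


nodes = 128 * 128 * 1024  # largest D3Q15 lattice of the second test
for copies in (1, 2):
    label = "Q" if copies == 1 else "2Q"
    need = lattice_bytes(nodes, 15, copies)
    print(f"D3Q15 with {label} storage: {need / 1e9:.2f} GB, "
          f"fits in 1.6 GB: {need <= GLOBAL_MEMORY_BYTES}")
```

This prints 1.01 GB (fits) for $Q$ storage and 2.01 GB (does not fit) for $2Q$ storage. Equivalently, `max_nodes(15, copies=2)` is about 13.3 million, below the roughly 16.8 million nodes of this lattice, whereas `max_nodes(15)` is about 26.7 million.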

The performance of our code is comparable to that reported in [3, 4, 12]; nonetheless, other works seem to reach even higher performance [39, 29]. This difference is likely due to our requirement of a modular and memory-saving implementation, two features that slightly deteriorate the performance of a code. Finally, it is worth mentioning that in [30] a cavity test with $257^3$ nodes, run with a single-precision parallel CPU code, recorded a maximum of 16.7 MLUPS with 16 OpenMPI processes on a dual Xeon E5560 at 2.8 GHz. Although limited to single precision, these results confirm that GPU implementations of the LB method are competitive and cost-effective compared to CPU implementations.

[Plot: MLUPS versus number of nodes, with curves for 64, 128 and 256 threads per block]

(a) D3Q15 (b) D3Q19

Figure 13: Performance of the code for different lattice discretizations of a unit cubic domain.

6 Conclusions

We have presented a general modular framework for the lattice Boltzmann method on GPUs. Our lattice Boltzmann GPU solver introduces several novelties with respect to similar published work. First and foremost, the modular framework presented in Section 4 exploits the mathematical properties of the Gauss-Hermite quadrature rules to guarantee a general implementation that can be used for any two- or three-dimensional lattice structure. The use of separate CUDA kernels also makes it possible to select independently, for each routine, the grid layout that offers the best performance. Moreover, the proposed implementation features an ad hoc SoA (structure of arrays) formulation that facilitates the swapping of opposite particle distributions, and a semi-direct addressing scheme that allows an efficient computation of the solution on both fluid and boundary nodes. Finally, the swapping technique proposed in [24] has been adopted to reduce the memory requirements for a given lattice size by a factor of two.

The proposed implementation combines a high degree of modularity with the use of efficient data structures. As a result, the performance of the code remains satisfactory, exceeding 490 MLUPS and 370 MLUPS on an NVIDIA GeForce GTX 480 for the D3Q15 and the D3Q19 lattice structures, respectively.

Future work will include the extension of the code to multiple GPUs and hybrid CPU-GPU clusters, and its application to real-world problems.

[Plot: MLUPS versus number of nodes in the x-direction, with curves for 64, 128 and 256 threads per block]

(a) D3Q15 (b) D3Q19

Figure 14: Performance of the code for different lattice discretizations of the x-direction.

Acknowledgments

The authors would like to acknowledge the financial support of the Swiss Platform for High-Performance and High-Productivity Computing (HP2C) and of the European Research Council through the Advanced Grant Mathcard, Mathematical Modelling and Simulation of the Cardiovascular System, Project ERC-2008-AdG 227058.

7 Lattice structures

Below we list the lattice structures implemented in our lattice Boltzmann code, following the formulation given in [19]. For each lattice structure we report the speed of sound $c_s$, the lattice weights $w_i$ and the lattice velocities $e_i$. The lattice velocities are grouped according to their Euclidean norm, under the corresponding lattice weight. For the sake of simplicity, the structures are given in lattice units, i.e. assuming unit dimensionless space and time steps ($\delta t = 1$ and $\delta x = 1$).


D2Q5

$c_s^2 = \frac{1}{2}$

$w_0 = 0$: $e_0 = (0, 0)$

$w_l = \frac{1}{4}$ with $l = 1, 2, 3, 4$: $e_1 = (-1, 0)$, $e_2 = (0, -1)$, $e_3 = (1, 0)$, $e_4 = (0, 1)$

D2Q9

$c_s^2 = \frac{1}{3}$

$w_0 = \frac{4}{9}$: $e_0 = (0, 0)$

$w_l = \frac{1}{9}$ with $l = 2, 4, 6, 8$: $e_2 = (-1, 0)$, $e_4 = (0, -1)$, $e_6 = (1, 0)$, $e_8 = (0, 1)$

$w_m = \frac{1}{36}$ with $m = 1, 3, 5, 7$: $e_1 = (-1, 1)$, $e_3 = (-1, -1)$, $e_5 = (1, -1)$, $e_7 = (1, 1)$

D3Q15

$c_s^2 = \frac{1}{3}$

$w_0 = \frac{2}{9}$: $e_0 = (0, 0, 0)$

$w_l = \frac{1}{9}$ with $l = 1, 2, 3, 8, 9, 10$: $e_1 = (-1, 0, 0)$, $e_2 = (0, -1, 0)$, $e_3 = (0, 0, -1)$, $e_8 = (1, 0, 0)$, $e_9 = (0, 1, 0)$, $e_{10} = (0, 0, 1)$

$w_m = \frac{1}{72}$ with $m = 4, 5, 6, 7, 11, 12, 13, 14$: $e_4 = (-1, -1, -1)$, $e_5 = (-1, -1, 1)$, $e_6 = (-1, 1, -1)$, $e_7 = (-1, 1, 1)$, $e_{11} = (1, 1, 1)$, $e_{12} = (1, 1, -1)$, $e_{13} = (1, -1, 1)$, $e_{14} = (1, -1, -1)$


D3Q19

$c_s^2 = \frac{1}{3}$

$w_0 = \frac{1}{3}$: $e_0 = (0, 0, 0)$

$w_l = \frac{1}{18}$ with $l = 1, 2, 3, 10, 11, 12$: $e_1 = (-1, 0, 0)$, $e_2 = (0, -1, 0)$, $e_3 = (0, 0, -1)$, $e_{10} = (1, 0, 0)$, $e_{11} = (0, 1, 0)$, $e_{12} = (0, 0, 1)$

$w_m = \frac{1}{36}$ with $m = 4, 5, 6, 7, 8, 9, 13, 14, 15, 16, 17, 18$: $e_4 = (-1, -1, 0)$, $e_5 = (-1, 1, 0)$, $e_6 = (-1, 0, -1)$, $e_7 = (-1, 0, 1)$, $e_8 = (0, -1, -1)$, $e_9 = (0, -1, 1)$, $e_{13} = (1, 1, 0)$, $e_{14} = (1, -1, 0)$, $e_{15} = (1, 0, 1)$, $e_{16} = (1, 0, -1)$, $e_{17} = (0, 1, 1)$, $e_{18} = (0, 1, -1)$

D3Q27

$c_s^2 = \frac{1}{3}$

$w_0 = \frac{8}{27}$: $e_0 = (0, 0, 0)$

$w_l = \frac{2}{27}$ with $l = 1, 2, 3, 14, 15, 16$: $e_1 = (-1, 0, 0)$, $e_2 = (0, -1, 0)$, $e_3 = (0, 0, -1)$, $e_{14} = (1, 0, 0)$, $e_{15} = (0, 1, 0)$, $e_{16} = (0, 0, 1)$

$w_m = \frac{1}{54}$ with $m = 4, 5, 6, 7, 8, 9, 17, 18, 19, 20, 21, 22$: $e_4 = (-1, -1, 0)$, $e_5 = (-1, 1, 0)$, $e_6 = (-1, 0, -1)$, $e_7 = (-1, 0, 1)$, $e_8 = (0, -1, -1)$, $e_9 = (0, -1, 1)$, $e_{17} = (1, 1, 0)$, $e_{18} = (1, -1, 0)$, $e_{19} = (1, 0, 1)$, $e_{20} = (1, 0, -1)$, $e_{21} = (0, 1, 1)$, $e_{22} = (0, 1, -1)$

$w_n = \frac{1}{216}$ with $n = 10, 11, 12, 13, 23, 24, 25, 26$: $e_{10} = (-1, -1, -1)$, $e_{11} = (-1, -1, 1)$, $e_{12} = (-1, 1, -1)$, $e_{13} = (-1, 1, 1)$, $e_{23} = (1, 1, 1)$, $e_{24} = (1, 1, -1)$, $e_{25} = (1, -1, 1)$, $e_{26} = (1, -1, -1)$
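Tables of this kind can be checked mechanically. The Python sketch below encodes each lattice with exact fractions (velocities in the index order of the tables, weights grouped by index) and verifies the moment conditions $\sum_i w_i = 1$, $\sum_i w_i e_{i\alpha} = 0$ and $\sum_i w_i e_{i\alpha} e_{i\beta} = c_s^2 \delta_{\alpha\beta}$, together with the opposite-direction property $e_{i+(Q-1)/2} = -e_i$ that this index ordering exhibits. The data layout and function names are illustrative assumptions, not part of the authors' code.

```python
from fractions import Fraction as F

# name: (c_s^2, velocities e_0..e_{Q-1} in table order, {weight: indices})
LATTICES = {
    "D2Q5": (F(1, 2), [(0, 0), (-1, 0), (0, -1), (1, 0), (0, 1)],
             {F(0): [0], F(1, 4): [1, 2, 3, 4]}),
    "D2Q9": (F(1, 3), [(0, 0), (-1, 1), (-1, 0), (-1, -1), (0, -1),
                       (1, -1), (1, 0), (1, 1), (0, 1)],
             {F(4, 9): [0], F(1, 9): [2, 4, 6, 8], F(1, 36): [1, 3, 5, 7]}),
    "D3Q15": (F(1, 3), [(0, 0, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1),
                        (-1, -1, -1), (-1, -1, 1), (-1, 1, -1), (-1, 1, 1),
                        (1, 0, 0), (0, 1, 0), (0, 0, 1),
                        (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)],
              {F(2, 9): [0], F(1, 9): [1, 2, 3, 8, 9, 10],
               F(1, 72): [4, 5, 6, 7, 11, 12, 13, 14]}),
    "D3Q19": (F(1, 3), [(0, 0, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1),
                        (-1, -1, 0), (-1, 1, 0), (-1, 0, -1), (-1, 0, 1), (0, -1, -1), (0, -1, 1),
                        (1, 0, 0), (0, 1, 0), (0, 0, 1),
                        (1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1)],
              {F(1, 3): [0], F(1, 18): [1, 2, 3, 10, 11, 12],
               F(1, 36): [4, 5, 6, 7, 8, 9, 13, 14, 15, 16, 17, 18]}),
    "D3Q27": (F(1, 3), [(0, 0, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1),
                        (-1, -1, 0), (-1, 1, 0), (-1, 0, -1), (-1, 0, 1), (0, -1, -1), (0, -1, 1),
                        (-1, -1, -1), (-1, -1, 1), (-1, 1, -1), (-1, 1, 1),
                        (1, 0, 0), (0, 1, 0), (0, 0, 1),
                        (1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1),
                        (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)],
              {F(8, 27): [0], F(2, 27): [1, 2, 3, 14, 15, 16],
               F(1, 54): [4, 5, 6, 7, 8, 9, 17, 18, 19, 20, 21, 22],
               F(1, 216): [10, 11, 12, 13, 23, 24, 25, 26]}),
}


def expand_weights(q, groups):
    """Turn {weight: indices} into a list w_0..w_{Q-1}; every index must appear once."""
    w = [None] * q
    for weight, indices in groups.items():
        for i in indices:
            if w[i] is not None:
                raise ValueError(f"index {i} has two weights")
            w[i] = weight
    if any(x is None for x in w):
        raise ValueError("some velocity has no weight")
    return w


def check_lattice(cs2, e, groups):
    """True if the table satisfies the moment and opposite-index conditions."""
    q, d = len(e), len(e[0])
    w = expand_weights(q, groups)
    if sum(w) != 1:                                              # zeroth moment
        return False
    for a in range(d):
        if sum(w[i] * e[i][a] for i in range(q)) != 0:           # first moment
            return False
        for b in range(d):
            m2 = sum(w[i] * e[i][a] * e[i][b] for i in range(q))  # second moment
            if m2 != (cs2 if a == b else 0):
                return False
    half = (q - 1) // 2
    return all(e[i + half] == tuple(-c for c in e[i]) for i in range(1, half + 1))


for name, spec in LATTICES.items():
    print(name, check_lattice(*spec))
```

All five structures pass. Assigning the D2Q9 weight $1/9$ to the diagonal velocities instead (i.e. swapping the two groups) breaks the second-moment condition, which makes such a check a quick way to catch transcription errors in these tables.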


References

[1] C.K. Aidun and J.R. Clausen. Lattice-Boltzmann method for complex flows. Annu. Rev. Fluid Mech., 42:439–472, 2010.

[2] S. Ansumali. Minimal kinetic modeling of hydrodynamics. PhD thesis, ETH Zurich, 2004.

[3] P. Bailey, J. Myre, S.D.C. Walsh, D.J. Lilja, and M.O. Saar. Accelerating lattice Boltzmann fluid flow simulations using graphics processors. In 2009 International Conference on Parallel Processing, pages 550–557. IEEE, 2009.

[4] M. Bernaschi, M. Fatica, S. Melchionna, S. Succi, and E. Kaxiras. A flexible high-performance lattice Boltzmann GPU code for the simulations of fluid flows in complex geometries. Concurrency Computat.: Pract. Exper., 22(1):1–14, 2010.

[5] A. Caiazzo. Analysis of lattice Boltzmann initialization routines. Journal of Statistical Physics, 121(1):37–48, 2005.

[6] C. Cercignani. Mathematical methods in kinetic theory. Plenum Press, New York, 1969.

[7] P. Chen. The lattice Boltzmann method for fluid dynamics: theory and applications. Master's thesis, EPFL, 2011.

[8] S. Chen and G.D. Doolen. Lattice Boltzmann method for fluid flows. Annu. Rev. Fluid Mech., 30:329–364, 1998.

[9] B. Chopard and M. Droz. Cellular automata modeling of physical systems. Cambridge University Press, Cambridge, UK, 1998.

[10] NVIDIA CUDA. Programming Guide, Version 4.0, 2011.

[11] D. d'Humières, I. Ginzburg, M. Krafczyk, P. Lallemand, and L.S. Luo. Multiple-relaxation-time lattice Boltzmann models in three dimensions. Phil. Trans. R. Soc. Lond. A, 360(1792):437–451, 2002.

[12] C. Feichtinger, J. Habich, H. Köstler, G. Hager, U. Rüde, and G. Wellein. A flexible patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters. Parallel Computing, 37(9):536–549, 2011.

[13] U. Ghia, K.N. Ghia, and C.T. Shin. High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method. J. Comput. Phys., 48(3):387–411, 1982.

[14] J. Habich, T. Zeiser, G. Hager, and G. Wellein. Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on NVIDIA GPUs using CUDA. Advances in Engineering Software, 42(5):266–272, 2011.

[15] F.J. Higuera and J. Jiménez. Boltzmann approach to lattice gas simulations. EPL (Europhysics Letters), 9:663–668, 1989.

[16] J. Hoberock and N. Bell. Thrust: A parallel template library, 2010. Version 1.3.0.

[17] P.H. Kao and R.J. Yang. An investigation into curved and moving boundary treatments in the lattice Boltzmann method. J. Comput. Phys., 227(11):5671–5690, 2008.

[18] F. Kuznik, C. Obrecht, G. Rusaouen, and J.J. Roux. LBM based flow simulation using GPU computing processor. Comput. Math. Appl., 59(7):2380–2392, 2010.

[19] J. Latt. How to implement your DdQq dynamics with only q variables per node (instead of 2q). Technical report, Tufts University, June 2007.

[20] J. Latt. Hydrodynamic limit of lattice Boltzmann equations. PhD thesis, Geneva University, 2007.

[21] J. Latt and B. Chopard. Lattice Boltzmann method with regularized non-equilibrium distribution functions. Math. Comp. Sim., 72:165–168, 2006.

[22] J. Latt, B. Chopard, O. Malaspinas, M. Deville, and A. Michler. Straight velocity boundaries in the lattice Boltzmann method. Phys. Rev. E, 77:056703, 2008.

[23] O. Malaspinas. Lattice Boltzmann method for the simulation of viscoelastic fluid flows. PhD thesis, EPFL, 2009.

[24] K. Mattila, J. Hyväluoma, T. Rossi, M. Aspnäs, and J. Westerholm. An efficient swap algorithm for the lattice Boltzmann method. Comput. Phys. Commun., 176(3):200–210, 2007.

[25] K. Mattila, J. Hyväluoma, J. Timonen, and T. Rossi. Comparison of implementations of the lattice-Boltzmann method. Comput. Math. Appl., 55(7):1514–1524, 2008.

[26] G.R. McNamara and G. Zanetti. Use of the Boltzmann equation to simulate lattice-gas automata. Phys. Rev. Lett., 61(20):2332–2335, Nov 1988.

[27] R. Mei, L.-S. Luo, P. Lallemand, and D. d'Humières. Consistent initial conditions for lattice Boltzmann simulations. Comp. Fluids, 35:855–862, 2006.

[28] R.R. Nourgaliev, T.N. Dinh, T.G. Theofanous, and D. Joseph. The lattice Boltzmann equation method: theoretical interpretation, numerics and implications. Int. J. Multiphase Flow, 29(1):117–169, 2003.

[29] C. Obrecht, F. Kuznik, B. Tourancheau, and J.J. Roux. A new approach to the lattice Boltzmann method for graphics processing units. Comput. Math. Appl., 2010.

[30] C. Obrecht, F. Kuznik, B. Tourancheau, and J.J. Roux. The TheLMA project: A thermal lattice Boltzmann solver for the GPU. Computers & Fluids, 54:118–126, 2012.

[31] OpenCL. http://www.khronos.org/opencl/.

[32] B. Palmer and J. Nieplocha. Efficient algorithms for ghost cell updates on two classes of MPP architectures. In S.G. Akl and T. Gonzalez, editors, PDCS International Conference on Parallel and Distributed Computing Systems, pages 197–202. ACTA Press, Anaheim, CA, United States, 2002.

[33] Y.H. Qian, D. d'Humières, and P. Lallemand. Lattice BGK models for Navier-Stokes equation. EPL (Europhysics Letters), 17:479, 1992.

[34] S. Ryoo, C.I. Rodrigues, S.S. Baghsorkhi, S.S. Stone, D.B. Kirk, and W.W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '08, pages 73–82, New York, NY, USA, 2008. ACM.

[35] M. Schulz, M. Krafczyk, J. Tölke, and E. Rank. Parallelization strategies and efficiency of CFD computations in complex geometries using lattice Boltzmann methods on high-performance computers. In High performance scientific and engineering computing: proceedings of the 3rd International FORTWIHR Conference on HPSEC, Erlangen, March 12-14, 2001, page 115. Springer Verlag, 2002.

[36] X. Shan, X.-F. Yuan, and H. Chen. Kinetic theory representation of hydrodynamics: a way beyond the Navier-Stokes equation. J. Fluid Mech., 550:413–441, 2006.

[37] S. Succi. The lattice Boltzmann equation for fluid dynamics and beyond. Oxford University Press, USA, 2001.

[38] J. Tölke. Implementation of a Lattice Boltzmann kernel using the Compute Unified Device Architecture developed by nVIDIA. Comput. Visual. Sci., 13(1):29–39, 2010.

[39] J. Tölke and M. Krafczyk. TeraFLOP computing on a desktop PC with GPUs for 3D CFD. Int. J. Comput. Fluid. Dynam., 22(7):443–456, 2008.

[40] G. Wellein, T. Zeiser, G. Hager, and S. Donath. On the single processor performance of simple lattice Boltzmann kernels. Computers & Fluids, 35(8-9):910–919, 2006.

[41] D.A. Wolf-Gladrow. Lattice-gas cellular automata and lattice Boltzmann models: an introduction, volume 1725 of Lecture Notes in Mathematics. Springer, 2000.
