
Università degli Studi di Bologna

Faculty of Mathematical, Physical and Natural Sciences

Ph.D. Programme in Physics, XVIII Cycle

under joint supervision with

Université du Sud Toulon-Var, France

Ph.D. Thesis

Poincaré Recurrences in

Mixed Dynamical Systems

and in Genomic Sequences

Luca Rossi

Thesis advisor at the Università di Bologna

Prof. Giorgio Turchetti

Thesis advisor at the Université du Sud Toulon-Var

Prof. Sandro Vaienti

Coordinator of the Ph.D. Programme in Physics

Prof. Roberto Soldati

Scientific-disciplinary sector: FIS/01 – MAT/07

Bologna, March 2006

In this Ph.D. thesis I present the results obtained from the study of Poincaré recurrences for mixed dynamical systems, that is, systems composed of invariant regions, and from an application of such recurrences to the analysis of coding and noncoding genomic sequences. These results have also been published in the papers [1], [2], [3] and [4].

In this respect, I would like to clarify that when I use terms like “we” or “our,” I implicitly refer, besides myself, to Giorgio Turchetti and Sandro Vaienti, who have been my supervisors and the people I mainly collaborated with.


Contents

1 Introduction
    1.1 Poincaré recurrences
    1.2 Main results

2 Shear flow
    2.1 Skew map
        2.1.1 Noteworthy properties
    2.2 Statistics of first return times
    2.3 Irrational rotations
        2.3.1 Existence of limit laws
    2.4 Distributions of the number of visits
        2.4.1 Successive returns for irrational rotations
        2.4.2 Distributions for the skew map

3 Mixed dynamical systems
    3.1 Distributions of the number of visits
    3.2 Coupling of mixing maps
    3.3 Coupling of a regular and of a mixing map
        3.3.1 Recurrences for domains of finite size
        3.3.2 Limit distributions of the number of visits
    3.4 Recurrences for the standard map
        3.4.1 Domains on the regular region
        3.4.2 Domains on the chaotic sea
        3.4.3 Domains intersecting the stochastic layer
    3.5 Dissipative Hénon map

4 Poincaré recurrences and genomic sequences
    4.1 Genome and genetic information
    4.2 Extraction of genomic sequences
    4.3 Preliminary statistical analyses
        4.3.1 Frequency distributions
        4.3.2 Multifractal analysis
    4.4 Recurrence statistics

Bibliography

Chapter 1

Introduction

During his study of the three-body problem, Poincaré proved the following theorem, which appeared in the famous memoir [5] of 1890:

If a flow preserves volume and has only bounded orbits then for

each open set there exist orbits that intersect the set infinitely

often.

It is well known that this theorem played a crucial role in the development of statistical mechanics at the end of the nineteenth century. The opponents of the atomistic hypothesis considered it one of the strongest arguments against the possibility of regarding matter as a collection of particles moving according to Newton’s laws of dynamics: such a picture, they argued, would contradict the laws of thermodynamics, which were well confirmed by experience. However, a few years later Boltzmann was able to reconcile these two apparently incompatible conceptions of matter, and this helped to clarify some of the most controversial conceptual issues in the framework of statistical mechanics.

Recently, the study of Poincaré recurrences has received growing attention, above all within the theory of dynamical systems (see Refs. [6] and [7] for an overview). This is mainly because Poincaré recurrences may be used to investigate the ergodic and statistical properties of the global dynamics of a system over wide regions of the phase space (see, for example, Refs. [8] and [9]). In this respect, changes in the recurrence statistics have been observed for transitions from normal to anomalous transport (see Refs. [10] and [11]). Thanks to Poincaré recurrences it is also possible to compute the metric entropy of a system equipped with an ergodic measure [12]. Moreover, they seem to be connected, at least for particular kinds of dynamical systems, to other quantities used to describe the fractal properties of the dynamics [13].

In recent years the statistics of first return times has been extensively studied, and rigorous results have been obtained in two different situations: for systems with strong mixing properties and for zero-entropy systems like irrational rotations. For these two cases the global features of recurrence are fairly well understood.

In fact, for a wide class of strongly mixing systems it has been proved (see Refs. [8] and [14]–[21]) that the limit recurrence statistics is exponential, even if the systems are not uniformly hyperbolic. However, this result was obtained by choosing the shrinking set as a ball whose radius goes continuously to zero, or as a cylinder originating from a dynamical partition of the phase space. Moreover, the convergence to the function $e^{-t}$ holds for almost every point of the phase space of these systems: if one takes a point which is not generic (for example a periodic point), there are proofs that the limit statistics can be different.

In contrast with the behaviour of strongly mixing systems, for one-dimensional irrational rotations of the circle there are at most three possible first return times into each interval [22, 23]. This prevents the existence of any limit recurrence statistics, unless the shrinking sets are chosen in a very particular class of intervals with strong arithmetic properties [24, 25].

A recent paper [26] shows that for all aperiodic ergodic dynamical sys-

tems any kind of distribution can be obtained, provided that the decreasing

sequence of sets is chosen suitably around all points, but in general such sets

will not be balls or cylinders.

Unfortunately, for systems of higher physical importance, like low-dimensional Hamiltonian systems, very little is known, despite the interest in obtaining analytical results. In this respect, by studying a model of the hyperbolic part of the phase space of a Hamiltonian system near a hierarchical island structure, lower and upper bounds were produced for the limit statistics of first return times in terms of a power law [27]. This example exploited a self-similar structure of the phase space, in the same spirit as the model proposed in Refs. [28]–[30] for the dynamics of sticky sets in Hamiltonian systems.

Although it is difficult to obtain rigorous results, interesting indications about the behaviour of significant systems may come from careful numerical investigations. In this regard, the intense numerical studies performed by several authors suggest that, in the thin stochastic layer surrounding a chain of islands, the decay of Poincaré recurrences could follow a power law due to the sticking phenomenon, which is believed to be responsible for the anomalous diffusion modeled by Lévy-like processes [11, 31]. Furthermore, a mixture of exponential and power law decays has been observed in a model of stationary flow with hexagonal symmetry, when the transport is anomalous [10].

1.1 Poincaré recurrences

Before introducing the notion of Poincaré recurrences, I would like to briefly recall some of the basic definitions concerning dynamical systems.

In this respect, let us consider a dynamical system (Ω, T, µ), where T is a

transformation defined on the phase space Ω, and µ represents a probability

measure, that is µ(Ω) = 1.

Definition 1.1 (invariant measure) Given a dynamical system $(\Omega, T, \mu)$, the measure $\mu$ is said to be invariant with respect to $T$ if, for any measurable set $A \subseteq \Omega$, $\mu(T^{-1}(A)) = \mu(A)$.

The following theorem, due to Poincaré, applies to such dynamical systems, which are therefore also called recurrent systems.

Theorem 1.1 (Poincaré) Let $(\Omega, T, \mu)$ be a dynamical system whose measure $\mu$ is $T$-invariant. Then, for any measurable set $A \subseteq \Omega$, $\mu$-almost every point $x \in A$ returns to $A$ infinitely many times, that is, there exist infinitely many positive integers $k$ such that $T^k(x) \in A$.

Among the possible definitions of ergodicity, the one that developed from

statistical mechanics and that was known as Boltzmann’s ergodic hypothesis

is probably the most significant from a physical point of view.

Definition 1.2 (ergodic system) Let $(\Omega, T, \mu)$ be a dynamical system whose probability measure $\mu$ is $T$-invariant. If for any integrable function $f$ we have
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} f(T^k(x)) = \int_\Omega f(x)\, d\mu \tag{1.1} \]
$\mu$-almost everywhere, the system is said to be ergodic.
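As an illustration of Eq. (1.1), the time average can be approximated numerically for a map known to be ergodic. The sketch below is not taken from the thesis: it assumes the circle rotation by the golden-ratio number (ergodic with respect to Lebesgue measure), and the observable, the starting point and the number of steps are arbitrary illustrative choices.

```python
import math

GOLDEN = (math.sqrt(5) - 1) / 2   # irrational rotation number (illustrative choice)

def rotation(x, alpha=GOLDEN):
    """One step of the circle rotation x -> x + alpha mod 1."""
    return (x + alpha) % 1.0

def time_average(f, x0, n_steps, step=rotation):
    """Left-hand side of Eq. (1.1), truncated at N = n_steps."""
    total, x = 0.0, x0
    for _ in range(n_steps):
        total += f(x)
        x = step(x)
    return total / n_steps

def observable(x):
    """Test function whose Lebesgue average over [0, 1) is 1/2."""
    return math.cos(2 * math.pi * x) ** 2

avg = time_average(observable, x0=0.1, n_steps=10_000)
print(avg)   # should be close to the space average 1/2
```

For a non-ergodic system the same average would in general depend on the starting point, which is what happens for the skew map studied in Chapter 2.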

An equivalent definition of ergodicity is the following.

Definition 1.3 (ergodic measure) Given a dynamical system $(\Omega, T, \mu)$ whose probability measure $\mu$ is $T$-invariant, $\mu$ is said to be $T$-ergodic if for any $f, g \in L^2(\Omega)$ we have
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} \int_\Omega f(T^k(x))\, g(x)\, d\mu = \int_\Omega f(x)\, d\mu \int_\Omega g(x)\, d\mu. \tag{1.2} \]

Finally, there is a class of systems, characterized by a rapid decay of correlations, which has played an important role in the recent development of the theory of Poincaré recurrences.

Definition 1.4 (strongly mixing system) A dynamical system $(\Omega, T, \mu)$, whose probability measure $\mu$ is $T$-invariant, is said to be strongly mixing if for any $f, g \in L^2(\Omega)$
\[ \lim_{k \to \infty} \int_\Omega f(T^k(x))\, g(x)\, d\mu = \int_\Omega f(x)\, d\mu \int_\Omega g(x)\, d\mu. \tag{1.3} \]

Let us now consider the statistics of Poincaré recurrences, also known as statistics of first return times. To this end, let $(\Omega, T, \mu)$ be a dynamical system equipped with a $T$-invariant probability measure. Taking a measurable set $A \subseteq \Omega$ and a point $x \in A$, the first return time of $x$ into $A$ is defined as
\[ \tau_A(x) = \min\bigl( \{ k \in \mathbb{N} : T^k(x) \in A \} \cup \{ +\infty \} \bigr). \tag{1.4} \]
Thus, $\tau_A(x)$ is a positive integer or, if $x$ does not return (that is, $T^k(x) \notin A$ for any $k \in \mathbb{N}$), $\tau_A(x) = \infty$. The mean return time into $A$ is given by
\[ \langle \tau_A \rangle = \int_A \tau_A(x)\, d\mu_A, \tag{1.5} \]
where $\mu_A$ denotes the conditional measure with respect to $A$: $\mu_A(B) = \mu(B \cap A)/\mu(A)$, for any measurable $B \subseteq \Omega$. For ergodic systems Kac proved the following important result.

Theorem 1.2 (Kac) If the dynamical system $(\Omega, T, \mu)$ is ergodic, then for any set $A \subseteq \Omega$ with $\mu(A) > 0$,
\[ \langle \tau_A \rangle = \frac{1}{\mu(A)}. \tag{1.6} \]
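Kac's formula (1.6) can be checked numerically on a simple ergodic system. The following sketch is only an illustration under assumptions that are not part of the thesis: the map is the circle rotation by the golden-ratio number, the set is $A = [0, 0.1)$, and the integral in Eq. (1.5) is approximated by an average over a uniform grid of points in $A$.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # rotation number (illustrative choice)
A_LEFT, A_RIGHT = 0.0, 0.1       # A = [0, 0.1), so mu(A) = 0.1

def first_return_time(x, max_iter=10_000):
    """Smallest k >= 1 with T^k(x) in A, for T(x) = x + ALPHA mod 1 (Eq. 1.4)."""
    for k in range(1, max_iter + 1):
        x = (x + ALPHA) % 1.0
        if A_LEFT <= x < A_RIGHT:
            return k
    return math.inf

def mean_return_time(n_points=10_000):
    """Approximate Eq. (1.5) by averaging tau_A over a uniform grid on A."""
    width = A_RIGHT - A_LEFT
    points = (A_LEFT + (i + 0.5) * width / n_points for i in range(n_points))
    return sum(first_return_time(x) for x in points) / n_points

mean_tau = mean_return_time()
print(mean_tau, 1 / (A_RIGHT - A_LEFT))   # the two numbers should nearly agree
```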

We may then introduce the statistics of first return times as
\[ F_A(t) = \mu_A\bigl( \{ x \in A : \tau_A(x)/\langle \tau_A \rangle > t \} \bigr). \tag{1.7} \]
One of the main questions is whether the limit statistics,
\[ F_x(t) = \lim_{\mu(A) \to 0} F_A(t), \tag{1.8} \]
exists when the set $A$ shrinks toward a given point $x \in \Omega$. Note that, from now on, I will drop the dependence on the point $x$ in the notation of the limit statistics of first return times, writing it simply as $F(t)$, since it will be clear from the context which point is considered.

Besides the recurrence statistics, it is possible to define the distribution of first return times:
\[ G_{r,A}(t) = \mu_A\bigl( \{ x \in A : \tau_A(x)/\langle \tau_A \rangle \le t \} \bigr), \tag{1.9} \]
denoting by $G_r(t)$ the corresponding limit distribution for $\mu(A) \to 0$, when it exists. It is easy to see that $G_{r,A}(t) = 1 - F_A(t)$.

An extension of the notion of statistics of first return times is represented by the distributions of the number of visits, whose definition is based on successive return times. For this purpose, let us consider the following quantity:
\[ \xi_A(t; x) = \sum_{j=1}^{\lfloor \langle \tau_A \rangle t \rfloor} (\chi_A \circ T^j)(x), \tag{1.10} \]
where $\chi_A$ is the characteristic function of a measurable set $A \subseteq \Omega$, while the symbol $\lfloor \cdot \rfloor$ represents the integer part function. It is easy to verify that $\xi_A(t; x)$ counts how many times a point $x \in A$ returns into $A$ during the first $\lfloor \langle \tau_A \rangle t \rfloor$ iterations of the map $T$.

We will be mainly interested in the distributions of the number of visits in $A$,
\[ F_{k,A}(t) = \mu_A\bigl( \{ x \in A : \xi_A(t; x) = k \} \bigr), \tag{1.11} \]
in the limit $\mu(A) \to 0$, denoting the limit distributions, whenever they exist, by
\[ F_k(t) = \lim_{\mu(A) \to 0} F_{k,A}(t). \tag{1.12} \]
Of particular interest is the distribution of order $k = 0$; in this case, in fact, Eq. (1.11) gives the statistics (with respect to $t$) of first return times as defined in Eq. (1.7). In this respect, I would like to remark that I will refer to the recurrence statistics equally as $F_A(t)$ or $F_{0,A}(t)$.
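The definitions (1.10) and (1.11) translate directly into a counting procedure. The sketch below is an illustration only, under assumptions that are not in the thesis: the map is the circle rotation by the golden-ratio number, $A = [0, 0.1)$, the mean return time is taken from Kac's formula, and $\mu_A$ is approximated by a uniform grid on $A$.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # rotation number (illustrative choice)
EPS = 0.1                        # A = [0, EPS); by Kac's formula <tau_A> = 1 / EPS

def in_A(x):
    return 0.0 <= x < EPS

def visits(x, t, mean_tau):
    """xi_A(t; x) of Eq. (1.10): visits to A in the first floor(<tau_A> t) steps."""
    count = 0
    for _ in range(math.floor(mean_tau * t)):
        x = (x + ALPHA) % 1.0
        count += in_A(x)
    return count

def visit_distribution(t, n_points=2_000):
    """Estimate F_{k,A}(t) of Eq. (1.11) for every k, using a uniform grid on A."""
    mean_tau = 1.0 / EPS
    counts = {}
    for i in range(n_points):
        k = visits((i + 0.5) * EPS / n_points, t, mean_tau)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / n_points for k, c in sorted(counts.items())}

dist = visit_distribution(t=1.0)
print(dist)   # dist[0] is the recurrence statistics F_A(1) = F_{0,A}(1)
```

The entry for $k = 0$ is the fraction of points that have not yet returned after $\lfloor \langle \tau_A \rangle t \rfloor$ steps, which is how the order-zero distribution reduces to the statistics of first return times.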

1.2 Main results

The main purpose of the present work has been to study Poincaré recurrences for systems where regular and chaotic motions coexist, trying to give rigorous results when possible or, otherwise, performing accurate numerical investigations.

To this end, G. Turchetti, S. Vaienti and I started by considering, as a model of the dynamics of systems showing regular behaviour, the following skew map defined on the cylinder $C = \mathbb{T} \times [0, 1]$,
\[ R : \begin{cases} x' = x + y \bmod 1, \\ y' = y, \end{cases} \tag{1.13} \]
which is area-preserving with respect to the usual Lebesgue measure, and has zero entropy. Perturbing this simple map leads, according to KAM theory, to a transformation that is integrable only on a subset of $C$ whose Lebesgue measure approaches one as the amplitude of the perturbation vanishes. As an example, the standard map reduces to $R$ when the coupling parameter goes to zero. Moreover, this transformation has been used to describe the flow on a square billiard [32].

It is interesting to observe that this map describes a shear flow in which,

for almost all the ordinates y, the dynamics along the fiber placed at y is

given by the irrational rotation: x′ = x + y mod 1. Since the velocity of

rotation is different for each invariant torus T, this map is also referred to

as “anisochronous rotations” on the two-dimensional cylinder.

Although irrational rotations are ergodic, the map $R$ is not. However, it enjoys a sort of local mixing property, caused by filamentation, which seems responsible for the existence of limit recurrence statistics.

The first result I obtained for this map was a rigorous proof that the statistics of first return times for a particular kind of domain of $C$ exists and follows an asymptotic power law like $t^{-2}$. This also allowed me to prove the existence of the limit recurrence statistics for the fixed points of the map. Furthermore, having developed an algorithm which reproduces the dynamics of $R$ and is characterized by an algebraic computational complexity, I obtained strong numerical evidence that the result on the asymptotic polynomial decay of the statistics is valid even for a generic subset of the cylinder.

Subsequently, I investigated the distributions of the number of visits, which, as seen, represent an extension of the statistics of first return times. Although $R$ is horizontally almost everywhere foliated by irrational rotations (which seem to admit piecewise constant limit distributions only if the shrinking domain is chosen in a descending chain of renormalization intervals), the analysis of the distributions of the number of visits suggests the existence of the corresponding limit laws for domains that shrink in an arbitrary way around points of the cylinder. In particular, for square sets containing the fixed points of the map, the distributions again present an asymptotic decay like $t^{-2}$.

Since the same features may be found for the distributions computed by assuming that the differences between successive return times are independent, we believe that, although our skew map is not ergodic, the local mixing property it enjoys plays some role here. Moreover, through accurate numerical investigations I verified that an asymptotic power law decay like $a\, t^{-\beta}$ also holds for arbitrary rectangular domains not containing the fixed points, although in this case the exponent $\beta$ is usually greater than two and seems to grow along with the order $k$.

This will be discussed in detail in Chapter 2.

In Chapter 3, instead, I will present the results concerning Poincaré recurrences for mixed systems, that is, systems composed of regions that are invariant with respect to the dynamics. We proved that the distributions of the number of visits (which include, of course, the statistics of first return times) for a domain $A$ that intersects the boundary between two invariant regions are a linear superposition of the distributions characteristic of such regions, weighted by coefficients equal to the relative size of the intersection of $A$ with each invariant region. Under a condition of continuity, this also holds in the limit when $A$ shrinks toward a point of the boundary. I checked this result numerically for a system whose invariant components are two strongly mixing maps, verifying that the general formula obtained describes very well the behaviour of the distributions of the number of visits for domains crossing the two invariant regions.

Such a formula also explains why, when a generic regular system and a mixing system are coupled together (with distributions assumed to follow a polynomial and an exponential decay respectively), the regular region asymptotically gives the main contribution, which appears as a power law tail, to the distributions computed for domains of finite positive measure containing the boundary. I would like to remark that this effect has already been observed by other authors, without however receiving a theoretical explanation.
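The dominance of the regular component at large times can be seen with a few lines of arithmetic. The sketch below is purely illustrative: the two component laws (a $t^{-2}$ tail for the regular region, $e^{-t}$ for the mixing one), the weight 0.1 and the omission of any rescaling of the time argument between the two regions are assumptions made here, not the general formula derived in Chapter 3.

```python
import math

def power_law(t):
    """Assumed statistics of the regular region: 1/2 up to t = 1, then 1 / (2 t^2)."""
    return 0.5 if t < 1 else 0.5 / t**2

def exponential(t):
    """Assumed statistics of the mixing region."""
    return math.exp(-t)

def mixed(t, w_regular):
    """Superposition weighted by the relative size of A inside each region."""
    return w_regular * power_law(t) + (1 - w_regular) * exponential(t)

def regular_share(t, w_regular):
    """Fraction of the mixed statistics contributed by the regular region."""
    return w_regular * power_law(t) / mixed(t, w_regular)

for t in (1, 5, 20):
    print(t, mixed(t, 0.1), regular_share(t, 0.1))
```

Even when only a tenth of the domain lies in the regular region, its share of the statistics tends to one as $t$ grows, because the exponential term decays faster than any power.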

Concerning the limit distributions for points belonging to the boundary, the theorem assuring their existence may not be directly applied when our skew map is coupled with an arbitrary strongly mixing transformation. However, I was able to show, by a proof in a particular situation and by numerical investigations in more general cases, that the limit distributions are governed by the mixing component, although their expression differs from the Poissonian one found in the pure mixing case.

Although this two-dimensional system represents a rather simple model, the results obtained appear to describe well what happens for systems of higher physical interest, such as the standard map, when we consider domains where regular and chaotic motions coexist. In particular, this model seems to provide a possible explanation for the existence of the power law tails observed for the distributions of domains lying in the chaotic “sea” far away from regular orbits.

In this respect, I investigated the standard map in a regime in which the stochastic layer between the regular orbits and the chaotic sea may be considered as a sharp boundary. The numerical analysis performed for domains wholly contained in the regular region has shown that, as happens for the simple model represented by our skew map, the distributions asymptotically follow a power law decay with an exponent near 2, and there is some evidence suggesting that the exponent grows slightly as the order $k$ increases.

For domains lying on the chaotic sea and far away from the integrable region the distributions depart from the expected Poissonian behaviour and still decay like $a\, t^{-\beta}$, with $\beta \simeq 2$. This is reasonably due to the fact that most of the orbits originating from points of the chaotic sea closely approach, sooner or later, the regular region. Moreover, the distributions concerning domains that intersect the stochastic layer are in agreement with the results obtained for mixed systems.

The distributions of the number of visits therefore appear capable of capturing some of the fundamental features of the dynamics, and of providing some information about the relative measures of the components where it differs.

Finally, in Chapter 4 I will discuss the application of the statistics of first return times to genomic sequences regarded as a special kind of dynamical system. In particular, I tried to understand whether the capability of Poincaré recurrences to capture the different qualitative properties of the dynamics could be used as a tool to distinguish between the coding and noncoding regions of genomes.

Unfortunately this does not happen, because the statistics of first return times follows the same exponential behaviour for both coding and noncoding sequences. However, taking into account that this behaviour is typical of strongly mixing systems, it seems sensible to interpret the results obtained as suggesting that, if long-range correlations are present in the sequences, their weight should be negligible, at least compared to that of the short-range correlations.


Chapter 2

Shear flow

2.1 Skew map

My study of the recurrence properties of regular dynamics started by considering the following integrable skew map, which is defined over the cylinder $C = \mathbb{T} \times [0, 1]$,
\[ R : \begin{cases} x' = x + y \bmod 1, \\ y' = y. \end{cases} \tag{2.1} \]
It is an area-preserving transformation with respect to the usual Lebesgue measure $\mu$, and has zero entropy. This map describes a shear flow and its behaviour is rather simple. In fact, each point $(x, y) \in C$ is transformed according to a one-dimensional rotation whose rotation number is $y$. Thus, the cylinder $C$ appears to be foliated by invariant tori, and the rotation velocity changes along the $y$ axis.
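A direct implementation of the map (2.1) makes the shear visible: two points that start at the same abscissa on nearby fibres separate linearly with the number of iterations, while each ordinate stays fixed. The function names and initial conditions below are illustrative choices, not part of the thesis.

```python
def skew_map(x, y):
    """One iteration of R: x' = x + y mod 1, y' = y."""
    return (x + y) % 1.0, y

def orbit(x, y, n):
    """The points (x, y), R(x, y), ..., R^n(x, y)."""
    points = [(x, y)]
    for _ in range(n):
        x, y = skew_map(x, y)
        points.append((x, y))
    return points

# Same starting abscissa, slightly different rotation numbers.
low = orbit(0.0, 0.10, 5)
high = orbit(0.0, 0.12, 5)
print([round(p[0], 2) for p in low])
print([round(p[0], 2) for p in high])
```

After $n$ iterations the two abscissas differ by $n$ times the difference of the ordinates (modulo 1), which is the filamentation mechanism invoked in the text.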

2.1.1 Noteworthy properties

Despite its simple behaviour, the map (2.1) presents some interesting features. First, it enjoys a sort of local mixing property; G. Turchetti proved (see Ref. [1] for more details) that in a particular, although important, case, the autocorrelation decays like $O(n^{-1})$, with $n$ being the number of iterations of the map $R$.

More precisely, let us consider the cylinder $C_\varepsilon = \mathbb{T} \times [0, \varepsilon]$, and define the conditional measure $\mu_\varepsilon$ as
\[ \mu_\varepsilon(A) = \frac{\mu(A)}{\mu(C_\varepsilon)} = \frac{\mu(A)}{\varepsilon}, \qquad A \subseteq C_\varepsilon. \tag{2.2} \]
Note that the cylinder $C_\varepsilon$ is invariant with respect to $R$, and $\mu_\varepsilon$ is the invariant measure therein.

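Proposition 2.1 Given the dynamical system $(C_\varepsilon, R, \mu_\varepsilon)$, the following property holds for domains like $A_\varepsilon = [x, x + \varepsilon] \times [0, \varepsilon]$, $A_\varepsilon \subseteq C_\varepsilon$, after $n$ iterations of the map $R$:
\[ \bigl| \mu_\varepsilon(A_\varepsilon \cap R^n(A_\varepsilon)) - \mu_\varepsilon^2(A_\varepsilon) \bigr| = O(n^{-1}). \tag{2.3} \]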

It seems sensible to expect that the same may be valid even for a generic

domain, although in this case it is not easy to give a proof. Of course this

result differs from the usual mixing condition, since it has a local character

and does not require the ergodicity of the system.

Such a local mixing property, caused by filamentation, appears to be

responsible for the existence of a continuous limit statistics of first return

times, despite the fact that for each irrational y coordinate the corresponding

one-dimensional rotation does not admit, in general, a limit statistics.

In this respect, the skew map $R$ shows another interesting property. To compute the recurrence statistics $F_A(t)$, one needs to know the mean return time $\langle \tau_A \rangle$. But, since the transformation (2.1) is not ergodic, we cannot apply Kac's theorem to replace $\langle \tau_A \rangle$ with the inverse of the measure of the set $A$. Nevertheless, a direct computation of the mean return time can still be performed, obtaining for $\langle \tau_A \rangle$ a value which is very similar to Kac’s formula. In fact, denoting by $\mu_x$ and $\mu_y$ the Lebesgue measure along the $x$ and $y$ axes, respectively, the following result holds.

Proposition 2.2 For the map (2.1), the mean return time $\langle \tau_A \rangle$ into an arbitrary measurable domain $A \subseteq \mathbb{T} \times [0, 1]$ of positive Lebesgue measure is given by
\[ \langle \tau_A \rangle = \frac{\mu_y(I_A)}{\mu(A)}, \tag{2.4} \]
where
\[ I_A = \{ y \in [0, 1] : \mu_x(A_y) > 0 \} \quad \text{and} \quad A_y = \{ (x', y') \in A : y' = y \}. \]

In other words, the mean return time equals the ratio between the measure of the part of the cylinder “visited” by all the images $R^n(A)$, for $n \to \infty$ [that is, $\mu_y(I_A)\, \mu_x(\mathbb{T}) \equiv \mu_y(I_A)$], and the measure of $A$ itself.

Although Eq. (2.4) may seem involved, in many cases the result is very simple. If, for example, we choose $A = [x, x + \varepsilon] \times [y, y + \varepsilon]$, the mean return time is $\langle \tau_A \rangle = 1/\varepsilon$.
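One way to see where Eq. (2.4) comes from is to apply Kac's theorem fibre by fibre. The following is a sketch of that argument, not necessarily the proof followed by the author; it relies on the fact that $y$ is conserved, so the return time of $(x, y)$ into $A$ coincides with the return time of $x$ into the section $A_y$ under the rotation by $y$.

```latex
% For almost every y in I_A the rotation number y is irrational, so the
% rotation on the fibre is ergodic with respect to \mu_x and Kac's
% theorem (1.6), applied to the section A_y, gives
\int_{A_y} \tau_A(x, y)\, d\mu_x = 1 .
% Integrating over the fibres that meet A in a set of positive measure:
\langle \tau_A \rangle
  = \frac{1}{\mu(A)} \int_{I_A} \left( \int_{A_y} \tau_A(x, y)\, d\mu_x \right) d\mu_y
  = \frac{\mu_y(I_A)}{\mu(A)} .
% Square domain A = [x, x+\varepsilon] \times [y, y+\varepsilon]:
% \mu_y(I_A) = \varepsilon and \mu(A) = \varepsilon^2, hence
\langle \tau_A \rangle = \frac{\varepsilon}{\varepsilon^2} = \frac{1}{\varepsilon} .
```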

2.2 Statistics of first return times

By considering the map (2.1), it is possible to rigorously prove that the

statistics of first return times exists for square domains of C whose lower

side is ‘placed’ on the fixed points of R.

Initially, H. Hu had the idea of solving the problem by means of a geometric construction. G. Turchetti subsequently developed a way to compute the recurrence statistics for square domains of side $\varepsilon = 1/m$, with $m \in \mathbb{N}$. I extended this proof by first finding the statistics of return times for square subsets of side $\varepsilon = n/m$, with $n, m \in \mathbb{N}$, and then treating the case in which $\varepsilon$ is an arbitrary real number between 0 and 1. This last result allowed me to obtain an explicit expression for the limit statistics when the square domain shrinks continuously toward one of the fixed points of $R$.

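Proposition 2.3 Given the map (2.1), if $A_\varepsilon = [x, x + \varepsilon] \times [0, \varepsilon] \subset C_\varepsilon$, with $0 < \varepsilon < 1$ and $0 \le x < 1$, then the statistics $F_{A_\varepsilon}$ of first return times into $A_\varepsilon$ is
\[ F_{A_\varepsilon}(t) = \begin{cases}
1, & \text{if } 0 \le t < t_1, \\[4pt]
\dfrac{1}{2}, & \text{if } t_n \le t < t_{n+1},\ t_n < t_{\bar n}, \\[8pt]
\dfrac{1}{2} \left[ 1 - \dfrac{(t_n - 1 + \varepsilon)^2}{t_n\, \varepsilon} \right], & \text{if } t_n \le t < t_{n+1},\ t_n = t_{\bar n}, \\[10pt]
\dfrac{(1 - \varepsilon)^2}{2\, t_n (t_n - \varepsilon)}, & \text{if } t_n \le t < t_{n+1},\ t_n > t_{\bar n},
\end{cases} \tag{2.5} \]
where $\bar n = \lfloor 1/\varepsilon \rfloor$ and $t_n = n/\langle \tau_{A_\varepsilon} \rangle = n\, \varepsilon$, $n \in \mathbb{N}$. Furthermore, the limit statistics exists and is given by
\[ F(t) \equiv \lim_{\varepsilon \to 0^+} F_{A_\varepsilon}(t) = \begin{cases}
1, & \text{if } t = 0, \\
1/2, & \text{if } 0 < t < 1, \\
\tfrac{1}{2}\, t^{-2}, & \text{if } t \ge 1.
\end{cases} \tag{2.6} \]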
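Proposition 2.3 can be probed with a brute-force simulation. The sketch below is not the geometric algorithm described in this chapter: it is a plain Monte Carlo estimate, with an illustrative side $\varepsilon = 0.02$, a fixed random seed and a cap on the number of iterations (points that have not returned by the cap are counted as returning later than every $t$ considered). Sampling points uniformly in $A = [0, \varepsilon] \times [0, \varepsilon]$ plays the role of the conditional measure $\mu_A$.

```python
import random

def first_return_time(x, y, eps, cap):
    """Return time of (x, y) into [0, eps] x [0, eps] under R, or cap + 1 if none by `cap`."""
    for k in range(1, cap + 1):
        x = (x + y) % 1.0
        if x <= eps:
            return k
    return cap + 1

def return_statistics(eps, t_values, n_points=20_000, seed=0):
    """Monte Carlo estimate of F_A(t), using <tau_A> = 1 / eps so that tau / <tau_A> = tau * eps."""
    rng = random.Random(seed)
    cap = int(max(t_values) / eps) + 1
    taus = [first_return_time(rng.uniform(0, eps), rng.uniform(0, eps), eps, cap)
            for _ in range(n_points)]
    return {t: sum(tau * eps > t for tau in taus) / n_points for t in t_values}

stats = return_statistics(eps=0.02, t_values=(0.5, 2.0, 3.0))
for t, value in stats.items():
    limit = 0.5 if t < 1 else 0.5 / t**2      # limit law of Eq. (2.6)
    print(t, value, limit)
```

For small but finite $\varepsilon$ the estimates should sit close to the limit law (2.6), with deviations of order $\varepsilon$ plus the statistical noise of the sample.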

What about more general cases? Unfortunately, already for subsets like $A = [x, x + \varepsilon] \times [y, y + \varepsilon]$ with $y > 0$, the geometric method becomes very involved, because the lower side of $A$ is no longer invariant. However, to investigate the recurrence statistics in such situations, and above all the asymptotic behaviour of $F_A(t)$, it is possible to turn to numerical computations of the statistics. So, I decided to develop a numerical algorithm which reproduces the geometric construction itself. The advantage of this approach over more conventional statistical methods is that it yields highly accurate results with limited computational resources, both in memory and in runtime. In fact, the final accuracy is only affected by the propagation of round-off errors.

In order to test the reliability of my numerical algorithm, I started by checking the results of the program for domains $A_\varepsilon = [x, x + \varepsilon] \times [0, \varepsilon]$, for which the analytical expression of the statistics is given by the law (2.5). As an example, in the case of the set $A = [0, \varepsilon] \times [0, \varepsilon]$, with $\varepsilon = 10^{-3}$, and $t$ from 0 to 1000 (corresponding to $10^6$ iterations of $R$), the maximum difference between the computed and exact values is less than $10^{-16}$.

Subsequently, I computed $F_A(t)$ for a large sample of different domains $A = [0, \varepsilon] \times [y, y + \varepsilon]$, varying both $\varepsilon$ and $y$. Since I was mainly interested in the asymptotic behaviour of the recurrence statistics, I used a least-squares method to fit the statistics obtained against the function $a\, t^{-\beta}$, with $t$ sufficiently large. The best-fit value found for the exponent $\beta$ is always very near to 2, and in most cases $\beta$ differs from 2 by less than $5 \times 10^{-4}$, as shown in Fig. 2.1.

Thus, it seems sensible to conclude that there is strong evidence that the limit statistics $F(t)$ asymptotically follows a power law decay, $F(t) \sim t^{-2}$ (Fig. 2.2).
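The text does not specify which least-squares procedure was used. One common choice, assumed here purely for illustration, is an ordinary linear regression of $\log F$ against $\log t$, whose slope gives $-\beta$. The sketch applies it to synthetic values taken from the last branch of Eq. (2.5), sampled at $t = t_n$ with $\varepsilon = 10^{-2}$, rather than to the numerical data of the thesis.

```python
import math

def fit_power_law(ts, values):
    """Least-squares line through (log t, log F); returns (a, beta) for F ~ a t^(-beta)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic data: (1 - eps)^2 / (2 t (t - eps)) for t between 10 and 100.
eps = 0.01
ts = [n * eps for n in range(1000, 10001, 100)]
values = [(1 - eps) ** 2 / (2 * t * (t - eps)) for t in ts]
a, beta = fit_power_law(ts, values)
print(a, beta)
```

Because the exact law behaves like $t^{-2}(1 + \varepsilon/t + \dots)$, the fitted exponent is slightly larger than 2 on any finite window and approaches 2 as the window moves to larger $t$.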

Figure 2.1: Values of $\Delta = \beta - 2$ obtained by fitting the statistics $F_A(t)$ against the function $a\, t^{-\beta}$, with $3000 < t < 5000$, for domains $A = [0, \varepsilon] \times [y, y + \varepsilon]$, $\varepsilon = 10^{-2}$. [Plot of $\Delta$ versus $y$, for $0 \le y \le 1$; vertical axis from $-0.0015$ to $0.0015$.]

2.3 Irrational rotations

It is interesting, at this point, to compare the results just obtained for the skew map (2.1) with the behaviour shown by Poincaré recurrences in the case of irrational rotations. In this respect, let us consider the following map, which represents a rotation by an angle $\alpha$ (the so-called rotation number) over the unit circle $\mathbb{T}$:
\[ R_\alpha : x' = x + \alpha \bmod 1. \tag{2.7} \]
It is clear from the definition that we may take, without loss of generality, $0 \le \alpha < 1$.

2.3.1 Existence of limit laws

A first important result concerning Poincaré recurrences for the transformation (2.7) was given by Slater [22] in 1967. He proved that for irrational rotations (that is, rotations whose rotation number $\alpha$ is an irrational real number) there exist, at most, three different return times into any given interval.
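Slater's result is easy to observe numerically. The sketch below is an illustration with choices that are not in the thesis: the rotation number is the golden-ratio value $(\sqrt{5} - 1)/2$, the interval is $[0, 0.1)$, and the return times are collected over a uniform grid of starting points.

```python
import math

def return_times(alpha, left, right, n_points=5_000, max_iter=100_000):
    """Set of first return times into [left, right) for the rotation x -> x + alpha mod 1."""
    times = set()
    for i in range(n_points):
        x = left + (i + 0.5) * (right - left) / n_points
        for k in range(1, max_iter + 1):
            x = (x + alpha) % 1.0
            if left <= x < right:
                times.add(k)
                break
    return times

golden = (math.sqrt(5) - 1) / 2
times = return_times(golden, 0.0, 0.1)
print(sorted(times))   # at most three distinct values
```

With so few possible values, the normalized return time $\tau_A / \langle \tau_A \rangle$ has a piecewise constant statistics whose jumps move with the interval, which is why a limit law exists only along carefully chosen sequences of intervals.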

We could wonder, then, whether the statistics of first return times $F_{A_n}(t)$, obtained by fixing a point $x \in \mathbb{T}$ and by taking a sequence of intervals $A_n \subseteq \mathbb{T}$ that shrink toward $x$, converges to a limit law. In general, there is no limit law for an arbitrary sequence $A_n$, since the values of the return times depend (usually in a rather involved way) on the subset $A_n$ considered. However, Z. Coelho and E. De Faria were able to show [24] that limit distributions of first entry times $G_e(t)$ exist for irrational rotations when the shrinking subsets $A_n$ are chosen in an appropriate way.

Figure 2.2: Plot of the recurrence statistics $F_A(t)$, for the domain $A = [0, \varepsilon] \times [y, y + \varepsilon]$, with $\varepsilon = 10^{-2}$ and $y = 10^{-1}$. The dashed line represents the linear fit in the interval $t \in [10^2, 10^3]$. [Axes: $\log_{10} F(t)$ versus $\log_{10} t$.]

To construct the sequence $A_n$ used in the proof of the existence of $G_e(t)$, they consider the continued fraction expansion of the rotation number $\alpha$, which may be written as $\alpha = [0, a_1, a_2, a_3, \ldots]$, if $0 < \alpha < 1$. The truncated expansion of order $n$ of $\alpha$ is then given by $p_n/q_n = [0, a_1, \ldots, a_n]$, where $p_n$ and $q_n$ satisfy the following recurrence relations,
\[ p_k = a_k\, p_{k-1} + p_{k-2}, \qquad q_k = a_k\, q_{k-1} + q_{k-2}, \tag{2.8} \]
with $p_{-2} = 0$, $p_{-1} = 1$ and $q_{-2} = 1$, $q_{-1} = 0$. Now, choosing an arbitrary point $x$ on the circle $\mathbb{T}$, they define $A_n$ as the closed interval of endpoints $R_\alpha^{q_{n-1}}(x)$ and $R_\alpha^{q_n}(x)$ containing $x$. This means that $A_n = [R_\alpha^{q_n}(x), R_\alpha^{q_{n-1}}(x)]$ if $n$ is odd, with the endpoints exchanged if $n$ is even.
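The recurrence (2.8) is straightforward to run. The following sketch computes the convergents $p_k/q_k$ from a finite list of coefficients, starting from $a_0 = 0$ and the seed values quoted in the text; the function name and the truncation length are illustrative choices.

```python
from fractions import Fraction

def convergents(coeffs):
    """Convergents p_k/q_k of [a_0, a_1, ...] via Eq. (2.8), seeded with
    p_{-2} = 0, p_{-1} = 1, q_{-2} = 1, q_{-1} = 0."""
    p_prev2, p_prev1 = 0, 1
    q_prev2, q_prev1 = 1, 0
    result = []
    for a in coeffs:
        p = a * p_prev1 + p_prev2
        q = a * q_prev1 + q_prev2
        result.append(Fraction(p, q))
        p_prev2, p_prev1 = p_prev1, p
        q_prev2, q_prev1 = q_prev1, q
    return result

# Rotation number with all coefficients equal to 1, truncated after eight terms.
golden_convergents = convergents([0] + [1] * 8)
print(golden_convergents)
```

For $[0, 1, 1, 1, \ldots]$ the denominators $q_k$ are Fibonacci numbers, so the return times $q_{n-1}$ and $q_n$ that define the endpoints of $A_n$ are available without any floating-point work.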

Since we were mostly interested in return times, starting from this work of Coelho and De Faria we tried to demonstrate the existence of the limit statistics of first return times $F(t)$ too. This was possible by using a recent result [33], which establishes the following relation between $G_e(t)$ and $F(t)$:
\[ G_e(t) = \int_0^t F(s)\, ds. \tag{2.9} \]
Thus, knowing the expression of $G_e(t)$ as found by Coelho and De Faria, we could explicitly compute the limit statistics,
\[ F(t) = \begin{cases}
1, & \text{if } 0 \le t < t_a, \\[4pt]
\dfrac{1}{1 + \omega}, & \text{if } t_a \le t < t_b, \\[8pt]
0, & \text{if } t \ge t_b,
\end{cases} \tag{2.10} \]
where
\[ t_a = \frac{\omega (1 + \theta)}{1 + \theta \omega}, \qquad t_b = \frac{1 + \theta}{1 + \theta \omega}. \tag{2.11} \]
The real numbers $0 < \theta \le 1$ and $0 \le \omega < 1$ are related to the coefficients $a_i$ of the continued fraction expansion of the rotation number $\alpha$.

More recently, Turchetti has independently given a simple proof [25] of the existence of $F(t)$ for irrational rotations when $\alpha$ is taken as a quadratic irrational with all the coefficients of the continued fraction expansion equal, that is, $\alpha = [0, a, a, a, \ldots]$.

2.4 Distributions of the number of visits

As noted in the Introduction, the statistics of first return times may be con-

sidered as the zero-order distribution of the number of visits. The next log-

ical step in our analysis of Poincare recurrences for the transformation (2.1)

has been therefore to investigate whether limit distributions of the number

of visits Fk(t) exist for a generic order k > 0. Since our skew map is horizontally foliated almost everywhere by irrational rotations, we first studied the behaviour of such distributions in the case of irrational rotations.

2.4.1 Successive returns for irrational rotations

In order to investigate the limit distributions of the number of visits for

irrational rotations, we considered the same sequence of intervals An used to

Figure 2.3: Distributions of the number of visits $F_{k,A_{20}}(t)$ of order k = 1, 2 and 3 (horizontal axis: t; vertical axis: $F_{k,A_{20}}(t)$).

obtain the statistics of first return times, hoping that it could be appropriate

also for the distributions of higher order.

In this respect, we decided to take the rotation number equal to the golden ratio $\gamma = (\sqrt{5} - 1)/2$, which has the very simple continued fraction expansion γ = [0, 1, 1, 1, . . .] and thus makes the intervals An easy to construct. I performed the analysis for several orders k, computing for each of them the distribution Fk,An(t), with n from 10 to 20.

Although it is not possible to deal with limit distributions by means of

numerical methods, nevertheless the results obtained strongly suggest the

existence of the limit distributions Fk(t). I found in fact that the distribu-

tions Fk,An(t) with the same order k are very close to each other, regardless

of the value of n, and this despite the presence of statistical fluctuations

and effects due to the finite size of the intervals An (the measure of An goes from about $2 \times 10^{-2}$ for n = 10 to $2 \times 10^{-4}$ for n = 20). Figs. 2.3 and 2.4 show the distributions for the smallest interval only, namely A20, since we are interested in the limit µ(A) → 0. However, the distributions of the same order computed for different values of n would appear practically indistinguishable in the graph.
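The construction of the intervals An and the sampling of first return times for the golden rotation can be sketched as follows. This is only an illustrative Python sketch, not the code used for the figures: the helper names, the uniform random sampling of points in An, and the indexing q_0 = q_1 = 1 of the denominators are assumptions made here, and the index n may be shifted with respect to the one used in the figures.

```python
import math
import random

GAMMA = (math.sqrt(5) - 1) / 2  # rotation number [0, 1, 1, 1, ...]


def fib_denominators(n):
    """q_0..q_n from recurrence (2.8) with a_0 = 0 and a_k = 1 for k >= 1."""
    q = [1, 1]
    while len(q) <= n:
        q.append(q[-1] + q[-2])
    return q


def signed(u):
    """Representative of u mod 1 in (-1/2, 1/2]."""
    u = u % 1.0
    return u - 1.0 if u > 0.5 else u


def interval_A(n, alpha=GAMMA):
    """Offsets (lo, hi) of A_n relative to x, for n >= 1."""
    q = fib_denominators(n)
    d1, d2 = signed(q[n - 1] * alpha), signed(q[n] * alpha)
    return min(d1, d2), max(d1, d2)


def first_return(offset, lo, hi, alpha=GAMMA, max_iter=10**6):
    """First return time to A_n of the point x + offset under y -> y + alpha mod 1."""
    for t in range(1, max_iter + 1):
        if lo <= signed(offset + t * alpha) <= hi:
            return t
    raise RuntimeError("no return found")


def sample_return_times(n, samples=2000, seed=0):
    """First return times of points drawn uniformly in A_n (hypothetical sampling)."""
    lo, hi = interval_A(n)
    rng = random.Random(seed)
    return [first_return(rng.uniform(lo, hi), lo, hi) for _ in range(samples)]
```

With this choice of An only two return times occur, q_{n−1} and q_n, and their weighted mean equals 1/µ(An), which gives a quick consistency check of the sampling before computing distributions of higher order.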

It is worthwhile to note some of the features of the distributions Fk,An(t)


Figure 2.4: Distributions of the number of visits $F_{k,A_{20}}(t)$ of order k = 3, 4 and 5 (horizontal axis: t; vertical axis: $F_{k,A_{20}}(t)$).

numerically obtained. First, their support is an interval and, for any k, it can be partitioned into three subintervals $I_k^{(l)}$, $I_k^{(c)}$ and $I_k^{(r)}$ (the leftmost, the central and the rightmost, respectively) in such a way that Fk,An(t) is constant on each of these subintervals. In particular, Fk,An(t) = 1 if $t \in I_k^{(c)}$. Moreover, the intervals $I_k^{(r)}$ and $I_{k+1}^{(l)}$ practically coincide (in this regard, the distribution for k = 3 is reported in both figures to show clearly that this is true even for $I_2^{(r)}$, $I_3^{(l)}$ and $I_3^{(r)}$, $I_4^{(l)}$). For every distribution studied we have that $\mu(I_k^{(l)}) \simeq \mu(I_k^{(r)}) \simeq 0.447$ and $\mu(I_k^{(c)}) \simeq 0.724$, except for k = 2 and k = 5, where $\mu(I_k^{(c)}) \simeq 0.276$. Interestingly enough, the measure of the support of Fk,An(t) for k = 1, 3 and 4 is about 1.618, that is, close to 1/γ.

Surely, it would be interesting to get an analytic proof about the exis-

tence of the limit distributions Fk(t) and a theoretical explanation of their

properties.

2.4.2 Distributions for the skew map

Irrational rotations, as seen, do not admit limit distributions of the number

of visits unless the shrinking neighborhoods are taken in a suitable way.

Nevertheless, we were confident that, similarly to the statistics of first return

times, it would be possible to show the existence of limit distributions for the skew map (2.1).

We started by considering the particular situation of square domains

like Aε = [0, ε] × [0, ε] whose side, of length ε, goes continuously to zero.

Unfortunately, it soon appeared clear that in this case a geometric proof,

as was performed for the first return times, was exceedingly complicated,

as well as the development of a reliable and efficient numerical algorithm

implementing the corresponding geometrical construction. The only viable solution therefore seemed to be to resort to a statistical method. In this respect, the numerical computations I performed strongly suggest that limit laws exist, as I will show later.

However, to try to understand this fact from a theoretical point of view,

S. Vaienti and I developed a heuristic, but quantitative, argument which

provides predictions very close to the numerical observations.

Theoretical investigation

For this purpose, it is necessary to consider another equivalent characteriza-

tion of the distributions of the number of visits. Let us begin by introducing

the kth return time of a point x ∈ A in a subset A,

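\[
\tau_A^k(x) =
\begin{cases}
0, & \text{if } k = 0, \\
\tau_A^{k-1}(x) + \tau_A\bigl(T^{\tau_A^{k-1}(x)}(x)\bigr), & \text{if } k \ge 1,
\end{cases}
\tag{2.12}
\]

[note that $\tau_A^1(x) = \tau_A(x)$]. Subsequently, we may define the distribution of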

the kth return time as

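\[
P_{k,A}(t) = \mu_A\Bigl( x \in A : \frac{\tau_A^k(x)}{\langle\tau_A\rangle} \le t \Bigr). \tag{2.13}
\]

We then observe that Eq. (1.11) can be rewritten as

\[
F_{k,A}(t) = \mu_A\Bigl( x \in A : \frac{\tau_A^k(x)}{\langle\tau_A\rangle} \le t \;\wedge\; \frac{\tau_A^{k+1}(x)}{\langle\tau_A\rangle} > t \Bigr) = P_{k,A}(t) - P_{k+1,A}(t). \tag{2.14}
\]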
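Given the successive return times of a sample of points of A, Eq. (2.14) translates directly into an empirical estimate. The sketch below is a minimal Python illustration under the assumption that the sample is drawn according to µA; the function name and the data layout (one list of increasing return times per sample point) are hypothetical.

```python
def visit_distribution(return_times, k, t):
    """Empirical F_{k,A}(t) as in Eq. (2.14).

    return_times[i] lists the successive return times tau^1 < tau^2 < ...
    of the i-th sample point (tau^0 = 0 is implicit). The mean first
    return time <tau_A> is estimated from the same sample.
    """
    mean_first = sum(r[0] for r in return_times) / len(return_times)
    count = 0
    for r in return_times:
        if k + 1 > len(r):
            raise ValueError("need at least k + 1 returns per sample point")
        tau_k = 0 if k == 0 else r[k - 1]
        tau_k1 = r[k]
        # tau^k / <tau_A> <= t  and  tau^{k+1} / <tau_A> > t
        if tau_k / mean_first <= t < tau_k1 / mean_first:
            count += 1
    return count / len(return_times)


# Toy data (hypothetical): every point returns exactly every 2 steps
toy = [[2, 4, 6, 8] for _ in range(10)]
```

For the toy data the normalized kth return time equals k, so Fk,A(t) is 1 for k ≤ t < k + 1 and 0 otherwise.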

Since

\[
\tau_A^k = \tau_A + (\tau_A^2 - \tau_A) + \dots + (\tau_A^k - \tau_A^{k-1}), \tag{2.15}
\]


it is also possible to consider the function $P_{k,A}(t)$ as representing the distribution of the sum of the differences, normalized by $\langle\tau_A\rangle^{-1}$, of consecutive return times until the kth return. The distribution of the difference between two consecutive return times (normalized by $\langle\tau_A\rangle^{-1}$) follows the same law as the distribution of the first return (see Ref. [8]), because the measure µA is invariant with respect to the induced map on A and because

\[
\tau_A^k - \tau_A^{k-1} = \tau_A \circ T^{\tau_A^{k-1}}. \tag{2.16}
\]

Now, if the variables $\tau_A/\langle\tau_A\rangle$, $(\tau_A^2 - \tau_A)/\langle\tau_A\rangle$, . . . , $(\tau_A^k - \tau_A^{k-1})/\langle\tau_A\rangle$ were independent and identically distributed (i.i.d.) with the same distribution function $G_{r,A}(t)$, then it is well known that the distribution function of their sum would be the following convolution product:

\[
P_{k,A}(t) = \underbrace{G_{r,A}(t) * G_{r,A}(t) * \dots * G_{r,A}(t)}_{k\ \text{times}}. \tag{2.17}
\]

In the case of highly mixing systems [for instance φ-, α- and (φ, f)-mixing

systems] for which the limit distribution of first return times Gr(t) is almost

everywhere given by $1 - e^{-t}$, the differences of the normalized successive return times become asymptotically independent when µ(A) → 0. The strategy adopted in Ref. [8] to compute, for a suitable choice of the sets A, the Poisson law

\[
P_{k,A}(t) - P_{k+1,A}(t) \longrightarrow \frac{e^{-t}\, t^k}{k!}, \tag{2.18}
\]

was just based on this fact.
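The link between the convolution product (2.17) and the Poisson law (2.18) can be checked numerically. The following Python sketch takes the exponential law 1 − e^{−t} as first-return distribution, builds Pk by repeated Stieltjes convolution on a uniform grid, and forms the differences Pk − Pk+1. The grid parameters and the simple quadrature rule are assumptions made for the illustration.

```python
import math


def stieltjes_convolve(P, G):
    """Approximate (P * G)(t_i) = int_0^{t_i} P(t_i - s) dG(s) on a uniform grid.

    P and G are lists of values at t_i = i * dt, with P(t) = G(t) = 0 for t < 0.
    Each cell uses the exact increment of G and the average of P at its ends.
    """
    n = len(G)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j in range(i):
            acc += 0.5 * (P[i - j] + P[i - j - 1]) * (G[j + 1] - G[j])
        out[i] = acc
    return out


dt, n = 0.01, 501                                   # grid on [0, 5]
G = [1.0 - math.exp(-i * dt) for i in range(n)]     # first-return law 1 - e^{-t}

P = [None, G]                                       # P[k] holds the grid values of P_k
for k in range(2, 5):
    P.append(stieltjes_convolve(P[-1], G))


def F(k, i):
    """F_k(t_i) = P_k(t_i) - P_{k+1}(t_i), available here for 1 <= k <= 3."""
    return P[k][i] - P[k + 1][i]
```

On this grid the values F(k, i) agree with e^{−t} t^k/k! up to the discretization error. The same routine applies to any other first-return distribution sampled on the grid, provided it is continuous at t = 0.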

In this regard, although our skew map is not ergodic, it nonetheless enjoys a sort of local mixing property. This suggested trying to obtain the distributions of the number of visits by assuming that even in such a situation the differences of successive return times are asymptotically independent. As seen before, the limit statistics of first return times F(t) for the sets Aε, when ε → 0, is given by Eq. (2.6). With the corresponding limit distribution being Gr(t) = 1 − F(t), under the preceding assumption we can write

\[
P_k(t) = \underbrace{G_r(t) * G_r(t) * \dots * G_r(t)}_{k\ \text{times}}, \tag{2.19}
\]

and

\[
F_k(t) = P_k(t) - P_{k+1}(t). \tag{2.20}
\]


In particular, it holds that

\[
F_1(t) = G_r(t) - \int_{-\infty}^{+\infty} G_r(t - s)\, dG_r(s). \tag{2.21}
\]

A rather straightforward computation of the Stieltjes integral then gives:

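\[
F_1(t) =
\begin{cases}
0, & \text{if } t = 0, \\
1/4, & \text{if } 0 < t < 2, \\
\dfrac{1}{4t^2} + \dfrac{1}{4(t-1)^2} + \dfrac{3}{2t^3} + \dfrac{3\log(t-1)}{t^4} + \dfrac{6-7t}{4t^3(t-1)^2}, & \text{if } t \ge 2.
\end{cases}
\tag{2.22}
\]

We may note that, when t is large, F1(t) behaves like $\tfrac{1}{2}\, t^{-2}$. Through a similar, but more cumbersome, computation, we could obtain F2(t) too,

\[
F_2(t) =
\begin{cases}
0, & \text{if } t = 0, \\
1/8, & \text{if } 0 < t < 1, \\
\dfrac{1}{4} - \dfrac{1}{8t^2}, & \text{if } 1 \le t < 2, \\
O(t^{-2}), & \text{if } t \gg 2.
\end{cases}
\tag{2.23}
\]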

Using a recursive argument, it is possible to show that

\[
F_k(t) =
\begin{cases}
0, & \text{if } t = 0, \\
\dfrac{1}{2^{k+1}}, & \text{if } 0 < t < 1, \\
O(t^{-2}), & \text{if } t \gg 2.
\end{cases}
\tag{2.24}
\]

I would like to remark on two interesting features of the distributions Fk(t):

(i) for 0 < t < 1, the distributions present a plateau whose height is given by $1/2^{k+1}$, and this is the only explicit dependence on k that we were able to easily detect;

(ii) for t → ∞, all the Fk(t) exhibit the same behaviour whatever the order k, decaying like $\tfrac{1}{2}\, t^{-2}$.


Figure 2.5: Distribution of the number of visits of order k = 1 computed for the domain Aε = [0, ε] × [0, ε], with $\varepsilon = 10^{-2}$ (log-log plot of F1(t) versus t). The dotted line represents the function (2.22).

Numerical analysis

We compared the distributions Fk(t), computed under the assumption that

the differences of successive return times are asymptotically independent,

with the ones obtained through numerical investigations.

The qualitative features described above still seem to persist, although there is some discrepancy between the two kinds of distributions. In particular, both show an initial plateau, even if the value of the actual one differs from the expected one. More interestingly, all the numerical distributions appear to decay like $\tfrac{1}{2}\, t^{-2}$, at least after a transitory peak (see Figs. 2.5 and 2.6).

The discrepancy between the two kinds of distributions is plausibly due

to the presence of some sort of weak correlation between the differences of

successive returns. Note that this could be in agreement with a result of

Z. Coelho and E. De Faria [24], showing that the limit joint distributions of

the differences of successive entry times are not given by the product of the

individual limit distributions.

I investigated also the behaviour of the distributions of the number of

Figure 2.6: Distribution of the number of visits of order k = 2 computed for the domain Aε = [0, ε] × [0, ε], with $\varepsilon = 10^{-2}$ (log-log plot of F2(t) versus t). The dashed line represents the function $\tfrac{1}{2}\, t^{-2}$.

visits Fk,A(t) with A = [0, ε]×[y0, y0+ε] ⊂ C, y0 > 0, for several values of the

parameters y0 and ε. I found that an asymptotic power law decay preceded

by a peak seems to hold even for such domains, as shown in Fig. 2.7; note

how the peaks narrow and shift toward larger t when k increases, while their

height slightly decreases. So, in order to estimate the decay exponent β, I

used a least-squares method to fit the numerical distributions against the function $a\, t^{-\beta}$. In this more general case β is usually greater than 2, and

the mean value of its distribution appears to be positively correlated to the

order k (Fig. 2.8).
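A fit of this kind can be sketched in a few lines of Python. The version below performs an ordinary least-squares regression of log F against log t, which is one common way of fitting a power law; whether the fits in this chapter were done in logarithmic or linear variables is not specified here, so this choice is an assumption, as are the function name and the synthetic data.

```python
import math


def fit_power_law(ts, fs):
    """Least-squares fit of f = a * t**(-beta), done on log f versus log t."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(f) for f in fs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(my - slope * mx)
    return a, -slope


ts = [30 + i for i in range(71)]        # fitting window 30 <= t <= 100
fs = [0.5 * t ** -2 for t in ts]        # synthetic data following (1/2) t^{-2}
a, beta = fit_power_law(ts, fs)
```

On the synthetic data the fit returns a = 1/2 and β = 2 up to rounding. On numerical distributions the estimate depends on the fitting window, which should lie beyond the transitory peak.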

Even the distributions obtained for rectangular domains like A = [x0, x0 + ε] × [y0, y0 + δ] present similar features: in particular they decay following a power law with an exponent greater than, but close to, 2. Moreover, the mean return time computed numerically is still $\langle\tau_A\rangle = 1/\varepsilon$.

In conclusion, the inverse square decay in t of the limit distributions,

whatever the order k, seems to be typical for the fixed points (which lie

along the x axis) of our skew map, while as soon as one considers other

points, the exponent β increases weakly with k. However, the difference

between the distributions of the number of visits for periodic and generic


Figure 2.7: Distributions of the number of visits of order k = 1, 2, 5, 10 and 20, computed for the domain A = [0, ε] × [y0, y0 + ε], with y0 = 0.35 and $\varepsilon = 10^{-2}$ (log-log plot of Fk(t) versus t). The dashed line represents the function $\tfrac{1}{2}\, t^{-2}$.

points appears to be a general fact of recurrences, as we will also see in the

next chapters.

Figure 2.8: Distributions of the best-fit parameter β, obtained through a least-squares method in the range 30 ≤ t ≤ 100 from the distributions of the number of visits of order k = 1, 2 and 3 (one panel per order; the β axis ranges from 1.95 to 2.15). For each order k, the distributions of the number of visits have been computed for twenty domains of side $\varepsilon = 10^{-2}$. The arrows show the position of the mean value of β.

Chapter 3

Mixed dynamical systems

In this chapter I will consider the following situation. Suppose we take a

subdomain A of a measurable phase space Ω, which intersects two regions

of Ω that are invariant with respect to a transformation T acting on Ω. Upon

these two regions, the map T is defined so that it behaves in two different

ways, for example it may be mixing on one of the components and simply

ergodic (or not ergodic at all) on the other one.

In this case, one could wonder about the existence of a limit recurrence

statistics — or, more generally, of limit distributions of the number of visits

— when A shrinks around a point which belongs to the common boundary of

the two regions, in such a way that it still intersects the two invariant components.

3.1 Distributions of the number of visits

Let us consider a measurable space Ω and a map T acting on it. Moreover, let µ be a T-invariant measure; note that it can be taken as a generic invariant measure, not necessarily a Lebesgue measure. Suppose now that the dynamical system (Ω, T, µ) is such that it splits into two subsystems (Ω1, T1, µ) and

(Ω2, T2, µ), where

Ω = Ω1 ∪ Ω2, µ(Ω1 ∩ Ω2) = 0. (3.1)

The maps T1 and T2, defined over Ω1 and Ω2 respectively, satisfy the fol-

lowing conditions:

\[
T_1 = T|_{\Omega_1 \setminus (\Omega_1 \cap \Omega_2)}, \qquad T_2 = T|_{\Omega_2 \setminus (\Omega_1 \cap \Omega_2)}, \tag{3.2}
\]

that is, they coincide with T except possibly on the zero measure boundary

Ω1 ∩ Ω2 of the invariant regions.

In order to study the behaviour of the distributions of the number of

visits for neighborhoods of points which belong to this common boundary,

let us take a neighborhood A of a point x ∈ Ω1 ∩ Ω2, such that µ(A) > 0.

Then, denote with A1 and A2 the two different components of A, that is,

A1 = A ∩ Ω1 and A2 = A ∩ Ω2; of course A = A1 ∪ A2.

The sequence of domains A shrinking around x is chosen in such a way

that the relative weights

\[
w_1(A) = \frac{\mu(A_1)}{\mu(A)}, \qquad w_2(A) = \frac{\mu(A_2)}{\mu(A)}, \tag{3.3}
\]

have a finite limit when µ(A) → 0, namely we will assume that the following

limits exist and are different from zero,

\[
w_1 = \lim_{\mu(A)\to 0} w_1(A), \qquad w_2 = \lim_{\mu(A)\to 0} w_2(A). \tag{3.4}
\]

It is easy to prove that the mean return time into A is related to those

in A1 and A2 as follows:

\[
\langle\tau_A\rangle = w_1(A)\, \langle\tau_{A_1}\rangle + w_2(A)\, \langle\tau_{A_2}\rangle. \tag{3.5}
\]

In fact, from the definition of mean return time and of conditional measure,

it holds that

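\[
\begin{aligned}
\langle\tau_A\rangle &= \int_A \tau_A(x)\, d\mu_A = \frac{1}{\mu(A)} \int_A \tau_A(x)\, d\mu \\
&= \frac{\mu(A_1)}{\mu(A)} \int_{A_1} \tau_{A_1}(x)\, d\mu_{A_1} + \frac{\mu(A_2)}{\mu(A)} \int_{A_2} \tau_{A_2}(x)\, d\mu_{A_2} \\
&= w_1(A)\, \langle\tau_{A_1}\rangle + w_2(A)\, \langle\tau_{A_2}\rangle.
\end{aligned}
\]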

Furthermore, by calling $F_{k,A_1}(t)$ and $F_{k,A_2}(t)$ the distributions of the number of visits in A1 and A2 respectively, it is possible to show that

\[
F_{k,A}(t) = w_1(A)\, F_{k,A_1}\bigl(w_1'(A)\, t\bigr) + w_2(A)\, F_{k,A_2}\bigl(w_2'(A)\, t\bigr), \tag{3.6}
\]

where

\[
w_1'(A) = \frac{\langle\tau_A\rangle}{\langle\tau_{A_1}\rangle}, \qquad w_2'(A) = \frac{\langle\tau_A\rangle}{\langle\tau_{A_2}\rangle}. \tag{3.7}
\]

Then, assuming that the limits

\[
F_{k,1}(t) = \lim_{\mu(A)\to 0} F_{k,A_1}(t), \qquad F_{k,2}(t) = \lim_{\mu(A)\to 0} F_{k,A_2}(t), \tag{3.8}
\]

and

\[
w_1' = \lim_{\mu(A)\to 0} w_1'(A), \qquad w_2' = \lim_{\mu(A)\to 0} w_2'(A), \tag{3.9}
\]

are well defined, we proved the following proposition.

Proposition 3.1 Under the existence of the limits (3.4), (3.8) and (3.9),

the limit distributions of the number of visits, for points on the boundary

Ω1 ∩ Ω2, exist and are given by

\[
F_k(t) = w_1\, F_{k,1}(w_1'\, t) + w_2\, F_{k,2}(w_2'\, t), \qquad (k \ge 0) \tag{3.10}
\]

at the points of continuity of both Fk,1 and Fk,2.

In other words, when the conditions of Proposition 3.1 are satisfied, the limit

distributions Fk(t) are a linear superposition of the limit distributions Fk,i

corresponding to the invariant components Ai of the shrinking neighborhood

A, weighted by the relative size of such components.

As an application of these results, I first studied a one-dimensional system whose phase space is partitioned into two mixing invariant regions, and then

a more remarkable two-dimensional system obtained by coupling the skew

map (2.1) with the so-called Arnold’s “cat map”.

3.2 Coupling of mixing maps

In order to check Eq. (3.10), I considered the following one-dimensional,

sawtooth-like map defined over the interval Ω = [−1, 1],

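\[
T(x) =
\begin{cases}
T_1(x), & \text{if } -1 \le x < 0, \\
T_2(x), & \text{if } 0 \le x \le 1,
\end{cases}
\tag{3.11}
\]

where

\[
T_1(x) =
\begin{cases}
-3x - 3, & \text{if } -1 \le x < -\tfrac{2}{3}, \\
3x + 1, & \text{if } -\tfrac{2}{3} \le x < -\tfrac{1}{3}, \\
-3x - 1, & \text{if } -\tfrac{1}{3} \le x \le 0,
\end{cases}
\tag{3.12}
\]

and

\[
T_2(x) =
\begin{cases}
3x, & \text{if } 0 \le x < \tfrac{1}{3}, \\
2 - 3x, & \text{if } \tfrac{1}{3} \le x < \tfrac{2}{3}, \\
3x - 2, & \text{if } \tfrac{2}{3} \le x \le 1.
\end{cases}
\tag{3.13}
\]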

Note that the two subsets Ω1 = [−1, 0] and Ω2 = [0, 1] are invariant under T1 and T2, respectively.

The map T is piecewise linear and preserves the Lebesgue measure.

Moreover, it is continuous except at the point x = 0, which is a periodic

point of period two for T1 and a fixed point for T2. Thus, whenever we take

a ball of radius ε around x = 0, the statistics of first returns on the left and

on the right of x will follow, respectively, those around a periodic point of

period 2 and a fixed point; in this situation, the limit statistics for ε → 0

will not follow the exponential 1-law $e^{-t}$ as for generic points. Instead, it is possible to use a result due to M. Hirata [15], according to which for one-dimensional Markov maps, like T1 and T2, the statistics of first return times in a ball of radius ε around a periodic point x of period P is given, in the limit ε → 0, by the following formula:

\[
F(t) = \rho_x\, e^{-\rho_x t}, \tag{3.14}
\]

where

\[
\rho_x = 1 - e^{\,u(x) + u(T(x)) + \dots + u(T^{P-1}(x))}. \tag{3.15}
\]

Here u(x) is the potential associated to the invariant Gibbs measure; in the present case, in which µ is a Lebesgue measure, it holds that

\[
u(x) = \ln\frac{1}{|T_1'|} = \ln\frac{1}{|T_2'|} = \ln(1/3) \tag{3.16}
\]

and, therefore, ρx is 8/9 for T1 and 2/3 for T2, consistently with the fact

that, in general, we expect ρx to approach 1 for periodic points of increasing

period.

Figure 3.1: Statistics of first return times FA(t) obtained for the set $A = [-10^{-3}, 10^{-3}]$ (plot of $\log_{10} F(t)$ versus t). The dashed line represents the law (3.17).

Then, by using Proposition 3.1, the limit statistics of first return times

in x = 0 reads as

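\[
\begin{aligned}
F_0(t) &= w_1\, F_{0,1}(w_1'\, t) + w_2\, F_{0,2}(w_2'\, t) \\
&= w_1\, \tfrac{8}{9}\, e^{-\frac{8}{9} w_1' t} + w_2\, \tfrac{2}{3}\, e^{-\frac{2}{3} w_2' t}.
\end{aligned}
\tag{3.17}
\]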

Since the maps T1 and T2 are mixing, one may immediately see, thanks to Kac's theorem, that $\langle\tau_{A_i}\rangle = \mu(\Omega_i)/\mu(A_i)$, i = 1, 2. So, under the existence of the limits (3.4) and (3.9), we have $w_i' = [\mu(\Omega)/\mu(\Omega_i)]\, w_i$. The prescribed choice of the set A as a symmetric interval around the boundary point x thus implies $w_1 = w_2 = 1/2$ and $w_1' = w_2' = 1$. Substituting these values into Eq. (3.17) gives a result which is well confirmed by the numerical computations, as shown, for example, in Fig. 3.1.
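The ingredients of this example are simple enough to be checked directly. The Python sketch below encodes the two branches (3.12) and (3.13) with exact rational arithmetic, verifies the periodicity of x = 0, evaluates ρx from Eqs. (3.15) and (3.16), and defines the limit law (3.17). The function names are illustrative, and the default weights are those of a symmetric interval around x = 0.

```python
import math
from fractions import Fraction


def T1(x):
    """Left map on [-1, 0], Eq. (3.12)."""
    if x < Fraction(-2, 3):
        return -3 * x - 3
    if x < Fraction(-1, 3):
        return 3 * x + 1
    return -3 * x - 1


def T2(x):
    """Right map on [0, 1], Eq. (3.13)."""
    if x < Fraction(1, 3):
        return 3 * x
    if x < Fraction(2, 3):
        return 2 - 3 * x
    return 3 * x - 2


def rho(period, slope=3):
    """Eq. (3.15) with the constant potential u = ln(1/slope) of Eq. (3.16)."""
    return 1 - Fraction(1, slope) ** period


def F0(t, w1=0.5, w2=0.5, w1p=1.0, w2p=1.0):
    """Limit first-return statistics at x = 0, Eq. (3.17)."""
    r1, r2 = float(rho(2)), float(rho(1))
    return w1 * r1 * math.exp(-r1 * w1p * t) + w2 * r2 * math.exp(-r2 * w2p * t)


zero = Fraction(0)
```

Here T2 fixes the origin, T1 sends it to −1 and back, and the two values of ρx are 8/9 and 2/3. At t = 0 the mixture takes the value 7/9.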

A similar situation occurs if we want to apply Proposition 3.1 to compute

the distributions of the number of visits, of order k > 0, in x = 0. That

is, we need to know which are the distributions around periodic points. In

this respect, since the two mixing maps T1 and T2 are conjugated with a

Bernoulli shift on three symbols with equal weights, it is possible to use

a general formula recently proved by N. Haydn and S. Vaienti which gives

the limit distribution of order k for cylinders Cn around periodic points of


Figure 3.2: Distributions of the number of visits of order k = 1 and 2, computed for the map given by Eq. (3.11) in the interval A = [−ε, ε], with $\varepsilon = 5 \times 10^{-4}$ (semi-logarithmic plot of Fk(t) versus t). The dotted lines represent the theoretical predictions; in particular the one for k = 1 corresponds to formula (3.20).

period P ,

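\[
\begin{aligned}
F_k(t) &\equiv \lim_{n\to\infty} \mu_{C_n}\bigl( x \in C_n : \xi_{C_n}(t; x) = k \bigr) \\
&= (1 - p^P)\, e^{-(1-p^P)\, t} \sum_{j=0}^{k} \binom{k}{j}\, p^{P(k-j)}\, \frac{(1 - p^P)^{2j}}{j!}\, t^j,
\end{aligned}
\tag{3.18}
\]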

where $p = \mu(C_{n+1})/\mu(C_n)$. Note that for k = 0 this formula coincides with the one found by M. Hirata, since then it reads as

\[
F_0(t) = (1 - p^P)\, e^{-(1-p^P)\, t}. \tag{3.19}
\]

I would like to note that the distributions (3.18) can be obtained, under the assumption that the differences of successive return times are asymptotically independent, by convolving F0(t) according to the procedure described in Sec. 2.4.2.

Considering, for example, a sequence of shrinking cylinders Cn centered

around the point x = 0 (so that $w_1 = w_2 = 1/2$ and, by Kac's theorem, $w_1' = w_2' = 1$), the distribution of the number of visits for k = 1 should be well described by the following function:

\[
F_1(t) = \tfrac{1}{2}\, e^{-(1-p^2)\, t}\, (1 - p^2) \bigl[ p^2 + (1 - p^2)^2\, t \bigr] + \tfrac{1}{2}\, e^{-(1-p)\, t}\, (1 - p) \bigl[ p + (1 - p)^2\, t \bigr], \tag{3.20}
\]

where p = 1/3 in the present case. As Fig. 3.2 shows, the agreement of

the numerically computed distributions with the theoretical expectations is

really good.
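Formula (3.18) and its superposition at the boundary point are easy to evaluate numerically. The following Python sketch, with hypothetical function names, implements the right-hand side of Eq. (3.18) and the equal-weight mixture of the period-2 and period-1 contributions used for x = 0.

```python
import math


def F_periodic(k, t, p, P):
    """Right-hand side of Eq. (3.18) for a periodic point of period P."""
    q = p ** P
    total = sum(
        math.comb(k, j) * q ** (k - j) * (1 - q) ** (2 * j) * t ** j / math.factorial(j)
        for j in range(k + 1)
    )
    return (1 - q) * math.exp(-(1 - q) * t) * total


def F_boundary(k, t, p=1 / 3):
    """Mixture (3.10) at x = 0 with w1 = w2 = 1/2 and w1' = w2' = 1."""
    return 0.5 * F_periodic(k, t, p, 2) + 0.5 * F_periodic(k, t, p, 1)
```

For k = 0 the first function reduces to Eq. (3.19) and for k = 1 the mixture reproduces Eq. (3.20). Moreover, for fixed t the values Fk(t) sum to one over k, as expected for the distribution of a number of visits.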

3.3 Coupling of a regular and of a mixing map

It is particularly interesting to study the behaviour of the distributions of

the number of visits for a dynamical system in which the dynamics of one of

the invariant regions is given by our skew map, while the other is a generic

strongly mixing map, like, for example, the hyperbolic automorphism of the

torus (also known as Arnold’s “cat map”).

So, following the notation introduced in Sec. 3.1, let us construct a trans-

formation T defined over the two-dimensional space Ω = Ω1 ∪ Ω2, with

Ω1 = T × [0, 1] and Ω2 = T × [−1, 0). In this way Ω is obtained by gluing together the cylinder Ω1 with the torus Ω2. Then, let the map $T_1 = T|_{\Omega_1}$ be represented by our skew map, and $T_2 = T|_{\Omega_2}$ by the hyperbolic automorphism:

\[
T_2 :
\begin{cases}
x' = 2x + y \mod 1, \\
y' = (x + y \mod 1) - 1.
\end{cases}
\tag{3.21}
\]

As in the previous example, µ is taken as the usual Lebesgue measure.

3.3.1 Recurrences for domains of finite size

In order to investigate the distributions of the number of visits, let us start

by considering an arbitrary point x = (x, 0) along the boundary (y = 0)

of the two invariant subspaces Ω1 and Ω2. Then, let us take a measurable

neighborhood A of x, in such a way that w1(A) = λ, with 0 < λ < 1. Of

course, by definition, it follows immediately that w2(A) = 1 − λ. Since the

domain A intersects the boundary, the distributions Fk,A(t) are given by

Eq. (3.6).


Now, since T1 is the skew map, we may use Eq. (2.4) to express the mean return time into A1 as $\langle\tau_{A_1}\rangle = \mu_y(I_{A_1})/\mu(A_1)$. Similarly, since T2 is a strongly mixing map, we may apply Kac's theorem, obtaining $\langle\tau_{A_2}\rangle = 1/\mu(A_2)$. So, by Eqs. (3.5) and (3.7), the quantities $w_1'(A)$ and $w_2'(A)$ read as

\[
w_1'(A) = \lambda \left[ 1 + \frac{1}{\mu_y(I_{A_1})} \right], \tag{3.22}
\]

and

\[
w_2'(A) = (1 - \lambda) \bigl[ 1 + \mu_y(I_{A_1}) \bigr], \tag{3.23}
\]

respectively.
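These two factors follow from Eqs. (3.5) and (3.7) once the mean return times are inserted: with µ(A1) = λ µ(A) and µ(A2) = (1 − λ) µ(A) one finds ⟨τA⟩ µ(A) = 1 + µy(I_{A1}), so that µ(A) cancels in both ratios. The short Python sketch below (function name hypothetical) carries out this computation and can be used to check that w1(A)/w1′(A) + w2(A)/w2′(A) = 1, as required by Eqs. (3.5) and (3.7).

```python
def rescaling_factors(lam, mu_y):
    """Return (w1', w2') for w1(A) = lam and mu_y = mu_y(I_{A_1}).

    Uses <tau_A1> = mu_y / mu(A1) and <tau_A2> = 1 / mu(A2) in
    Eqs. (3.5) and (3.7); mu(A) cancels out.
    """
    mean_A_times_muA = mu_y + 1.0          # <tau_A> * mu(A), from Eq. (3.5)
    w1p = mean_A_times_muA * lam / mu_y    # <tau_A> / <tau_A1>
    w2p = mean_A_times_muA * (1.0 - lam)   # <tau_A> / <tau_A2>
    return w1p, w2p
```

As µy(I_{A1}) → 0 with λ fixed, the first factor diverges while the second tends to 1 − λ.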

Although we do not exactly know the distributions of the number of

visits Fk,A1(t) for the skew map when A1 is arbitrarily chosen, nevertheless

we may still understand the asymptotic behaviour of the limit distributions.

In fact, it is sensible to suppose, at least for large values of the normalized time t and if A is sufficiently small, that $F_{k,A_1}(t) \simeq a\, t^{-\beta}$, with β very near to 2 for k = 0, and slightly greater than 2 for k > 0, while a is a positive constant. This assumption is strongly suggested by the numerical computations presented in Chapter 2 and supported also by the heuristic explanations discussed there. Moreover, since the cat map, which is an Anosov diffeomorphism, enjoys the Poisson distribution if we take balls or cylinders converging around µ-almost all points, it is reasonable to expect that the distributions $F_{k,A_2}(t)$ will approach the function $e^{-t}\, t^k/k!$ when $\mu(A) \ll 1$. Thus, using Eq. (3.6), we can write for t sufficiently large,

\[
F_{k,A}(t) = \frac{\lambda\, a}{[w_1'(A)\, t]^{\beta}} + (1 - \lambda)\, \frac{[w_2'(A)\, t]^k}{k!}\, e^{-w_2'(A)\, t}, \qquad (\beta \simeq 2). \tag{3.24}
\]

I verified that this formula, despite the approximations employed in or-

der to obtain it, describes quite well the asymptotic behaviour of the dis-

tributions of the number of visits computed for domains that intersect the

boundary, as shown in Figs. 3.3 and 3.4.

Note in particular that a power law tail appears as soon as the first

term in Eq. (3.24) gives the main contribution. Furthermore, the value

of the normalized time t for which the polynomial decay begins to prevail

increases as the measure of A decreases (but is still different from zero).

This means that if we compute numerically the distributions of the number


Figure 3.3: Statistics of first return times FA(t) computed for a circular domain A of radius $r = 2 \times 10^{-2}$ and center (xc, yc), where xc = 0.9 and yc is such that $\lambda = 10^{-1}$ (plot of $\log_{10} F(t)$ versus t). The dotted line represents the function (3.24) for k = 0. Note in particular that the statistics decays according to a power law for large values of t.

of visits for a very small domain, we need to reach large values of t to see such

a power law tail. I would like to remark that the appearance of this power

law tail is a consequence of the fact that µ(A) ≠ 0. Therefore, since it is not possible to deal with the limit of zero measure domains while performing a numerical analysis of the recurrence distributions, one should take into account the presence of these finite size effects before drawing conclusions

about the behaviour of the limit distributions.

3.3.2 Limit distributions of the number of visits

Starting from Eq. (3.24), it would seem that we might directly turn to

Proposition 3.1 to obtain the limit distributions Fk(t), but unfortunately

one of the assumptions required for the existence of Fk(t) itself fails. In

fact, it is immediate to verify that $w_1'(A)$ does not have a finite limit when A shrinks to x, since λ keeps a positive finite value and $\mu_y(I_{A_1}) \to 0$.

However, for the limit statistics of first return times it is possible to

overcome this difficulty, at least in a particular case. By choosing A ⊂ Ω


Figure 3.4: Distributions of the number of visits of order k = 1, computed for a square domain of side (a) ε = 0.2 and (b) ε = 0.05, such that λ = 0.5 (log-log plot of F1(t) versus t). The dotted lines represent Eq. (3.24) with a = 0.5 and β = 1.94. Note that as the domain's size decreases, the contribution of the regular component begins to prevail for larger values of t.

as a symmetric rectangle intersecting the boundary, like A = [x − ε/2, x + ε/2] × [−ε, ε], I was able to rigorously prove that F(t) exists and is given by

\[
F(t) = \frac{1}{2} \left( \xi(t) + e^{-t/2} \right), \tag{3.25}
\]

where ξ is defined as ξ(0) = 1, while ξ(t) = 0 for t > 0. In order to obtain this result, knowledge of the exact expression of the recurrence statistics for square domains whose lower side lies on the fixed points of the skew map, as given by Eq. (2.5), was crucial. Formula (3.25) is confirmed by the numerical computations of the statistics, as shown, for example, in Fig. 3.5.

Although we do not have an exact analytical formula of the recurrence

distributions for the skew map, nevertheless to investigate the asymptotic

behaviour of Fk(t) we may consider the limit for µ(A) → 0 of Eq. (3.24).

When A shrinks to a point x of the boundary, we have that $w_1'(A) \to \infty$, as previously seen, whereas it is easy to prove that $w_2'(A) \to 1 - \lambda$. Since

we assume that the quantity a keeps a finite positive value, it is therefore


Figure 3.5: Statistics of first return times FA(t) computed for the set A = [x, x + ε] × [−ε, ε], with x = 0.3 and $\varepsilon = 10^{-2}$ (plot of $\log_{10} F(t)$ versus t). The dashed line represents the law (3.25).

immediate to verify that, for µ(A) → 0, only the second term survives in

Eq. (3.24), which thus becomes:

\[
F_k(t) = (1 - \lambda)\, \frac{[(1 - \lambda)\, t]^k}{k!}\, e^{-(1-\lambda)\, t}. \tag{3.26}
\]

This result is actually confirmed by the numerical computations performed;

in particular, varying the fraction λ of the regular component of A, I checked

that the exponent is given by 1 − λ as expected (see Figs. 3.6 and 3.7).
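The passage from the finite-size expression (3.24) to the limit law (3.26) can be illustrated numerically. In the Python sketch below the rescaling factors are taken as w1′(A) = λ[1 + 1/µy(I_{A1})] and w2′(A) = (1 − λ)[1 + µy(I_{A1})], the form that follows from Eqs. (3.5) and (3.7) and is consistent with the limit w2′(A) → 1 − λ. The default values of a and β are placeholders chosen for the example, not fitted values.

```python
import math


def F_finite(k, t, lam, mu_y, a=0.5, beta=2.0):
    """Eq. (3.24): power-law term (skew map) plus Poisson term (cat map)."""
    w1p = lam * (1.0 + 1.0 / mu_y)
    w2p = (1.0 - lam) * (1.0 + mu_y)
    power = lam * a / (w1p * t) ** beta
    poisson = (1.0 - lam) * (w2p * t) ** k / math.factorial(k) * math.exp(-w2p * t)
    return power + poisson


def F_limit(k, t, lam):
    """Eq. (3.26): the limit of Eq. (3.24) for mu(A) -> 0."""
    r = 1.0 - lam
    return r * (r * t) ** k / math.factorial(k) * math.exp(-r * t)
```

For a domain of finite size the power-law term dominates at large t, whereas for µy(I_{A1}) → 0 it is pushed to ever larger times and the two functions coincide on any fixed range of t.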

Note that even if, in the limit of zero measure domains, the power law

contribution disappears from the distributions of the number of visits, which

thus follow, for sufficiently large t, a Poisson law as in the pure mixing case,

nevertheless the presence of the regular component of A is still revealed by

the fact that the quantity 1 − λ is different from one.

To conclude, the distributions of the number of visits appear able to

capture both the different qualitative properties of the regular and mixing

regions, and to provide some information about the relative measures of

these components.

As a final remark, I would like to stress that although the two-dimensional

model just considered is rather simple, the distributions of the number of


Figure 3.6: Recurrence statistics FA(t) computed for circular domains A of radius $r = 10^{-2}$ and center (xc, yc), where xc = 0.9 and yc is such that (1) λ = 0, (2) λ = 0.2, (3) λ = 0.4, (4) λ = 0.6 and (5) λ = 0.8 (plot of $\log_{10} F(t)$ versus t).

visits exhibit some features that are found in dynamical systems of greater interest from a physical point of view, as we will see in the next section.

3.4 Recurrences for the standard map

In this section I present the results obtained by investigating the distribu-

tions of the number of visits for the so-called “standard map,”

\[
\begin{cases}
y' = y - \dfrac{\eta}{2\pi} \sin(2\pi x), \\
x' = x + y' \mod 1.
\end{cases}
\tag{3.27}
\]

I chose the value of the coupling parameter as η = 3. In this way,

the phase space presents a structure where regular orbits are surrounded

by a chaotic “sea” (Fig. 3.8). Moreover, for such value of η the stochastic

layer, separating the regular orbits from the chaotic sea, is sufficiently sharp

compared to the size of the sets used to compute the distributions.
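For reference, one iteration of the map (3.27) and the first return time of an orbit into a given domain can be sketched in Python as follows. The wrapping of y onto [0, 1), which makes the phase space the unit torus, is an assumption made here, as are the function names and the predicate `in_domain` describing the domain A.

```python
import math


def standard_map(x, y, eta=3.0):
    """One step of Eq. (3.27); y is also wrapped onto [0, 1) (an assumption)."""
    y_new = (y - eta / (2.0 * math.pi) * math.sin(2.0 * math.pi * x)) % 1.0
    x_new = (x + y_new) % 1.0
    return x_new, y_new


def first_return_time(x0, y0, in_domain, eta=3.0, max_iter=10**6):
    """Number of steps until the orbit of (x0, y0) re-enters the domain, or None."""
    x, y = x0, y0
    for t in range(1, max_iter + 1):
        x, y = standard_map(x, y, eta)
        if in_domain(x, y):
            return t
    return None
```

A square domain of side ε centered at (xc, yc) would be passed as `lambda x, y: abs(x - xc) <= eps / 2 and abs(y - yc) <= eps / 2`. Since y′ enters x′ only modulo 1, wrapping y does not alter the dynamics of x.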

I considered three different cases: domains wholly contained in the regu-

lar region, domains lying on the chaotic sea, and domains that overlap both

Figure 3.7: Slopes α(λ) obtained by fitting the statistics of first return times FA(t) against the function $A\, e^{-\alpha(\lambda)\, t}$, for 3 < t < 6 (plot of α(λ) versus λ). The domains A are like the ones considered in Fig. 3.6. The dashed line represents the function 1 − λ.

regions.

3.4.1 Domains on the regular region

The numerical distributions Fk,A(t) obtained for domains A wholly con-

tained in the regular region appear to follow asymptotically a power law

decay like $a\, t^{-\beta}$, as shown in Fig. 3.9. The least-squares estimate of the exponent β gives a value near 2, usually between 1.90 and 2.05. There is furthermore some evidence suggesting that β slightly increases with the order k. It is interesting to note that the distributions computed for this kind of set behave very similarly to those obtained for the skew map.

3.4.2 Domains on the chaotic sea

The behaviour of the distributions of the number of visits for domains lying

on the chaotic component of the phase space appears, at first, surprising.


Figure 3.8: Plot of the standard map (3.27) with coupling parameter η = 3 (phase space in the (x, y) plane).

In fact, the computed mean return time is given, with a very good approx-

imation, by the ratio of the measure of the chaotic region (which is about

0.88 with the choice η = 3 for the coupling parameter) and the measure of

the considered domain, thus being in agreement with the value that would

be provided by Kac’s theorem (which holds for ergodic systems) if it could

be applicable here. On the other hand, for sufficiently large values of t the

distributions obtained reveal a departure from the expected Poisson law,

showing a polynomial decay with an exponent near 2 (see Fig. 3.10).

This rather unexpected feature is plausibly due to the fact that orbits originating even from points far away from the regular

region usually approach it. In this way, the overall chaotic motion would be

appreciably influenced by the regular component, leading to the appearance,

in the distributions of the number of visits, of the observed power law decay.

In this regard it is interesting to note that Poincare recurrences seem to

provide a tool capable of capturing some of the properties of the dynamics in

a more “sensitive” way with respect to other quantities that could be used

3.4 Recurrences for the standard map 41

10-6

10-4

10-2

100

100 101 102 103

Fk(

t)

t

Figure 3.9: Distributions of the number of visits of order k = 1 (lower

curve), k = 2 (middle curve) and k = 3 (higher curve), computed for a

square domain of side ε = 2.5 × 10−2 centered at (0.15, 0.25), which is

wholly contained in the regular region. The distributions appear to follow

asymptotically a power law decay like a t−β, where the least-square fit

estimate of β, performed for t in the range [10, 500], is 1.91 for k = 1, 1.95

for k = 2 and 1.97 for k = 3.

for the same purpose, like the mean return time into a given domain seen

above.

3.4.3 Domains intersecting the stochastic layer

Considering the results of Sec. 3.1, it is reasonable to expect that the asymp-

totic behaviour of the distributions of the number of visits for domains that

intersect the boundary between the regular and the chaotic regions should

be given by a linear superposition of a power law (the contribution, above

all, of the integrable orbits) and of a Poisson distribution (from the chaotic

sea).

Figure 3.10: Distributions of the number of visits $F_k(t)$ versus t (logarithmic axes) of order k = 1 and 2, computed for a square domain of side $\varepsilon = 5 \times 10^{-2}$ centered at (0.7, 0.2), which lies on the chaotic sea. The dotted line represents the function $a\,t^{-\beta}$, with $a = 10^{-2}$ and β = 2.

In order to test this conjecture, I tried to fit the distributions $F_{k,A}(t)$, computed for a given domain A, against the following function, which should reasonably represent their behaviour (from now on, to simplify the notation, I drop the explicit dependence on A),

\[ F_k(t) = w_1 \frac{a}{(w'_1 t)^{\beta}} + w_2 \frac{(w'_2 t)^{k}\, e^{-w'_2 t}}{k!}. \tag{3.28} \]

From Eqs. (3.3), (3.5) and (3.7) we have that the quantities $w_1$, $w_2$, $w'_1$ and $w'_2$ are related in the following way:

\[ w_1 + w_2 = 1, \qquad \frac{w_1}{w'_1} + \frac{w_2}{w'_2} = 1. \tag{3.29} \]

So, for instance, $w_2$ and $w'_2$ in Eq. (3.28) may be expressed in terms of $w_1$ and $w'_1$,

\[ w_2 = 1 - w_1, \qquad w'_2 = \frac{1 - w_1}{1 - w_1/w'_1}. \tag{3.30} \]

Using then $w_1$, $w'_1$, a and β as fit parameters, I generally found very good agreement between the numerical distributions and formula (3.28), despite the fact that it does not take into account the finite-size effects of the domain. In this respect, Fig. 3.11 shows a distribution of order k = 3, along with the corresponding fit function.
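A minimal sketch of how the fit function (3.28) could be evaluated once the dependent weights have been eliminated through Eq. (3.30) is given below. The function names and the numerical values in the usage lines are hypothetical, chosen only to exercise the formulas; the actual fitting routine used for the thesis is not specified in the text.

```python
import math

def dependent_weights(w1, w1p):
    """Return (w2, w2') from Eq. (3.30); requires w1 / w1p < 1."""
    w2 = 1.0 - w1
    w2p = (1.0 - w1) / (1.0 - w1 / w1p)
    return w2, w2p

def mixed_law(t, k, w1, w1p, a, beta):
    """Evaluate Eq. (3.28): power-law term plus weighted Poisson term."""
    w2, w2p = dependent_weights(w1, w1p)
    power_part = w1 * a / (w1p * t) ** beta
    poisson_part = w2 * (w2p * t) ** k * math.exp(-w2p * t) / math.factorial(k)
    return power_part + poisson_part

# Illustrative (not fitted) parameter values.
w2, w2p = dependent_weights(0.3, 0.5)
value = mixed_law(10.0, 3, 0.3, 0.5, 1.0, 2.0)
```

Expressing $w_2$ and $w'_2$ through $w_1$ and $w'_1$ keeps the constraints (3.29) satisfied by construction, so a generic least-squares optimizer can vary the four free parameters without extra conditions.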

Figure 3.11: Distribution of the number of visits of order k = 3, $F_3(t)$ versus t (logarithmic axes), computed for a square domain of side ε = 0.1 centered at (0.03, 0.3), which intersects the boundary between the regular and the chaotic regions. The dotted line represents Eq. (3.28) with β = 2.04.

It is worthwhile to mention that, for a given domain A, usually the

value w1 obtained from the fit procedure is greater than the geometrical

estimate of the relative measure of the regular share of A. This discrepancy

seems reasonably due to the fact that besides the points belonging to the

regular share, which are responsible for the power law decay, there is a

further contribution from the chaotic component of A, whose dynamics is

influenced in some way by the regular orbits (as seen above), so that the

effective relative size of the regular component appears to be greater than

expected.

3.5 Dissipative Hénon map

Figure 3.12: Distributions of the number of visits $F_k(t)$ of order k = 0, 1 and 2, computed for a circular domain of radius $5 \times 10^{-3}$ around a generic point. The dotted lines represent the corresponding Poisson laws.

I would like to conclude this chapter by presenting some results, mainly obtained by E. Lunedei [34], concerning the distributions of the number of visits for generic and periodic points of the following map, introduced by Hénon [35] and defined on $\mathbb{R}^2$:

\[ x' = 1 + y - a x^{2}, \qquad y' = b x, \tag{3.31} \]

where a = 1.4 and b = 0.3.
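For concreteness, the map (3.31) can be iterated with a few lines of code. The sketch below is only an illustration: the function names and the initial condition (0, 0) are assumptions made here, not choices documented in the text.

```python
def henon_step(x, y, a=1.4, b=0.3):
    """One iteration of the map (3.31)."""
    return 1.0 + y - a * x * x, b * x

def henon_orbit(x0, y0, n, a=1.4, b=0.3):
    """Return the list of the first n + 1 points of the orbit of (x0, y0)."""
    orbit = [(x0, y0)]
    x, y = x0, y0
    for _ in range(n):
        x, y = henon_step(x, y, a, b)
        orbit.append((x, y))
    return orbit

# Assumed initial condition; any point in the basin of the attractor would do.
orbit = henon_orbit(0.0, 0.0, 1000)
```

Return times into a small ball A would then be obtained by recording the iteration indices at which the orbit falls inside A.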

In order to investigate the distributions of the number of visits $F_{k,A}(t)$, Lunedei used the so-called physical measure (also known as SBR measure), supposing it is well defined, despite the fact that no rigorous proof of its existence and of its statistical properties has been given so far. The numerical analysis was performed by taking the domain A as a small ball centered at either generic or periodic points.

In the case of generic points, the distributions appear to agree very well with the Poisson law, as Fig. 3.12 shows. Since the Hénon map is not uniformly hyperbolic, this result could not be taken for granted.

Figure 3.13: Distribution of the number of visits of order k = 0, $F_0(t)$ versus t, computed for a circular domain of radius $5 \times 10^{-3}$ centered at a periodic point of period two. The dotted line (a) corresponds to the exponential function $e^{-t}$, while the line (b) represents the function $\alpha\, e^{-\alpha t}$, with α ≃ 0.66.

To study the distributions of the number of visits for periodic points, he computed, by a bisection method [36], the periodic orbits of the Hénon map with period P from one to ten. The investigation was limited to points of period up to ten because the lower periods are the ones with the strongest influence on return times. As expected, when A is a small ball centered at a periodic point, the Poisson law no longer fits the numerical results, as is clear from Fig. 3.13, which shows the distribution $F_{0,A}(t)$ obtained for a point of period two. Instead, for every periodic point considered, the distributions of order k = 0 are well described by the function [compare with Eq. (3.19)]

\[ F_0(t) = \alpha\, e^{-\alpha t}, \tag{3.32} \]

where the coefficient α seems to depend on the period P in the following way:

\[ \alpha(P) = 1 - A \varrho^{P}, \tag{3.33} \]

with A ≃ 0.96 and ϱ ≃ 0.59 (see Fig. 3.14). Then, by assuming that the differences of successive return times are independent, we used a procedure similar to the one employed in Sec. 2.4.2 to obtain the limit distributions of the number of visits of any order,

\[ F_k(t) = \alpha\, e^{-\alpha t} \sum_{j=0}^{k} \binom{k}{j} (1-\alpha)^{k-j} \frac{\alpha^{2j}}{j!}\, t^{j}. \tag{3.34} \]


Figure 3.14: Values of the coefficient α of the distributions $F_0(t)$ computed for periodic points of period P (axes: α(P) versus P). The dotted line represents the fit function (3.33).

As Fig. 3.15 shows, the numerical distributions computed for periodic points

agree very well with this expression, which, under the identification α →(1 − pP ), coincides exactly with Eq. (3.18); like in that case, it would be

interesting here to explicitly relate α to the period. Moreover, we can note that Eq. (3.34) reduces to the Poisson function $e^{-t}\, t^{k}/k!$ when α → 1. Since α approaches 1 as the period increases, the distributions for high periods are practically indistinguishable from those for generic points, which are described by the Poisson distribution. This is self-consistent, because an aperiodic orbit can be thought of as a periodic orbit with infinite period.
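The fitted relation (3.33) and the limit law (3.34) are easy to evaluate numerically, which also makes the Poisson limit for α → 1 explicit. The following sketch assumes the fitted values A ≃ 0.96 and ϱ ≃ 0.59 quoted above as defaults; the function names are introduced here for illustration.

```python
import math

def alpha_of_period(P, A=0.96, rho=0.59):
    """Fitted dependence of alpha on the period P, Eq. (3.33)."""
    return 1.0 - A * rho ** P

def visits_law(t, k, alpha):
    """Limit distribution of the number of visits of order k, Eq. (3.34)."""
    total = 0.0
    for j in range(k + 1):
        total += (math.comb(k, j) * (1.0 - alpha) ** (k - j)
                  * alpha ** (2 * j) * t ** j / math.factorial(j))
    return alpha * math.exp(-alpha * t) * total

def poisson_law(t, k):
    """Poisson function e^{-t} t^k / k!, recovered from (3.34) at alpha = 1."""
    return math.exp(-t) * t ** k / math.factorial(k)

alpha_2 = alpha_of_period(2)   # period two, to compare with Fig. 3.13
```

For k = 0 the sum contains a single term equal to 1, so `visits_law(t, 0, alpha)` coincides with Eq. (3.32).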

Summarizing this work on the Hénon map, we may observe the following facts. First, the distributions of the number of visits computed for generic points agree with the Poisson law, as usually happens for systems with strong mixing properties (see Ref. [8]). Second, periodic points have a real effect on return times, and this effect decreases as the period grows. Furthermore, the distributions computed for periodic points closely follow the behaviour predicted by assuming that the differences of successive return times are independent.


Figure 3.15: Distributions of the number of visits $F_k(t)$ versus t of order k = 1, 5, 15 and 30, computed for a circular domain of radius $5 \times 10^{-3}$ centered at a periodic point of period two. The dotted lines represent the corresponding theoretical predictions (3.34).

Chapter 4

Poincaré recurrences and genomic sequences

In recent years we have witnessed a growing development of disciplines related to biology. In particular, studies devoted to the investigation of genomes have attracted more and more attention. This has certainly been favoured, among other things, by the availability of, and easy access to, huge archives of genomic sequences.

Interestingly, the role of a part of the genome — the so-called noncoding

regions — is still substantially unknown. Understanding the function of

such noncoding regions appears even more important considering that they

usually represent, for eukaryotes, the widest portion of the genome, and

this part is generally larger in organisms placed at higher positions in the

evolutionary scale. For instance, in humans more than 95% of the genome

is represented by noncoding sequences.

Recently, many studies have been performed on genomic sequences using statistical methods (see for example Refs. [37]–[41]). In this respect, G. Turchetti and I thought it could be of some interest to explore the possibility of applying the techniques developed for the analysis of dynamical systems by means of Poincaré recurrences. This idea was based essentially on the observation that Poincaré recurrences seem able to capture some of the fundamental features of the dynamics of the underlying systems, as shown in the previous chapter. So, I tried to investigate whether the recurrence statistics may represent a tool capable of distinguishing the coding regions from the noncoding ones.


Unfortunately, the statistics of first return times does not appear suitable

for this purpose. Nonetheless the results obtained supply some information

that could be meaningful, I believe, in the discussion about the statistical

properties of genomic sequences.

4.1 Genome and genetic information

The deoxyribonucleic acid (DNA) is a macromolecule composed of funda-

mental units called nucleotides. In each nucleotide there is a phosphate

group, a sugar, and one of the following four bases: adenine (A), guanine

(G), cytosine (C) and thymine (T).

Several nucleotides bind together to form a chain-like molecule known as a polymer, which tends to assume a helical shape. Two of these helices, linked together by hydrogen bonds between complementary bases, form the DNA molecule. Usually, each helix is called a DNA strand.

The biological role of such a DNA molecule is fundamental: it represents the so-called genome, which contains the hereditary information owned by every living being. Its size differs from species to species; in humans it is made of about 3.2 billion nucleotides.

The concept of gene was introduced by Mendel during his studies on heredity to indicate the basic unit responsible for the transmission of a single hereditary character. Modern biochemistry has revealed that a gene is a specific sequence of nucleotides whose main function is, most of the time, to hold the information needed to build proteins. The latter are macromolecules composed of amino acids, and are essential for most cellular activities.

A sequence of nucleotides determines an amino acid sequence (that is, a protein) in the following way: every combination of three nucleotides is associated with one of twenty different amino acids, according to the so-called genetic code. The regions of a gene that contribute to the formation of a protein are named coding sequences.

What is surprising is that usually only a small fraction of the whole genome of eukaryotes is coding. In human beings, for example, more than 95% of DNA is represented by noncoding sequences, whose function is still largely unknown.


In this respect, several hypotheses concerning the function of the noncoding part of the genome have been proposed. Some think it may be junk DNA, a sort of residue of molecular evolution; others think that it could help the genome to keep its compact form or could hold ancient genetic information. Finally, others believe that the noncoding sequences may contain the instructions necessary for the complex functioning of cells.

4.2 Extraction of genomic sequences

I chose the genomic sequences used in the analysis from the set of complete genomes available in the GenBank database (http://www.ncbi.nlm.nih.gov), considering both eukaryotic (about 60 chromosomes) and bacterial sequences. Among the organisms whose genome has a considerable size, I investigated C. elegans, D. melanogaster, S. cerevisiae and E. coli. To give an idea, chromosome III of C. elegans contains about 13.2 million nucleotides, of which about 9.8 million lie in the noncoding regions.

To extract the coding and noncoding sequences from the GenBank files, I developed a parser able to understand the rules contained in these archives. Such rules specify, for example, the start and end positions, inside the files, of the different kinds of DNA stretches that form a particular sequence.

Subsequently, the coding sequences were obtained as follows. From a given genomic sequence (usually a whole chromosome) I extracted all the coding regions and then glued them together, in the same order in which they appeared in the original genomic sequence. The noncoding sequences were constructed with a similar procedure.
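The gluing step described above can be sketched as follows. This is not the parser developed for the thesis: the function name, the input format (a list of 0-based, half-open, non-overlapping intervals already extracted from the annotation) and the toy sequence are all assumptions made for illustration, and complications such as complementary strands are ignored.

```python
def split_coding_noncoding(sequence, coding_intervals):
    """Concatenate coding and noncoding stretches in their original order.

    coding_intervals: iterable of (start, end) pairs, 0-based and half-open,
    assumed not to overlap (hypothetical format, not the GenBank syntax).
    """
    coding, noncoding = [], []
    cursor = 0
    for start, end in sorted(coding_intervals):
        noncoding.append(sequence[cursor:start])  # stretch before this region
        coding.append(sequence[start:end])        # the coding region itself
        cursor = end
    noncoding.append(sequence[cursor:])           # tail after the last region
    return "".join(coding), "".join(noncoding)

# Toy example with two coding regions.
coding, noncoding = split_coding_noncoding("AACCGGTTAACC", [(2, 5), (8, 10)])
```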

4.3 Preliminary statistical analyses

Before applying the method based on Poincaré recurrences to the coding and noncoding sequences, I performed a series of analyses in order to gain some insight into their main statistical properties.

In this respect, it is useful to regard the sequences as long strings formed by a succession of four symbols or “letters”, namely ‘A’, ‘C’, ‘G’ and ‘T’, as many as the different types of nucleotides found in DNA. One may then call an arbitrary string of n letters of this four-element alphabet a “word” (of length n). Of course, the total number M of all the possible words of length n is $M = 4^{n}$.

Figure 4.1: Frequency distribution $f_2(x)$ of words two nucleotides long, referring to the coding sequences from chromosome III of C. elegans. The two curves show the actual frequencies and the product of the single-nucleotide frequencies (horizontal axis: x, word label).

To each of these words it is possible to associate a number 0 ≤ x < 1, in the following way. We can substitute the symbols ‘A’, ‘C’, ‘G’ and ‘T’ present in a given word with the digits 0, 1, 2 and 3, respectively. After such a translation, a word may be reinterpreted as the base-four representation of a whole number m. The number x associated with the word considered is then simply obtained as x = m/M.

In this way, a sequence may be considered as a succession of contiguous and non-overlapping words of length n (each word, as said, representing a group of n adjacent nucleotides in the sequence) or, equivalently, of points belonging to the segment [0, 1) whose coordinate is x.
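The labelling of words by numbers in [0, 1) can be written down directly from the description above. In this sketch the function names are introduced here for illustration, and a trailing group of fewer than n nucleotides is simply dropped, which is an assumption the text does not address.

```python
DIGITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def word_to_x(word):
    """Read the word as a base-four number m and return x = m / 4**n."""
    m = 0
    for letter in word:
        m = 4 * m + DIGITS[letter]
    return m / 4 ** len(word)

def sequence_to_points(sequence, n):
    """Split the sequence into contiguous, non-overlapping words of length n
    and map each word to its coordinate x (incomplete tail discarded)."""
    usable = len(sequence) - len(sequence) % n
    return [word_to_x(sequence[i:i + n]) for i in range(0, usable, n)]

points = sequence_to_points("ACGTAC", 2)   # words: AC, GT, AC
```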

4.3.1 Frequency distributions

At first, I tried to analyze the presence of correlations among the different types of letters (nucleotides), for both coding and noncoding sequences. For this purpose, I computed the normalized frequency distribution $f_1(x)$ of occurrence of single letters in a sequence, and the normalized frequency distribution $f_2(x)$ of occurrence, in the same sequence, of words two letters long.

Figure 4.2: Frequency distribution $f_2(x)$ of words two nucleotides long, referring to the noncoding sequences from chromosome III of C. elegans. The two curves show the actual frequencies and the product of the single-nucleotide frequencies (horizontal axis: x, word label).

If a sequence were composed of completely uncorrelated nucleotides, then the normalized frequency $f_2(x)$ of a given pair of nucleotides would practically coincide, for sufficiently long sequences, with the product of the frequencies $f_1(x)$ of the two nucleotides of the pair.

As one might reasonably expect, I found that generally there is a visible difference between the actual frequency $f_2(x)$ and that obtained as a product of the corresponding single-nucleotide frequencies $f_1(x)$. An example is reported in Figs. 4.1 and 4.2.
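The comparison between the actual pair frequencies and the product of single-letter frequencies can be sketched as below. The function names are hypothetical, and counting pairs as non-overlapping words is an assumption carried over from the word decomposition described earlier; the toy sequence is deliberately extreme to make the correlation obvious.

```python
from collections import Counter
from itertools import product

def word_frequencies(sequence, n):
    """Normalized frequencies of non-overlapping words of length n."""
    usable = len(sequence) - len(sequence) % n
    words = [sequence[i:i + n] for i in range(0, usable, n)]
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}

def pair_vs_product(sequence):
    """For every pair 'ab' return (actual f2, product f1(a) * f1(b))."""
    f1 = word_frequencies(sequence, 1)
    f2 = word_frequencies(sequence, 2)
    table = {}
    for a, b in product("ACGT", repeat=2):
        table[a + b] = (f2.get(a + b, 0.0), f1.get(a, 0.0) * f1.get(b, 0.0))
    return table

# Strongly correlated toy sequence: only the pair AC ever occurs.
table = pair_vs_product("ACACACAC")
```

In this toy case the actual frequency of AC is 1 while the product of the single-letter frequencies is 0.25, the kind of discrepancy that signals correlations between adjacent nucleotides.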

In a similar way, I obtained the normalized frequency distributions $f_n(x)$ for longer words, with n up to 8. The distributions with n = 5 and higher, for the coding and noncoding parts of the same genomic sequence, show a rather interesting feature. Usually, in the noncoding case there is a small group of words whose frequency of occurrence is considerably higher than that of all the other words; in the coding case, instead, the different words appear more uniformly distributed (see Figs. 4.3 and 4.4).

This result is in agreement with what I found by analyzing the multifractal properties of the measure associated with the sequences, this measure being constructed from the frequency distribution of occurrence of the words.

Figure 4.3: Frequency distribution $f_5(x)$ of words five nucleotides long, referring to the coding sequences from chromosome III of C. elegans (horizontal axis: x, word label).

4.3.2 Multifractal analysis

To perform the multifractal analysis of the coding and noncoding parts of the different genomic sequences, I considered uniform partitions of the segment I = [0, 1). I recall that we are representing all the possible $M = 2^{2n}$ words n nucleotides long as points whose coordinate x is a number between 0 and 1, as explained above.

So, subdividing I into $2^{k}$ subintervals $I_j$ ($1 \le j \le 2^{k}$), each subinterval will contain $2^{2n-k}$ words, and it is possible to associate with each of them a measure µ such that $\mu(I_j)$ is the sum of the normalized frequencies of occurrence of the words belonging to $I_j$.

Figure 4.4: Frequency distribution $f_5(x)$ of words five nucleotides long, referring to the noncoding sequences from chromosome III of C. elegans (horizontal axis: x, word label).

Now it is easy to compute the following quantity:

\[ N_q(k) = \sum_{j=1}^{2^{k}} \mu^{q}(I_j), \qquad q \in \mathbb{R}, \tag{4.1} \]

where only the subintervals $I_j$ whose measure is greater than zero are included in the sum. In order to obtain the multifractal spectrum f(α), we need to compute the function τ(q), which is defined as

\[ \tau(q) = \lim_{k \to \infty} -\frac{1}{k} \log_2 N_q(k). \tag{4.2} \]

Of course, since the maximum order k that makes sense in the present case is 2n, we cannot take the limit in Eq. (4.2). Nonetheless, we may obtain an estimate of τ(q) through a least-squares fit procedure; usually the values obtained in this way are adequate for the analysis of sequences. Finally, the multifractal spectrum is given by the Legendre transform of τ(q):

\[ f(\alpha) = \min_{q}\, \bigl(\alpha q - \tau(q)\bigr). \tag{4.3} \]

An important feature of the parabolic-shaped function f(α) is the following:

when the measure µ tends to be uniform, the corresponding f(α) is “nar-

row”, while it becomes “broader” as the measure gets more inhomogeneous.
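The chain of definitions (4.1) to (4.3) can be sketched numerically as follows. Several choices here are assumptions rather than details given in the text: τ(q) is estimated as the least-squares slope of $-\log_2 N_q(k)$ against k (with a free intercept), the minimum in (4.3) is taken over a finite grid of q values, and all function names are hypothetical.

```python
import math

def partition_sum(freqs, k, q):
    """N_q(k) of Eq. (4.1); freqs maps each word coordinate x in [0, 1)
    to its normalized frequency. Empty subintervals are skipped."""
    boxes = {}
    for x, f in freqs.items():
        j = int(x * 2 ** k)              # index of the subinterval containing x
        boxes[j] = boxes.get(j, 0.0) + f
    return sum(mu ** q for mu in boxes.values() if mu > 0.0)

def tau(freqs, q, ks):
    """Estimate tau(q) as the slope of -log2 N_q(k) versus k."""
    ys = [-math.log2(partition_sum(freqs, k, q)) for k in ks]
    mean_k = sum(ks) / len(ks)
    mean_y = sum(ys) / len(ys)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    sxx = sum((k - mean_k) ** 2 for k in ks)
    return sxy / sxx

def spectrum(freqs, alpha, qs, ks):
    """f(alpha) of Eq. (4.3), minimizing over the finite grid qs."""
    return min(alpha * q - tau(freqs, q, ks) for q in qs)

# Uniform measure on the 64 words of length n = 3 (so k can go up to 2n = 6).
uniform = {m / 64: 1 / 64 for m in range(64)}
ks = [1, 2, 3, 4, 5, 6]
```

For the uniform measure each subinterval of order k has measure $2^{-k}$, so $N_q(k) = 2^{k(1-q)}$ and τ(q) = q − 1; the spectrum then collapses to the single point f(1) = 1, the extreme case of a “narrow” spectrum.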

Using the frequency distributions $f_n(x)$, with n from 5 to 8, I found that the multifractal spectrum of the coding regions of a given genomic sequence is usually narrower than that of the noncoding ones of the same sequence, the difference being more evident for eukaryotic organisms. This means that the coding part is composed of more uniformly distributed words, as also suggested by the functions $f_n(x)$ themselves. An example of this difference is shown in Fig. 4.5, which reports the multifractal spectra of the coding and noncoding regions of chromosome III of C. elegans.

Figure 4.5: Multifractal spectra f(α) versus α of the coding and noncoding sequences from chromosome III of C. elegans, obtained for words seven nucleotides long.

The results obtained could lead one to think that, if some information is contained in the noncoding regions of the genome, then it may be coded through a sort of “language”, meaning that there is a limited number of words (representing the vocabulary of this language) which are used to construct the “sentences” expressing such information.

Figure 4.6: Shannon entropy H(n) of the coding and noncoding sequences from chromosome III of C. elegans, referring to words n nucleotides long (horizontal axis: n, word length).

To check this hypothesis I computed the Shannon entropy, which is proportional to the compressibility degree (and thus to the informative content) of a string, for the coding and noncoding sequences. Given an alphabet of N symbols $s_i$ (i = 1, 2, . . . , N), it is defined as

\[ H = -\sum_{i=1}^{N} p_i \log_2(p_i), \tag{4.4} \]

where $p_i$ is the occurrence probability of the symbol $s_i$. I found that it is nearly the same for both kinds of sequences (see, for example, Fig. 4.6); this would seem to refute the preceding interpretation, suggesting that the high-frequency words present in the noncoding regions do not represent a significant part of the whole sequence.
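Equation (4.4) is straightforward to evaluate once the symbols are identified. The sketch below assumes, in line with Fig. 4.6, that the symbols are the non-overlapping words of length n and that the probabilities $p_i$ are estimated by their observed frequencies; the function name is introduced here for illustration.

```python
import math
from collections import Counter

def shannon_entropy(sequence, n):
    """Shannon entropy (in bits) of the words of length n, Eq. (4.4),
    with probabilities estimated from the observed word frequencies."""
    usable = len(sequence) - len(sequence) % n
    words = [sequence[i:i + n] for i in range(0, usable, n)]
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

h_uniform = shannon_entropy("ACGT", 1)       # four equiprobable letters
h_pairs = shannon_entropy("ACGTACGT", 2)     # two equiprobable words
```

For words of length n the entropy is at most 2n bits, a value reached only when all $4^{n}$ words are equally likely.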

4.4 Recurrence statistics

In order to study the genomic sequences through the statistics of first return

times, it is essential to construct from each of them a dynamical system.

This involves, of course, the definition of a phase space Ω, of a dynamics T ,

describing how a state of Ω is transformed into another, and of a measure

µ.


Figure 4.7: Statistics of return times referring to the coding sequences of some organisms, for words seven nucleotides long (axes: $\log_{10} F(t)$ versus t). Organisms shown: Y. pestis CO92, C. elegans (III), S. cerevisiae (IV) and D. melanogaster (IV). The dotted line represents the function $e^{-t}$.

In this respect, many choices are possible. Regarding the sequences as a succession of words of length n, with n fixed, I decided to take as phase space Ω the collection of all the possible words made of n nucleotides. Then, I chose the dynamics T so that it associates with a word x ∈ Ω the word that immediately follows the first occurrence of x in the sequence; if x does not occur in the sequence, it is simply ignored. However, the sequences used are long enough that the ignored words are generally very few, and therefore this does not constitute a problem. I would like to remark that other definitions of the dynamics could equally be considered. Finally, the measure µ of a subset A ⊆ Ω is given by the ratio between the number of words belonging to A and the number of those in the whole phase space.

Figure 4.8: Statistics of return times referring to the noncoding sequences of some organisms, for words seven nucleotides long (axes: $\log_{10} F(t)$ versus t). Organisms shown: Y. pestis CO92, C. elegans (III), S. cerevisiae (IV) and D. melanogaster (IV). The dotted line represents the function $e^{-t}$.

Now, choosing two words $x_1$ and $x_2$ such that $0 \le x_1 < x_2 < 1$, it is possible to compute the recurrence statistics $F_A(t)$, with

\[ A = \{ x \in \Omega : x_1 \le x \le x_2 \}. \tag{4.5} \]

The main result I obtained is that, regardless of the word length considered for the computation of the statistics of first return times, the recurrence statistics is essentially the same for both kinds of sequences. Moreover, $F_A(t)$ seems to follow, with very good accuracy, the exponential law

\[ F_A(t) = \exp\left\{ \frac{\ln[1 - \mu(A)]}{\mu(A)}\, t \right\}. \tag{4.6} \]

Some of the statistics of first return times computed for words seven nu-

cleotides long in the coding and the noncoding case are reported in Figs. 4.7

and 4.8, respectively.
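A rough sketch of how such a comparison could be set up is given below. It is not the procedure used for the thesis: here return times are taken as the gaps, counted in words, between successive visits of the word sequence to A, the empirical statistics is the fraction of return times τ with µ(A)τ > t, and µ(A) is passed in as a parameter. All of these are assumptions, as are the function names and the toy data.

```python
import math

def return_times(points, x1, x2):
    """Gaps (in number of words) between successive visits to A = [x1, x2]."""
    visits = [i for i, x in enumerate(points) if x1 <= x <= x2]
    return [later - earlier for earlier, later in zip(visits, visits[1:])]

def empirical_statistics(times, mu_a, t):
    """Fraction of return times tau whose rescaled value mu(A) * tau exceeds t."""
    return sum(1 for tau in times if mu_a * tau > t) / len(times)

def exponential_law(t, mu_a):
    """Eq. (4.6): exp{ ln[1 - mu(A)] * t / mu(A) }."""
    return math.exp(math.log(1.0 - mu_a) * t / mu_a)

# Toy sequence of word coordinates (not genomic data).
points = [0.1, 0.5, 0.1, 0.9, 0.9, 0.1]
times = return_times(points, 0.0, 0.2)
```

Since $\ln(1-\mu)/\mu \to -1$ as µ(A) → 0, Eq. (4.6) approaches $e^{-t}$ for small sets, which is the dotted line drawn in Figs. 4.7 and 4.8.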

What appears surprising is that a recurrence statistics as given by Eq. (4.6) corresponds to a dynamical system whose dynamics loses “memory” of the past after only one iteration, and therefore this result suggests that the words present in both kinds of sequences are completely uncorrelated.

Thus, it seems sensible to conclude that, from the point of view of Poincaré recurrences, the coding and noncoding regions behave as if they were strongly mixing dynamical systems, which are characterized by a very rapid decay of correlations, and that the weight of the long-range correlations is small, at least compared to that of the short-range correlations.

Bibliography

[1] H. Hu, A. Rampioni, L. Rossi, G. Turchetti and S. Vaienti, Statistics of

Poincare recurrences for maps with integrable and ergodic components,

Chaos 14, 160 (2004).

[2] L. Rossi and G. Turchetti, Poincare recurrences and multifractal prop-

erties of genomic sequences, Physica A 338, 267 (2004).

[3] L. Rossi, G. Turchetti and S. Vaienti, Poincare recurrences as a tool

to investigate the statistical properties of dynamical systems with inte-

grable and mixing components, J. Phys. Conf. Ser. 7, 94 (2005).

[4] N. Haydn, E. Lunedei, L. Rossi, G. Turchetti and S. Vaienti, Multiple

returns for some regular and mixing maps, Chaos 15, 033109 (2005).

[5] H. Poincare, Sur le probleme des trois corps et les equations de la

dynamique, Acta Math. 13, 1 (1890).

[6] J. D. Meiss, Symplectic maps, variational principles, and transport,

Rev. Mod. Phys. 64, 795 (1992).

[7] J. D. Meiss, Average exit times in volume preserving maps, Chaos 7,

139 (1997).

[8] M. Hirata, B. Saussol and S. Vaienti, Statistics of return times: a gen-

eral framework and new applications, Comm. Math. Phys. 206, 33

(1999).

[9] B. V. Chirikov and D. I. Shepelyansky, Asymptotic statistics of Poincare

recurrences in Hamiltonian systems with divided phase space, Phys.

Rev. Lett. 82, 528 (1999).


[10] G. M. Zaslavsky and M. K. Tippet, Connection between recurrence-

time statistics and anomalous transport, Phys. Rev. Lett. 67, 3251

(1991).

[11] G. M. Zaslavsky and M. Edelman, Weak mixing and anomalous kinetics

along filamented surfaces, Chaos 11, 295 (2001).

[12] S. Ornstein and B. Weiss, Entropy and data compression, IEEE Trans.

Inf. Theory 39, 78 (1993).

[13] V. Afraimovich, Pesin’s dimension for Poincare recurrences, Chaos 7,

12 (1997).

[14] B. Pitskel, Poisson law for Markov chains, Ergod. Theory Dyn. Syst. 11,

501 (1991).

[15] M. Hirata, Poisson law for Axiom-A diffeomorphisms, Ergod. Theory

Dyn. Syst. 13, 533 (1993).

[16] A. Galves and B. Schmitt, Inequalities for hitting time in mixing dy-

namical systems, Random Comput. Dyn. 5, 337 (1997).

[17] N. Haydn, Statistical properties of equilibrium states for rational maps,

Ergod. Theory Dyn. Syst. 20, 1371 (2000).

[18] M. Abadi, Exponential approximation for hitting times in mixing pro-

cesses, Math. Phys. Electron. J. 7, 2 (2001).

[19] M. Abadi and A. Galves, Inequalities for the occurrence times of rare

events in mixing processes. The state of the art, Markov Proc. Relat.

Fields 7, 97 (2001).

[20] H. Bruin and S. Vaienti, Return time statistics for unimodal maps,

Forum Math. 176, 77 (2003).

[21] N. Haydn and S. Vaienti, The limiting distribution and error terms for

return time of dynamical systems, Discrete Contin. Dyn. Syst. 3, 589

(2004).

[22] N. B. Slater, Gaps and steps for the sequence nθ mod 1, Proc. Cambridge Philos. Soc. 63, 1115 (1967).


[23] P. Alessandri and V. Berthe, Three distance theorems and combina-

torics of words, Enseignement Mathematique 44, 103 (1998).

[24] Z. Coelho and E. De Faria, Limit laws of entrance times for homeomor-

phisms of the circle, Isr. J. Math. 93, 93 (1996).

[25] N. Buric, A. Rampioni and G. Turchetti, Statistics of Poincare recur-

rences for a class of smooth circle maps, Chaos Solitons Fractals 23,

1829 (2005).

[26] Y. Lacroix, Possible limit laws for entrance times of an ergodic aperiodic

dynamical systems, Isr. J. Math. 132, 253 (2002).

[27] E. Floriani and R. Lima, Statistics of return times in a self-similar

model, Chaos 9, 715 (1999).

[28] V. Afraimovich and G. M. Zaslavsky, Fractal and multifractal prop-

erties of exit times and Poincare recurrences, Phys. Rev. E 55, 5418

(1997).

[29] V. Afraimovich and G. M. Zaslavsky, Sticky orbits of chaotic Hamilto-

nian dynamics, Lect. Notes Phys. 511, 59 (1998).

[30] V. Afraimovich, A. Maas and J. Urias, Symbolic dynamics for sticky

sets in Hamiltonian systems, Nonlinearity 13, 617 (2000).

[31] C. F. F. Karney, Long-time correlations in the stochastic regime, Phys-

ica D 8, 360 (1983).

[32] S. Tabachnikov, Billiards, Panoramas et Synthese (Society of Math.

Paris, France, 1995), Vol. 1.

[33] N. Haydn, Y. Lacroix and S. Vaienti, Hitting and return times in ergodic

dynamical systems, Ann. Prob. 5, 2043 (2005).

[34] E. Lunedei, Ricorrenze di Poincare in sistemi dinamici a bassa dimen-

sionalita, Ph.D. thesis, University of Bologna, Bologna, Italy, 2006.

[35] M. Henon, A two-dimensional mapping with a strange attractor, Com-

mun. Math. Phys. 50, 69 (1976).


[36] C. Polymilis, G. Servizi, Ch. Skokos, G. Turchetti and M. N. Vrahatis,

Local analysis of area preserving maps by topological degree theory,

Chaos 13, 94 (2003).

[37] R. N. Mantegna, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K.

Peng, M. Simons and H. E. Stanley, Systematic analysis of coding and

noncoding DNA sequences using methods of statistical linguistics, Phys.

Rev. E 52, 2939 (1995).

[38] A. Provata and Y. Almirantis, Scaling properties of coding and non-

coding DNA sequences, Physica A 247, 482 (1997).

[39] L. Luo, W. Lee, L. Jia, F. Ji and L. Tsai, Statistical correlation of

nucleotides in a DNA sequence, Phys. Rev. E 58, 861 (1998).

[40] H. E. Stanley, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K. Peng

and M. Simons, Scaling features of noncoding DNA, Physica A 273, 1

(1999).

[41] A. Som, S. Chattopadhyay, J. Chakrabarti and D. Bandyopadhyay,

Codon distribution in DNA, Phys. Rev. E 63, 051908 (2001).
