
The Chemical Master Equation: From Reactions to

Complex Networks

Massimo Stella

April 21, 2015

Abstract

This project investigates the chemical master equation and its links to complex networks. The

report is composed of two parts: an introduction, deriving the chemical master equation from

some basic results of statistical mechanics and probability theory, and a second part, relating the

formalism of master equations to growing network models and random walks on graphs. At the

end of the first part, further analytical and numerical results about Markov processes are reported

and discussed.

1 The Physics behind the Chemical Master Equation

The mathematical modelling of chemically reacting gaseous systems, via the framework of

Markovian stochastic processes, relies on some delicate hypotheses from statistical mechanics

[3]. In this section, we review these basic results, with the aim of outlining a physically coherent

approach to the mathematics of the chemical master equation for chemical kinetics.

1.1 Some Physical Premises

Historically, the modelling of chemical reactions as stochastic processes was introduced in [2] and became increasingly popular in the 1950s and 1960s. However, it was only in the nineties, with the work of Gillespie [1], that a rigorous microphysical derivation of such an approach was provided, demonstrating its a priori modelling validity. Before that date, in fact, such a fidelity check could be performed only a posteriori, through comparisons with real or molecular dynamics experiments [2, 13].


Following the physical approach of [1], we use a frequentist probability interpretation, i.e. probability is the fraction of trials in which an event E occurs. Such an approach is viable in the context of chemical kinetics, where there are very high numbers of molecules engaging in the very same reactions [3, 5]. In addition, it allows us to derive results that should otherwise be postulated (by using the Kolmogorov and De Finetti axioms [4]), such as the following:

1. Addition Law: If events A and B are mutually exclusive (i.e. they never occur at the same time), then the total probability of "either A or B" is given by P(A ∪ B) = P(A) + P(B);

2. Multiplication Law: The joint probability of two events A and B happening at the same time is P(A ∩ B) = P(A, B) = P(A) · P(B|A), where P(B|A) is the conditional probability of B happening, given the occurrence of A.

In our case, events are going to be chemical reactions at the molecular level [1]. Therefore, let us consider a gas containing molecules of N ∈ ℕ different species, S1, S2, ..., SN, interacting through M chemical reaction channels R1, ..., RM and all contained in a container of constant volume V. Let Xi(t) be a variable related to the number of molecules of type Si in the system at time t ≥ 0, with i ∈ I := {1, 2, ..., N}. We focus principally on the bimolecular elementary reaction channels of the form Si + Sj → Sk + ..., with i, j, k ∈ I.

We restrict our analysis to close-to-ideal gases in thermodynamic equilibrium. In other words, we consider the molecules as distinguishable, non-punctiform¹ hard spheres, of given mass and radius, interacting mainly by collisions, with other types of long-range interactions being negligible in both frequency and intensity. Furthermore, thermodynamic equilibrium implies the existence of a well-defined temperature parameter T for the whole system. Also, it means that Boltzmann's molecular chaos hypothesis (i.e. the Stosszahlansatz) is valid: the particle velocities are both uncorrelated and independent of position, mainly because of thermal fluctuations [5]. These physical premises lead to two mathematical propositions [1, 3]:

• Spatial homogeneity: the probability of finding any randomly selected molecule inside any subregion δV of the volume V equals δV/V; in mathematical terms, the molecule positions are independent² random variables, uniformly distributed over the domain V.

• Maxwell-Boltzmann velocity distribution: denoting by kB Boltzmann's constant [5], the probability of finding a molecule of mass m with velocity between v and v + dv is³:

p_{MB}(v) \, dv = \left( \frac{m}{2\pi k_B T} \right)^{3/2} \exp\left( -\frac{m |v|^2}{2 k_B T} \right) dv .  (1)

In mathematical terms, the above equation means that each Cartesian velocity component of a randomly selected molecule is a normally distributed random variable, with zero mean and variance kB T/m (indeed, since |v|² = vx² + vy² + vz², the density in (1) factorises into the product of three such Gaussian densities). Additionally, all such components are independent variables.

¹ Ideal gases require particles to be treated as punctiform mass points. Furthermore, the distinguishability of particles refers to the possibility of identifying each particle in time, according to its Newtonian trajectory, given an initial "labelling". This concept loses any validity in quantum mechanics, where there is no quantum counterpart of the idea of trajectory [5].

² Two random variables X and Y are independent (or pairwise independent) iff their joint probability distribution factorises, in formulas P(X ∩ Y) = P(X, Y) = P(X)P(Y) [4].

These two points are often referred to as the system being "well-stirred", meaning that molecules are well mixed throughout the whole spatial domain and in thermal equilibrium. It has to be underlined that the above findings emerge from a deterministic chaotic (mixing) behaviour of molecules at the microscopic level, in a scenario close to ideality and in thermal equilibrium. It is ultimately this physical concept of "molecular chaos" that provides the "unreasonable efficacy" of a mathematical stochastic treatment of such systems [5, 3, 14].

1.2 Towards the Chemical Master Equation

We want to determine the evolution law for the species population vector⁴ X(t) = (X1(t), ..., XN(t)), compatibly with the two above propositions about molecule positions and velocities, and focusing on bimolecular reactions. In order to perform such a task, we have to determine the probability πµ(t, dt) that two molecules, randomly selected at time t, react in the next dt time interval, according to the bimolecular channel µ. However, according to the above physical discussion, in order for a bimolecular reaction to occur, two (spherical) molecules i and j have to collide with each other first. Additionally, their collision must be efficient [1].

Denoting by uµ(t, dt) the probability of a collision (defined analogously to πµ(t, dt), but for a collision event) and by Pµ the probability of a chemical reaction being triggered, then:

\pi_\mu(t, dt) = u_\mu(t, dt) \cdot P_\mu .  (2)

In other words, the probability πµ(t, dt) that an efficient collision (i.e. a reaction) happens in the time interval [t, t + dt) is equal to the product of the collision probability uµ(t, dt) with the conditional probability Pµ = P(trigger a reaction | collision).

³ In statistical mechanics, given a Cartesian vector v = (vx, vy, vz), the differential element dv, sometimes also denoted d³v, is equal to dvx dvy dvz.

⁴ Because of the intrinsic stochasticity of our chemical system, we have to consider X(t) as an N-dimensional random variable, having outcomes o defined on a subset of ℕ^N. Rather than considering the time evolution of X(t), we are more interested in determining the probability P(X(t) = o), evolving over time.


In order to compute uµ we can resort to the following:

Theorem 1. [1] Let {Ci}, i ∈ ℕ, be a set of mutually exclusive and collectively exhaustive events, partitioning the sample space. Let A be any event in that sample space. Then:

P(A) = \sum_i P(C_i) \cdot P(A \mid C_i)  (3)

Proof. The Ci's represent a partition of the whole sample space, so that A can be decomposed over the set {Ci} in terms of mutually exclusive subsets, i.e. A = ∪i (A ∩ Ci). This means that P(A) = P(∪i (A ∩ Ci)) = Σi P(A ∩ Ci), from the addition law. Also, Σi P(A ∩ Ci) = Σi P(A, Ci) = Σi P(Ci) · P(A|Ci), with the last passage being due to the multiplication law.

The above theorem (the law of total probability) is valid also in the continuous case (i.e. when i is a real index, defined on a set K), with the sum replaced by an integral, with a proper measure.

We consider C_{v'}, v' ∈ ℝ³, to be the event that two randomly selected molecules (in the channel Rµ) at time t have a relative velocity v' = vj − vi. Given the symmetries of the Maxwell-Boltzmann velocity distribution (explicitly depending only on the modulus of velocity), a simple change of reference frame and the random variable transformation theorem for statistically independent random variables [3, 4, 5] lead to

P(C_{v'}) = \left( \frac{m^*}{2\pi k_B T} \right)^{3/2} \exp\left( -\frac{m^* |v'|^2}{2 k_B T} \right) ,  (4)

where m* = mi mj/(mi + mj) is the reduced mass of the two reactant molecules (in the channel Rµ). In the reference frame of the j-th molecule, the i-th molecule moves on the straight path connecting i and j at speed |v'|, covering a length |v'| dt in the time interval [t, t + dt). Additionally, the two molecules collide when their relative distance is less than or equal to ri + rj. These two quantities allow us to approximate the volume Vint, inside which the two molecules collide, as a cylinder of radius ri + rj and height |v'| dt [1]. Since the collision probability is ultimately related to the molecule positions in the volume V, because of the spatial homogeneity premise, then

P(A_c \mid C_{v'}) = \frac{V_{int}}{V} = \frac{\pi (r_i + r_j)^2 |v'| \, dt}{V} ,  (5)

where Ac is a collision event, with probability equal to uµ; in other words, P(Ac) = uµ(t, dt).


Similarly to Theorem 1, we now use all the collision-relevant probabilities to obtain:

u_\mu(t, dt) = \int P(C_{v'}) \, P(A_c \mid C_{v'}) \, dv' = \frac{1}{V} \left( \frac{8 k_B T \pi}{m^*} \right)^{1/2} (r_i + r_j)^2 \, dt .  (6)

Interestingly, the resulting uµ(t, dt) can be factorised as uµ(t, dt) = aµ dt, where aµ is independent of time. It has to be underlined that in computing⁵ the above integrals, we are implicitly assuming that the molecule velocities do not change over the infinitesimal amount of time dt, which is actually a reasonable assumption for a gas close to ideality (with collisions as the only non-negligible intermolecular interactions).
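As a rough numerical illustration of (6), the following Python sketch evaluates aµ for a pair of identical, nitrogen-like hard spheres; all parameter values (mass, radius, temperature, volume) are our own order-of-magnitude assumptions, not taken from [1]:

import math

# Assumed, order-of-magnitude parameters (ours, for illustration only):
# two identical nitrogen-like molecules in a 1-litre vessel at room temperature.
k_B = 1.380649e-23   # Boltzmann's constant [J/K]
T = 300.0            # temperature [K]
m = 4.65e-26         # molecular mass [kg], roughly that of N2
r = 1.9e-10          # effective hard-sphere radius [m]
V = 1e-3             # vessel volume [m^3]

m_star = m * m / (m + m)   # reduced mass of the colliding pair

# Specific collision probability rate a_mu, from Eq. (6):
# u_mu(t, dt) = a_mu dt, with a_mu = (1/V) (8 k_B T pi / m*)^(1/2) (r_i + r_j)^2.
a_mu = (1.0 / V) * math.sqrt(8.0 * k_B * T * math.pi / m_star) * (2.0 * r) ** 2

# Multiplying a_mu by the number of distinct reactant pairs h_mu gives the
# total collision rate of the channel.
print(f"a_mu = {a_mu:.3e} collisions per second per molecule pair")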

Nevertheless, in order to compute the reaction probability πµ(t, dt) from (2), it is necessary to also compute the conditional probability Pµ of triggering a reaction in the channel Rµ, given a collision between two molecules of that channel. Without resorting to quantum mechanics, our classical framework allows for the description of two "triggering" mechanisms:

1. Directionality: In order for the collision to be effective and trigger the reaction, it has to bring specific molecular regions close enough to each other. Given the spherical assumption, if those regions subtend solid angles ωi and ωj for molecules i and j, respectively, then the collision-conditioned reaction probability can be approximated as Pµ = ωi ωj/(4π)²;

2. Impact energy: Every collision is characterised by the total kinetic energy ε of the colliding molecules. If ε is less than a certain threshold εµ (relative to the channel Rµ), then new chemical bonds cannot form and the reaction does not happen. In this case, with some modifications to the probability apparatus, it is possible to show [3, 1] that the trigger probability follows the so-called Arrhenius law Pµ = exp(−εµ/kB T).

In both cases, Pµ is also independent of time; therefore πµ(t, dt) = uµ(t, dt) Pµ = aµ Pµ dt = cµ dt, where the probability rate⁶ cµ is independent of time, i.e. stationary [14, 6].

1.3 Derivation of the Chemical Master Equation

Interestingly, in a well-stirred gas, close to ideality and at thermal equilibrium, the bimolecular channel has a reaction probability quantifiable in a rather simple closed form, i.e. cµ dt, with stationary probability rate cµ [1]. Let us introduce the vectors n = (n1, n2, ..., nN) ∈ ℕ^N and nµ = (nµ1, nµ2, ..., nµN) ∈ ℤ^N to denote the population number of each species and the change in each of the populations after an Rµ reaction, respectively. Then, n and n + nµ provide the molecular populations of each species S1, S2, ..., SN before and after the occurrence of one Rµ chemical reaction. In addition, each Rµ channel involves a different number hµ of distinct combinations of reactant molecules, according to the stoichiometric coefficients in the relative chemical equations. For instance, the channel Rα: S1 + S2 → S3 encompasses hα = n1 n2 different combinations of reactant molecules from species S1 and S2. The reactant combination function hµ is evidently a scalar function of n. Together with the jump vector nµ and with the probability rate constant cµ, hµ(n) specifies the dynamics of the channel Rµ, with µ ∈ {1, ..., M}. In fact, we can now determine the evolution of the species population vector X(t) = (X1(t), ..., XN(t)) over time.

⁵ Even if the velocity components should be bounded by the speed of light, extending the Gaussian integrals appearing in uµ to the whole real line leads to exponentially small errors, which can be neglected. Furthermore, this approximation trivially allows for an analytical solution of the integrals, by differentiation under the integral sign [5].

⁶ A similar approximation πµ ∼ αµ dt, with αµ independent of time, can be performed also for monomolecular and trimolecular reactions, but only in specific instances [3].
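As further standard examples of the reactant combination function [1, 3]: a monomolecular channel Rµ: S1 → ... has hµ(n) = n1, since each single molecule of S1 can react, while a bimolecular channel with two identical reactants, Rµ: S1 + S1 → ..., has hµ(n) = n1(n1 − 1)/2, i.e. the number of distinct unordered pairs of S1 molecules.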

Theorem 2. If X(t) = n, then the probability p1 that exactly one Rµ reaction occurs in the time interval [t, t + dt) is given by hµ(n)cµ dt + O(dt²).

Proof. Since the system molecules are distinguishable (according to the Maxwell-Boltzmann distribution), it is possible to uniquely label each one of them at time t. This allows us to actually "select two random molecules at time t". Each of the hµ(n) distinct combinations of Rµ reactant molecules in the system has a nonzero probability, equal to cµ dt, of reacting according to Rµ in the time interval [t, t + dt). The complementary event of the Rµ reaction not happening has probability 1 − cµ dt, in the same time interval. This sets up a Bernoulli process-like instance, in which the multiplication law implies that the probability that a particular one of the hµ(n) reactant combinations participates in an Rµ reaction, while the other hµ(n) − 1 combinations do not, is cµ dt (1 − cµ dt)^{hµ(n)−1} = cµ dt + O(dt²).

Since the events in which any single one of the hµ(n) combinations reacts alone, in the very same infinitesimal time interval, are mutually exclusive, the addition law gives:

p_1 = h_\mu(n) \left[ c_\mu dt + O(dt^2) \right] = h_\mu(n) c_\mu dt + O(dt^2) .  (7)

From the multiplication law, a corollary of the above theorem is that the probability for k ≥ 2 reactions to occur in [t, t + dt) is actually of order O(dt²). The case of no reactions happening is quantified by the following:

Theorem 3. If X(t) = n, then the probability p0 that no reaction occurs in the time interval [t, t + dt) is given by 1 − Σµ hµ(n)cµ dt + O(dt²).


Proof. [1] Let us underline that we have to consider only terms of order dt. As stated in the previous proof, each of the hµ(n) combinations of Rµ reactant molecules has a probability 1 − cµ dt of not reacting in [t, t + dt). Because of the multiplication law, then, the probability of no reaction occurring in channel Rµ is simply (1 − cµ dt)^{hµ(n)} = 1 − hµ(n)cµ dt + O(dt²), where we used a Taylor expansion. The joint probability that no reaction occurs in any of the M available channels is, once again, provided by the multiplication law, as

p_0 = \prod_{\mu=1}^{M} \left[ 1 - h_\mu(n) c_\mu dt + O(dt^2) \right] = 1 - \sum_{\mu=1}^{M} h_\mu(n) c_\mu dt + O(dt^2) .  (8)

The above two theorems constitute a first-order (in time) machinery entirely built on physical premises, which provides a deterministic analytical description of the probabilities regulating X(t), rather than of X(t) directly. Let us fix the initial population vector n0 = n(t0) at the initial time t0 and let us introduce the transition probability P(n, t|n0, t0) as the probability that X(t) = n, given that X(t0) = n0, for t ≥ t0. In order to obtain a continuous evolution dynamics, we have to relate the transition probabilities before and after an infinitesimal amount of time, encompassing also the initial conditions [3]. In formulas, we have to relate P(n, t + dt|n0, t0) to what might happen in [t, t + dt), namely to the occurrence of one of the following mutually exclusive events: "no reaction", "one reaction", "more than one reaction".

Since P(n, t + dt|n0, t0) implies that X(t0) = n0 and X(t + dt) = n, in case no reaction occurs in [t, t + dt), the species populations are unaltered, in formulas X(t) = n. However, we defined the probability of transitioning from X(t0) = n0 to X(t) = n as P(n, t|n0, t0); therefore the probability to further transition to the state X(t + dt) = n is given by the following product:

P(n, t \mid n_0, t_0) \cdot \left( 1 - \sum_{\mu=1}^{M} h_\mu(n) c_\mu dt + O(dt^2) \right) ,  (9)

where the second factor is the probability that no reaction occurs in [t, t + dt), from Theorem 3.

Since the system has to transition to a state with X(t + dt) = n, in case exactly one reaction from channel Rµ occurs in [t, t + dt), the species population must start from the state X(t) = n − nµ. Denoting by P(n − nµ, t|n0, t0) the probability of transitioning from X(t0) = n0 to X(t) = n − nµ, the probability to further transition to X(t + dt) = n is:

P(n - n_\mu, t \mid n_0, t_0) \left( h_\mu(n - n_\mu) c_\mu dt + O(dt^2) \right) ,  (10)

where the second factor is the probability that one Rµ reaction occurs in [t, t + dt), from Theorem 2, evaluated in the pre-reaction state n − nµ (so that hµ is computed with the populations actually available before the jump). Straightforwardly from the same theorem, any probability contribution coming from the "more than one reaction occurs" case is of order O(dt²) [1, 3].

Because of the mutual exclusivity of the above three events, we can finally sum all the contributions to P(n, t + dt|n0, t0):

P(n, t + dt \mid n_0, t_0) = P(n, t \mid n_0, t_0) \left( 1 - \sum_{\mu=1}^{M} h_\mu(n) c_\mu dt \right) + \sum_{\mu=1}^{M} P(n - n_\mu, t \mid n_0, t_0) \, h_\mu(n - n_\mu) c_\mu dt + O(dt^2) .  (11)

Subtracting P(n, t|n0, t0) on both sides, dividing by dt and taking the limit dt → 0 retrieves the so-called chemical master equation [1, 3, 13, 2]:

\frac{\partial}{\partial t} P(n, t \mid n_0, t_0) = \sum_{\mu=1}^{M} h_\mu(n - n_\mu) c_\mu \, P(n - n_\mu, t \mid n_0, t_0) - \left( \sum_{\mu=1}^{M} h_\mu(n) c_\mu \right) P(n, t \mid n_0, t_0) ,  (12)

with initial condition P(n, t = t0|n0, t0) = 1 if n = n0 and P(n, t = t0|n0, t0) = 0 otherwise. Interestingly, this differential equation can be interpreted as a balance equation for the probability of each discrete state X(t) = n [6]. In fact, the probability evolution over time has to take into account the "gain" due to transitions from other states X(t) = n − nµ, while the second term represents the "loss" due to transitions into other states, both terms physically originating from chemical reactions. Interestingly, the physical eventuality of no chemical reaction occurring over a given time period shapes the typical path of X(t), consisting of piecewise segments (which are constant in the discrete-state case) interspersed with discontinuous "jumps" (which can be present also in the continuous-state case). Because of this, the more general class of Markovian processes described by a gain-loss master equation is also referred to as "jump processes" [6].

As an example, we simulated a rather simple chemical reaction network, with two species A and B and two channels; in detail, a bimolecular degradation reaction, A + B → B, coupled with a synthesis reaction, ∅ → A (with rate constants k1 ≡ kdeg and k2 ≡ ksyn, respectively). Assuming the chemical system was well-stirred, at thermal equilibrium and close to ideality, we quantified the propensity (i.e. probability per unit time) of the degradation and of the synthesis as αdeg = hdeg cdeg = A(t)B(t)kdeg/V and αsyn = hsyn csyn = ksyn V, respectively, where X(t) = (A(t), B(t)) is the species population vector at time t and V is the system volume. In predicting the stochastic dynamics of this chemical network, we did not explicitly use its associated chemical master equation, but its equivalent algorithmic formulation instead, i.e. Gillespie's Stochastic Simulation Algorithm (SSA) [1, 12]. We implemented the SSA in Mathematica 9, with parameter values V = 1 m³, kdeg/V = 0.05 s⁻¹, ksyn V = 2 s⁻¹, t0 = 0, A(0) = 5 and B(0) = B(t) = 1 for t ≥ t0. For our chemical network, analytic results [13] predict a Poisson stationary distribution Π(n) for the probability of having A(t) = n at time t ≫ t0; in formulas

\Pi(n) = \frac{1}{n!} M_A^n \exp(-M_A) , \qquad M_A = \frac{k_{syn} V^2}{k_{deg} B(0)} ,  (13)

with MA = 40 being the average number of molecules of A at time t ≫ t0 in our case. Even if our simulations are not numerically intensive, they corroborate the analytical convergence of Gillespie's SSA to the exact results derived from the chemical master equation [12, 3, 13].
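Our implementation was written in Mathematica 9; as a minimal, self-contained illustration of the same algorithm, a Python sketch follows (variable names, the time horizon and the random seed are our own illustrative choices):

import math
import random

random.seed(1)  # illustrative seed

# Toy network of Section 1.3: A + B -> B (degradation) and 0 -> A (synthesis),
# with the parameter values used in the text.
kdeg_over_V = 0.05   # k_deg / V  [s^-1]
ksyn_V = 2.0         # k_syn * V  [s^-1]
A, B = 5, 1          # initial populations: A(0) = 5, B(t) = 1
t, t_end = 0.0, 2000.0

samples = []
while t < t_end:
    a_deg = kdeg_over_V * A * B   # propensity of A + B -> B
    a_syn = ksyn_V                # propensity of 0 -> A
    a_tot = a_deg + a_syn
    # Time to the next reaction is exponentially distributed with rate a_tot.
    t += -math.log(1.0 - random.random()) / a_tot
    # Choose which channel fires, with probability proportional to its propensity.
    if random.random() * a_tot < a_deg:
        A -= 1
    else:
        A += 1
    samples.append(A)

# Discard a burn-in transient and compare with the Poisson mean of Eq. (13).
stationary = samples[len(samples) // 5:]
print("empirical mean of A:", sum(stationary) / len(stationary))
print("predicted M_A      :", ksyn_V / (kdeg_over_V * B))  # = 40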

Figure 1: SSA simulation for our chemical network toy model. Top: an ensemble of 5 discrete "jump" trajectories of the number of A species molecules, A(t), over time t. The trajectories highlight the convergence of the fluctuations around the value MA = 40, starting from the initial condition A(0) = 5. Bottom: normalised stationary distribution for (# of trajectories, # of transitions, MA) equal to (5, 300, 39.78) (left), (20, 1000, 40.28) (center) and (30, 1500, 40.14) (right). Even if our simulations are not numerically intensive, they suggest the (analytically proven) convergence of Gillespie's SSA to the chemical master equation stationary distribution [13].

1.4 Beyond the Chemical Master Equation

From a purely mathematical point of view, master equations are a particular case of the Chapman-Kolmogorov equation for Markov processes [14, 6, 4]. In the case of continuous Markov chains, the Chapman-Kolmogorov equation relates the transition probability from state y at time t0 to state x at time t by integrating over all possible intermediate transitions y → z → x at any time t0 < t1 < t, analogously to what we did in deriving the chemical master equation. In formulas, the Chapman-Kolmogorov equation can be stated as [14]:

P(x, t \mid y, t_0) = \int dz \, P(x, t \mid z, t_1) \, P(z, t_1 \mid y, t_0)  (14)

The chemical master equation represents a first-order approximation in time to the evolution of the transition probabilities for the chemical species populations. Such a finding [3] is analogous to the property of a class of continuous Markov processes in which a time interval dt corresponds to an O(dt) displacement x − y, with the following features

a_i(x, dt) = \int dy \, (y_i - x_i) \, P(y, t_0 + dt \mid x, t_0) = O(dt) , \qquad b_{ij}(x, dt) = \int dy \, (y_i - x_i)(y_j - x_j) \, P(y, t_0 + dt \mid x, t_0) = O(dt) ,  (15)

and also with negligible higher-order terms. For such Markov processes, a Kramers-Moyal expansion [3, 6] of the Chapman-Kolmogorov equation (namely a Taylor expansion in x − y, with t1 = t0 + dt) leads to the celebrated Fokker-Planck equation, widely recurring in Brownian motion and in many other diffusion-related processes [6, 14, 9]:

\frac{\partial P_T}{\partial t} = - \sum_{i=1}^{N} \frac{\partial}{\partial x_i} \left[ f_i P_T \right] + \frac{1}{2} \sum_{i,j=1}^{N} \frac{\partial^2}{\partial x_j \partial x_i} \left[ Q_{ij} P_T \right] ,  (16)

where PT = P(x, t|y, t0), while fi = lim_{dt→0} ai/dt and Qij = lim_{dt→0} bij/dt. In case the Qij are independent of x, it can be shown that the Fokker-Planck equation, with t0 = 0, rules the evolution of the probability density ρ(x, t) associated with the stochastic process [14, 6]

x_i(t + dt) = x_i(t) + f_i(x(t)) \, dt + \sqrt{dt} \, \eta_i(t) ,  (17)

where the ηi(t) are zero-mean Gaussian random variables with ⟨ηi(t + n dt) ηj(t + m dt)⟩ = Qji δnm.

Such a relationship is fundamental for many simulation techniques of stochastic processes [6, 3]. In the limit dt → 0, the above stochastic equation leads to the so-called Langevin equation [6], which is a stochastic differential equation

\frac{dx_i}{dt} = f_i(x) + \eta_i(t) ,  (18)

with many applications in synchronisation theory [14] and which comprises multivariate Gaussian white noise (⟨ηi(t)⟩ = 0, ⟨ηi(t) ηj(t')⟩ = Qji δ(t − t'), with Q = {Qij} positive definite).
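As an illustration of how (16)-(18) translate into a simulation scheme, the following Python sketch integrates the one-dimensional update rule (17) (the Euler-Maruyama scheme) for an Ornstein-Uhlenbeck-type drift f(x) = −x; the drift, the noise strength Q and all numerical values are our own illustrative choices:

import math
import random

random.seed(2)  # illustrative seed

# Euler-Maruyama integration of Eq. (17) in one dimension, with the
# illustrative drift f(x) = -x and <eta(t) eta(t')> = Q delta(t - t').
Q = 0.5
dt = 1e-3
steps = 200_000

x = 1.0
second_moment = 0.0
for _ in range(steps):
    noise = math.sqrt(Q * dt) * random.gauss(0.0, 1.0)  # sqrt(dt)-scaled noise
    x += -x * dt + noise
    second_moment += x * x
second_moment /= steps

# For this linear drift, the stationary solution of the Fokker-Planck
# equation (16) is Gaussian with variance Q/2, so <x^2> should approach 0.25.
print("empirical <x^2>:", second_moment)
print("predicted Q/2  :", Q / 2)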


2 From Theoretical Chemistry to Complex Networks

In the last few decades, the challenge of tackling complexity in real-world systems has required the development of a multidisciplinary field, an "umbrella" encompassing and combining techniques from different disciplines, spanning from mathematics to physics, from the social sciences to economics [9]. It is in this broader context of complexity science that network theory developed, mainly drawing tools from graph theory, statistical mechanics and probability. Accordingly, it is no surprise that master equation approaches are widely used on networks [11, 10, 7, 9].

2.1 Growing Exponential Networks

Rigorously, a network is a physical representation of reality having the topological properties of a finite graph G = (V, E), which is formally a finite set V of N ∈ ℕ vertices (or nodes) connected by a set E of edges. For instance, the Internet can be represented as a network of routers connected by wires, according to a given topology [9, 8]. In the following, however, we use "network" as a synonym of graph. The network connectivity is contained in the adjacency matrix A = {Aij}, i, j = 1, ..., N. For an undirected, simple, loopless network, Aji = Aij = 1 if nodes i and j are connected, while Aji = Aij = 0 otherwise. Let us define the degree ki of a node i as the number of its connected first neighbours, in formulas ki = Σj Aij.

Let us discuss a simple model of a network having the above properties and growing in size over time [9, 11]. The model starts with an initial configuration having one node only, at time step t0 = 1. At each subsequent discrete time step, a new vertex is added to the network and is connected purely at random to one older vertex. Therefore, at time step t the network consists of t nodes and t − 1 links. The degree distribution of such a network can be retrieved by a master equation approach [11]. Let b be the birth (i.e. insertion) time of a node inside the network. Within the network dynamics, each node i transitions from a degree equal to 1 at b = bi to a degree k = ki at time t > b. Let P(k, t|1, b) be the conditional probability of such a transition. Then the following discrete-time master equation holds

P(k, t+1 \mid 1, b) = \frac{1}{t} P(k-1, t \mid 1, b) + \left( 1 - \frac{1}{t} \right) P(k, t \mid 1, b) ,  (19)

with the initial condition P(k, b|1, b) = δ_{k,1}. On the right-hand side, the above equation takes into account only "gain" terms for the (left-hand side) fraction of nodes having degree k at time t + 1. The first term represents the probability of a vertex, originally with degree k − 1, receiving a connection, with uniform probability 1/t, from the new node added at time t. On the other hand, the second term is the probability for a node, already of degree k, to keep its degree fixed by not receiving any new connection (with probability 1 − 1/t). Once again, a frequentist probability interpretation quantifies the probability of finding a node with degree k at time t, i.e. the degree distribution p(k, t), as the fraction of nodes having degree k at time t [9]. In formulas, from the addition law we obtain:

p(k, t) = \frac{1}{t} \sum_{b=1}^{t} P(k, t \mid 1, b)  (20)

Summing both sides of (19) over the different values of b and using the degree distribution definition (the node born at time t + 1, which enters with degree 1, contributes the δ_{k,1} term), we obtain the following:

(t+1) \, p(k, t+1) - t \, p(k, t) = p(k-1, t) - p(k, t) + \delta_{k,1}  (21)

Compatibly with the linear growth of the number of both nodes and links, it is possible to show [11] that the number of nodes of degree k grows linearly in time, i.e. t p(k, t) ∼ t p(k) in the t ≫ 1 regime, so that the degree distribution becomes stationary. This implies that, in the same regime, the above master equation reduces to a recurrence equation for the stationary degree distribution p(k) [11]:

2 p(k) - p(k-1) = \delta_{k,1} \;\Rightarrow\; p(k) = 2^{-k}  (22)

Therefore, this simple model of a growing network displays an exponentially decreasing probability of finding a node of degree k, which is rather unrealistic for many real-world networks [9]. A similar master equation approach can be adopted also in less trivial scenarios. For instance, in a model equivalent to the above one, except for the presence of a preferential attachment procedure [9] (where each node at time step t receives the connection from the newly inserted vertex with a probability proportional to its degree, i.e. ki/2t), it is possible to write down a master equation similar to (19):

P(k, t+1 \mid 1, b) = \frac{k-1}{2t} P(k-1, t \mid 1, b) + \left( 1 - \frac{k}{2t} \right) P(k, t \mid 1, b) ,  (23)

with the same initial conditions and with a power-law degree distribution p(k) ∝ k⁻³. This model with preferential attachment leads to a scale-free degree distribution (i.e. p(k) satisfies the functional equation f(ak) = b f(k) with a, b ∈ ℝ) and it is also known as the Barabasi-Albert model [9, 10, 11]. According to several empirical findings, many real-world networks seem to display the scale-free property, even though such a finding is currently debated [8, 9]. Both attachment rules are easy to check numerically, as in the sketch below.
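A minimal Python sketch of both growth rules follows (network sizes and the random seed are our own choices; for simplicity it starts from a single edge rather than a single node, which does not affect the large-t degree distributions):

import random
from collections import Counter

random.seed(3)  # illustrative seed

def grow(n_nodes, preferential=False):
    """Grow a network one node per step, attaching the new node to one
    existing node: uniformly at random, or proportionally to degree."""
    degrees = [1, 1]   # seed graph: a single edge between nodes 0 and 1
    stubs = [0, 1]     # every edge contributes its two endpoints; sampling
                       # a stub is equivalent to degree-proportional sampling
    for new in range(2, n_nodes):
        target = random.choice(stubs) if preferential else random.randrange(new)
        degrees.append(1)
        degrees[target] += 1
        stubs.extend((target, new))
    return degrees

for pref, label in ((False, "uniform attachment, expect p(k) = 2^-k"),
                    (True, "preferential attachment, expect p(k) ~ k^-3")):
    deg = grow(200_000, preferential=pref)
    hist = Counter(deg)
    print(label)
    for k in (1, 2, 4, 8):
        print(f"  p({k}) = {hist[k] / len(deg):.4f}")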


2.2 Random Walks on Networks

On a given network, it is possible to perform a time-discrete random walk, with a walker transitioning from node i, at time step t, to one of i's neighbours j, at time step t + 1, uniformly at random. Similarly to the chemical reactions scenario, we want to derive a master equation regulating this stochastic process. Given the uniform node hopping, the transition probability P(j, t + 1|i, t) = Pi→j = Pij is evidently stationary and equal to Aij/ki (the factor 1/ki normalising the hopping probability over i's neighbours). Notice that, as for molecular collisions, this random walk is also a Markovian (i.e. memoryless) process [10]. For finite heterogeneous networks⁷ with arbitrary degree distribution p(k), it is possible to derive a master equation for the more interesting transition probability P(i, t|i0, 0) of a random walker starting at node i0 at time t0 = 0 and visiting node i at time step t, as [7]:

\frac{\partial}{\partial t} P(i, t \mid i_0, 0) = \sum_{j=1}^{N} P_{ji} P(j, t \mid i_0, 0) - \left( \sum_{j=1}^{N} P_{ij} \right) P(i, t \mid i_0, 0) .  (24)

In the above master equation, the first term on the right-hand side quantifies the "gain" probability of moving to node i from every other network node in one hop (the presence of Aji in Pji restricting the sum to i's neighbours only), while the negative term constitutes the total "loss" probability of moving out of i, to any of its first neighbours (in both cases the initial condition is encompassed). It can be rigorously proven [10, 7] that, for such a "regular" random walk on finite networks, without sinks or sources, the P(i, t|i0, 0)'s identify an ergodic, irreducible and aperiodic Markov chain [4], which admits a unique stationary probability vector P_sta = (P_sta^(1), ..., P_sta^(N)), whose generic component P_sta^(i) quantifies the probability for a walker to be in node i, as

P_{sta}^{(i)} = \frac{k_i}{\langle k \rangle} \frac{1}{N} ,  (25)

where ⟨k⟩ is the average node degree in the given network. This analytic finding implies that a random walker visits nodes with higher degree "more often". Furthermore, the hopping probabilities Pij can be used also to compute the mean first passage time ⟨Ti⟩ of node i, i.e. the average number of time steps for a walker to leave i and come back to it [10].

⁷ The definition of "heterogeneous" network is rather delicate, since it expresses the presence of different statistical properties between nodes. In [7] the authors referred to "heterogeneous" networks as graphs having nodes with different node degrees, with nodes of the same degree also having the same statistical properties. However, this assumption neglects other higher-order correlations (e.g. assortativity) arising at mesoscopic levels and found in real-world networks [9]. Even if additional techniques for a better quantification of heterogeneity have been proposed [8], we still use "heterogeneous" in the degree-based sense of [7].

For heterogeneous finite networks, it is possible to show [7] that ⟨Ti⟩ is actually equal to 1/P_sta^(i), which is intuitively compatible with the uniformity of the random walk. Additionally, for sufficiently homogeneous scale-free networks, with degree distribution p(k) ∝ k^{−γ}, the probability P(t1) of performing a first passage in t1 time steps follows a power law, i.e. P(t1) ∝ t1^{−(2−γ)}, with γ typically lying between 2 and 3 for most technological and social real-world networks [9].
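The stationary law (25) and the relation ⟨Ti⟩ = 1/P_sta^(i) are easy to check numerically; in the Python sketch below, the small undirected graph, the walk length and the random seed are our own illustrative choices:

import random
from collections import Counter

random.seed(4)  # illustrative seed

# A small, arbitrary, connected and non-bipartite undirected graph,
# given as adjacency lists.
neighbors = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3, 4],
    3: [0, 2],
    4: [2],
}
degree = {i: len(adj) for i, adj in neighbors.items()}
two_edges = sum(degree.values())   # 2|E| = N <k>

# A long uniform random walk, counting visits to each node.
steps = 1_000_000
node, visits = 0, Counter()
for _ in range(steps):
    node = random.choice(neighbors[node])
    visits[node] += 1

# Eq. (25) predicts P_sta(i) = k_i / (N <k>), and <T_i> = 1 / P_sta(i).
for i in sorted(neighbors):
    empirical = visits[i] / steps
    predicted = degree[i] / two_edges
    print(f"node {i}: visited {empirical:.4f} (predicted {predicted:.4f}), "
          f"<T_i> ~ {steps / visits[i]:.2f} (predicted {two_edges / degree[i]:.2f})")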

3 Conclusions

In the first section of this project we derived the chemical master equation for chemical reactions in a "well-stirred" gas, close to ideality and at thermal equilibrium. We discussed the physical meaning of many mathematical findings of our approach, underlining also the importance of the molecular chaos hypothesis for the system to be efficiently described as a Markov process. In the same section, we performed and discussed numerical experiments on a simple bimolecular reaction network, according to Gillespie's stochastic simulation algorithm. We also briefly linked the master equation formalism to the Chapman-Kolmogorov equation and to the Langevin one.

In the second section, instead, we reviewed the master equation formalism in two different areas of network theory, namely growing network models and random walks on networks, discussing closed-form analytical results for quantities such as the degree distribution or the mean first passage time.

All in all, our review presents the master equation, together with its simulation techniques, as a powerful mathematical tool, with solid physical foundations, that can be successfully applied to a variety of systems and models, inside the fascinating panorama of complexity science.

References

[1] D. T. Gillespie, A rigorous derivation of the chemical master equation, Physica A, 188 (1992).

[2] M. Delbruck, Statistical fluctuations in autocatalytic reactions, The Journal of Chemical Physics, 8 (1940).

[3] D. T. Gillespie, Markov Processes: An Introduction for Physical Scientists, Academic Press (1992).

[4] G. Grimmett and D. Stirzaker, Probability and Random Processes (3rd Edition), Oxford University Press (2001).

[5] L. D. Landau and E. M. Lifshitz, Statistical Physics, Vol. 5 (3rd Edition), Butterworth-Heinemann (1980).

[6] C. W. Gardiner, Handbook of Stochastic Methods (3rd Edition), Springer (2004).

[7] J. D. Noh and H. Rieger, Random Walks on Complex Networks, Physical Review Letters, 92 (2004).

[8] E. Estrada, Quantifying network heterogeneity, Physical Review E, 82 (2010).

[9] M. E. J. Newman, Networks: An Introduction, Oxford University Press (2010).

[10] A. Barrat, M. Barthelemy and A. Vespignani, Dynamical Processes on Complex Networks, Cambridge University Press (2008).

[11] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks, Advances in Physics, 51 (2002).

[12] R. Erban, S. J. Chapman and P. K. Maini, A Practical Guide to Stochastic Simulations of Reaction-Diffusion Processes, CoRR (2007).

[13] R. Erban and S. J. Chapman, Stochastic modelling of reaction-diffusion processes: algorithms for bimolecular reactions, Physical Biology, 6 (2009).

[14] M. Cencini, F. Cecconi and A. Vulpiani, Chaos: From Simple Models to Complex Systems, World Scientific (2010).
