Models in Biology Modelling Biology Basic Applications of Mathematics and Statistics in the Biological Sciences Part I: Mathematics Script C Introductory Course for Students of Biology, Biotechnology and Environmental Protection Werner Ulrich UMK Toruń 2008


Models in Biology

Modelling Biology

Basic Applications of Mathematics and Statistics in the Biological Sciences

Part I: Mathematics

Script C

Introductory Course for Students of

Biology, Biotechnology and Environmental Protection

Werner Ulrich

UMK Toruń 2008


Contents

Introduction
1. How to build a model
2. From Euclidean to fractal geometry
3. Biological growth processes
4. Models of competition and predation
5. Models in biochemistry
6. Markov chains
7. The Weibull function and life table analysis
8. Basic models in genetics
Literature
Online archives and textbooks
Mathematical software
Important internet pages

Latest update: 10.01.2008


Introduction

The following text is the third part of a lecture in basic mathematics for biologists. The whole lecture contains what might be considered an international standard of basic knowledge, although many readers will surely miss important branches. This part deals with the application of mathematics in biology. It focuses on model building and interpretation. Again, many examples are included that show how to program simple tasks with a spreadsheet program and how to use advanced mathematics software.

The following text is not a textbook. It is intended as a script that presents the contents of the lecture in a condensed form. There is no need to write a textbook again. Today, the internet has taken over many of the tasks textbooks formerly had. The end of this text therefore contains a small overview of important internet pages where students can find mathematics glossaries, textbooks, and program collections.


1. How to build a model

In this lecture we will learn how to build simple biological models using a spreadsheet program. We will see why this is necessary and how to use such models.

Why is it necessary to build mathematical models using experimental or observational data? There are several reasons for this. Biology has transformed from natural history (history!) into an explanatory science. It not only tries to describe phenomena in nature, it also tries to understand causes and relations. For this task we have to structure our observations and look for relations between them. This is exactly the modelling process: we use the science of structures, mathematics, to uncover hidden patterns and relations. Modelling is therefore more than finding out whether sample means differ or whether we have simple correlations between data. We have to parameterize these relations. But models have many other tasks. First of all, they generate new predictions about nature, predictions that then have to be verified or falsified. This prediction-generating feature is of course also a method to verify our model. Secondly, good models allow predictions to be made about the future. This is a main aim of all environmental models. They are designed to predict the future of populations, ecosystems and biodiversity. Lastly, models reduce the chaos in our data and allow the development of new theories and concepts.

Models can be grouped into classes. At one end of a continuum we have verbal models that state more or less precisely the relations between a set of variables. These verbal statements may be incorporated into diagrams where the variables are connected by arrows. Then we have a qualitative model. At the other end, there are explicit mathematical models that formalize relations. These relations may be fully parameterized, and then we have a quantitative model that gives quantitative predictions about variable states. Finally, models may contain exact parameter values at all stages of computation. We then speak of deterministic models because all future states of the model can in principle be computed from the initial set of values. On the other hand, the model might contain more or less stochastic variables, variables that are driven by random events. In this case, future parameter values are less certain or even chaotic, and we speak of stochastic models.

This short discussion already indicates what we need to build a model. The steps are visualized in Fig. 1.1.

• The first step is that we have a theory. In short, a theory is a set of hypotheses stated in a formal language. We need hypotheses about nature and the relations between certain variables. Modelling without a priori theoretical reasoning will lead to nothing. Our a priori experience must lead to a selection of variables, the so-called drivers of the model. These drivers might have explicit or stochastic values. They might be parameterized (characterized by explicit values or functions) or not. In the latter case the model itself should assign values or value ranges to these parameters.

• Then we have to collect the necessary data. These data have to match the requirements of our theory. Making experiments or observations without an explicit theory in mind will very often result in large sets of data without any value because afterwards we (suddenly) notice that one or another important variable was ignored and not measured, or that our method was inappropriate for obtaining the variable values required by the model. This latter case occurs very often if we took too few replicates and the variability in measurement is too high. Problems also arise if we used different methods of observation and later notice that these differences make it impossible to compare the data (for instance because they differ in the degree of quantitativeness).

• In the next step we have to confirm the assumed relationships between these drivers. We might assign qualitative or quantitative relations. If we quantify the relations (for instance by a regression analysis) we parameterize these relations.

• Then we have to formalize the relations. This is best done with a flow diagram or flow chart. The flow diagram forces us to write down each relation and each step of the model explicitly. This step often uncovers smaller or larger errors in our initial model that would have remained undiscovered in a purely verbal model formulation. Making flow diagrams teaches us to think hard!

• The following step is a technical one: rewriting our flow diagram as a computer algorithm. For more complicated models this should be done using a common programming language like C++, Pascal, R or Fortran; simple models can be written in a spreadsheet program like Excel.

• Our model will generate a set of output variables or whole classes of relations. We have to check these outputs: whether their values are realistic, whether they correctly predict real values, and whether they are able to predict the future.

• At the end we have to modify our model in the light of its predictions and variable states.

[Fig. 1.1: The steps of model building. Boxes: Theory; Drivers, Parameters; Data; Quantification; Functions; Flow chart; Computer algorithm; Output; New predictions.]

Let’s exemplify the above steps of modelling with a simple example using real data. We measured the population densities of a parasitic wasp species of the hymenopteran genus Aspilota over a series of generations. Aspilota is a group of small braconid wasps that predominantly develop as internal parasitoids of necrophagous flies of the family Phoridae. It is a very abundant and species-rich genus. Our initial assumption is that the population densities should be influenced by the densities of its host species and by a set of weather variables. Additionally, we assume that the densities of the previous generation should also influence wasp densities because high or low previous densities should find their expression in reproductive rates. By this, we have verbally stated an initial theory and pointed to a set of interesting variables. These variables are Dwasp, Dhost, and climatic variables. Which climatic variables? To allow a model to be constructed we must specify the variables and the way they exert their influence. From previous studies and a literature survey we decide to recognize five climatic input variables: precipitation CP, cloudiness CC, air temperature CT, relative atmospheric humidity CH, and total hours of sunshine CS. We incorporate into our theory that these climatic drivers affect larval mortality mainly during the periods of activity of the insect and in the winter. By this we state a preliminary theory about the main factors influencing wasp densities.

Now we have stated our initial hypotheses sufficiently precisely and are able to gather the necessary data. A part of the data is shown in the following Table. Immediately we notice one problem of the data set. We have in total 12 input variables but only 8 years of observation (16 generations). In practice the number of observations should always be much larger than the number of input variables, and in any case at least as large. In our case, we will deal with generations, so the data set is slightly larger than the number of input variables.

Table: Mean values of climatic factors (January to March and June), number of hosts, and Aspilota densities of the previous and the following generation. No January to March values are given for 1980.

Year  | January to March                           | June                          | Hosts | Aspilota
      | CT       CH      CP       CS       CC      | CT    CH  CP     CS     CC    |       | previous Gen.  following Gen.
1980  |                                            | 15    77  111    157.2  71.25 | 116   | 7              1
1981  | 1.9667   80      58.9333  51.8     77.5    | 15.4  76  165.3  133.8  75    | 120   | 3.8            0.2
1982  | -0.9     79      32.9667  96.6     60      | 16.7  71  54.5   193.9  62.5  | 260   | 3.2            0.3
1983  | 2.967    78.667  52.9333  67.7333  76.25   | 16.8  65  34.9   192.2  62.5  | 191   | 3.8            2.8
1984  | 1.7667   76.333  43.5666  75.6     67.5    | 14.2  72  52.4   146.9  76.25 | 148   | 0.1            0.1
1985  | -1.3889  77.667  29.2667  77.5     71.6667 | 13.4  76  139.7  135.8  76.25 | 56    | 0.8            0.1
1986  | -0.467   78      54       62.3     74.1667 | 16.6  70  50.6   227.8  57.5  | 178   | 14.5           12.3
1987  | -2.489   81.333  62.9333  79.4     70      | 14.2  78  100.5  125.2  80    | 70    | 10.3           8

We will try (with caution) to find out whether the climatic variables and the previous generation determine the following wasp generation. In a first step we develop a series of equations that try to describe the interdependencies of our variables. This can be done either by trial and error, by predefined hypotheses taken from other studies, or by special statistical techniques which we will discuss in the statistics part.

$$Gen_1 = 97.6 - 7.0\,CT_{So} + 0.61\,CP_{Wi} - 0.92\,Gen_2$$

$$Gen_2 = 55.7 - 3.5\,CT_{Ju} - 0.05\,CP_{Ju} + 0.99\,Gen_1$$

We now have two functions, one for the first (Gen1) and one for the second generation (Gen2) of the wasp species. We find that cloudiness, hours of sunshine and relative humidity are probably of minor influence. The most probable hypothesis concerning the influencing variables includes temperature and precipitation during the activity period and the winter, the number of hosts, and the wasp density of the previous generation.

This is of course a first model. We reduced the whole set of data to two equations, which show us that the main influencing factors for wasp densities are probably temperature and precipitation during the activity period as well as previous wasp densities. All other variables and, astonishingly, host densities (Pho) seem to have only minor effects and should be left out. The coefficient of determination is high, indicating that about 80 to 90 percent of the observed variance in wasp density can be explained by the included variables. We also infer that high temperatures apparently hamper wasp development or activity. Precipitation has once a positive and once a negative effect. Our regression model therefore immediately provides us with some important and unforeseen hypotheses about our wasp population.

We have identified the main input drivers of our future model. We can now try to develop an explicit flow chart. Several conventions apply to flow charts, of which the most important are shown in Figs. 1.2 and 1.3. They allow us to construct charts that explicitly show every step of the program.

[Figs. 1.2 and 1.3: Flow chart conventions. Symbols: start and stop of the main program, get data, counter, compute, case statement (case option), loop test, write output. Example chart: Start; Get data; Counter; Y = f(x); case test "Y = value?" with the branches Z = g(x) and H = h(x); Write output; loop test "n > max"; Stop.]

[Fig. 1.4: Flow chart of the Aspilota model. Start; Get Gen2, Pho; Ran(CT, CP, Pho), Gen2 = f(Gen2); Gen1 = f(CT, CP, Gen2, Pho); check: IF (Gen1 < 0) min, IF (Gen1 > Pho) Pho; Ran(CT, CP, Pho), Gen1 = f(Gen1); Gen2 = f(CT, CP, Gen1, Pho); check: IF (Gen2 < 0) min, IF (Gen2 > Pho) Pho; Loops; Stop.]

Table: Excel solution of the Aspilota model (columns A to H). LOS() and JEŻELI() are the Polish Excel names of the functions RAND() and IF().

Row 1: Aspilota model
Row 2 (column headers): C Gen2; D Pho; E CT(Sum); F CT(Ju); G CP(Win); H CP(Jun)
Row 3: Starting conditions; C3: 10; D3: 100
Row 4: max; E4: 17.7; F4: 16.8; G4: 82; H4: 78
Row 5: min; E5: 15; F5: 13.4; G5: 78; H5: 65
Row 6: Iteration
Row 7: Conditions; C7: =+C3; D7: =+($D$3-C7)/$D$3; E7: =+LOS()*(E$4-E$5)+E$5; F7: =+LOS()*(F$4-F$5)+F$5; G7: =+LOS()*(G$4-G$5)+G$5; H7: =+LOS()*(H$4-H$5)+H$5
Row 8: A8: 1; Compute; C8: =97.6-7*E7+0.61*G7-0.92*C7*D7
Row 9: Check; C9: =JEŻELI(C8<$D$3;C8;$D$3)
Row 10: Check; C10: =JEŻELI(C9>0;C9;1); D10: =+($D$3-C10)/$D$3
Row 11: A11: 1.5; Compute; C11: =55.7-3.5*F7-0.05*H7+0.99*C10*D10
Row 12: Check; C12: =JEŻELI(C11<$D$3;C11;$D$3)
Row 13: Check; C13: =JEŻELI(C12>0;C12;1)
Row 14: A14: =+A8+1; Conditions; C14: =+C13; D14: =+($D$3-C14)/$D$3; E14: =+LOS()*(E$4-E$5)+E$5; F14: =+LOS()*(F$4-F$5)+F$5; G14: =+LOS()*(G$4-G$5)+G$5; H14: =+LOS()*(H$4-H$5)+H$5
Row 15: Compute; C15: =97.6-7*E14+0.61*G14-0.92*C14*D14
Row 16: Check; C16: =JEŻELI(C15<$D$3;C15;$D$3)
Row 17: Check; C17: =JEŻELI(C16>0;C16;1); D17: =+($D$3-C17)/$D$3
Row 18: A18: =+A11+1; Compute; C18: =55.7-3.5*F14-0.05*H14+0.99*C17*D17
Row 19: Check; C19: =JEŻELI(C18<$D$3;C18;$D$3)
Row 20: Check; C20: =JEŻELI(C19>0;C19;1)

Now we develop such a chart for our problem. We begin with a start button. Next, we set Gen2 to a start value. Weather conditions are unforeseeable and we model this by letting these conditions fluctuate at random between observed upper and lower limits. Next we compute the density of the second generation according to the multiple regression equation above. But we have to modify the equation. Wasp densities can’t be higher than a certain upper limit, the carrying capacity. This limit is set by the number of hosts, the Phoridae (Pho). For simplicity, we assume these densities to be more or less constant. Additionally, the reproductive rates should decrease at very high densities (according to the logistic growth equation). To model this we use this logistic growth equation instead of Gen1 or Gen2. We set

$$Gen_{1,2} = Gen_{1,2}\left(\frac{Pho - Gen_{1,2}}{Pho}\right)$$

Now we look at the result of our computation. We use Gen2 for further computation, or we first adjust Gen2 if the values are below 0 or above the carrying capacity. The whole procedure is repeated for the second generation. At the end of the model there should be a loop that controls the number of iterations to be done. The whole flow chart is shown in Fig. 1.4. The Table above shows an Excel solution for this problem. We have the random weather conditions (green) and the data checks (blue). The computation of the densities is shown in the yellow cells. Additionally, we have a generation counter. This counter later helps us to sort our data and to make a plot.

Now we have to analyze our data. First of all, does our model give realistic results? To check this, we plot the result of 1000 model runs against the generation counter (separately for the first and the second generation). We detect three main features (Figs. 1.5 and 1.6). First of all, the model predicts densities and variability of the spring generations to be higher than those of the summer generations (20 ± 7 versus 15 ± 5 individuals m⁻²). Because we used real data we can compare this prediction with reality. Indeed, the measured densities of the spring generation of Aspilota were about 20% higher than those of the summer generation and the variability about two times higher. However, the predicted mean densities are in both cases about two times too high.

The model correctly predicts upper density boundaries of about 25 to 35 individuals m⁻². Indeed, the real Aspilota densities were always below 30 individuals per m² during an 8-year study period.

We also note three occasions where the predicted density falls below zero. Our model population would die out. From this we can predict an extinction probability of 3 / 1000 = 0.003. In reality, extinction probabilities were estimated by two different methods and are expected to be above 1% per year. Our model predicts extinction probabilities that are too low. Why? To answer this question we have to look at our model drivers. We simplified our model by holding the host density Pho constant. How do the model predictions change if we let Pho vary in a similar way as the climatic factors, randomly between measured upper and lower limits? This is shown in Figs. 1.7 and 1.8. Suddenly, the situation changes. First of all, predicted mean densities are lower than before. Now they resemble the observed densities but are still about 30% too high. The number of zero counts increased markedly. The Figure shows an example with 12 such counts, but from about 10000 computed generations I inferred a mean of about 30 extinctions per 1000 generations. We therefore expect a local extinction rate (just by chance) of about 3%. This prediction is again in line with the estimate obtained by other methods.

[Figs. 1.5 to 1.8: Predicted density plotted against the generation counter (0 to 1000), in separate panels for the summer generation and the spring generation.]

The latter modification also points to the main driver of our model. It is the host density. This could not be foreseen from our initial equations. However, remember that it is only a model. The next and decisive step is to obtain a series of exact predictions that can be tested with field data. For such a test more independent data on population densities of both wasp and host species would be necessary.
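The Aspilota model described in this chapter can also be written as a short program instead of a spreadsheet. The following Python sketch is only an illustration under stated assumptions, not the original Excel solution: the names `RANGES`, `logistic`, `check` and `simulate_aspilota` are invented here, the weather ranges are the min and max rows of the spreadsheet as far as they can be read, and the minimum density of 1 follows the second data check of the spreadsheet.

```python
import random

# (min, max) of the random weather drivers, read from the "min" and "max"
# rows of the spreadsheet. Treat these numbers as assumptions.
RANGES = {
    "ct_so": (15.0, 17.7),  # CT term of the Gen1 equation
    "ct_ju": (13.4, 16.8),  # CT term of the Gen2 equation
    "cp_wi": (78.0, 82.0),  # CP term of the Gen1 equation
    "cp_ju": (65.0, 78.0),  # CP term of the Gen2 equation
}


def logistic(gen, pho):
    """Logistic term Gen * (Pho - Gen) / Pho that is used in place of Gen."""
    return gen * (pho - gen) / pho


def check(gen, pho, minimum=1.0):
    """Data check: cap at the carrying capacity, reset values <= 0 to a minimum."""
    gen = min(gen, pho)
    return gen if gen > 0 else minimum


def simulate_aspilota(n_cycles=1000, gen2=10.0, pho=100.0, pho_range=None, seed=None):
    """Return (gen1_series, gen2_series, n_below_zero) for n_cycles loops."""
    rng = random.Random(seed)
    gen1_series, gen2_series, below_zero = [], [], 0
    for _ in range(n_cycles):
        if pho_range is not None:  # optional: let the host density fluctuate too
            pho = rng.uniform(*pho_range)
        w = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

        raw1 = 97.6 - 7.0 * w["ct_so"] + 0.61 * w["cp_wi"] - 0.92 * logistic(gen2, pho)
        below_zero += raw1 < 0  # count predicted densities below zero
        gen1 = check(raw1, pho)

        raw2 = 55.7 - 3.5 * w["ct_ju"] - 0.05 * w["cp_ju"] + 0.99 * logistic(gen1, pho)
        below_zero += raw2 < 0
        gen2 = check(raw2, pho)

        gen1_series.append(gen1)
        gen2_series.append(gen2)
    return gen1_series, gen2_series, below_zero


if __name__ == "__main__":
    g1, g2, n_zero = simulate_aspilota(seed=1)
    print(sum(g1) / len(g1), sum(g2) / len(g2), n_zero)
```

Calling `simulate_aspilota(pho_range=(56, 260))` lets the host density vary as well. The interval is simply the range of the Hosts column of the data Table and is likewise an assumption, so the output should not be expected to reproduce the figures of the text.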


2. From Euclidean to fractal geometry

Look at Figure 2.1. I tried to measure the length of the boundary of Europe. This seems a silly task, but let’s try. We take a map and measure the length of the countries’ boundaries and all the coastlines. How exactly can we measure? This depends of course on our unit of measurement. If we take 1000 km as the basic unit our estimate will be quite misleading. A unit of 1 m would give a very exact result but is impossible to apply in practice. In theory it would even be possible to use 1 mm as the base. Then we would have to consider every small pebble that influences the boundary length. Now let’s plot the boundary length of Europe against our unit of measurement. Measurements are given as length / unit. To plot the length against the resolution we take the reciprocal of the unit. Instead of a unit of 1000 m we use 1 / (1000 m) = 0.001 m^-1, instead of 1 cm we use 1 / (0.01 m) = 100 m^-1, and so on. Our reciprocal is therefore a measure of the magnification or exactness of our measurements. After some time and work we get the picture shown in Fig. 2.2. We detect an allometric relation between our magnification (often also termed scaling factor) and the measured boundary length. The slope of this allometric relation is 0.3. This seems to be a curious result. It means that if we reduced the unit of measurement to ever smaller values (in the limit to zero), our perceived boundary length of Europe would become larger and larger, up to infinity.
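The slope of such an allometric relation is commonly estimated as the slope of a straight line through the log-transformed values. The following Python sketch shows this under stated assumptions: the function name `fit_power_law` is invented for this example, and the points are generated from the fitted curve y = 157000 x^0.30 of Fig. 2.2 because the original measurements are not listed in the text.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**z on log-log axes; returns (a, z)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    z = sxy / sxx              # slope of the straight line = scaling exponent
    a = 10 ** (my - z * mx)    # intercept of the line, back-transformed
    return a, z


# Synthetic points on the fitted curve of Fig. 2.2, not measured data.
scaling = [0.001, 0.01, 0.1, 1.0]
length = [157000 * s ** 0.30 for s in scaling]
a, z = fit_power_law(scaling, length)
print(a, z)
```

With real measurements the points scatter around the line, and the fitted exponent is only an estimate of the slope.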

[Fig. 2.1: Map of Europe; labels: 1000 km, 500 km.]

[Fig. 2.2: Length of Europe [km] plotted against the scaling factor (scaling factor = 1 / unit of measurement) on logarithmic axes; fitted relation y = 157000 x^0.30.]

Let’s try another example. We take a circle of radius 1 and do the same exercise (Fig. 2.3). Archimedes once computed the circumference of a circle from series of lines that inscribe and circumscribe the circle. He used triangles with angle α and summed up all the lengths d. The smaller α is, the more such triangles fit inside the circle and the better will be our estimate of the circumference. The circumference is of course 2πr. To compute d for each angle α we use the already known cosine law. In this case we simply get

$$d = \sqrt{2(1 - \cos\alpha)}$$

We plot the circumference Σd against the scaling factor α (Fig. 2.3). Now the allometric relationship that appeared in the Europe example has vanished. For simple geometric objects perceived length and scaling factor are nearly independent. Where is the difference between the ‘natural object’ and the geometric one? Or must natural objects be treated with a different geometry?
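The circle exercise is easy to repeat numerically. The sketch below assumes n equal triangles, so that α = 2π / n, and sums the n chords d of a circle of radius 1; the function name `inscribed_circumference` is invented for this illustration.

```python
import math


def inscribed_circumference(n):
    """Sum of n equal chords inscribed in a circle of radius 1 (alpha = 2*pi/n)."""
    alpha = 2 * math.pi / n
    d = math.sqrt(2 * (1 - math.cos(alpha)))  # cosine law with r = 1
    return n * d


for n in (6, 12, 96, 1000):
    print(n, inscribed_circumference(n))
print("2*pi =", 2 * math.pi)
```

The sums approach 2π from below and barely change once n is large, which is the near independence of perceived length and resolution described above.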

To answer this question we have to deal with a modern branch of geometry, fractal geometry. Indeed, the French mathematician Benoit Mandelbrot (born 1924 in Warsaw) developed fractal geometry just by asking “How long is the coast of Britain?”

Consider again a power law. It has the general form y = ax^z. If we compare two values y_1 and y_2 that, say, stem from two different observations, we find that the quotient y_1 / y_2 is

$$\frac{y_1}{y_2} = \frac{a x_1^z}{a x_2^z} = \left(\frac{x_1}{x_2}\right)^z \qquad (2.1)$$

The relation is therefore independent of the initial settings and the unit of measurement. These initial conditions are contained in the factor a, which is cancelled out by the division. Only the exponent z remains, the scaling exponent. This is a very important property of power functions. One interpretation of this feature is that the structure of the process the power function describes is independent of the data points we consider. We get the same pattern at different degrees of resolution. A spatial or temporal pattern that always looks similar, independent of the (spatial or temporal) scale at which we look at it, is called a self-similar pattern. A general description of such self-similar patterns is the power function.

A good example is Figure 2.4. A very simple geometric object is repeated in the same way at different scales. The result is an object that looks similar to the branching patterns of our nervous or circulatory system.

[Fig. 2.3: Circle of radius 1 with inscribed triangles of angle α and chord length d = \sqrt{2(1 - \cos\alpha)}; circumference plotted against the scaling factor on logarithmic axes; fitted relation y = 3.10 x^0.002.]

[Fig. 2.4: A simple geometric object repeated in the same way at different scales.]

One of the first mathematicians who studied self-replicating geometric objects was Wacław Sierpiński (Polish mathematician, 1882-1969). His Sierpiński triangle (Fig. 2.5) is the basis for many artworks and computer graphics. Building these objects is very simple. A computer program that constructs Sierpiński objects looks as follows:

1. Start with a triangle (or another simple object)
2. Shrink this triangle to 1/2 (or another scale)
3. Make three copies (or n copies)
4. Arrange these copies in quadrants 2, 3, and 4 (or other fixed points)
5. Go to step 2
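The five steps above can be sketched in Python. This is a minimal illustration with invented names (`shrink_and_copy`, `sierpinski`, `area`); it assumes the "other fixed points" variant of step 4, in which each half-size copy is anchored at one corner of the triangle it replaces, and it stops after a chosen number of repetitions instead of looping forever.

```python
def shrink_and_copy(triangle):
    """Steps 2 to 4: replace a triangle by three half-size copies at its corners."""
    a, b, c = triangle

    def mid(p, q):
        return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c)]


def sierpinski(depth, start=((0.0, 0.0), (1.0, 0.0), (0.5, 1.0))):
    """Step 1 and step 5: start with one triangle and repeat 'depth' times."""
    triangles = [start]
    for _ in range(depth):
        triangles = [t for tri in triangles for t in shrink_and_copy(tri)]
    return triangles


def area(triangle):
    """Area of a triangle given by three (x, y) corner points."""
    (x1, y1), (x2, y2), (x3, y3) = triangle
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


triangles = sierpinski(5)
print(len(triangles))                  # 3**5 = 243 small triangles
print(sum(area(t) for t in triangles))
```

Each repetition triples the number of triangles while every copy has a quarter of the area, so the covered area shrinks by the factor 3/4 per step. The returned corner points can be passed to any plotting program to draw the figure.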


By this simple instruction many so-called self-replicating (or self-similar) objects (fractals) can be produced (Fig. 2.6). But Fig. 2.7 is also a self-replicate. A simple instruction is repeated again and again, resulting in a complicated pattern. I chose this object because it looks very similar to the villi of a vertebrate intestine. Many programs generate self-similar objects or fractals. Fig. 2.7 is a so-called Julia set (after the French mathematician Gaston Julia, 1893-1978), generated by a very simple iterative instruction f(z) = z^2 - 0.75, where z is a so-called complex number. I used the program ChaosPro for computing the Figures above and beside.
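The iterative instruction f(z) = z^2 - 0.75 can be tried out with a few lines of Python. The sketch below is not the ChaosPro procedure; it uses the common escape-time approach as an assumption: a starting point z belongs to the picture if repeated application of f keeps |z| below a bound. The names `escape_time` and `julia_ascii`, the bound of 2, the iteration limit and the plotted window are choices made for this illustration.

```python
def escape_time(z, c=-0.75, max_iter=100, radius=2.0):
    """Number of iterations of z -> z*z + c before |z| exceeds radius."""
    for n in range(max_iter):
        if abs(z) > radius:
            return n
        z = z * z + c
    return max_iter


def julia_ascii(width=60, height=24, max_iter=100):
    """Crude text picture: '#' marks starting points that never escaped."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -1.8 + 3.6 * i / (width - 1)
            inside = escape_time(complex(x, y), max_iter=max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(julia_ascii())
```

Changing the constant c in `escape_time` gives other Julia sets, and a finer grid with a plotting library gives a picture of the kind shown in the figure.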

Indeed many complex biological patterns stem from a self-replicating process. A nice example of self-similarity is a fern (Fig. 2.8). The same structure is repeated at all scales of resolution, from large leaves to the smallest leaflets. But not all self-replicating objects are self-similar. Fig. 2.9 shows the Australian giant earthworm Megascolides australis (picture from Kästner, Lehrbuch der speziellen Zoologie, Part I,3. Stuttgart 1982). Annelida have the ancestral seriate (metameric) body plan of the metazoans. But they are not self-similar. The body rings are not replicated at different scales, that is, at different levels of resolution.

[Figs. 2.5 to 2.8: Sierpiński triangle (2.5), further self-similar objects (2.6), Julia set (2.7), and fern (2.8).]

Now we look at the problem of dimension. A point has the Euclidean dimension 0, a line the dimension 1, an area 2, and a cube 3.

In our case length L and resolution s are connected by a power function. In general, the total length L should be the ruler length l multiplied by the number of ruler lengths needed. This number is a function n(l) of l. Because s = 1 / l we get

$$L = n(l)\,l \propto s^{d}\,\frac{1}{s} = s^{d-1}, \qquad L = a\,s^{d-1}$$

We assumed n(l) to be a power function of s with exponent d. The length of an object is therefore not a fixed value but is related to the magnification (the resolution) at which we look at it. In this case we better speak of perceived length. For a simple Euclidean object (d - 1) = 0, as in our circle example. Hence d = 1. In other words, length is independent of the resolution at which we look at it. The maximum value of d is of course 1 because the perceived length L can’t grow faster than the scaling factor s.

What about the constant a? Let’s consider a simple geometric object (Fig. 2.10). In this case d = 1 and L = a. If we have a square we can divide this square into 9 smaller squares without overlap. A cube can be divided into 27 such subcubes, and so on: 3 = 3^1, 9 = 3^2, 27 = 3^3. Therefore the length of a line, the area, or the volume appears to be the ruler length (in our example s = 3) to the power of the Euclidean dimension: 1 for a line, 2 for an area, 3 for a volume, and so on. Therefore a = s^D. We can now combine our last two equations and get

$$L = s^{D} s^{d-1} = s^{(D+d)-1} \qquad (2.2)$$

[Fig. 2.9: The giant earthworm Megascolides australis.]

[Fig. 2.10: A line divided into 3 parts (a = 3^1), a square divided into 9 squares (a = 3^2), and a cube divided into 27 cubes (a = 3^3); in general a = s^D.]

This equation is a fundamental one. It tells us how the length of an object depends on the magnification at which we look at it. To see this we return to our example of the boundary length of Europe. The length was L = 157000 s^0.3. s, the inverse of the ruler length l, is a measure of the magnification under which we measure the boundary. At the largest ruler length of 1000 km the magnification is lowest. For our Europe example equation 2.2 becomes L = 157000 s^((1 + 0.3) - 1). The Euclidean dimension we consider is a length. D, the dimension of a length, therefore takes the value 1.

How do we interpret the value D + d? Because 0 ≤ d ≤ 1, D + d always takes values between the actual Euclidean dimension and the next higher one. It is commonly termed the fractal dimension of an object. The boundary of Europe therefore has the fractal dimension 1.3, while the perimeter of a circle has a fractal dimension that equals its Euclidean one (D = 1).

Why is it important to know about fractal geometry? Because it provides the clue for understanding many relations of organismic growth patterns, body size relations, patterns in ontogeny, gene expression, or ecology. I shall give two examples. The fern shown above leads us to the problem of ontogenetic growth. Consider a primordial blood or xylem vessel. Genetic activity generates at a certain time t a specified branching pattern. If the genetic activity remained unchanged over time (this is the simplest possible pattern), branching would occur in each time window t in a similar manner. The whole process would be self-similar. Recent investigations have indeed shown that many ontogenetic processes can be understood in this way. A nice example are colour bands in butterflies, which are generated by only a few (sometimes only 2) enzymes that act similarly at different times t.

Assume that the Figure below (Fig. 2.11) shows the branching pattern of blood vessels or a plant’s

phloem or xylem system. The process repeats in a similar manner at each stage. It is therefore a self-similar

process. From this notion we immediately derive basic relations between vessel parts. For instance, total vessel

volume vn (the sum of all vessels) at each stage k must fulfil a power function of the form

where VK and rK are the respective volumes and radia at the beginning of the branching process. Similar power

laws hold for vessel length and cross-sectional areas.

Vessel volume is related to many other physiological variables like nutrient flow, metabolic rate, con-

ductivity, heat production etc. For instance tissue area must scale to the quotient of vessel radia as

A similar proportion holds of course for the total vessel length. Because tissue area is linear proportional to the

maximum metabolic rate and vessel volume is V = πr2L we can introduce these relations into the above equa-

tion and get

In other words, metabolic rate should scale allometrically to

vessel volume. Vessel volume is proportional to total body weight.

Simple scaling laws tell us therefore that metabolic rate should scale

allometrically to total body mass. This is our already known law of

Kleiber. Of course, such reasoning does not provide values of our

scaling exponents z and x. However, recent investigations showed that

with reasonable starting conditions values similar to the ones observed

in nature result. That means, that ’simple’ geometric reasoning is able

to explain morphological and physiological patterns in nature.

A second example. Animals of different body size perceive

their environment in a different manner. What is for us a small

meadow is for a mouse a large wood of grasses and for an even

smaller insect a universe. A meadow has a fractal dimension. We can

an n

K K

V rV r

⎛ ⎞= ⎜ ⎟

⎝ ⎠

z

n n

K K

A rA r

⎛ ⎞= ⎜ ⎟

⎝ ⎠

x

n n

K K

M VM V

⎛ ⎞=⎜ ⎟

⎝ ⎠

Fig. 2.11

πrK2LK

rK

LKπrK2LK

rK

LK


measure this as in our previous example. But now we use a slightly different method. We take a series of photographs of the meadow as shown in the next figure (Fig. 2.12). Each photo was taken at a different magnification. We measure the darker areas between the blades (by laying a grid upon the photos and counting all the cells not or only in part occupied by blades) as an estimate of free space and plot them against magnification (Fig. 2.13). Of course, this is only a simple example and we have only three data points, but our method (the so-called grid method) results in a power function. The program Excel automatically fits such a function to the data points (we will see later how) and gives the respective equation, $y = 6.2x^{0.28}$. The slope is 0.28. From the above fundamental function of self-similar processes

$$L(s) = b\,s^{(D+d)-1}$$

we interpret that the blades of a meadow form a fractal landscape with a fractal length dimension of (D + d) = 1.28.

[Fig. 2.12: Photographs of a meadow at different magnifications]

[Fig. 2.13: Area against magnification, both axes logarithmic; fitted power function $y = 6.2x^{0.28}$]

Let's look through the eyes of an animal. An animal of body length, say, 10 cm perceives its environment from a certain point of view. How will an animal of 1 cm perceive the same environment? If the habitat has fractal properties, perceived lengths of habitat structures should follow the above function. For an animal of 1 cm body length habitat boundaries (like grass blades) will be $10^{0.28} \approx 1.91$ times as large. The area on which these animals might live would be $1.91^2 = 10^{0.56} \approx 3.63$ times larger. Let's formalize our argument, because this is always the first step for a mathematical treatment. We assumed that area A scales to the measurement of length as

$$A \propto s^{2 \cdot 0.28} \propto L^{-0.56}$$

Now, we assume two other things. First, we replace length by species body weight. Body weight should scale to length to an exponent of 3 ($W \propto L^3$). Next, we assume that the number of individuals of a species is proportional to the area in which they live ($N \propto A$). In our case the area is the perceived area of a species. We get

$$N \propto W^{-0.56/3} \propto W^{-0.19}$$
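The arithmetic of this scaling argument can be checked with a few lines of Python. This is a minimal sketch and not part of the original script: the function names are hypothetical, and d = 0.28 is simply the exponent read from the fitted curve of Fig. 2.13. It assumes that perceived length scales with the size ratio to the power d and perceived area with the power 2d, which gives $10^{0.28} \approx 1.91$ and $10^{0.56} \approx 3.63$ for a tenfold difference in body length.

```python
# Hypothetical helper names; d = 0.28 is the exponent fitted in Fig. 2.13.

def perceived_length_factor(size_ratio, d=0.28):
    """Factor by which habitat boundaries appear longer to the smaller animal."""
    return size_ratio ** d


def perceived_area_factor(size_ratio, d=0.28):
    """Perceived area scales with the square of perceived length."""
    return size_ratio ** (2 * d)


def habitat_abundance_exponent(d=0.28):
    """Exponent x in N ∝ W**x, from A ∝ L**(-2d), W ∝ L**3 and N ∝ A."""
    return -2 * d / 3


ratio = 10  # body length 10 cm compared with 1 cm
print(round(perceived_length_factor(ratio), 2))
print(round(perceived_area_factor(ratio), 2))
print(round(habitat_abundance_exponent(), 2))
```

Changing `d` shows how strongly the perceived habitat area depends on the fractal dimension of the habitat.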


It seems that we have obtained a first simple ecological rule: with $N \propto W^{-0.19}$ the number of individuals of a species depends only weakly on its body size. This seems to be a silly result. You have 1000 insects on a square meter of grassland, but 1000 elephants are seldom found. But this is not the end of the argument. Individual number (species abundance) depends on a second important variable, the available food. We know already Kleiber's rule that metabolic rate scales to body weight to the power of 0.75. If food intake is proportional to metabolic rate and limits the abundance, the number of individuals of a species should be inversely proportional to metabolic rate ($N \propto M^{-1}$) and should therefore scale to body weight to a power of -0.75. Individual number is therefore limited by two independent factors, perceived habitat and available energy for metabolism. We can now combine both arguments. To do this we have to multiply both scaling laws and get

$$N \propto W^{-0.75}\,W^{-0.19} \propto W^{-0.94} \qquad (2.3)$$

We expect therefore that an animal of 1 gram body weight should be $100^{0.94} = 76$ times less abundant than a species of only 0.01 g. We can rescale to body length and argue that an animal of 1 cm body length should be $10^{2.82} = 661$ times more abundant than an animal of 10 cm body length. Now we have a general ecological rule how abundances of animal species and their body weights should be related. We got this rule by a combination of fundamental scaling laws and fractal geometry. The crucial variable in our equation is the fractal dimension of the habitat under study. Fig. 2.14 shows a real example. The figure gives a plot of mean body weights versus mean densities of 18 guilds of animals (from Testacea through nematodes, various arthropods to vertebrates) of a German beech forest on limestone (data from Schaefer M. 1991. Fauna of the European temperate deciduous forest. In: Temperate Deciduous Forests, Eds. E. Röhrig, B. Ulrich. Amsterdam, pp. 503-525). We detect a nice allometric relation, and the power function describing it has a slope of -0.89, which is close to the theoretical value of -0.94 we inferred above.

[Fig. 2.14: Mean density (m^-2) against mean body weight (mg), both axes logarithmic; fitted power function $y = 4.1223x^{-0.89}$]

At the end we once again compare Euclidean and fractal geometry. Such a comparison was provided by three American ecologists, G. West, B. Enquist and J. Brown, in 1999. They wondered why quarter power laws dominate in organisms and termed this pattern the fourth dimension of life. The table below shows the differences in power law exponents between ordinary Euclidean and fractal geometry. In fractal tissues length l scales to area a by the power of 1/3 and to volume v by the power of 1/4, whereas in Euclidean bodies length L scales to area A by the power of 1/2 and to volume V by the power of 1/3. Volume in Euclidean bodies is proportional to $A^{3/2}$; in fractal bodies this relation becomes $v \propto a^{4/3}$. But in every case volume is linearly proportional to body mass W.

Variable   Conventional Euclidean               Fractal
Length     L ∝ A^(1/2) ∝ V^(1/3) ∝ W^(1/3)      l ∝ a^(1/3) ∝ v^(1/4) ∝ W^(1/4)
Area       A ∝ L^2 ∝ V^(2/3) ∝ W^(2/3)          a ∝ l^3 ∝ v^(3/4) ∝ W^(3/4)
Volume     V ∝ L^3 ∝ A^(3/2) ∝ W                v ∝ l^4 ∝ a^(4/3) ∝ W

How to derive these proportionalities? West, Brown and Enquist proposed one solution in the scientific journal Science. In Euclidean geometry $L \propto A^{1/2}$ and $L \propto V^{1/3}$. Hence $A \propto V^{2/3}$. West, Brown and Enquist now assumed that living organisms have at least to a certain degree self-similar structures of energy and metabolite transporting


organs and tissues. In such tissues our simple scaling rules have to be modified. According to eq. 2.2 we should have the modified equations

$$\left.\begin{aligned} l &\propto a^{\frac{1}{2+\varepsilon}} \\ a &\propto v^{\frac{2+\varepsilon}{3+\varphi}} \\ l &\propto v^{\frac{1}{3+\varphi}} \end{aligned}\right\} \qquad (2.4)$$

Of course ε and φ both have values between 0 and 1. What is φ? By the same logic as above it should be possible to express a volume by the product of area and length. Considering again a fractal object and using eq. 2.4 we get

$$v \propto a\,l \propto l^{2+\varepsilon}\,l^{1+\eta} = l^{3+\varepsilon+\eta}$$

with η being between 0 and 1. We see that the scaling exponent (3 + φ) in eq. 2.4 equals (3 + ε + η). We introduce this in eq. 2.4 and get eq. 2.5

$$a \propto v^{\frac{2+\varepsilon}{3+\varepsilon+\eta}} \qquad (2.5)$$

The second crucial assumption in the whole argument is now that evolution has maximised the metabolic rate of organisms with respect to body mass. Body mass scales linearly to volume. Metabolic rate as a catalytic process is proportional to tissue surface. Hence under this argumentation the surface (area) to volume ratio should be maximised. Hence we are looking for the solution of

$$\frac{2+\varepsilon}{3+\varepsilon+\eta} \rightarrow \max$$

Later we will learn how to solve this problem systematically. Now we simply simulate the function using various values of ε and η (Fig. 2.15). We get a maximum at ε = 1 and η = 0. Hence

$$a \propto v^{\frac{2+1}{3+1}} = v^{3/4} \qquad (2.6)$$

Now all other scaling laws for fractal tissues of the last table follow. Because a ∝ M (the metabolic rate) and v ∝ W (the body weight) we get immediately

$$M \propto W^{3/4} \qquad (2.7)$$

This is Kleiber's metabolic rule. Eq. 2.6 and 2.7 form the basis for many quarter power scaling laws in biology. These are currently studied intensively, and the field of biological scaling is one of the most dynamically developing.

[Fig. 2.15: Exponent k against ε for η = 0, η = 0.5 and η = 1]

We detect another important thing. ε = 1 means that in biologically optimized organisms tissue surfaces should have a fractal dimension near 3 rather than the Euclidean value of 2. If you look at intestinal villi as in the accompanying photo or at our brain you see why.

[Photo of intestinal villi by Gwen V. Child, UTMB]
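The simulation of the exponent for various values of ε and η can be sketched in Python. This is only an illustration under the stated assumption that ε and η are both restricted to the interval from 0 to 1; the function names and the grid resolution are hypothetical choices, not taken from the script.

```python
import itertools


def area_volume_exponent(eps, eta):
    """Exponent of a ∝ v**k for a fractal tissue: k = (2 + eps) / (3 + eps + eta)."""
    return (2 + eps) / (3 + eps + eta)


def maximise_on_grid(steps=11):
    """Return the (eps, eta) pair on a regular grid over [0, 1] that maximises k."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(itertools.product(grid, grid),
               key=lambda pair: area_volume_exponent(*pair))


best_eps, best_eta = maximise_on_grid()
print(best_eps, best_eta, area_volume_exponent(best_eps, best_eta))
```

Because the exponent increases with ε and decreases with η, the maximum lies at the corner of the allowed range, which is why a coarse grid is sufficient here.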


3. Biological growth processes

In lecture 11 we already heard about growth processes, about exponential growth and the Pearl-Verhulst model of logistic growth. Now we will deal with these processes in more detail. Growth models can be given in two ways, a discrete form and a continuous form. Discrete forms are generally given as recursive functions of the form

$$N_{t+1} = f(N_t)$$

or

$$N_{t+1} - N_t = f(N_t)\,\Delta t \qquad (3.1)$$

This latter version is a difference equation that depends on growth per discrete time step Δt. For instance the exponential growth model in discrete form is

$$N_{t+1} = rN_t + N_t$$

and

$$\frac{N_{t+1} - N_t}{\Delta t} = \frac{\Delta N}{\Delta t} = rN_t$$

In continuous form the latter difference equation is given as a differential equation of the form

$$\frac{dN}{dt} = f(N)$$

Difference equations are appropriate for discrete populations where generations do not overlap. They are also easier to model in computer algorithms, for instance when using Excel to model population growth.

Now we look more closely at the logistic growth process. It was given by the differential equation

$$\frac{dN(t)}{dt} = rN(t)\,\frac{K - N(t)}{K} = rN(t) - \frac{r}{K}N(t)^2 \qquad (3.2)$$

The biological interpretation of this equation is that the rate of change results from an exponential growth process and a damping process that reduces the population growth. This damping acts immediately on the population. What happens if this damping sets in later? If the exponential process precedes the damping we expect to find first a high population size which is afterwards reduced by mortality factors. In this case we speak of time lags. For instance, assume a viral or bacterial disease. Viruses have typical exponential growth rates at initial stages of an epidemic. Later, more and more hosts are attacked and the viruses do not find appropriate hosts. Infection rates decrease, but they decrease with a certain time lag. Such time lags can be modelled by a simple modification of our initial logistic growth model.

$$\frac{dN(t)}{dt} = rN(t)\,\frac{K - N(t-\tau)}{K} = rN(t) - \frac{r}{K}N(t)\,N(t-\tau) \qquad (3.3)$$

Eq. 3.3 contains a time lag τ in the damping term. Let's study the behaviour of this modified model. We use the discrete version of eq. 3.3 and plot the generations using Excel. We approximate N(t+1) by the sum of N(t) plus the difference ΔN.


$$N(t+1) = N(t) + \Delta N = N(t) + rN(t)\,\frac{K - N(t-\tau)}{K} \qquad (3.4)$$

Our Excel model looks as follows. We have N(t) in column B. ΔN is computed in column C. To do this we also need a column that gives us N(t-τ). N(t+1) is now the sum of N(t) and ΔN.

     A            B              C                          D          E
1
2    Parameters   r              K                          Tau        K
3                 1              500                        1          100
4    t            N(t)           Delta N                    N(t-tau)
5    0            10             9.296851659
6    +A5+1        max(0,B5+C5)   +$B$3*B5*(1-(D6/$C$3))     +B5

[Fig. 3.1: N(t) against time, K = 500 in all panels. A: r = 0.2, τ = 0; B: r = 2.099, τ = 0; C: r = 1, τ = 1; D: r = 0.3, τ = 5; E: r = 0.7, τ = 2; F: r = 2.7, τ = 0; G: r = 2.95, τ = 0; H: r = 3.05, τ = 0]

Similar models were studied by the British biologist Robert May in the first half of the 1970s and had enormous influence on ecological and parasitological research.
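The spreadsheet recursion can also be sketched in Python, which makes it easier to vary r, τ and K. This is a minimal sketch under stated assumptions, not the author's implementation: the function name is hypothetical, the starting value of 10 is taken from the spreadsheet, densities are clamped at zero as in the max(0, ...) cell formula, and for the first τ generations the lagged value is assumed to be N(0).

```python
def delayed_logistic(r, K, tau, n0=10.0, steps=120):
    """Iterate N(t+1) = max(0, N(t) + r*N(t)*(K - N(t - tau))/K)."""
    series = [n0]
    for t in range(steps):
        # Assumption: before generation tau, the lagged density is N(0).
        lagged = series[max(0, t - tau)]
        delta = r * series[t] * (K - lagged) / K
        series.append(max(0.0, series[t] + delta))
    return series


# Low growth rate without time lag: the population settles at K.
smooth = delayed_logistic(r=0.2, K=500.0, tau=0)
print(round(smooth[-1], 3))

# Moderate growth rate with a time lag: print the first generations.
lagged_run = delayed_logistic(r=0.7, K=500.0, tau=2)
print([round(n, 1) for n in lagged_run[:10]])
```

Plotting each series against its index gives a time series of the kind shown in Fig. 3.1; plotting `series[1:]` against `series[:-1]` gives a plot of N(t+1) against N(t).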

This simple model gives a number of predictions.

1. At low rates of r the population increases logistically until a maximum value is reached (Fig. 3.1 A). This maximum can be inferred by setting the differential equation of the logistic growth to zero.

$$\frac{dN(t)}{dt} = 0 \;\rightarrow\; r - \frac{r}{K}N(t) = 0 \;\rightarrow\; N(t) = K$$

It equals the so-called carrying capacity K. The biological interpretation is that irrespective of parameter values the logistic growth model has one point of equilibrium (stationary point), the carrying capacity K.

2. Higher growth rates r result in regular cycles, which at even higher rates appear more and more irregular. This means that the resulting cycles have longer and longer periods (B, F, and G).

3. At some growth rates the system initially appears to be relatively stable, but then larger amplitudes (higher population fluctuations) appear (B).

4. Even small time lags force the system towards population fluctuations. The larger the time lag is, the smaller r has to be to leave the system inside a stable range (D and E).

5. At higher growth rates the system looks more and more chaotic. We speak of pseudo-chaos or deterministic chaos, because it is not really chaotic since its generating function is strictly deterministic. Again we notice that a very simple deterministic function is able to generate unforeseeable (pseudo-)chaotic patterns. They are called pseudo-chaotic because they are still deterministic but the resulting pattern is so complicated that we have difficulties to find any regularities a posteriori, that means without knowledge of the generating function. May showed that at certain parameter combinations it is impossible to infer the generating function from its output. Hence the analysis of biological time series might suggest very complicated structuring forces whereas the series is in reality generated by a very simple deterministic process.

6. Too high rates of reproduction lead to the extinction of the population due to too high population fluctuations.

May was not the first to observe that seemingly simple models can generate very unforeseeable output. One important recursive model is the so-called Ricker function, after W. Ricker, who introduced it in the 1950s to model fish population dynamics, to estimate fish densities and to establish fishery quotas. This model is defined by

$$N_{k+1} = rN_k\,e^{(-\alpha N_k + \beta)} \qquad (3.5)$$

[Fig. 3.2: $N_k$ against k]

[Fig. 3.3: N(t+1) against N(t). A: r = 0.7, τ = 1, K = 500; B: r = 0.7, τ = 2, K = 500]


Fig. 3.2 shows an example for this model with r = 5, α = 0.5 and β = 2. We notice a very irregular pattern. Indeed, the above function generates either very regular patterns or, for certain parameter combinations, unforeseeable irregular ones. In this it resembles the logistic growth model. It also produces pseudo-chaotic behaviour. Models like the Ricker model, or the closely related Nicholson-Bailey model, are important because due to their flexibility they allow irregular fluctuations in animal populations to be modelled. We will learn later how to do this.
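A recursion of this kind is easy to iterate in Python. The following is a minimal sketch, assuming that the Ricker function of eq. 3.5 has the form $N_{k+1} = rN_k e^{-\alpha N_k + \beta}$ (with r = 5, α = 0.5 and β = 2 this form keeps all values below about 27, in line with the axis range of Fig. 3.2). The function name and the starting value of 1 are hypothetical choices.

```python
import math


def ricker_series(r, alpha, beta, n0=1.0, steps=60):
    """Iterate N(k+1) = r * N(k) * exp(-alpha * N(k) + beta)."""
    series = [n0]
    for _ in range(steps):
        n = series[-1]
        series.append(r * n * math.exp(-alpha * n + beta))
    return series


series = ricker_series(r=5, alpha=0.5, beta=2)
print([round(n, 2) for n in series[:12]])

# The map has its maximum at N = 1/alpha, so no value can exceed this bound.
upper_bound = 5 * math.exp(2) / (0.5 * math.e)
print(round(upper_bound, 2))
```

Trying smaller values of β (for instance 0) with the same r and α gives a regular pattern instead of an irregular one, which illustrates how sensitive the behaviour is to the parameter combination.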

An important tool in the study of discrete recursive models are plots of N(t+1) against N(t) as shown in Fig. 3.3. The figure shows the effects of a time lag. In A a stable point is reached after a few generations. In B no stable point is reached. The system fluctuates around a point that equals K but it will never reach it.

When using differential equations we have the problem of solving them. In the last lecture we already learned how to solve the logistic equation. But in most cases it is very difficult to do this. However, again math programs like Maple or Mathematica can do these technical things for us. We have to interpret their results. Above the Mathematica solution of the logistic differential equation without time lag is given. The result looks different from our solution on page 80. Collecting the integration constant in a single constant C, the solution can be written as

$$y = \frac{K e^{rx}}{e^{rx} - C} = \frac{K}{1 - C e^{-rx}}$$

This is now the same form as before. For x = 0, with $y(0) = N_0$, C becomes

$$C = -\frac{K - N_0}{N_0}$$

Of course our model is still highly simplified. It treats populations as living in a homogeneous world without external influences. Such models might apply for viral or bacterial growth. For animals or plants other variables that influence population growth have to be considered. Additionally, many populations have low recruitment rates at low densities. Our logistic growth model instead predicts high growth rates at low densities. A simple possibility to make the model more realistic is to introduce a lower density limit M below which the population goes extinct. Our model now becomes

$$\frac{dN}{dt} = rN\,\frac{K-N}{K}\,\frac{N-M}{N} \qquad (3.6)$$

This modified model has no simple closed solution. A numerical solution as before shows that for N >> M this modified model behaves similarly to the simple logistic growth model. For low N population growth becomes very slow (Fig. 3.4).

[Fig. 3.4: N(t) against time; r = 1; K = 500; M = 100]
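The numerical solution of a model with a lower density limit can be sketched with a simple Euler scheme in Python. This is only an illustration: it assumes the rate of change $dN/dt = rN \frac{K-N}{K} \frac{N-M}{N}$, in which the factor N cancels, uses the parameter values of Fig. 3.4 as defaults, and clamps densities at zero. The function names, step size and starting values are hypothetical choices.

```python
def growth_rate(n, r=1.0, K=500.0, M=100.0):
    """Rate of change r*N*(K - N)/K * (N - M)/N; the factor N cancels."""
    return r * (K - n) * (n - M) / K


def integrate(n0, dt=0.01, steps=12000, **params):
    """Euler integration of dN/dt; densities are clamped at zero (assumption)."""
    n = n0
    for _ in range(steps):
        n = max(0.0, n + growth_rate(n, **params) * dt)
    return n


print(round(integrate(150.0), 3))  # start above M: the population approaches K
print(integrate(50.0))             # start below M: the population goes extinct
```

The sign of `growth_rate` already tells the whole story: it is negative below M, positive between M and K, and negative again above K.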


Random offshoots

Environmental factors can be very different. But we can combine them and modify our initial logistic growth model by a simple additive term.

$$\frac{dN}{dt} = rN\,\frac{K-N}{K} + ran(a,b) \qquad (3.7)$$

What is ran(a,b)? It is a random number that affects the change in population size. We speak in this case of a random offshoot. Simply speaking, random numbers are defined by a certain instruction that generates them in such a way that they are not foreseeable inside the range the instruction defines. Most often used are linear (uniformly distributed) random numbers. Given a range (a,b), every real number inside this range has the same probability to appear. Computer programs generate linear random numbers in the range (0,1). For numbers in other ranges we transform them. For instance, to get random numbers in the range between 0 and 10 we simply have to multiply ran(0,1) by 10. To get random numbers between 3 and 7 we have to multiply ran(0,1) by 4 and then add 3. In general

$$ran(a,b) = (b-a)\,ran(0,1) + a \qquad (3.8)$$

[Fig. 3.5: N(t) against time. A: r = 1, τ = 1, K = 500, RAN(-5,5); B: r = 1, τ = 1, K = 500, RAN(-50,50)]

[Fig. 3.6: N(t) against time for a random walk]

[Fig. 3.7: N(t) against time. A: r = 2.5, K = 500, m = 0; B: r = 2.5, K = 500, m = 100; C: r = 2.5, K = 500, m = 201; D: r = 4.5, K = 500, m = 350]

For model 3.7 to be realistic our random offshoot has to be sometimes above and sometimes below zero. Hence the range might be (-a,a). If ran in eq. 3.7 is small in relation to dN the model still remains largely deterministic (Fig. 3.5 A). At a wide range (-a,a) dN becomes more and more unforeseeable and our model transforms into a stochastic one (Fig. 3.5 B).

What if the random offshoot solely determines the next generations? Then our model looks as follows.

$$N_{t+1} = N_t + ran(-a,a) \qquad (3.9)$$

This is the simplest version of a so-called random walk. Fig. 3.6 gives an example. A random walk is a process where the next step starts with the previous value but the direction and the amount of change are unforeseeable. Random walk models have wide application in biology and in the next part we will deal with them in more detail.
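The transformation of random numbers and the random walk can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, ran(a,b) is built as (b - a)·ran(0,1) + a from the standard library generator, and a fixed seed is used only to make the example reproducible.

```python
import random


def ran(a, b, rng=random):
    """Uniform random number between a and b, built from ran(0,1)."""
    return (b - a) * rng.random() + a


def random_walk(n0, a, steps, rng=random):
    """Iterate N(t+1) = N(t) + ran(-a, a)."""
    series = [n0]
    for _ in range(steps):
        series.append(series[-1] + ran(-a, a, rng))
    return series


rng = random.Random(42)  # fixed seed so that repeated runs give the same walk
print(round(ran(3, 7, rng), 3))
walk = random_walk(n0=10.0, a=1.0, steps=120, rng=rng)
print([round(n, 2) for n in walk[:8]])
```

Adding `ran(-a, a)` to the logistic difference in each generation instead of using it alone gives a stochastic logistic model; a small a leaves the trajectory largely deterministic, a large a makes it unforeseeable.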

Constant rates of foraging

Now we consider another modification of the logistic growth model. Assume our model deals with a fish population that is reduced by fishery. The reduction is independent of the actual population density. Hence we use the discrete version of the model (eq. 3.4 with τ = 0) and modify it by adding a constant term m, denoting the mortality by fishing.

$$N(t+1) = N(t) + \Delta N - m = N(t) + rN(t)\,\frac{K - N(t)}{K} - m \qquad (3.10)$$

Fig. 3.7 shows plots of this model. We get four main predictions:

1. Small to moderate fishery might stabilize population densities (A and B). This surprising result stems from the reduction in density and hence from the more moderate increase in fish density.

2. Above a certain rate of fishery, the fish population inevitably dies out (overfishing) (C). This result seems rather trivial. However, it would be surprising if our model did not predict this outcome.

3. The extinction threshold depends on r, the rate of reproduction. The higher r is, the more fish can be gathered (B, D).

4. At high r and m fish populations should become more and more unpredictable. However, higher m values further tend to stabilize populations.

Now we look a little bit closer at eq. 3.10 and transform it into a difference equation. We modify eq. 3.10 and introduce the time difference Δt

$$N(t+\Delta t) = N(t) + \Delta N\,\Delta t - m\,\Delta t = N(t) + rN(t)\,\frac{K - N(t)}{K}\,\Delta t - m\,\Delta t$$

We transform and get a differential equation of the form

$$\frac{N(t+\Delta t) - N(t)}{\Delta t} = rN(t)\,\frac{K - N(t)}{K} - m \;\rightarrow\; \frac{dN}{dt} = rN(t)\,\frac{K - N(t)}{K} - m$$

Has this equation stationary points, where the population remains unchanged? Our derivative function is obviously quadratic. To study the behaviour of this function we compute the roots. Mathematica gives the following solution.

$$N_{1,2} = \frac{K}{2} \mp \frac{1}{2}\sqrt{\frac{K\,(-4m + rK)}{r}}$$

However, our problem has no simple solution. Look at Fig. 3.8. When applying differential equations we are not dealing with a single solution but with a class of solutions. To interpret our results we have to differentiate between these classes. We analyse a so-called phase diagram as given in Fig. 3.8. In our case we can differentiate three cases. Remember that the functions denote change in population size. Hence, negative values denote a decrease, positive values an increase.

First, dN/dt has no root (the blue line). Because all values are negative the change in density is always negative and the population inevitably dies out. This occurs when (-4m + rK) < 0. In this case the square root in our solution has no value in R. The value m = rK/4 is termed the critical harvesting rate. If m exceeds this value the population dies out.

Second, there is one root (the red line). In this case (-4m + rK) = 0 and the equilibrium point is N(t) = K/2. However, any small disturbance of this highly fragile system would also lead to the extinction of the fish population. This is an often observed phenomenon, in models like this and in nature. Balanced systems (systems in equilibrium) can be highly unstable.

Third, there are two roots. In this case populations with densities below N1 go extinct. Populations with densities above N2 decrease, and populations with densities between N1 and N2 increase, until a stable (and maximum) density at N2 is reached.
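The three cases can be reproduced with a short Python function. This is a sketch under the assumption that the rate of change under constant harvesting is $dN/dt = rN(K-N)/K - m$, so that the stationary points are the roots of a quadratic with discriminant proportional to (rK - 4m). The function names are hypothetical.

```python
import math


def critical_harvesting_rate(r, K):
    """Harvest rate above which dN/dt is negative for every density."""
    return r * K / 4


def harvest_equilibria(r, K, m):
    """Roots of r*N*(K - N)/K - m = 0 as a sorted tuple (empty if none exist)."""
    disc = K * (r * K - 4 * m) / r
    if disc < 0:
        return ()            # no root: the population always declines
    if disc == 0:
        return (K / 2,)      # one fragile equilibrium at K/2
    root = math.sqrt(disc)
    return ((K - root) / 2, (K + root) / 2)  # N1 (unstable), N2 (stable)


print(critical_harvesting_rate(1.0, 500.0))
print(harvest_equilibria(1.0, 500.0, 80.0))
print(harvest_equilibria(1.0, 500.0, 125.0))
print(harvest_equilibria(1.0, 500.0, 200.0))
```

With r = 1 and K = 500, a harvest of m = 80 leaves two equilibria, m = 125 is exactly the critical rate, and m = 200 leaves none.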

At last we consider a case where the rate of fishery is not constant but itself a function of fish density. This is a more realistic case because harvest is of course a function of encounters with the fish. For simplicity we assume that harvest is proportional to fish density. Our new model looks therefore as follows

$$N(t+1) = N(t) + \Delta N - mN(t) = N(t) + rN(t)\,\frac{K - N(t)}{K} - mN(t) = N(t)\left(1 - m + r\,\frac{K - N(t)}{K}\right) \qquad (3.11)$$

[Fig. 3.8: Phase diagram, dN/dt against N(t), with the roots N1 and N2 marked]

[Fig. 3.9: N(t) against time; r = 3; τ = 1; K = 500; m = 1]

We consider only one example (Fig. 3.9). Even at high rates of fishery (m = 1, hence the total initial population is fished) the system remains stable if only the rate of increase is high enough. Low rates of increase instead will drive the population towards extinction. Too high values of r again result in a pseudo-chaotic behaviour of the system. We solve eq. 3.11 again for N(t) to obtain the roots. Now our solution looks very different. There is one trivial root at N(t) = 0. The other root lies at N(t) = K(r + 1 - m)/r, which becomes zero when (m - r - 1) = 0. The case when (m - r - 1) is larger than zero does not, of course, make sense, because N(t+1) would then be negative for every positive N(t). Hence for the population to survive, m < r + 1 must hold. Our simple


model therefore predicts that for the population to persist the rate of fishery must be less than the rate of reproduction + 1.
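The model with density-proportional harvesting can be explored with a short Python sketch. It is an illustration under stated assumptions, not the author's spreadsheet: the function names and parameter values are hypothetical, and densities are clamped at zero. Setting N(t+1) = N(t) in $N(t+1) = N(t)\left(1 - m + r\frac{K - N(t)}{K}\right)$ gives, besides N = 0, the fixed point $N^* = K(1 - m/r)$, which the sketch compares with the simulated density.

```python
def harvest_step(n, r, K, m):
    """One generation with harvest proportional to density (clamped at zero)."""
    return max(0.0, n * (1 - m + r * (K - n) / K))


def fixed_point(r, K, m):
    """Non-trivial solution of N(t+1) = N(t): N* = K * (1 - m / r)."""
    return K * (1 - m / r)


def simulate(n0, r, K, m, steps=120):
    series = [n0]
    for _ in range(steps):
        series.append(harvest_step(series[-1], r, K, m))
    return series


# Moderate harvesting: the density settles at the fixed point.
print(round(simulate(10.0, 1.5, 500.0, 0.5)[-1], 3), round(fixed_point(1.5, 500.0, 0.5), 3))

# Harvest rate above r + 1: the population is lost within one generation.
print(simulate(10.0, 1.5, 500.0, 2.6)[:3])
```

For harvest rates between r and r + 1 the fixed point is negative, so in this sketch the population still declines even though single generations remain positive.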

Of course, all of these models are very simplistic. But they still form the base for more sophisticated models that incorporate many more density-influencing variables. Additionally, logistic growth models like the ones above had an enormous influence on the development of ecological, parasitological and economic theories. You find very nice descriptions of various growth models at http://www.math.duke.edu/education/postcalc/growth/contents.html.


4. Models of competition and predation

Competition

Now we extend our logistic growth model and deal with two populations that compete for a common resource. Suppose two parasite species A and B infect a common host. The infection by one species prohibits the infection by the other. How to model this system? We assume that both species follow a logistic growth equation as studied before. As a simple modification we add to the logistic term (1 - N/K) another term that mimics the effect of competition, hence that further reduces the increase in population size. Our simple model looks as follows

$$\begin{aligned} \frac{dN_A(t)}{dt} &= r_A N_A(t)\left(1 - \frac{N_A(t) + \alpha_B N_B(t)}{K_A}\right) \\ \frac{dN_B(t)}{dt} &= r_B N_B(t)\left(1 - \frac{N_B(t) + \alpha_A N_A(t)}{K_B}\right) \end{aligned} \qquad (4.1)$$

In this model $K_A$ and $K_B$ denote the carrying capacities for A and B. The terms $\alpha_A N_A$ and $\alpha_B N_B$ are the competition reduction terms and the α-values are the competition coefficients that denote the strength of the effects of B on A and vice versa. This type of model was independently introduced by Lotka (1925) and Volterra (1926) (Alfred James Lotka, 1880-1949, American demographer and mathematician; Vito Volterra, 1860-1940, Italian mathematician) and later (in a slightly different version) intensively studied by the Russian biologist George F. Gause. It heavily influenced biological modelling and led to the competition paradigm in ecology.

We have two coupled differential equations and our task is to study the behaviour of this system and to interpret the results. We again study the sign of dN(t)/dt and first compute the equilibrium points of the system (dN/dt = 0). Hence

$$\begin{aligned} N_A(t) + \alpha_B N_B(t) &= K_A \\ N_B(t) + \alpha_A N_A(t) &= K_B \end{aligned}$$

We solve these linear algebraic equations for the equilibrium densities $N_A$ and $N_B$ and get

$$\begin{aligned} N_A &= \frac{\alpha_B K_B - K_A}{\alpha_A \alpha_B - 1} \\ N_B &= \frac{\alpha_A K_A - K_B}{\alpha_A \alpha_B - 1} \end{aligned} \qquad (4.2)$$

We divide one equation by the other and get

$$\frac{N_A}{N_B} = \frac{\alpha_B K_B - K_A}{\alpha_A K_A - K_B}$$

Coexistence is possible if $N_A$ and $N_B$ are both positive. We have two possibilities to obtain positive values for $N_A/N_B$:


1. αBKB > KA and αAKA > KB. We multiply both inequalities and get αAαB > 1. Both competition coefficients are by definition smaller than 1. Our condition is therefore impossible. The interpretation is that both species should go extinct. However, there is always one stable point, when one species dies out and the other survives. Hence, in this case the outcome depends on which species is the first to go extinct. The other survives.

2. αBKB < KA and αAKA < KB. In this case both values of NA and NB become positive. A stable (although fragile) equilibrium exists and both species can survive.

In two other cases one species dies out:

3. αBKB > KA and αAKA < KB. Species B survives, A goes extinct.

4. αBKB < KA and αAKA > KB. Species A survives, B goes extinct.
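The four cases can be checked mechanically for any parameter set. The following is a minimal Python sketch, not part of the original Excel material; the function names `coexistence_equilibrium` and `classify_competition` are hypothetical, and the handling of exact equalities as a separate "boundary case" is an assumption made only for this illustration.

```python
def coexistence_equilibrium(k_a, k_b, alpha_a, alpha_b):
    """Equilibrium densities (N_A, N_B) according to eq. 4.2.

    Returns None when alpha_a * alpha_b == 1, where eq. 4.2 is undefined.
    """
    denom = alpha_a * alpha_b - 1.0
    if denom == 0.0:
        return None
    n_a = (alpha_b * k_b - k_a) / denom
    n_b = (alpha_a * k_a - k_b) / denom
    return n_a, n_b


def classify_competition(k_a, k_b, alpha_a, alpha_b):
    """Assign the parameters to one of the four cases of the text."""
    if alpha_b * k_b == k_a or alpha_a * k_a == k_b:
        return "boundary case (one inequality holds with equality)"
    b_presses_a = alpha_b * k_b > k_a  # alpha_B * K_B > K_A
    a_presses_b = alpha_a * k_a > k_b  # alpha_A * K_A > K_B
    if b_presses_a and a_presses_b:
        return "case 1: no stable coexistence, one species excludes the other"
    if not b_presses_a and not a_presses_b:
        return "case 2: coexistence"
    if b_presses_a:
        return "case 3: B survives, A goes extinct"
    return "case 4: A survives, B goes extinct"


# Illustrative (made-up) parameters: K_A = 500, K_B = 400, both alphas 0.5
print(coexistence_equilibrium(500, 400, 0.5, 0.5))
print(classify_competition(500, 400, 0.5, 0.5))
```

For these made-up values both inequalities of the coexistence case hold, and the equilibrium densities satisfy both equilibrium conditions.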

How to simulate this model? The table below shows an Excel solution. I again used the discrete logistic growth model and introduced the competition term.

$$
\begin{aligned}
N_A(t+1) &= N_A(t) + r_A N_A(t)\left(1 - \frac{N_A(t) + \alpha_B N_B(t)}{K_A}\right) \\
N_B(t+1) &= N_B(t) + r_B N_B(t)\left(1 - \frac{N_B(t) + \alpha_A N_A(t)}{K_B}\right)
\end{aligned}
$$

Excel solution (deterministic model)
Parameters (row 3), Species A: r = 2.5 (B3); K = 500 (C3); Tau = 0 (D3); αA = 0.5 (E3)
Parameters (row 3), Species B: r = 2.6 (F3); K = 500 (G3); Tau = 0 (H3); αB = 1 (I3)
Column headers (row 4): A: t; B: N(t); C: Delta N; D: N(t-tau); E: N(t); F: Delta N; G: N(t-tau)
Row 5: A5: 0
Row 6:
A6: +A5+1
B6: max(0,B5+C5)
C6: +$B$3*B5*(1-(D6+$I$3*E5)/$C$3)
D6: +B5
E6: max(0,E5+F5)
F6: […]*(1-(G6+$E$3*B5)/$G$3
G6: +E5

Fig. 4.1: N(t) against time. Species A: r = 2.6; K = 500; α = 0.5; τ = 0. Species B: r = 2.5; K = 500; α = 1.0; τ = 0.

Fig. 4.1 shows one solution of case four where no stable equilibrium exists. Which species survives depends on growth rates and, as in this case, on the difference in carrying capacity. Even small differences may lead to opposite outcomes. However, nearly always one species dies out. Hence the Lotka-Volterra competition model was one of the main arguments for the long-held view that species that share the same set of resources cannot coexist. This is the principle of competitive exclusion, first formulated by G. F. Gause in 1934.

But look at the following modification of our model. The original model is strictly deterministic although the outcome is not always foreseeable. However, more realistic are stochastic models where other factors that influence growth rates are incorporated via random numbers. Here I simply added a linear random number term (aN(t)ran(-1,1)) to the model.

Excel solution (stochastic model)
Parameters (row 3): as in the deterministic model
Column headers (row 4): t; N(t); Delta N; N(t-tau); Stochasticity; N(t); Delta N; N(t-tau); Stochasticity
Row 5: t: 0; Stochasticity: 0.1 and 0.01
Row 6, in column order:
+A5+1
max(0,B5+C5)
[…]((D6+$I$3*E5)/$C$3)+$E$6*B5*(los()-0.5*B5)
+B5
max(0,E5+F5)
+$F$3*F5*(1-(G6+$E$3*B5)/$G$3)+$I$6*B5*(los()-0.5*B5)
+E5
los() is the Excel random number function LOS() (RAND() in English-language versions), which returns a uniform random number between 0 and 1.

The same parameter values as above now allow for a stable coexistence of both species (Fig. 4.2). Hence, random effects might allow coexistence. Recent investigations with more elaborate models have demonstrated that chaotic model behaviour is an important element in maintaining and even enhancing species richness of natural habitats.
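The discrete competition model, with and without the random term, can also be iterated outside Excel. The following Python sketch assumes the update rule N(t+1) = max(0, N(t) + ΔN(t)) with the linear random term a N(t) ran(-1,1) described above; the function name `simulate_competition`, the `noise` parameter (the factor a) and the parameter values in the usage lines are hypothetical and are not taken from the Excel sheet.

```python
import random


def simulate_competition(r_a, k_a, alpha_a, r_b, k_b, alpha_b,
                         n_a0, n_b0, steps, noise=0.0, seed=None):
    """Iterate the discrete Lotka-Volterra competition model.

    alpha_b scales the effect of species B on A, alpha_a the effect of A on B.
    noise is the factor a of the random term a * N(t) * ran(-1, 1).
    """
    rng = random.Random(seed)
    n_a, n_b = [float(n_a0)], [float(n_b0)]
    for _ in range(steps):
        a, b = n_a[-1], n_b[-1]
        d_a = r_a * a * (1 - (a + alpha_b * b) / k_a) + noise * a * rng.uniform(-1, 1)
        d_b = r_b * b * (1 - (b + alpha_a * a) / k_b) + noise * b * rng.uniform(-1, 1)
        # Densities cannot become negative (the max(0, ...) of the Excel sheet)
        n_a.append(max(0.0, a + d_a))
        n_b.append(max(0.0, b + d_b))
    return n_a, n_b


# Deterministic run with made-up parameters from the coexistence case
det_a, det_b = simulate_competition(0.5, 500, 0.5, 0.5, 400, 0.5, 10, 10, 500)
print(round(det_a[-1], 3), round(det_b[-1], 3))

# Same parameters with a random term; the seed only makes the run repeatable
sto_a, sto_b = simulate_competition(0.5, 500, 0.5, 0.5, 400, 0.5, 10, 10, 500,
                                    noise=0.1, seed=1)
```

With low growth rates the deterministic run settles at the equilibrium of eq. 4.2. Higher growth rates, such as those in the Excel sheet, put the discrete logistic model into its oscillating or chaotic range.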

Predation

Assume we have one protein that is produced proportionally to its concentration, that is, it triggers its own production. Viral infections are typical examples of this. Assume further we have an enzyme that counteracts and degrades this protein. Its production is proportional to its own concentration and to the concentration of the protein. Viral and bacterial infections and the reaction of the immune system are examples of such a process. In ecology predator-prey or parasitoid-host systems are examples. Indeed the following class of models was first developed for predator-prey systems. Again Lotka and independently Volterra proposed it as the following pair of quadratic differential equations

$$
\frac{dH}{dt} = r_H H - \alpha_H HP, \qquad \frac{dP}{dt} = -r_P P + \alpha_P HP
\tag{4.3}
$$

The logic behind this model is very simple. The rate of prey increase dH / dt is proportional to prey abundance (as in the exponential growth model) and is reduced in proportion to the number of prey taken by the predators. The latter is given as HP, the rate of prey-predator encounters. The change in predator density dP / dt follows the same simple logic. Note the negative sign of rP. It expresses the fact that the predator population would die out without prey present.

We study this system in the same way as before. Equilibrium points are at

$$
P = \frac{r_H}{\alpha_H}, \qquad H = \frac{r_P}{\alpha_P}
$$

Contrary to the previous example the model predicts that both populations should survive. Fig. 4.3 shows a plot of prey and predator densities versus time. We observe an irregular cycling of both populations. Although in theory the system should be stable, small disturbances might force prey or predators to become extinct.

Fig. 4.3: Prey and predator densities N(t) against time.

Fig. 4.2: N(t) against time for the stochastic competition model. Species A: r = 2.6; K = 500; α = 0.5; τ = 0. Species B: r = 2.5; K = 500; α = 1.0; τ = 0.
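The equilibrium of eq. 4.3 and the cycling around it can be explored numerically. The sketch below is an illustration only: it assumes eq. 4.3 in the form dH/dt = rH H - αH HP and dP/dt = -rP P + αP HP, uses a simple Euler scheme (a choice made here, not the author's method), and all function names and the parameter values in the usage lines are hypothetical.

```python
def lv_rates(h, p, r_h, alpha_h, r_p, alpha_p):
    """Right-hand sides of the Lotka-Volterra predator-prey model (eq. 4.3)."""
    d_h = r_h * h - alpha_h * h * p
    d_p = -r_p * p + alpha_p * h * p
    return d_h, d_p


def lv_equilibrium(r_h, alpha_h, r_p, alpha_p):
    """Non-trivial equilibrium (H*, P*) = (r_P / alpha_P, r_H / alpha_H)."""
    return r_p / alpha_p, r_h / alpha_h


def lv_euler(h0, p0, r_h, alpha_h, r_p, alpha_p, dt, steps):
    """Euler integration with a small time step; returns a list of (H, P)."""
    h, p = float(h0), float(p0)
    track = [(h, p)]
    for _ in range(steps):
        d_h, d_p = lv_rates(h, p, r_h, alpha_h, r_p, alpha_p)
        h, p = h + d_h * dt, p + d_p * dt
        track.append((h, p))
    return track


params = (0.9, 0.04, 0.05, 0.0018)  # r_H, alpha_H, r_P, alpha_P (illustrative)
h_eq, p_eq = lv_equilibrium(*params)
print(h_eq, p_eq, lv_rates(h_eq, p_eq, *params))
track = lv_euler(100, 20, *params, dt=0.01, steps=1000)
```

At the equilibrium both rates vanish (up to rounding). Started away from it, the trajectory circles around the equilibrium point instead of approaching it.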

This original version of the model is not very realistic. We modify our model and assume for both populations a logistic growth with time delay. Hence we introduce the logistic growth terms and get

$$
\begin{aligned}
\frac{dH(t)}{dt} &= r_H H(t)\left(1 - \frac{H(t-\tau)}{K_H}\right) - \alpha_H H(t)P(t) \\
\frac{dP(t)}{dt} &= -r_P P(t)\left(1 - \frac{P(t-\tau)}{K_P}\right) + \alpha_P H(t)P(t)
\end{aligned}
\tag{4.4}
$$

To evaluate the stability point of this system we consider a model without time delay and solve for dH / dt = 0 and dP / dt = 0. Instead of one single solution we have four. The first two and the fourth contain trivial solutions and are not realistic except the case when the prey survives and the (single) predator dies out. The third solution instead is quite complicated. But we observe that positive (and therefore realistic) values for H only appear if rP < αP KH. Hence KH > rP / αP. Similarly, for the predator population to be stable we need KP > rH / αH. The discrete model version in Excel looks as below.

Excel solution (deterministic predator-prey model)
Parameters (row 3), Prey: r = 0.9 (B3); K = 500 (C3); Tau = 0 (D3); αH = 0.04 (E3)
Parameters (row 3), Predator: r = 0.05 (F3); K = 200 (G3); Tau = 2 (H3); αP = 0.0018 (I3)
Column headers (row 4): t; N(t); Delta N; N(t-tau)
Row 5: 0; 100; 89.2832788; 20; 0.883662456
Row 6:
A6: +a5+1
B6: max(0,B5+C5)
C6: =($B$3*B5)*(1-D6/$C$3)-$E$3*D6*E5
D6: +B5
E6: max(0,E5+F5)
F6: =(-$F$3*E5)*(1-G6/$G$3)+$I$3*B5*E5
G6: +E5

Fig. 4.4: Prey and predator densities N(t) against time (deterministic model).

Fig. 4.4 shows again a cycling behaviour and low predator densities. Our stability criterion is met and we expect both populations to persist. An often observed feature in such models is that predator populations are more stable than prey populations. Mean densities of predators are lower than mean prey densities. These are quite realistic model features.

At last we introduce a stochastic element into our predator-prey model. As in the logistic growth model before we add a simple random number (ran(-2,2)) to the change terms of our model. Using the same parameter values as above we get a picture as in Fig. 4.5. Prey densities cycle as before although less regularly and often damped. But predator densities are now decoupled from the prey cycling and more stable than before. Again we see that introducing a stochastic element into deterministic models might change model behaviour significantly. Additionally, random effects might stabilize patterns as in the case of the predators in Fig. 4.5.

Fig. 4.5: Prey and predator densities N(t) against time (stochastic model).
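The discrete predator-prey model with logistic terms can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a transcription of the Excel sheet: the time delay is ignored (τ = 0 for both populations), the random term is a uniform number in (-noise, noise) added to each change term, and the function names `simulate_predator_prey` and `persistence_expected` are hypothetical. The parameter values in the usage lines are the ones listed in the Excel table.

```python
import random


def persistence_expected(r_h, k_h, alpha_h, r_p, k_p, alpha_p):
    """Criterion from the text: K_H > r_P / alpha_P and K_P > r_H / alpha_H."""
    return k_h > r_p / alpha_p and k_p > r_h / alpha_h


def simulate_predator_prey(r_h, k_h, alpha_h, r_p, k_p, alpha_p,
                           h0, p0, steps, noise=0.0, seed=None):
    """Discrete version of eq. 4.4 without time delay."""
    rng = random.Random(seed)
    prey, pred = [float(h0)], [float(p0)]
    for _ in range(steps):
        h, p = prey[-1], pred[-1]
        d_h = r_h * h * (1 - h / k_h) - alpha_h * h * p + rng.uniform(-noise, noise)
        d_p = -r_p * p * (1 - p / k_p) + alpha_p * h * p + rng.uniform(-noise, noise)
        # Densities are cut off at zero, as with max(0, ...) in the Excel sheet
        prey.append(max(0.0, h + d_h))
        pred.append(max(0.0, p + d_p))
    return prey, pred


values = dict(r_h=0.9, k_h=500, alpha_h=0.04, r_p=0.05, k_p=200, alpha_p=0.0018)
print(persistence_expected(**values))
prey, pred = simulate_predator_prey(h0=100, p0=20, steps=120, **values)
noisy_prey, noisy_pred = simulate_predator_prey(h0=100, p0=20, steps=120,
                                                noise=2.0, seed=1, **values)
```

Because the delay is left out and the update scheme is simplified, the trajectories need not match Figs. 4.4 and 4.5 in detail. The sketch is meant only to show how the change terms are assembled.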

Excel solution (stochastic predator-prey model, Fig. 4.5)
Parameters (row 3), Prey: r = 0.9 (B3); K = 500 (C3); Tau = 0 (D3); αH = 0.04 (E3)
Parameters (row 3), Predator: r = 0.05 (F3); K = 200 (G3); Tau = 2 (H3); αP = 0.0018 (I3)
Column headers (row 4): t; N(t); Delta N; N(t-tau)
Row 5: 0; 100; 90.01137617; 20; 1.527903287
Row 6:
A6: +a5+1
B6: max(0,B5+C5)
C6: =($B$3*B5)*(1-D6/$C$3)-$E$3*D6*E5+4*LOS()-2
D6: +B5
E6: max(0,E5+F5)
F6: =(-$F$3*E5)*(1-G6/$G$3)+$I$3*B5*E5+4*LOS()-2
G6: +E5

Parasite-host models

Now we consider a special case of predation, parasitism. The aim is to develop a specific model that describes the dynamics of parasites and their hosts. Parasitism is a special case of predation because the parasite does not kill its host immediately but allows it further reproduction. Hence our simple Lotka-Volterra predator-prey approach does not work. The first step in developing a model is to study the biology of hosts and parasites. From this we make a conceptual model in the form of a flow diagram that shows us life history stages and potential parameters of our future model. The next step is then to quantify our model. In 1985 J. P. Hudson (In Rollinson & Anderson (1985), Ecology and Genetics of Host-Parasite Interactions, Acad. Press London) gave a model of infections of red grouse (Lagopus scoticus) by the nematode Trichostrongylus tenuis. This model can readily be extended to deal with a general class of infections. Such an extended conceptual model serves as the basis for further quantifications. It looks as follows (Fig. 4.6). We have host and parasite populations. Hosts have natural mortality and parasite induced mortality. Additionally many parasites reduce host fecundity. Parasites have certain birth and mortality rates. Additionally we need a parameter that describes infection rates.

Fig. 4.6: Flow diagram of the host-parasite model. Host population H: natural host fecundity, natural host mortality, parasite induced host mortality, parasite induced reduction in fecundity. Parasite population P: infection rate, parasite birth rate, parasite mortality. Symbols used in the diagram: b, a, α, μ, λ, β.

In 1978 Robert May and Roy Anderson developed a set of models to describe such host parasite interactions. They became the standard models for research in parasitology.

Model development looks as follows. Host density is given by

$$
\frac{dH}{dt} = (b - a)H - (\alpha + \rho)P
\tag{4.5}
$$

The logic behind these assumptions is simple. A change in host density is the difference between natural fecundity and mortality multiplied with actual host density minus parasite induced mortality and reduction of fecundity multiplied with actual parasite density. Hence we again assume that increase and decrease are linearly proportional to actual host and parasite density.

Parasite densities take a more complicated form

$$
\frac{dP}{dt} = \lambda\beta P \frac{H}{H_0 - H} - (\mu + b)P - \alpha P \frac{P}{1 + 1/k}
\tag{4.6}
$$

Again the change in density is the sum of different influencing factors. Increase in density is given by the number of encounters of parasites and hosts (the product of P and H, the so-called mass effect) multiplied by the parasite birth rate and the infection rate. The first term defines the proportionality of increase with parasite density and with host density. However, not the actual density but the density of available hosts is taken. This number is modelled by the term H / (H0 - H), with H0 being the initial host density. The next term contains parasite decrease. This again is assumed to be proportional to actual density with the parameters natural mortality and increase in host density (a parameter that in fact leads to a reduction in the parasites). The last term contains parasite mortality due to host mortality. The last multiplicative term P / (1+1/k) is intended to mimic parasite aggregation, the fact that multiple parasites occur in one host. The term is taken from a special statistical distribution, the negative binomial, with which we will deal in the statistics part.

First, we try to evaluate stability conditions. We apply Mathematica and solve both equations for H and P (for dH / dt = 0 and dP / dt = 0). The program returns an indigestible output.

We try another way. Eqs. 4.5 and 4.6 can be simplified to

$$
\begin{aligned}
\frac{dH}{dt} &= c_1 H - c_2 P \\
\frac{dP}{dt} &= P\left(c_3 \frac{H}{H_0 - H} - c_4 - c_5 P\right) = P(C - c_5 P)
\end{aligned}
$$

where the constants c and C contain all the constant parameter values. H* denotes the equilibrium density of H, where no further change in density occurs. Hence we approximated the term containing H0 by the equilibrium densities. All parameters are positive. Now a very simple solution exists with positive values for H and P. We conclude that our model has an equilibrium. We further conclude that at this point the quotient of parasite to host density should be P* / H* = c1 / c2. Hence our model predicts that for parasite-host systems equilibrium densities should mainly be determined by host fecundity and mortality.
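The equilibrium ratio follows in one line from the simplified host equation. As a sketch, assuming that eq. 4.5 reads dH/dt = (b - a)H - (α + ρ)P, so that c1 = b - a and c2 = α + ρ (this identification of the constants is an assumption made here; the text only states that the constants collect the parameter values):

$$
\frac{dH}{dt} = c_1 H^* - c_2 P^* = 0 \;\Rightarrow\; \frac{P^*}{H^*} = \frac{c_1}{c_2} = \frac{b - a}{\alpha + \rho},
\qquad
\frac{dP}{dt} = P^*(C - c_5 P^*) = 0 \;\Rightarrow\; P^* = \frac{C}{c_5}
$$

Under this reading the ratio P* / H* contains only host fecundity and mortality terms, which is the prediction stated above. Note that C itself was obtained by inserting H* into the term containing H0, so the second relation is an approximation rather than a closed solution.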

Lastly, we turn to one of the current standard models in parasitology, proposed by Robert May and Roy Anderson in 1980. They assumed again that changes in host densities are the sum of host fecundity that is proportional to actual host density and host mortality that is again a mass effect, hence the product of H and P, the parasite density. However, now total host density is divided into a population part H that is not infected and a part J that is infected. Hence H + J = Htotal. The model looks as follows

$$
\begin{aligned}
\frac{dH}{dt} &= r(H + J) - \delta PH \\
\frac{dJ}{dt} &= \delta PH - (\alpha + a)J \\
\frac{dP}{dt} &= \lambda J - (\mu + \gamma(H + J))P
\end{aligned}
\tag{4.7}
$$

r(H+J) is the reproduction rate of not infected hosts. δPH is the reduction in uninfected host density induced by parasite infection. (α + a)J is the mortality of infected hosts. λJ denotes the increase in parasitism due to infection. The most complicated term is the last. It formalizes the assumption that the decrease in parasite density is proportional to actual parasite density multiplied with the natural mortality rate μ and the destruction of pathogens by hosts, assumed to be proportional to total host density.

This model is able to generate host parasite cycles in density and is widely used in the study of co-evolution of parasites and hosts. The analysis of stability gives (by setting the model equations to zero) a very complicated result. But the equilibrium density for P is simple. We see that for the parasite population to be stable (a - r + α) > 0. Hence

$$
a + \alpha > r
$$

The biological interpretation is simple. For the parasites to persist the total mortality rate of the infected hosts has to be larger than the fecundity of the host population. Additionally, we look at the term for H. Because all rates are positive the term is positive if again

$$
a + \alpha > r
$$

Hence, hosts only have a stable equilibrium if parasites have a stable equilibrium. This is only possible if either host and parasite populations are constant (very improbable) or if the populations exhibit a cyclic behaviour around the equilibrium points. Additionally we have a solution for δ, the parasite induced mortality rate

$$
(a - r + \alpha)\delta > a^2 + a\alpha \;\rightarrow\; \delta > \frac{a^2 + a\alpha}{a + \alpha - r}
$$

This inequality tells that for the hosts to have stable populations r must be smaller than a + α, the mortality rates of infected hosts. This is the same condition as before.
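The condition a + α > r can be made plausible with a short calculation. The following sketch assumes that eq. 4.7 reads dH/dt = r(H + J) - δPH, dJ/dt = δPH - (α + a)J and dP/dt = λJ - (μ + γ(H + J))P; it is derived directly from these equations and is not the Mathematica output the text refers to. Setting the first two equations to zero gives

$$
\delta P^* H^* = (\alpha + a)J^*, \qquad r(H^* + J^*) = \delta P^* H^* = (\alpha + a)J^*
$$

and therefore

$$
\frac{J^*}{H^*} = \frac{r}{a + \alpha - r}, \qquad P^* = \frac{(\alpha + a)J^*}{\delta H^*} = \frac{r(a + \alpha)}{\delta(a + \alpha - r)}
$$

Both quotients are positive only if a + α > r, which is the condition stated above.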


5. Models in biochemistry

In lecture three we already dealt with one important model in biochemistry, the Michaelis-Menten model of enzyme kinetics. In this lecture we will discuss other important models. Our starting point is a general enzyme substrate reaction where an enzyme E binds to a substrate S to form a compound ES. This is a reversible process and ES can dissociate to E and S. The concentrations of E, S, and ES depend on the initial concentrations of E and S. Our question is what are the concentrations at equilibrium? To answer this question we have to develop a model that gives us the concentrations. Our reaction equation looks as follows

E + S ↔ ES (5.1)

We denote the concentrations of E, S, and ES with [E], [S], and [ES]. Because the system is closed the total amount of material must be constant. For every compound of ES one E and one S is lost. From this we get two conservation equations

[E] + [ES] = [E0] and

[S] + [ES] = [S0]

where E0 and S0 denote the initial concentrations. Hence

[E] - [S] = [E0] - [S0] (5.2)

We assume that the speed of the associative reaction is proportional to both reactants, the concentrations of [E] and [S]. This is a general assumption in kinetics and we have to understand why. The chance for one molecule of E to bind to S is linearly proportional to [S]. The probability of n molecules of E to bind is therefore n[S]. Here n is nothing more than the concentration of E, [E]. Hence the chance to bind is proportional to [E][S]. The higher this chance is, the higher is the speed. This assumption is similar to the predator-prey encounter probability in the Lotka-Volterra predation model. It is termed the mass effect. Hence

$$
v_{ES} \propto [E], \qquad v_{ES} \propto [S], \qquad v_{ES} = k_{ES}[E][S]
\tag{5.3}
$$

Remember that if a process is proportional to several variables then we have to take the product of these variables. kES denotes the proportionality constant. The dissociation process can be described in the same way

$$
v_{E;S} \propto [ES], \qquad v_{E;S} = k_{E;S}[ES]
\tag{5.4}
$$

The change in the concentration of ES should be proportional to the association and dissociation speeds. Hence

$$
v_{ES} - v_{E;S} = \Delta v = k_{ES}[E][S] - k_{E;S}[ES]
\tag{5.5}
$$

Fig. 5.1: Enzyme; enzyme substrate complex; enzyme and products.

Fig. 5.2: Plot against [S] with the fitted quadratic function y = 3.3851x² + 0.7265x + 0.5071.
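The question posed at the start of this lecture, the concentrations at equilibrium, can be answered directly from eq. 5.5 and the two conservation equations. At equilibrium Δv = 0, hence kES([E0] - [ES])([S0] - [ES]) = kE;S[ES], which is a quadratic equation in [ES]. The Python sketch below solves it; the function names are hypothetical, and taking the smaller root (the only one not exceeding [E0] and [S0]) is a choice made for this illustration.

```python
import math


def equilibrium_complex(e0, s0, k_es, k_diss):
    """Equilibrium concentration [ES] for E + S <-> ES.

    k_es is the association constant k_ES, k_diss the dissociation
    constant k_E;S. Solves k_es * (e0 - c) * (s0 - c) = k_diss * c for c.
    """
    ratio = k_diss / k_es
    b = e0 + s0 + ratio
    # Smaller root of c^2 - b*c + e0*s0 = 0; the discriminant is never negative
    return (b - math.sqrt(b * b - 4.0 * e0 * s0)) / 2.0


def equilibrium_concentrations(e0, s0, k_es, k_diss):
    """Return ([E], [S], [ES]) at equilibrium using the conservation equations."""
    c = equilibrium_complex(e0, s0, k_es, k_diss)
    return e0 - c, s0 - c, c


# Illustrative (made-up) values: [E0] = 1, [S0] = 2, both constants 1
e, s, es = equilibrium_concentrations(1.0, 2.0, 1.0, 1.0)
print(e, s, es, 1.0 * e * s - 1.0 * es)  # the last value is delta v
```

The last printed value is Δv from eq. 5.5 and should vanish up to rounding error.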


We introduce the conservation equations (eq. 5.2) into eq. 5.5 and get

$$
\begin{aligned}
\Delta v &= k_{ES}[S]([S] - [S_0] + [E_0]) - k_{E;S}([S_0] - [S]) \\
\Delta v &= k_{ES}[S][S] - k_{ES}[S]([S_0] - [E_0]) - k_{E;S}[S_0] + k_{E;S}[S]
\end{aligned}
$$

kES([S0] - [E0]) and kE;S[S0] are constants. We denote them k1 and k2 and get

$$
\Delta v = k_{ES}[S]^2 + (k_{E;S} - k_1)[S] - k_2
\tag{5.6}
$$

Based on very simple assumptions we got a simple model of reaction kinetics. We see that the change in reaction speed depends solely on the change in substrate concentration. Our model enables us to predict reaction times in dependence of substrate concentrations. These have to be measured. We might also estimate the constants by simultaneously measuring Δv and [S] and fitting our model to a plot of Δv against [S]. This is shown in Fig. 5.2. Δv cannot be measured directly. Again we make an approximation and measure the change of [S] and use this as a measure of Δv. Hence we assume that the change in concentration is proportional to the speed. Programs like Excel automatically provide a so-called fit to our data from the type of model we predefined. This is in our case a quadratic function. In other words we assume that

$$
\Delta [S] = k_{ES}[S]^2 + (k_{E;S} - k_1)[S] - k_2
$$

This can be rewritten into a quadratic first order differential equation

$$
\frac{d[S]}{dt} = k_{ES}[S]^2 + (k_{E;S} - k_1)[S] - k_2
\tag{5.7}
$$

Now we rearrange

$$
\frac{d[S]}{dt} = k_{ES}[S]^2 + \left(k_{E;S} - k_{ES}([S_0] - [E_0])\right)[S] - k_{E;S}[S_0]
= (k_{E;S} - k_1)[S]\left(1 - \frac{[S]}{[S_0] - [E_0] - \dfrac{k_{E;S}}{k_{ES}}}\right) - k_{E;S}[S_0]
$$

We know this equation already. It is the differential equation for logistic growth with constant predation (eq. 3.10 in lecture 3). Indeed this similarity is not accidental. In the case of ecology we deal with individuals of a population. In chemistry we deal with molecules belonging to a certain chemical. If both entities (individuals and molecules) obey similar mass proportion laws the mathematical description should also be similar. Indeed many ecological, chemical and also genetic models are very similar. They deal with entities that behave according to certain probabilistic laws. Their environment (ecosystems, membrane surfaces or chromosomes) can be described by the same geometry (often a fractal geometry). Then the mathematical description of these very different entities looks nearly identical.

Now we look at the substrate enzyme reaction from a slightly different perspective. Enzymes often have a very small concentration with respect to the substrate, so that [S] can be treated as constant. Because [S] is assumed to be constant eq. 5.3 changes to

$$
v_{ES} = k_{ES}[E], \qquad v_{E;S} = k_{E;S}[ES]
\tag{5.8}
$$

The mass conservation law is now


[E0] - [E] = [ES] (5.9)

The change in speed dv is given by

$$
\frac{dv}{dt} = k_{E;S}([E_0] - [E]) - k_{ES}[E] = -(k_{ES} + k_{E;S})[E] + k_{E;S}[E_0]
\tag{5.10}
$$

Now the change in speed is proportional to the enzyme concentration and we get a first order linear differential equation

$$
\frac{d[E]}{dt} = -k_1[E] + k_2
$$

The solution is

$$
[E] = \frac{k_{E;S}[E_0]}{k_{ES} + k_{E;S}} + \left([E_0] - \frac{k_{E;S}[E_0]}{k_{ES} + k_{E;S}}\right)e^{-(k_{ES} + k_{E;S})t}
$$

This is also a Michaelis-Menten process but in a different form. In lecture 3 we dealt with speed against substrate concentration; here we looked at enzyme concentration against time. The maximum reaction speed is of course at [E] = kE;S / (kES + kE;S)

Collision theory

Above we modelled a chemical reaction via a mass effect. This can be generalized. For any reaction

n1A + n2B ↔ n3C + n4D

we can establish the equilibrium equation of the form

$$
\frac{[C]^{n_3}[D]^{n_4}}{[A]^{n_1}[B]^{n_2}} = K
\tag{5.11}
$$

with K being the reaction constant. The speed of the forward reaction is according to the mass effect

$$
v_{AB} \propto [A]^{n_1}[B]^{n_2}, \qquad v_{CD} \propto [C]^{n_3}[D]^{n_4}
\tag{5.12}
$$

The sum of n1 and n2 determines the order of the forward reaction (for the backward reaction it is of course n3 + n4). To solve these equations we have to determine the concentration of the reactants in dependence of time. Let's consider three basic types of reactions. A simple reaction has the form A→B. The speed of this reaction is proportional to the concentration of A. This is a first order reaction. Hence

$$
\frac{d[A]}{dt} = -k[A]
$$

The concentration time function is an exponential function. Integration gives

$$
\int_{[A_0]}^{[A]} \frac{d[A]}{[A]} = -k\int_0^t dt \;\rightarrow\; [A] = [A_0]e^{-kt}
\tag{5.13}
$$

Next we consider reactions of the second order.

A + B ↔ C + D
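Both reaction types can also be followed numerically, which is useful as a check on the closed solutions. The Python sketch below integrates the first order law d[A]/dt = -k[A] with a simple Euler scheme and compares it with eq. 5.13. It then applies the same scheme to a second order reaction, under the assumption that x counts the amount of A and B already reacted, so that dx/dt = k([A0] - x)([B0] - x). The function names, the step size and all numerical values are hypothetical choices for this illustration.

```python
import math


def first_order_exact(a0, k, t):
    """Closed solution of the first order reaction, eq. 5.13."""
    return a0 * math.exp(-k * t)


def integrate_euler(rate, x0, dt, steps):
    """Integrate dx/dt = rate(x) with fixed Euler steps; returns x at the end."""
    x = x0
    for _ in range(steps):
        x += rate(x) * dt
    return x


k, a0 = 0.5, 1.0
dt, steps = 1e-4, 20000  # integrates up to t = 2

# First order: numerical result against the exponential function
a_numeric = integrate_euler(lambda a: -k * a, a0, dt, steps)
a_exact = first_order_exact(a0, k, dt * steps)
print(a_numeric, a_exact)

# Second order: x is the amount reacted, bounded by the scarcer reactant
k2, big_a0, big_b0 = 1.0, 1.0, 0.5
x_end = integrate_euler(lambda x: k2 * (big_a0 - x) * (big_b0 - x), 0.0, dt, steps)
print(x_end, big_a0 - x_end, big_b0 - x_end)
```

The second order run shows the typical behaviour: x approaches the initial concentration of the scarcer reactant without ever exceeding it.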


We define x as the amount by which A and B are diminished due to the reaction. Hence if the initial concentration of A is [A0] the concentration after time t is [A0 - x]. The reaction speed is then v = dx / dt. We can therefore describe the process by

$$
v = \frac{dx}{dt} = k_{AB}[A_0 - x][B_0 - x]
$$

The backward reaction is given by

$$
v = \frac{dy}{dt} = k_{CD}[C_0 - y][D_0 - y]
$$

These are quadratic first order differential equations. The solutions are again logistic growth equations. For higher order reactions solutions become very complicated or do not even exist as closed functions. The closed solution is quite complicated. The Excel solution is shown below and Fig. 5.3 shows that the concentration of A decreases asymptotically following a hyperbolic function.

Excel solution (second order reaction)
Row 1 (headers): A1: Constants; B1: A; C1: B; D1: C[1]; E1: k
Row 2 (values): B2: 50; C2: 49; D2: 0.020203; E2: 2
Formula for D2: =LN($B$2/$C$2)/($B$2-$C$2)
Row 4 (headers): A4: Time; B4: x(t); C4: A(t)
Row 5:
A5: 0
B5: +($B$2*EXP($C$2*$E$2*A5+$D$2)-$C$2*EXP($B$2*$E$2*A5+$D$2))/(EXP($C$2*$E$2*A5+$D$2)-EXP($B$2*$E$2*A5+$D$2))
C5: =$b$2-B5

Fig. 5.3: x(t) and A(t) against time.

We can do it more simply and solve by hand!

$$
\frac{dx}{[A_0 - x][B_0 - x]} = k_{AB}\,dt
\tag{5.14}
$$

Now we apply a small trick. It holds

$$
\frac{1}{[A_0 - x][B_0 - x]} = \frac{1}{[A_0] - [B_0]}\left(\frac{1}{[B_0 - x]} - \frac{1}{[A_0 - x]}\right)
\tag{5.15}
$$

This is a very important equation with which many quadratic differential equations can be solved by hand. It is a special case of a class of equations by which quotients can be simplified, the method of dividing into partial fractions.

We solve eq. 5.14 as follows

$$
\frac{1}{[A_0] - [B_0]}\int\left(\frac{dx}{[B_0 - x]} - \frac{dx}{[A_0 - x]}\right)
= \frac{1}{[A_0] - [B_0]}\left(\ln[A_0 - x] - \ln[B_0 - x]\right) + C_1
= \frac{1}{[A_0] - [B_0]}\ln\left(\frac{[A_0 - x]}{[B_0 - x]}\right) + C_1
= \int k_{AB}\,dt = k_{AB}t + C_2
$$

Setting x = 0 at t = 0 and combining both integration constants gives

$$
C = \frac{1}{[A_0] - [B_0]}\ln\left(\frac{[A_0]}{[B_0]}\right)
$$

At the end we get

$$
\ln\left(\frac{[A_0 - x]}{[B_0 - x]}\right) = k([A_0] - [B_0])t + \ln\left(\frac{[A_0]}{[B_0]}\right)
\tag{5.16}
$$

This is a linear equation. Plotting ln([A0 - x] / [B0 - x]) against time t yields a straight line with slope k([A0] - [B0]) and intercept ln([A0] / [B0]). From these values x and k can be determined if we only know the initial concentrations of A and B.

Note that the above method cannot be used when the initial concentrations of A and B are equal. Then our integration looks different.

$$
\int\frac{dx}{[A_0 - x]^2} = \frac{1}{[A_0 - x]} + C_1 = \int k\,dt = kt + C_2
$$

Setting x = 0 at t = 0 and combining both integration constants gives

$$
C = \frac{1}{[A_0]}
$$

Our linear equation now becomes

$$
\frac{1}{[A_0 - x]} = kt + \frac{1}{[A_0]}
\tag{5.17}
$$

Fig. 5.4 shows typical functions of the concentration of one of the products [C] against time. The plot was computed using eq. 5.13 and the Mathematica solution. Second order reactions tend to have steeper initial slopes.

Fig. 5.4: Concentration [C] against time for a first order and a second order reaction.

Again we notice that our models of reaction kinetics have a similar structure to simple population growth models. In the case of kinetics molecules and atoms are treated as being billiard balls flying randomly in space and colliding with one another. Recent ecological models treat individuals of populations in a very similar manner. Predator-prey encounters are simulated solely according to a simple mass effect with encounter probabilities being proportional to the densities of the interacting individuals and the migration rates (their 'temperature'). Under these assumptions kinetic and ecological models look very similar.

One way to model our reaction kinetics or ecological laws is via the collision theory. Molecules or individuals interact when they collide (or meet). Now take the law for ideal gases. A gas that is compressed changes its internal energy. The energy you need to compress is transformed into kinetic energy leading to higher collision rates of the gas molecules. Hence the change in internal energy due to changing pressure should be proportional to its volume. Hence from pV = nRT and referring to one mol gives

$$
\frac{dG}{dp} = V(p) = \frac{RT}{p}
$$

Integrating gives

$$
\int_{G_1}^{G_2} dG = \int_{p_1}^{p_2}\frac{nRT}{p}\,dp \;\rightarrow\; G_2 - G_1 = \Delta G = nRT\ln\frac{p_2}{p_1}
$$

We do not need integration constants because we are working with definite integrals. Pressure is nothing more than concentration per unit volume. Hence our theory should also refer to solutions and chemical reactions. There p1 and p2 denote the concentrations before and after the reaction of the type of eq. 5.11. The speed

is therefore

$$
v_{AB} = \Delta G_1 = nRT\ln\left([A]^{n_1}[B]^{n_2}\right), \qquad v_{CD} = \Delta G_2 = nRT\ln\left([C]^{n_3}[D]^{n_4}\right)
$$

Therefore

$$
\Delta G = \Delta G_1 - \Delta G_2 = nRT\ln\frac{[A]^{n_1}[B]^{n_2}}{[C]^{n_3}[D]^{n_4}} = -nRT\ln\frac{[C]^{n_3}[D]^{n_4}}{[A]^{n_1}[B]^{n_2}}
$$

or according to eq. 5.11

$$
\Delta G = -nRT\ln K
\tag{5.18}
$$

This is the well known Gibbs-Helmholtz equation (after the German chemist Hermann von Helmholtz, 1821-1894, and Josiah W. Gibbs, American chemist, 1839-1903) which describes the change in free enthalpy in dependence of temperature and concentration of reactants. Our initial mass law assumed that reaction speed should be proportional to the number of encounters. This is here assumed to be proportional to the free enthalpy of the system. We solve for K and get the important equation

$$
K = e^{-\frac{\Delta G}{nRT}}
\tag{5.19}
$$

The Swedish Nobel laureate Svante Arrhenius (1859-1927) now assumed that reaction speed is proportional to the number of activated particles in the system, hence the number of particles having an energy higher than necessary for the reaction: v ∝ nA. This number should be proportional to the equilibrium constant K. We get an equation that describes the dependence of reaction speed on temperature

$$
v \propto K \;\rightarrow\; v = Ae^{-\frac{\Delta E}{nRT}}
\tag{5.20}
$$

with ΔG = ln(A)ΔE, with ΔE being the energy necessary to initiate the reaction (activation energy). A defines the maximum speed at high temperature.

At the end we look at the dependence of the reaction constant K on the temperature T. From the Gibbs-Helmholtz equation we get

$$
\Delta G = -nRT\ln(K) \;\rightarrow\; \ln(K) = -\frac{\Delta G}{nRT}
$$

Therefore, the change in K at changing T is given by

$$
\frac{d\ln(K)}{dT} = \frac{\Delta G}{nRT^2}
\tag{5.21}
$$

This is the well known equation of van't Hoff (after the Dutch chemist Jacobus H. van't Hoff, 1852-1911). To estimate a difference in K from a temperature difference T1 to T2 we have to integrate

$$
\int_{K_1}^{K_2} d\ln(K) = \ln(K_2) - \ln(K_1) = \ln\left(\frac{K_2}{K_1}\right) = \frac{\Delta G}{nR}\int_{T_1}^{T_2}\frac{1}{T^2}\,dT = -\frac{\Delta G}{nR}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)
$$

Note the change in sign at the end.
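The integrated equation can be used directly to shift a reaction constant from one temperature to another. The Python sketch below follows the notation of this script, that is K = exp(-ΔG / (nRT)) and ln(K2 / K1) = -(ΔG / (nR))(1/T2 - 1/T1) with ΔG treated as independent of temperature. The function names, the value used for the gas constant R (8.314 J mol⁻¹ K⁻¹) and the numbers in the usage lines are assumptions made for this illustration.

```python
import math

R = 8.314  # gas constant in J / (mol K)


def equilibrium_constant(delta_g, temperature, n=1.0):
    """Reaction constant K from eq. 5.19."""
    return math.exp(-delta_g / (n * R * temperature))


def vant_hoff(k1, delta_g, t1, t2, n=1.0):
    """Reaction constant at temperature t2, given k1 at temperature t1."""
    return k1 * math.exp(-(delta_g / (n * R)) * (1.0 / t2 - 1.0 / t1))


# Illustrative values: delta G = 50 kJ, temperatures in Kelvin
k_298 = equilibrium_constant(5.0e4, 298.15)
k_308 = vant_hoff(k_298, 5.0e4, 298.15, 308.15)
print(k_298, k_308, equilibrium_constant(5.0e4, 308.15))
```

The last two printed values agree, because with a constant ΔG the integrated van't Hoff equation is only a rearrangement of eq. 5.19.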

What has all of this to do with ecology? At the beginning I argued that biological models might look very similar regardless of what they were originally designed for. Consider a virus disease or a new phytophagous insect on plants immigrating into a new region. We can reinterpret our collision theory. The temperature of the individuals is now the speed of migration (remember that temperature is a measure of the speed of molecules or atoms). The activation energy might be interpreted as the level of the immune system (the health) or the level of plant defence against infection by phytophagous insects. N is the population density of the insect or the number of viruses in the air. Hence we expect that the speed of new infections, the infection rate per time, is a first order reaction and should follow an exponential function of migration potential. In Fig. 5.5 the infection rate k according to this model is plotted against the migration rate M. D denotes the density and c the constant. We also expect the infection speed v to be proportional to the number of insects or viruses present. Hence v ∝ D, that is v = kD, with k being the infection rate. The number of infected plants or hosts I should follow an exponential function

$$v = \frac{dI}{dt} = k(D - I)$$

[Fig. 5.5: Infection rate plotted against migration rate for the model $v = v_{max} e^{-c/(DM)}$.]

Of course our analogy has limitations. We did not consider reproduction rates of the insects or viruses. For our model to be realistic we would have to add production rates of the ‘reactants’. Nevertheless, for short term changes in infections (inside one generation) or for rapid colonisations our simple analogy model appears to be quite appropriate. Indeed in 2001 the American plant ecologist Stephen Hubbell published an intensively discussed book, where he based major parts of community ecology solely on a so-called ecological drift model, which is at its core nothing more than an extended model of reaction kinetics. In the statistics part we will deal with such models in detail.

Finally, I should note that for medical and biochemical enzyme kinetics a large number of software packages are available today that automatically analyse many of the previous models and provide numerical solutions.
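The infection equation dI/dt = k(D - I) has the closed-form solution I(t) = D - (D - I0) e^(-kt), which rises exponentially towards D. The following Python sketch compares this solution with a simple Euler integration. The values of k and D and the function names are hypothetical and only serve as an illustration of the first order kinetics described above.

```python
import math

def infected_exact(t, k, d, i0=0.0):
    """Closed-form solution of dI/dt = k (D - I) with I(0) = i0."""
    return d - (d - i0) * math.exp(-k * t)

def infected_euler(t_end, k, d, i0=0.0, dt=0.001):
    """Explicit Euler integration of the same equation, as a cross-check."""
    steps = int(round(t_end / dt))
    i = i0
    for _ in range(steps):
        i += k * (d - i) * dt
    return i

k, d = 0.5, 100.0   # hypothetical infection rate and density
for t in (1, 2, 5, 10):
    print(t, round(infected_exact(t, k, d), 2), round(infected_euler(t, k, d), 2))
```

The number of infected hosts approaches D asymptotically, which is the saturation behaviour one expects as long as reproduction of the 'reactants' is ignored.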


6. Markov chains

Besides equation solving, matrices have many more biological applications. Assume you are studying a contagious disease. You have identified a small group of 4 persons infected by the disease. Within a given time these 4 persons had contact with another group of 5 persons. The latter 5 persons had contact with other persons, say with 6, and so on. How fast does the disease spread in the population? To answer this question we first define a matrix describing the first contacts. You have four infected persons (columns) and 5 contact persons of the second group (rows).

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Hence person 1 of the first group had contact with person 2 of the second group. No. 2 of the first group had contact with No. 1, 2, and 4 of the second group and so on. Now you describe the second order contacts of group three (rows) with group two (columns).

$$B = \begin{pmatrix} 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}$$

To find the number of persons in group three that had (via group two) contact with infected persons of group one you have to multiply both matrices. We get as the result

$$C = B \bullet A = \begin{pmatrix} 1 & 1 & 2 & 2 \\ 0 & 0 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 2 & 1 \\ 1 & 2 & 1 & 1 \end{pmatrix}$$

How to interpret this result? From the computational scheme of a dot product of two matrices it follows that the elements of C result from all combinations of the respective rows of B and columns of A. Hence the ones and twos denote indirect contacts of a person in the third group with a person of the first group. Person 1 of the third group had 6 indirect contacts (1+1+2+2), persons 2 and 4 only one.

However, we can also use probabilities of infection instead of contacts. Say that any contact gives a probability of 0.3 that a person will be infected. We have to replace the ones with 0.3 to get the probability that persons in contact with infected persons get infected. Our model becomes


$$C = B \bullet A = \begin{pmatrix} 0.09 & 0.09 & 0.18 & 0.18 \\ 0 & 0 & 0 & 0.09 \\ 0.09 & 0.18 & 0.09 & 0.09 \\ 0 & 0.09 & 0 & 0 \\ 0.09 & 0.09 & 0.18 & 0.09 \\ 0.09 & 0.18 & 0.09 & 0.09 \end{pmatrix}$$

Hence person 1 of the third group has a probability of 0.54 of being infected (the sum of the first row). Of course this method can be applied to further groups or generations (if we interpret the groups as generations). By this we get probabilities of occurrences of initial events in subsequent time windows. The matrix multiplication allows for the prediction of infections during epidemics.

Markov chains

The above discussion leads immediately to the concept of Markov chains (after the Russian mathematician Andrei Markov, 1856-1922). A Markov chain is a sequence of random variables in which the future variable is determined by the present variable but is independent of the way in which the present state arose from its predecessors. Hence if we have a series of process states the value of state n is determined by only two things: the value of state n-1 and a rule that tells how state n-1 might transform into state n. Most often these rules contain probabilities. Then state n-1 goes with probability pi into any state i of n. Hence in a Markov chain states prior to the previous one do not influence the future fate of the chain. This is why Markov chains are often said to be without memory.

Take for instance a gene that has three alleles A, B, and C. These can mutate into each other with probabilities that are given in Fig. 6.1. A mutates into B with probability 0.12 and into C with probability 0.2. Hence with probability 1 - 0.12 - 0.2 = 0.68 nothing happens. We can write these so-called transition probabilities in matrix form.

$$P = \begin{pmatrix} 0.68 & 0.07 & 0.1 \\ 0.12 & 0.78 & 0.05 \\ 0.2 & 0.15 & 0.85 \end{pmatrix}$$

[Fig. 6.1: Transition graph of the three alleles A, B and C with the mutation probabilities 0.12 (A to B), 0.2 (A to C), 0.07 (B to A), 0.15 (B to C), 0.1 (C to A) and 0.05 (C to B).]

This matrix that gives the transition probabilities is called the transition matrix. Each column of the matrix must add to 1, the sum of all probabilities of leaving or staying in the respective state. This is a general feature of all probability (stochastic) matrices. If we now take the initial allele frequencies we can compute the frequencies of the alleles in the next generation. Assume we have initial frequencies of A = 0.2, B = 0.5, and C = 0.3. This gives a vector of the form X0 = {0.2, 0.5, 0.3}. The frequencies in the next generation are computed from

$$X_1 = P \bullet X_0 = \begin{pmatrix} 0.68 & 0.07 & 0.1 \\ 0.12 & 0.78 & 0.05 \\ 0.2 & 0.15 & 0.85 \end{pmatrix} \begin{pmatrix} 0.2 \\ 0.5 \\ 0.3 \end{pmatrix} = \begin{pmatrix} 0.201 \\ 0.429 \\ 0.37 \end{pmatrix}$$

Again, the new frequencies of A = 0.201, B = 0.429, and C = 0.37 add up to one. If we multiply two probability matrices the resulting matrix is again a probability matrix.

If we continue the process we get X3 = PX2, X4 = PX3, … In general the frequencies of a Markov chain after


n steps starting from the initial conditions X0 and determined by the transition matrix P are given by

$$X_n = P^n \bullet X_0 = U \bullet \Lambda^n \bullet U^{-1} \bullet X_0 \tag{6.1}$$

where U is the matrix of eigenvectors and Λ the diagonal matrix of eigenvalues of P. Of course this law is very similar to recursive processes leading to exponential distributions. It is a generalization. Equation 6.1 defines the simplest form of a Markov chain process.

We also see that state n+1 is only dependent on state n. This property serves even as the general definition of a Markov process. The probability that Xn takes the value i given all previous states X1 to Xn-1 is the same as the probability that Xn takes the value i given state Xn-1 only. The previous states have no influence any more. Mathematically written

$$p(X_n = i \mid X_{n-1}, X_{n-2}, X_{n-3}, \ldots, X_1) = p(X_n = i \mid X_{n-1}) \tag{6.2}$$

Does our mutation process above result in stable allele frequencies or do they change forever? This question has two parts. Does the frequency distribution remain constant or does the process eliminate one or more alleles? The first question is whether the frequencies of the alleles remain constant. In this case the following condition must hold

$$X_{n+1} = X_n = P \bullet X_n$$

This can be written in terms of eigenvectors

$$P \bullet X_n = 1 X_n \rightarrow (P - 1I) \bullet X_n = 0 \tag{6.3}$$

This state is called the stationary state. It is defined by the eigenvector U of the transition matrix P with the largest eigenvalue, which is λ = 1. Xn is called the steady-state or equilibrium vector. The accompanying Excel example shows the transition matrix of three alleles. λ3 = 1 and the third eigenvector U3 defines the stationary state, that is the frequency distribution the process will end in. Note that the eigenvectors are normalized to have the length one. To get the frequencies (probabilities) we have to divide U3 by the sum of its values. Fig. 6.2 shows that in our case the allele frequencies quickly converge to the steady state.

[Fig. 6.2: Allele frequencies plotted against the number of steps (0 to 14). The curves converge to the stationary frequencies defined by the eigenvector.]
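The convergence of the allele frequencies can be reproduced by simply iterating the matrix product. The following Python sketch is an illustration, not the Excel solution of the script: it uses the allele transition matrix and the starting vector X0 = {0.2, 0.5, 0.3} from the text, while the helper name `mat_vec` and the cutoff of 200 steps are arbitrary choices made here.

```python
def mat_vec(p, x):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in p]

# Transition matrix of the alleles A, B, C; each column sums to one.
P = [[0.68, 0.07, 0.10],
     [0.12, 0.78, 0.05],
     [0.20, 0.15, 0.85]]

x = [0.2, 0.5, 0.3]          # initial frequencies X0
history = [x]
for step in range(200):      # 200 steps is an arbitrary, generous cutoff
    x = mat_vec(P, x)
    history.append(x)

x1 = history[1]              # frequencies after one step
stationary = history[-1]     # approximate stationary frequencies
print([round(v, 3) for v in x1])
print([round(v, 4) for v in stationary])
```

The first step reproduces the frequencies 0.201, 0.429 and 0.37 given above. The final vector no longer changes under multiplication with P, so it is (up to rounding) the eigenvector for λ = 1 divided by the sum of its values.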

Do all Markov chains converge? We look at several important special cases. Fig. 6.3 shows a graphical representation of a Markov chain with four states. Given are the transition probabilities. The missing probabilities can be inferred from the scheme. We see that state D cannot be reached from any other state. It forms a closed part of the whole chain. If a chain does not contain closed subsystems it is called irreducible. In such a system all states can be reached. Fig. 6.4 now shows a simple example of a periodic chain. The whole chain forms a circle. Fig. 6.3 shows an aperiodic chain. Fig. 6.3 also shows two other important concepts. First, the states A, B, and C are recurrent, that means it is sure that after a finite time (even a very long time) the process returns to the initial state. State D is not recurrent. In some chains there is only a certain probability that the chain returns to a previous state. These chains are called transient. There is no way back to D. An important class of finite Markov chains are recurrent and aperiodic chains. These are called ergodic. The chain of Fig. 6.3 is ergodic (except for state D), the chain of Fig. 6.4 is not (it is periodic). The next tables show the transition matrices, the eigenvalues, and the eigenvectors of both chains. For both chains λ = 1 exists, but only chain 6.3 converges. The probability matrix theorem now tells that every irreducible ergodic transition matrix (that is the matrix containing only probabilities) has a steady state vector T to which the process converges.

$$\lim_{k \to \infty} P^k X_0 = T \tag{6.4}$$

[Fig. 6.3: Transition graph of a Markov chain with the four states A, B, C and D; the transition probabilities are those of the first matrix below.]

[Fig. 6.4: Transition graph of a chain with the three states A, B and C forming a circle (transition probabilities 0.8, 0.7 and 0.6).]

Chain of Fig. 6.3:

|   | A | B | C | D | Eigenvalues | Eigenvector 4 |
|---|---|---|---|---|---|---|
| A | 0 | 0.3 | 0.3 | 0 | -0.3 | 0.384111 |
| B | 0.4 | 0.7 | 0 | 0 | 0.1 | 0.512148 |
| C | 0.6 | 0 | 0.7 | 0.9 | 0.7 | 0.768221 |
| D | 0 | 0 | 0 | 0.1 | 1 | 0 |

Chain of Fig. 6.4:

|   | A | B | C | Complex eigenvalues (real part) | Complex eigenvalues (imaginary part) | Eigenvector 3 |
|---|---|---|---|---|---|---|
| A | 0.2 | 0 | 0.6 | -0.05 | 0.597913 | 0 |
| B | 0.8 | 0.3 | 0 | -0.05 | -0.597913 | 0 |
| C | 0 | 0.7 | 0.4 | 1 | 0 | 0 |

This steady state vector is defined by the eigenvector of the matrix. Hence for ergodic matrices (these are by far the most important) eq. 6.3 always has a solution. The theorem also implies that every transition matrix has an eigenvalue λ = 1.

Now look at the following matrix

$$P = \begin{pmatrix} 0.5 & 0 & 0.2 & 0 \\ 0.2 & 1 & 0.4 & 0 \\ 0.2 & 0 & 0.1 & 0 \\ 0.1 & 0 & 0.3 & 1 \end{pmatrix}$$

It defines a transition matrix. Once state B or state D is reached the probability of staying in B or D is 1. These states cannot be left. The matrix has two absorbing states. In general a transition matrix has as many absorbing states as it has ones on its diagonal.

Fig. 6.2 also points to another problem: how fast do Markov processes converge to the steady state? The


time to convergence is obviously connected to the probabilities in the matrix. The recurrence time of a state i is now defined as the mean time after which the process returns to i. It can be shown that the recurrence times T of any state i are inversely related to the stationary probabilities π.

$$T_i = \frac{1}{\pi_i} \tag{6.5}$$

The mean time to stay in any state is of course the inverse of the probability to leave the state. In Fig. 6.3 the recurrence time of state A is T = (0.38+0.51+0.77)/0.38 = 4.33 steps. The question how long it will take to reach the stationary state is identical to the question what function describes the curves of Fig. 6.2 and how to calculate the parameter values of the function. With some mathematics one can show that it is an exponential function of the type

$$p = a e^{-bt} \tag{6.6}$$

There are no simple solutions for the parameters.

A typical application of Markov chains in biology is succession. For instance gravel pits have a distinct mosaic plant community structure. Abandoned pits go through a series of successional stages. If we now map the plant distribution of the gravel pit immediately after abandonment we get a matrix of initial states. From other studies it is known with what frequency certain structural elements transform into others. Hence we have a transition matrix. We can now describe the whole process of succession by a Markov chain model. In this case we have a matrix of the initial stage and the transition matrix. The model looks as follows

$$X_t = P^t \bullet X_0 \tag{6.7}$$

[Fig. 6.5: Photo Jan Meyer]

We consider six different plant community classes and have the following transition matrix.

|     | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.12 | 0.20 | 0.21 | 0.29 | 0.18 | 0.06 |
| 2 | 0.26 | 0.02 | 0.31 | 0.03 | 0.31 | 0.03 |
| 3 | 0.05 | 0.05 | 0.08 | 0.12 | 0.28 | 0.28 |
| 4 | 0.09 | 0.29 | 0.03 | 0.26 | 0.02 | 0.16 |
| 5 | 0.26 | 0.22 | 0.27 | 0.21 | 0.08 | 0.22 |
| 6 | 0.21 | 0.22 | 0.09 | 0.09 | 0.13 | 0.25 |
| Sum | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

Our initial stage is given by Fig. 6.6 where the six communities are represented by different colours. After t = 100 steps (Fig. 6.7) our map changed totally. Community types 1 and 2 dominate, 5 and 6 vanished. After even 1000 steps (Fig. 6.8) not much had changed. However, very slowly the frequency of community 3 rises. The proportion of community 4 remains stable but type 2, which dominated the intermediate stage of succession, decreases.

Fig. 6.6 (initial stage):

|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 4 | 1 | 6 | 4 |
| 2 | 1 | 1 | 2 | 1 | 6 | 4 |
| 3 | 4 | 3 | 1 | 2 | 6 | 5 |
| 4 | 4 | 3 | 1 | 4 | 5 | 5 |
| 5 | 4 | 2 | 4 | 5 | 5 | 4 |
| 6 | 6 | 2 | 3 | 5 | 4 | 3 |
| 7 | 5 | 1 | 3 | 6 | 4 | 4 |
| 8 | 3 | 1 | 2 | 6 | 5 | 2 |

Fig. 6.7 (after 100 steps):

|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 1 | 3 | 2 | 3 | 3 | 4 | 2 |
| 2 | 2 | 1 | 2 | 4 | 3 | 3 |
| 3 | 2 | 2 | 3 | 1 | 3 | 1 |
| 4 | 3 | 4 | 1 | 2 | 2 | 3 |
| 5 | 2 | 1 | 1 | 3 | 2 | 1 |
| 6 | 2 | 3 | 2 | 1 | 2 | 3 |
| 7 | 4 | 1 | 1 | 2 | 1 | 3 |
| 8 | 2 | 3 | 2 | 1 | 3 | 2 |

Fig. 6.8 (after 1000 steps):

|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 1 | 4 | 2 | 1 | 1 | 1 | 3 |
| 2 | 3 | 1 | 1 | 3 | 2 | 2 |
| 3 | 3 | 3 | 1 | 1 | 3 | 1 |
| 4 | 3 | 3 | 2 | 4 | 2 | 2 |
| 5 | 1 | 1 | 3 | 2 | 2 | 3 |
| 6 | 2 | 3 | 2 | 4 | 3 | 3 |
| 7 | 1 | 2 | 3 | 1 | 3 | 4 |
| 8 | 1 | 2 | 3 | 1 | 3 | 4 |

    ! Each run updates every cell of the map once
    do while (jj .le. runs)
      do 100 i = 1, arkol
        do 101 j = 1, arnu
          ran1 = ran(iseed)              ! uniform random number
          k = area(i,j)                  ! present community type of the cell
          prob1 = 0
          do 102 ii = 1, spec
            prob1 = prob(ii,k) + prob1   ! cumulative transition probability
            if (ran1 .le. prob1) then
              area(i,j) = ii             ! new community type
              goto 101
            endif
    102     continue
    101   continue
    100 continue
      jj = jj + 1                        ! run counter (increment and loop end are not in the printed fragment)
    enddo

How to compute these pictures? Either you apply a commercial program that computes Monte Carlo simulations and Markov chains or you write a program of your own. In our case I used a self-written program that iterates equation 6.2. For shorter series you can run a math program iteratively. Above a simple Fortran solution is shown with which I computed the maps of Figs. 6.6 to 6.8.
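The same cell-by-cell updating can be sketched in Python. This is a hedged re-implementation of the idea, not the author's program: it uses the transition table and the initial map of Fig. 6.6 from above, while the function names, the fixed seed and the rule that a cell keeps its type when the rounded column sum is exceeded are assumptions made here. Because the random numbers differ from the original run, the resulting maps will not match Figs. 6.7 and 6.8.

```python
import random

# PROB[i][k]: probability that a cell of type k+1 becomes type i+1 in one step.
PROB = [
    [0.12, 0.20, 0.21, 0.29, 0.18, 0.06],
    [0.26, 0.02, 0.31, 0.03, 0.31, 0.03],
    [0.05, 0.05, 0.08, 0.12, 0.28, 0.28],
    [0.09, 0.29, 0.03, 0.26, 0.02, 0.16],
    [0.26, 0.22, 0.27, 0.21, 0.08, 0.22],
    [0.21, 0.22, 0.09, 0.09, 0.13, 0.25],
]

# Initial map of Fig. 6.6 (rows 1 to 8, columns A to F).
AREA0 = [
    [1, 2, 4, 1, 6, 4], [1, 1, 2, 1, 6, 4], [4, 3, 1, 2, 6, 5], [4, 3, 1, 4, 5, 5],
    [4, 2, 4, 5, 5, 4], [6, 2, 3, 5, 4, 3], [5, 1, 3, 6, 4, 4], [3, 1, 2, 6, 5, 2],
]

def step(area, rng):
    """Update every cell once by drawing its new type from column k of PROB."""
    new_area = []
    for row in area:
        new_row = []
        for k in row:
            r, cumulative, new_type = rng.random(), 0.0, k  # keep k if rounding leaves a gap
            for i in range(6):
                cumulative += PROB[i][k - 1]
                if r <= cumulative:
                    new_type = i + 1
                    break
            new_row.append(new_type)
        new_area.append(new_row)
    return new_area

def simulate(area, steps, seed=1):
    """Run the chain for a number of steps with a fixed (arbitrary) seed."""
    rng = random.Random(seed)
    for _ in range(steps):
        area = step(area, rng)
    return area

final = simulate(AREA0, 100)
for row in final:
    print(row)
```

Running `simulate` with different seeds gives different maps; averaging the type frequencies over many runs approximates the frequencies predicted by eq. 6.7.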

Markov chains find application in probability theory. Assume for instance you have a virus with N strains. Assume further that at each generation a strain mutates to another strain with probabilities ai→j. The probability to stay is therefore 1-Σai→j. What is the probability that the virus is after k generations the same as at the beginning? This can be modelled by a Markov chain with the following transition matrix

$$P = \begin{pmatrix} 1 - \sum_{i \ne 1} a_{1 \to i} & \cdots & a_{N \to 1} \\ \vdots & \ddots & \vdots \\ a_{1 \to N} & \cdots & 1 - \sum_{i \ne N} a_{N \to i} \end{pmatrix}$$

We get the desired probability from the matrix element p11 of P^k. Hence

$$P^k = U \bullet \Lambda^k \bullet U^{-1}$$

The next tables show the respective Excel solution for a given transition matrix using the Matrix add-in for k = 5. The requested probability is p11 = 0.23. Markov chains are therefore ideal tools for calculating probabilities if we have multiple pathways to reach certain states. Particularly, they describe the probability to get in k steps from state A to state B if the transition probabilities can be described using a transition matrix.

Transition matrix P with its eigenvalues and eigenvectors:

| P | A | B | C | Eigenvalues | Eigenvector 1 | Eigenvector 2 | Eigenvector 3 |
|---|---|---|---|---|---|---|---|
| A | 0.5 | 0.05 | 0.3 | 0.338197 | 0.814984 | 0.550947 | 0.368878 |
| B | 0.3 | 0.8 | 0.1 | 0.561803 | -0.450512 | -0.797338 | 0.794506 |
| C | 0.2 | 0.15 | 0.6 | 1 | -0.364472 | 0.246391 | 0.482379 |

Λ^k for k = 5 and the inverse U^-1:

| Λ^k |   |   | U^-1 |   |   |
|---|---|---|---|---|---|
| 0.004424 | 0 | 0 | 0.878092 | 0.264583 | -1.107265 |
| 0 | 0.055966 | 0 | 0.109323 | -0.798204 | 1.231089 |
| 0 | 0 | 1 | 0.607621 | 0.607621 | 0.607621 |

P^k computed directly, the product UΛ^k, and the product UΛ^kU^-1:

|   | P^k: A | P^k: B | P^k: C | UΛ^k |   |   | UΛ^kU^-1 |   |   |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.230675 | 0.20048 | 0.258105 | 0.003606 | 0.030834 | 0.368878 | 0.230675 | 0.20048 | 0.258105 |
| B | 0.47613 | 0.51785 | 0.43003 | -0.001993 | -0.044624 | 0.794506 | 0.47613 | 0.51785 | 0.43003 |
| C | 0.293195 | 0.28167 | 0.311865 | -0.001613 | 0.013789 | 0.482379 | 0.293195 | 0.28167 | 0.311865 |

Random walk models

A special example of Markov chains are random walks. We know already that random walks are defined by the general state equation

$$N_{t+1} = N_t + ran$$

Each state is defined only by the previous state and a probability function of change. Typical examples of such random walks are for instance animal movements. Let’s consider an animal A being at place x0. In a next step it might turn to the left with probability pl, turn to the right with probability pr or walk straight on with probability ps. Our random walk model looks as follows


$$x_n = x_{n-1} \begin{pmatrix} p_l \\ p_s \\ p_r \end{pmatrix}$$

This is a recursive equation that describes a directional process. Its two-dimensional equivalent would have the form

$$x_n = x_{n-1} \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \\ p_{31} & p_{32} \end{pmatrix}$$

where the columns define forward or backward walk.

Recursive probability functions are also special cases of Markov chains. We can’t know where the animal ends its walk. But we might use a model of 5000 animals and try to give probabilities of the outcome. Such a Monte Carlo simulation provides us with a frequency distribution of end points of the random walk. Then we can tell where a typical animal ends its walk. To model this we need the possible area into which our animal can walk, the number of possible states. This is indicated by the green area in Fig. 6.9. If this number is finite we speak of a bounded random walk. What happens if the animal reaches the lower or upper boundary? In the figure the animal is reflected from the barrier.

[Fig. 6.9: Bounded random walk of an animal with the turning probabilities pl, pr and ps; the green area marks the possible states.]


7. The Weibull function and life history tables

Animals and plants have at each stage of their life history certain probabilities to die. These probabilities can be combined in demographic or life history tables. A typical life table is shown in the following table. It is a life table with discrete age categories.

| Age | Observed number of animals | Number dying | Mortality rate | Cumulative mortality rate | Proportion surviving | Cumulative proportion surviving | Mean number alive | Cumulative Lt | Mean further life expectancy |
|---|---|---|---|---|---|---|---|---|---|
| t | Nt | Dt | mt | Mt | lt | st | Lt | Tt | Et |
| 0 | 1000 | 370 | 0.37 | 0.370 | - | - | 1000.00 | 3028.00 | 3.03 |
| 1 | 630 | 210 | 0.33 | 0.580 | 0.63 | 0.630 | 815.00 | 2028.00 | 2.49 |
| 2 | 420 | 170 | 0.40 | 0.750 | 0.67 | 0.420 | 525.00 | 1213.00 | 2.31 |
| 3 | 250 | 140 | 0.56 | 0.890 | 0.60 | 0.250 | 335.00 | 688.00 | 2.05 |
| 4 | 110 | 50 | 0.45 | 0.940 | 0.44 | 0.110 | 180.00 | 353.00 | 1.96 |
| 5 | 60 | 26 | 0.43 | 0.966 | 0.55 | 0.060 | 85.00 | 173.00 | 2.03 |
| 6 | 34 | 19 | 0.56 | 0.985 | 0.57 | 0.034 | 47.00 | 88.00 | 1.86 |
| 7 | 15 | 10 | 0.67 | 0.995 | 0.44 | 0.015 | 24.50 | 41.00 | 1.65 |
| 8 | 5 | 2 | 0.40 | 0.997 | 0.33 | 0.005 | 10.00 | 16.00 | 1.60 |
| 9 | 3 | 2 | 0.67 | 0.999 | 0.60 | 0.003 | 4.00 | 6.00 | 1.50 |
| 10 | 1 | 1 | 1.00 | 1.000 | 0.33 | 0.001 | 2.00 | 2.00 | 1.00 |
| 11 | 0 | - | - |  | 0.00 | 0.000 | - | - | - |

The first column gives the age t. The second column contains the numbers of individuals observed Nt. These are the individuals that survived to time t. At time t=0 we have the initial population size at birth. Nt - Nt+1 = Dt gives the number of deaths in interval t. The mortality rate mt is the quotient of deaths Dt and the original number Nt at interval t. The cumulative mortality rate Mt is the quotient of the total number of deaths up to interval t and the original population size N0.

$$M_t = \frac{\sum_{i=0}^{t} D_i}{N_0} \tag{7.1}$$

lt = 1 - mt-1 is the proportion of individuals that survived to interval t. The cumulative proportion surviving st is of course 1 - Mt-1. The mean number of individuals alive at each interval Lt is the arithmetic mean of Nt and Nt+1.

$$L_t = \frac{N_t + N_{t+1}}{2} \tag{7.2}$$

To compute the further life expectancy Et from time t on we need the cumulative Lt. This is the total number of years all the mean numbers of individuals will live. Tt is defined as

$$T_t = \sum_{i=t}^{t_{max}} L_i \tag{7.3}$$

The mean life expectancy Et is then the quotient of Tt and Lt

$$E_t = \frac{T_t}{L_t} \tag{7.4}$$


The mean life expectancy of a six year old animal is therefore 1.86 years.
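The columns of a life table follow mechanically from the observed numbers Nt. The Python sketch below applies eqs. 7.1 to 7.4 literally to the Nt column of the table. It is an illustration under stated assumptions: the function and key names are made up here, and because the printed table appears to list the mean number alive one row later than eq. 7.2 implies, the values of L, T and E from this sketch are shifted by one age class compared with the table.

```python
N = [1000, 630, 420, 250, 110, 60, 34, 15, 5, 3, 1, 0]   # observed numbers N_t

def life_table(n):
    """Return one dict per age class with D, m, M, L, T and E (eqs. 7.1 to 7.4)."""
    last = len(n) - 1
    deaths = [n[t] - n[t + 1] for t in range(last)]             # D_t = N_t - N_t+1
    mean_alive = [(n[t] + n[t + 1]) / 2 for t in range(last)]   # eq. 7.2
    rows, cumulative_deaths = [], 0
    for t in range(last):
        cumulative_deaths += deaths[t]
        total_l = sum(mean_alive[t:])                           # eq. 7.3
        rows.append({
            "t": t,
            "D": deaths[t],
            "m": deaths[t] / n[t],                              # mortality rate
            "M": cumulative_deaths / n[0],                      # eq. 7.1
            "L": mean_alive[t],
            "T": total_l,
            "E": total_l / mean_alive[t],                       # eq. 7.4
        })
    return rows

table = life_table(N)
for row in table:
    print({key: round(value, 3) for key, value in row.items()})
```

The deaths, mortality rates and cumulative mortality rates agree with the table columns Dt, mt and Mt.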

Next we have to deal with reproduction. Having discrete age classes we can assign each class a reproduction rate rt as the quotient of newborn individuals and total population size Nt. Hence

$$r_t = \frac{\Delta N_t}{N_t} \rightarrow N_{t+1} = N_t e^{r(t)}$$

$$r(t)\,dt = \frac{dN}{N} \rightarrow N = N_0 e^{r(t)t} \tag{7.5}$$

This model does not contain deaths. To include death rates we extend the model to k equations containing population sizes

$$N_0 = r_1 N_1 + \ldots + r_n N_n = \sum_{i=1}^{n} r_i N_i$$

$$N_1 = m_0 N_0$$

$$N_2 = m_1 N_1$$

$$\ldots$$

$$N_n = m_{n-1} N_{n-1}$$

The population size of N0 is the sum of all reproduction processes at each age class. The mortalities are given by eq. 7.5. The above equation can be expressed in matrix notation (the Leslie matrix)

$$\begin{pmatrix} N_0(t+1) \\ N_1(t+1) \\ N_2(t+1) \\ \ldots \\ N_n(t+1) \end{pmatrix} = \begin{pmatrix} 0 & r_1 & r_2 & \ldots & r_{n-1} & r_n \\ m_0 & 0 & 0 & \ldots & 0 & 0 \\ 0 & m_1 & 0 & \ldots & 0 & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & 0 & \ldots & m_{n-1} & 0 \end{pmatrix} \bullet \begin{pmatrix} N_0(t) \\ N_1(t) \\ N_2(t) \\ \ldots \\ N_n(t) \end{pmatrix}$$

or

$$N(t+1) = P \bullet N(t) \tag{7.6}$$

Hence N(2) = P•N(1) = P•P•N(0) and so on. We get

$$N_t = P^t \bullet N_0 \tag{7.7}$$

From eq. 3.20 we get

$$N_t = U \bullet \Lambda^t \bullet U^{-1} \bullet N_0$$

where Λ is the matrix of eigenvalues of the transition matrix P.

To see whether the process is stationary we need

$$N_t = P \bullet N_t = 1 N_t \rightarrow (P - \lambda I) \bullet N_t = 0 \tag{7.8}$$

In other words, the vector Nt is one of the eigenvectors of the transition matrix P having the eigenvalue λ = 1.

One example from botany. Boucher and Mallona (Forest Ecol. Manage. 91(1997): 195-204) reported survival rates of the lowland tropical rainforest tree Vochysia ferruginea after hurricane devastation. Following the population during 5 years they found that small adults produce on average 35.6 seedlings and large adults 70.1 seedlings. The probability for a seedling to stay a seedling in the next year was 0.209, the probability to become a small sapling 0.01. All other seedlings died. Using the respective data for small and large saplings and


small and large adults they constructed the following Leslie matrix

|   | Seedling | Small sapling | Large sapling | Small adult | Large adult | Complex eigenvalues (real part) | Complex eigenvalues (imaginary part) | Starting densities |
|---|---|---|---|---|---|---|---|---|
| Seedling | 0.209 | 0.000 | 0.000 | 35.600 | 70.100 | 0.083 | 0.000 | 1000.000 |
| Small sapling | 0.010 | 0.653 | 0.020 | 0.000 | 0.000 | 0.459 | 0.000 | 100.000 |
| Large sapling | 0.000 | 0.170 | 0.407 | 0.000 | 0.000 | 0.650 | 0.335 | 0.000 |
| Small adult | 0.000 | 0.000 | 0.570 | 0.731 | 0.000 | 0.650 | -0.335 | 0.000 |
| Large adult | 0.000 | 0.000 | 0.000 | 0.266 | 0.997 | 1.155 | 0.000 | 0.000 |

The matrix after k = 20 steps and the resulting final densities:

| k = 20 | Seedling | Small sapling | Large sapling | Small adult | Large adult | Final densities |
|---|---|---|---|---|---|---|
| Seedling | 1.613 | 152.642 | 450.605 | 585.690 | 717.009 | 16877.634 |
| Small sapling | 0.032 | 3.068 | 9.061 | 11.780 | 14.425 | 339.263 |
| Large sapling | 0.007 | 0.697 | 2.059 | 2.678 | 3.280 | 77.123 |
| Small adult | 0.010 | 0.939 | 2.770 | 3.600 | 4.408 | 103.892 |
| Large adult | 0.017 | 1.582 | 4.672 | 6.073 | 7.435 | 174.971 |

The diagonal of this matrix gives the probabilities of staying in the same class, the first row gives the numbers of births (propagules), the upper triangle the probabilities of regressing and the lower triangle the probabilities of moving to another class. The largest eigenvalue (1.155) is > 1, which means the population was healthy (why?). The age structure after 20 generations and the respective numbers of individuals starting from 1000 seedlings and 100 small saplings come from N20 = P^20 N0. Boucher and Mallona predicted a very fast recovery of the population after destruction. Of course this projection relies on the assumption that our matrix entries remain constant.

The net reproduction rate of a population is defined as

$$R_0 = \frac{\text{Number of daughters in generation } t+1}{\text{Number of daughters in generation } t} \tag{7.9}$$

To compute R0 we need data of numbers of female offspring for each age class (so-called pivotal classes). This is shown in the next table. R0 is now the sum of all libi

$$R_0 = \sum_{i=1}^{t} l_i b_i \tag{7.10}$$

| Age | Pivotal age | Observed number at pivotal age | Proportion surviving | No. of female offspring | Female offspring per female | ltbt |
|---|---|---|---|---|---|---|
| t | t | Nt | lt | Dt | bt |  |
| 0-9 | 4.5 | 950 | 0.95 | 0 | 0.000 | 0 |
| 10-19 | 14.5 | 905 | 0.905 | 50 | 0.055 | 0.05 |
| 20-29 | 24.5 | 870 | 0.87 | 410 | 0.471 | 0.41 |
| 30-39 | 34.5 | 740 | 0.74 | 300 | 0.405 | 0.3 |
| 40-49 | 44.5 | 710 | 0.71 | 100 | 0.141 | 0.1 |
| 50-59 | 54.5 | 640 | 0.64 | 5 | 0.008 | 0.005 |
| R0 |  |  |  |  |  | 0.865 |

In our case the net reproductive rate is less than 1 and we infer that the population will decline.

From R0 we also get the mean generation length G. G is defined as

$$G = \frac{\sum_{i=1}^{n} l_i b_i i}{\sum_{i=1}^{n} l_i b_i} = \frac{\sum_{i=1}^{n} l_i b_i i}{R_0} \tag{7.11}$$

The example of the previous table gives G = 29.9 years. The last value we need is the innate capacity of increase. We know already the exponential growth model

$$N = N_0 e^{rt} \tag{7.12}$$

r in this model gives the rate of increase. Lotka gave an equation for estimating r from a life history table. He found that r must satisfy the following condition


$$\sum_{i=1}^{n} e^{-ri} l_i b_i = 1 \tag{7.13}$$

Knowing R0 and G we can approximate r from

$$r = \frac{\ln(R_0)}{G} \tag{7.14}$$

From our previous example we get r = ln(0.865)/29.9 = -0.005 < 0. The population appears to decrease very slowly.

Our life tables also allow for the calculation of reproductive values in a population. The reproductive value at age t is defined as the number of progenies plus the expected future number of progenies. It is

$$V_t = b_t + \sum_{i=t+1}^{n} \frac{l_i b_i}{l_t} = \sum_{i=t}^{n} \frac{l_i b_i}{l_t} \tag{7.15}$$

In the above example the reproductive value at age 25 is 0.9. The reproductive value at age 0 is of course identical to the net reproductive rate R0.

Next we deal with the Weibull distribution (after the Swedish mathematician Waloddi Weibull, 1887-1979)

$$f(\alpha, \beta) = \alpha \beta x^{\beta - 1} e^{-\alpha x^{\beta}} \tag{7.16}$$

The Weibull distribution has the mean

$$E(x) = \alpha^{-1/\beta}\, \Gamma\left(\frac{1}{\beta} + 1\right)$$

and the variance

$$E((x - \mu)^2) = \alpha^{-2/\beta} \left( \Gamma\left(\frac{2}{\beta} + 1\right) - \Gamma^2\left(\frac{1}{\beta} + 1\right) \right)$$

with Γ being the Gamma function described in the statistics lecture.

We get the cumulative density distribution from the integral

$$F(\alpha, \beta) = 1 - e^{-\alpha x^{\beta}} \tag{7.17}$$

For β = 1 the Weibull distribution equals a simple exponential function. For β = 3 the distribution approximates (but does not equal) a normal distribution; for larger β the distribution becomes more and more left skewed (Fig. 7.1).

[Fig. 7.1: Weibull density f(x) plotted against x for β = 0.5, 1, 2 and 3.]

[Fig. 7.2: Cumulative distribution F plotted against t for β = 1, 2, 3 and 4 with T = 100.]

The Weibull distribution is particularly used in the analysis of life expectancies and mortality rates. We simply model the mortality rate m at time t using a general power function model


$$m(t) = m_0 t^{\beta - 1}$$

Using S(t) as the distribution of survival and modelling this via an exponential model under the assumption that the mortality rate is constant we get

$$S(t) = e^{-m_0 t^{\beta}}$$

For this usage the Weibull distribution is rewritten in a two parametric form

$$f(\beta) = \frac{\beta}{T} \left(\frac{t}{T}\right)^{\beta - 1} e^{-\left(\frac{t}{T}\right)^{\beta}} \tag{7.18}$$

where T denotes the characteristic life expectancy and t the age. f(β) then gives the probability that a given person will die at age t. T is the age at which 63.2 % of the population have already died. We get T from eq. 7.17 with x = t/T and α = 1 by setting t = T.

$$F(1, \beta) = 1 - e^{-x^{\beta}} = 1 - e^{-\left(\frac{t}{T}\right)^{\beta}} = 1 - \frac{1}{e} = 0.632 \tag{7.19}$$

Having now data on age specific mortality rates f(β) we can estimate the characteristic life expectancy T and the shape parameter β. The parameterized model then allows for the calculation of survival and mortality rates and associated demographic variables at any given time t (Fig. 7.2). The figure shows the cumulative mortality rates in dependence of time for T = 100 and different β using eq. 7.17.

Having data on mortality rates we can estimate the characteristic life time T from eq. 7.19. We use a double log transformation

$$\ln[-\ln(1 - F(\beta))] = \beta \ln\left(\frac{t}{T}\right) = \beta \ln(t) - \beta \ln(T) \tag{7.20}$$

Using the cumulative mortality rates of the first table we obtain β from the slope of a plot of ln[-ln(1-F)] against ln(t) (Fig. 7.3). We get a slope of 1.20, typical for many insects that have an exponential mortality - time distribution. The intercept b is

$$b = -\beta \ln(T) \rightarrow T = e^{-b/\beta} = e^{0.89/1.2} = 2.09$$

This is the characteristic life expectancy. Interpolating the second column of the initial table gives a very similar result of around two years for 630 individuals to have died.

[Fig. 7.3: ln[-ln(1-F)] plotted against ln(t) with the regression line y = 1.2009x - 0.8888.]
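The double log regression of eq. 7.20 can be carried out without special software. The Python sketch below fits a straight line to ln[-ln(1-F)] against ln(t) by ordinary least squares and converts the intercept into the characteristic life expectancy T. The data are the cumulative mortality rates Mt of the first life table. Pairing each value with the end of its interval (t = 1 to 10) and dropping the last class with M = 1 are assumptions made here, and the helper name `linear_fit` is hypothetical.

```python
import math

# Cumulative mortality M_t from the life table, assigned to the end of each
# interval (t = 1 ... 10); the last class (M = 1) is dropped because ln(0) is undefined.
F = [0.370, 0.580, 0.750, 0.890, 0.940, 0.966, 0.985, 0.995, 0.997, 0.999]
t = list(range(1, len(F) + 1))

x = [math.log(ti) for ti in t]
y = [math.log(-math.log(1.0 - fi)) for fi in F]

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

beta, intercept = linear_fit(x, y)
T = math.exp(-intercept / beta)      # from intercept = -beta * ln(T)
print(round(beta, 4), round(intercept, 4), round(T, 2))
```

Under these assumptions the fit comes out close to the regression line of Fig. 7.3, with a slope near 1.2 and a characteristic life expectancy of about two years.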


8. Basic models in genetics

Our last lecture deals with simple models in genetics. Surely, genetics is one of the most mathematicized

parts of biology. This holds in particular for population genetics and evolutionary genetics. We will deal with

the logic behind some of the models. (Some examples of this lecture were inspired by the excellent Primer of

Population Biology by Edward O. Wilson and William H. Bossert, Stamford 1971)

One of the fundamental laws in genetics is the Hardy Weinberg law. It tells that without evolutionary

processes the frequency of alleles in a genome remains constant. To understand the

logic behind this law assume a gene with two alleles A and B. The frequencies of

A and B in the whole population are denoted with p and q. Hence p + q = 1. Now

assume crossing. The frequency of AA is the product pp, the frequency of BB is

the product qq. The frequency of AB is then pq + qp. The total frequencies of all

combinations are of course again 1. Hence pp + qq + 2pq = 1 or

(8.1)

This is the law first proposed in 1908 by the British mathematician Godfrey Harold Hardy (1877-1947)

and the German physician Wilhelm Weinberg (1867-1937).

Does the frequency of the alleles A and B

remain stable? Look at the following stan-

dard scheme. From this scheme we see

that the frequency of q in the next generation is q2 + qp. Hence

(8.2)

This is of course very simple and surely known from school. The biological interpretation is that with-

out selection, gene flow from one population to another or mutation events allele frequencies in a population

remain stable. No evolution occurs.

But evolution occurs! First we look at mutation events, where a gene or an allele A mutates to B. Mutations are dose dependent, that is, the number of mutation events M in a genome is proportional to the total amount of the mutation-inducing agent D, the dose. Hence

(8.3)

where k is a constant that describes the effectiveness of the agent to induce mutations.

The total number of mutations at a certain gene locus is of course also proportional to the total number N of copies of that gene in the population: M ∝ N. Hence we can define a new value, the mutation rate μ, that describes this proportionality.

(8.4)

Equations and schemes referred to above:

$$(p + q)^2 = p^2 + 2pq + q^2 = 1 \tag{8.1}$$

$$\frac{pq + q^2}{p^2 + 2pq + q^2} = \frac{q(p + q)}{(p + q)^2} = q \tag{8.2}$$

$$M \propto D \;\rightarrow\; M = kD \tag{8.3}$$

$$\mu = \frac{M}{N} = \frac{kD}{N} \tag{8.4}$$

Crossing scheme:

                  AA      AB        BB      Sum
After crossing    p^2     2pq       q^2     1
Frequency of B            2pq/2     q^2     pq + q^2

Fig. 8.1: Crossing square for the alleles A (frequency p) and B (frequency q)

          A (p)    B (q)
A (p)     pp       pq
B (q)     qp       qq

The problem is now how to describe the rate of change of p due to mutation events. The rate of change is proportional to the frequency of the gene p. Hence

(8.5)

However, there are also back mutations from B to A, occurring at a rate ν. The same law holds

(8.6)

Integration gives a fundamental rule for the change of gene frequencies in time under mutation pressure

(8.7)

Hence, under each process alone, the allele frequency decreases exponentially. We can also compute the equilibrium frequencies of p and q that result from mutation and back mutation. Because we are dealing with frequencies, p + q = 1. The total rate of change in p is the sum of the loss from p to q and the gain from q to p.

(8.8)

At equilibrium no further change should occur and we get

(8.9)

This will be the equilibrium frequency of A.
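The equilibrium can be illustrated by integrating eq. 8.8, $dp/dt = -\mu p + \nu q$ with $q = 1 - p$, step by step and comparing the result with $p = \nu / (\mu + \nu)$ from eq. 8.9. The sketch below is only an illustration under assumptions: the function names, the simple Euler stepping and the rate values are not taken from the script.

```python
def equilibrium_p(mu, nu):
    """Equilibrium frequency of A from eq. 8.9: p = nu / (mu + nu)."""
    return nu / (mu + nu)


def simulate_p(p0, mu, nu, generations, dt=1.0):
    """Integrate dp/dt = -mu*p + nu*(1 - p) (eq. 8.8 with q = 1 - p) by Euler steps."""
    p = p0
    for _ in range(int(generations / dt)):
        p += dt * (-mu * p + nu * (1.0 - p))
    return p


if __name__ == "__main__":
    # Assumed rates, chosen unrealistically large so that the run stays short.
    mu, nu = 1e-3, 2e-4
    print(f"equilibrium p            = {equilibrium_p(mu, nu):.4f}")
    print(f"p after 20000 generations = {simulate_p(0.9, mu, nu, 20000):.4f}")
```

The approach to equilibrium is slow: the distance to the equilibrium shrinks by a factor of roughly $e^{-(\mu + \nu)}$ per generation, so realistic mutation rates require very many generations.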

Next we look at gene flow. Assume a population has an allele A with frequency p. Due to migration the next generation gains individuals from outside by immigration and loses individuals by emigration. What is the new frequency of A if the frequency of A in the immigrating population is p*? Let i denote the immigration rate and e the emigration rate. Both processes are again assumed to be proportional to the actual density. The total number of individuals before migration was N0. Ni = iN0 individuals immigrated and Ne = eN0 emigrated. Hence the new population contains $N_0 p - N_e p + N_i p^*$ alleles A, and the new frequency $p_{new}$ is this number divided by the new population size $N_0 - N_e + N_i$.

The new frequency of A can be determined solely from emigration and immigration rates and from the

frequency of A in the donor population. If the population size remains constant i equals e. The change of p

through time is then given by

(8.10)

Hence $p = p_0 - i(p_0 - p^*)t$ as long as the difference $p_0 - p^*$ can be treated as constant. The change in allele frequency caused by gene flow is then a linear process.
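The migration update can be tried out with a few lines of code. The sketch below implements $p_{new} = (p(1 - e) + ip^*) / (1 - e + i)$; the function name and the example values are assumptions made for illustration only.

```python
def new_frequency(p, p_star, e, i):
    """Frequency of A after one round of migration.

    p      : frequency of A before migration
    p_star : frequency of A among the immigrants
    e, i   : emigration and immigration rates (fractions of N0)
    """
    return (p * (1.0 - e) + i * p_star) / (1.0 - e + i)


if __name__ == "__main__":
    p, p_star, rate = 0.2, 0.8, 0.05  # assumed example values with i = e
    for generation in range(1, 6):
        p = new_frequency(p, p_star, rate, rate)
        print(generation, round(p, 4))
```

With i = e a single step changes p by exactly $-i(p - p^*)$. Applied repeatedly, the update moves p towards p* in ever smaller steps, so the linear law is a good description only while p stays close to its starting value.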

Equations referred to above:

$$\frac{dp}{dt} = -\mu p \tag{8.5}$$

$$\frac{dq}{dt} = -\nu q \tag{8.6}$$

$$p = p_0 e^{-\mu t}, \qquad q = q_0 e^{-\nu t} \tag{8.7}$$

$$\frac{dp}{dt} = -\mu p + \nu q \tag{8.8}$$

$$\mu p = \nu q = \nu (1 - p) \;\rightarrow\; p = \frac{\nu}{\mu + \nu} \tag{8.9}$$

Number of alleles A after migration:

$$N_0 p - N_e p + N_i p^* = (N_0 - eN_0)p + iN_0 p^*$$

New frequency of A after migration:

$$p_{new} = \frac{N_0 p - eN_0 p + iN_0 p^*}{N_0 - eN_0 + iN_0} = \frac{p(1 - e) + ip^*}{1 - e + i}$$

$$\frac{\Delta p}{\Delta t} = p_{new} - p_0 = p_0 - i(p_0 - p^*) - p_0 \;\rightarrow\; \frac{dp}{dt} = -i(p_0 - p^*) \tag{8.10}$$

Next we consider selection. Assume a gene with two alleles A and B with frequencies p and q. If B is under selection pressure the fraction of individuals having B is diminished by a fraction s, a value between 0 and 1. The factor (1 - s), which the schemes of this lecture list as the selection coefficient, is the value by which we have to multiply frequencies to get the respective frequencies after selection. (In most textbooks s itself is called the selection coefficient and 1 - s the relative fitness.) Now we have several possibilities to model this process of selection.

First assume that B is recessive and that BB is totally eliminated. The frequency q of B after selection (in the next generation) is

(8.11)

This looks like a recursive function. Because it doesn't matter where we start we can generalise the last equation and get the recursion $q_n = q_{n-1} / (1 + q_{n-1})$.

Can we simplify this equation to get the frequency q after n generations? Look

(8.12)

An example: Assume a dog breed should lose a recessive allele responsible for curled hair. This allele has a frequency of 1 per 100 ($q_0 = 0.01$). How long would it take to drop the frequency to a level of 1 per 10000 ($q_n = 0.0001$) if all dogs with curled hair are prevented from breeding? Solving eq. 8.12 for n gives $n = (q_0 - q_n) / (q_0 q_n) = 9900$ generations.
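The dog example can be reproduced by iterating the recursion $q_n = q_{n-1} / (1 + q_{n-1})$ and comparing it with the closed form $q_n = q_0 / (1 + nq_0)$ of eq. 8.12. The following sketch is an illustration only; the function names and the small rounding tolerance are assumptions.

```python
def next_q(q):
    """One generation of total selection against BB: q_n = q_(n-1) / (1 + q_(n-1))."""
    return q / (1.0 + q)


def q_after(q0, n):
    """Closed form of eq. 8.12: q_n = q_0 / (1 + n * q_0)."""
    return q0 / (1.0 + n * q0)


def generations_needed(q0, target):
    """Count generations of the recursion until q has dropped to the target."""
    q, n = q0, 0
    # Small tolerance so that rounding errors do not add a spurious generation.
    while q > target * (1.0 + 1e-9):
        q = next_q(q)
        n += 1
    return n


if __name__ == "__main__":
    print("generations needed:", generations_needed(0.01, 0.0001))
    print("closed form q after 9900 generations:", q_after(0.01, 9900))
```

The very large number of generations shows how inefficient selection against a rare recessive allele is: most copies sit in heterozygotes, which are not affected by the selection.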

More common than total elimination of recessive alleles is a partial elimination or partial selection

against this allele. Now our scheme

becomes

As before we compute the

frequency q of B after n generations

(8.13)

There is no general closed solution to this recursive equation. To simplify we adopt another strategy and compute the change $\Delta q = q_{n+1} - q_n = -s q_n^2 (1 - q_n) / (1 - s q_n^2)$.

Equations and schemes referred to above:

Scheme for the total elimination of BB (s = 1):

                          AA       AB        BB        Sum
Before selection          p0^2     2p0q0     q0^2      1
Selection coefficients    1        1         1 - s
After selection           p0^2     2p0q0     0         p0^2 + 2p0q0

$$q_1 = \frac{p_0 q_0}{p_0^2 + 2p_0 q_0} = \frac{q_0}{1 + q_0} \tag{8.11}$$

$$q_n = \frac{q_{n-1}}{1 + q_{n-1}}$$

$$q_n = \frac{q_{n-1}}{1 + q_{n-1}} = \frac{\dfrac{q_{n-2}}{1 + q_{n-2}}}{1 + \dfrac{q_{n-2}}{1 + q_{n-2}}} = \frac{q_{n-2}}{1 + 2q_{n-2}} \;\rightarrow\; q_n = \frac{q_0}{1 + nq_0} \tag{8.12}$$

Dog example, eq. 8.12 solved for n:

$$n = \frac{q_0 - q_n}{q_0 q_n} \;\rightarrow\; n = \frac{0.01 - 0.0001}{0.01 \cdot 0.0001} = 9900 \text{ generations}$$

Scheme for partial selection against BB:

                          AA       AB        BB               Sum
Before selection          p0^2     2p0q0     q0^2             1
Selection coefficients    1        1         1 - s
After selection           p0^2     2p0q0     (1 - s)q0^2      p0^2 + 2p0q0 + (1 - s)q0^2

$$q_1 = \frac{p_0 q_0 + (1 - s) q_0^2}{1 - s q_0^2} = \frac{q_0 (1 - s q_0)}{1 - s q_0^2} \;\rightarrow\; q_n = \frac{q_{n-1} (1 - s q_{n-1})}{1 - s q_{n-1}^2} \tag{8.13}$$

$$\Delta q = q_{n+1} - q_n = -\frac{s q_n^2 (1 - q_n)}{1 - s q_n^2}$$

Now we consider only small changes in q and transform into a differential equation

(8.14)

There is no simple solution to this equation. We approximate a solution. s and q are both smaller than 1. Hence $sq^2 \ll 1$. We simplify eq. 8.14 and get

(8.15)

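Rearranging gives $dq / (q^2 (1 - q)) = -s\,dn$, where n counts the generations.

There is an exact solution to this problem, although it is complicated. It involves the ProductLog function (also known as the Lambert W function), which solves $y = x e^x$ for x. However, numerical solutions are always possible. You need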

(8.16)
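Eq. 8.16 links $q_n$ and n only implicitly: the antiderivative $\ln(q / (1 - q)) - 1/q$, evaluated between $q_0$ and $q_n$, equals $-sn$. One possible numerical route is bisection, because the antiderivative increases monotonically with q. The sketch below compares this solution of the approximate eq. 8.15 with a direct iteration of the exact recursion of eq. 8.13. Function names, tolerances and parameter values are assumptions chosen for illustration.

```python
import math


def antiderivative(q):
    """Antiderivative of 1 / (q^2 (1 - q)): ln(q / (1 - q)) - 1 / q."""
    return math.log(q / (1.0 - q)) - 1.0 / q


def q_from_integral(q0, s, n, tol=1e-12):
    """Solve antiderivative(q_n) - antiderivative(q0) = -s * n for q_n by bisection."""
    target = antiderivative(q0) - s * n
    lo, hi = 1e-12, q0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if antiderivative(mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def q_from_recursion(q0, s, n):
    """Iterate the exact recursion of eq. 8.13 for n generations."""
    q = q0
    for _ in range(n):
        q = q * (1.0 - s * q) / (1.0 - s * q * q)
    return q


if __name__ == "__main__":
    q0, s, n = 0.5, 0.1, 100  # assumed example values
    print(f"recursion (8.13): q = {q_from_recursion(q0, s, n):.4f}")
    print(f"integral  (8.16): q = {q_from_integral(q0, s, n):.4f}")
```

For weak selection the two results should lie close together; the remaining difference reflects the neglected denominator $1 - sq^2$ and the step from discrete generations to continuous time.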

Now assume that the allele B is selected against but on the other hand is produced by a constant mutation rate μ. Does this process lead to an equilibrium frequency q of B? We model the change of B as before, now adding the gain by mutation: $\Delta q = \mu (1 - q) - s q^2 (1 - q) / (1 - s q^2)$.

An equilibrium means Δq = 0. Solving for q gives $q = \sqrt{\mu / (s (1 + \mu))}$.

This is the equilibrium frequency of B at constant s and μ.
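Whether the process really settles at this value can be checked by iterating the change $\Delta q = \mu (1 - q) - s q^2 (1 - q) / (1 - s q^2)$ generation by generation and comparing the result with $q = \sqrt{\mu / (s (1 + \mu))}$. The following sketch uses assumed parameter values and illustrative function names.

```python
import math


def delta_q(q, s, mu):
    """Change of q per generation: gain by mutation minus loss by selection."""
    return mu * (1.0 - q) - s * q * q * (1.0 - q) / (1.0 - s * q * q)


def equilibrium_q(s, mu):
    """Equilibrium frequency of B: q = sqrt(mu / (s * (1 + mu)))."""
    return math.sqrt(mu / (s * (1.0 + mu)))


def iterate_q(q0, s, mu, generations):
    """Apply delta_q repeatedly, starting from q0."""
    q = q0
    for _ in range(generations):
        q += delta_q(q, s, mu)
    return q


if __name__ == "__main__":
    s, mu = 0.5, 1e-4  # assumed example values
    print(f"equilibrium q           = {equilibrium_q(s, mu):.5f}")
    print(f"q after 5000 generations = {iterate_q(0.3, s, mu, 5000):.5f}")
```

Because μ is usually much smaller than 1, the equilibrium is close to $\sqrt{\mu / s}$: even strong selection leaves a recessive allele in the population at a low but non-zero frequency.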

Now we look at the important case where heterozygotes are superior. This is the well-known heterosis effect. By definition the fitness of the heterozygotes is 1, and the homozygotes AA and BB have the lower fitness values (1 - s1) and (1 - s2). The change in q from one generation to the other becomes

(8.17)

Setting Δq = 0 gives a quite simple solution for the equilibrium frequency q.

(8.18)
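The equilibrium $q = s_1 / (s_1 + s_2)$ of eq. 8.18 can be verified by iterating the selection scheme with fitness values $1 - s_1$, 1 and $1 - s_2$ for AA, AB and BB. The sketch below is an illustration; the function names and the values of $s_1$ and $s_2$ are assumptions.

```python
def next_q_heterosis(q, s1, s2):
    """Frequency of B after one generation of selection with superior heterozygotes."""
    p = 1.0 - q
    mean_fitness = 1.0 - s1 * p * p - s2 * q * q
    return (p * q + (1.0 - s2) * q * q) / mean_fitness


def equilibrium_q_heterosis(s1, s2):
    """Equilibrium frequency of B from eq. 8.18: q = s1 / (s1 + s2)."""
    return s1 / (s1 + s2)


if __name__ == "__main__":
    s1, s2 = 0.2, 0.6  # assumed example values
    for start in (0.05, 0.9):
        q = start
        for _ in range(200):
            q = next_q_heterosis(q, s1, s2)
        print(f"start q = {start}: q after 200 generations = {q:.4f}")
    print(f"eq. 8.18 predicts q = {equilibrium_q_heterosis(s1, s2):.4f}")
```

Both starting values should end at the same frequency, which illustrates that heterozygote advantage keeps both alleles in the population.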

Equations and schemes referred to above:

$$\frac{dq}{dt} = -\frac{s q^2 (1 - q)}{1 - s q^2} \tag{8.14}$$

$$\frac{dq}{dt} = -s q^2 (1 - q) \tag{8.15}$$

$$\frac{dq}{q^2 (1 - q)} = -s\,dn$$

$$\int_{q_0}^{q_n} \frac{dq}{q^2 (1 - q)} = \left[ \ln\frac{q}{1 - q} - \frac{1}{q} \right]_{q_0}^{q_n} = -sn \tag{8.16}$$

Selection against B combined with mutation from A to B:

$$\Delta q = \mu p - \frac{s q^2 (1 - q)}{1 - s q^2} = \mu (1 - q) - \frac{s q^2 (1 - q)}{1 - s q^2}$$

$$\frac{s q^2}{1 - s q^2} = \mu \;\rightarrow\; q = \sqrt{\frac{\mu}{s (1 + \mu)}}$$

Scheme for superior heterozygotes:

                          AA               AB        BB               Sum
Before selection          p0^2             2p0q0     q0^2             1
Selection coefficients    1 - s1           1         1 - s2
After selection           (1 - s1)p0^2     2p0q0     (1 - s2)q0^2     1 - s1 p0^2 - s2 q0^2

$$\Delta q = q_1 - q_0 = \frac{q_0 (1 - q_0) + (1 - s_2) q_0^2}{1 - s_1 (1 - q_0)^2 - s_2 q_0^2} - q_0 \tag{8.17}$$

$$q = \frac{s_1}{s_1 + s_2} \tag{8.18}$$

Now we treat the same problem from a matrix-oriented perspective. The accompanying Excel example shows the probability matrices that we obtain if we cross the genotypes AA, Aa and aa. Let P(A) = p be the probability of allele A and P(a) = q that of allele a. Now we can construct a matrix that contains the probabilities to get each of the genotypes in F1 from each genotype in F0. For instance, the probability to get AA in F1 from AA in F0 is p: the probability to cross with AA is p^2 and this always gives AA. The probability to cross with Aa is 2pq and this gives AA in half of the cases. The probability to cross with aa is q^2 and this never gives AA. The total probability to get AA in F1 from AA in F0 is therefore $p^2 + 2pq/2 = p$. Similar calculations lead to the complete Excel matrix. Taking p = 0.6 and q = 0.4 we obtain a numerical probability matrix. The eigenvector that belongs to the eigenvalue λ3 = 1, rescaled so that its entries sum to 1, gives the equilibrium genotype frequencies 0.36, 0.48 and 0.16. These correspond to p = 0.6 and q = 0.4. We have once again formulated the Hardy-Weinberg law.
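The eigenvector of the eigenvalue 1 can also be found without a spreadsheet. The sketch below uses power iteration, that is repeated multiplication of a frequency vector with the crossing matrix, which is a swapped-in technique and not the Excel procedure of the script. The matrix layout (columns for the F0 genotypes, rows for the F1 genotypes) follows the worked example for AA; the function names are assumptions.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) with a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def crossing_matrix(p):
    """Column-stochastic matrix: columns are F0 genotypes, rows are F1 genotypes (AA, Aa, aa)."""
    q = 1.0 - p
    return [
        [p, p / 2.0, 0.0],
        [q, 0.5, p],
        [0.0, q / 2.0, q],
    ]


def stationary_frequencies(matrix, start, steps=100):
    """Power iteration: the vector converges to the eigenvector of eigenvalue 1."""
    vector = list(start)
    for _ in range(steps):
        vector = mat_vec(matrix, vector)
        total = sum(vector)
        vector = [v / total for v in vector]  # keep the frequencies summing to 1
    return vector


if __name__ == "__main__":
    freqs = stationary_frequencies(crossing_matrix(0.6), [1 / 3, 1 / 3, 1 / 3])
    print("AA, Aa, aa:", [round(f, 4) for f in freqs])
    print("p =", round(freqs[0] + freqs[1] / 2.0, 4))
```

Because the other two eigenvalues of this matrix (0 and 0.5) are smaller than 1, their contributions die out quickly and only the equilibrium frequencies remain.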

We can extend this result to a population of N diploid individuals (2N genes). Consider random genetic drift in a panmictic population (random mating) with non-overlapping generations. The frequency distribution of an allele A should follow a Markov chain model (random walk) and therefore approximate a binomial distribution. If we have i copies of A at the beginning of our drift the frequency of A is i/2N. The probability to find j copies of A in the next generation can be seen as a sampling of 2N genes out of the original population with replacement. This is the classical Fisher-Wright model of genetic drift, the standard null model of population genetics. The probability to have j copies of A in generation Fn under the condition that there were i copies in the previous generation Fn-1 is then

(8.19)

These probabilities can be expressed in a transition matrix where column probabilities add to 1. We see that such a matrix has two absorbing states. In other words the process will end either in the elimination of the allele A or in a monodominance (fixation) of A. The probability of losing an allele A starting with k copies of A is then the sum of all pi0. This sum has no trivial solution. Using diffusion theory from classical physics it is possible to show that the probability of extinction of an allele A with k copies is

(8.20)

Hence, a new mutation starting with 1 copy has a probability of pE = 1 - 1/2N to go extinct and pS = 1 - (1 - 1/2N) = 1/2N to survive. The associated time to fix an allele in a population is approximately

(8.21)

In other words, large effective population sizes impede the spread of new mutants in a population.
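For small populations the extinction probability can be computed directly from the binomial transition probabilities $p_{ij} = \binom{2N}{j} (i/2N)^j (1 - i/2N)^{2N-j}$ by following the probability distribution over many generations. The sketch below is a minimal illustration; the function names, the population size N = 5 and the number of generations are assumptions. Note that the matrix is stored row-wise here (row i, column j), which is the transpose of a column-oriented layout.

```python
from math import comb


def transition_matrix(n_individuals):
    """Fisher-Wright transition probabilities.

    Entry [i][j] is the probability of j copies of A in the next generation
    given i copies among the 2N genes of the current one.
    """
    genes = 2 * n_individuals
    matrix = []
    for i in range(genes + 1):
        f = i / genes
        matrix.append([comb(genes, j) * f**j * (1.0 - f) ** (genes - j)
                       for j in range(genes + 1)])
    return matrix


def extinction_probability(n_individuals, k, generations=500):
    """Probability that A is lost after many generations, starting from k copies."""
    matrix = transition_matrix(n_individuals)
    size = len(matrix)
    state = [0.0] * size
    state[k] = 1.0  # all probability mass on "k copies" at the start
    for _ in range(generations):
        state = [sum(state[i] * matrix[i][j] for i in range(size))
                 for j in range(size)]
    return state[0]


if __name__ == "__main__":
    n, k = 5, 3  # assumed example: 5 diploid individuals, 3 copies of A
    print(f"numerical extinction probability = {extinction_probability(n, k):.4f}")
    print(f"(2N - k) / 2N                     = {(2 * n - k) / (2 * n):.4f}")
```

After enough generations practically all probability mass sits in the two absorbing states 0 and 2N, so the first entry of the state vector can be compared with $(2N - k) / 2N$.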

Equations and schemes referred to above:

$$p_{ij} = \binom{2N}{j} \left( \frac{i}{2N} \right)^{j} \left( 1 - \frac{i}{2N} \right)^{2N - j} \tag{8.19}$$

$$p_E = \frac{2N - k}{2N} \tag{8.20}$$

$$T_F \approx 4N \tag{8.21}$$

Excel example with P(A) = p = 0.6 and P(a) = q = 0.4 (columns: genotype in F0, rows: genotype in F1):

        AA     Aa      aa              AA      Aa      aa
AA      p      p/2     0        AA     0.6     0.3     0
Aa      q      1/2     p        Aa     0.4     0.5     0.6
aa      0      q/2     q        aa     0       0.2     0.4

Eigenvalues:                   0         0.5       1
Eigenvectors (one per column, in the order of the eigenvalues):
                               0.408     0.802     0.58
                              -0.82     -0.27      0.773
                               0.408    -0.53      0.258
Sum of the entries of the third eigenvector: 1.61
Frequencies (third eigenvector divided by its sum): AA = 0.36, Aa = 0.48, aa = 0.16, hence p = 0.6 and q = 0.4

Transition matrix of the drift model (columns: copies of A in generation n-1, rows: copies of A in generation n):

        0     1          2          …     2N
0       1     p1,0       p2,0       …     0
1       0     p1,1       p2,1       …     0
2       0     p1,2       p2,2       …     0
…       …     …          …          …     …
2N      0     p1,2N      p2,2N      …     1

Next we look at fitness. The frequency of a genotype relative to the genotype with the highest frequency after selection is termed relative fitness. The sum of all relative fitness values after selection (the sum column of our crossing schemes) is termed the average fitness of an individual. How does the average fitness of an individual change with respect to changes of the frequencies of A and B? We can include fitness coefficients (1 - si). To obtain new frequencies for p and q we multiply the three matrices with the respective fitness values and get three new matrices that are shown below. Starting from initial frequencies we then get new frequencies from the multiplication of the frequency vector with the transition matrix. After only five steps we get fairly constant frequencies for AA = 0.04, Aa = 0.60, and aa = 0.36. Therefore p = 0.04 + 0.60/2 = 0.34 and q = 1 - 0.34 = 0.66.

For simplicity we take our first scheme of eq. 8.11, the total elimination of the recessive allele. Let W denote the average fitness, $W = (1 - q)^2 + 2(1 - q)q = 1 - q^2$. We need the derivative dW/dq, the change in average fitness with respect to changes in the frequency of allele B. Hence $dW/dq = -2q$.

But we are also interested in the change of W with respect to time, hence in dW/dt. For this we use the chain rule, $dW/dt = (dW/dq)(dq/dt) = -2q\,dq/dt$.

dq/dt comes from eq. 8.11: $dq/dt \approx \Delta q = q_1 - q_0 = q/(1 + q) - q$. We get for the change of average fitness in time $dW/dt = 2q^3 / (1 + q)$.

This term is always positive and we see that eliminating B via selection leads to an increase in average

fitness.
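The derivation can be checked numerically. The sketch below evaluates the average fitness $W = (1 - q)^2 + 2(1 - q)q$ for the total elimination of BB and the rate $dW/dt = 2q^3 / (1 + q)$, and compares the latter with the actual change of W over one generation. The function names and the sample values of q are assumptions made for illustration.

```python
def average_fitness(q):
    """Average fitness when BB is totally eliminated: W = p^2 + 2pq = 1 - q^2."""
    p = 1.0 - q
    return p * p + 2.0 * p * q


def fitness_change_rate(q):
    """dW/dt = (dW/dq) * (dq/dt) = -2q * (q / (1 + q) - q) = 2 q^3 / (1 + q)."""
    return 2.0 * q**3 / (1.0 + q)


if __name__ == "__main__":
    for q in (0.1, 0.3, 0.5):  # assumed sample frequencies
        q_next = q / (1.0 + q)  # eq. 8.11
        actual = average_fitness(q_next) - average_fitness(q)
        print(f"q = {q}: W = {average_fitness(q):.3f}, "
              f"dW/dt = {fitness_change_rate(q):.4f}, "
              f"change of W in one generation = {actual:.4f}")
```

The rate from the chain rule and the actual one-generation change are not identical, because dq/dt only approximates the discrete step Δq, but both are positive for every q between 0 and 1.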

Equations referred to above:

$$W = (1 - q)^2 + 2(1 - q)q \;\rightarrow\; \frac{dW}{dq} = -2(1 - q) + 2 - 4q = -2q$$

$$\frac{dW}{dt} = \frac{dW}{dq} \frac{dq}{dt} = -2q \frac{dq}{dt}$$

$$\frac{dW}{dt} = -2q \left( \frac{q}{1 + q} - q \right) = \frac{2q^3}{1 + q}$$

Literature (Coloured titles are available in the Institute or the library, red titles are of major importance)

Cornish-Bowden A. 1999. Basic Mathematics for Biochemists. 2nd ed. Oxford Univ. Press.
Ennos R. 1999. Statistical and Data Handling Skills in Biology. Longman.
Foster P. C. 1998. Easy Mathematics for Biologists. Taylor and Francis.
Jordan D. W., Smith P. 2002. Mathematical Techniques. 3rd ed. Oxford Univ. Press.
Grossman S., Turner J. E. 1974. Mathematics for the Biological Sciences. Macmillan.
Martin J. 1972. Podstawy Matematyki i Statystyki. Warszawa.
Portenier C., Gromes W. 2003. Mathematik für Biologen und Humanbiologen. Script. Marburg.
Scheiner S. M., Gurevitch J. (eds.) 2001. Design and Analysis of Ecological Experiments. 2nd ed. Oxford Univ. Press.
Wilson E. O., Bossert W. H. 1971. A Primer of Population Biology. Sinauer (Stamford).
Murray J. D. 2003. Mathematical Biology. 3rd ed. Parts I and II. Springer New York.
Science Magazine special feature. 2004. Mathematics in biology. Science 303.
Napiórkowski K. 2001. Matematyka. http://info.fuw.edu.pl/~ajduk/FUW/matnkf/matematyka01_nkf.pdf


Online archives and textbooks

Online Mathematical textbooks (a large collection of textbooks) http://www.math.gatech.edu/~cain/textbooks/onlinebooks.html
General mathematics (a collection of online lecture scripts and basic texts on mathematics) http://www.geocities.com/alex_stef/mylist.html
Mathematics online (a source of educational online texts) http://www.glencoe.com/sec/math/
Mathematics Virtual Library (many links to interesting web pages and programs) http://www.math.fsu.edu/Science/math.html
Math on the web (search engine for all sorts of mathematics) http://www.ams.org/mathweb/mi-mathinfo07.html
The Math Archive (many links to interesting web pages and programs) http://archives.math.utk.edu/
Eric Weisstein’s Mathematics (a large online mathematics dictionary, with many examples) http://mathworld.wolfram.com/
The Internet Mathematics library (a large collection of topics for pupils and students, math beginners) http://mathforum.org/library/
Mathematic resources (a large compilation of math internet pages) http://www.clifton.k12.nj.us/cliftonhs/chsmedia/chsmath.html
Kolegium nauczyczielski. Materiały z wykładów. (online scripts on various topics) http://info.fuw.edu.pl/~ajduk/lect.html
Johannes Müller. 2003. Mathematical models in biology. Lecture term at TU Munich. http://www-m12.ma.tum.de/lehre/model_2003/skript/skript.pdf
Population growth models (a nice collection of growth models) http://www.math.duke.edu/education/postcalc/growth/contents.html
Population growth models (a collection of growth models and animations) http://members.optusnet.com.au/exponentialist/Growth_Models.htm
Competition models (for persons who are interested in a discussion of the Lotka-Volterra models) http://www.ub.rug.nl/eldoc/dis/fil/r.c.looijen/c11.pdf
The MacTutor history of mathematics (a very nice page on historical topics) http://www-history.mcs.st-andrews.ac.uk/
Excel Tutorials (many macros) http://www.herber.de/index.html?http://www.herber.de/forum/archiv/104to108.htm
Computational molecular biology (a very good site with examples of how to use mathematics in molecular biology) http://www.cs.bc.edu/~clote/ComputationalMolecularBiology/


Mathematical software

The Windows software collection (public domain and freeware; contains many very nice programs) http://archives.math.utk.edu/software/.msdos.directory.html
The mathematics virtual library (a collection of software pages) http://www.math.fsu.edu/Virtual/index.php?f=21
Guide to mathematical software (a search engine for math programs) http://gams.nist.gov//
Step by step derivatives (a very good program for computing derivatives) http://www.calc101.com/webMathematica/derivatives.jsp#topdoit
Derivative calculator (a nice small but quite effective program for computing derivatives) http://cs.jsu.edu/mcis/faculty/leathrum/Mathlets/derivcalc.html
JAVA Mathlets for Math Explorations (a nice collection of small math programs for everybody) http://cs.jsu.edu/mcis/faculty/leathrum/Mathlets/
The integrator (a small but effective integration program) http://www.integrals.com/index.en.cgi
The MathServ Calculus toolkit (a collection of math applets for calculus computation) http://www.math.vanderbilt.edu/~pscrooke/toolkit.html
Modelowanie rzeczywistości (a nice Polish page with a program collection and many further links) http://www.wiw.pl/modelowanie/
Maple homepage http://www.maplesoft.com/
Mathematica homepage (Wolfram Research) http://www.wri.com/
Mathworks homepage (Matlab) http://www.mathworks.com/
Mathtype (Office built-in tool for mathematics writing) http://www.mathtype.com/en/products/mathtype/


8. Important internet pages

A very good elementary math page for pupils and students: http://www.mathe-online.at/mathint.html
The best introduction to matrix algebra: http://numericalmethods.eng.usf.edu/matrixalgebrabook/downloadma/matrixalgebra.pdf
Contains many links: http://archives.math.utk.edu/topics/linearAlgebra.html
Many good examples and a concise introduction at: http://people.hofstra.edu/faculty/Stefan_waner/RealWorld/index.html
Matrix (a very good matrix algebra add-in for Excel): http://digilander.libero.it/foxes/index.htm
Markov chains and biology: http://www.statslab.cam.ac.uk/~james/Markov/