This content has been downloaded from IOPscience. Please scroll down to see the full text. Download details: IP Address: 54.39.106.173 This content was downloaded on 04/06/2020 at 13:14 Please note that terms and conditions apply.



Evolutionary Dynamics

The mathematics of genes and traits


Hugo van den Berg

Warwick University, UK

IOP Publishing, Bristol, UK


© IOP Publishing Ltd 2015

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher, or as expressly permitted by law or under terms agreed with the appropriate rights organization. Multiple copying is permitted in accordance with the terms of licences issued by the Copyright Licensing Agency, the Copyright Clearance Centre and other reproduction rights organisations.

Permission to make use of IOP Publishing content other than as set out above may be sought at [email protected].

Hugo van den Berg has asserted his right to be identified as the author of this work in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

ISBN 978-0-7503-1094-9 (ebook)
ISBN 978-0-7503-1095-6 (print)
ISBN 978-0-7503-1125-0 (mobi)

DOI 10.1088/978-0-7503-1094-9

Version: 20150701

IOP Expanding Physics
ISSN 2053-2563 (online)
ISSN 2054-7315 (print)

British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.

Published by IOP Publishing, wholly owned by The Institute of Physics, London

IOP Publishing, Temple Circus, Temple Way, Bristol, BS1 6HG, UK

US Office: IOP Publishing, Inc., 190 North Independence Mall West, Suite 601, Philadelphia, PA 19106, USA


Contents

Preface

Acknowledgements

1 Introduction

1.1 Of snails and snakes

1.2 The three key elements

1.2.1 Heritability

1.2.2 Selective pressure

1.2.3 Variation

1.3 Stochasticity

1.4 Towards a mathematics of evolution

1.4.1 A top-down concept of fitness

1.4.2 Trait fitness

1.5 Organization of this book

Suggested courses

Bibliography

Further reading

2 Cell biology and molecular genetics

2.1 Cellular architecture and proliferation

2.1.1 Genetics and cell division

2.2 DNA, RNA and proteins

2.2.1 Transcription and translation

2.2.2 Coding DNA, non-coding DNA and genes

2.3 Metabolism

Further reading

Exercises

3 Phylogeny and development

3.1 Phylogenic trees

3.1.1 Tree theory

3.1.2 In-groups and out-groups

3.1.3 Classification

3.1.4 Rooted and unrooted trees

3.1.5 Constructing trees


3.1.6 The distance matrix

3.1.7 Constructing an unrooted tree

3.2 Development

3.2.1 Developmental pathways in differentiation

3.2.2 ‘Recapitulation of phylogeny’ versus ‘bottleneck’

3.2.3 Modification of modular development

3.2.4 Patterning and cell fate commitment

3.2.5 Genetic innovation in evolving development

Bibliography

Further reading

Exercises

4 Elementary evolutionary dynamics

4.1 Conceptual challenges and the standard assumption

4.2 Haploids

4.2.1 Two genomic variants

4.2.2 Multiple genomic variants

4.2.3 Bilinear frequency dependence

4.3 Diploids

4.3.1 Two gametotypes

4.3.2 Multiple gametotypes

4.4 Projection onto tightly linked clusters of loci

4.4.1 Decay of linkage

4.4.2 Averaged fitness: a closure problem

4.4.3 Dynamics of entire gametotypes

4.5 Drift and fixation

4.5.1 A simple model of genetic drift

4.5.2 Connected demes

4.5.3 A genetic model of latent neutral variation

Further reading

Exercises

5 Probability and measurement

5.1 Fundamental laws of probability

5.2 Random variables and their distributions

5.2.1 Sampling from a given distribution function

5.2.2 Mixed distributions

5.2.3 The survivor


5.3 Expectation and variance

5.4 Common distributions and their properties

5.4.1 The exponential and geometric distributions

5.4.2 Common discrete distributions

5.4.3 The normal distribution and its ilk

5.5 Measurement scales

Further reading

Exercises

6 Statistical inference and estimation

6.1 The essential ideas

6.1.1 Dinosaur eggs: which species?

6.1.2 The concept of likelihood

6.1.3 The P-value

6.1.4 Dinosaur eggs: are the mean clutch sizes different?

6.2 Justifying the likelihood ratio principle

6.2.1 The Neyman–Pearson lemma

6.2.2 The generalized likelihood ratio principle

6.3 Linking alleles to traits

6.3.1 Nominal traits

6.3.2 Ordinal traits

6.3.3 Quantitative traits

6.4 Microarrays: the stepping down procedure

6.5 Analysis of bivariate data

6.5.1 The correlation coefficient

6.5.2 A non-parametric test and a parametric test

6.5.3 Conditional statistics

Bibliography

Further reading

Exercises

7 Sequence, structure and function

7.1 Principles of dynamic programming

7.2 Sequence phylogenies

7.2.1 Maximum-likelihood assignment of ancestral states

7.2.2 Maximum-likelihood tree topology


7.3 Sequence alignment

7.3.1 Alignment patterns

7.3.2 Scoring: matches and penalties

7.3.3 The alignment cursor

7.4 Deep structure

7.4.1 Hidden Markov chains

7.4.2 Reconstruction of the state sequence

7.5 From sequence to function

7.5.1 Cylinder sets

7.5.2 Correlation functions

Bibliography

Further reading

Exercises

8 Analysis of quantitative trait loci

8.1 Recombinant distributions

8.1.1 Dynamics of inbreeding

8.1.2 One locus

8.1.3 Two loci

8.1.4 More than two loci

8.2 Genetic markers and mapping

8.2.1 The marker framework

8.2.2 Generalization of the likelihood function

8.2.3 Marker framework maps

8.2.4 Applications

8.2.5 Expression QTL

8.3 The number of quantitative trait loci

8.3.1 A statistical quandary

8.3.2 Justification of the normal distribution

Bibliography

Further reading

Exercises

9 Evolutionary dynamics of QTL

9.1 Heritability

9.1.1 Breeding success, breeding failure

9.1.2 The rate of evolution


9.2 Dynamics of the additive genetic component

9.2.1 The next-generation map

9.2.2 Transfer kernel models

9.2.3 Dynamics of fully linked traits

9.2.4 Dynamics of completely unlinked traits

9.3 The persistence of sex

9.3.1 Persistence of polymorphy

9.3.2 Asexual advantage

9.3.3 The origins of sex

Bibliography

Further reading

Exercises

10 Adaptive dynamics and speciation

10.1 Adaptive dynamics

10.1.1 Invasion fitness

10.1.2 Stabilizing selection

10.1.3 Disruptive selection and speciation

10.1.4 Protected polymorphisms

10.2 Fisher’s law for adaptive dynamics

10.3 Adaptive radiations and mass extinctions

10.3.1 A simple stochastic model of adaptive radiation

10.3.2 The quasi-stationary distribution

10.3.3 Persistence and extinction

Bibliography

Further reading

Exercises

11 Traits as objects of selection

11.1 Regimenting traits

11.1.1 The trouble with traits

11.1.2 Trait probes

11.1.3 Fitness for regimented traits

11.1.4 Specifications of the trait mapping

11.2 Scope and limitations of the additive genetic model

11.2.1 Estimation of the additive coefficients

11.2.2 Higher-order interactions

11.2.3 The generalized additive genetic model


Bibliography

Further reading

Exercises

12 Fitness and optimality

12.1 Evolution of protandry in butterflies

12.1.1 Virgins governing reproductive success

12.2 Evolution of juvenility

12.3 Evolution of homeostasis

12.3.1 Control diagrams in engineering and in living systems

12.3.2 The objective functional

12.4 Fitness probes

12.4.1 Construction of the fitness probe

12.4.2 General application of the fitness probe

12.4.3 Fitness probes for regimented traits and genotypes

12.4.4 A variational principle

Bibliography

Further reading

Exercises

Appendices

A Species, speciation and systematics

B Dangerous ideas

C Dynamics

D Constrained optimization

E Thermal physics


Preface

I do not think I shall err; though I may possibly use some superfluous scientific words.

Herman Melville 1851 Moby-Dick; or, The Whale

Subject and orientation of the book

Research in the life sciences attracts numerous discipline-hoppers at the post-graduate and post-doctoral levels. What these newcomers bring to the table are analytical and computational skills, as well as a certain creative yet disciplined way of thinking, acquired from their original training in physics, mathematics, engineering, computer science or statistics. The happy reason for this influx is that, as a result of huge technological advances in instrumentation, biological systems can now be studied at unprecedented levels of resolution in time and space. The only way that we can hope to make sense of the complex processes that are observed, not to mention managing the analysis of petabytes of data, is by working with the theoretical tools and techniques that these hoppers bring along. The discipline-hoppers join multidisciplinary programmes with monikers such as Integrative biology, Systems biology, Synthetic biology, and so on, names which emphasize various aspects of the same general idea: the contribution made by the hoppers allows the rigorous analysis and sharp characterization and control of functional relationships across levels of biological organization: from molecules to cells, from cells to whole organisms or even entire ecosystems.

My long-standing experience with such programmes has imbued me with the firm conviction that the biological education of the hoppers tends to be somewhat lopsided: they come to be extremely knowledgeable about the discipline directly concerned with their research topic, but often remain lacking in the broader perspective that they would have absorbed effortlessly had their first degree been in the life sciences. This is a consequence of post-graduate or post-doctoral entry, and not necessarily a failing of the hoppers themselves.

Figure: A sampling of the staggering diversity among infusoria, the microorganisms that can be found in a drop of water.

I am thinking in particular of the broader perspective afforded by the evolutionary point of view; see the Dobzhansky quotation with which chapter 1 opens, which may well reflect the instincts of most biologists. It is not that the hoppers have not heard of evolution, it is just that their understanding remains rudimentary. More is the pity, for the mechanisms and processes that they are uncovering are just the sort that would slot neatly into a meaner, leaner evolutionary theory, while an insight into evolution would greatly enrich their particular research. It is, I think, fair to say that evolutionary thinking is never far beneath the surface of any topic in the life sciences.

A related, pertinent observation is that a cultural divide remains between the life sciences and the mathematical/physical sciences. For students on interdisciplinary doctoral training programmes, this culture clash is one of the main hurdles to overcome. The way the textbooks are written, and the way they are meant to be studied, is remarkably distinct on either side of the divide (there are good as well as less good reasons for this, but discussing these reasons would lead us too far afield). One area where this difference manifests itself is in definitions: on the mathematical side, they are given once, fully and correctly. On the biology side, things can be diffuse, and are not always as sharply defined as in the physical and mathematical sciences. For the hoppers, this takes a little getting used to. One otherwise excellent textbook on evolution never actually defines fitness, but simply starts using it, leaving it to the reader to cobble together a definition from various descriptive phrases scattered throughout the book (some of these snippets being mutually inconsistent). The point is that this situation is common: the biologist assimilates concepts through a sort of osmosis, the mathematician through crisp unambiguous definitions.

Figure: A belemnite shell that has been transformed into an opal gemstone. Fossils are generally not made from the same atoms that constituted the living creature; an exchange takes place during the slow process of mineralization in which the three-dimensional imprint left by an organism (or part of an organism) becomes part of the rock. This imprint arises because soft and hard tissues mineralize differently from one another and from the surrounding matrix.


The aim of the book is to provide an overview of evolutionary thinking for physicists and mathematicians at the post-graduate level, one that attempts to speak their language, at least to a greater extent than a ‘native’ biology book might. The underlying intuition is that a central problem of evolutionary thinking is intimately connected to many of the multidisciplinary research themes that attract these hoppers. This common ground can be characterized, however loosely, as concerning the link between genetic information and the functional properties of an organism, i.e., its phenotype.

In a way, this is the familiar ‘nature versus nurture’ problem, but instead of facing a metaphysical perplexity, the challenge for modern science is the more pedestrian problem of too much information. We know that nature and nurture conspire to make the finished product (indeed, it has become a platitude to observe this), but the problem is that this comes about via hundreds of thousands of molecular interactions which we are in the process of charting meticulously. This problem can be attacked in quite diverse ways, as represented by the disciplines of bioinformatics (e.g., sequence alignment), biometrics (e.g., quantitative trait loci analysis) and dynamical systems-based modelling (e.g., mathematical physiology). Whereas each of these can be taught and practised perfectly well on its own, there is value in bringing out the ties that bind them together: they illuminate one another.

A word of apology: I could have devoted more space to such topics as explicit inclusive fitness calculations based on probabilistic genetic bookkeeping, adaptive dynamics, the Price correlation formulation and Rice’s tensor formulation. No text can be exhaustive, however: a line had to be drawn somewhere. My choices, unavoidably subjective, have been guided by the overarching goal of sketching, for the benefit of mathematicians and physicists, how evolutionary thinking can illuminate and cross-fertilize the multidisciplinary research in which they take part.

Figure: A phylogeny of the hominins. Reconstructed representatives of the following species are shown (clockwise, from the top): Homo neanderthalensis, H. sapiens, H. ergaster (centre), Sahelanthropus tchadensis, Australopithecus africanus and Paranthropus boisei.


Expected background of the reader

The material in the course has been taught with success to undergraduate final-year students in physics and mathematics at a UK university. For the reasons set out in the paragraph above, I hope that biologists at the post-graduate level will also be able to profit from this book. The essential requirements are a familiarity with calculus, elementary probability and dynamical systems (both discrete time and continuous time, i.e., differential equations). With the exception of calculus, this text furnishes brief introductions to the essentials of these topics, but the development of the material given here is probably too terse for most students who have not already been exposed to it.

The reader would benefit greatly from having followed a module (or semester, in US parlance) on the application of differential equations to biological systems such as might be based on van den Berg [1].

In addition, it would be highly desirable to have a working knowledge of biology and several aspects of physics and chemistry that underpin an essential understanding of how biological systems work. Again, this book surveys the essentials, not only for the purpose of being self-contained, but also to provide something akin to a checklist augmented with explanations regarding the importance of the items covered, so that readers looking elsewhere for more detailed expositions can form a good idea of the knowledge that is presupposed here.

Sources

I produced all the graphics as well as the calculations or simulations reported in the graphics. The mathematical graphics were produced in Mathematica, the schematics were drawn using PowerPoint and the line art was produced with traditional implements and coloured in using liquid watercolours or PhotoStudio.

In those cases where the results depended heavily on a specific primary publication, such as is the case, for instance, when a particular data set has been used, that publication has been cited in full. In those cases where the material has come to be part of the lore of the subject, attested by its treatment in nearly all standard textbooks, no reference has been given, although the books that I found particularly useful when preparing this text have been recommended as further reading at the end of each chapter.

Between these two categories there lies a grey area where I must ask the indulgence of those authors who may feel their contribution should have been circumscribed and located in the literature more precisely. In a few cases where the personal vision of a scholar, rather than any particular source, informed the development of the material, I have simply mentioned them by name, which in this age of rapid global electronic access to the literature is perhaps the fairest way of doing justice to their contributions.

There are several theoretical advances that I believe to be novel within this text (chief among these are the cat’s cradle method of unrooted tree reconstruction, cylinder sets for sequence/function mapping, the concepts of latent neutral variation and pleiotropic locking, the quasi-stationary distribution of species richness in adaptive radiations, the generalized additive linear model, the fitness probe and the associated variational principle), but it is all too possible that these concepts are already well-established using different terminology with which I am woefully unfamiliar, in which case I must beg the forgiveness of both reader and insufficiently lauded predecessors.

A note on the problems

In addition to longer problems at the end of each chapter, which are often the length of a typical examination question and are sometimes a starting point for a research project, numerous problems have been provided in boxes within the text. The latter class is intended to help the reader check that he or she has understood the discussion. If some sort of calculation is expected, a symbol depicting a hand holding a writing implement features, whereas if the reader is invited to ponder the questions in more depth, a symbol derived from Rodin’s thinker is included. It may seem odd that in many cases I can do no better than to invite readers to think something through for themselves, but evolutionary theory is one of those fields where Wittgenstein’s droll–desperate bon mot is often apposite: ‘Dieses Buch wird vielleicht nur der verstehen, der die Gedanken, die darin ausgedrückt sind – oder doch ähnliche Gedanken – schon selbst einmal gedacht hat.’ (‘This book will perhaps be understood only by someone who has already had the thoughts expressed in it, or at least similar thoughts.’)

Coventry, March 23, 2015

Bibliography

[1] van den Berg H 2011 Mathematical Models of Biological Systems (Oxford: Oxford University Press)


Acknowledgements

Students, teaching assistants and colleagues at the University of Warwick provided invaluable feedback on the text, excerpts of which have been in use as lecture notes over a number of years, and on the problems and exercises, which have served as homework problems and examination questions. I am especially indebted to Martin Boer for introducing me to genomic science and its statistical analysis, and also for his meticulous feedback on draft versions of this book. I am also particularly grateful to Andrew Blanks, whose wide-ranging knowledge of genetics and physiology allowed me to enrich the material with numerous illuminating case studies. Above all, I would like to express my gratitude to Jolanda van den Berg, who was the first reader of every draft and whose kind criticisms helped me to eliminate the worst blunders.