positional information, positional error, and...

21
INVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis: A Mathematical Framework Ga ˇ sper Tka ˇ cik,* ,1 Julien O. Dubuis, ,Mariela D. Petkova, and Thomas Gregor ,*Institute of Science and Technology Austria, A-3400 Klosterneuburg, Austria, and Joseph Henry Laboratories of Physics and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544 ABSTRACT The concept of positional information is central to our understanding of how cells determine their location in a multi- cellular structure and thereby their developmental fates. Nevertheless, positional information has neither been dened mathematically nor quantied in a principled way. Here we provide an information-theoretic denition in the context of developmental gene ex- pression patterns and examine the features of expression patterns that affect positional information quantitatively. We connect positional information with the concept of positional error and develop tools to directly measure information and error from experimental data. We illustrate our framework for the case of gap gene expression patterns in the early Drosophila embryo and show how information that is distributed among only four genes is sufcient to determine developmental fates with nearly single-cell resolution. Our approach can be generalized to a variety of different model systems; procedures and examples are discussed in detail. C ENTRAL to the formation of multicellular organisms is the ability of cells with identical genetic material to acquire distinct cell fates according to their position in a de- veloping tissue (Lawrence 1992; Kirschner and Gerhart 1997). While many mechanistic details remain unsolved, there is a wide consensus that cells acquire knowledge about their location by measuring local concentrations of various form-generating molecules, called morphogens(Turing 1952; Wolpert 1969). In most cases, these morphogens di- rectly or indirectly control the activity of other genes, often coding for transcription factors, resulting in a regulatory network whose successive layers produce rened spatial pat- terns of gene expression (von Dassow et al. 2000; Tomancak et al. 2007; Fakhouri et al. 2010; Jaeger 2011). The system- atic variation in the concentrations of these morphogens with position denes a chemical coordinate system, used by cells to determine their location (Nüsslein-Volhard 1991; St. Johnston and Nüsslein-Volhard 1992; Grossniklaus et al. 1994). Morphogens are thus said to contain positional in- formation,which is processed by the genetic network, ulti- mately giving rise to cell fate assignments that are very re- producible across the embryos of the same species (Wolpert 2011). The concept of positional information has been widely used as a qualitative descriptor and has had an enormous success in shaping our current understanding of spatial pat- terning in developing organisms (Wolpert 1969; Tickle et al. 1975; French et al. 1976; Driever and Nüsslein-Volhard 1988a,b; Meinhardt 1988; Struhl et al. 1989; Reinitz et al. 1995; Schier and Talbot 2005; Ashe and Briscoe 2006; Jaeger and Reinitz 2006; Bökel and Brand 2013; Witchley et al. 2013). Mathematically, however, positional information has not been rigorously dened. Specic morphological features during early development have been studied in great detail and shown to occur reproducibly across wild-type embryos (Gierer 1991; Gregor et al. 2007a; Gregor et al. 2007b; Okabe- Oho et al. 2009; Dubuis et al. 2013a; Liu et al. 2013), while perturbations to the morphogen system resulted in systematic shifts of these same features (Driever and Nüsslein-Volhard 1988a,b; Struhl et al. 1989; Liu et al. 2013). This established a causalbut not quantitativelink between the positional in- formation encoded in the morphogens and the resulting body plan. In Drosophila, the body plan along the major axis of the future adult y is established by a hierarchical network of inter- acting genes during the rst 3 hr of embryonic development Copyright © 2015 by the Genetics Society of America doi: 10.1534/genetics.114.171850 Manuscript received April 22, 2014; accepted for publication October 27, 2014; published Early Online October 31, 2014. 1 Corresponding author: Institute of Science and Technology Austria, Am Campus 1, A-3400 Klosterneuburg, Austria. E-mail: [email protected] Genetics, Vol. 199, 3959 January 2015 39

Upload: others

Post on 08-Jan-2020

41 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

INVESTIGATION

Positional Information Positional Error andReadout Precision in Morphogenesis A

Mathematical FrameworkGasper Tkacik1 Julien O DubuisdaggerDagger Mariela D Petkovadagger and Thomas GregordaggerDagger

Institute of Science and Technology Austria A-3400 Klosterneuburg Austria and daggerJoseph Henry Laboratoriesof Physics and DaggerLewis Sigler Institute for Integrative Genomics Princeton University Princeton New Jersey 08544

ABSTRACT The concept of positional information is central to our understanding of how cells determine their location in a multi-cellular structure and thereby their developmental fates Nevertheless positional information has neither been defined mathematicallynor quantified in a principled way Here we provide an information-theoretic definition in the context of developmental gene ex-pression patterns and examine the features of expression patterns that affect positional information quantitatively We connectpositional information with the concept of positional error and develop tools to directly measure information and error fromexperimental data We illustrate our framework for the case of gap gene expression patterns in the early Drosophila embryo and showhow information that is distributed among only four genes is sufficient to determine developmental fates with nearly single-cellresolution Our approach can be generalized to a variety of different model systems procedures and examples are discussed in detail

CENTRAL to the formation of multicellular organisms isthe ability of cells with identical genetic material to

acquire distinct cell fates according to their position in a de-veloping tissue (Lawrence 1992 Kirschner and Gerhart1997) While many mechanistic details remain unsolvedthere is a wide consensus that cells acquire knowledge abouttheir location by measuring local concentrations of variousform-generating molecules called ldquomorphogensrdquo (Turing1952 Wolpert 1969) In most cases these morphogens di-rectly or indirectly control the activity of other genes oftencoding for transcription factors resulting in a regulatorynetwork whose successive layers produce refined spatial pat-terns of gene expression (von Dassow et al 2000 Tomancaket al 2007 Fakhouri et al 2010 Jaeger 2011) The system-atic variation in the concentrations of these morphogens withposition defines a chemical coordinate system used by cellsto determine their location (Nuumlsslein-Volhard 1991 StJohnston and Nuumlsslein-Volhard 1992 Grossniklaus et al1994) Morphogens are thus said to contain ldquopositional in-formationrdquo which is processed by the genetic network ulti-

mately giving rise to cell fate assignments that are very re-producible across the embryos of the same species (Wolpert2011)

The concept of positional information has been widelyused as a qualitative descriptor and has had an enormoussuccess in shaping our current understanding of spatial pat-terning in developing organisms (Wolpert 1969 Tickle et al1975 French et al 1976 Driever and Nuumlsslein-Volhard1988ab Meinhardt 1988 Struhl et al 1989 Reinitz et al1995 Schier and Talbot 2005 Ashe and Briscoe 2006 Jaegerand Reinitz 2006 Boumlkel and Brand 2013 Witchley et al2013) Mathematically however positional information hasnot been rigorously defined Specific morphological featuresduring early development have been studied in great detailand shown to occur reproducibly across wild-type embryos(Gierer 1991 Gregor et al 2007a Gregor et al 2007b Okabe-Oho et al 2009 Dubuis et al 2013a Liu et al 2013) whileperturbations to the morphogen system resulted in systematicshifts of these same features (Driever and Nuumlsslein-Volhard1988ab Struhl et al 1989 Liu et al 2013) This establisheda causalmdashbut not quantitativemdashlink between the positional in-formation encoded in the morphogens and the resulting bodyplan

In Drosophila the body plan along the major axis of thefuture adult fly is established by a hierarchical network of inter-acting genes during the first 3 hr of embryonic development

Copyright copy 2015 by the Genetics Society of Americadoi 101534genetics114171850Manuscript received April 22 2014 accepted for publication October 27 2014published Early Online October 31 20141Corresponding author Institute of Science and Technology Austria Am Campus 1A-3400 Klosterneuburg Austria E-mail gaspertkacikistacat

Genetics Vol 199 39ndash59 January 2015 39

(Nuumlsslein-Volhard and Wieschaus 1980 Akam 1987 Ingham1988 Spradling 1993 Papatsenko 2009) The hierarchy iscomposed of three layers long-range protein gradients thatspan the entire long axis of the egg (Driever and Nuumlsslein-Volhard 1988a) gap genes expressed in broad bands (Jaeger2011) and pair-rule genes that are expressed in a regularseven-striped pattern (Lawrence and Johnston 1989) Posi-tional information is provided to the system solely via the firstlayer which is established from maternally supplied andhighly localized messenger RNAs (mRNAs) that act as proteinsources for the maternal gradients (Nuumlsslein-Volhard 1991St Johnston and Nuumlsslein-Volhard 1992 Ferrandon et al1994 Anderson 1998 Little et al 2011) The network usesthese inputs to specify a blueprint for the segments of theadult fly in the form of gene expression patterns that cir-cumferentially span the embryo in the transverse direc-tion to the anteriorndashposterior axis These patterns definedistinct single-nucleus wide segments with nuclei expressingthe downstream genes in unique and distinguishable combi-nations (Gergen et al 1986 Gregor et al 2007a Dubuis et al2013ab)

It is remarkable that such precision can be achieved insuch a short amount of time using only a few handfuls ofgenes Gene expression is subject to intrinsic fluctuationswhich trace back to the randomness associated with regula-tory interactions between molecules present at low absolutecopy numbers (van Kampen 2011 Tsimring 2014) Moreoverthere is random variability not only within but also betweenembryos for instance in the strength of the morphogen sources(Bollenbach et al 2008) These biophysical limitationsmdashegin the number of signaling molecules the time available formorphogen readout and the reproducibility of initial andenvironmental conditionsmdashplace severe constraints on theability of the developmental system to generate reproduc-ible gene expression patterns (Gregor et al 2007a Tkaciket al 2008)

Given these constraints how precisely can gene expressionlevels encode information about position in the embryo Toaddress this question rigorously we need to be able to makequantitative statements about the positional information ofspatial gene expression profiles without presupposing whichfeatures of the profile (eg sharpness of the boundary size ofthe domains position-dependent variability etc) encode theinformation Here we make the case that the relevant mea-sure for positional information is the mutual informationImdasha central information-theoretic quantity (Shannon 1948)mdashbetween expression profiles of the gap genes and positionin the embryo We show how positional information putsmathematical limits on the ability with which cells in thedeveloping Drosophila embryo can infer their position alongthe major axis by simultaneously reading the concentrationsof multiple gap gene proteins This framework allows us (i) toquantitatively determine how many truly distinct and mean-ingful levels of gene expression are generated by the Drosoph-ila gap gene system (ii) to assess how many distinguishablecell fates such a system can support and (iii) to ask how

variability across embryos impedes the ability of the pattern-ing system to transmit positional information

This study builds on two previously published articles InDubuis et al (2013a) the general data acquisition and erroranalysis frameworks were presented with a subset of thedata analyzed here (referred to as data set A below) herewe analyze roughly eight times as many samples to assessthe experimental reproducibility of the results and performtests that would be impossible with the original small sam-ple In Dubuis et al (2013b) data set A was used to estimatepositional information and positional error as outlined ingreater detail here The study aimed at understanding posi-tional information in the Drosophila gap gene network butthe treatment of the conceptual and data analysis frame-works was very cursory In the present methodological arti-cle we provide a full account of both frameworks andinclude a number of previously unreported results relatingto (i) the mathematical connection between positional errorand information (ii) different estimators of informationfrom data (iii) various data normalization techniques (iv) com-bining data from multiple experiments and (v) the validity ofvarious approximation schemes We report on these techniquesin detail and benchmark them on real data to prepare ourapproach for a straightforward generalization to other devel-opmental systems

Results

Theoretical foundations

In this section we establish the information-theoretic frame-work for positional information carried by spatial patterns ofgene expression To develop an intuition we start witha one-dimensional toy example of a single gene which willbe generalized later to a many-gene system We presentscenarios where positional information is stored in differentqualitative features of gene expression patterns To capturethat intuition mathematically we give a precise definition ofpositional information for one and for multiple genesFinally we show how a quantitative formulation of posi-tional information is related to ldquodecodingrdquo ie the ability ofthe nuclei to infer their position in the embryo

Positional information in spatial gene expression profilesLet us consider the simplest possible example where theexpression of a single gene G(x) varies with position x alongthe axis of a one-dimensional embryo We choose units oflength such that x = 0 and x = 1 correspond to the anteriorand posterior poles of the embryo respectively Suppose weare able to quantitatively measure the profile of such a genealong the anteriorndashposterior axis inN embryos labeled withan index m frac14 1 N Such measurements of the light in-tensity profile of fluorescently labeled antibodies againsta particular gene product yield G(m)(x) where G is the quan-titative readout in embryo m of the gene expression levelFrom a collection of embryosmdashafter suitable data processingsteps described latermdashwe can then extract two statistics the

40 G Tkacik et al

position-dependent ldquomean profilerdquo capturing the prototyp-ical gene expression pattern and the position-dependentvariance across embryos which measures the degree ofembryo-to-embryo variability or the reproducibility of the meanprofile We can transform the measurements G(x) into pro-files g(x) with rescaled units such that the mean profile gethxTHORNis normalized to 1 at the maximum and to 0 at the minimumalong x After these steps our description of the system consistsof the mean profile gethxTHORN and the variance in the profile sg

2ethxTHORNHow much can a nucleus learn about its position if it

expresses a gene at level g We compare three idealizedcases where we pick the shape of gethxTHORN by hand and assumefor the start that the variance is constant sg

2ethxTHORN frac14 c Thefirst case is illustrated in Figure 1A where a step-like profilein g splits the embryo into two domains of gene expressionan anterior ldquoonrdquo domain where gethxTHORN frac14 1 for x x0 and anldquooffrdquo domain in the posterior x x0 where gethxTHORN frac14 0 Thisarrangement has an extremely precise indeed infinitelysharp boundary at x0 if we think that the sharpness ofthe boundary is the biologically relevant feature in this sys-tem this arrangement would correspond to an ideal pattern-ing gene But how much information can nuclei extract fromsuch a profile If x0 were 12 the boundary would repro-ducibly split the embryo into two equal domains based onreading out the expression of g the nucleus could decidewhether it is in the anterior or the posterior a binary choicethat is equally likely prior to reading out g As we will seethe positional information needed (and provided by sucha sharp profile) to make a clear two-way choice betweentwo a priori equally likely possibilities = 1 bit

Can a profile of a different shape do better Figure 1Bshows a somewhat more realistic sigmoidal shape that hasa steep but not infinitely sharp transition region If the var-iance is small enough s2

g $ 1 this profile can be more in-formative about the position Nuclei far at the anterior stillhave g 1 (full induction or the on state) while nuclei at theposterior still have g 0 (the off state) But the graded re-sponse in the middle defines new expression levels in g thatare significantly different from both 0 and 1 A nucleus withg 05 will thus ldquoknowrdquo that it is neither in the anterior norin the posterior This system will therefore be able to providemore positional information than the sharp boundary that islimited by 1 bit Clearly this conclusion is valid only insofar asthe variance s2

g is low enough if it gets too big the interme-diate levels of expression in the transition region can no lon-ger be distinguished and we are back to the 1-bit case

In contrast to the infinitely sharp gradient a linear gra-dient as depicted in Figure 1C has already been proposed asefficient in encoding positional information (Wolpert 1969)By extending the argument for sigmoidal shapes above if s2

gis not a function of position the linear gradient is in fact theoptimal choice Consider starting at the anterior and movingtoward the posterior as soon as we move far enough in x thatthe change in gethxTHORN is above sg we have created one moredistinguishable level of expression in g and thus a group ofnuclei that by measuring g can differentiate themselves from

their anteriorly positioned neighbors Finally this reasoninggives us a hint about how to generalize to the case where thevariance s2

g depends on position x What is important is tocount as x covers the range from anterior to posterior howmuch gethxTHORN changes in units of the local variability sg(x)mdashitwill turn out that this is directly related to the mutual infor-mation between g and position

Thus a sharp and reproducible boundary can correspondto a profile that does not encode a lot of positional infor-mation and a linear profile where the boundary is not evenwell defined can encode a high amount of positional in-formation Ultimately whether or not there are intermediatedistinguishable levels of gene expression depends on thevariability in the profile Therefore any measure of positionalinformation must be a function of both gethxTHORN and s2

gethxTHORNDoes the ability of the nuclei to infer their position

automatically improve if they can simultaneously read outthe expression levels of more than one gene Figure 2A showsthe case where two genes g1 and g2 do not provide any moreinformation than each one of them provides separately be-cause they are completely redundant Redundant does notmean equalmdashindeed in Figure 2A the profiles are differentat every xmdashbut they are perfectly correlated (or dependent)knowing the expression level of g1 one knows exactly thelevel of g2 so g2 cannot provide any additional new informationabout the position In general redundancy can help compen-sate for detrimental effects of noise when noise is significantbut this is not the case in the toy example at hand

The situation is completely different if the two profilesare shifted relative to each other by say 25 embryo lengthNote that none of the individual profile properties havechanged however the two genes now partition the embryointo four different segments the posteriormost domain thatis combinatorially encoded by the gene expression patterng1 frac14 g2 frac14 0 the third domain with g1 frac14 0 thinsp g2 frac14 1 the sec-ond domain with g1 frac14 g2 frac14 1 and the anteriormost domain

Figure 1 Positional information encoded by a single gene Shown arethree hypothetical mean gene expression profiles gethxTHORN as a function ofposition x with spatially constant variability sg (shaded area) (A) A stepfunction carries (at most) 1 bit of positional information by perfectlydistinguishing between ldquooffrdquo (not induced posterior) and ldquoonrdquo (fullyinduced anterior) states (B) The boundary of the sigmoidal g(x) functionis now wider but the total amount of encoded positional information canbe 1 bit because the transition region itself is distinguishable from theon and off domains (C) A linear gradient has no well-defined boundarybut nevertheless provides a further increase in informationmdashif sg is lowenoughmdashby being equally sensitive to position at every x

Positional Information and Error 41

with g1 frac14 1 thinsp g2 frac14 0 Upon reading out g1 and g2 a nucleusa priori located in any of the four domains can unambigu-ously decide on a single one of the four possibilities This isequivalent to making two binary decisions and as we willlater show to 2 bits of positional information

Finally the most subtle case is depicted in Figure 2CHere the mean profiles have exactly the same shapes as inFigure 2B What is different however is the correlationstructure of the fluctuations In certain areas of the embryothe two genes are strongly positively correlated while in theothers they are strongly negatively correlated If these areasare overlaid appropriately on top of the domains definedby the mean expression patterns an additional increase inpositional information is possible In the admittedly con-trived but pedagogical example of Figure 2C the mean pro-files and the correlations together define eight distinguishabledomains of expression combinatorially encoded by two genesNuclei having simultaneous access to the concentrations of thetwo genes can compute which of the eight domains they re-side in although it might not be easy to implement such a com-putation in molecular hardware Picking one of eight choicescorresponds to making three binary decisions and thus to 3bits of positional information Note that in this case each geneconsidered in isolation still carries 1 bit as before so that thesystem of two genes carries more information than the sum ofits partsmdashsuch a scheme is called synergistic encoding

In sum we have shown that the mean shapes of theprofiles as well as their variances and correlations can carrypositional information Extrapolating to three or more geneswe see that the number of pairwise correlations increases andin addition higher-order correlation terms start appearing For-mally positional information could be encoded in all of thesefeatures but would become progressively harder to extractusing plausible biological mechanisms Nevertheless a princi-pled and assumption-free measure should combine all statisti-cal structure into a single number a scalar quantity measured

in bits that can ldquocountrdquo the number of distinguishable expres-sion states (and thus positions) as illustrated in the examplesabove

Defining positional information Determining the numberof ldquodistinguishable statesrdquo of gene expression is more com-plicated in real data sets than in our toy models the meanprofiles have complex shapes and their (co)variabilitydepends on position In the ideal case we can measure jointexpression patterns of N genes gi i = 1 N (for exam-ple N = 4 for four gap genes in Drosophila) in a large set ofembryos In such a scenario the position dependence of theexpression levels can be fully described with a conditionalprobability distribution P(gi|x) Concretely for every po-sition x in the embryo we construct an N-dimensional histo-gram of expression levels across all recorded embryoswhich (when normalized) yields the desired P(gi|x) Thisdistribution contains all the information about how expres-sion levels vary across embryos in a position-dependentfashion For instance

giethxTHORN frac14Z

dNgthinsp giPfglgjx

(1)

s2i ethxTHORN frac14

ZdNgethgi2giethxTHORNTHORN

2PethfglgjxTHORN (2)

CijethxTHORN frac14Z

dNggigj2 giethxTHORNgjethxTHORN

$PethfglgjxTHORN (3)

are the mean profile of gene gi the variance across embryos ofgene gi and the covariance between genes gi and gj respec-tively In principle the conditional distribution contains also allhigher-order moments that we can extract by integrating overappropriate sets of variables Realistically we are often limitedin our ability to collect enough samples to construct P(gi|x)by histogram counts especially when considering several

Figure 2 (AndashC) Positional information encoded by twogenes Spatial gene expression profiles are shown at thetop an enumeration of all distinguishable combinations ofgene expression (roman numerals) across the embryo isshown at the bottom (A) Step function profiles of g1 g2each encode 1 bit of information but are perfectly redun-dant jointly encoding only two distinguishable states andthus 1 bit of positional information the same as each genealone (B) The mean profiles of g1 g2 have been displacedremoving the redundancy and bringing the total informa-tion to 2 bits (four distinct states of joint gene expression)(C) If in addition to the mean profile shape the down-stream layer can read out the (correlated) fluctuations ofg1 g2 a further increase in information is possible In thistoy example fluctuations are correlated (+) and anticorre-lated (2) in various spatially separated regions whichallows the embryo to use this information along withthe mean profile shape to distinguish eight distinctregions bringing the total encoded information to 3 bits

42 G Tkacik et al

genes simultaneously the number of samples needed growsexponentially with the number of genes However estimatingthe mean profiles on the left-hand sides of Equations 1ndash3 canoften be achieved from data directly A reasonable first step(but one that has to be independently verified) is to assumethat the joint distribution P(gi|x) of N expression levels giat a given position x is Gaussian which can be constructedusing the measured mean values and covariances

P ethfgigjxTHORN

frac14 eth2pTHORN2N=2jCethxTHORNj21=2

3 exp

2

4212

XN

i jfrac141ethgi 2 giethxTHORNTHORN

C21ethxTHORN

ampij

gj2 gjethxTHORN

$3

5

(4)

We emphasize that the Gaussian approximation is not re-quired to theoretically define positional information but itwill turn out to be practical when working with experimen-tal data in the cases of one or two genes it is often possibleto proceed without making this approximation which pro-vides a convenient check for its validity

While the conditional distribution P(gi|x) capturesthe behavior of gene expression levels at a given x establish-ing how much information in total the expression levelscarry about position requires us to know also how frequentlyeach combination of gene expression levels gi is usedacross all positions Recall for instance that our argumentsrelated to the information encoded in patterns in Figure 2rested on counting how often a pair of genes will be found inexpression states 00 01 10 and 11 across all x This globalstructure is encoded in the total distribution of expressionlevels which can be obtained by averaging the conditionaldistribution over all positions

PgethfgigTHORN frac14DPfgigjx

E

xfrac14

Z 1

0dx thinsp PethfgigjxTHORN (5)

where hampix denotes averaging over x Note that we can thinkof Equation 5 as a special case of averaging with a position-dependent weight

PgethfgigTHORN frac14Z

dx thinsp PxethxTHORNPethfgigjxTHORN (6)

where Px(x) is chosen to be uniform As we shall see in thecase of Drosophila anteriorndashposterior (AP) patterning Px(x)will be the distribution of possible nuclear locations alongthe AP axis which indeed is very close to uniform

When formulated in the language of probabilities therelationship between the position x and the gene expressionlevels can be seen as a statistical dependency If we knewthis dependency were linear we could measure it using ega linear correlation analysis between x and gi Shannon(1948) has shown that there is an alternative measure oftotal statistical dependence (not just of its linear component)

called the mutual information which is a functional of theprobability distributions Px(x) and P(gi|x) and is defined by

IethxfgigTHORN

frac14Rdx thinsp PxethxTHORN

RdNgthinsp PethfgigjxTHORNlog2

PethfgigjxTHORNPgethfgigTHORN

(7)

This positive quantity measured in bits tells us how muchone can know about the gene expression pattern if oneknows the position x It is not hard to convince oneself thatthe mutual information is symmetric ie I(gi x) = I(x gi) = I(gi x) This is very attractive we do theexperiments by sampling the distribution of expression lev-els given position while the nuclei in a developing embryoimplicitly solve the inverse problemmdashknowing a set of geneexpression levels they need to infer their position A funda-mental result of information theory states that both prob-lems are quantified by the same symmetric quantity theinformation I(gi x) Furthermore mutual information isnot just one out of many possible ways of quantifying thetotal statistical dependency but rather the unique way thatsatisfies a number of basic requirements for example thatinformation from independent sources is additive (Shannon1948 Cover and Thomas 1991)

The definition of mutual information in Equation 7 can berewritten as a difference of two entropies

Iethfgig xTHORN frac14 SPgethfgigTHORN

amp2

DSfrac12PethfgigjxTHORN(

E

x (8)

where S[p(x)] is the standard entropy of the distribution p(x)measured in bits (hence log base 2)

Sfrac12 pethxTHORN( frac14 2Z

dx thinsp pethxTHORNlog2pethxTHORN (9)

Equation 8 provides an alternative interpretation of themutual information I(gi x) which is illustrated in Figure 3In the case of a single gene g the ldquototal entropyrdquo S[Pg(g)]represented on the left measures the range of gene expres-sion available across the whole embryo This total entropyor dynamic range can be written as the sum of two contri-butions One part is due to the systematic modulation of gwith position x and this is the useful part (the ldquosignalrdquo) orthe mutual information I(gi x) The other contribution isthe variability in g that remains even at constant position x thisrepresents pure ldquonoiserdquo that carries no information about posi-tion and is formally measured by the average entropy of theconditional distribution (the noise entropy) hS[P(g|x)]ix Po-sitional information carried by g is thus the difference betweenthe total and noise entropies as expressed in Equation 8

Mutual information is theoretically well founded andis always nonnegative being 0 if and only if there is nostatistical dependence of any kind between the positionand the gene expression level Conversely if there are Ibits of mutual information between the position and theexpression level there are ) 2IethfgigxTHORN distinguishable gene

Positional Information and Error 43

expression patterns that can be generated by movingalong the AP axis from the head at x = 0 to the tail atx = 1 This is precisely the property we require from anysuitable measure of positional information We thereforesuggest that mathematically positional informationshould be defined as the mutual information between ex-pression level and position I(gi x)

Defining positional error Thus far we have discussedpositional information in terms of the statistical dependencyand the number of distinguishable levels of gene expressionalong the position coordinate To present an alternativeinterpretation we start by using the symmetry property ofthe mutual information and rewrite I(gi x) as

Iethfgig xTHORN frac14 Sfrac12Px ethxTHORN(2Sfrac12PethxjfgigTHORN(

(PgethfgigTHORN (10)

ie the difference between the (uniform) distribution overall possible positions of a cell in the embryo and the distri-bution of positions consistent with a given expression levelHere P(x|gi) can be obtained using Bayesrsquo rule from theknown quantities

PethxjfgigTHORN frac14PethfgigjxTHORNPxethxTHORN

PgethfgigTHORN (11)

The total entropy of all positions S[Px(x)] in Equation 10 isindependent of the particular regulatory systemmdashit simply

measures the prior uncertainty about the location of the cells inthe absence of knowing any gene expression level If howeverthe cell has access to the expression levels of a particular set ofgenes this uncertainty is reduced and it is hence possible tolocalize the cell much more precisely the reduction in uncer-tainty is captured by the second term in Equation 10 This formof positional information emphasizes the decoding view that isthat cells can infer their positions by simultaneously readingout protein concentrations of various genes (Figure 3)

Positional information is a single number it is a globalmeasure of the reproducibility in the patterning system Isthere a local quantity that would tell us position by positionhow well cells can read out their gene expression levels andinfer their location Is positional information ldquodistributedequallyrdquo along the AP axis or is it very nonuniform suchthat cells in some regions of the embryo are much betterat reproducibly assuming their roles

The optimal estimator of the true location x of a cell oncewe (or the cell) measure the gene expression levels fgi gis the maximum a posteriori (MAP) estimate xethfgi gTHORN frac14argmaxthinsp Pethxjfgi gTHORN In cases like ours where the prior distri-bution Px(x) is uniform this equals the maximum-likelihood(ML) estimate

xn

gio$

frac14 argmaxthinsp Pn

gio)))x

$ (12)

thus for each expression level readout this ldquodecoding rulerdquogives us the most likely position of the cell x The inset inFigure 3 illustrates the decoding in the case of one gene

How well can this (optimal) rule perform The expectederror of the estimated x is given by s2

xethxTHORN frac14ethx2xTHORN2

(

where brackets denote averaging over Pethxjfgi gTHORN This erroris a function of the gene expression levels however we canalso evaluate it for every x since we know the mean geneexpression profiles giethxTHORN for every x Thus we define a newquantity the positional error sx(x) which measures howwell cells at a true position x are able to estimate their positionbased on the gene expression levels alone This is the localmeasure of positional information that we were aiming for

Independently of how cells actually read out the concen-trations mechanistically it can be shown that sx(x) cannotbe lower than the limit set by the CramerndashRao bound (Coverand Thomas 1991)

sx2ethxTHORN$ 1

IethxTHORN (13)

where IethxTHORN is the Fisher information given by

IethxTHORN frac14 2

2 log PethfgigjxTHORN

x2

+

PethfgigjxTHORN (14)

Despite its name the Fisher information I is not an information-theoretic quantity and unlike the mutual information I theFisher information depends on position Is there a connectionbetween the positional error sx(x) and the mutual infor-mation I(gi x) Below we sketch the derivation following

Figure 3 The mathematics of positional information and positional errorfor one gene The mean profile of gene g (thick solid line) and its vari-ability (shaded envelope) across embryos are shown Nuclei are distrib-uted uniformly along the AP axis ie the prior distribution of nuclearpositions Px(x) is uniform (shown at the bottom) The total distribution ofexpression levels across the embryo Pg(g) is determined by averagingP(g|x) over all positions and is shown on the left The positional informationI(g x) is related to these two distributions as in Equation 8 Inset showsdecoding (ie estimating) the position of a nucleus from a measuredexpression level g Prior to the ldquomeasurementrdquo all positions are equallylikely After observing the value g the positions consistent with thismeasured values are drawn from P(x|g) The best estimate of the trueposition x is at the peak of this distribution and the positional errorsx(x) is the distributionrsquos width Positional information I(g x) can equallybe computed from Px(x) and P(x|g) by Equation 10

44 G Tkacik et al

Brunel and Nadal (1998) demonstrating the link for the caseof one gene g

Let us assume that the Gaussian approximation of Equa-tion 4 holds and that the distribution of the levels of a singlegene at a given position is

PethgjxTHORN frac14 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ps2

gethxTHORNq exp

(212ethg2gethxTHORNTHORN2

s2gethxTHORN

) (15)

We can use the Gaussian distribution to compute the Fisherinformation in Equation 14 We find that

I frac14 g92ethxTHORNs2gethxTHORN

thorn 2s92gethxTHORNs2gethxTHORN

(16)

where (amp)9 denotes a derivative with respect to position xInformation about position is thus carried by the change inmean profile with position as well as the change in the var-iability itself with position If the noise is small sg $ g thentypically it will be the case that

))s9g)) $ jg9j (a potentially

important issue arises when the profiles and their variabilityare estimated from noisy sampled data in this case oneneeds to be careful about spurious contributions to Fisherinformation due to noise-induced gradients of the profilesand the variability) If we formally require

))s9g)) $ jg9j we

can retain only the first term to obtain a bound on positionalerror

s2xethxTHORN$

1IethxTHORN

-dgdx

22s2gethxTHORN (17)

This result is intuitively straightforward it is simply thetransformation of the variability in gene expression s2

gethxTHORN intoan effective variance in the position estimate s2

x ethxTHORN and thetwo are related by slope of the inputoutput relation gethxTHORN

A crucial next step is to think of x as determining geneexpressions gi probabilistically and the x as being a functionof these gene expression levelsmdashthat is when computing xneither we nor the nuclei have access to the true positionThis forms a dependency chain x gi x Since eachof these steps is probabilistic it can only lose informationsuch that by information-processing inequality (Cover andThomas 1991) we must have I(gi x) $ I(x x) The mu-tual information between the true location and its estimateis given by

Iethx xTHORN frac14 Sfrac12PxethxTHORN(2DSfrac12PethxjxTHORN(

E

PxethxTHORN (18)

Under weak assumptions the first term in our case is ap-proximately the entropy of a uniform distribution While wedo not know the full distribution P(x|x) and thus cannotcompute its entropy directly we know its variance which isjust the square of the positional error s2

xethxTHORN Regardless ofwhat the full distribution is its entropy must be less or equal

to the entropy of the Gaussian distribution of the same var-iance which is Sfrac12PethxjxTHORN( frac14 log2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

bits Putting ev-erything together we find that

Iethfgig xTHORN$ I ethx xTHORN$log2

PxethxTHORNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(19)

Therefore positional information I(gi x) puts an upperbound to the average ability of the cells to infer their loca-tions that is to the smallness of the positional error sx(x)In a straightforward generalization of a single-gene case theFisher information for a multivariate Gaussian distributionwritten for compactness in matrix notation yields

I frac14 g9TC21g9thorn 12TrhC21C9C21C9

i (20)

Under the same assumptions we made for the case of a singlegene we retain only the first term so that the expression forpositional error written out explicitly in the componentnotation reads

s2x ethxTHORN$

1IethxTHORN

0

XN

ijfrac141

dgidx

hCethxTHORN21

i

ij

dgjdx

1

A21

(21)

where Cij is the covariance matrix of the profiles as definedin Equation 3 This extends the fundamental connectionEquation 19 between the positional information and posi-tional error to the case of multiple genes Importantly allquantitiesmdashthe mean profiles and their covariancemdashinEquation 21 can be obtained from experimental data sosx(x) is a quantity that can be estimated directly

Interpretation in a spatially discrete (cellular) system Inthe setup presented above gene expression levels carryinformation about a continuous position variable x But ina real biological system there exists a minimal spatial scalemdashthe scale of individual nuclei (or cells)mdashbelow which theconcept of gene expression at a position is no longer welldefined This is particularly the case in systems that interpretmolecular concentrations to make decisions that determinecell fates such interpretations have no meaning at the spa-tial scale of a molecule but only at cellular scales How shouldpositional information be interpreted in such a context

When noise is small enough and the distribution ofpositions consistent with observed gene expression levelsP(x|gi) of Equation 11 is nearly Gaussian and Equations19 and 21 become tight bounds In this case we can applyEquation 10 directly In developmental systems cells ornuclei are often distributed in space such that the intercel-lular (or internuclear) spacings are to a good approxima-tion equal at least in systems we are studying there areno significant local rarefactions or overabundances of cellsor nuclei Mathematically this amounts to assuming thatPx(x) = 1L (where L is the linear spatial extent over whichthe cells or nuclei are distributed) This assumption of

Positional Information and Error 45

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 2: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

(Nuumlsslein-Volhard and Wieschaus 1980 Akam 1987 Ingham1988 Spradling 1993 Papatsenko 2009) The hierarchy iscomposed of three layers long-range protein gradients thatspan the entire long axis of the egg (Driever and Nuumlsslein-Volhard 1988a) gap genes expressed in broad bands (Jaeger2011) and pair-rule genes that are expressed in a regularseven-striped pattern (Lawrence and Johnston 1989) Posi-tional information is provided to the system solely via the firstlayer which is established from maternally supplied andhighly localized messenger RNAs (mRNAs) that act as proteinsources for the maternal gradients (Nuumlsslein-Volhard 1991St Johnston and Nuumlsslein-Volhard 1992 Ferrandon et al1994 Anderson 1998 Little et al 2011) The network usesthese inputs to specify a blueprint for the segments of theadult fly in the form of gene expression patterns that cir-cumferentially span the embryo in the transverse direc-tion to the anteriorndashposterior axis These patterns definedistinct single-nucleus wide segments with nuclei expressingthe downstream genes in unique and distinguishable combi-nations (Gergen et al 1986 Gregor et al 2007a Dubuis et al2013ab)

It is remarkable that such precision can be achieved insuch a short amount of time using only a few handfuls ofgenes Gene expression is subject to intrinsic fluctuationswhich trace back to the randomness associated with regula-tory interactions between molecules present at low absolutecopy numbers (van Kampen 2011 Tsimring 2014) Moreoverthere is random variability not only within but also betweenembryos for instance in the strength of the morphogen sources(Bollenbach et al 2008) These biophysical limitationsmdashegin the number of signaling molecules the time available formorphogen readout and the reproducibility of initial andenvironmental conditionsmdashplace severe constraints on theability of the developmental system to generate reproduc-ible gene expression patterns (Gregor et al 2007a Tkaciket al 2008)

Given these constraints how precisely can gene expressionlevels encode information about position in the embryo Toaddress this question rigorously we need to be able to makequantitative statements about the positional information ofspatial gene expression profiles without presupposing whichfeatures of the profile (eg sharpness of the boundary size ofthe domains position-dependent variability etc) encode theinformation Here we make the case that the relevant mea-sure for positional information is the mutual informationImdasha central information-theoretic quantity (Shannon 1948)mdashbetween expression profiles of the gap genes and positionin the embryo We show how positional information putsmathematical limits on the ability with which cells in thedeveloping Drosophila embryo can infer their position alongthe major axis by simultaneously reading the concentrationsof multiple gap gene proteins This framework allows us (i) toquantitatively determine how many truly distinct and mean-ingful levels of gene expression are generated by the Drosoph-ila gap gene system (ii) to assess how many distinguishablecell fates such a system can support and (iii) to ask how

variability across embryos impedes the ability of the pattern-ing system to transmit positional information

This study builds on two previously published articles InDubuis et al (2013a) the general data acquisition and erroranalysis frameworks were presented with a subset of thedata analyzed here (referred to as data set A below) herewe analyze roughly eight times as many samples to assessthe experimental reproducibility of the results and performtests that would be impossible with the original small sam-ple In Dubuis et al (2013b) data set A was used to estimatepositional information and positional error as outlined ingreater detail here The study aimed at understanding posi-tional information in the Drosophila gap gene network butthe treatment of the conceptual and data analysis frame-works was very cursory In the present methodological arti-cle we provide a full account of both frameworks andinclude a number of previously unreported results relatingto (i) the mathematical connection between positional errorand information (ii) different estimators of informationfrom data (iii) various data normalization techniques (iv) com-bining data from multiple experiments and (v) the validity ofvarious approximation schemes We report on these techniquesin detail and benchmark them on real data to prepare ourapproach for a straightforward generalization to other devel-opmental systems

Results

Theoretical foundations

In this section we establish the information-theoretic frame-work for positional information carried by spatial patterns ofgene expression To develop an intuition we start witha one-dimensional toy example of a single gene which willbe generalized later to a many-gene system We presentscenarios where positional information is stored in differentqualitative features of gene expression patterns To capturethat intuition mathematically we give a precise definition ofpositional information for one and for multiple genesFinally we show how a quantitative formulation of posi-tional information is related to ldquodecodingrdquo ie the ability ofthe nuclei to infer their position in the embryo

Positional information in spatial gene expression profilesLet us consider the simplest possible example where theexpression of a single gene G(x) varies with position x alongthe axis of a one-dimensional embryo We choose units oflength such that x = 0 and x = 1 correspond to the anteriorand posterior poles of the embryo respectively Suppose weare able to quantitatively measure the profile of such a genealong the anteriorndashposterior axis inN embryos labeled withan index m frac14 1 N Such measurements of the light in-tensity profile of fluorescently labeled antibodies againsta particular gene product yield G(m)(x) where G is the quan-titative readout in embryo m of the gene expression levelFrom a collection of embryosmdashafter suitable data processingsteps described latermdashwe can then extract two statistics the

40 G Tkacik et al

position-dependent ldquomean profilerdquo capturing the prototyp-ical gene expression pattern and the position-dependentvariance across embryos which measures the degree ofembryo-to-embryo variability or the reproducibility of the meanprofile We can transform the measurements G(x) into pro-files g(x) with rescaled units such that the mean profile gethxTHORNis normalized to 1 at the maximum and to 0 at the minimumalong x After these steps our description of the system consistsof the mean profile gethxTHORN and the variance in the profile sg

2ethxTHORNHow much can a nucleus learn about its position if it

expresses a gene at level g We compare three idealizedcases where we pick the shape of gethxTHORN by hand and assumefor the start that the variance is constant sg

2ethxTHORN frac14 c Thefirst case is illustrated in Figure 1A where a step-like profilein g splits the embryo into two domains of gene expressionan anterior ldquoonrdquo domain where gethxTHORN frac14 1 for x x0 and anldquooffrdquo domain in the posterior x x0 where gethxTHORN frac14 0 Thisarrangement has an extremely precise indeed infinitelysharp boundary at x0 if we think that the sharpness ofthe boundary is the biologically relevant feature in this sys-tem this arrangement would correspond to an ideal pattern-ing gene But how much information can nuclei extract fromsuch a profile If x0 were 12 the boundary would repro-ducibly split the embryo into two equal domains based onreading out the expression of g the nucleus could decidewhether it is in the anterior or the posterior a binary choicethat is equally likely prior to reading out g As we will seethe positional information needed (and provided by sucha sharp profile) to make a clear two-way choice betweentwo a priori equally likely possibilities = 1 bit

Can a profile of a different shape do better Figure 1Bshows a somewhat more realistic sigmoidal shape that hasa steep but not infinitely sharp transition region If the var-iance is small enough s2

g $ 1 this profile can be more in-formative about the position Nuclei far at the anterior stillhave g 1 (full induction or the on state) while nuclei at theposterior still have g 0 (the off state) But the graded re-sponse in the middle defines new expression levels in g thatare significantly different from both 0 and 1 A nucleus withg 05 will thus ldquoknowrdquo that it is neither in the anterior norin the posterior This system will therefore be able to providemore positional information than the sharp boundary that islimited by 1 bit Clearly this conclusion is valid only insofar asthe variance s2

g is low enough if it gets too big the interme-diate levels of expression in the transition region can no lon-ger be distinguished and we are back to the 1-bit case

In contrast to the infinitely sharp gradient a linear gra-dient as depicted in Figure 1C has already been proposed asefficient in encoding positional information (Wolpert 1969)By extending the argument for sigmoidal shapes above if s2

gis not a function of position the linear gradient is in fact theoptimal choice Consider starting at the anterior and movingtoward the posterior as soon as we move far enough in x thatthe change in gethxTHORN is above sg we have created one moredistinguishable level of expression in g and thus a group ofnuclei that by measuring g can differentiate themselves from

their anteriorly positioned neighbors Finally this reasoninggives us a hint about how to generalize to the case where thevariance s2

g depends on position x What is important is tocount as x covers the range from anterior to posterior howmuch gethxTHORN changes in units of the local variability sg(x)mdashitwill turn out that this is directly related to the mutual infor-mation between g and position

Thus a sharp and reproducible boundary can correspondto a profile that does not encode a lot of positional infor-mation and a linear profile where the boundary is not evenwell defined can encode a high amount of positional in-formation Ultimately whether or not there are intermediatedistinguishable levels of gene expression depends on thevariability in the profile Therefore any measure of positionalinformation must be a function of both gethxTHORN and s2

gethxTHORNDoes the ability of the nuclei to infer their position

automatically improve if they can simultaneously read outthe expression levels of more than one gene Figure 2A showsthe case where two genes g1 and g2 do not provide any moreinformation than each one of them provides separately be-cause they are completely redundant Redundant does notmean equalmdashindeed in Figure 2A the profiles are differentat every xmdashbut they are perfectly correlated (or dependent)knowing the expression level of g1 one knows exactly thelevel of g2 so g2 cannot provide any additional new informationabout the position In general redundancy can help compen-sate for detrimental effects of noise when noise is significantbut this is not the case in the toy example at hand

The situation is completely different if the two profilesare shifted relative to each other by say 25 embryo lengthNote that none of the individual profile properties havechanged however the two genes now partition the embryointo four different segments the posteriormost domain thatis combinatorially encoded by the gene expression patterng1 frac14 g2 frac14 0 the third domain with g1 frac14 0 thinsp g2 frac14 1 the sec-ond domain with g1 frac14 g2 frac14 1 and the anteriormost domain

Figure 1 Positional information encoded by a single gene Shown arethree hypothetical mean gene expression profiles gethxTHORN as a function ofposition x with spatially constant variability sg (shaded area) (A) A stepfunction carries (at most) 1 bit of positional information by perfectlydistinguishing between ldquooffrdquo (not induced posterior) and ldquoonrdquo (fullyinduced anterior) states (B) The boundary of the sigmoidal g(x) functionis now wider but the total amount of encoded positional information canbe 1 bit because the transition region itself is distinguishable from theon and off domains (C) A linear gradient has no well-defined boundarybut nevertheless provides a further increase in informationmdashif sg is lowenoughmdashby being equally sensitive to position at every x

Positional Information and Error 41

with g1 frac14 1 thinsp g2 frac14 0 Upon reading out g1 and g2 a nucleusa priori located in any of the four domains can unambigu-ously decide on a single one of the four possibilities This isequivalent to making two binary decisions and as we willlater show to 2 bits of positional information

Finally the most subtle case is depicted in Figure 2CHere the mean profiles have exactly the same shapes as inFigure 2B What is different however is the correlationstructure of the fluctuations In certain areas of the embryothe two genes are strongly positively correlated while in theothers they are strongly negatively correlated If these areasare overlaid appropriately on top of the domains definedby the mean expression patterns an additional increase inpositional information is possible In the admittedly con-trived but pedagogical example of Figure 2C the mean pro-files and the correlations together define eight distinguishabledomains of expression combinatorially encoded by two genesNuclei having simultaneous access to the concentrations of thetwo genes can compute which of the eight domains they re-side in although it might not be easy to implement such a com-putation in molecular hardware Picking one of eight choicescorresponds to making three binary decisions and thus to 3bits of positional information Note that in this case each geneconsidered in isolation still carries 1 bit as before so that thesystem of two genes carries more information than the sum ofits partsmdashsuch a scheme is called synergistic encoding

In sum we have shown that the mean shapes of theprofiles as well as their variances and correlations can carrypositional information Extrapolating to three or more geneswe see that the number of pairwise correlations increases andin addition higher-order correlation terms start appearing For-mally positional information could be encoded in all of thesefeatures but would become progressively harder to extractusing plausible biological mechanisms Nevertheless a princi-pled and assumption-free measure should combine all statisti-cal structure into a single number a scalar quantity measured

in bits that can ldquocountrdquo the number of distinguishable expres-sion states (and thus positions) as illustrated in the examplesabove

Defining positional information Determining the numberof ldquodistinguishable statesrdquo of gene expression is more com-plicated in real data sets than in our toy models the meanprofiles have complex shapes and their (co)variabilitydepends on position In the ideal case we can measure jointexpression patterns of N genes gi i = 1 N (for exam-ple N = 4 for four gap genes in Drosophila) in a large set ofembryos In such a scenario the position dependence of theexpression levels can be fully described with a conditionalprobability distribution P(gi|x) Concretely for every po-sition x in the embryo we construct an N-dimensional histo-gram of expression levels across all recorded embryoswhich (when normalized) yields the desired P(gi|x) Thisdistribution contains all the information about how expres-sion levels vary across embryos in a position-dependentfashion For instance

giethxTHORN frac14Z

dNgthinsp giPfglgjx

(1)

s2i ethxTHORN frac14

ZdNgethgi2giethxTHORNTHORN

2PethfglgjxTHORN (2)

CijethxTHORN frac14Z

dNggigj2 giethxTHORNgjethxTHORN

$PethfglgjxTHORN (3)

are the mean profile of gene gi the variance across embryos ofgene gi and the covariance between genes gi and gj respec-tively In principle the conditional distribution contains also allhigher-order moments that we can extract by integrating overappropriate sets of variables Realistically we are often limitedin our ability to collect enough samples to construct P(gi|x)by histogram counts especially when considering several

Figure 2 (AndashC) Positional information encoded by twogenes Spatial gene expression profiles are shown at thetop an enumeration of all distinguishable combinations ofgene expression (roman numerals) across the embryo isshown at the bottom (A) Step function profiles of g1 g2each encode 1 bit of information but are perfectly redun-dant jointly encoding only two distinguishable states andthus 1 bit of positional information the same as each genealone (B) The mean profiles of g1 g2 have been displacedremoving the redundancy and bringing the total informa-tion to 2 bits (four distinct states of joint gene expression)(C) If in addition to the mean profile shape the down-stream layer can read out the (correlated) fluctuations ofg1 g2 a further increase in information is possible In thistoy example fluctuations are correlated (+) and anticorre-lated (2) in various spatially separated regions whichallows the embryo to use this information along withthe mean profile shape to distinguish eight distinctregions bringing the total encoded information to 3 bits

42 G Tkacik et al

genes simultaneously the number of samples needed growsexponentially with the number of genes However estimatingthe mean profiles on the left-hand sides of Equations 1ndash3 canoften be achieved from data directly A reasonable first step(but one that has to be independently verified) is to assumethat the joint distribution P(gi|x) of N expression levels giat a given position x is Gaussian which can be constructedusing the measured mean values and covariances

P ethfgigjxTHORN

frac14 eth2pTHORN2N=2jCethxTHORNj21=2

3 exp

2

4212

XN

i jfrac141ethgi 2 giethxTHORNTHORN

C21ethxTHORN

ampij

gj2 gjethxTHORN

$3

5

(4)

We emphasize that the Gaussian approximation is not re-quired to theoretically define positional information but itwill turn out to be practical when working with experimen-tal data in the cases of one or two genes it is often possibleto proceed without making this approximation which pro-vides a convenient check for its validity

While the conditional distribution P(gi|x) capturesthe behavior of gene expression levels at a given x establish-ing how much information in total the expression levelscarry about position requires us to know also how frequentlyeach combination of gene expression levels gi is usedacross all positions Recall for instance that our argumentsrelated to the information encoded in patterns in Figure 2rested on counting how often a pair of genes will be found inexpression states 00 01 10 and 11 across all x This globalstructure is encoded in the total distribution of expressionlevels which can be obtained by averaging the conditionaldistribution over all positions

PgethfgigTHORN frac14DPfgigjx

E

xfrac14

Z 1

0dx thinsp PethfgigjxTHORN (5)

where hampix denotes averaging over x Note that we can thinkof Equation 5 as a special case of averaging with a position-dependent weight

PgethfgigTHORN frac14Z

dx thinsp PxethxTHORNPethfgigjxTHORN (6)

where Px(x) is chosen to be uniform As we shall see in thecase of Drosophila anteriorndashposterior (AP) patterning Px(x)will be the distribution of possible nuclear locations alongthe AP axis which indeed is very close to uniform

When formulated in the language of probabilities therelationship between the position x and the gene expressionlevels can be seen as a statistical dependency If we knewthis dependency were linear we could measure it using ega linear correlation analysis between x and gi Shannon(1948) has shown that there is an alternative measure oftotal statistical dependence (not just of its linear component)

called the mutual information which is a functional of theprobability distributions Px(x) and P(gi|x) and is defined by

IethxfgigTHORN

frac14Rdx thinsp PxethxTHORN

RdNgthinsp PethfgigjxTHORNlog2

PethfgigjxTHORNPgethfgigTHORN

(7)

This positive quantity measured in bits tells us how muchone can know about the gene expression pattern if oneknows the position x It is not hard to convince oneself thatthe mutual information is symmetric ie I(gi x) = I(x gi) = I(gi x) This is very attractive we do theexperiments by sampling the distribution of expression lev-els given position while the nuclei in a developing embryoimplicitly solve the inverse problemmdashknowing a set of geneexpression levels they need to infer their position A funda-mental result of information theory states that both prob-lems are quantified by the same symmetric quantity theinformation I(gi x) Furthermore mutual information isnot just one out of many possible ways of quantifying thetotal statistical dependency but rather the unique way thatsatisfies a number of basic requirements for example thatinformation from independent sources is additive (Shannon1948 Cover and Thomas 1991)

The definition of mutual information in Equation 7 can berewritten as a difference of two entropies

Iethfgig xTHORN frac14 SPgethfgigTHORN

amp2

DSfrac12PethfgigjxTHORN(

E

x (8)

where S[p(x)] is the standard entropy of the distribution p(x)measured in bits (hence log base 2)

Sfrac12 pethxTHORN( frac14 2Z

dx thinsp pethxTHORNlog2pethxTHORN (9)

Equation 8 provides an alternative interpretation of themutual information I(gi x) which is illustrated in Figure 3In the case of a single gene g the ldquototal entropyrdquo S[Pg(g)]represented on the left measures the range of gene expres-sion available across the whole embryo This total entropyor dynamic range can be written as the sum of two contri-butions One part is due to the systematic modulation of gwith position x and this is the useful part (the ldquosignalrdquo) orthe mutual information I(gi x) The other contribution isthe variability in g that remains even at constant position x thisrepresents pure ldquonoiserdquo that carries no information about posi-tion and is formally measured by the average entropy of theconditional distribution (the noise entropy) hS[P(g|x)]ix Po-sitional information carried by g is thus the difference betweenthe total and noise entropies as expressed in Equation 8

Mutual information is theoretically well founded andis always nonnegative being 0 if and only if there is nostatistical dependence of any kind between the positionand the gene expression level Conversely if there are Ibits of mutual information between the position and theexpression level there are ) 2IethfgigxTHORN distinguishable gene

Positional Information and Error 43

expression patterns that can be generated by movingalong the AP axis from the head at x = 0 to the tail atx = 1 This is precisely the property we require from anysuitable measure of positional information We thereforesuggest that mathematically positional informationshould be defined as the mutual information between ex-pression level and position I(gi x)

Defining positional error Thus far we have discussedpositional information in terms of the statistical dependencyand the number of distinguishable levels of gene expressionalong the position coordinate To present an alternativeinterpretation we start by using the symmetry property ofthe mutual information and rewrite I(gi x) as

Iethfgig xTHORN frac14 Sfrac12Px ethxTHORN(2Sfrac12PethxjfgigTHORN(

(PgethfgigTHORN (10)

ie the difference between the (uniform) distribution overall possible positions of a cell in the embryo and the distri-bution of positions consistent with a given expression levelHere P(x|gi) can be obtained using Bayesrsquo rule from theknown quantities

PethxjfgigTHORN frac14PethfgigjxTHORNPxethxTHORN

PgethfgigTHORN (11)

The total entropy of all positions S[Px(x)] in Equation 10 isindependent of the particular regulatory systemmdashit simply

measures the prior uncertainty about the location of the cells inthe absence of knowing any gene expression level If howeverthe cell has access to the expression levels of a particular set ofgenes this uncertainty is reduced and it is hence possible tolocalize the cell much more precisely the reduction in uncer-tainty is captured by the second term in Equation 10 This formof positional information emphasizes the decoding view that isthat cells can infer their positions by simultaneously readingout protein concentrations of various genes (Figure 3)

Positional information is a single number it is a globalmeasure of the reproducibility in the patterning system Isthere a local quantity that would tell us position by positionhow well cells can read out their gene expression levels andinfer their location Is positional information ldquodistributedequallyrdquo along the AP axis or is it very nonuniform suchthat cells in some regions of the embryo are much betterat reproducibly assuming their roles

The optimal estimator of the true location x of a cell oncewe (or the cell) measure the gene expression levels fgi gis the maximum a posteriori (MAP) estimate xethfgi gTHORN frac14argmaxthinsp Pethxjfgi gTHORN In cases like ours where the prior distri-bution Px(x) is uniform this equals the maximum-likelihood(ML) estimate

xn

gio$

frac14 argmaxthinsp Pn

gio)))x

$ (12)

thus for each expression level readout this ldquodecoding rulerdquogives us the most likely position of the cell x The inset inFigure 3 illustrates the decoding in the case of one gene

How well can this (optimal) rule perform The expectederror of the estimated x is given by s2

xethxTHORN frac14ethx2xTHORN2

(

where brackets denote averaging over Pethxjfgi gTHORN This erroris a function of the gene expression levels however we canalso evaluate it for every x since we know the mean geneexpression profiles giethxTHORN for every x Thus we define a newquantity the positional error sx(x) which measures howwell cells at a true position x are able to estimate their positionbased on the gene expression levels alone This is the localmeasure of positional information that we were aiming for

Independently of how cells actually read out the concen-trations mechanistically it can be shown that sx(x) cannotbe lower than the limit set by the CramerndashRao bound (Coverand Thomas 1991)

sx2ethxTHORN$ 1

IethxTHORN (13)

where IethxTHORN is the Fisher information given by

IethxTHORN frac14 2

2 log PethfgigjxTHORN

x2

+

PethfgigjxTHORN (14)

Despite its name the Fisher information I is not an information-theoretic quantity and unlike the mutual information I theFisher information depends on position Is there a connectionbetween the positional error sx(x) and the mutual infor-mation I(gi x) Below we sketch the derivation following

Figure 3 The mathematics of positional information and positional errorfor one gene The mean profile of gene g (thick solid line) and its vari-ability (shaded envelope) across embryos are shown Nuclei are distrib-uted uniformly along the AP axis ie the prior distribution of nuclearpositions Px(x) is uniform (shown at the bottom) The total distribution ofexpression levels across the embryo Pg(g) is determined by averagingP(g|x) over all positions and is shown on the left The positional informationI(g x) is related to these two distributions as in Equation 8 Inset showsdecoding (ie estimating) the position of a nucleus from a measuredexpression level g Prior to the ldquomeasurementrdquo all positions are equallylikely After observing the value g the positions consistent with thismeasured values are drawn from P(x|g) The best estimate of the trueposition x is at the peak of this distribution and the positional errorsx(x) is the distributionrsquos width Positional information I(g x) can equallybe computed from Px(x) and P(x|g) by Equation 10

44 G Tkacik et al

Brunel and Nadal (1998) demonstrating the link for the caseof one gene g

Let us assume that the Gaussian approximation of Equa-tion 4 holds and that the distribution of the levels of a singlegene at a given position is

PethgjxTHORN frac14 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ps2

gethxTHORNq exp

(212ethg2gethxTHORNTHORN2

s2gethxTHORN

) (15)

We can use the Gaussian distribution to compute the Fisherinformation in Equation 14 We find that

I frac14 g92ethxTHORNs2gethxTHORN

thorn 2s92gethxTHORNs2gethxTHORN

(16)

where (amp)9 denotes a derivative with respect to position xInformation about position is thus carried by the change inmean profile with position as well as the change in the var-iability itself with position If the noise is small sg $ g thentypically it will be the case that

))s9g)) $ jg9j (a potentially

important issue arises when the profiles and their variabilityare estimated from noisy sampled data in this case oneneeds to be careful about spurious contributions to Fisherinformation due to noise-induced gradients of the profilesand the variability) If we formally require

))s9g)) $ jg9j we

can retain only the first term to obtain a bound on positionalerror

s2xethxTHORN$

1IethxTHORN

-dgdx

22s2gethxTHORN (17)

This result is intuitively straightforward it is simply thetransformation of the variability in gene expression s2

gethxTHORN intoan effective variance in the position estimate s2

x ethxTHORN and thetwo are related by slope of the inputoutput relation gethxTHORN

A crucial next step is to think of x as determining geneexpressions gi probabilistically and the x as being a functionof these gene expression levelsmdashthat is when computing xneither we nor the nuclei have access to the true positionThis forms a dependency chain x gi x Since eachof these steps is probabilistic it can only lose informationsuch that by information-processing inequality (Cover andThomas 1991) we must have I(gi x) $ I(x x) The mu-tual information between the true location and its estimateis given by

Iethx xTHORN frac14 Sfrac12PxethxTHORN(2DSfrac12PethxjxTHORN(

E

PxethxTHORN (18)

Under weak assumptions the first term in our case is ap-proximately the entropy of a uniform distribution While wedo not know the full distribution P(x|x) and thus cannotcompute its entropy directly we know its variance which isjust the square of the positional error s2

xethxTHORN Regardless ofwhat the full distribution is its entropy must be less or equal

to the entropy of the Gaussian distribution of the same var-iance which is Sfrac12PethxjxTHORN( frac14 log2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

bits Putting ev-erything together we find that

Iethfgig xTHORN$ I ethx xTHORN$log2

PxethxTHORNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(19)

Therefore positional information I(gi x) puts an upperbound to the average ability of the cells to infer their loca-tions that is to the smallness of the positional error sx(x)In a straightforward generalization of a single-gene case theFisher information for a multivariate Gaussian distributionwritten for compactness in matrix notation yields

I frac14 g9TC21g9thorn 12TrhC21C9C21C9

i (20)

Under the same assumptions we made for the case of a singlegene we retain only the first term so that the expression forpositional error written out explicitly in the componentnotation reads

s2x ethxTHORN$

1IethxTHORN

0

XN

ijfrac141

dgidx

hCethxTHORN21

i

ij

dgjdx

1

A21

(21)

where Cij is the covariance matrix of the profiles as definedin Equation 3 This extends the fundamental connectionEquation 19 between the positional information and posi-tional error to the case of multiple genes Importantly allquantitiesmdashthe mean profiles and their covariancemdashinEquation 21 can be obtained from experimental data sosx(x) is a quantity that can be estimated directly

Interpretation in a spatially discrete (cellular) system Inthe setup presented above gene expression levels carryinformation about a continuous position variable x But ina real biological system there exists a minimal spatial scalemdashthe scale of individual nuclei (or cells)mdashbelow which theconcept of gene expression at a position is no longer welldefined This is particularly the case in systems that interpretmolecular concentrations to make decisions that determinecell fates such interpretations have no meaning at the spa-tial scale of a molecule but only at cellular scales How shouldpositional information be interpreted in such a context

When noise is small enough and the distribution ofpositions consistent with observed gene expression levelsP(x|gi) of Equation 11 is nearly Gaussian and Equations19 and 21 become tight bounds In this case we can applyEquation 10 directly In developmental systems cells ornuclei are often distributed in space such that the intercel-lular (or internuclear) spacings are to a good approxima-tion equal at least in systems we are studying there areno significant local rarefactions or overabundances of cellsor nuclei Mathematically this amounts to assuming thatPx(x) = 1L (where L is the linear spatial extent over whichthe cells or nuclei are distributed) This assumption of

Positional Information and Error 45

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 3: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

position-dependent ldquomean profilerdquo capturing the prototyp-ical gene expression pattern and the position-dependentvariance across embryos which measures the degree ofembryo-to-embryo variability or the reproducibility of the meanprofile We can transform the measurements G(x) into pro-files g(x) with rescaled units such that the mean profile gethxTHORNis normalized to 1 at the maximum and to 0 at the minimumalong x After these steps our description of the system consistsof the mean profile gethxTHORN and the variance in the profile sg

2ethxTHORNHow much can a nucleus learn about its position if it

expresses a gene at level g We compare three idealizedcases where we pick the shape of gethxTHORN by hand and assumefor the start that the variance is constant sg

2ethxTHORN frac14 c Thefirst case is illustrated in Figure 1A where a step-like profilein g splits the embryo into two domains of gene expressionan anterior ldquoonrdquo domain where gethxTHORN frac14 1 for x x0 and anldquooffrdquo domain in the posterior x x0 where gethxTHORN frac14 0 Thisarrangement has an extremely precise indeed infinitelysharp boundary at x0 if we think that the sharpness ofthe boundary is the biologically relevant feature in this sys-tem this arrangement would correspond to an ideal pattern-ing gene But how much information can nuclei extract fromsuch a profile If x0 were 12 the boundary would repro-ducibly split the embryo into two equal domains based onreading out the expression of g the nucleus could decidewhether it is in the anterior or the posterior a binary choicethat is equally likely prior to reading out g As we will seethe positional information needed (and provided by sucha sharp profile) to make a clear two-way choice betweentwo a priori equally likely possibilities = 1 bit

Can a profile of a different shape do better Figure 1Bshows a somewhat more realistic sigmoidal shape that hasa steep but not infinitely sharp transition region If the var-iance is small enough s2

g $ 1 this profile can be more in-formative about the position Nuclei far at the anterior stillhave g 1 (full induction or the on state) while nuclei at theposterior still have g 0 (the off state) But the graded re-sponse in the middle defines new expression levels in g thatare significantly different from both 0 and 1 A nucleus withg 05 will thus ldquoknowrdquo that it is neither in the anterior norin the posterior This system will therefore be able to providemore positional information than the sharp boundary that islimited by 1 bit Clearly this conclusion is valid only insofar asthe variance s2

g is low enough if it gets too big the interme-diate levels of expression in the transition region can no lon-ger be distinguished and we are back to the 1-bit case

In contrast to the infinitely sharp gradient a linear gra-dient as depicted in Figure 1C has already been proposed asefficient in encoding positional information (Wolpert 1969)By extending the argument for sigmoidal shapes above if s2

gis not a function of position the linear gradient is in fact theoptimal choice Consider starting at the anterior and movingtoward the posterior as soon as we move far enough in x thatthe change in gethxTHORN is above sg we have created one moredistinguishable level of expression in g and thus a group ofnuclei that by measuring g can differentiate themselves from

their anteriorly positioned neighbors Finally this reasoninggives us a hint about how to generalize to the case where thevariance s2

g depends on position x What is important is tocount as x covers the range from anterior to posterior howmuch gethxTHORN changes in units of the local variability sg(x)mdashitwill turn out that this is directly related to the mutual infor-mation between g and position

Thus a sharp and reproducible boundary can correspondto a profile that does not encode a lot of positional infor-mation and a linear profile where the boundary is not evenwell defined can encode a high amount of positional in-formation Ultimately whether or not there are intermediatedistinguishable levels of gene expression depends on thevariability in the profile Therefore any measure of positionalinformation must be a function of both gethxTHORN and s2

gethxTHORNDoes the ability of the nuclei to infer their position

automatically improve if they can simultaneously read outthe expression levels of more than one gene Figure 2A showsthe case where two genes g1 and g2 do not provide any moreinformation than each one of them provides separately be-cause they are completely redundant Redundant does notmean equalmdashindeed in Figure 2A the profiles are differentat every xmdashbut they are perfectly correlated (or dependent)knowing the expression level of g1 one knows exactly thelevel of g2 so g2 cannot provide any additional new informationabout the position In general redundancy can help compen-sate for detrimental effects of noise when noise is significantbut this is not the case in the toy example at hand

The situation is completely different if the two profilesare shifted relative to each other by say 25 embryo lengthNote that none of the individual profile properties havechanged however the two genes now partition the embryointo four different segments the posteriormost domain thatis combinatorially encoded by the gene expression patterng1 frac14 g2 frac14 0 the third domain with g1 frac14 0 thinsp g2 frac14 1 the sec-ond domain with g1 frac14 g2 frac14 1 and the anteriormost domain

Figure 1 Positional information encoded by a single gene Shown arethree hypothetical mean gene expression profiles gethxTHORN as a function ofposition x with spatially constant variability sg (shaded area) (A) A stepfunction carries (at most) 1 bit of positional information by perfectlydistinguishing between ldquooffrdquo (not induced posterior) and ldquoonrdquo (fullyinduced anterior) states (B) The boundary of the sigmoidal g(x) functionis now wider but the total amount of encoded positional information canbe 1 bit because the transition region itself is distinguishable from theon and off domains (C) A linear gradient has no well-defined boundarybut nevertheless provides a further increase in informationmdashif sg is lowenoughmdashby being equally sensitive to position at every x

Positional Information and Error 41

with g1 frac14 1 thinsp g2 frac14 0 Upon reading out g1 and g2 a nucleusa priori located in any of the four domains can unambigu-ously decide on a single one of the four possibilities This isequivalent to making two binary decisions and as we willlater show to 2 bits of positional information

Finally the most subtle case is depicted in Figure 2CHere the mean profiles have exactly the same shapes as inFigure 2B What is different however is the correlationstructure of the fluctuations In certain areas of the embryothe two genes are strongly positively correlated while in theothers they are strongly negatively correlated If these areasare overlaid appropriately on top of the domains definedby the mean expression patterns an additional increase inpositional information is possible In the admittedly con-trived but pedagogical example of Figure 2C the mean pro-files and the correlations together define eight distinguishabledomains of expression combinatorially encoded by two genesNuclei having simultaneous access to the concentrations of thetwo genes can compute which of the eight domains they re-side in although it might not be easy to implement such a com-putation in molecular hardware Picking one of eight choicescorresponds to making three binary decisions and thus to 3bits of positional information Note that in this case each geneconsidered in isolation still carries 1 bit as before so that thesystem of two genes carries more information than the sum ofits partsmdashsuch a scheme is called synergistic encoding

In sum we have shown that the mean shapes of theprofiles as well as their variances and correlations can carrypositional information Extrapolating to three or more geneswe see that the number of pairwise correlations increases andin addition higher-order correlation terms start appearing For-mally positional information could be encoded in all of thesefeatures but would become progressively harder to extractusing plausible biological mechanisms Nevertheless a princi-pled and assumption-free measure should combine all statisti-cal structure into a single number a scalar quantity measured

in bits that can ldquocountrdquo the number of distinguishable expres-sion states (and thus positions) as illustrated in the examplesabove

Defining positional information Determining the numberof ldquodistinguishable statesrdquo of gene expression is more com-plicated in real data sets than in our toy models the meanprofiles have complex shapes and their (co)variabilitydepends on position In the ideal case we can measure jointexpression patterns of N genes gi i = 1 N (for exam-ple N = 4 for four gap genes in Drosophila) in a large set ofembryos In such a scenario the position dependence of theexpression levels can be fully described with a conditionalprobability distribution P(gi|x) Concretely for every po-sition x in the embryo we construct an N-dimensional histo-gram of expression levels across all recorded embryoswhich (when normalized) yields the desired P(gi|x) Thisdistribution contains all the information about how expres-sion levels vary across embryos in a position-dependentfashion For instance

giethxTHORN frac14Z

dNgthinsp giPfglgjx

(1)

s2i ethxTHORN frac14

ZdNgethgi2giethxTHORNTHORN

2PethfglgjxTHORN (2)

CijethxTHORN frac14Z

dNggigj2 giethxTHORNgjethxTHORN

$PethfglgjxTHORN (3)

are the mean profile of gene gi the variance across embryos ofgene gi and the covariance between genes gi and gj respec-tively In principle the conditional distribution contains also allhigher-order moments that we can extract by integrating overappropriate sets of variables Realistically we are often limitedin our ability to collect enough samples to construct P(gi|x)by histogram counts especially when considering several

Figure 2 (AndashC) Positional information encoded by twogenes Spatial gene expression profiles are shown at thetop an enumeration of all distinguishable combinations ofgene expression (roman numerals) across the embryo isshown at the bottom (A) Step function profiles of g1 g2each encode 1 bit of information but are perfectly redun-dant jointly encoding only two distinguishable states andthus 1 bit of positional information the same as each genealone (B) The mean profiles of g1 g2 have been displacedremoving the redundancy and bringing the total informa-tion to 2 bits (four distinct states of joint gene expression)(C) If in addition to the mean profile shape the down-stream layer can read out the (correlated) fluctuations ofg1 g2 a further increase in information is possible In thistoy example fluctuations are correlated (+) and anticorre-lated (2) in various spatially separated regions whichallows the embryo to use this information along withthe mean profile shape to distinguish eight distinctregions bringing the total encoded information to 3 bits

42 G Tkacik et al

genes simultaneously the number of samples needed growsexponentially with the number of genes However estimatingthe mean profiles on the left-hand sides of Equations 1ndash3 canoften be achieved from data directly A reasonable first step(but one that has to be independently verified) is to assumethat the joint distribution P(gi|x) of N expression levels giat a given position x is Gaussian which can be constructedusing the measured mean values and covariances

P ethfgigjxTHORN

frac14 eth2pTHORN2N=2jCethxTHORNj21=2

3 exp

2

4212

XN

i jfrac141ethgi 2 giethxTHORNTHORN

C21ethxTHORN

ampij

gj2 gjethxTHORN

$3

5

(4)

We emphasize that the Gaussian approximation is not re-quired to theoretically define positional information but itwill turn out to be practical when working with experimen-tal data in the cases of one or two genes it is often possibleto proceed without making this approximation which pro-vides a convenient check for its validity

While the conditional distribution P(gi|x) capturesthe behavior of gene expression levels at a given x establish-ing how much information in total the expression levelscarry about position requires us to know also how frequentlyeach combination of gene expression levels gi is usedacross all positions Recall for instance that our argumentsrelated to the information encoded in patterns in Figure 2rested on counting how often a pair of genes will be found inexpression states 00 01 10 and 11 across all x This globalstructure is encoded in the total distribution of expressionlevels which can be obtained by averaging the conditionaldistribution over all positions

PgethfgigTHORN frac14DPfgigjx

E

xfrac14

Z 1

0dx thinsp PethfgigjxTHORN (5)

where hampix denotes averaging over x Note that we can thinkof Equation 5 as a special case of averaging with a position-dependent weight

PgethfgigTHORN frac14Z

dx thinsp PxethxTHORNPethfgigjxTHORN (6)

where Px(x) is chosen to be uniform As we shall see in thecase of Drosophila anteriorndashposterior (AP) patterning Px(x)will be the distribution of possible nuclear locations alongthe AP axis which indeed is very close to uniform

When formulated in the language of probabilities therelationship between the position x and the gene expressionlevels can be seen as a statistical dependency If we knewthis dependency were linear we could measure it using ega linear correlation analysis between x and gi Shannon(1948) has shown that there is an alternative measure oftotal statistical dependence (not just of its linear component)

called the mutual information which is a functional of theprobability distributions Px(x) and P(gi|x) and is defined by

IethxfgigTHORN

frac14Rdx thinsp PxethxTHORN

RdNgthinsp PethfgigjxTHORNlog2

PethfgigjxTHORNPgethfgigTHORN

(7)

This positive quantity measured in bits tells us how muchone can know about the gene expression pattern if oneknows the position x It is not hard to convince oneself thatthe mutual information is symmetric ie I(gi x) = I(x gi) = I(gi x) This is very attractive we do theexperiments by sampling the distribution of expression lev-els given position while the nuclei in a developing embryoimplicitly solve the inverse problemmdashknowing a set of geneexpression levels they need to infer their position A funda-mental result of information theory states that both prob-lems are quantified by the same symmetric quantity theinformation I(gi x) Furthermore mutual information isnot just one out of many possible ways of quantifying thetotal statistical dependency but rather the unique way thatsatisfies a number of basic requirements for example thatinformation from independent sources is additive (Shannon1948 Cover and Thomas 1991)

The definition of mutual information in Equation 7 can berewritten as a difference of two entropies

Iethfgig xTHORN frac14 SPgethfgigTHORN

amp2

DSfrac12PethfgigjxTHORN(

E

x (8)

where S[p(x)] is the standard entropy of the distribution p(x)measured in bits (hence log base 2)

Sfrac12 pethxTHORN( frac14 2Z

dx thinsp pethxTHORNlog2pethxTHORN (9)

Equation 8 provides an alternative interpretation of themutual information I(gi x) which is illustrated in Figure 3In the case of a single gene g the ldquototal entropyrdquo S[Pg(g)]represented on the left measures the range of gene expres-sion available across the whole embryo This total entropyor dynamic range can be written as the sum of two contri-butions One part is due to the systematic modulation of gwith position x and this is the useful part (the ldquosignalrdquo) orthe mutual information I(gi x) The other contribution isthe variability in g that remains even at constant position x thisrepresents pure ldquonoiserdquo that carries no information about posi-tion and is formally measured by the average entropy of theconditional distribution (the noise entropy) hS[P(g|x)]ix Po-sitional information carried by g is thus the difference betweenthe total and noise entropies as expressed in Equation 8

Mutual information is theoretically well founded andis always nonnegative being 0 if and only if there is nostatistical dependence of any kind between the positionand the gene expression level Conversely if there are Ibits of mutual information between the position and theexpression level there are ) 2IethfgigxTHORN distinguishable gene

Positional Information and Error 43

expression patterns that can be generated by movingalong the AP axis from the head at x = 0 to the tail atx = 1 This is precisely the property we require from anysuitable measure of positional information We thereforesuggest that mathematically positional informationshould be defined as the mutual information between ex-pression level and position I(gi x)

Defining positional error Thus far we have discussedpositional information in terms of the statistical dependencyand the number of distinguishable levels of gene expressionalong the position coordinate To present an alternativeinterpretation we start by using the symmetry property ofthe mutual information and rewrite I(gi x) as

Iethfgig xTHORN frac14 Sfrac12Px ethxTHORN(2Sfrac12PethxjfgigTHORN(

(PgethfgigTHORN (10)

ie the difference between the (uniform) distribution overall possible positions of a cell in the embryo and the distri-bution of positions consistent with a given expression levelHere P(x|gi) can be obtained using Bayesrsquo rule from theknown quantities

PethxjfgigTHORN frac14PethfgigjxTHORNPxethxTHORN

PgethfgigTHORN (11)

The total entropy of all positions S[Px(x)] in Equation 10 isindependent of the particular regulatory systemmdashit simply

measures the prior uncertainty about the location of the cells inthe absence of knowing any gene expression level If howeverthe cell has access to the expression levels of a particular set ofgenes this uncertainty is reduced and it is hence possible tolocalize the cell much more precisely the reduction in uncer-tainty is captured by the second term in Equation 10 This formof positional information emphasizes the decoding view that isthat cells can infer their positions by simultaneously readingout protein concentrations of various genes (Figure 3)

Positional information is a single number it is a globalmeasure of the reproducibility in the patterning system Isthere a local quantity that would tell us position by positionhow well cells can read out their gene expression levels andinfer their location Is positional information ldquodistributedequallyrdquo along the AP axis or is it very nonuniform suchthat cells in some regions of the embryo are much betterat reproducibly assuming their roles

The optimal estimator of the true location x of a cell oncewe (or the cell) measure the gene expression levels fgi gis the maximum a posteriori (MAP) estimate xethfgi gTHORN frac14argmaxthinsp Pethxjfgi gTHORN In cases like ours where the prior distri-bution Px(x) is uniform this equals the maximum-likelihood(ML) estimate

xn

gio$

frac14 argmaxthinsp Pn

gio)))x

$ (12)

thus for each expression level readout this ldquodecoding rulerdquogives us the most likely position of the cell x The inset inFigure 3 illustrates the decoding in the case of one gene

How well can this (optimal) rule perform The expectederror of the estimated x is given by s2

xethxTHORN frac14ethx2xTHORN2

(

where brackets denote averaging over Pethxjfgi gTHORN This erroris a function of the gene expression levels however we canalso evaluate it for every x since we know the mean geneexpression profiles giethxTHORN for every x Thus we define a newquantity the positional error sx(x) which measures howwell cells at a true position x are able to estimate their positionbased on the gene expression levels alone This is the localmeasure of positional information that we were aiming for

Independently of how cells actually read out the concen-trations mechanistically it can be shown that sx(x) cannotbe lower than the limit set by the CramerndashRao bound (Coverand Thomas 1991)

sx2ethxTHORN$ 1

IethxTHORN (13)

where IethxTHORN is the Fisher information given by

IethxTHORN frac14 2

2 log PethfgigjxTHORN

x2

+

PethfgigjxTHORN (14)

Despite its name the Fisher information I is not an information-theoretic quantity and unlike the mutual information I theFisher information depends on position Is there a connectionbetween the positional error sx(x) and the mutual infor-mation I(gi x) Below we sketch the derivation following

Figure 3 The mathematics of positional information and positional errorfor one gene The mean profile of gene g (thick solid line) and its vari-ability (shaded envelope) across embryos are shown Nuclei are distrib-uted uniformly along the AP axis ie the prior distribution of nuclearpositions Px(x) is uniform (shown at the bottom) The total distribution ofexpression levels across the embryo Pg(g) is determined by averagingP(g|x) over all positions and is shown on the left The positional informationI(g x) is related to these two distributions as in Equation 8 Inset showsdecoding (ie estimating) the position of a nucleus from a measuredexpression level g Prior to the ldquomeasurementrdquo all positions are equallylikely After observing the value g the positions consistent with thismeasured values are drawn from P(x|g) The best estimate of the trueposition x is at the peak of this distribution and the positional errorsx(x) is the distributionrsquos width Positional information I(g x) can equallybe computed from Px(x) and P(x|g) by Equation 10

44 G Tkacik et al

Brunel and Nadal (1998) demonstrating the link for the caseof one gene g

Let us assume that the Gaussian approximation of Equa-tion 4 holds and that the distribution of the levels of a singlegene at a given position is

PethgjxTHORN frac14 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ps2

gethxTHORNq exp

(212ethg2gethxTHORNTHORN2

s2gethxTHORN

) (15)

We can use the Gaussian distribution to compute the Fisherinformation in Equation 14 We find that

I frac14 g92ethxTHORNs2gethxTHORN

thorn 2s92gethxTHORNs2gethxTHORN

(16)

where (amp)9 denotes a derivative with respect to position xInformation about position is thus carried by the change inmean profile with position as well as the change in the var-iability itself with position If the noise is small sg $ g thentypically it will be the case that

))s9g)) $ jg9j (a potentially

important issue arises when the profiles and their variabilityare estimated from noisy sampled data in this case oneneeds to be careful about spurious contributions to Fisherinformation due to noise-induced gradients of the profilesand the variability) If we formally require

))s9g)) $ jg9j we

can retain only the first term to obtain a bound on positionalerror

s2xethxTHORN$

1IethxTHORN

-dgdx

22s2gethxTHORN (17)

This result is intuitively straightforward it is simply thetransformation of the variability in gene expression s2

gethxTHORN intoan effective variance in the position estimate s2

x ethxTHORN and thetwo are related by slope of the inputoutput relation gethxTHORN

A crucial next step is to think of x as determining geneexpressions gi probabilistically and the x as being a functionof these gene expression levelsmdashthat is when computing xneither we nor the nuclei have access to the true positionThis forms a dependency chain x gi x Since eachof these steps is probabilistic it can only lose informationsuch that by information-processing inequality (Cover andThomas 1991) we must have I(gi x) $ I(x x) The mu-tual information between the true location and its estimateis given by

Iethx xTHORN frac14 Sfrac12PxethxTHORN(2DSfrac12PethxjxTHORN(

E

PxethxTHORN (18)

Under weak assumptions the first term in our case is ap-proximately the entropy of a uniform distribution While wedo not know the full distribution P(x|x) and thus cannotcompute its entropy directly we know its variance which isjust the square of the positional error s2

xethxTHORN Regardless ofwhat the full distribution is its entropy must be less or equal

to the entropy of the Gaussian distribution of the same var-iance which is Sfrac12PethxjxTHORN( frac14 log2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

bits Putting ev-erything together we find that

Iethfgig xTHORN$ I ethx xTHORN$log2

PxethxTHORNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(19)

Therefore positional information I(gi x) puts an upperbound to the average ability of the cells to infer their loca-tions that is to the smallness of the positional error sx(x)In a straightforward generalization of a single-gene case theFisher information for a multivariate Gaussian distributionwritten for compactness in matrix notation yields

I frac14 g9TC21g9thorn 12TrhC21C9C21C9

i (20)

Under the same assumptions we made for the case of a singlegene we retain only the first term so that the expression forpositional error written out explicitly in the componentnotation reads

s2x ethxTHORN$

1IethxTHORN

0

XN

ijfrac141

dgidx

hCethxTHORN21

i

ij

dgjdx

1

A21

(21)

where Cij is the covariance matrix of the profiles as definedin Equation 3 This extends the fundamental connectionEquation 19 between the positional information and posi-tional error to the case of multiple genes Importantly allquantitiesmdashthe mean profiles and their covariancemdashinEquation 21 can be obtained from experimental data sosx(x) is a quantity that can be estimated directly

Interpretation in a spatially discrete (cellular) system Inthe setup presented above gene expression levels carryinformation about a continuous position variable x But ina real biological system there exists a minimal spatial scalemdashthe scale of individual nuclei (or cells)mdashbelow which theconcept of gene expression at a position is no longer welldefined This is particularly the case in systems that interpretmolecular concentrations to make decisions that determinecell fates such interpretations have no meaning at the spa-tial scale of a molecule but only at cellular scales How shouldpositional information be interpreted in such a context

When noise is small enough and the distribution ofpositions consistent with observed gene expression levelsP(x|gi) of Equation 11 is nearly Gaussian and Equations19 and 21 become tight bounds In this case we can applyEquation 10 directly In developmental systems cells ornuclei are often distributed in space such that the intercel-lular (or internuclear) spacings are to a good approxima-tion equal at least in systems we are studying there areno significant local rarefactions or overabundances of cellsor nuclei Mathematically this amounts to assuming thatPx(x) = 1L (where L is the linear spatial extent over whichthe cells or nuclei are distributed) This assumption of

Positional Information and Error 45

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 4: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

with g1 frac14 1 thinsp g2 frac14 0 Upon reading out g1 and g2 a nucleusa priori located in any of the four domains can unambigu-ously decide on a single one of the four possibilities This isequivalent to making two binary decisions and as we willlater show to 2 bits of positional information

Finally the most subtle case is depicted in Figure 2CHere the mean profiles have exactly the same shapes as inFigure 2B What is different however is the correlationstructure of the fluctuations In certain areas of the embryothe two genes are strongly positively correlated while in theothers they are strongly negatively correlated If these areasare overlaid appropriately on top of the domains definedby the mean expression patterns an additional increase inpositional information is possible In the admittedly con-trived but pedagogical example of Figure 2C the mean pro-files and the correlations together define eight distinguishabledomains of expression combinatorially encoded by two genesNuclei having simultaneous access to the concentrations of thetwo genes can compute which of the eight domains they re-side in although it might not be easy to implement such a com-putation in molecular hardware Picking one of eight choicescorresponds to making three binary decisions and thus to 3bits of positional information Note that in this case each geneconsidered in isolation still carries 1 bit as before so that thesystem of two genes carries more information than the sum ofits partsmdashsuch a scheme is called synergistic encoding

In sum we have shown that the mean shapes of theprofiles as well as their variances and correlations can carrypositional information Extrapolating to three or more geneswe see that the number of pairwise correlations increases andin addition higher-order correlation terms start appearing For-mally positional information could be encoded in all of thesefeatures but would become progressively harder to extractusing plausible biological mechanisms Nevertheless a princi-pled and assumption-free measure should combine all statisti-cal structure into a single number a scalar quantity measured

in bits that can ldquocountrdquo the number of distinguishable expres-sion states (and thus positions) as illustrated in the examplesabove

Defining positional information Determining the numberof ldquodistinguishable statesrdquo of gene expression is more com-plicated in real data sets than in our toy models the meanprofiles have complex shapes and their (co)variabilitydepends on position In the ideal case we can measure jointexpression patterns of N genes gi i = 1 N (for exam-ple N = 4 for four gap genes in Drosophila) in a large set ofembryos In such a scenario the position dependence of theexpression levels can be fully described with a conditionalprobability distribution P(gi|x) Concretely for every po-sition x in the embryo we construct an N-dimensional histo-gram of expression levels across all recorded embryoswhich (when normalized) yields the desired P(gi|x) Thisdistribution contains all the information about how expres-sion levels vary across embryos in a position-dependentfashion For instance

giethxTHORN frac14Z

dNgthinsp giPfglgjx

(1)

s2i ethxTHORN frac14

ZdNgethgi2giethxTHORNTHORN

2PethfglgjxTHORN (2)

CijethxTHORN frac14Z

dNggigj2 giethxTHORNgjethxTHORN

$PethfglgjxTHORN (3)

are the mean profile of gene gi the variance across embryos ofgene gi and the covariance between genes gi and gj respec-tively In principle the conditional distribution contains also allhigher-order moments that we can extract by integrating overappropriate sets of variables Realistically we are often limitedin our ability to collect enough samples to construct P(gi|x)by histogram counts especially when considering several

Figure 2 (AndashC) Positional information encoded by twogenes Spatial gene expression profiles are shown at thetop an enumeration of all distinguishable combinations ofgene expression (roman numerals) across the embryo isshown at the bottom (A) Step function profiles of g1 g2each encode 1 bit of information but are perfectly redun-dant jointly encoding only two distinguishable states andthus 1 bit of positional information the same as each genealone (B) The mean profiles of g1 g2 have been displacedremoving the redundancy and bringing the total informa-tion to 2 bits (four distinct states of joint gene expression)(C) If in addition to the mean profile shape the down-stream layer can read out the (correlated) fluctuations ofg1 g2 a further increase in information is possible In thistoy example fluctuations are correlated (+) and anticorre-lated (2) in various spatially separated regions whichallows the embryo to use this information along withthe mean profile shape to distinguish eight distinctregions bringing the total encoded information to 3 bits

42 G Tkacik et al

genes simultaneously the number of samples needed growsexponentially with the number of genes However estimatingthe mean profiles on the left-hand sides of Equations 1ndash3 canoften be achieved from data directly A reasonable first step(but one that has to be independently verified) is to assumethat the joint distribution P(gi|x) of N expression levels giat a given position x is Gaussian which can be constructedusing the measured mean values and covariances

P ethfgigjxTHORN

frac14 eth2pTHORN2N=2jCethxTHORNj21=2

3 exp

2

4212

XN

i jfrac141ethgi 2 giethxTHORNTHORN

C21ethxTHORN

ampij

gj2 gjethxTHORN

$3

5

(4)

We emphasize that the Gaussian approximation is not re-quired to theoretically define positional information but itwill turn out to be practical when working with experimen-tal data in the cases of one or two genes it is often possibleto proceed without making this approximation which pro-vides a convenient check for its validity

While the conditional distribution P(gi|x) capturesthe behavior of gene expression levels at a given x establish-ing how much information in total the expression levelscarry about position requires us to know also how frequentlyeach combination of gene expression levels gi is usedacross all positions Recall for instance that our argumentsrelated to the information encoded in patterns in Figure 2rested on counting how often a pair of genes will be found inexpression states 00 01 10 and 11 across all x This globalstructure is encoded in the total distribution of expressionlevels which can be obtained by averaging the conditionaldistribution over all positions

PgethfgigTHORN frac14DPfgigjx

E

xfrac14

Z 1

0dx thinsp PethfgigjxTHORN (5)

where hampix denotes averaging over x Note that we can thinkof Equation 5 as a special case of averaging with a position-dependent weight

PgethfgigTHORN frac14Z

dx thinsp PxethxTHORNPethfgigjxTHORN (6)

where Px(x) is chosen to be uniform As we shall see in thecase of Drosophila anteriorndashposterior (AP) patterning Px(x)will be the distribution of possible nuclear locations alongthe AP axis which indeed is very close to uniform

When formulated in the language of probabilities therelationship between the position x and the gene expressionlevels can be seen as a statistical dependency If we knewthis dependency were linear we could measure it using ega linear correlation analysis between x and gi Shannon(1948) has shown that there is an alternative measure oftotal statistical dependence (not just of its linear component)

called the mutual information which is a functional of theprobability distributions Px(x) and P(gi|x) and is defined by

IethxfgigTHORN

frac14Rdx thinsp PxethxTHORN

RdNgthinsp PethfgigjxTHORNlog2

PethfgigjxTHORNPgethfgigTHORN

(7)

This positive quantity measured in bits tells us how muchone can know about the gene expression pattern if oneknows the position x It is not hard to convince oneself thatthe mutual information is symmetric ie I(gi x) = I(x gi) = I(gi x) This is very attractive we do theexperiments by sampling the distribution of expression lev-els given position while the nuclei in a developing embryoimplicitly solve the inverse problemmdashknowing a set of geneexpression levels they need to infer their position A funda-mental result of information theory states that both prob-lems are quantified by the same symmetric quantity theinformation I(gi x) Furthermore mutual information isnot just one out of many possible ways of quantifying thetotal statistical dependency but rather the unique way thatsatisfies a number of basic requirements for example thatinformation from independent sources is additive (Shannon1948 Cover and Thomas 1991)

The definition of mutual information in Equation 7 can berewritten as a difference of two entropies

Iethfgig xTHORN frac14 SPgethfgigTHORN

amp2

DSfrac12PethfgigjxTHORN(

E

x (8)

where S[p(x)] is the standard entropy of the distribution p(x)measured in bits (hence log base 2)

Sfrac12 pethxTHORN( frac14 2Z

dx thinsp pethxTHORNlog2pethxTHORN (9)

Equation 8 provides an alternative interpretation of themutual information I(gi x) which is illustrated in Figure 3In the case of a single gene g the ldquototal entropyrdquo S[Pg(g)]represented on the left measures the range of gene expres-sion available across the whole embryo This total entropyor dynamic range can be written as the sum of two contri-butions One part is due to the systematic modulation of gwith position x and this is the useful part (the ldquosignalrdquo) orthe mutual information I(gi x) The other contribution isthe variability in g that remains even at constant position x thisrepresents pure ldquonoiserdquo that carries no information about posi-tion and is formally measured by the average entropy of theconditional distribution (the noise entropy) hS[P(g|x)]ix Po-sitional information carried by g is thus the difference betweenthe total and noise entropies as expressed in Equation 8

Mutual information is theoretically well founded andis always nonnegative being 0 if and only if there is nostatistical dependence of any kind between the positionand the gene expression level Conversely if there are Ibits of mutual information between the position and theexpression level there are ) 2IethfgigxTHORN distinguishable gene

Positional Information and Error 43

expression patterns that can be generated by movingalong the AP axis from the head at x = 0 to the tail atx = 1 This is precisely the property we require from anysuitable measure of positional information We thereforesuggest that mathematically positional informationshould be defined as the mutual information between ex-pression level and position I(gi x)

Defining positional error Thus far we have discussedpositional information in terms of the statistical dependencyand the number of distinguishable levels of gene expressionalong the position coordinate To present an alternativeinterpretation we start by using the symmetry property ofthe mutual information and rewrite I(gi x) as

Iethfgig xTHORN frac14 Sfrac12Px ethxTHORN(2Sfrac12PethxjfgigTHORN(

(PgethfgigTHORN (10)

ie the difference between the (uniform) distribution overall possible positions of a cell in the embryo and the distri-bution of positions consistent with a given expression levelHere P(x|gi) can be obtained using Bayesrsquo rule from theknown quantities

PethxjfgigTHORN frac14PethfgigjxTHORNPxethxTHORN

PgethfgigTHORN (11)

The total entropy of all positions S[Px(x)] in Equation 10 isindependent of the particular regulatory systemmdashit simply

measures the prior uncertainty about the location of the cells inthe absence of knowing any gene expression level If howeverthe cell has access to the expression levels of a particular set ofgenes this uncertainty is reduced and it is hence possible tolocalize the cell much more precisely the reduction in uncer-tainty is captured by the second term in Equation 10 This formof positional information emphasizes the decoding view that isthat cells can infer their positions by simultaneously readingout protein concentrations of various genes (Figure 3)

Positional information is a single number it is a globalmeasure of the reproducibility in the patterning system Isthere a local quantity that would tell us position by positionhow well cells can read out their gene expression levels andinfer their location Is positional information ldquodistributedequallyrdquo along the AP axis or is it very nonuniform suchthat cells in some regions of the embryo are much betterat reproducibly assuming their roles

The optimal estimator of the true location x of a cell oncewe (or the cell) measure the gene expression levels fgi gis the maximum a posteriori (MAP) estimate xethfgi gTHORN frac14argmaxthinsp Pethxjfgi gTHORN In cases like ours where the prior distri-bution Px(x) is uniform this equals the maximum-likelihood(ML) estimate

xn

gio$

frac14 argmaxthinsp Pn

gio)))x

$ (12)

thus for each expression level readout this ldquodecoding rulerdquogives us the most likely position of the cell x The inset inFigure 3 illustrates the decoding in the case of one gene

How well can this (optimal) rule perform The expectederror of the estimated x is given by s2

xethxTHORN frac14ethx2xTHORN2

(

where brackets denote averaging over Pethxjfgi gTHORN This erroris a function of the gene expression levels however we canalso evaluate it for every x since we know the mean geneexpression profiles giethxTHORN for every x Thus we define a newquantity the positional error sx(x) which measures howwell cells at a true position x are able to estimate their positionbased on the gene expression levels alone This is the localmeasure of positional information that we were aiming for

Independently of how cells actually read out the concen-trations mechanistically it can be shown that sx(x) cannotbe lower than the limit set by the CramerndashRao bound (Coverand Thomas 1991)

sx2ethxTHORN$ 1

IethxTHORN (13)

where IethxTHORN is the Fisher information given by

IethxTHORN frac14 2

2 log PethfgigjxTHORN

x2

+

PethfgigjxTHORN (14)

Despite its name the Fisher information I is not an information-theoretic quantity and unlike the mutual information I theFisher information depends on position Is there a connectionbetween the positional error sx(x) and the mutual infor-mation I(gi x) Below we sketch the derivation following

Figure 3 The mathematics of positional information and positional errorfor one gene The mean profile of gene g (thick solid line) and its vari-ability (shaded envelope) across embryos are shown Nuclei are distrib-uted uniformly along the AP axis ie the prior distribution of nuclearpositions Px(x) is uniform (shown at the bottom) The total distribution ofexpression levels across the embryo Pg(g) is determined by averagingP(g|x) over all positions and is shown on the left The positional informationI(g x) is related to these two distributions as in Equation 8 Inset showsdecoding (ie estimating) the position of a nucleus from a measuredexpression level g Prior to the ldquomeasurementrdquo all positions are equallylikely After observing the value g the positions consistent with thismeasured values are drawn from P(x|g) The best estimate of the trueposition x is at the peak of this distribution and the positional errorsx(x) is the distributionrsquos width Positional information I(g x) can equallybe computed from Px(x) and P(x|g) by Equation 10

44 G Tkacik et al

Brunel and Nadal (1998) demonstrating the link for the caseof one gene g

Let us assume that the Gaussian approximation of Equa-tion 4 holds and that the distribution of the levels of a singlegene at a given position is

PethgjxTHORN frac14 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ps2

gethxTHORNq exp

(212ethg2gethxTHORNTHORN2

s2gethxTHORN

) (15)

We can use the Gaussian distribution to compute the Fisherinformation in Equation 14 We find that

I frac14 g92ethxTHORNs2gethxTHORN

thorn 2s92gethxTHORNs2gethxTHORN

(16)

where (amp)9 denotes a derivative with respect to position xInformation about position is thus carried by the change inmean profile with position as well as the change in the var-iability itself with position If the noise is small sg $ g thentypically it will be the case that

))s9g)) $ jg9j (a potentially

important issue arises when the profiles and their variabilityare estimated from noisy sampled data in this case oneneeds to be careful about spurious contributions to Fisherinformation due to noise-induced gradients of the profilesand the variability) If we formally require

))s9g)) $ jg9j we

can retain only the first term to obtain a bound on positionalerror

s2xethxTHORN$

1IethxTHORN

-dgdx

22s2gethxTHORN (17)

This result is intuitively straightforward it is simply thetransformation of the variability in gene expression s2

gethxTHORN intoan effective variance in the position estimate s2

x ethxTHORN and thetwo are related by slope of the inputoutput relation gethxTHORN

A crucial next step is to think of x as determining geneexpressions gi probabilistically and the x as being a functionof these gene expression levelsmdashthat is when computing xneither we nor the nuclei have access to the true positionThis forms a dependency chain x gi x Since eachof these steps is probabilistic it can only lose informationsuch that by information-processing inequality (Cover andThomas 1991) we must have I(gi x) $ I(x x) The mu-tual information between the true location and its estimateis given by

Iethx xTHORN frac14 Sfrac12PxethxTHORN(2DSfrac12PethxjxTHORN(

E

PxethxTHORN (18)

Under weak assumptions the first term in our case is ap-proximately the entropy of a uniform distribution While wedo not know the full distribution P(x|x) and thus cannotcompute its entropy directly we know its variance which isjust the square of the positional error s2

xethxTHORN Regardless ofwhat the full distribution is its entropy must be less or equal

to the entropy of the Gaussian distribution of the same var-iance which is Sfrac12PethxjxTHORN( frac14 log2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

bits Putting ev-erything together we find that

Iethfgig xTHORN$ I ethx xTHORN$log2

PxethxTHORNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(19)

Therefore positional information I(gi x) puts an upperbound to the average ability of the cells to infer their loca-tions that is to the smallness of the positional error sx(x)In a straightforward generalization of a single-gene case theFisher information for a multivariate Gaussian distributionwritten for compactness in matrix notation yields

I frac14 g9TC21g9thorn 12TrhC21C9C21C9

i (20)

Under the same assumptions we made for the case of a singlegene we retain only the first term so that the expression forpositional error written out explicitly in the componentnotation reads

s2x ethxTHORN$

1IethxTHORN

0

XN

ijfrac141

dgidx

hCethxTHORN21

i

ij

dgjdx

1

A21

(21)

where Cij is the covariance matrix of the profiles as definedin Equation 3 This extends the fundamental connectionEquation 19 between the positional information and posi-tional error to the case of multiple genes Importantly allquantitiesmdashthe mean profiles and their covariancemdashinEquation 21 can be obtained from experimental data sosx(x) is a quantity that can be estimated directly

Interpretation in a spatially discrete (cellular) system Inthe setup presented above gene expression levels carryinformation about a continuous position variable x But ina real biological system there exists a minimal spatial scalemdashthe scale of individual nuclei (or cells)mdashbelow which theconcept of gene expression at a position is no longer welldefined This is particularly the case in systems that interpretmolecular concentrations to make decisions that determinecell fates such interpretations have no meaning at the spa-tial scale of a molecule but only at cellular scales How shouldpositional information be interpreted in such a context

When noise is small enough and the distribution ofpositions consistent with observed gene expression levelsP(x|gi) of Equation 11 is nearly Gaussian and Equations19 and 21 become tight bounds In this case we can applyEquation 10 directly In developmental systems cells ornuclei are often distributed in space such that the intercel-lular (or internuclear) spacings are to a good approxima-tion equal at least in systems we are studying there areno significant local rarefactions or overabundances of cellsor nuclei Mathematically this amounts to assuming thatPx(x) = 1L (where L is the linear spatial extent over whichthe cells or nuclei are distributed) This assumption of

Positional Information and Error 45

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 5: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

genes simultaneously the number of samples needed growsexponentially with the number of genes However estimatingthe mean profiles on the left-hand sides of Equations 1ndash3 canoften be achieved from data directly A reasonable first step(but one that has to be independently verified) is to assumethat the joint distribution P(gi|x) of N expression levels giat a given position x is Gaussian which can be constructedusing the measured mean values and covariances

P ethfgigjxTHORN

frac14 eth2pTHORN2N=2jCethxTHORNj21=2

3 exp

2

4212

XN

i jfrac141ethgi 2 giethxTHORNTHORN

C21ethxTHORN

ampij

gj2 gjethxTHORN

$3

5

(4)

We emphasize that the Gaussian approximation is not re-quired to theoretically define positional information but itwill turn out to be practical when working with experimen-tal data in the cases of one or two genes it is often possibleto proceed without making this approximation which pro-vides a convenient check for its validity

While the conditional distribution P(gi|x) capturesthe behavior of gene expression levels at a given x establish-ing how much information in total the expression levelscarry about position requires us to know also how frequentlyeach combination of gene expression levels gi is usedacross all positions Recall for instance that our argumentsrelated to the information encoded in patterns in Figure 2rested on counting how often a pair of genes will be found inexpression states 00 01 10 and 11 across all x This globalstructure is encoded in the total distribution of expressionlevels which can be obtained by averaging the conditionaldistribution over all positions

PgethfgigTHORN frac14DPfgigjx

E

xfrac14

Z 1

0dx thinsp PethfgigjxTHORN (5)

where hampix denotes averaging over x Note that we can thinkof Equation 5 as a special case of averaging with a position-dependent weight

PgethfgigTHORN frac14Z

dx thinsp PxethxTHORNPethfgigjxTHORN (6)

where Px(x) is chosen to be uniform As we shall see in thecase of Drosophila anteriorndashposterior (AP) patterning Px(x)will be the distribution of possible nuclear locations alongthe AP axis which indeed is very close to uniform

When formulated in the language of probabilities therelationship between the position x and the gene expressionlevels can be seen as a statistical dependency If we knewthis dependency were linear we could measure it using ega linear correlation analysis between x and gi Shannon(1948) has shown that there is an alternative measure oftotal statistical dependence (not just of its linear component)

called the mutual information which is a functional of theprobability distributions Px(x) and P(gi|x) and is defined by

IethxfgigTHORN

frac14Rdx thinsp PxethxTHORN

RdNgthinsp PethfgigjxTHORNlog2

PethfgigjxTHORNPgethfgigTHORN

(7)

This positive quantity measured in bits tells us how muchone can know about the gene expression pattern if oneknows the position x It is not hard to convince oneself thatthe mutual information is symmetric ie I(gi x) = I(x gi) = I(gi x) This is very attractive we do theexperiments by sampling the distribution of expression lev-els given position while the nuclei in a developing embryoimplicitly solve the inverse problemmdashknowing a set of geneexpression levels they need to infer their position A funda-mental result of information theory states that both prob-lems are quantified by the same symmetric quantity theinformation I(gi x) Furthermore mutual information isnot just one out of many possible ways of quantifying thetotal statistical dependency but rather the unique way thatsatisfies a number of basic requirements for example thatinformation from independent sources is additive (Shannon1948 Cover and Thomas 1991)

The definition of mutual information in Equation 7 can berewritten as a difference of two entropies

Iethfgig xTHORN frac14 SPgethfgigTHORN

amp2

DSfrac12PethfgigjxTHORN(

E

x (8)

where S[p(x)] is the standard entropy of the distribution p(x)measured in bits (hence log base 2)

Sfrac12 pethxTHORN( frac14 2Z

dx thinsp pethxTHORNlog2pethxTHORN (9)

Equation 8 provides an alternative interpretation of themutual information I(gi x) which is illustrated in Figure 3In the case of a single gene g the ldquototal entropyrdquo S[Pg(g)]represented on the left measures the range of gene expres-sion available across the whole embryo This total entropyor dynamic range can be written as the sum of two contri-butions One part is due to the systematic modulation of gwith position x and this is the useful part (the ldquosignalrdquo) orthe mutual information I(gi x) The other contribution isthe variability in g that remains even at constant position x thisrepresents pure ldquonoiserdquo that carries no information about posi-tion and is formally measured by the average entropy of theconditional distribution (the noise entropy) hS[P(g|x)]ix Po-sitional information carried by g is thus the difference betweenthe total and noise entropies as expressed in Equation 8

Mutual information is theoretically well founded andis always nonnegative being 0 if and only if there is nostatistical dependence of any kind between the positionand the gene expression level Conversely if there are Ibits of mutual information between the position and theexpression level there are ) 2IethfgigxTHORN distinguishable gene

Positional Information and Error 43

expression patterns that can be generated by movingalong the AP axis from the head at x = 0 to the tail atx = 1 This is precisely the property we require from anysuitable measure of positional information We thereforesuggest that mathematically positional informationshould be defined as the mutual information between ex-pression level and position I(gi x)

Defining positional error Thus far we have discussedpositional information in terms of the statistical dependencyand the number of distinguishable levels of gene expressionalong the position coordinate To present an alternativeinterpretation we start by using the symmetry property ofthe mutual information and rewrite I(gi x) as

Iethfgig xTHORN frac14 Sfrac12Px ethxTHORN(2Sfrac12PethxjfgigTHORN(

(PgethfgigTHORN (10)

ie the difference between the (uniform) distribution overall possible positions of a cell in the embryo and the distri-bution of positions consistent with a given expression levelHere P(x|gi) can be obtained using Bayesrsquo rule from theknown quantities

PethxjfgigTHORN frac14PethfgigjxTHORNPxethxTHORN

PgethfgigTHORN (11)

The total entropy of all positions S[Px(x)] in Equation 10 isindependent of the particular regulatory systemmdashit simply

measures the prior uncertainty about the location of the cells inthe absence of knowing any gene expression level If howeverthe cell has access to the expression levels of a particular set ofgenes this uncertainty is reduced and it is hence possible tolocalize the cell much more precisely the reduction in uncer-tainty is captured by the second term in Equation 10 This formof positional information emphasizes the decoding view that isthat cells can infer their positions by simultaneously readingout protein concentrations of various genes (Figure 3)

Positional information is a single number it is a globalmeasure of the reproducibility in the patterning system Isthere a local quantity that would tell us position by positionhow well cells can read out their gene expression levels andinfer their location Is positional information ldquodistributedequallyrdquo along the AP axis or is it very nonuniform suchthat cells in some regions of the embryo are much betterat reproducibly assuming their roles

The optimal estimator of the true location x of a cell oncewe (or the cell) measure the gene expression levels fgi gis the maximum a posteriori (MAP) estimate xethfgi gTHORN frac14argmaxthinsp Pethxjfgi gTHORN In cases like ours where the prior distri-bution Px(x) is uniform this equals the maximum-likelihood(ML) estimate

xn

gio$

frac14 argmaxthinsp Pn

gio)))x

$ (12)

thus for each expression level readout this ldquodecoding rulerdquogives us the most likely position of the cell x The inset inFigure 3 illustrates the decoding in the case of one gene

How well can this (optimal) rule perform The expectederror of the estimated x is given by s2

xethxTHORN frac14ethx2xTHORN2

(

where brackets denote averaging over Pethxjfgi gTHORN This erroris a function of the gene expression levels however we canalso evaluate it for every x since we know the mean geneexpression profiles giethxTHORN for every x Thus we define a newquantity the positional error sx(x) which measures howwell cells at a true position x are able to estimate their positionbased on the gene expression levels alone This is the localmeasure of positional information that we were aiming for

Independently of how cells actually read out the concen-trations mechanistically it can be shown that sx(x) cannotbe lower than the limit set by the CramerndashRao bound (Coverand Thomas 1991)

sx2ethxTHORN$ 1

IethxTHORN (13)

where IethxTHORN is the Fisher information given by

IethxTHORN frac14 2

2 log PethfgigjxTHORN

x2

+

PethfgigjxTHORN (14)

Despite its name the Fisher information I is not an information-theoretic quantity and unlike the mutual information I theFisher information depends on position Is there a connectionbetween the positional error sx(x) and the mutual infor-mation I(gi x) Below we sketch the derivation following

Figure 3 The mathematics of positional information and positional errorfor one gene The mean profile of gene g (thick solid line) and its vari-ability (shaded envelope) across embryos are shown Nuclei are distrib-uted uniformly along the AP axis ie the prior distribution of nuclearpositions Px(x) is uniform (shown at the bottom) The total distribution ofexpression levels across the embryo Pg(g) is determined by averagingP(g|x) over all positions and is shown on the left The positional informationI(g x) is related to these two distributions as in Equation 8 Inset showsdecoding (ie estimating) the position of a nucleus from a measuredexpression level g Prior to the ldquomeasurementrdquo all positions are equallylikely After observing the value g the positions consistent with thismeasured values are drawn from P(x|g) The best estimate of the trueposition x is at the peak of this distribution and the positional errorsx(x) is the distributionrsquos width Positional information I(g x) can equallybe computed from Px(x) and P(x|g) by Equation 10

44 G Tkacik et al

Brunel and Nadal (1998) demonstrating the link for the caseof one gene g

Let us assume that the Gaussian approximation of Equa-tion 4 holds and that the distribution of the levels of a singlegene at a given position is

PethgjxTHORN frac14 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ps2

gethxTHORNq exp

(212ethg2gethxTHORNTHORN2

s2gethxTHORN

) (15)

We can use the Gaussian distribution to compute the Fisherinformation in Equation 14 We find that

I frac14 g92ethxTHORNs2gethxTHORN

thorn 2s92gethxTHORNs2gethxTHORN

(16)

where (amp)9 denotes a derivative with respect to position xInformation about position is thus carried by the change inmean profile with position as well as the change in the var-iability itself with position If the noise is small sg $ g thentypically it will be the case that

))s9g)) $ jg9j (a potentially

important issue arises when the profiles and their variabilityare estimated from noisy sampled data in this case oneneeds to be careful about spurious contributions to Fisherinformation due to noise-induced gradients of the profilesand the variability) If we formally require

))s9g)) $ jg9j we

can retain only the first term to obtain a bound on positionalerror

s2xethxTHORN$

1IethxTHORN

-dgdx

22s2gethxTHORN (17)

This result is intuitively straightforward it is simply thetransformation of the variability in gene expression s2

gethxTHORN intoan effective variance in the position estimate s2

x ethxTHORN and thetwo are related by slope of the inputoutput relation gethxTHORN

A crucial next step is to think of x as determining geneexpressions gi probabilistically and the x as being a functionof these gene expression levelsmdashthat is when computing xneither we nor the nuclei have access to the true positionThis forms a dependency chain x gi x Since eachof these steps is probabilistic it can only lose informationsuch that by information-processing inequality (Cover andThomas 1991) we must have I(gi x) $ I(x x) The mu-tual information between the true location and its estimateis given by

Iethx xTHORN frac14 Sfrac12PxethxTHORN(2DSfrac12PethxjxTHORN(

E

PxethxTHORN (18)

Under weak assumptions the first term in our case is ap-proximately the entropy of a uniform distribution While wedo not know the full distribution P(x|x) and thus cannotcompute its entropy directly we know its variance which isjust the square of the positional error s2

xethxTHORN Regardless ofwhat the full distribution is its entropy must be less or equal

to the entropy of the Gaussian distribution of the same var-iance which is Sfrac12PethxjxTHORN( frac14 log2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

bits Putting ev-erything together we find that

Iethfgig xTHORN$ I ethx xTHORN$log2

PxethxTHORNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(19)

Therefore positional information I(gi x) puts an upperbound to the average ability of the cells to infer their loca-tions that is to the smallness of the positional error sx(x)In a straightforward generalization of a single-gene case theFisher information for a multivariate Gaussian distributionwritten for compactness in matrix notation yields

I frac14 g9TC21g9thorn 12TrhC21C9C21C9

i (20)

Under the same assumptions we made for the case of a singlegene we retain only the first term so that the expression forpositional error written out explicitly in the componentnotation reads

s2x ethxTHORN$

1IethxTHORN

0

XN

ijfrac141

dgidx

hCethxTHORN21

i

ij

dgjdx

1

A21

(21)

where Cij is the covariance matrix of the profiles as definedin Equation 3 This extends the fundamental connectionEquation 19 between the positional information and posi-tional error to the case of multiple genes Importantly allquantitiesmdashthe mean profiles and their covariancemdashinEquation 21 can be obtained from experimental data sosx(x) is a quantity that can be estimated directly

Interpretation in a spatially discrete (cellular) system Inthe setup presented above gene expression levels carryinformation about a continuous position variable x But ina real biological system there exists a minimal spatial scalemdashthe scale of individual nuclei (or cells)mdashbelow which theconcept of gene expression at a position is no longer welldefined This is particularly the case in systems that interpretmolecular concentrations to make decisions that determinecell fates such interpretations have no meaning at the spa-tial scale of a molecule but only at cellular scales How shouldpositional information be interpreted in such a context

When noise is small enough and the distribution ofpositions consistent with observed gene expression levelsP(x|gi) of Equation 11 is nearly Gaussian and Equations19 and 21 become tight bounds In this case we can applyEquation 10 directly In developmental systems cells ornuclei are often distributed in space such that the intercel-lular (or internuclear) spacings are to a good approxima-tion equal at least in systems we are studying there areno significant local rarefactions or overabundances of cellsor nuclei Mathematically this amounts to assuming thatPx(x) = 1L (where L is the linear spatial extent over whichthe cells or nuclei are distributed) This assumption of

Positional Information and Error 45

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 6: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

expression patterns that can be generated by movingalong the AP axis from the head at x = 0 to the tail atx = 1 This is precisely the property we require from anysuitable measure of positional information We thereforesuggest that mathematically positional informationshould be defined as the mutual information between ex-pression level and position I(gi x)

Defining positional error Thus far we have discussedpositional information in terms of the statistical dependencyand the number of distinguishable levels of gene expressionalong the position coordinate To present an alternativeinterpretation we start by using the symmetry property ofthe mutual information and rewrite I(gi x) as

Iethfgig xTHORN frac14 Sfrac12Px ethxTHORN(2Sfrac12PethxjfgigTHORN(

(PgethfgigTHORN (10)

ie the difference between the (uniform) distribution overall possible positions of a cell in the embryo and the distri-bution of positions consistent with a given expression levelHere P(x|gi) can be obtained using Bayesrsquo rule from theknown quantities

PethxjfgigTHORN frac14PethfgigjxTHORNPxethxTHORN

PgethfgigTHORN (11)

The total entropy of all positions S[Px(x)] in Equation 10 isindependent of the particular regulatory systemmdashit simply

measures the prior uncertainty about the location of the cells inthe absence of knowing any gene expression level If howeverthe cell has access to the expression levels of a particular set ofgenes this uncertainty is reduced and it is hence possible tolocalize the cell much more precisely the reduction in uncer-tainty is captured by the second term in Equation 10 This formof positional information emphasizes the decoding view that isthat cells can infer their positions by simultaneously readingout protein concentrations of various genes (Figure 3)

Positional information is a single number it is a globalmeasure of the reproducibility in the patterning system Isthere a local quantity that would tell us position by positionhow well cells can read out their gene expression levels andinfer their location Is positional information ldquodistributedequallyrdquo along the AP axis or is it very nonuniform suchthat cells in some regions of the embryo are much betterat reproducibly assuming their roles

The optimal estimator of the true location x of a cell oncewe (or the cell) measure the gene expression levels fgi gis the maximum a posteriori (MAP) estimate xethfgi gTHORN frac14argmaxthinsp Pethxjfgi gTHORN In cases like ours where the prior distri-bution Px(x) is uniform this equals the maximum-likelihood(ML) estimate

xn

gio$

frac14 argmaxthinsp Pn

gio)))x

$ (12)

thus for each expression level readout this ldquodecoding rulerdquogives us the most likely position of the cell x The inset inFigure 3 illustrates the decoding in the case of one gene

How well can this (optimal) rule perform The expectederror of the estimated x is given by s2

xethxTHORN frac14ethx2xTHORN2

(

where brackets denote averaging over Pethxjfgi gTHORN This erroris a function of the gene expression levels however we canalso evaluate it for every x since we know the mean geneexpression profiles giethxTHORN for every x Thus we define a newquantity the positional error sx(x) which measures howwell cells at a true position x are able to estimate their positionbased on the gene expression levels alone This is the localmeasure of positional information that we were aiming for

Independently of how cells actually read out the concen-trations mechanistically it can be shown that sx(x) cannotbe lower than the limit set by the CramerndashRao bound (Coverand Thomas 1991)

sx2ethxTHORN$ 1

IethxTHORN (13)

where IethxTHORN is the Fisher information given by

IethxTHORN frac14 2

2 log PethfgigjxTHORN

x2

+

PethfgigjxTHORN (14)

Despite its name the Fisher information I is not an information-theoretic quantity and unlike the mutual information I theFisher information depends on position Is there a connectionbetween the positional error sx(x) and the mutual infor-mation I(gi x) Below we sketch the derivation following

Figure 3 The mathematics of positional information and positional errorfor one gene The mean profile of gene g (thick solid line) and its vari-ability (shaded envelope) across embryos are shown Nuclei are distrib-uted uniformly along the AP axis ie the prior distribution of nuclearpositions Px(x) is uniform (shown at the bottom) The total distribution ofexpression levels across the embryo Pg(g) is determined by averagingP(g|x) over all positions and is shown on the left The positional informationI(g x) is related to these two distributions as in Equation 8 Inset showsdecoding (ie estimating) the position of a nucleus from a measuredexpression level g Prior to the ldquomeasurementrdquo all positions are equallylikely After observing the value g the positions consistent with thismeasured values are drawn from P(x|g) The best estimate of the trueposition x is at the peak of this distribution and the positional errorsx(x) is the distributionrsquos width Positional information I(g x) can equallybe computed from Px(x) and P(x|g) by Equation 10

44 G Tkacik et al

Brunel and Nadal (1998) demonstrating the link for the caseof one gene g

Let us assume that the Gaussian approximation of Equa-tion 4 holds and that the distribution of the levels of a singlegene at a given position is

PethgjxTHORN frac14 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ps2

gethxTHORNq exp

(212ethg2gethxTHORNTHORN2

s2gethxTHORN

) (15)

We can use the Gaussian distribution to compute the Fisherinformation in Equation 14 We find that

I frac14 g92ethxTHORNs2gethxTHORN

thorn 2s92gethxTHORNs2gethxTHORN

(16)

where (amp)9 denotes a derivative with respect to position xInformation about position is thus carried by the change inmean profile with position as well as the change in the var-iability itself with position If the noise is small sg $ g thentypically it will be the case that

))s9g)) $ jg9j (a potentially

important issue arises when the profiles and their variabilityare estimated from noisy sampled data in this case oneneeds to be careful about spurious contributions to Fisherinformation due to noise-induced gradients of the profilesand the variability) If we formally require

))s9g)) $ jg9j we

can retain only the first term to obtain a bound on positionalerror

s2xethxTHORN$

1IethxTHORN

-dgdx

22s2gethxTHORN (17)

This result is intuitively straightforward it is simply thetransformation of the variability in gene expression s2

gethxTHORN intoan effective variance in the position estimate s2

x ethxTHORN and thetwo are related by slope of the inputoutput relation gethxTHORN

A crucial next step is to think of x as determining geneexpressions gi probabilistically and the x as being a functionof these gene expression levelsmdashthat is when computing xneither we nor the nuclei have access to the true positionThis forms a dependency chain x gi x Since eachof these steps is probabilistic it can only lose informationsuch that by information-processing inequality (Cover andThomas 1991) we must have I(gi x) $ I(x x) The mu-tual information between the true location and its estimateis given by

Iethx xTHORN frac14 Sfrac12PxethxTHORN(2DSfrac12PethxjxTHORN(

E

PxethxTHORN (18)

Under weak assumptions the first term in our case is ap-proximately the entropy of a uniform distribution While wedo not know the full distribution P(x|x) and thus cannotcompute its entropy directly we know its variance which isjust the square of the positional error s2

xethxTHORN Regardless ofwhat the full distribution is its entropy must be less or equal

to the entropy of the Gaussian distribution of the same var-iance which is Sfrac12PethxjxTHORN( frac14 log2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

bits Putting ev-erything together we find that

Iethfgig xTHORN$ I ethx xTHORN$log2

PxethxTHORNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(19)

Therefore positional information I(gi x) puts an upperbound to the average ability of the cells to infer their loca-tions that is to the smallness of the positional error sx(x)In a straightforward generalization of a single-gene case theFisher information for a multivariate Gaussian distributionwritten for compactness in matrix notation yields

I frac14 g9TC21g9thorn 12TrhC21C9C21C9

i (20)

Under the same assumptions we made for the case of a singlegene we retain only the first term so that the expression forpositional error written out explicitly in the componentnotation reads

s2x ethxTHORN$

1IethxTHORN

0

XN

ijfrac141

dgidx

hCethxTHORN21

i

ij

dgjdx

1

A21

(21)

where Cij is the covariance matrix of the profiles as definedin Equation 3 This extends the fundamental connectionEquation 19 between the positional information and posi-tional error to the case of multiple genes Importantly allquantitiesmdashthe mean profiles and their covariancemdashinEquation 21 can be obtained from experimental data sosx(x) is a quantity that can be estimated directly

Interpretation in a spatially discrete (cellular) system Inthe setup presented above gene expression levels carryinformation about a continuous position variable x But ina real biological system there exists a minimal spatial scalemdashthe scale of individual nuclei (or cells)mdashbelow which theconcept of gene expression at a position is no longer welldefined This is particularly the case in systems that interpretmolecular concentrations to make decisions that determinecell fates such interpretations have no meaning at the spa-tial scale of a molecule but only at cellular scales How shouldpositional information be interpreted in such a context

When noise is small enough and the distribution ofpositions consistent with observed gene expression levelsP(x|gi) of Equation 11 is nearly Gaussian and Equations19 and 21 become tight bounds In this case we can applyEquation 10 directly In developmental systems cells ornuclei are often distributed in space such that the intercel-lular (or internuclear) spacings are to a good approxima-tion equal at least in systems we are studying there areno significant local rarefactions or overabundances of cellsor nuclei Mathematically this amounts to assuming thatPx(x) = 1L (where L is the linear spatial extent over whichthe cells or nuclei are distributed) This assumption of

Positional Information and Error 45

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 7: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

Brunel and Nadal (1998) demonstrating the link for the caseof one gene g

Let us assume that the Gaussian approximation of Equa-tion 4 holds and that the distribution of the levels of a singlegene at a given position is

PethgjxTHORN frac14 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ps2

gethxTHORNq exp

(212ethg2gethxTHORNTHORN2

s2gethxTHORN

) (15)

We can use the Gaussian distribution to compute the Fisherinformation in Equation 14 We find that

I frac14 g92ethxTHORNs2gethxTHORN

thorn 2s92gethxTHORNs2gethxTHORN

(16)

where (amp)9 denotes a derivative with respect to position xInformation about position is thus carried by the change inmean profile with position as well as the change in the var-iability itself with position If the noise is small sg $ g thentypically it will be the case that

))s9g)) $ jg9j (a potentially

important issue arises when the profiles and their variabilityare estimated from noisy sampled data in this case oneneeds to be careful about spurious contributions to Fisherinformation due to noise-induced gradients of the profilesand the variability) If we formally require

))s9g)) $ jg9j we

can retain only the first term to obtain a bound on positionalerror

s2xethxTHORN$

1IethxTHORN

-dgdx

22s2gethxTHORN (17)

This result is intuitively straightforward it is simply thetransformation of the variability in gene expression s2

gethxTHORN intoan effective variance in the position estimate s2

x ethxTHORN and thetwo are related by slope of the inputoutput relation gethxTHORN

A crucial next step is to think of x as determining geneexpressions gi probabilistically and the x as being a functionof these gene expression levelsmdashthat is when computing xneither we nor the nuclei have access to the true positionThis forms a dependency chain x gi x Since eachof these steps is probabilistic it can only lose informationsuch that by information-processing inequality (Cover andThomas 1991) we must have I(gi x) $ I(x x) The mu-tual information between the true location and its estimateis given by

Iethx xTHORN frac14 Sfrac12PxethxTHORN(2DSfrac12PethxjxTHORN(

E

PxethxTHORN (18)

Under weak assumptions the first term in our case is ap-proximately the entropy of a uniform distribution While wedo not know the full distribution P(x|x) and thus cannotcompute its entropy directly we know its variance which isjust the square of the positional error s2

xethxTHORN Regardless ofwhat the full distribution is its entropy must be less or equal

to the entropy of the Gaussian distribution of the same var-iance which is Sfrac12PethxjxTHORN( frac14 log2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

bits Putting ev-erything together we find that

Iethfgig xTHORN$ I ethx xTHORN$log2

PxethxTHORNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(19)

Therefore positional information I(gi x) puts an upperbound to the average ability of the cells to infer their loca-tions that is to the smallness of the positional error sx(x)In a straightforward generalization of a single-gene case theFisher information for a multivariate Gaussian distributionwritten for compactness in matrix notation yields

I frac14 g9TC21g9thorn 12TrhC21C9C21C9

i (20)

Under the same assumptions we made for the case of a singlegene we retain only the first term so that the expression forpositional error written out explicitly in the componentnotation reads

s2x ethxTHORN$

1IethxTHORN

0

XN

ijfrac141

dgidx

hCethxTHORN21

i

ij

dgjdx

1

A21

(21)

where Cij is the covariance matrix of the profiles as definedin Equation 3 This extends the fundamental connectionEquation 19 between the positional information and posi-tional error to the case of multiple genes Importantly allquantitiesmdashthe mean profiles and their covariancemdashinEquation 21 can be obtained from experimental data sosx(x) is a quantity that can be estimated directly

Interpretation in a spatially discrete (cellular) system Inthe setup presented above gene expression levels carryinformation about a continuous position variable x But ina real biological system there exists a minimal spatial scalemdashthe scale of individual nuclei (or cells)mdashbelow which theconcept of gene expression at a position is no longer welldefined This is particularly the case in systems that interpretmolecular concentrations to make decisions that determinecell fates such interpretations have no meaning at the spa-tial scale of a molecule but only at cellular scales How shouldpositional information be interpreted in such a context

When noise is small enough and the distribution ofpositions consistent with observed gene expression levelsP(x|gi) of Equation 11 is nearly Gaussian and Equations19 and 21 become tight bounds In this case we can applyEquation 10 directly In developmental systems cells ornuclei are often distributed in space such that the intercel-lular (or internuclear) spacings are to a good approxima-tion equal at least in systems we are studying there areno significant local rarefactions or overabundances of cellsor nuclei Mathematically this amounts to assuming thatPx(x) = 1L (where L is the linear spatial extent over whichthe cells or nuclei are distributed) This assumption of

Positional Information and Error 45

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 8: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

uniformity is not crucial for any calculation in this articleand can be easily relaxed but it makes the equations some-what simpler to display and interpret Using a uniform dis-tribution for Px(x) the information is

Iethfgig xTHORN log2

Lffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pes2

xethxTHORNp

+

x

(22)

Note that in the problem setup we have normalized thespatial axis so that the length L= 1 in the formula above (orif we restrict the attention to some section of the embryothe fractional size of that section) For any real data set oneneeds to verify that the approximations leading to this resultare warranted (see below) Assuming that Equation 22 fur-ther illustrates the connection between positional error andpositional information Consider a one-dimensional row ofnuclei spaced by internuclear distance d along the AP axisas illustrated in Figure 4 To determine its position along theAP axis each nucleus has to generate an estimate x of itstrue position x by reading out gap gene expression levels Inthe simplest case the errors of these estimates are indepen-dent normally distributed with a mean of zero and a vari-ance s2

x Given these parameters there is some probabilityPerror that the positional estimate deviates by more than thelattice spacing d in which case the nucleus would be assignedto the wrong position and perfectly unique nucleus-by-nucleus identifiability would be impossible Figure 4 showshow the positional information of Equation 22 and the proba-bility of fate misassignment Perror vary as functions of sxImportantly even when sx d that is the positional erroris smaller than the internuclear spacing (as in Figure 4A) thereis still some probability of nuclear misidentification and there-fore the positional information has not yet saturated Onlywhen the positional error is sufficiently small that the proba-bility weight in the tails of the Gaussian distribution is negligi-ble (at sx $ d) can each of the Nn nuclei be perfectlyidentified and the information saturates at log2(Nn) bits

In sum we have shown that a rigorous mathematicalframework of positional information can quantify the repro-ducibility of gene expression profiles in a global manner By

framing the cellsrsquo problem of finding their location in theembryo in terms of an estimation problem we have shownthat the same mathematical framework of positional infor-mation places precise constraints on how well the cells caninfer their positions by reading out a set of genes Theseconstraints are universal regardless of how complex themechanistic details of the cellsrsquo readout of the gene concen-tration levels are the expression level variability preventsthe cells from decreasing the positional error below sx Theconcept of positional error easily generalizes to the case ofmultiple genes and is diagnostic about how positional infor-mation is distributed along the AP axis

Estimation methodology

There are a number of technical details that need to befulfilled when applying the formalisms of positional in-formation and positional error to real data sets Estimatinginformation theoretic quantities from finite data is generallychallenging (Paninski 2003 Slonim et al 2005) In practiceit involves combining general theory and algorithms for in-formation estimation with problem-specific approximationsHere we present the information content of the four majorgap genes in early Drosophila embryos as a test case First webriefly recapitulate our experimental and data-processingmethods (for details see Dubuis et al 2013a) Next wepresent the statistical techniques necessary to consistentlymerge data from separate pairwise gap gene immunostain-ing experiments into a single data set Then we estimatemutual information directly from data for one gene usingdifferent profile normalization and alignment methods Fi-nally we introduce the Gaussian noise approximation andan adaptive Monte Carlo integration scheme to extract theinformation carried jointly by pairs of genes and by the fullset of four genes

Extracting expression level profiles from imaging dataDrosophila embryos were fixed and simultaneously immu-nostained for the four major gap genes hunchback (hb)Kruumlppel (Kr) knirps (kni) and giant (gt) using fluorescentantibodies with minimal spectral overlap as described in

Figure 4 Positional information error and perfect identi-fiability (A) Nuclei separated by distance d decode theirposition from gap gene expression levels (top) The prob-ability P(x|x) for the estimated position x is Gaussian(solid black line for the central nucleus and dashed linefor the right next nucleus) its width is the positionalerror sx The probability Perror that the central nucleus isassigned to the wrong lattice position is equal to the in-tegral of the tails |x| d of the distribution P(x|x) (redarea) (B) Positional information (blue) and probability offalse assignment (red) as a function of positional error sxBlue arrow indicates the value of positional error sx =d used for the toy example in A specifically we usedNn = 59 nuclei uniformly tightly packing the central 80of the AP axis (L = 08 and thus d = LNn) In comparison

the 1 positional error inferred from data in Dubuis et al (2013b) corresponds to the green arrow The information is computed using Equation 22and made to saturate at Imax = log2(Nn) bits (perfect identifiability) All parameters are chosen to roughly match the results of the Drosophila gapgene analysis

46 G Tkacik et al

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 9: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

Dubuis et al (2013a) We present an analysis of four data sets(A B C and D) data sets AndashC have been processed simulta-neously but imaged in different sessions data set D has beenprocessed and imaged independently from the other data setsHence these datasets are well suited to assess the dependenceof our measurements and calculations on the experimentalprocessing Dataset A has been presented and analyzed pre-viously in Dubuis et al (2013ab) and Krotov et al (2013)

Fluorescence intensities were measured using automatedlaser scanning confocal microscopy processing hundreds ofembryos in one imaging session Cross-sectional images ofmulticolor-labeled embryos were taken with an imagingfocus at the midsagittal plane the center plane of theembryo with the largest circumference Intensity profiles ofindividual embryos were extracted along the outer edge ofthe embryos using custom software routines (MATLABMathWorks Natick MA) as described in Houchmandzadehet al (2002) and Dubuis et al (2013a) The results in thefollowing sections are presented using exclusively dorsal in-tensity profiles and we report their projection onto the APaxis in units normalized by the total length L of the embryoyielding a fractional coordinate between 0 (anterior) and 1(posterior) The dorsal edge was chosen for its smaller cur-vature which limits geometric distortions of the profileswhen projecting the intensity onto the AP axis The AP axiswas uniformly divided into 1000 bins and the average pro-file intensity in each bin is reported This procedure resultsin a raw data matrix that lists for each embryo and for eachof the four gap genes one intensity value for each of the1000 equally spaced spatial positions along the AP axis Un-less specified differently all our results are reported for themiddle 80 of the AP axis (from x = 01 to x = 09) con-sistent with Dubuis et al (2013b) at the embryo poles geo-metric distortion is large and patterning mechanisms aredistinct from the middle

For any successful information-theoretic analysis the dom-inant source of observed variability in the data set must bedue to the biological system and not due to the measure-ment process requiring tight control over systematic mea-surement errors Gap gene expression levels critically dependon the developmental stage and on the imaging orientationof the embryo Therefore we applied strict selection criteriaon developmental timing and orientation angle (see Dubuiset al 2013a) In this article we restrict our analysis to a timewindow of )10 min 38ndash48 min into nuclear cycle 14 Dur-ing this time interval the mean gap gene expression levelspeak and overall temporal changes are minimal A carefulanalysis of residual variabilities due to measurement error (ie age determination orientation imaging antibody nonspe-cificity spectral crosstalk and focal plane determination)reveals that the estimated fraction of observed variance inthe gap gene profiles due to systematic and experimentalerror is 20 of the total variance in the pool of profilesie 80 of the variance is due to the true biological var-iability (Dubuis et al 2013a) satisfying the condition for theinformation-theoretic analysis to yield true biological insight

In most cases expression profiles need to be normalizedbefore they can be compared or aggregated across embryosCareful experimental design and imaging conditions canmake normalization steps essentially superfluous (Dubuiset al 2013a) but such control may not always be possibleTo deal with experimental artifacts we introduce three pos-sible types of normalizations which we call Y alignment Xalignment and T alignment In Y alignment the recordedintensity of immunostaining for each profile of a given gapgene G(m)(x) recorded from embryo m ethm frac14 1 N THORN isassumed to be linearly related to the (unknown) true con-centration profile g(m)(x) by an additive constant am and anoverall scale factor bm (to account for the background andstaining efficiency variations from embryo to embryo respec-tively) so that G(m)(x) = am+bmg(m)(x) We wish to minimizethe total deviation x2 of the concentration profiles from themean across all embryos The objective function is

x2Y

nambm

o$frac14

XN

mfrac141

Z 1

0dx

GethmTHORNethxTHORN2

ham thorn bmgethxTHORN

i$2

(23)

where gethxTHORN frac14 N21Pmg

ethmTHORNethxTHORN denotes an average concentra-tion profile across all N embryos The parameters am bmare chosen to minimize the x2

Y and thus maximize align-ment and since the cost function is quadratic this optimi-zation has a closed-form solution (Gregor et al 2007aDubuis et al 2013a)

Additionally one can perform the X alignment where allgap gene profiles from a given embryo can be translatedalong the AP axis by the same amount this introduces onemore parameter gi per embryo (not per expression profile)which can be again determined using x2 minimization sim-ilar to the above (In practice one first carries out the trac-table Y alignment followed by a joint minimization of x2 fora b g which needs to be carried out numerically) Thereare two candidate sources of variability that can be compen-sated for by X alignments (i) our error in the exact deter-mination of the AP axis ie in the exact end points of theembryo due to image processing and (ii) real biologicalvariability that would result in all the nuclei within the em-bryo being rigidly displaced by a small amount along thelong axis of the embryo relative to the egg boundary

Finally the T alignment attempts to compensate for thefraction of observed variability that is due to the systematicchange in gene profiles with embryo age even within thechosen 10-min time bracket We can use our knowledge ofhow the mean profiles evolve with time and detrend theentire data set in a given time window by this evolution ofthe mean profile We follow the procedure outlined in Dubuiset al (2013a) to carry out this alignment

In sum each of the three alignments X Y and T subtractsfrom the total variance the components that are likely to havean experimental origin and do not represent properly eitherthe intrinsic noise within an embryo or natural variabilitybetween the embryos since the total variance will be lower

Positional Information and Error 47

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 10: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

after alignment successive alignment procedures should leadto increases in positional information Strictly speaking thelower bound on positional information would be obtained byestimating it using raw data without any alignment thusascribing all variability in the recorded profiles to the truebiological variability in the system but unless the control overexperimental variability is excellent this lower bound mightbe far below the true value For instance if embryos cannotbe stained and imaged in a single session it would bevery hard to guarantee that there are no embryo-to-embryovariabilities in the antibody staining and imaging backgroundFor that reason we view the Y alignment as the minimalprocedure that should be performed unless staining andimaging variability is shown to be negligible in dedicatedcontrol experiments The choice of performing alignmentsbeyond Y depends on system-specific knowledge about theplausible sources of experimental vs biological variability Forthese reasons all the results in this article are based onthe minimal Y alignment procedure (unless explicitly specifiedotherwise) thus yielding conservative information estimatesbut we also explore how these estimates would increasefor single genes and the quadruplet in the case of the otheralignments

Once the desired alignment procedure has been carriedout we can define the mean expression across the embryosgiethxTHORN for every gap gene i = 1 4 and choose the unitsfor the profiles such that the minimum value of each meanprofile across the AP axis is 0 and the maximal value is 1This is the final aligned and normalized set of profiles onwhich we carry out all subsequent analyses and from whichwe compute the covariance matrix Cij(x) of Equation 3

Applying the described selection criteria to the fourmentioned data sets yields embryo counts of N frac14 24 (A)N frac14 32 (B) N frac14 31 (C) and N frac14 102 (D) Figure 5 shows

the mean profiles and their variability for two of the data setsusing either the minimal (Y) or the full (XYT) alignment

Estimating information with limited amounts of dataMeasuring positional information from a finite number N ofembryos is challenging due to estimation biases Good esti-mators are more complicated than the naive approachwhich consists of estimating the relevant distributions bycounting and using Equation 7 for mutual information directlyNevertheless the naive approach can be used as a basis for anunbiased information estimator following the so-called directmethod (Strong et al 1998 Slonim et al 2005)

The easiest way to obtain a naive estimate for P(gi x) isto convert the range of continuous values for gi and x intodiscrete bins of size DN 3 D On this discrete domain we canestimate the distribution ~PDMethfgig xTHORN empirically by creat-ing a normalized histogram To this end we treat our datamatrix of (4 gap genes) 3 (N embryos) 3 (1000 spatialbins) as containing M frac14 10003N samples from the jointdistribution of interest A naive estimate of the positionalinformation IDIRDMethfgig xTHORN is

IDIRDMethfgig xTHORN

frac14P

fgigx~PDMethfgig xTHORNlog2

~PDMethfgig xTHORNePxDMethxTHORN ePgDMethfgigTHORN

(24)

where the subscripts indicate the explicit dependence onsample sizeM and bin size D It is known that naive estimatorssuffer from estimation biases that scale as 1M and DN+1Following Strong et al (1998) and Slonim et al (2005) wecan obtain a direct estimate of the mutual information by firstcomputing a series of naive estimates for a fixed value of D andfor fractions of the whole data set Concretely we pick fractions

Figure 5 Simultaneous measurements of gap gene ex-pression levels and profile alignment (A) The mean pro-files (thick lines) of gap genes Hunchback Giant Knirpsand Kruumlppel (color coded as indicated) and the profilevariability (shaded region = 61 SD envelope) across 24embryos in data set A normalized using the Y alignmentprocedure (B) The same profiles have been aligned usingthe full (XYT) procedure Shown is the region that corre-sponds to the dashed rectangle in A to illustrate the sub-stantially reduced variability across the profiles (C and D)Plots analogous to A and B showing the profiles and theirvariability for data set D

48 G Tkacik et al

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 11: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

m frac14 frac12095thinsp 09thinsp 085thinsp 08thinsp 075thinsp 05(3N of the total number ofembryos N At each fraction we randomly pick m embryos100 times and compute hIDIRDmethfgig xTHORNi (where averages aretaken across 100 random embryo subsets) This gives us a se-ries of data points that can be extrapolated to an infinite datalimit by linearly regressing hIDIRDmethfgig xTHORNi vs 1m (Figure 6A)The intercept of this linear model yields IDIRDmNethfgig xTHORN andwe can repeat this procedure for a set of ever smaller bin sizesD To extrapolate the result to very small bin sizes D 0 weuse the previously computed IDIRDN ethfgig xTHORN for various choicesof decreasing D and extrapolate to D 0 as shown in Figure6B At the end of this procedure we obtain the final estimateIDIRD0nNethfgig xTHORN called direct estimate (DIR) of positionalinformation (red square in Figure 6B) While no prior knowl-edge about the shape of the distribution P(gi x) is assumedby the direct estimation method a potential disadvantage isthe amount of data required which grows exponentially in thenumber of gap genes In practice our current data sets sufficefor the direct estimation of positional information carried si-multaneously by one or at most two gap genes

To extend this method tractably to more than two genesone needs to resort to approximations for P(gi|x) thesimplest of which is the so-called Gaussian approximationshown in Equation 4 In this case we can write down theentropy of P(gi|x) analytically For a single gene we get

S~PDMethgjxTHORN

ampfrac14 1

2log2

2pes2

gethxTHORN$thorn logD (25)

while the straightforward generalization to the case of Ngenes is given by

S~PDMethfgigjxTHORN

ampfrac14 1

2log2

eth2peTHORNN

))CethxTHORN))$thorn NlogD (26)

where |C(x)| is the determinant of the covariance matrixCij(x) From Equation 8 we know that I(g x) = S[P(g)] 2hS[P(g|x)]ix Here the second term (ldquonoise entropyrdquo)is therefore easily computable from Equation 25 for adiscretized version of the distribution ~P using the (co)variance estimate of gene expression levels alone The firstterm (total entropy) can be estimated as above by thedirect method ie by making an empirical estimate of~PDMethgTHORN for various sample sizes M and bin sizes D and ex-trapolating MN and D 0 This combined procedurewhere we evaluate one term in the Gaussian approxima-tion and the other one directly has two important proper-ties First the total entropy is usually much better sampledthan the noise entropy because it is based on the values ofg pooled together over every value of x it can therefore beestimated in a direct (assumption-free) way even when thenoise entropy cannot be Second by making the Gaussianapproximation for the second term we are always overes-timating the noise entropy and thus underestimating thetotal positional information because the Gaussian distribu-tion is the maximum entropy distribution with a given

mean and variance Therefore in a scenario where the firstterm in Equation 25 is estimated directly while the secondterm is computed analytically from the Gaussian ansatz wealways obtain a lower bound on the true positional infor-mation We call this bound (which is tight if the conditionaldistributions really are Gaussian) the first Gaussian approx-imation (FGA)

For three or more gap genes the amount of data can beinsufficient to reliably apply either the direct estimate or FGAand one needs to resort to yet another approximation calledthe second Gaussian approximation (SGA) As in the FGA forthe second Gaussian approximation we also assume thatPethfgigjxTHORN Gethfgig giethxTHORNCijethxTHORNTHORN is Gaussian but we make an-other assumption in that Pg(gi) obtained by integrating overthese Gaussian conditional distributions is a good approxima-tion to the true Pg(gi) The total distribution we use for theestimation is therefore a Gaussian mixture

PgethfgigTHORN frac14Z 1

0dx thinsp PethfgigjxTHORN (27)

For each position the noise entropy in Equation 26 is pro-portional to the logarithm of the determinant of the covariancematrix whose bias scales as 1M if estimated from a limitednumber M of samples We therefore estimate the informa-tion ISGAM ethfgig xTHORN for fractions of the whole data set and thenextrapolate for MN

Figure 7 compares the three methods for computing thepositional information carried by single gap genes in the10ndash90 egg length segment Across all four gap genes and allfour data sets the methods agree within the estimation errorbars This provides implicit evidence that at least on thelevel of single genes the Gaussian approximation holds suf-ficiently well for our estimation methods

Figure 6 Direct estimation for the positional information carried byHunchback (A) Extrapolation in sample size for different bin sizes Dstarting with 10 bins for the bottom line and increasing to 50 bins forthe top line in increments of 2 Black points are averages of naive esti-mates over 100 random choices of m embryos (error bars = SD) plottedagainst 1m on the x-axis Lines and blue circles represent extrapolationsto the infinite data limit m N (B) The extrapolations to the infinitedata limit (blue points from A) are plotted as a function of the bin sizeand the second regression is performed (blue line) to find the extrapola-tion of D 0 The final estimate represented by the red square isIDIR(hb x) = 226 6 004 bits the error bar is the statistical uncertaintyin the extrapolated value due to limited sample size

Positional Information and Error 49

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 12: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

Figure 8 compares the estimation results across data setsand the alignment methods With the exception of data setD data under T alignments the information estimates areconsistent across data sets As expected successive align-ment procedures remove systematic variability and increasethe information by comparable amounts so that the max-imal differences (between the minimal Y alignment and themaximal XYT alignment) are )25 21 21 and 11for kni Kr gt and hb respectively when averaged acrossdata sets

Merging data from different experiments To computepositional information carried by multiple genes from ourdata using the second Gaussian approximation we needto measure N mean profiles giethxTHORN and the N 3 N covari-ance matrix Cij(x) While measuring giethxTHORN is a standardtechnique estimating the covariance matrix Cij(x)requires the simultaneous labeling of all N gap genes ineach embryo using fluorescent probes of different colorsWhile simultaneous immunostainings of two genes arenot unusual it is not easy to scale the method up to moregenes while maintaining a quantitative readout While fordata sets AndashD we succeeded in doing a simultaneous four-way stain in general this is not always feasible so wedeveloped an estimation technique for inferring a consis-tent N 3 N covariance matrix based on multiple collec-tions of pairwise stained embryos

Estimating a joint covariance matrix from pairwisestaining experiments is nontrivial for two reasons Firsteach diagonal element of the covariance matrix ie thevariance of an individual gap gene is measured in multipleexperiments but the obtained values might vary due to mea-surement errors Second true covariance matrices are posi-tive definite ie det(C(x)) 0 a property that is notguaranteed by naively filling in different terms of the matrixby computing them across sets of embryos collected in differ-

ent experiments This is a consequence of small samplingerrors that can strongly influence the determinant of the ma-trix We therefore need a principled way to find a single bestand valid covariance matrix from multiple partial observa-tions a problem that has considerable history in statisticsand finance (see eg Stein 1956 Newey and West 1986)

We start by considering a number N ij of embryos that havebeen costained for the pair of gap genes (i j) Let the full dataset consist of all such pairwise stainings for N gap genes this is

a total of-N2

pairwise experiments where i j = 1 N

and i j Thus in the case of the four major gap genes inDrosophila embryos kni Kr gt and hb the total number ofrecorded embryos isN frac14

PethijTHORNN ij where the sum is across all

six pairwise measurements (1 2) (1 3) (1 4) (2 3) (2 4)(3 4) These pairwise measurements give us estimates of themean profile giethxTHORN and the 2 3 2 covariance matrices for eachpair (i j) CijethxTHORN

For four gap genes and six pairwise experiments we willget six 2 3 2 partial covariance matrices C and 6 3 2estimates of the mean profile g Our task is to find a singleset of four mean profiles giethxTHORN and a single 4 3 4 covariancematrix Cij(x) that fits all pairwise experiments best In thenext paragraphs we show how this can be computed for thearbitrary case of N gap genes

To infer a single set of mean profiles giethxTHORN and a singleconsistent N3 N covariance matrix Cij(x) we use maximum-likelihood inference We assume that at each position x ourdata are generated by a single N-dimensional Gaussian dis-tribution of Equation 4 with unknown mean values and anunknown covariance matrix which we wish to find but wecan observe only two of the mean values and a partial co-variance in each experiment the other variables are inte-grated over in the likelihood

Following this reasoning the log likelihood of the datafor pairwise staining (i j) at position x is

Figure 7 Comparing estimation methods for positionalinformation carried by single gap genes Shown are theestimates for four gap genes (color coded as in Figure 5)using four data sets (data sets A B C and D) and fourdifferent alignment methods (Y YT XY and XYT) for 64total points per plot Dashed line shows equality Differentalignment methods result in a spread of information val-ues for the same gap gene as shown explicitly in Figure 8

Lij frac141N ij

logZ Y

k 6frac14ijdgk thinsp PethfgigjxTHORN

frac14 1N ij

lnYN ij

mfrac1413 thinsp

exp2eth1=2THORN

--Cjj

gethmTHORNi 2gi

$2thorn Cii

gethmTHORNj 2gj

$22 2Cijg

ethmTHORNi gethmTHORNj

0CiiCjj 2C2

ij

$1

2pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCiiCjj 2C2

ij

q (28)

50 G Tkacik et al

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 13: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

where the log-likelihood L as well as the mean profilescovariance elements and the measurements all depend on xindex m enumerates all theN ij embryos recorded in a pairwiseexperiment (i j)

Since all pairwise experiments are independent measure-ments the total likelihood LtotethxTHORN at a given position x is thesum of the individual likelihoods LtotethxTHORN frac14

PethijTHORNLijethxTHORN After

some algebraic manipulation the total log likelihood can bewritten in terms of the empirical estimates for the meanprofiles gi and covariances Cij of the separate pairwiseexperiments

LtotethxTHORN frac14 2P

ethijTHORNlneth2pTHORN thorn ln

CiiethxTHORNCjjethxTHORN2C2

ijethxTHORN$

thorn thinsp CjjethxTHORNCijethxTHORN2 2giethxTHORNgiethxTHORN thorn g2i ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thornthinsp CiiethxTHORNCjjethxTHORN2 2gjethxTHORNgjethxTHORN thorn g2j ethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

thorn thinsp 2CijethxTHORN

3CijethxTHORN2 giethxTHORNgjethxTHORN2 gjethxTHORNgiethxTHORN thorn giethxTHORNgjethxTHORN

CiiethxTHORNCjjethxTHORN2C2ijethxTHORN

For each position x we search for giethxTHORN and Cij(x) that maximizeLtotethxTHORN Before proceeding however we have to guarantee

that the search can take place only in the space of positivesemidefinite matrices Cij(x) (ie det C$ 0) We enforce thisconstraint by spectrally decomposing Cij and parameterizingit in its eigensystem (Pinheiro and Bates 2007) To this endwe write C(x) = PDPT where D is a diagonal matrix param-eterized with the variables a1 aN that determine thediagonal elements in D ie Dii = exp(ai) The orthonormalmatrix P is decomposed as a product of the N(N 2 1)2rotation matrices in N dimensions P frac14

QNethN21THORN=2kfrac141 RkethukTHORN

where Rk(uk) is a Euclidean rotation matrix for angle uk

acting in each pair of dimensions indexed by kIn the spectral representation the likelihood is a function

of N values gi N parameters ai and N(N2 1)2 angles uk ateach x In the case of four gap genes this is a total of 14parameters that need to be computed by maximizing LtotethxTHORNat each location x

There is no guarantee that there is a unique minimumfor the log likelihood moreover there could exist sets ofcovariance matrices that all lead to essentially the samevalue for LtotethxTHORN or even more dangerously the maximum-likelihood solution could favor matrices with vanishingdeterminants especially when estimating from a small num-ber of samples To address these issues we regularize theproblem by replacing D D + lI where I is the identitymatrix and l is the regularization parameter Larger valuesof l will favor more ldquosphericalrdquo distributions while smallvalues will allow distributions that can be very squeezedin some directions We thus maximize Ltotethx lTHORN to find thebest set of parameters given the value of the regularizer(which is assumed to be the same for every x) To set lwe use cross-validation the maximum-likelihood fit is per-formed for various choices of l not over all available data(embryos) but only over a training subset The remainingembryos constitute a test subset Parameter fits for differentl obtained on the training data can be assessed and com-pared by evaluating their likelihood over test data andselecting the value of l that maximizes the model likelihoodover the testing set

To compute giethxTHORN and Cij(x) we initialize giethxTHORN to themean profile across all pairwise experiments (i j) weinitialize ai to the mean of the log of the diagonal termsCii and we initialize all rotation angles uk = 0 TheNelderndashMead simplex method is used to maximize LtotethxTHORN(Lagarias et al 1998) Finally we compute C(x) from ai

and ukFinally we use our quadruple-stained data sets to test our

pairwise merging procedure We partition the 102 embryosfrom data set D into six disjoint subsets of 14 embryos each(while retaining 18 embryos for validation) and in eachsubset we consider only one of the six possible pairs of gapgene data that is in every subset for a pair of genes (i j)we measured the mean expression profiles giethxTHORN gjethxTHORN andthe 2 3 2 covariance matrix Cij(x) the inputs to the maximum-likelihood merging procedure Note that this benchmarkuses real data from data set D by simply ldquoblacking outrdquocertain measured values so that for every embryo only

Figure 8 Consistency across data sets and a comparison of alignmentmethods for positional information carried by single gap genes Directestimates with estimation error bars are shown for four gap genes (colorcoded as in Figure 5) in separate panels On each panel different align-ment methods (Y YT XY and XYT) are arranged on the horizontal axisand for every alignment method we report the estimation results usingdata sets A B C and D (successive plot symbols from left to right) andtheir average (thick horizontal line)

Positional Information and Error 51

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 14: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

one pair of gap genes remains visible to the merging pro-cedure The inferred 4 3 4 covariance matrix can be com-pared to the covariance matrix estimated from the completedata set The results of this procedure are shown in Figure 9demonstrating that pairwise measurements can be mergedinto a consistent covariance matrix whose determinantmatches well with the determinant extracted from the com-plete data set Note that determinants are compared becausethey are very sensitive to the inferred parameters and enterdirectly into the expressions for positional information andpositional error Importantly filling in the covariance matrixnaively or skipping the regularization results in either non-positive-definite matrices or matrices whose determinantsare close to singular at multiple values of position x Thepresented maximum-likelihood merging procedure will al-low our framework to be applied to model systems wheresimultaneous gap gene measurements are hard to obtainbut pairwise stains are feasible

Monte Carlo integration of mutual information An-other technical challenge in applying the second Gaussianapproximation for positional information to three or moregap genes lies in computing the entropy of the distributionof expression levels S[Pg(gi)] This is a Gaussian mixtureobtained by integrating P(gi|x) over all x as prescribedby Equation 27 In one or two dimensions one can evaluatethis integral numerically in a straightforward fashion bypartitioning the integration domain into a grid with finespacing D in each dimension evaluating the conditionaldistribution P(gi|x) on the grid for each x and averagingthe results over x to get Pg(g) from which the entropycan be computed using Equation 9 Unfortunately for threegenes or more this is infeasible because of the curse ofdimensionality for any reasonably fine-grained partition

To address this problem we make use of the fact thatover most of the integration domain Pg(gi) is very smallif the variability over the embryos is small This meansthat most of the probability weight is concentrated inthe small volume around the path traced out in N-dimen-sional space by the mean gene expression trajectory giethxTHORNas x changes from 0 to 1 We designed a method thatpartitions the whole integration domain adaptively intovolume elements such that the total probability weightin every box is approximately the same ensuring fine

partition in regions where the probability weight is con-centrated while simultaneously using only a tractablenumber of partitions

The following algorithm was used to compute the totalentropy S[Pg(gi)]

1 The whole domain for gi is recursively divided intoboxes such that no box contains 1 of the total domainvolume

2 For each box i with volume vi we use Monte Carlo sam-pling to randomly select t = 1 T points gt in the boxand approximate the weight of the box i as pethijxTHORN frac14viT21PT

tfrac141PethgtjxTHORN we explore different choices for T inFigure 11

3 Analogously we evaluate the approximate total weight ofeach box p(i) by pooling Monte Carlo sampled pointsacross all x

4 p(i|x) and p(i) are renormalized to ensureP

ipethijxTHORN frac14 1for every x and

PipethiTHORN frac14 1

5 The conditional and total entropies are computedas Snoise frac14 2 h

PipethijxTHORNlog2pethijxTHORNix and Stot frac14 2

PipethiTHORN

log2pethiTHORN6 The positional information is estimated as I(gi x) =

Stot 2 Snoise7 The box i = argmaxip(i) with the highest probability

weight p(i) is split into two smaller boxes of equal vol-ume v(i)2 and the estimation procedure is repeatedby returning to step 2 Additional Monte Carlo samplingneeds to be done only within the newly split box for theother boxes old samples can be reused

8 The algorithm terminates when the positional informa-tion achieves desired convergence or at a preset numberof box partitions

Figure 10 A and B shows how the algorithm works fora pair of genes (hb Kr) for which the joint probabilitydistribution is easy to visualize To get the final SGA esti-mate of positional information that the two genes jointlycarry about position extrapolation to infinite data size isperformed in Figure 10D

Figure 11 shows the MC estimate of the positional in-formation carried by the quadruplet of gap genes and itsdependence on the parameter T (the number of MC sam-ples per box per position) and the refinement of the adap-tive partition When T is too small (eg T = 100) we

Figure 9 Merging data sets using maximum-likelihoodreconstruction Dataset D (102 embryos) is used to simu-late six pairwise staining experiments with 14 unique em-bryos per experiment (A) The log likelihood of the mergedmodel on 18 test embryos (not used in model fitting) asa function of regularization parameter l l = 1024 is theoptimal choice for this synthetic data set (B) The compar-ison of the determinant of the 4 3 4 inferred covariancematrix (red) as a function of position x compared to thedeterminant of the full data set (black solid line) or thedeterminant of a subset of 14 embryos (black dashedline) where small sample effects are noticeable

52 G Tkacik et al

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 15: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

obtain a biased estimate of the information presumablybecause the evaluation of the distribution over boxes ispoor leading to suboptimal recursive partitioning WhenT is increased to T $ 200 the estimates for different Tstart converging as the number of adaptive boxes is in-creased and the final estimates (after extrapolation to aninfinitely fine partition) for different T agree to withina few percent

Application to the Drosophila gap gene system

Information in single gap genes and gap gene pairsInformation carried by single genes is reported in Table 1The values are estimated using the direct method whichagrees closely with the alternative estimation methods(FGA and SGA cf Figure 7) Measured across four indepen-dent experiments the information values are consistent towithin the estimated error bars indicating that not onlyembryo-to-embryo experimental variability can be broughtunder control as described in Dubuis et al (2013a) but alsoexperiment-to-experiment variability is small

Single genes carry substantially more than 1 bit ofpositional information sometimes even more than 2 bitsthus clearly providing more information about position thanwould be possible with binary domains of expression Howis this information combined across genes We can look atthe (fractional) redundancy of K genes coding for positiontogether

R frac14PK

ifrac141Iethgi xTHORN2 Iethfgig xTHORNIethfgig xTHORN

(29)

by comparing the sum of individual positional informationsto the information encoded jointly by pairs (K = 2) of gapgenes Note that R= 1 for a totally redundant pair R= 0 fora pair that codes for the position independently and R= 21for a totally synergistic pair where individual genes carryzero information about position

At the level of pairs information is always redundantlyencoded R 0 although redundancy is relatively small()20) as shown in Figure 12 Such a low degree of re-dundancy is expected because gap genes are often expressedin complementary regions of the embryo in nontrivial com-binations and will thus mostly convey new informationabout position Unlike in our toy examples in the Introduc-tion real gene expression levels are continuous and noisyand some degree of redundancy might be useful in mitigat-ing the effects of noise as has been shown for other biolog-ical information processing systems (Tkacik et al 2010)Overall with the addition of the second gene positionalinformation increases well above the 05 bits that wouldbe expected theoretically for fully redundant gene profilesThe increase is also larger than the theoretical maximumfor nonredundant (but noninteracting) genes where the in-crease for the second gap gene would be limited to 1 bit(Tkacik et al 2009) The ability to generate nonmonotonicprofiles of gene expression (eg bumps) is therefore crucialfor high information transmissions achieved in the Drosoph-ila gap gene network (Walczak et al 2010)

Information in the gap gene quadruplet How muchinformation about position is carried by the four gap genes

Figure 10 Monte Carlo integration of information fora pair of genes (A) The (log) distribution of gene expres-sion levels Pg(hb Kr) obtained by numerically averagingthe conditional Gaussian distributions for the pair over all x(data set A) For two genes this explicit construction of Pgis tractable and can be used to evaluate the total entropyS[Pg] (B) As an alternative one can use the Monte Carloprocedure (with T = 100) outlined in the text which usesadaptive partitioning of the domain shown here Theboxes here 104 are dense where the distribution containsa lot of weight (C) The comparison between MC evalu-ated positional information carried by the hbKr pair andthe exact numerical calculation as in A over different ran-dom subsets of m = 16 24 embryos with 10 randomsubsets at every m The MC integration underestimatesthe true value by 01 on average with a relative scatterof 43 1024 (D) To debias the estimate of the informationdue to the finite sample size used to estimate the covari-ance matrix a SGA extrapolationm N is performed onthe estimates from C to yield a final estimate of I(hb Krx) = 344 6 002 bits

Positional Information and Error 53

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 16: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

together referred to as the gap gene quadruplet Figure 13shows the estimation of positional information using MCintegration for all four data sets The average positional in-formation across four data sets is I = 43 6 007 bits andthis value is highly consistent across the data sets Notablythe information content in the quadruplet displays a highdegree of redundancy the sum of single-gene positional in-formation values is substantially larger than the informationcarried by the quadruplet with the fractional redundancy ofEquation 29 being R = 084 Possible implications of sucha redundant representation are addressed in the Discussion

To assess the importance of correlations in the expressionprofiles and their contribution to the total information westart with the covariance matrices Cij(x) of data set A whichwe artificially diagonalize by setting the off-diagonal ele-ments to zero at every x This manipulation destroys thecorrelations between the genes making them conditionallyindependent (mechanistically off-diagonal elements in thecovariance matrix could arise as a signature in gene expres-sion noise of the gap gene cross interactions) With thesematrices in hand we performed the information estimationusing Monte Carlo integration analogous to the analysis inFigure 13 Surprisingly we find a minimal increase in in-formation from I = 42 6 003 bits to I = 44 6 002 bitswhen the correlations are removed This indicates that gapgene cross interactions play a minor role in reshaping thegene expression variability at least insofar as that influencesthe total positional information

We also evaluated the importance of the alignmentprocedure for estimating total information Again we usedata set A to recompute the MC information estimates withYT XY and XYT alignments and compare these with the Yalignment procedure used in Figure 13 The positional infor-mation encoded by the quadruplet is I = 436 003 (for YT)

I = 47 6 006 (for XY) and I = 48 6 006 (for XYT)respectively This shows that the temporal (T) alignment doesnot change the information much probably because our strin-gent initial selection cutoff on the depth of the membranefurrow canal has picked out the profiles that are sufficientlylocalized in time around their stable shapes In contrast Xalignment emerges as important generating an extra half bitof positional information This suggests that either our de-termination of the AP axis used to assign a coordinate toevery gene expression has a random embryo-to-embryo error(which is unlikely given the visual inspection of microscopyimages and extracted AP axes) or the gap gene expressionpattern is intrinsically variable in that it shifts rigidly fromembryo to embryo relative to the egg boundary This wouldimply that correlated readout errors that the nuclei mightmake are less harmful than uncorrelated errors If two neigh-boring nuclei are positioned both toward the left or both to-ward the right of the ldquotruerdquo pattern with some positional errorsx this is very different from each of the nuclei randomlyperturbing its position with noise of magnitude sx in the firstcase the rank order of the nuclear identities is preservedwhile in the second case it need not be possibly leading toa detrimental spatial mixing of the cell fates

Decoding information and the resulting positional errorPositional information is a single aggregate or global mea-sure that quantifies the performance of the patterning systembut we can also ask about such performance position byposition using the positional error of Equation 21

We start by looking at a single gene g for which wecompute the positional error (for equally spaced positionsalong the AP axis) using Equation 17 Figure 14 shows thatby reading out single gap genes (hb or Kr) the nuclei canalready achieve positional errors of 15 egg length inspecific regions of the embryo yet fail to do so in otherregions Positional error is smaller in the regions of highprofile slope where the variations in gene expression arereliably translated into variations in position Converselythe error formally diverges at the peaks and troughs of theprofiles where small variations in gene expression cannotefficiently map to changes in position Were the noise muchhigher this conclusion would not holdmdashin that regime onecould distinguish solely between eg a domain of minimaland a domain of maximal expression and the positionalinformation would correspond better to the intuitive pictureof gap genes that define on and off domains

Figure 11 Monte Carlo integration of mutual information for the gapgene quadruplet The estimation uses N frac14 24 data set A embryos with-out correcting for the finite number of embryos (ie the empirical co-variance matrix is taken to be the ground truth for this analysis) Shownare the mean and 1-SD scatter over 10 independent MC estimations foreach choice of T (number of samples per box) The information can thenbe linearly regressed against the inverse number of boxes for last 2000partitions and extrapolated to an infinitely fine partition for each T theseextrapolations are shown at right (crosses)

Table 1 Information (in bits) carried by single genes

Data set kni Kr gt hb

A 182 6 006 194 6 007 181 6 006 226 6 004B 181 6 004 193 6 006 179 6 005 229 6 005C 188 6 007 204 6 005 184 6 005 219 6 005D 179 6 004 194 6 004 191 6 003 221 6 003Mean 183 6 004 196 6 005 184 6 005 224 6 005

For each data set we report the direct estimate of positional information Error barsare the estimation errors The bottom row contains the (mean 6 SD) over data sets

54 G Tkacik et al

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 17: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

An important advantage of using positional error is that itcan be naturally generalized to quantify the precision of localposition readout using more than one morphogen gradientIn Figure 14 E and F we analyze the positional error giventhe joint readout of the hb Kr pair The results show thatthe optimal positional decoding performed with several genes(eg N= 2) at a given x does not correspond to the positionalerror carried by the most informative gene at that positionthe combined error can be smaller than the individual errorsdue to the noise averaging by the N readouts as well as dueto the correlation structure in the variability of the N profiles

Finally using the gap gene quadruplet simultaneouslythe positional error can reach an average value of ) 1while never exceeding a few percent as shown for all fourdata sets in Figure 15 This shows that the gap genes estab-lish a convenient ldquochemical coordinate systemrdquo in which itis in principle possible to position any feature along theentire length of the AP axis with roughly 1 precision theuniform coverage of the AP axis is a sign of efficient encod-ing of positional information (Dubuis et al 2013b) Notethat 1 positional error still does not correspond to perfectnuclear identifiability as shown in Figure 4 the error prob-ability of switching the identities of two nearest neighbornuclei remains )16 (and the positional information is

)15 bits below that needed for perfect identifiability) Thisprecision might be sufficient for development as suggestedby the matching precision of downstream markers (Dubuiset al 2013b) Alternatively the positional errors of neigh-boring nuclei are correlated in which case the gap genescould have sufficient information to uniquely determine theorder of nuclear identities (but not their absolute position)despite 1 positional error In either case the consequencesof gene expression variability are truly small and we expectthat the approximation to the positional information usingpositional error given by Equation 22 could hold

To systematically check whether the gene quadrupletsystem really is within the small noise limit where expres-sions of Equations 17 21 and 22 should hold we performthe analysis summarized in Figure 16 We systematicallyscale the measured gene variability in data set A up or downby a factor q to generate synthetic data sets and compare thepositional information computed directly using MC integra-tion on these synthetic data with the approximation of Equa-tion 22 computed using the positional error Across therange of q (extending from

ffiffiffiffiffiffiffi04

p 063 times the observed

noise toffiffiffi5

p 224 times the observed noise) the difference

between both estimates of positional information is3 Forsmall q the information for the synthetic data (which isGaussian by construction) should be equal to the informationderived from positional error computed using the full expres-sion for the Fisher information given by Equation 20 In factFigure 16 shows that simple approximations to the Fisherinformation leading to Equations 17 and 21 are sufficientfor a very good match Even when the noise is increased forq 1 the approximation remains surprisingly good

The agreement between the Monte Carlo estimate ofpositional information and its approximation based onpositional error observed in a controlled setting of Figure 16is reflected in the analogous comparison on the real datahighlighted in Figure 13 Here the Monte Carlo estimate isby )01 bit larger than the estimate from positional errorcorresponding to a relative difference of )25 and formallystill within the error bars of both estimates Both analysessuggest that the gap gene quadruplet truly is in the small

Figure 13 Positional information car-ried by the quadruplet kni Kr gt hbof gap genes The four panels corre-spond to the four data sets Shown isthe extrapolation to the infinite datalimit (M N) for SGA estimates usingMonte Carlo integration (black plot sym-bols and dark red extrapolated value inbits) as well as for estimation of posi-tional information from positional error(gray plot symbols and light red extrap-

olated value in bits) For MC estimation 25 subsets of embryos were analyzed per subset size For estimation from positional error 100 estimates at eachsubsample size (data fractions f = 05 055 085 09) were performed In the MC estimation where MC sampling contributes to the error on theextrapolated information value the error on the final estimate is taken to be the SD of the estimates over 25 subsets at the highest data fraction(smallest 1M) For estimation from positional error that does not use a stochastic estimation procedure the error estimate on the final extrapolation isthe variance due to small number of samples estimated as 2212 times the SD over 100 estimates at half the data fraction

Figure 12 Positional information carried by pairs of gap genes and theirredundancy For each gene pair (horizontal axis) the first bar representsthe sum of individual positional informations (color coded as in Figure 5)using the direct estimate The second (black) bar represents the positionalinformation carried jointly by the gene pair estimated using SGA withdirect numerical evaluation of the integrals Fractional redundancy R(Equation 29) is reported above the bar

Positional Information and Error 55

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 18: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

noise regime (note that this might not be true for single genesor gene pairs) and that tractable approximations to positionalinformation are therefore available

Discussion

To generate a differentiated body plan during the develop-ment of a multicellular organism cells with identical geneticmaterial need to reproducibly acquire distinct cell fatesdepending on their position in the embryo The mechanismsof establishment and acquisition of such positional informa-tion have been widely studied but the concept of positional

information itself has surprisingly eluded formal definitionHere we have provided a mathematical framework forpositional information and positional error based on infor-mation theory These are principled measures for quan-tifying how much knowledge cells can gain about theirabsolute location in the embryomdashand thus how preciselythey can commit to correct cell fatesmdashby locally readingout noisy gene expression profiles of (possibly multiple)morphogen gradients From this broad perspective ourframework is a mathematical realization of the classic ideasput forth by Wolpert almost 50 years ago (Wolpert 1969)

An information-theoretic formulation of positional in-formation has a number of very attractive features Firstand foremost it is mechanism independent Many plau-sible mechanisms exist for the establishment of spatial geneexpression patterns involving the processes of gene regula-tion signaling controlled degradation etc In particularthese mechanisms could involve spatial coupling eg bydiffusion (Little et al 2013) The spatial aspect of the prob-lem seemingly suggests that our definition of positional in-formation that considers gene expression locallymdashseparatelyat each spatial location xmdashis insufficient or incomplete How-ever irrespective of the mechanisms involved what ulti-mately matters for positional information is the final patternat the local scale where nuclei make decisions specificallywhat matters are the mean profile shapes and their (co)var-iability This thinking forces us to make a clear distinctionbetween the mechanistic processes that generate expres-sion profiles and the positional information that these pro-files carry

Our definition of positional information is also free ofassumptions of what specific geometric features of theexpression profilesmdashboundaries domains slopes etcmdashldquocarryrdquo or encode the information Much prior work has fo-cused on the sharpness of an expression boundary as a proxyfor positional information (Meinhardt 1983 Crauk andDostatni 2005 Dahmann et al 2011 Lopes et al 2012Zhang et al 2012) It is unclear however how to generalizethis approach to systems with more than one expressionboundary and even more profoundly we show that theboundary sharpness is not the feature that actually maximizespositional information In contrast the information-theoreticframework makes it clear in what particular way profile shapesand their (co)variabilities need to be combined into an appro-priate measure of positional information This measure and theassociated positional error naturally generalize to systems withan arbitrary number of gene gradients

Furthermore positional information inherits all theattractive features of mutual information The numericalvalue of mutual information can be interpreted in terms ofthe number of distinguishable states or in the developmen-tal context as the number of distinguishable positions andcell fates Information is reparameterization invariant itsvalue does not depend on what units or what scale eg logvs linear the expression levels are measured To estimatethe information carefully worked out estimation procedures

Figure 14 Positional error for a single gene and a pair of genes (A) Themean hb profile and standard deviation across N frac14 24 embryos of dataset A as a function of fractional egg length The inset shows a close-upwith the positional error sx determined geometrically from the mean gethxTHORNand the variability sg of the profiles (B) Positional error for hb computedusing Equation 17 (dashed line corresponds to sx = 001) Error bars areobtained by bootstrapping 10 times over N =2 embryo subsets (C and D)Plots analogous to A and B for Kr (E) Three-dimensional representation ofhb and Kr profiles from A and C as a function of position x The mean andstandard deviations of the gene expression levels are shown in thehb Kr plane in a gray curve with error bars cf the joint distribution ofhb and Kr expression levels in Figure 11A The black curve that extendsthrough the cube volume shows the average expression ldquotrajectoryrdquo asa function of position x (F) Positional error computed using Equation 21for the hb Kr pair

56 G Tkacik et al

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 19: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

exist and can be adapted to the developmental context Fi-nally as experimental variability can only decrease the in-formation the computed values will always be conservativelower-bound estimates of the information that approachthe true value from below as experimental techniquesadvance

Methodologically we have provided enough detail to makethis framework applicable to different systems and experi-mental setups In particular we have shown how data can bealigned and normalized if necessary how different partialstainings can be merged into a consistent data set and howanalysis methods including inference from data numericalcomputation and information estimation should be performedon real experimental data Importantly we have proposed howinformation can be estimated in the small noise regime (whichis likely applicable in different systems) and how the consis-tency of these estimates can be validated

An important shortcoming of the proposed analysis frame-work is the assumption that positional information can be readout from steady-state expression patterns Expression patternsof the gap gene system are highly dynamic even within a givennuclear cycle While they appear most stable in the particulartime class during nuclear cycle 14 that we chose for ouranalysis whether that constitutes a true steady state has beendebated (Bergmann et al 2007 Siggia 2008) There existother systems eg the segmentation clock in vertebrate somiteformation (Pourquie 2001) which are intrinsically dynamicWhile it is possible that positional information in all of thesesystems is encoded in the expression levels in particular timewindows it could (at least in principle) also be encoded in fullgene expression trajectories ie the temporal sequences ofgene expression levels Extending the information-theoreticapproach presented here to include such a temporal aspect willbe an important direction of future research

A significant drawback of the current experiments is theirrestriction to extracting expression profiles from below thedorsal embryo surface The resulting analysesmdashincludingthis onemdashtherefore confound the expression levels of neigh-boring nuclei with the levels in the interstitial cytoplasmicspace while the only expression levels that are meaningfulfor regulating downstream genes are the nuclear ones Ide-ally a data set would contain expression levels on a nucleus-by-nucleus basis (Myasnikova et al 2001 2009 Surkova

et al 2008) possibly encompassing all nuclei in three-dimensional (3D) reconstructed embryos (Fowlkes et al2008) While the analysis methods presented here can beeasily extended and applied to data sets containing discretenuclear expression levels collecting the large data setsneeded to estimate profile covariances is a challenge forcurrent nuclear and 3D experimental setups

The application of our methods to the Drosophila gap genesystem has generated several results beyond those reported inDubuis et al (2013b) which we now highlight First theresults are consistent across four different data sets withfractional deviations in information estimates of 5 Thisis comparable to the estimation errors on single data setsindicating the extremely high biological reproducibility as

Figure 15 Positional error for the gene quadruplet Posi-tional error has been estimated using Equation 21 andextrapolated to large sample size Error bars are bootstraperror estimates Different colors denote different data sets(inset)

Figure 16 The validity of the small noise approximation Synthetic datawere generated from data set A by keeping the mean profiles as inferredfrom the data forcing the covariance matrix to be diagonal (by zeroingout the off-diagonal terms) and multiplying the variances by a tunablefactor q (see inset) q = 1 corresponds to measured variances in the dataand q 1 (q 1) corresponds to decreasing (increasing) amount ofvariability For each q positional information was estimated using Equa-tion 21 (vertical axis) or using Monte Carlo integration (horizontal axis)Error bars are SD across 10 independent Monte Carlo integrations forevery q

Positional Information and Error 57

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 20: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

well as very stringent control over the experiment and dataanalysis Second the analysis of positional error explicitlyshows that information is encoded in the parts of the expres-sion profile that have high slopes (spatial derivatives) furtherweakening the interpretation of gap genes as providing sharpboundaries between expression domains Third we find thatat the level of the gene quadruplet the information is repre-sented very redundantly This opens up the possibility wheresuch redundancy could be used for robustness to differentexternal perturbations or mutations by allowing a measureof ldquoerror correctionrdquo in the downstream layer (Gierer 1991)Finally the difference between information estimates withand without X alignment implies that a noticeable fractionof biological variability across the embryos consists of rigidshifts of the full gap gene expression pattern along the APaxis This suggests that the biological system cares less aboutthe absolute position of each nucleus in the embryo and moreabout their relative positions (ie order along the AP axis)This is a topic of our future research

Literature Cited

Akam M 1987 The molecular basis for metameric pattern in theDrosophila embryo Development 101 1ndash22

Anderson K V 1998 Pinning down positional information dorsal-ventral polarity in the Drosophila embryo Cell 95 439ndash442

Ashe H L and J Briscoe 2006 The interpretation of morphogengradients Development 133 385ndash394

Bergmann S O Sandler H Sbarro S Shnider E Schejter et al2007 Pre-steady-state decoding of the bicoid morphogen gra-dient PLoS Biol 5 e46

Boumlkel C and M Brand 2013 Generation and interpretation ofFGF morphogen gradients in vertebrates Curr Opin GenetDev 23 415ndash422

Bollenbach T P Pantazis A Kicheva C Byumlokel M Gonzalez-Gaitan et al 2008 Precision of the Dpp gradient Development135 1137ndash1146

Brunel N and J P Nadal 1998 Mutual information Fisher infor-mation and population coding Neural Comput 10 1731ndash1757

Cover T M and J A Thomas 1991 Elements of InformationTheory John Wiley amp Sons New York

Crauk O and N Dostatni 2005 Bicoid determines sharp andprecise target gene expression in the Drosophila embryo CurrBiol 15 1888ndash1898

Dahmann C A C Oates and M Brand 2011 Boundary forma-tion and maintenance in tissue development Nat Rev Genet12 43ndash55

Driever W and C Nuumlsslein-Volhard 1988a A gradient of Bicoidprotein in Drosophila embryos Cell 54 83ndash93

Driever W and C Nuumlsslein-Volhard 1988b The Bicoid proteindetermines position in the Drosophila embryo in a concentration-dependent manner Cell 54 95ndash104

Dubuis J O R Samanta and T Gregor 2013a Accurate mea-surements of dynamics and reproducibility in small genetic net-works Mol Syst Biol 9 639

Dubuis J O G Tkacik E F Wieschaus T Gregor and W Bialek2013b Positional information in bits Proc Natl Acad SciUSA 110 16301ndash16308

Fakhouri W D A Ay R Sayal J Dresch E Dayringer et al2010 Deciphering a transcriptional regulatory code modelingshort-range repression in the Drosophila embryo Mol SystBiol 6 341

Ferrandon D L Elphick C Nuumlsslein-Volhard and D St Johnston1994 Staufen protein associates with the 39UTR of bicoidmRNA to form particles that move in a microtuble-dependentmanner Cell 79 1221ndash1232

Fowlkes C C C L L Hendriks S V E Keraumlnen G H Weber ORuumlbel et al 2008 A quantitative spatiotemporal atlas of geneexpression in the Drosophila blastoderm Cell 133 364ndash374

French V P J Bryant and S V Bryant 1976 Pattern regulationin epimorphic fields Science 193 969ndash981

Gergen J P D Coulter and E F Wieschaus 1986 Segmentalpattern and blastoderm cell identities pp 195ndash220 in Gameto-genesis and The Early Embryo edited by J G Gall Alan R LissNew York

Gierer A 1991 Regulation and reproducibility of morphogenesisSemin Dev Biol 2 83ndash93

Gregor T D W Tank E F Wieschaus andW Bialek 2007a Probingthe limits to positional information Cell 130 153ndash164

Gregor T E F Wieschaus A P McGregor W Bialek and D WTank 2007b Stability and nuclear dynamics of the bicoid mor-phogen gradient Cell 130 141ndash152

Grossniklaus U K M Cadigan and W J Gehring 1994 Threematernal coordinate systems cooperate in the patterning of theDrosophila head Development 120 3155ndash3171

Houchmandzadeh B E Wieschaus and S Leibler 2002 Es-tablishment of developmental precision and proportions in theearly Drosophila embryo Nature 415 798ndash802

Ingham P W 1988 The molecular genetics of embryonic patternformation in Drosophila Nature 335 25ndash34

Jaeger J 2011 The gap gene network Cell Mol Life Sci 68243ndash274

Jaeger J and J Reinitz 2006 On the dynamic nature of posi-tional information BioEssays 28 1102ndash1111

Kirschner M and J Gerhart 1997 Cells Embryos and EvolutionBlackwell Science Malden MA

Krotov D J O Dubuis T Gregor and W Bialek 2013 Mor-phogenesis at criticality Proc Natl Acad Sci USA 111 6301ndash6308

Lagarias J C J A Reeds M H Wright and P E Wright1998 Convergence properties of the Nelder-Mead simplexmethod in low dimensions SIAM J Optim 9 112ndash147

Lawrence P A 1992 The Making of a Fly The Genetics of AnimalDesign Blackwell Scientific Oxford

Lawrence P A and P Johnston 1989 Pattern formation in theDrosophila embryo allocation of cells to parasegments by even-skipped and fushi tarazu Development 105 761ndash767

Little S C G Tkacik T B Kneeland E F Wieschaus andT Gregor 2011 The formation of the bicoid morphogen gradi-ent requires protein movement from anteriorly localized mRNAPLoS Biol 9 e1000596

Little S C M Tikhonov and T Gregor 2013 Precise develop-mental gene expression arises from globally stochastic transcrip-tional activity Cell 154 789ndash800

Liu F A H Morrison and T Gregor 2013 Dynamic interpreta-tion of maternal inputs by the Drosophila segmentation genenetwork Proc Natl Acad Sci USA 110 6724ndash6729

Lopes F J A V Spirov and P M Bisch 2012 The role of Bicoidcooperative binding in the patterning of sharp borders in Dro-sophila melanogaster Dev Biol 370 165ndash172

Meinhardt H 1983 A boundary model for pattern formation invertebrate limbs J Embryol Exp Morphol 76 115ndash137

Meinhardt H 1988 Models for maternally supplied positionalinformation and the activation of segmentation genes in Dro-sophila embryogenesis Development 104 95ndash110

Myasnikova E A Samsonova K Kozlov M Samsonova and J Reinitz2001 Registration of the expression patterns of Drosophila segmen-tation genes by two independent methods Bioinformatics 17 3ndash12

58 G Tkacik et al

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59

Page 21: Positional Information, Positional Error, and …pub.ist.ac.at/~gtkacik/Genetics_PosInfoLong.pdfINVESTIGATION Positional Information, Positional Error, and Readout Precision in Morphogenesis:

Myasnikova E S Surkova L Panok M Samsonova and J Reinitz2009 Estimation of errors introduced by confocal imaging intothe data on segmentation gene expression in Drosophila Bio-informatics 25 346ndash352

Newey W K and K D West 1986 A simple positive semi-definite heteroskedasticity and autocorrelation consistent co-variance matrix NBER Technical Working Paper N 55 NationalBureau of Economic Research Cambridge MA

Nuumlsslein-Volhard C 1991 Determination of the embryonic axesof Drosophila Development 1(Suppl) 1ndash10

Nuumlsslein-Volhard C and E F Wieschaus 1980 Mutations affectingsegment number and polarity in Drosophila Nature 287 795ndash801

Okabe-Oho Y H Murakami S Oho and M Sasai 2009 Stableprecise and reproducible patterning of bicoid and hunchbackmolecules in the early Drosophila embryo PLoS Comput Biol5 e1000486

Paninski L 2003 Estimation of entropy and mutual informationNeural Comput 15 1191ndash1253

Papatsenko D 2009 Stripe formation in the early fly embryoprinciples models and networks BioEssays 31 1172ndash1180

Pinheiro J C and D M Bates 2007 Unconstrained parametriza-tions for variance-covariance matrices Stat Comput 6 289ndash296

Pourquie O 2001 The vertebrate segmentation clock J Anat199 169ndash175

Reinitz J E Mjolsness and D H Sharp 1995 Model for coop-erative control of positional information in Drosophila by bicoidand maternal hunchback J Exp Zool 271 47ndash56

Schier A F and W S Talbot 2005 Molecular genetics of axisformation in zebrafish Annu Rev Genet 39 561ndash613

Shannon C E 1948 A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656

Siggia E D 2008 Developmental regulatory bits Mol Syst Biol 4226

Slonim N G S Atwal G Tkacik and W Bialek 2005 Estimatingmutual information and multindashinformation in large networksarXivorg csIT0502017

Spradling A 1993 The Development of Drosophila melanogasteredited by M Arias Cold Spring Harbor Laboratory Press ColdSpring Harbor NY

Stein C 1956 Some problems in multivariate analysis part ITechnical Report 6 Department of Statistics Stanford Univer-sity Stanford CA

St Johnston D and C Nuumlsslein-Volhard 1992 The origin ofpattern and polarity in the Drosophila embryo Cell 68 201ndash219

Strong S P R Koberle R R de Ruyter van Steveninck andW Bialek 1998 Entropy and information in neural spiketrains Phys Rev Lett 80 197ndash200

Struhl G K Struhl and P M Macdonald 1989 The gradientmorphogen Bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

Surkova S D Kosman and K Kozlov Manu E Myasnikova et al2008 Characterization of the Drosophila segment determina-tion morphome Dev Biol 313 844ndash862

Tickle C D Summerbell and L Wolpert 1975 Positional signal-ing and specification of digits in chick limb morphogenesis Na-ture 254 199ndash202

Tkacik G C G Callan Jr and W Bialek 2008 Information flowand optimization in transcriptional regulation Proc Natl AcadSci USA 105 12265ndash12270

Tkacik G A M Walczak and W Bialek 2009 Optimizing in-formation flow in small genetic networks Phys Rev E StatNonlin Soft Matter Phys 80 031920

Tkacik G J S Prentice V Balasubramanian and E Schneidman2010 Optimal population coding by noisy spiking neuronsProc Natl Acad Sci USA 107 14419ndash14424

Tomancak P B Berman A Beaton R Weiszmann E Kwan et al2007 Global analysis of patterns of gene expression duringDrosophila embryogenesis Genome Biol 8 R145

Tsimring L S 2014 Noise in biology Rep Prog Phys 77026601

Turing A M 1952 The chemical basis of morphogenesis PhilosTrans R Soc Lond B Biol Sci 237 37ndash72

van Kampen N 2011 Stochastic Processes in Physics and Chemis-try Ed 3 North Holland Amsterdam

von Dassow G E Meir E M Munro and G M Odell 2000 Thesegment polarity network is a robust developmental moduleNature 406 188ndash192

Walczak A M G Tkacik and W Bialek 2010 Optimizing in-formation flow in small genetic networks II Feed-forward in-teractions Phys Rev E Stat Nonlin Soft Matter Phys 81041905

Witchley J N M Mayer D E Wagner J H Owen and P WReddien 2013 Muscle cells provide instructions for planarianregeneration Cell Rep 4 633ndash641

Wolpert L 1969 Positional information and the spatial patternof cellular differentiation J Theor Biol 25 1ndash47

Wolpert L 2011 Positional information and patterning revisitedJ Theor Biol 269 359ndash365

Zhang L K Radtke L Zheng A Q Cai T F Schilling et al2012 Noise drives sharpening of gene expression bound-aries in the zebrafish hindbrain Mol Syst Biol 8 613

Communicating editor G D Stormo

Positional Information and Error 59