shaped 3d singular spectrum analysis for quantifying gene expression, with application ... ·...

18
Research Article Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Zebrafish Embryo Alex Shlemov, 1 Nina Golyandina, 1 David Holloway, 2 and Alexander Spirov 3,4 1 Faculty of Mathematics and Mechanics, St. Petersburg State University, Universitetsky Pr. 28, St. Peterhof, St. Petersburg 198504, Russia 2 Mathematics Department, British Columbia Institute of Technology, 3700 Willingdon Avenue, Burnaby, BC, Canada V5G 3H2 3 Computer Science and CEWIT, SUNY Stony Brook, 1500 Stony Brook Road, Stony Brook, NY 11794, USA 4 e Sechenov Institute of Evolutionary Physiology & Biochemistry, Torez Pr. 44, St. Petersburg 194223, Russia Correspondence should be addressed to Alexander Spirov; [email protected] Received 8 February 2015; Accepted 1 May 2015 Academic Editor: Shigehiko Kanaya Copyright © 2015 Alex Shlemov et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Recent progress in microscopy technologies, biological markers, and automated processing methods is making possible the development of gene expression atlases at cellular-level resolution over whole embryos. Raw data on gene expression is usually very noisy. is noise comes from both experimental (technical/methodological) and true biological sources (from stochastic biochemical processes). In addition, the cells or nuclei being imaged are irregularly arranged in 3D space. is makes the processing, extraction, and study of expression signals and intrinsic biological noise a serious challenge for 3D data, requiring new computational approaches. Here, we present a new approach for studying gene expression in nuclei located in a thick layer around a spherical surface. e method includes depth equalization on the sphere, flattening, interpolation to a regular grid, pattern extraction by Shaped 3D singular spectrum analysis (SSA), and interpolation back to original nuclear positions. e approach is demonstrated on several examples of gene expression in the zebrafish egg (a model system in vertebrate development). e method is tested on several different data geometries (e.g., nuclear positions) and different forms of gene expression patterns. Fully 3D datasets for developmental gene expression are becoming increasingly available; we discuss the prospects of applying 3D-SSA to data processing and analysis in this growing field. 1. Introduction Recent advances in microscopy technologies, biological markers, and automated processing methods are enabling sustained progress towards the long-standing goal of recon- structing embryogenesis by integrating cellular behavior and molecular dynamics [14]. New technology, including photonic microscopy approaches (reviewed in [5, 6]) and new biological markers (fluorescent proteins, photoactivatable compounds, and fluorescent nanoparticles such as quantum dots) [7], is producing quantitative high resolution data (spatial and temporal) at all levels of organization [812]. Reconstruction of a developmental atlas can be considered in stages, proceeding from an automated reconstruction of a cell lineage tree in space and time, annotated with quantitative information for cell shape, and adding on the spatiotemporal dynamics of gene expression in the cells [2, 4, 5]. ere are some impressive examples of progress in this new research area on several model experimental animals, where the details of embryo development have been tracked from the fertilized egg to the several thousand cell embryonic stages, with cellular level resolution quantitative data (in which each cell in the embryo is described by its 3D spatial coordinates, its lineage, and the expression level of the key genes). Such large-scale research projects require not only new high-throughput experimental approaches, but also new quantitative mathematical and computational approaches for the processing, analysis, and modeling of such extensive datasets [10, 1217]. Hindawi Publishing Corporation BioMed Research International Volume 2015, Article ID 986436, 18 pages http://dx.doi.org/10.1155/2015/986436

Upload: others

Post on 23-Mar-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

Research ArticleShaped 3D Singular Spectrum Analysis for Quantifying GeneExpression with Application to the Early Zebrafish Embryo

Alex Shlemov1 Nina Golyandina1 David Holloway2 and Alexander Spirov34

1Faculty of Mathematics and Mechanics St Petersburg State University Universitetsky Pr 28 St PeterhofSt Petersburg 198504 Russia2Mathematics Department British Columbia Institute of Technology 3700 Willingdon Avenue Burnaby BC Canada V5G 3H23Computer Science and CEWIT SUNY Stony Brook 1500 Stony Brook Road Stony Brook NY 11794 USA4The Sechenov Institute of Evolutionary Physiology amp Biochemistry Torez Pr 44 St Petersburg 194223 Russia

Correspondence should be addressed to Alexander Spirov alexanderspirovgmailcom

Received 8 February 2015 Accepted 1 May 2015

Academic Editor Shigehiko Kanaya

Copyright copy 2015 Alex Shlemov et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Recent progress in microscopy technologies biological markers and automated processing methods is making possible thedevelopment of gene expression atlases at cellular-level resolution over whole embryos Raw data on gene expression is usuallyvery noisy This noise comes from both experimental (technicalmethodological) and true biological sources (from stochasticbiochemical processes) In addition the cells or nuclei being imaged are irregularly arranged in 3D space This makes theprocessing extraction and study of expression signals and intrinsic biological noise a serious challenge for 3D data requiringnew computational approaches Here we present a new approach for studying gene expression in nuclei located in a thick layeraround a spherical surfaceThemethod includes depth equalization on the sphere flattening interpolation to a regular grid patternextraction by Shaped 3D singular spectrum analysis (SSA) and interpolation back to original nuclear positions The approach isdemonstrated on several examples of gene expression in the zebrafish egg (a model system in vertebrate development)Themethodis tested on several different data geometries (eg nuclear positions) and different forms of gene expression patterns Fully 3Ddatasets for developmental gene expression are becoming increasingly available we discuss the prospects of applying 3D-SSA todata processing and analysis in this growing field

1 Introduction

Recent advances in microscopy technologies biologicalmarkers and automated processing methods are enablingsustained progress towards the long-standing goal of recon-structing embryogenesis by integrating cellular behaviorand molecular dynamics [1ndash4] New technology includingphotonicmicroscopy approaches (reviewed in [5 6]) andnewbiological markers (fluorescent proteins photoactivatablecompounds and fluorescent nanoparticles such as quantumdots) [7] is producing quantitative high resolution data(spatial and temporal) at all levels of organization [8ndash12]Reconstruction of a developmental atlas can be considered instages proceeding from an automated reconstruction of a celllineage tree in space and time annotated with quantitative

information for cell shape and adding on the spatiotemporaldynamics of gene expression in the cells [2 4 5]

There are some impressive examples of progress in thisnew research area on several model experimental animalswhere the details of embryo development have been trackedfrom the fertilized egg to the several thousand cell embryonicstages with cellular level resolution quantitative data (inwhich each cell in the embryo is described by its 3D spatialcoordinates its lineage and the expression level of the keygenes)

Such large-scale research projects require not only newhigh-throughput experimental approaches but also newquantitative mathematical and computational approaches forthe processing analysis and modeling of such extensivedatasets [10 12ndash17]

Hindawi Publishing CorporationBioMed Research InternationalVolume 2015 Article ID 986436 18 pageshttpdxdoiorg1011552015986436

2 BioMed Research International

For reconstructing the spatiotemporal dynamics of geneexpression the raw data sets are stacks of confocal micro-scope scans of early embryos (fixed or live) Data is usuallythe intensity of fluorescent markers for either the mRNA orproteins encoded by the genes of interest Extracting thisdata requires image segmentation to identify the signal foreach cell (or nucleus)This produces text-files with the spatialcoordinates gene expression levels and lineage history ofeach cell

A major goal of this processing is to collect reliablequantitative data for the fitting and verification of moderncomputer dynamic and stochastic models of developmentalgene regulation at single cell resolution

Examples of publicly available high resolution quantita-tive datasets for several experimental animals include theroundworm ldquoC elegansrdquo (Web resource ldquoEPICrdquo [18]) the fruitfly ldquoD melanogasterrdquo (ldquoBID BDTNPrdquo [2 19]) and the zebra-fish ldquoD reriordquo (ldquoBioEmergencesrdquo [4 20]) Such data can beseen as a combination of biological signal biological noiseand experimental noise Use of the fluorescent data requiresa clear separation of these components For example deter-ministic gene regulatory models should be corroboratedsolely against the biological trend component stochasticmodels will also include biological noise components

The data in cellular resolution 3D gene expression atlasestypically has very high noise with contributions from aspectssuch as the intrinsic gene expression noise observed inprokaryotes and eukaryotes [21ndash27] and the disorder incellularnuclear positions New quantitative approaches areneeded to separate the raw expression data into signal andnoise components [28] While some animalsrsquo embryos havesimpler geometries being relatively flat with spherical orellipsoidal cell layers (like the early Drosophila fly embryo)many types of embryos have inherently spatially three-dimensional cell order adding methodological difficultieswith respect to specimen thickness and optical nontrans-parency Such 3D challenges require new experimental andcomputational approaches [29 30]

Different embryo geometries can produce different spa-tial characteristics on the gene expression data For instancein the early Drosophila embryo the data can be consideredas patterns on an ellipsoidal surface With an appropriatetwo-dimensional convolution (eg cylindrical projection)the data can be studied with 2D image processing techniques[31]

There are however datasets of recent experimental datawhich are truly 3D and cannot be properly transformed andanalyzed in 2D A prime example is expression data fromearly zebrafish embryos where nuclei (cells) are in severalirregular layers on the fish egg Such data requires techniquesand algorithms for directly processing 3D data The irregulardistribution of nuclei in layers presents an added challengesince most quantitative methods operate on spatially regulardata points

To this end this paper introduces a new method forprocessing irregular data points scattered in layers in thevicinity of an ellipsoidal (spheroidal) surface We present anonparametric method which can address in a nonbiasedway the arbitrary spatial distribution and unknown noise

character of the expression data In [31] we presented exten-sions of 2D singular spectrum analysis (2D-SSA) to analyze2D and surface 3D datasets from Drosophila confocal scansThese extensions circular and shaped 2D-SSA were appliedto gene expression patterns in the thin nuclear layer justunder the surface of the embryo We demonstrated howcircular and shaped 2D-SSA can decompose the expressiondata into identifiable components (trend and noise) as wellas separating signals from different genes

In this paper we extend SSA to irregular three-dimen-sional expression data (3D-SSA) For an initial application ofthe approach we focus on dealing with spatial irregularityusing real zebrafish data for the spatial coordinates of nucleiand the spatial gene expression patterns from the ldquoMatchITrdquoand ldquoAtlasITrdquo packages [4] but using artificial functions(simple math functions and a smooth approximation of anintensity indicator) for the expression (fluorescence) intensi-ties

Figures 1 and 2 illustrate the main steps of the 3D-SSAapproach (see details in Section 3) Figure 1(a) shows simu-lated intensity data (a two-exponential pattern with noise)on experimental data for nuclear positions in the spherical-cap geometry of the gene expression region Figure 2 showsthe algorithm steps depth equalization on the sphere flat-tening interpolation and reconstruction Figure 1(b) showsthe resulting pattern reconstruction after application of the3D-SSA algorithm Coloring of nuclei corresponds to geneexpression intensities Mixing of nuclei of different colorsreflects noise Regular (smooth) color patterns reflect noiseremoval

This paper is structured as follows Section 2 describes thesemiartificial datasets which were analyzed The method isdescribed in Section 3 In Sections 4 and 5 reliability of theapproach is demonstrated on semiartificial data similar toreal observations Specifically the first example considers allnuclei detected in the specimen at the shield stage in whichnuclei are distributed in a ldquospherical caprdquo and expression(with noise) is generated for two patterns (a) the sum of twoexponentials and (b) bell-shaped

The second example is for expression patterns similarto those for the nine regulatory genes characterized in [4]Specifically the test pattern is limited to nuclei where the ntlagene is found experimentally (the ldquoMatchITrdquo package [4])This set of nuclei is distributed in an equatorial strip Forits extraction we build a hull (envelope) of expressed nuclei(generally not convex since expressing area has complexshape) ldquoShapedrdquo 3D-SSA is applied in all of the test casesSection 6 contains discussion and conclusions

2 Data

In a 3D dataset from a zebrafish embryo each datapointcorresponds to a nucleus each represented by an array ofnumbers three spatial coordinates for the nucleus centroidand the fluorescence intensities (in arbitrary units) of thelabelled genes (usually two genes are labelled per embryo)Geometrically the data points are distributed around a 3Dellipsoid in several irregular layers (see Figures 1 4(a) 7(a)

BioMed Research International 3

x

y

z

(a) Initial data on a sphere

xy

z

(b) Results of data reconstruction the denoised pattern on asphere

Figure 1 Zebrafish data [4] and the 3D-SSA approach original nucleus position data for the spherical-cap gene expression region withsimulated intensity values (a schematic two-exponential pattern with noise) represented by the colormap colors are given in the topographicscale where the blue color corresponds to small values when brownred colors correspond to large values of intensity

x

y

z

(a) Depth equalization on the sphere

d

120595

120601

(b) Flattened nuclei

d

120595

120601

(c) Interpolation to a regular grid

d

120595

120601

(d) Reconstructed pattern on a regular grid

Figure 2 Processing steps in the 3D-SSA approach

4 BioMed Research International

and 15(a)) If we approximate the fish egg as an ellipsoid (orspheroid) then the early fish embryo can be geometricallydescribed as a thick (multilayer) spherical cap overlayingthe egg which can be flattened to a disc without substantialdistortions at the margins (biologists refer to the geometry ofthese embryonic stages as ldquodomerdquo or ldquodiscrdquo) The key genesstudied at these early stages tend to form expression patternsin compact subareas of the spherical cap such as open-endedrings and so forth Preprocessing and SSA procedures canbe focused or confined to these subareas In other words weassume that there is a transformation of a given expressionarea to a parallelepiped and that the transformation does notdistort the data drastically If the expression area is too largefor such a transformation as a whole then the transformationcan be done as several independent pieces

3 Method

We have adapted singular spectrum analysis (SSA) [32ndash34]to 2D spatial expression data [31] and shown it to be anefficient and robust means for data decomposition in thesecases The advantages of SSA its adaptivity flexibility withfew parameters visual control and no prior specificationof a noise model make it promising for 3D analysis Adrawback of SSA its current lack of automation though see[35] regarding automation on similar data structures

SSA-typemethods process data specified on a regular gridwithin a parallelepipedTherefore application to irregular 3Ddata first requires flattening followed by regularization

Processing 3D data which is in a layer near an ellipsoidalsurface consists of the following steps

(i) detection of data location estimation of the ellipsoidcenter and finding the nuclear centroid positionsrelative to this enabling data rotation for simplernondistorting flattening

(ii) flattening the data and embedding them into a paral-lelepiped

(iii) interpolation to a regular grid 119894 = 1 1198731 times 119895 =

1 1198732 times 119896 = 1 1198733 to obtain 119891119894119895119896

(iv) application of 3D-SSA perhaps by the shaped versionto confine the analysis to subareas 3D-SSA results in adecomposition of the form119891

119894119895119896= 119904119894119895119896

+119903119894119895119896 where 119904

119894119895119896

corresponds to the expression pattern or trend(v) interpolation of 119904

119894119895119896back to regular nuclei on the par-

allelepiped(vi) transformation back to the original (irregular)

embryo coordinates

This process results in an extracted pattern with residualnoise on the original geometry This allows for further studyof the patternrsquos form (eg comparison with deterministicdynamic gene regulation models) as well as the model forthe noise (eg to detect whether noise is additive or multi-plicative)

Below we comment on the steps of the processing schemein further detail For simplicity we assume data are locatednear the surface of a sphere (the case for zebrafish eggs)

31 Detection of Data Location The origin of the coordinatesystem is the center of the sphere estimated as the pointmost equidistant from all data points (more formally wefind a point minimizing the variance of distances between itsposition and those of the data points)

Two types of spatial data distributions are consideredeach of which has a specific procedure for reorientation andflattening

The first type is for data located near the spherical cap Inthis case the ldquo119911rdquo-axis passes through the center of the sphereand the middle point of the data we can rotate the data toobtain positive ldquo119911rdquo-values for all nuclear coordinates

In the second type data are located in a strip along theequator of the sphere In this case the ldquo119911rdquo-axis is chosenorthogonal to the equatorial plane and passing through thecenter of the sphere

In both cases ldquo119909rdquo- and ldquo119910rdquo-axes are chosen orthogonalto the ldquo119911rdquo-axis and to each other oriented to maintain theoriginal axis orientation as much as possible

32 Flattening the Data Data is embedded into a par-allelepiped with sides parallel to the axes The first axiscorresponds to depth in the data layer with the second andthe third axes corresponding to surface directions We aim tokeep the proportions of the data as unchanged as possible

321 Depth Equalization of Data We assume that the dataare located in a layer of approximately constant depth ona spherical surface Before projection the data should becorrected to be within an ideal spherical layer of constantdepth Suppose that all nuclei are located in an area whichis bounded by two convex surfaces (eg in a spherical layer)

The first step of the procedure is to find these surfaces asconvex external and internal hulls (envelopes) applying theclassical ldquoconvex hullrdquo method to original data for finding theexternal hull and to inverted data to find the interior oneThen the found exterior and interior hulls are transformed tospherical surfaces of radii 119877 + 1198632 and 119877 minus 1198632 correspond-ingly where the sphere radius 119877 is estimated as the median ofdistances from the data points to the sphere center which wasestimated on Section 31 and the layer depth 119863 is estimatedas the median distance between the hulls

The procedure can detect a too-thin layer of one nucleusdepth when 2D-SSA should be used instead of 3D versionFor data which is truly 2D the 2D-SSA method discussed in[31 36 37] is applied Otherwise a newmodified approach iselaborated here

322 Projection Spherical projections can have differentinvariants For data in spherical caps we use the equidistantazimuthal projection [38 Section 25] to maintain distancefrom the center point (ie latitude) and therefore the originallayerrsquos linear sizes After flattening and projection the datapoints (nuclei) will therefore have the following coordinates120595 120593 are coordinates along orthogonal meridians in theazimuthal projection and119889 represents depth all aremeasuredin metrical units

BioMed Research International 5

For data in equatorial strips we use the equidistantcylindrical projection [38 Section 12] maintaining distancesof points to the equator (ie latitudes) and distances betweenpoints on the equator This gives the similar coordinates 120595120593 and 119889 (latitude longitude and depth measured in metricalunits) If an equatorial strip encircles with the whole equatorwe obtain a parallelepiped with circular topology on theequatorial coordinate and can apply the circular version ofSSA [39]

Note that we obtain new coordinates in approximately thesame units (and the same proportions) as the original dataThis is the main purpose of using equidistant projectionssince proportions can strongly influence the interpolation toa regular grid

We use relative coordinates for the flattened dataThus inall pictures 119889 is reported as a percentage with 0 at the innersurface and 100 at the outer surface 120595 120593 are reported asfractions of the equator length

33 Interpolation of Nuclei to a Regular Grid and Back Inter-polation of irregular data to a regular grid is known as a ldquo3Dscattered data interpolation problemrdquo (see [40] for a descrip-tion and an overview of different approaches)

We use a ldquotriangulated irregular network-based linearinterpolation using Delaunay triangulationrdquo approach wherethe interpolation is constructed by a linear interpolationof the vertex values from the corresponding triangulationsimplex Implementation is performed with the help of thelibrary ldquoCGALrdquo [41]

Note the nuclei do not necessarily pack the whole par-allelepiped and edge effects are a consideration Thereforeafter interpolation we obtain the expression data on a subsetof the parallelepiped grid For cap-shaped data for examplethe subset is a disc This subset can be processed with theshaped version of 3D-SSA

For back-interpolation to nuclei the conventional trilin-ear interpolation is used for pattern reconstruction Residualvalues are calculated as differences between original andextracted-pattern (trend) expression values

34 Shaped 3D-SSA Shaped 3D-SSA methods need originaldata to be given on a subset of a regular grid (which we callthe initial shape)

A parameter of the method is the shaped 3D windowwhich is inscribed in a parallelepiped of sizes 1198711 times 1198712 times 1198713

It is assumed that the chosen window covers all the pointsof the original shape If not the uncovered grid points areremoved For a properwindow shape the number of removedpoints should not be large If so another window shape or sizeshould be chosen

Shaped 3D-SSA can be considered as a particular case ofshaped SSA (see eg [37 39]) To outline the algorithm wedo the following

Algorithm of Shaped 3D-SSA(1) We will consider all possible locations of the win-

dow within the original data array and denote theirnumber as 119870 For each location of the window we

unfold the shaped window into a vector of length 119871The obtained vectors are put together into a so-calledtrajectory matrixThe trajectory matrix X has a certain structure sincethe windows cover the array points several times a lotof matrix elements coincide Such matrices are calledquasi-Hankel [42] and form a linear space We willdesignate this spaceHDefine the embedding operator T(X)X rarr H Thisoperator is linear is an injection (if each point ofthe initial shaped array is covered by windows) andtherefore is invertible

(2) Construct the singular value decomposition (SVD) ofthe trajectory matrix X X = sum

119889

119894=1 120590119894119880119894119881T119894 The eigen-

vectors 119880119894can be folded to eigenarrays which have

the same shape as the window has(3) Choose a subset 119868 sub 1 119889 and put X

119868= sum119894isin119868

X119894

where X119894

= 120590119894119880119894119881

T119894 Conventionally the choice is

119868 = 1 119903 119903 lt 119889 what corresponds to an approx-imation of X by a low-rank matrix X

119868

(4) Project the matrix X119868to the space H and obtain the

reconstructed image (pattern) asX119868= Tminus1ΠHX

119868

Note that by additivity X119868

= sum119894isin119868

X119894 where X

119894=

Tminus1ΠHX119894are called elementary components Therefore the

forms of elementary components can be used for detection ofpattern components

35 Choice of Parameters Decomposition of gene expressiondata on irregular nuclear positions has several parameters forinterpolation and flattening and for SSA (the window shapeand size and the number 119903 of components for reconstruc-tion)

In order to obtain a sufficient number of grid points withrespect to the number of nuclei steps of the regular gridshould not be too large The upper bound for the grid pointsis limited only by computational costs

Recommendations for 3D-SSA are similar to that for 2D-SSA and 1D-SSA given in [33 34 37] Larger windows cor-respond to more refined decomposition and more accuratereconstruction if the signal (pattern) has a simple structuregenerating a few SVD components For more complex pat-terns medium to small windows are preferable

Note that window size ismeasuredwith respect to patternfeatures and should not depend on interpolation step There-fore window sizes are measured as a percentage of imagesizes not in the number of grid points Starting window sizescan be chosen as approximately 10ndash20 of the image sizesin each direction If the pattern is extracted imprecisely thewindow can be enlarged if the pattern is mixed with theresidual (noise) the window should be decreased

Identification of pattern components can be performedby analyzing the forms of eigenarrays or elementary recon-structed components (see [37]) Slowly varying patterns canbe constructed readily by accumulation of slowly varyingelementary components Since it is difficult to perform avisual analysis of 3D objects it is preferable to depict slices

6 BioMed Research International

(1D graphs or 2D images) obtained by fixing one or twocoordinates

Quality of pattern extraction can be checked by means ofresidual behaviour For proper pattern extraction residualsshould vary around zero Thus it is recommended to chooseslowly varying elementary components such that the residualhas no part in the pattern

36Model of Residuals For understanding biological sourcesof noise in gene expression it is of interest to extract from thedata a model of the noise (functional relation of the leadingmoments of gene expressions) We suggest a method formodel detection based on a standard test of heteroscedasticityof residuals with different normalizations

For a decomposition of initial data into pattern and noise119909119894= 119904119894+ 119903119894 119894 = 1 119873 where 119873 is the number of nuclei

(enumeration by one index instead of three does not affectthe results) assume that noise in nuclei is independent andconsider the model

119903119894= 120576119894sdot1003816100381610038161003816119904119894

1003816100381610038161003816120572 (1)

where E120576119894= 0 and D120576

119894= const

If 120572 = 0 the noise is additive (its standard deviationdoes not depend on pattern values) If 120572 = 1 the noiseis multiplicative (its standard deviation is proportional topattern values) The intermediate value 120572 = 05 correspondsto Poissonian noise where noise variance is proportional topattern value

The Park method [43] estimates 120572 as the slope of a linearregression in the model

log 10038161003816100381610038161199031198941003816100381610038161003816 = log120590+120572 log 1003816100381610038161003816119904119894

1003816100381610038161003816 + ]119894 (2)

where ]119894= log|120576

119894120590| is a well behaved error term The Park

method appears to be robust to the distribution of theresiduals An estimate of 120572 in model (2) can be obtained forexample by the least-squares method

37 Implementation We implemented all of the describedmethods in R [44] and included them in our BioSSA package

For construction of convex and nonconvex hulls (theldquoalphashaperdquo [45] method is used) the library ldquoqhullrdquo [46]is used by the wrapping R-packages (alphashape3d [47]geometry [48]) For 3D spatial interpolation we implementeda special R-packagewith help of the library ldquoCGALrdquo [41] sinceavailable R implementations of 3D spatial interpolation havevery poor performance

For trilinear interpolation the R-package oce [49] is usedFor 3D-SSA we use the Rssa R-package [50] This imple-

mentation is highly effective since it uses the approachesfrom [31 37 51]

4 Example for Spherical-Cap Nuclear Pattern

Here we work through an application of the 3D-SSA pro-cedure on the close-to-spherical zebrafish egg Nuclear coor-dinates are taken from the default MatchIT [4] example

The ldquoMatchITrdquo tool was run with the default dataset andparameters producing files with automatically detectednuclei Processing is on the file ldquogsc ntla wt t008 ch01csvrdquocontaining nuclear coordinates for the cells expressing the gsc(ldquogoosecoidrdquo) gene at the ldquolate shieldrdquo developmental stage

In this data nuclei are located in a thick layer on a sphereEach nucleus is marked by ldquo1rdquo or ldquo0rdquo for the gene expressingor not respectively For this data set gsc-expressing nuclei(ldquo1rdquo-s) are located within a small spherical cap We extendthe analyzed area to include all ldquo1rdquo-nuclei plus a surroundingring of nuclei

We consider two artificial (intensity) expression patternsfor these nuclei and show that shaped 3D-SSA can soundlyextract both types of patternsThe first kind of pattern is two-exponential analogous to 2D Drosophila patterns analyzedin [35] The second type of pattern approximates the onoff(ldquo1rdquoldquo0rdquo) expression values by a smooth bell-shaped pattern

We first present results for both examples and explain thechoice of parameters and pattern components We then showhow the method can be used to estimate the model of noiseafter extracting the pattern In the considered examples weadd Gaussian noise to patterns however it is not essentialsince the SSA-family ofmethods is stablewith respect to noisedistribution and possible weak dependencies

41 Two-Exponential Expression Pattern Let us construct theexpression values as

119904 (120595 120593 119889) = 1198902120593+9120595+2119889

+ 2119890minus(2120593+13120595+119889) (3)

V (120595 120593 119889) = 119904 (120595 120593 119889) + 120576 sdot 119904 (120595 120593 119889) (4)

where 120595 120593 are relative spherical coordinates 119889 is layer depth(119889 isin [0 1]) and 120576 is white Gaussian noise with standard devi-ation 035 This pattern which we call CAP-2EXP is thesum of two exponentials plus multiplicative noise Note thatthe pattern depends on three coordinates and therefore it isvaried in three directions Moderate noise levels here andin the other considered simulated examples were chosenfor better visual color representation of the results for theconsidered patterns

Inside and outside views of the nuclear hulls are shownin Figures 3(a)ndash3(d) where colors correspond to expressionlevels as in Figure 1 It can be seen that the coloring is vari-egated (nuclei of different color occupy similar positions)reflecting the presence of noise

The method described in Section 3 produces the recon-structed pattern depicted in Figures 3(e) and 3(f) The resultsof noise removal can clearly be seen

For better visual representation the nuclei themselves canbe depicted see Figures 4 and 5 As before the color of anucleus reflects the expression level in the same scale as inFigure 1 The expression pattern can be clearly seen in thedenoised data Difference between the inside and the outsideviews demonstrates pattern dynamic in depth direction

After pattern extraction the noisemodel can be estimated(see Section 36) In (3) multiplicative noise with 120572 = 1 wassimulated Applying the Park method provides an estimateof = 1073 recovering the multiplicative character of

BioMed Research International 7

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 3 CAP-2EXP the nuclear hulls coloring based on original and pattern expression intensities

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 4 CAP-2EXP nuclei colored an outside view

the generated noise and demonstrating how the process candistinguish between for example additive Poissonian andmultiplicative noise in datasets

42 Bell-Shaped Expression Pattern We now test the shaped3D-SSA process on a different intensity pattern using thesame nuclear data gsc ntla wt t008 ch01csv We constructa bell-shaped intensity pattern (referred to as CAP-BELL) asan approximation of 0-1 expression indicators in the data

plus white Gaussian noise with standard deviation 025The approximation was performed by twofold 3D-SSAsmoothing and therefore it cannot be expressed by anequation

Inside and outside views on the nuclear hulls are shownin Figures 6(a)ndash6(d) where colors correspond to expressionlevels in the topographic scale described in the caption ofFigure 1 It can be seen that the coloring is variegated signi-fying the presence of noise

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 2: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

2 BioMed Research International

For reconstructing the spatiotemporal dynamics of geneexpression the raw data sets are stacks of confocal micro-scope scans of early embryos (fixed or live) Data is usuallythe intensity of fluorescent markers for either the mRNA orproteins encoded by the genes of interest Extracting thisdata requires image segmentation to identify the signal foreach cell (or nucleus)This produces text-files with the spatialcoordinates gene expression levels and lineage history ofeach cell

A major goal of this processing is to collect reliablequantitative data for the fitting and verification of moderncomputer dynamic and stochastic models of developmentalgene regulation at single cell resolution

Examples of publicly available high resolution quantita-tive datasets for several experimental animals include theroundworm ldquoC elegansrdquo (Web resource ldquoEPICrdquo [18]) the fruitfly ldquoD melanogasterrdquo (ldquoBID BDTNPrdquo [2 19]) and the zebra-fish ldquoD reriordquo (ldquoBioEmergencesrdquo [4 20]) Such data can beseen as a combination of biological signal biological noiseand experimental noise Use of the fluorescent data requiresa clear separation of these components For example deter-ministic gene regulatory models should be corroboratedsolely against the biological trend component stochasticmodels will also include biological noise components

The data in cellular resolution 3D gene expression atlasestypically has very high noise with contributions from aspectssuch as the intrinsic gene expression noise observed inprokaryotes and eukaryotes [21ndash27] and the disorder incellularnuclear positions New quantitative approaches areneeded to separate the raw expression data into signal andnoise components [28] While some animalsrsquo embryos havesimpler geometries being relatively flat with spherical orellipsoidal cell layers (like the early Drosophila fly embryo)many types of embryos have inherently spatially three-dimensional cell order adding methodological difficultieswith respect to specimen thickness and optical nontrans-parency Such 3D challenges require new experimental andcomputational approaches [29 30]

Different embryo geometries can produce different spa-tial characteristics on the gene expression data For instancein the early Drosophila embryo the data can be consideredas patterns on an ellipsoidal surface With an appropriatetwo-dimensional convolution (eg cylindrical projection)the data can be studied with 2D image processing techniques[31]

There are however datasets of recent experimental datawhich are truly 3D and cannot be properly transformed andanalyzed in 2D A prime example is expression data fromearly zebrafish embryos where nuclei (cells) are in severalirregular layers on the fish egg Such data requires techniquesand algorithms for directly processing 3D data The irregulardistribution of nuclei in layers presents an added challengesince most quantitative methods operate on spatially regulardata points

To this end this paper introduces a new method forprocessing irregular data points scattered in layers in thevicinity of an ellipsoidal (spheroidal) surface We present anonparametric method which can address in a nonbiasedway the arbitrary spatial distribution and unknown noise

character of the expression data In [31] we presented exten-sions of 2D singular spectrum analysis (2D-SSA) to analyze2D and surface 3D datasets from Drosophila confocal scansThese extensions circular and shaped 2D-SSA were appliedto gene expression patterns in the thin nuclear layer justunder the surface of the embryo We demonstrated howcircular and shaped 2D-SSA can decompose the expressiondata into identifiable components (trend and noise) as wellas separating signals from different genes

In this paper we extend SSA to irregular three-dimen-sional expression data (3D-SSA) For an initial application ofthe approach we focus on dealing with spatial irregularityusing real zebrafish data for the spatial coordinates of nucleiand the spatial gene expression patterns from the ldquoMatchITrdquoand ldquoAtlasITrdquo packages [4] but using artificial functions(simple math functions and a smooth approximation of anintensity indicator) for the expression (fluorescence) intensi-ties

Figures 1 and 2 illustrate the main steps of the 3D-SSAapproach (see details in Section 3) Figure 1(a) shows simu-lated intensity data (a two-exponential pattern with noise)on experimental data for nuclear positions in the spherical-cap geometry of the gene expression region Figure 2 showsthe algorithm steps depth equalization on the sphere flat-tening interpolation and reconstruction Figure 1(b) showsthe resulting pattern reconstruction after application of the3D-SSA algorithm Coloring of nuclei corresponds to geneexpression intensities Mixing of nuclei of different colorsreflects noise Regular (smooth) color patterns reflect noiseremoval

This paper is structured as follows Section 2 describes thesemiartificial datasets which were analyzed The method isdescribed in Section 3 In Sections 4 and 5 reliability of theapproach is demonstrated on semiartificial data similar toreal observations Specifically the first example considers allnuclei detected in the specimen at the shield stage in whichnuclei are distributed in a ldquospherical caprdquo and expression(with noise) is generated for two patterns (a) the sum of twoexponentials and (b) bell-shaped

The second example is for expression patterns similarto those for the nine regulatory genes characterized in [4]Specifically the test pattern is limited to nuclei where the ntlagene is found experimentally (the ldquoMatchITrdquo package [4])This set of nuclei is distributed in an equatorial strip Forits extraction we build a hull (envelope) of expressed nuclei(generally not convex since expressing area has complexshape) ldquoShapedrdquo 3D-SSA is applied in all of the test casesSection 6 contains discussion and conclusions

2 Data

In a 3D dataset from a zebrafish embryo each datapointcorresponds to a nucleus each represented by an array ofnumbers three spatial coordinates for the nucleus centroidand the fluorescence intensities (in arbitrary units) of thelabelled genes (usually two genes are labelled per embryo)Geometrically the data points are distributed around a 3Dellipsoid in several irregular layers (see Figures 1 4(a) 7(a)

BioMed Research International 3

x

y

z

(a) Initial data on a sphere

xy

z

(b) Results of data reconstruction the denoised pattern on asphere

Figure 1 Zebrafish data [4] and the 3D-SSA approach original nucleus position data for the spherical-cap gene expression region withsimulated intensity values (a schematic two-exponential pattern with noise) represented by the colormap colors are given in the topographicscale where the blue color corresponds to small values when brownred colors correspond to large values of intensity

x

y

z

(a) Depth equalization on the sphere

d

120595

120601

(b) Flattened nuclei

d

120595

120601

(c) Interpolation to a regular grid

d

120595

120601

(d) Reconstructed pattern on a regular grid

Figure 2 Processing steps in the 3D-SSA approach

4 BioMed Research International

and 15(a)) If we approximate the fish egg as an ellipsoid (orspheroid) then the early fish embryo can be geometricallydescribed as a thick (multilayer) spherical cap overlayingthe egg which can be flattened to a disc without substantialdistortions at the margins (biologists refer to the geometry ofthese embryonic stages as ldquodomerdquo or ldquodiscrdquo) The key genesstudied at these early stages tend to form expression patternsin compact subareas of the spherical cap such as open-endedrings and so forth Preprocessing and SSA procedures canbe focused or confined to these subareas In other words weassume that there is a transformation of a given expressionarea to a parallelepiped and that the transformation does notdistort the data drastically If the expression area is too largefor such a transformation as a whole then the transformationcan be done as several independent pieces

3 Method

We have adapted singular spectrum analysis (SSA) [32ndash34]to 2D spatial expression data [31] and shown it to be anefficient and robust means for data decomposition in thesecases The advantages of SSA its adaptivity flexibility withfew parameters visual control and no prior specificationof a noise model make it promising for 3D analysis Adrawback of SSA its current lack of automation though see[35] regarding automation on similar data structures

SSA-typemethods process data specified on a regular gridwithin a parallelepipedTherefore application to irregular 3Ddata first requires flattening followed by regularization

Processing 3D data which is in a layer near an ellipsoidalsurface consists of the following steps

(i) detection of data location estimation of the ellipsoidcenter and finding the nuclear centroid positionsrelative to this enabling data rotation for simplernondistorting flattening

(ii) flattening the data and embedding them into a paral-lelepiped

(iii) interpolation to a regular grid 119894 = 1 1198731 times 119895 =

1 1198732 times 119896 = 1 1198733 to obtain 119891119894119895119896

(iv) application of 3D-SSA perhaps by the shaped versionto confine the analysis to subareas 3D-SSA results in adecomposition of the form119891

119894119895119896= 119904119894119895119896

+119903119894119895119896 where 119904

119894119895119896

corresponds to the expression pattern or trend(v) interpolation of 119904

119894119895119896back to regular nuclei on the par-

allelepiped(vi) transformation back to the original (irregular)

embryo coordinates

This process results in an extracted pattern with residualnoise on the original geometry This allows for further studyof the patternrsquos form (eg comparison with deterministicdynamic gene regulation models) as well as the model forthe noise (eg to detect whether noise is additive or multi-plicative)

Below we comment on the steps of the processing schemein further detail For simplicity we assume data are locatednear the surface of a sphere (the case for zebrafish eggs)

31 Detection of Data Location The origin of the coordinatesystem is the center of the sphere estimated as the pointmost equidistant from all data points (more formally wefind a point minimizing the variance of distances between itsposition and those of the data points)

Two types of spatial data distributions are consideredeach of which has a specific procedure for reorientation andflattening

The first type is for data located near the spherical cap Inthis case the ldquo119911rdquo-axis passes through the center of the sphereand the middle point of the data we can rotate the data toobtain positive ldquo119911rdquo-values for all nuclear coordinates

In the second type data are located in a strip along theequator of the sphere In this case the ldquo119911rdquo-axis is chosenorthogonal to the equatorial plane and passing through thecenter of the sphere

In both cases ldquo119909rdquo- and ldquo119910rdquo-axes are chosen orthogonalto the ldquo119911rdquo-axis and to each other oriented to maintain theoriginal axis orientation as much as possible

32 Flattening the Data Data is embedded into a par-allelepiped with sides parallel to the axes The first axiscorresponds to depth in the data layer with the second andthe third axes corresponding to surface directions We aim tokeep the proportions of the data as unchanged as possible

321 Depth Equalization of Data We assume that the dataare located in a layer of approximately constant depth ona spherical surface Before projection the data should becorrected to be within an ideal spherical layer of constantdepth Suppose that all nuclei are located in an area whichis bounded by two convex surfaces (eg in a spherical layer)

The first step of the procedure is to find these surfaces asconvex external and internal hulls (envelopes) applying theclassical ldquoconvex hullrdquo method to original data for finding theexternal hull and to inverted data to find the interior oneThen the found exterior and interior hulls are transformed tospherical surfaces of radii 119877 + 1198632 and 119877 minus 1198632 correspond-ingly where the sphere radius 119877 is estimated as the median ofdistances from the data points to the sphere center which wasestimated on Section 31 and the layer depth 119863 is estimatedas the median distance between the hulls

The procedure can detect a too-thin layer of one nucleusdepth when 2D-SSA should be used instead of 3D versionFor data which is truly 2D the 2D-SSA method discussed in[31 36 37] is applied Otherwise a newmodified approach iselaborated here

322 Projection Spherical projections can have differentinvariants For data in spherical caps we use the equidistantazimuthal projection [38 Section 25] to maintain distancefrom the center point (ie latitude) and therefore the originallayerrsquos linear sizes After flattening and projection the datapoints (nuclei) will therefore have the following coordinates120595 120593 are coordinates along orthogonal meridians in theazimuthal projection and119889 represents depth all aremeasuredin metrical units

BioMed Research International 5

For data in equatorial strips we use the equidistantcylindrical projection [38 Section 12] maintaining distancesof points to the equator (ie latitudes) and distances betweenpoints on the equator This gives the similar coordinates 120595120593 and 119889 (latitude longitude and depth measured in metricalunits) If an equatorial strip encircles with the whole equatorwe obtain a parallelepiped with circular topology on theequatorial coordinate and can apply the circular version ofSSA [39]

Note that we obtain new coordinates in approximately thesame units (and the same proportions) as the original dataThis is the main purpose of using equidistant projectionssince proportions can strongly influence the interpolation toa regular grid

We use relative coordinates for the flattened dataThus inall pictures 119889 is reported as a percentage with 0 at the innersurface and 100 at the outer surface 120595 120593 are reported asfractions of the equator length

33 Interpolation of Nuclei to a Regular Grid and Back Inter-polation of irregular data to a regular grid is known as a ldquo3Dscattered data interpolation problemrdquo (see [40] for a descrip-tion and an overview of different approaches)

We use a ldquotriangulated irregular network-based linearinterpolation using Delaunay triangulationrdquo approach wherethe interpolation is constructed by a linear interpolationof the vertex values from the corresponding triangulationsimplex Implementation is performed with the help of thelibrary ldquoCGALrdquo [41]

Note the nuclei do not necessarily pack the whole par-allelepiped and edge effects are a consideration Thereforeafter interpolation we obtain the expression data on a subsetof the parallelepiped grid For cap-shaped data for examplethe subset is a disc This subset can be processed with theshaped version of 3D-SSA

For back-interpolation to nuclei the conventional trilin-ear interpolation is used for pattern reconstruction Residualvalues are calculated as differences between original andextracted-pattern (trend) expression values

34 Shaped 3D-SSA Shaped 3D-SSA methods need originaldata to be given on a subset of a regular grid (which we callthe initial shape)

A parameter of the method is the shaped 3D windowwhich is inscribed in a parallelepiped of sizes 1198711 times 1198712 times 1198713

It is assumed that the chosen window covers all the pointsof the original shape If not the uncovered grid points areremoved For a properwindow shape the number of removedpoints should not be large If so another window shape or sizeshould be chosen

Shaped 3D-SSA can be considered as a particular case ofshaped SSA (see eg [37 39]) To outline the algorithm wedo the following

Algorithm of Shaped 3D-SSA(1) We will consider all possible locations of the win-

dow within the original data array and denote theirnumber as 119870 For each location of the window we

unfold the shaped window into a vector of length 119871The obtained vectors are put together into a so-calledtrajectory matrixThe trajectory matrix X has a certain structure sincethe windows cover the array points several times a lotof matrix elements coincide Such matrices are calledquasi-Hankel [42] and form a linear space We willdesignate this spaceHDefine the embedding operator T(X)X rarr H Thisoperator is linear is an injection (if each point ofthe initial shaped array is covered by windows) andtherefore is invertible

(2) Construct the singular value decomposition (SVD) ofthe trajectory matrix X X = sum

119889

119894=1 120590119894119880119894119881T119894 The eigen-

vectors 119880119894can be folded to eigenarrays which have

the same shape as the window has(3) Choose a subset 119868 sub 1 119889 and put X

119868= sum119894isin119868

X119894

where X119894

= 120590119894119880119894119881

T119894 Conventionally the choice is

119868 = 1 119903 119903 lt 119889 what corresponds to an approx-imation of X by a low-rank matrix X

119868

(4) Project the matrix X119868to the space H and obtain the

reconstructed image (pattern) asX119868= Tminus1ΠHX

119868

Note that by additivity X119868

= sum119894isin119868

X119894 where X

119894=

Tminus1ΠHX119894are called elementary components Therefore the

forms of elementary components can be used for detection ofpattern components

35 Choice of Parameters Decomposition of gene expressiondata on irregular nuclear positions has several parameters forinterpolation and flattening and for SSA (the window shapeand size and the number 119903 of components for reconstruc-tion)

In order to obtain a sufficient number of grid points withrespect to the number of nuclei steps of the regular gridshould not be too large The upper bound for the grid pointsis limited only by computational costs

Recommendations for 3D-SSA are similar to that for 2D-SSA and 1D-SSA given in [33 34 37] Larger windows cor-respond to more refined decomposition and more accuratereconstruction if the signal (pattern) has a simple structuregenerating a few SVD components For more complex pat-terns medium to small windows are preferable

Note that window size ismeasuredwith respect to patternfeatures and should not depend on interpolation step There-fore window sizes are measured as a percentage of imagesizes not in the number of grid points Starting window sizescan be chosen as approximately 10ndash20 of the image sizesin each direction If the pattern is extracted imprecisely thewindow can be enlarged if the pattern is mixed with theresidual (noise) the window should be decreased

Identification of pattern components can be performedby analyzing the forms of eigenarrays or elementary recon-structed components (see [37]) Slowly varying patterns canbe constructed readily by accumulation of slowly varyingelementary components Since it is difficult to perform avisual analysis of 3D objects it is preferable to depict slices

6 BioMed Research International

(1D graphs or 2D images) obtained by fixing one or twocoordinates

Quality of pattern extraction can be checked by means ofresidual behaviour For proper pattern extraction residualsshould vary around zero Thus it is recommended to chooseslowly varying elementary components such that the residualhas no part in the pattern

36Model of Residuals For understanding biological sourcesof noise in gene expression it is of interest to extract from thedata a model of the noise (functional relation of the leadingmoments of gene expressions) We suggest a method formodel detection based on a standard test of heteroscedasticityof residuals with different normalizations

For a decomposition of initial data into pattern and noise119909119894= 119904119894+ 119903119894 119894 = 1 119873 where 119873 is the number of nuclei

(enumeration by one index instead of three does not affectthe results) assume that noise in nuclei is independent andconsider the model

119903119894= 120576119894sdot1003816100381610038161003816119904119894

1003816100381610038161003816120572 (1)

where E120576119894= 0 and D120576

119894= const

If 120572 = 0 the noise is additive (its standard deviationdoes not depend on pattern values) If 120572 = 1 the noiseis multiplicative (its standard deviation is proportional topattern values) The intermediate value 120572 = 05 correspondsto Poissonian noise where noise variance is proportional topattern value

The Park method [43] estimates 120572 as the slope of a linearregression in the model

log 10038161003816100381610038161199031198941003816100381610038161003816 = log120590+120572 log 1003816100381610038161003816119904119894

1003816100381610038161003816 + ]119894 (2)

where ]119894= log|120576

119894120590| is a well behaved error term The Park

method appears to be robust to the distribution of theresiduals An estimate of 120572 in model (2) can be obtained forexample by the least-squares method

37 Implementation We implemented all of the describedmethods in R [44] and included them in our BioSSA package

For construction of convex and nonconvex hulls (theldquoalphashaperdquo [45] method is used) the library ldquoqhullrdquo [46]is used by the wrapping R-packages (alphashape3d [47]geometry [48]) For 3D spatial interpolation we implementeda special R-packagewith help of the library ldquoCGALrdquo [41] sinceavailable R implementations of 3D spatial interpolation havevery poor performance

For trilinear interpolation the R-package oce [49] is usedFor 3D-SSA we use the Rssa R-package [50] This imple-

mentation is highly effective since it uses the approachesfrom [31 37 51]

4 Example for Spherical-Cap Nuclear Pattern

Here we work through an application of the 3D-SSA pro-cedure on the close-to-spherical zebrafish egg Nuclear coor-dinates are taken from the default MatchIT [4] example

The ldquoMatchITrdquo tool was run with the default dataset andparameters producing files with automatically detectednuclei Processing is on the file ldquogsc ntla wt t008 ch01csvrdquocontaining nuclear coordinates for the cells expressing the gsc(ldquogoosecoidrdquo) gene at the ldquolate shieldrdquo developmental stage

In this data nuclei are located in a thick layer on a sphereEach nucleus is marked by ldquo1rdquo or ldquo0rdquo for the gene expressingor not respectively For this data set gsc-expressing nuclei(ldquo1rdquo-s) are located within a small spherical cap We extendthe analyzed area to include all ldquo1rdquo-nuclei plus a surroundingring of nuclei

We consider two artificial (intensity) expression patternsfor these nuclei and show that shaped 3D-SSA can soundlyextract both types of patternsThe first kind of pattern is two-exponential analogous to 2D Drosophila patterns analyzedin [35] The second type of pattern approximates the onoff(ldquo1rdquoldquo0rdquo) expression values by a smooth bell-shaped pattern

We first present results for both examples and explain thechoice of parameters and pattern components We then showhow the method can be used to estimate the model of noiseafter extracting the pattern In the considered examples weadd Gaussian noise to patterns however it is not essentialsince the SSA-family ofmethods is stablewith respect to noisedistribution and possible weak dependencies

41 Two-Exponential Expression Pattern Let us construct theexpression values as

119904 (120595 120593 119889) = 1198902120593+9120595+2119889

+ 2119890minus(2120593+13120595+119889) (3)

V (120595 120593 119889) = 119904 (120595 120593 119889) + 120576 sdot 119904 (120595 120593 119889) (4)

where 120595 120593 are relative spherical coordinates 119889 is layer depth(119889 isin [0 1]) and 120576 is white Gaussian noise with standard devi-ation 035 This pattern which we call CAP-2EXP is thesum of two exponentials plus multiplicative noise Note thatthe pattern depends on three coordinates and therefore it isvaried in three directions Moderate noise levels here andin the other considered simulated examples were chosenfor better visual color representation of the results for theconsidered patterns

Inside and outside views of the nuclear hulls are shownin Figures 3(a)ndash3(d) where colors correspond to expressionlevels as in Figure 1 It can be seen that the coloring is vari-egated (nuclei of different color occupy similar positions)reflecting the presence of noise

The method described in Section 3 produces the recon-structed pattern depicted in Figures 3(e) and 3(f) The resultsof noise removal can clearly be seen

For better visual representation the nuclei themselves canbe depicted see Figures 4 and 5 As before the color of anucleus reflects the expression level in the same scale as inFigure 1 The expression pattern can be clearly seen in thedenoised data Difference between the inside and the outsideviews demonstrates pattern dynamic in depth direction

After pattern extraction the noisemodel can be estimated(see Section 36) In (3) multiplicative noise with 120572 = 1 wassimulated Applying the Park method provides an estimateof = 1073 recovering the multiplicative character of

BioMed Research International 7

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 3 CAP-2EXP the nuclear hulls coloring based on original and pattern expression intensities

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 4 CAP-2EXP nuclei colored an outside view

the generated noise and demonstrating how the process candistinguish between for example additive Poissonian andmultiplicative noise in datasets

42 Bell-Shaped Expression Pattern We now test the shaped3D-SSA process on a different intensity pattern using thesame nuclear data gsc ntla wt t008 ch01csv We constructa bell-shaped intensity pattern (referred to as CAP-BELL) asan approximation of 0-1 expression indicators in the data

plus white Gaussian noise with standard deviation 025The approximation was performed by twofold 3D-SSAsmoothing and therefore it cannot be expressed by anequation

Inside and outside views on the nuclear hulls are shownin Figures 6(a)ndash6(d) where colors correspond to expressionlevels in the topographic scale described in the caption ofFigure 1 It can be seen that the coloring is variegated signi-fying the presence of noise

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 3: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 3

x

y

z

(a) Initial data on a sphere

xy

z

(b) Results of data reconstruction the denoised pattern on asphere

Figure 1 Zebrafish data [4] and the 3D-SSA approach original nucleus position data for the spherical-cap gene expression region withsimulated intensity values (a schematic two-exponential pattern with noise) represented by the colormap colors are given in the topographicscale where the blue color corresponds to small values when brownred colors correspond to large values of intensity

x

y

z

(a) Depth equalization on the sphere

d

120595

120601

(b) Flattened nuclei

d

120595

120601

(c) Interpolation to a regular grid

d

120595

120601

(d) Reconstructed pattern on a regular grid

Figure 2 Processing steps in the 3D-SSA approach

4 BioMed Research International

and 15(a)) If we approximate the fish egg as an ellipsoid (orspheroid) then the early fish embryo can be geometricallydescribed as a thick (multilayer) spherical cap overlayingthe egg which can be flattened to a disc without substantialdistortions at the margins (biologists refer to the geometry ofthese embryonic stages as ldquodomerdquo or ldquodiscrdquo) The key genesstudied at these early stages tend to form expression patternsin compact subareas of the spherical cap such as open-endedrings and so forth Preprocessing and SSA procedures canbe focused or confined to these subareas In other words weassume that there is a transformation of a given expressionarea to a parallelepiped and that the transformation does notdistort the data drastically If the expression area is too largefor such a transformation as a whole then the transformationcan be done as several independent pieces

3 Method

We have adapted singular spectrum analysis (SSA) [32ndash34]to 2D spatial expression data [31] and shown it to be anefficient and robust means for data decomposition in thesecases The advantages of SSA its adaptivity flexibility withfew parameters visual control and no prior specificationof a noise model make it promising for 3D analysis Adrawback of SSA its current lack of automation though see[35] regarding automation on similar data structures

SSA-typemethods process data specified on a regular gridwithin a parallelepipedTherefore application to irregular 3Ddata first requires flattening followed by regularization

Processing 3D data which is in a layer near an ellipsoidalsurface consists of the following steps

(i) detection of data location estimation of the ellipsoidcenter and finding the nuclear centroid positionsrelative to this enabling data rotation for simplernondistorting flattening

(ii) flattening the data and embedding them into a paral-lelepiped

(iii) interpolation to a regular grid 119894 = 1 1198731 times 119895 =

1 1198732 times 119896 = 1 1198733 to obtain 119891119894119895119896

(iv) application of 3D-SSA perhaps by the shaped versionto confine the analysis to subareas 3D-SSA results in adecomposition of the form119891

119894119895119896= 119904119894119895119896

+119903119894119895119896 where 119904

119894119895119896

corresponds to the expression pattern or trend(v) interpolation of 119904

119894119895119896back to regular nuclei on the par-

allelepiped(vi) transformation back to the original (irregular)

embryo coordinates

This process results in an extracted pattern with residualnoise on the original geometry This allows for further studyof the patternrsquos form (eg comparison with deterministicdynamic gene regulation models) as well as the model forthe noise (eg to detect whether noise is additive or multi-plicative)

Below we comment on the steps of the processing schemein further detail For simplicity we assume data are locatednear the surface of a sphere (the case for zebrafish eggs)

31 Detection of Data Location The origin of the coordinatesystem is the center of the sphere estimated as the pointmost equidistant from all data points (more formally wefind a point minimizing the variance of distances between itsposition and those of the data points)

Two types of spatial data distributions are consideredeach of which has a specific procedure for reorientation andflattening

The first type is for data located near the spherical cap Inthis case the ldquo119911rdquo-axis passes through the center of the sphereand the middle point of the data we can rotate the data toobtain positive ldquo119911rdquo-values for all nuclear coordinates

In the second type data are located in a strip along theequator of the sphere In this case the ldquo119911rdquo-axis is chosenorthogonal to the equatorial plane and passing through thecenter of the sphere

In both cases ldquo119909rdquo- and ldquo119910rdquo-axes are chosen orthogonalto the ldquo119911rdquo-axis and to each other oriented to maintain theoriginal axis orientation as much as possible

32 Flattening the Data Data is embedded into a par-allelepiped with sides parallel to the axes The first axiscorresponds to depth in the data layer with the second andthe third axes corresponding to surface directions We aim tokeep the proportions of the data as unchanged as possible

321 Depth Equalization of Data We assume that the dataare located in a layer of approximately constant depth ona spherical surface Before projection the data should becorrected to be within an ideal spherical layer of constantdepth Suppose that all nuclei are located in an area whichis bounded by two convex surfaces (eg in a spherical layer)

The first step of the procedure is to find these surfaces asconvex external and internal hulls (envelopes) applying theclassical ldquoconvex hullrdquo method to original data for finding theexternal hull and to inverted data to find the interior oneThen the found exterior and interior hulls are transformed tospherical surfaces of radii 119877 + 1198632 and 119877 minus 1198632 correspond-ingly where the sphere radius 119877 is estimated as the median ofdistances from the data points to the sphere center which wasestimated on Section 31 and the layer depth 119863 is estimatedas the median distance between the hulls

The procedure can detect a too-thin layer of one nucleusdepth when 2D-SSA should be used instead of 3D versionFor data which is truly 2D the 2D-SSA method discussed in[31 36 37] is applied Otherwise a newmodified approach iselaborated here

322 Projection Spherical projections can have differentinvariants For data in spherical caps we use the equidistantazimuthal projection [38 Section 25] to maintain distancefrom the center point (ie latitude) and therefore the originallayerrsquos linear sizes After flattening and projection the datapoints (nuclei) will therefore have the following coordinates120595 120593 are coordinates along orthogonal meridians in theazimuthal projection and119889 represents depth all aremeasuredin metrical units

BioMed Research International 5

For data in equatorial strips we use the equidistantcylindrical projection [38 Section 12] maintaining distancesof points to the equator (ie latitudes) and distances betweenpoints on the equator This gives the similar coordinates 120595120593 and 119889 (latitude longitude and depth measured in metricalunits) If an equatorial strip encircles with the whole equatorwe obtain a parallelepiped with circular topology on theequatorial coordinate and can apply the circular version ofSSA [39]

Note that we obtain new coordinates in approximately thesame units (and the same proportions) as the original dataThis is the main purpose of using equidistant projectionssince proportions can strongly influence the interpolation toa regular grid

We use relative coordinates for the flattened dataThus inall pictures 119889 is reported as a percentage with 0 at the innersurface and 100 at the outer surface 120595 120593 are reported asfractions of the equator length

33 Interpolation of Nuclei to a Regular Grid and Back Inter-polation of irregular data to a regular grid is known as a ldquo3Dscattered data interpolation problemrdquo (see [40] for a descrip-tion and an overview of different approaches)

We use a ldquotriangulated irregular network-based linearinterpolation using Delaunay triangulationrdquo approach wherethe interpolation is constructed by a linear interpolationof the vertex values from the corresponding triangulationsimplex Implementation is performed with the help of thelibrary ldquoCGALrdquo [41]

Note the nuclei do not necessarily pack the whole par-allelepiped and edge effects are a consideration Thereforeafter interpolation we obtain the expression data on a subsetof the parallelepiped grid For cap-shaped data for examplethe subset is a disc This subset can be processed with theshaped version of 3D-SSA

For back-interpolation to nuclei the conventional trilin-ear interpolation is used for pattern reconstruction Residualvalues are calculated as differences between original andextracted-pattern (trend) expression values

34 Shaped 3D-SSA Shaped 3D-SSA methods need originaldata to be given on a subset of a regular grid (which we callthe initial shape)

A parameter of the method is the shaped 3D windowwhich is inscribed in a parallelepiped of sizes 1198711 times 1198712 times 1198713

It is assumed that the chosen window covers all the pointsof the original shape If not the uncovered grid points areremoved For a properwindow shape the number of removedpoints should not be large If so another window shape or sizeshould be chosen

Shaped 3D-SSA can be considered as a particular case ofshaped SSA (see eg [37 39]) To outline the algorithm wedo the following

Algorithm of Shaped 3D-SSA(1) We will consider all possible locations of the win-

dow within the original data array and denote theirnumber as 119870 For each location of the window we

unfold the shaped window into a vector of length 119871The obtained vectors are put together into a so-calledtrajectory matrixThe trajectory matrix X has a certain structure sincethe windows cover the array points several times a lotof matrix elements coincide Such matrices are calledquasi-Hankel [42] and form a linear space We willdesignate this spaceHDefine the embedding operator T(X)X rarr H Thisoperator is linear is an injection (if each point ofthe initial shaped array is covered by windows) andtherefore is invertible

(2) Construct the singular value decomposition (SVD) ofthe trajectory matrix X X = sum

119889

119894=1 120590119894119880119894119881T119894 The eigen-

vectors 119880119894can be folded to eigenarrays which have

the same shape as the window has(3) Choose a subset 119868 sub 1 119889 and put X

119868= sum119894isin119868

X119894

where X119894

= 120590119894119880119894119881

T119894 Conventionally the choice is

119868 = 1 119903 119903 lt 119889 what corresponds to an approx-imation of X by a low-rank matrix X

119868

(4) Project the matrix X119868to the space H and obtain the

reconstructed image (pattern) asX119868= Tminus1ΠHX

119868

Note that by additivity X119868

= sum119894isin119868

X119894 where X

119894=

Tminus1ΠHX119894are called elementary components Therefore the

forms of elementary components can be used for detection ofpattern components

35 Choice of Parameters Decomposition of gene expressiondata on irregular nuclear positions has several parameters forinterpolation and flattening and for SSA (the window shapeand size and the number 119903 of components for reconstruc-tion)

In order to obtain a sufficient number of grid points withrespect to the number of nuclei steps of the regular gridshould not be too large The upper bound for the grid pointsis limited only by computational costs

Recommendations for 3D-SSA are similar to that for 2D-SSA and 1D-SSA given in [33 34 37] Larger windows cor-respond to more refined decomposition and more accuratereconstruction if the signal (pattern) has a simple structuregenerating a few SVD components For more complex pat-terns medium to small windows are preferable

Note that window size ismeasuredwith respect to patternfeatures and should not depend on interpolation step There-fore window sizes are measured as a percentage of imagesizes not in the number of grid points Starting window sizescan be chosen as approximately 10ndash20 of the image sizesin each direction If the pattern is extracted imprecisely thewindow can be enlarged if the pattern is mixed with theresidual (noise) the window should be decreased

Identification of pattern components can be performedby analyzing the forms of eigenarrays or elementary recon-structed components (see [37]) Slowly varying patterns canbe constructed readily by accumulation of slowly varyingelementary components Since it is difficult to perform avisual analysis of 3D objects it is preferable to depict slices

6 BioMed Research International

(1D graphs or 2D images) obtained by fixing one or twocoordinates

Quality of pattern extraction can be checked by means ofresidual behaviour For proper pattern extraction residualsshould vary around zero Thus it is recommended to chooseslowly varying elementary components such that the residualhas no part in the pattern

36Model of Residuals For understanding biological sourcesof noise in gene expression it is of interest to extract from thedata a model of the noise (functional relation of the leadingmoments of gene expressions) We suggest a method formodel detection based on a standard test of heteroscedasticityof residuals with different normalizations

For a decomposition of initial data into pattern and noise119909119894= 119904119894+ 119903119894 119894 = 1 119873 where 119873 is the number of nuclei

(enumeration by one index instead of three does not affectthe results) assume that noise in nuclei is independent andconsider the model

119903119894= 120576119894sdot1003816100381610038161003816119904119894

1003816100381610038161003816120572 (1)

where E120576119894= 0 and D120576

119894= const

If 120572 = 0 the noise is additive (its standard deviationdoes not depend on pattern values) If 120572 = 1 the noiseis multiplicative (its standard deviation is proportional topattern values) The intermediate value 120572 = 05 correspondsto Poissonian noise where noise variance is proportional topattern value

The Park method [43] estimates 120572 as the slope of a linearregression in the model

log 10038161003816100381610038161199031198941003816100381610038161003816 = log120590+120572 log 1003816100381610038161003816119904119894

1003816100381610038161003816 + ]119894 (2)

where ]119894= log|120576

119894120590| is a well behaved error term The Park

method appears to be robust to the distribution of theresiduals An estimate of 120572 in model (2) can be obtained forexample by the least-squares method

37 Implementation We implemented all of the describedmethods in R [44] and included them in our BioSSA package

For construction of convex and nonconvex hulls (theldquoalphashaperdquo [45] method is used) the library ldquoqhullrdquo [46]is used by the wrapping R-packages (alphashape3d [47]geometry [48]) For 3D spatial interpolation we implementeda special R-packagewith help of the library ldquoCGALrdquo [41] sinceavailable R implementations of 3D spatial interpolation havevery poor performance

For trilinear interpolation the R-package oce [49] is usedFor 3D-SSA we use the Rssa R-package [50] This imple-

mentation is highly effective since it uses the approachesfrom [31 37 51]

4 Example for Spherical-Cap Nuclear Pattern

Here we work through an application of the 3D-SSA pro-cedure on the close-to-spherical zebrafish egg Nuclear coor-dinates are taken from the default MatchIT [4] example

The ldquoMatchITrdquo tool was run with the default dataset andparameters producing files with automatically detectednuclei Processing is on the file ldquogsc ntla wt t008 ch01csvrdquocontaining nuclear coordinates for the cells expressing the gsc(ldquogoosecoidrdquo) gene at the ldquolate shieldrdquo developmental stage

In this data nuclei are located in a thick layer on a sphereEach nucleus is marked by ldquo1rdquo or ldquo0rdquo for the gene expressingor not respectively For this data set gsc-expressing nuclei(ldquo1rdquo-s) are located within a small spherical cap We extendthe analyzed area to include all ldquo1rdquo-nuclei plus a surroundingring of nuclei

We consider two artificial (intensity) expression patternsfor these nuclei and show that shaped 3D-SSA can soundlyextract both types of patternsThe first kind of pattern is two-exponential analogous to 2D Drosophila patterns analyzedin [35] The second type of pattern approximates the onoff(ldquo1rdquoldquo0rdquo) expression values by a smooth bell-shaped pattern

We first present results for both examples and explain thechoice of parameters and pattern components We then showhow the method can be used to estimate the model of noiseafter extracting the pattern In the considered examples weadd Gaussian noise to patterns however it is not essentialsince the SSA-family ofmethods is stablewith respect to noisedistribution and possible weak dependencies

41 Two-Exponential Expression Pattern Let us construct theexpression values as

119904 (120595 120593 119889) = 1198902120593+9120595+2119889

+ 2119890minus(2120593+13120595+119889) (3)

V (120595 120593 119889) = 119904 (120595 120593 119889) + 120576 sdot 119904 (120595 120593 119889) (4)

where 120595 120593 are relative spherical coordinates 119889 is layer depth(119889 isin [0 1]) and 120576 is white Gaussian noise with standard devi-ation 035 This pattern which we call CAP-2EXP is thesum of two exponentials plus multiplicative noise Note thatthe pattern depends on three coordinates and therefore it isvaried in three directions Moderate noise levels here andin the other considered simulated examples were chosenfor better visual color representation of the results for theconsidered patterns

Inside and outside views of the nuclear hulls are shownin Figures 3(a)ndash3(d) where colors correspond to expressionlevels as in Figure 1 It can be seen that the coloring is vari-egated (nuclei of different color occupy similar positions)reflecting the presence of noise

The method described in Section 3 produces the recon-structed pattern depicted in Figures 3(e) and 3(f) The resultsof noise removal can clearly be seen

For better visual representation the nuclei themselves canbe depicted see Figures 4 and 5 As before the color of anucleus reflects the expression level in the same scale as inFigure 1 The expression pattern can be clearly seen in thedenoised data Difference between the inside and the outsideviews demonstrates pattern dynamic in depth direction

After pattern extraction the noisemodel can be estimated(see Section 36) In (3) multiplicative noise with 120572 = 1 wassimulated Applying the Park method provides an estimateof = 1073 recovering the multiplicative character of

BioMed Research International 7

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 3 CAP-2EXP the nuclear hulls coloring based on original and pattern expression intensities

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 4 CAP-2EXP nuclei colored an outside view

the generated noise and demonstrating how the process candistinguish between for example additive Poissonian andmultiplicative noise in datasets

42 Bell-Shaped Expression Pattern We now test the shaped3D-SSA process on a different intensity pattern using thesame nuclear data gsc ntla wt t008 ch01csv We constructa bell-shaped intensity pattern (referred to as CAP-BELL) asan approximation of 0-1 expression indicators in the data

plus white Gaussian noise with standard deviation 025The approximation was performed by twofold 3D-SSAsmoothing and therefore it cannot be expressed by anequation

Inside and outside views on the nuclear hulls are shownin Figures 6(a)ndash6(d) where colors correspond to expressionlevels in the topographic scale described in the caption ofFigure 1 It can be seen that the coloring is variegated signi-fying the presence of noise

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 4: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

4 BioMed Research International

and 15(a)) If we approximate the fish egg as an ellipsoid (orspheroid) then the early fish embryo can be geometricallydescribed as a thick (multilayer) spherical cap overlayingthe egg which can be flattened to a disc without substantialdistortions at the margins (biologists refer to the geometry ofthese embryonic stages as ldquodomerdquo or ldquodiscrdquo) The key genesstudied at these early stages tend to form expression patternsin compact subareas of the spherical cap such as open-endedrings and so forth Preprocessing and SSA procedures canbe focused or confined to these subareas In other words weassume that there is a transformation of a given expressionarea to a parallelepiped and that the transformation does notdistort the data drastically If the expression area is too largefor such a transformation as a whole then the transformationcan be done as several independent pieces

3 Method

We have adapted singular spectrum analysis (SSA) [32ndash34]to 2D spatial expression data [31] and shown it to be anefficient and robust means for data decomposition in thesecases The advantages of SSA its adaptivity flexibility withfew parameters visual control and no prior specificationof a noise model make it promising for 3D analysis Adrawback of SSA its current lack of automation though see[35] regarding automation on similar data structures

SSA-typemethods process data specified on a regular gridwithin a parallelepipedTherefore application to irregular 3Ddata first requires flattening followed by regularization

Processing 3D data which is in a layer near an ellipsoidalsurface consists of the following steps

(i) detection of data location estimation of the ellipsoidcenter and finding the nuclear centroid positionsrelative to this enabling data rotation for simplernondistorting flattening

(ii) flattening the data and embedding them into a paral-lelepiped

(iii) interpolation to a regular grid 119894 = 1 1198731 times 119895 =

1 1198732 times 119896 = 1 1198733 to obtain 119891119894119895119896

(iv) application of 3D-SSA perhaps by the shaped versionto confine the analysis to subareas 3D-SSA results in adecomposition of the form119891

119894119895119896= 119904119894119895119896

+119903119894119895119896 where 119904

119894119895119896

corresponds to the expression pattern or trend(v) interpolation of 119904

119894119895119896back to regular nuclei on the par-

allelepiped(vi) transformation back to the original (irregular)

embryo coordinates

This process results in an extracted pattern with residualnoise on the original geometry This allows for further studyof the patternrsquos form (eg comparison with deterministicdynamic gene regulation models) as well as the model forthe noise (eg to detect whether noise is additive or multi-plicative)

Below we comment on the steps of the processing schemein further detail For simplicity we assume data are locatednear the surface of a sphere (the case for zebrafish eggs)

31 Detection of Data Location The origin of the coordinatesystem is the center of the sphere estimated as the pointmost equidistant from all data points (more formally wefind a point minimizing the variance of distances between itsposition and those of the data points)

Two types of spatial data distributions are consideredeach of which has a specific procedure for reorientation andflattening

The first type is for data located near the spherical cap Inthis case the ldquo119911rdquo-axis passes through the center of the sphereand the middle point of the data we can rotate the data toobtain positive ldquo119911rdquo-values for all nuclear coordinates

In the second type data are located in a strip along theequator of the sphere In this case the ldquo119911rdquo-axis is chosenorthogonal to the equatorial plane and passing through thecenter of the sphere

In both cases ldquo119909rdquo- and ldquo119910rdquo-axes are chosen orthogonalto the ldquo119911rdquo-axis and to each other oriented to maintain theoriginal axis orientation as much as possible

32 Flattening the Data Data is embedded into a par-allelepiped with sides parallel to the axes The first axiscorresponds to depth in the data layer with the second andthe third axes corresponding to surface directions We aim tokeep the proportions of the data as unchanged as possible

321 Depth Equalization of Data We assume that the dataare located in a layer of approximately constant depth ona spherical surface Before projection the data should becorrected to be within an ideal spherical layer of constantdepth Suppose that all nuclei are located in an area whichis bounded by two convex surfaces (eg in a spherical layer)

The first step of the procedure is to find these surfaces asconvex external and internal hulls (envelopes) applying theclassical ldquoconvex hullrdquo method to original data for finding theexternal hull and to inverted data to find the interior oneThen the found exterior and interior hulls are transformed tospherical surfaces of radii 119877 + 1198632 and 119877 minus 1198632 correspond-ingly where the sphere radius 119877 is estimated as the median ofdistances from the data points to the sphere center which wasestimated on Section 31 and the layer depth 119863 is estimatedas the median distance between the hulls

The procedure can detect a too-thin layer of one nucleusdepth when 2D-SSA should be used instead of 3D versionFor data which is truly 2D the 2D-SSA method discussed in[31 36 37] is applied Otherwise a newmodified approach iselaborated here

322 Projection Spherical projections can have differentinvariants For data in spherical caps we use the equidistantazimuthal projection [38 Section 25] to maintain distancefrom the center point (ie latitude) and therefore the originallayerrsquos linear sizes After flattening and projection the datapoints (nuclei) will therefore have the following coordinates120595 120593 are coordinates along orthogonal meridians in theazimuthal projection and119889 represents depth all aremeasuredin metrical units

BioMed Research International 5

For data in equatorial strips we use the equidistantcylindrical projection [38 Section 12] maintaining distancesof points to the equator (ie latitudes) and distances betweenpoints on the equator This gives the similar coordinates 120595120593 and 119889 (latitude longitude and depth measured in metricalunits) If an equatorial strip encircles with the whole equatorwe obtain a parallelepiped with circular topology on theequatorial coordinate and can apply the circular version ofSSA [39]

Note that we obtain new coordinates in approximately thesame units (and the same proportions) as the original dataThis is the main purpose of using equidistant projectionssince proportions can strongly influence the interpolation toa regular grid

We use relative coordinates for the flattened dataThus inall pictures 119889 is reported as a percentage with 0 at the innersurface and 100 at the outer surface 120595 120593 are reported asfractions of the equator length

33 Interpolation of Nuclei to a Regular Grid and Back Inter-polation of irregular data to a regular grid is known as a ldquo3Dscattered data interpolation problemrdquo (see [40] for a descrip-tion and an overview of different approaches)

We use a ldquotriangulated irregular network-based linearinterpolation using Delaunay triangulationrdquo approach wherethe interpolation is constructed by a linear interpolationof the vertex values from the corresponding triangulationsimplex Implementation is performed with the help of thelibrary ldquoCGALrdquo [41]

Note the nuclei do not necessarily pack the whole par-allelepiped and edge effects are a consideration Thereforeafter interpolation we obtain the expression data on a subsetof the parallelepiped grid For cap-shaped data for examplethe subset is a disc This subset can be processed with theshaped version of 3D-SSA

For back-interpolation to nuclei the conventional trilin-ear interpolation is used for pattern reconstruction Residualvalues are calculated as differences between original andextracted-pattern (trend) expression values

34 Shaped 3D-SSA Shaped 3D-SSA methods need originaldata to be given on a subset of a regular grid (which we callthe initial shape)

A parameter of the method is the shaped 3D windowwhich is inscribed in a parallelepiped of sizes 1198711 times 1198712 times 1198713

It is assumed that the chosen window covers all the pointsof the original shape If not the uncovered grid points areremoved For a properwindow shape the number of removedpoints should not be large If so another window shape or sizeshould be chosen

Shaped 3D-SSA can be considered as a particular case ofshaped SSA (see eg [37 39]) To outline the algorithm wedo the following

Algorithm of Shaped 3D-SSA(1) We will consider all possible locations of the win-

dow within the original data array and denote theirnumber as 119870 For each location of the window we

unfold the shaped window into a vector of length 119871The obtained vectors are put together into a so-calledtrajectory matrixThe trajectory matrix X has a certain structure sincethe windows cover the array points several times a lotof matrix elements coincide Such matrices are calledquasi-Hankel [42] and form a linear space We willdesignate this spaceHDefine the embedding operator T(X)X rarr H Thisoperator is linear is an injection (if each point ofthe initial shaped array is covered by windows) andtherefore is invertible

(2) Construct the singular value decomposition (SVD) ofthe trajectory matrix X X = sum

119889

119894=1 120590119894119880119894119881T119894 The eigen-

vectors 119880119894can be folded to eigenarrays which have

the same shape as the window has(3) Choose a subset 119868 sub 1 119889 and put X

119868= sum119894isin119868

X119894

where X119894

= 120590119894119880119894119881

T119894 Conventionally the choice is

119868 = 1 119903 119903 lt 119889 what corresponds to an approx-imation of X by a low-rank matrix X

119868

(4) Project the matrix X119868to the space H and obtain the

reconstructed image (pattern) asX119868= Tminus1ΠHX

119868

Note that by additivity X119868

= sum119894isin119868

X119894 where X

119894=

Tminus1ΠHX119894are called elementary components Therefore the

forms of elementary components can be used for detection ofpattern components

35 Choice of Parameters Decomposition of gene expressiondata on irregular nuclear positions has several parameters forinterpolation and flattening and for SSA (the window shapeand size and the number 119903 of components for reconstruc-tion)

In order to obtain a sufficient number of grid points withrespect to the number of nuclei steps of the regular gridshould not be too large The upper bound for the grid pointsis limited only by computational costs

Recommendations for 3D-SSA are similar to that for 2D-SSA and 1D-SSA given in [33 34 37] Larger windows cor-respond to more refined decomposition and more accuratereconstruction if the signal (pattern) has a simple structuregenerating a few SVD components For more complex pat-terns medium to small windows are preferable

Note that window size ismeasuredwith respect to patternfeatures and should not depend on interpolation step There-fore window sizes are measured as a percentage of imagesizes not in the number of grid points Starting window sizescan be chosen as approximately 10ndash20 of the image sizesin each direction If the pattern is extracted imprecisely thewindow can be enlarged if the pattern is mixed with theresidual (noise) the window should be decreased

Identification of pattern components can be performedby analyzing the forms of eigenarrays or elementary recon-structed components (see [37]) Slowly varying patterns canbe constructed readily by accumulation of slowly varyingelementary components Since it is difficult to perform avisual analysis of 3D objects it is preferable to depict slices

6 BioMed Research International

(1D graphs or 2D images) obtained by fixing one or twocoordinates

Quality of pattern extraction can be checked by means ofresidual behaviour For proper pattern extraction residualsshould vary around zero Thus it is recommended to chooseslowly varying elementary components such that the residualhas no part in the pattern

36Model of Residuals For understanding biological sourcesof noise in gene expression it is of interest to extract from thedata a model of the noise (functional relation of the leadingmoments of gene expressions) We suggest a method formodel detection based on a standard test of heteroscedasticityof residuals with different normalizations

For a decomposition of initial data into pattern and noise119909119894= 119904119894+ 119903119894 119894 = 1 119873 where 119873 is the number of nuclei

(enumeration by one index instead of three does not affectthe results) assume that noise in nuclei is independent andconsider the model

119903119894= 120576119894sdot1003816100381610038161003816119904119894

1003816100381610038161003816120572 (1)

where E120576119894= 0 and D120576

119894= const

If 120572 = 0 the noise is additive (its standard deviationdoes not depend on pattern values) If 120572 = 1 the noiseis multiplicative (its standard deviation is proportional topattern values) The intermediate value 120572 = 05 correspondsto Poissonian noise where noise variance is proportional topattern value

The Park method [43] estimates 120572 as the slope of a linearregression in the model

log 10038161003816100381610038161199031198941003816100381610038161003816 = log120590+120572 log 1003816100381610038161003816119904119894

1003816100381610038161003816 + ]119894 (2)

where ]119894= log|120576

119894120590| is a well behaved error term The Park

method appears to be robust to the distribution of theresiduals An estimate of 120572 in model (2) can be obtained forexample by the least-squares method

37 Implementation We implemented all of the describedmethods in R [44] and included them in our BioSSA package

For construction of convex and nonconvex hulls (theldquoalphashaperdquo [45] method is used) the library ldquoqhullrdquo [46]is used by the wrapping R-packages (alphashape3d [47]geometry [48]) For 3D spatial interpolation we implementeda special R-packagewith help of the library ldquoCGALrdquo [41] sinceavailable R implementations of 3D spatial interpolation havevery poor performance

For trilinear interpolation the R-package oce [49] is usedFor 3D-SSA we use the Rssa R-package [50] This imple-

mentation is highly effective since it uses the approachesfrom [31 37 51]

4 Example for Spherical-Cap Nuclear Pattern

Here we work through an application of the 3D-SSA pro-cedure on the close-to-spherical zebrafish egg Nuclear coor-dinates are taken from the default MatchIT [4] example

The ldquoMatchITrdquo tool was run with the default dataset andparameters producing files with automatically detectednuclei Processing is on the file ldquogsc ntla wt t008 ch01csvrdquocontaining nuclear coordinates for the cells expressing the gsc(ldquogoosecoidrdquo) gene at the ldquolate shieldrdquo developmental stage

In this data nuclei are located in a thick layer on a sphereEach nucleus is marked by ldquo1rdquo or ldquo0rdquo for the gene expressingor not respectively For this data set gsc-expressing nuclei(ldquo1rdquo-s) are located within a small spherical cap We extendthe analyzed area to include all ldquo1rdquo-nuclei plus a surroundingring of nuclei

We consider two artificial (intensity) expression patternsfor these nuclei and show that shaped 3D-SSA can soundlyextract both types of patternsThe first kind of pattern is two-exponential analogous to 2D Drosophila patterns analyzedin [35] The second type of pattern approximates the onoff(ldquo1rdquoldquo0rdquo) expression values by a smooth bell-shaped pattern

We first present results for both examples and explain thechoice of parameters and pattern components We then showhow the method can be used to estimate the model of noiseafter extracting the pattern In the considered examples weadd Gaussian noise to patterns however it is not essentialsince the SSA-family ofmethods is stablewith respect to noisedistribution and possible weak dependencies

41 Two-Exponential Expression Pattern Let us construct theexpression values as

119904 (120595 120593 119889) = 1198902120593+9120595+2119889

+ 2119890minus(2120593+13120595+119889) (3)

V (120595 120593 119889) = 119904 (120595 120593 119889) + 120576 sdot 119904 (120595 120593 119889) (4)

where 120595 120593 are relative spherical coordinates 119889 is layer depth(119889 isin [0 1]) and 120576 is white Gaussian noise with standard devi-ation 035 This pattern which we call CAP-2EXP is thesum of two exponentials plus multiplicative noise Note thatthe pattern depends on three coordinates and therefore it isvaried in three directions Moderate noise levels here andin the other considered simulated examples were chosenfor better visual color representation of the results for theconsidered patterns

Inside and outside views of the nuclear hulls are shownin Figures 3(a)ndash3(d) where colors correspond to expressionlevels as in Figure 1 It can be seen that the coloring is vari-egated (nuclei of different color occupy similar positions)reflecting the presence of noise

The method described in Section 3 produces the recon-structed pattern depicted in Figures 3(e) and 3(f) The resultsof noise removal can clearly be seen

For better visual representation the nuclei themselves canbe depicted see Figures 4 and 5 As before the color of anucleus reflects the expression level in the same scale as inFigure 1 The expression pattern can be clearly seen in thedenoised data Difference between the inside and the outsideviews demonstrates pattern dynamic in depth direction

After pattern extraction the noisemodel can be estimated(see Section 36) In (3) multiplicative noise with 120572 = 1 wassimulated Applying the Park method provides an estimateof = 1073 recovering the multiplicative character of

BioMed Research International 7

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 3 CAP-2EXP the nuclear hulls coloring based on original and pattern expression intensities

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 4 CAP-2EXP nuclei colored an outside view

the generated noise and demonstrating how the process candistinguish between for example additive Poissonian andmultiplicative noise in datasets

42 Bell-Shaped Expression Pattern We now test the shaped3D-SSA process on a different intensity pattern using thesame nuclear data gsc ntla wt t008 ch01csv We constructa bell-shaped intensity pattern (referred to as CAP-BELL) asan approximation of 0-1 expression indicators in the data

plus white Gaussian noise with standard deviation 025The approximation was performed by twofold 3D-SSAsmoothing and therefore it cannot be expressed by anequation

Inside and outside views on the nuclear hulls are shownin Figures 6(a)ndash6(d) where colors correspond to expressionlevels in the topographic scale described in the caption ofFigure 1 It can be seen that the coloring is variegated signi-fying the presence of noise

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 5: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 5

For data in equatorial strips we use the equidistantcylindrical projection [38 Section 12] maintaining distancesof points to the equator (ie latitudes) and distances betweenpoints on the equator This gives the similar coordinates 120595120593 and 119889 (latitude longitude and depth measured in metricalunits) If an equatorial strip encircles with the whole equatorwe obtain a parallelepiped with circular topology on theequatorial coordinate and can apply the circular version ofSSA [39]

Note that we obtain new coordinates in approximately thesame units (and the same proportions) as the original dataThis is the main purpose of using equidistant projectionssince proportions can strongly influence the interpolation toa regular grid

We use relative coordinates for the flattened dataThus inall pictures 119889 is reported as a percentage with 0 at the innersurface and 100 at the outer surface 120595 120593 are reported asfractions of the equator length

33 Interpolation of Nuclei to a Regular Grid and Back Inter-polation of irregular data to a regular grid is known as a ldquo3Dscattered data interpolation problemrdquo (see [40] for a descrip-tion and an overview of different approaches)

We use a ldquotriangulated irregular network-based linearinterpolation using Delaunay triangulationrdquo approach wherethe interpolation is constructed by a linear interpolationof the vertex values from the corresponding triangulationsimplex Implementation is performed with the help of thelibrary ldquoCGALrdquo [41]

Note the nuclei do not necessarily pack the whole par-allelepiped and edge effects are a consideration Thereforeafter interpolation we obtain the expression data on a subsetof the parallelepiped grid For cap-shaped data for examplethe subset is a disc This subset can be processed with theshaped version of 3D-SSA

For back-interpolation to nuclei the conventional trilin-ear interpolation is used for pattern reconstruction Residualvalues are calculated as differences between original andextracted-pattern (trend) expression values

34 Shaped 3D-SSA Shaped 3D-SSA methods need originaldata to be given on a subset of a regular grid (which we callthe initial shape)

A parameter of the method is the shaped 3D windowwhich is inscribed in a parallelepiped of sizes 1198711 times 1198712 times 1198713

It is assumed that the chosen window covers all the pointsof the original shape If not the uncovered grid points areremoved For a properwindow shape the number of removedpoints should not be large If so another window shape or sizeshould be chosen

Shaped 3D-SSA can be considered as a particular case ofshaped SSA (see eg [37 39]) To outline the algorithm wedo the following

Algorithm of Shaped 3D-SSA(1) We will consider all possible locations of the win-

dow within the original data array and denote theirnumber as 119870 For each location of the window we

unfold the shaped window into a vector of length 119871The obtained vectors are put together into a so-calledtrajectory matrixThe trajectory matrix X has a certain structure sincethe windows cover the array points several times a lotof matrix elements coincide Such matrices are calledquasi-Hankel [42] and form a linear space We willdesignate this spaceHDefine the embedding operator T(X)X rarr H Thisoperator is linear is an injection (if each point ofthe initial shaped array is covered by windows) andtherefore is invertible

(2) Construct the singular value decomposition (SVD) ofthe trajectory matrix X X = sum

119889

119894=1 120590119894119880119894119881T119894 The eigen-

vectors 119880119894can be folded to eigenarrays which have

the same shape as the window has(3) Choose a subset 119868 sub 1 119889 and put X

119868= sum119894isin119868

X119894

where X119894

= 120590119894119880119894119881

T119894 Conventionally the choice is

119868 = 1 119903 119903 lt 119889 what corresponds to an approx-imation of X by a low-rank matrix X

119868

(4) Project the matrix X119868to the space H and obtain the

reconstructed image (pattern) asX119868= Tminus1ΠHX

119868

Note that by additivity X119868

= sum119894isin119868

X119894 where X

119894=

Tminus1ΠHX119894are called elementary components Therefore the

forms of elementary components can be used for detection ofpattern components

35 Choice of Parameters Decomposition of gene expressiondata on irregular nuclear positions has several parameters forinterpolation and flattening and for SSA (the window shapeand size and the number 119903 of components for reconstruc-tion)

In order to obtain a sufficient number of grid points withrespect to the number of nuclei steps of the regular gridshould not be too large The upper bound for the grid pointsis limited only by computational costs

Recommendations for 3D-SSA are similar to that for 2D-SSA and 1D-SSA given in [33 34 37] Larger windows cor-respond to more refined decomposition and more accuratereconstruction if the signal (pattern) has a simple structuregenerating a few SVD components For more complex pat-terns medium to small windows are preferable

Note that window size ismeasuredwith respect to patternfeatures and should not depend on interpolation step There-fore window sizes are measured as a percentage of imagesizes not in the number of grid points Starting window sizescan be chosen as approximately 10ndash20 of the image sizesin each direction If the pattern is extracted imprecisely thewindow can be enlarged if the pattern is mixed with theresidual (noise) the window should be decreased

Identification of pattern components can be performedby analyzing the forms of eigenarrays or elementary recon-structed components (see [37]) Slowly varying patterns canbe constructed readily by accumulation of slowly varyingelementary components Since it is difficult to perform avisual analysis of 3D objects it is preferable to depict slices

6 BioMed Research International

(1D graphs or 2D images) obtained by fixing one or twocoordinates

Quality of pattern extraction can be checked by means ofresidual behaviour For proper pattern extraction residualsshould vary around zero Thus it is recommended to chooseslowly varying elementary components such that the residualhas no part in the pattern

36Model of Residuals For understanding biological sourcesof noise in gene expression it is of interest to extract from thedata a model of the noise (functional relation of the leadingmoments of gene expressions) We suggest a method formodel detection based on a standard test of heteroscedasticityof residuals with different normalizations

For a decomposition of initial data into pattern and noise119909119894= 119904119894+ 119903119894 119894 = 1 119873 where 119873 is the number of nuclei

(enumeration by one index instead of three does not affectthe results) assume that noise in nuclei is independent andconsider the model

119903119894= 120576119894sdot1003816100381610038161003816119904119894

1003816100381610038161003816120572 (1)

where E120576119894= 0 and D120576

119894= const

If 120572 = 0 the noise is additive (its standard deviationdoes not depend on pattern values) If 120572 = 1 the noiseis multiplicative (its standard deviation is proportional topattern values) The intermediate value 120572 = 05 correspondsto Poissonian noise where noise variance is proportional topattern value

The Park method [43] estimates 120572 as the slope of a linearregression in the model

log 10038161003816100381610038161199031198941003816100381610038161003816 = log120590+120572 log 1003816100381610038161003816119904119894

1003816100381610038161003816 + ]119894 (2)

where ]119894= log|120576

119894120590| is a well behaved error term The Park

method appears to be robust to the distribution of theresiduals An estimate of 120572 in model (2) can be obtained forexample by the least-squares method

37 Implementation We implemented all of the describedmethods in R [44] and included them in our BioSSA package

For construction of convex and nonconvex hulls (theldquoalphashaperdquo [45] method is used) the library ldquoqhullrdquo [46]is used by the wrapping R-packages (alphashape3d [47]geometry [48]) For 3D spatial interpolation we implementeda special R-packagewith help of the library ldquoCGALrdquo [41] sinceavailable R implementations of 3D spatial interpolation havevery poor performance

For trilinear interpolation the R-package oce [49] is usedFor 3D-SSA we use the Rssa R-package [50] This imple-

mentation is highly effective since it uses the approachesfrom [31 37 51]

4 Example for Spherical-Cap Nuclear Pattern

Here we work through an application of the 3D-SSA pro-cedure on the close-to-spherical zebrafish egg Nuclear coor-dinates are taken from the default MatchIT [4] example

The ldquoMatchITrdquo tool was run with the default dataset andparameters producing files with automatically detectednuclei Processing is on the file ldquogsc ntla wt t008 ch01csvrdquocontaining nuclear coordinates for the cells expressing the gsc(ldquogoosecoidrdquo) gene at the ldquolate shieldrdquo developmental stage

In this data nuclei are located in a thick layer on a sphereEach nucleus is marked by ldquo1rdquo or ldquo0rdquo for the gene expressingor not respectively For this data set gsc-expressing nuclei(ldquo1rdquo-s) are located within a small spherical cap We extendthe analyzed area to include all ldquo1rdquo-nuclei plus a surroundingring of nuclei

We consider two artificial (intensity) expression patternsfor these nuclei and show that shaped 3D-SSA can soundlyextract both types of patternsThe first kind of pattern is two-exponential analogous to 2D Drosophila patterns analyzedin [35] The second type of pattern approximates the onoff(ldquo1rdquoldquo0rdquo) expression values by a smooth bell-shaped pattern

We first present results for both examples and explain thechoice of parameters and pattern components We then showhow the method can be used to estimate the model of noiseafter extracting the pattern In the considered examples weadd Gaussian noise to patterns however it is not essentialsince the SSA-family ofmethods is stablewith respect to noisedistribution and possible weak dependencies

41 Two-Exponential Expression Pattern Let us construct theexpression values as

119904 (120595 120593 119889) = 1198902120593+9120595+2119889

+ 2119890minus(2120593+13120595+119889) (3)

V (120595 120593 119889) = 119904 (120595 120593 119889) + 120576 sdot 119904 (120595 120593 119889) (4)

where 120595 120593 are relative spherical coordinates 119889 is layer depth(119889 isin [0 1]) and 120576 is white Gaussian noise with standard devi-ation 035 This pattern which we call CAP-2EXP is thesum of two exponentials plus multiplicative noise Note thatthe pattern depends on three coordinates and therefore it isvaried in three directions Moderate noise levels here andin the other considered simulated examples were chosenfor better visual color representation of the results for theconsidered patterns

Inside and outside views of the nuclear hulls are shownin Figures 3(a)ndash3(d) where colors correspond to expressionlevels as in Figure 1 It can be seen that the coloring is vari-egated (nuclei of different color occupy similar positions)reflecting the presence of noise

The method described in Section 3 produces the recon-structed pattern depicted in Figures 3(e) and 3(f) The resultsof noise removal can clearly be seen

For better visual representation the nuclei themselves canbe depicted see Figures 4 and 5 As before the color of anucleus reflects the expression level in the same scale as inFigure 1 The expression pattern can be clearly seen in thedenoised data Difference between the inside and the outsideviews demonstrates pattern dynamic in depth direction

After pattern extraction the noisemodel can be estimated(see Section 36) In (3) multiplicative noise with 120572 = 1 wassimulated Applying the Park method provides an estimateof = 1073 recovering the multiplicative character of

BioMed Research International 7

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 3 CAP-2EXP the nuclear hulls coloring based on original and pattern expression intensities

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 4 CAP-2EXP nuclei colored an outside view

the generated noise and demonstrating how the process candistinguish between for example additive Poissonian andmultiplicative noise in datasets

42 Bell-Shaped Expression Pattern We now test the shaped3D-SSA process on a different intensity pattern using thesame nuclear data gsc ntla wt t008 ch01csv We constructa bell-shaped intensity pattern (referred to as CAP-BELL) asan approximation of 0-1 expression indicators in the data

plus white Gaussian noise with standard deviation 025The approximation was performed by twofold 3D-SSAsmoothing and therefore it cannot be expressed by anequation

Inside and outside views on the nuclear hulls are shownin Figures 6(a)ndash6(d) where colors correspond to expressionlevels in the topographic scale described in the caption ofFigure 1 It can be seen that the coloring is variegated signi-fying the presence of noise

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 6: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

6 BioMed Research International

(1D graphs or 2D images) obtained by fixing one or twocoordinates

Quality of pattern extraction can be checked by means ofresidual behaviour For proper pattern extraction residualsshould vary around zero Thus it is recommended to chooseslowly varying elementary components such that the residualhas no part in the pattern

36Model of Residuals For understanding biological sourcesof noise in gene expression it is of interest to extract from thedata a model of the noise (functional relation of the leadingmoments of gene expressions) We suggest a method formodel detection based on a standard test of heteroscedasticityof residuals with different normalizations

For a decomposition of initial data into pattern and noise119909119894= 119904119894+ 119903119894 119894 = 1 119873 where 119873 is the number of nuclei

(enumeration by one index instead of three does not affectthe results) assume that noise in nuclei is independent andconsider the model

119903119894= 120576119894sdot1003816100381610038161003816119904119894

1003816100381610038161003816120572 (1)

where E120576119894= 0 and D120576

119894= const

If 120572 = 0 the noise is additive (its standard deviationdoes not depend on pattern values) If 120572 = 1 the noiseis multiplicative (its standard deviation is proportional topattern values) The intermediate value 120572 = 05 correspondsto Poissonian noise where noise variance is proportional topattern value

The Park method [43] estimates 120572 as the slope of a linearregression in the model

log 10038161003816100381610038161199031198941003816100381610038161003816 = log120590+120572 log 1003816100381610038161003816119904119894

1003816100381610038161003816 + ]119894 (2)

where ]119894= log|120576

119894120590| is a well behaved error term The Park

method appears to be robust to the distribution of theresiduals An estimate of 120572 in model (2) can be obtained forexample by the least-squares method

37 Implementation We implemented all of the describedmethods in R [44] and included them in our BioSSA package

For construction of convex and nonconvex hulls (theldquoalphashaperdquo [45] method is used) the library ldquoqhullrdquo [46]is used by the wrapping R-packages (alphashape3d [47]geometry [48]) For 3D spatial interpolation we implementeda special R-packagewith help of the library ldquoCGALrdquo [41] sinceavailable R implementations of 3D spatial interpolation havevery poor performance

For trilinear interpolation the R-package oce [49] is usedFor 3D-SSA we use the Rssa R-package [50] This imple-

mentation is highly effective since it uses the approachesfrom [31 37 51]

4 Example for Spherical-Cap Nuclear Pattern

Here we work through an application of the 3D-SSA pro-cedure on the close-to-spherical zebrafish egg Nuclear coor-dinates are taken from the default MatchIT [4] example

The ldquoMatchITrdquo tool was run with the default dataset andparameters producing files with automatically detectednuclei Processing is on the file ldquogsc ntla wt t008 ch01csvrdquocontaining nuclear coordinates for the cells expressing the gsc(ldquogoosecoidrdquo) gene at the ldquolate shieldrdquo developmental stage

In this data nuclei are located in a thick layer on a sphereEach nucleus is marked by ldquo1rdquo or ldquo0rdquo for the gene expressingor not respectively For this data set gsc-expressing nuclei(ldquo1rdquo-s) are located within a small spherical cap We extendthe analyzed area to include all ldquo1rdquo-nuclei plus a surroundingring of nuclei

We consider two artificial (intensity) expression patternsfor these nuclei and show that shaped 3D-SSA can soundlyextract both types of patternsThe first kind of pattern is two-exponential analogous to 2D Drosophila patterns analyzedin [35] The second type of pattern approximates the onoff(ldquo1rdquoldquo0rdquo) expression values by a smooth bell-shaped pattern

We first present results for both examples and explain thechoice of parameters and pattern components We then showhow the method can be used to estimate the model of noiseafter extracting the pattern In the considered examples weadd Gaussian noise to patterns however it is not essentialsince the SSA-family ofmethods is stablewith respect to noisedistribution and possible weak dependencies

41 Two-Exponential Expression Pattern Let us construct theexpression values as

119904 (120595 120593 119889) = 1198902120593+9120595+2119889

+ 2119890minus(2120593+13120595+119889) (3)

V (120595 120593 119889) = 119904 (120595 120593 119889) + 120576 sdot 119904 (120595 120593 119889) (4)

where 120595 120593 are relative spherical coordinates 119889 is layer depth(119889 isin [0 1]) and 120576 is white Gaussian noise with standard devi-ation 035 This pattern which we call CAP-2EXP is thesum of two exponentials plus multiplicative noise Note thatthe pattern depends on three coordinates and therefore it isvaried in three directions Moderate noise levels here andin the other considered simulated examples were chosenfor better visual color representation of the results for theconsidered patterns

Inside and outside views of the nuclear hulls are shownin Figures 3(a)ndash3(d) where colors correspond to expressionlevels as in Figure 1 It can be seen that the coloring is vari-egated (nuclei of different color occupy similar positions)reflecting the presence of noise

The method described in Section 3 produces the recon-structed pattern depicted in Figures 3(e) and 3(f) The resultsof noise removal can clearly be seen

For better visual representation the nuclei themselves canbe depicted see Figures 4 and 5 As before the color of anucleus reflects the expression level in the same scale as inFigure 1 The expression pattern can be clearly seen in thedenoised data Difference between the inside and the outsideviews demonstrates pattern dynamic in depth direction

After pattern extraction the noisemodel can be estimated(see Section 36) In (3) multiplicative noise with 120572 = 1 wassimulated Applying the Park method provides an estimateof = 1073 recovering the multiplicative character of

BioMed Research International 7

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 3 CAP-2EXP the nuclear hulls coloring based on original and pattern expression intensities

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 4 CAP-2EXP nuclei colored an outside view

the generated noise and demonstrating how the process candistinguish between for example additive Poissonian andmultiplicative noise in datasets

42 Bell-Shaped Expression Pattern We now test the shaped3D-SSA process on a different intensity pattern using thesame nuclear data gsc ntla wt t008 ch01csv We constructa bell-shaped intensity pattern (referred to as CAP-BELL) asan approximation of 0-1 expression indicators in the data

plus white Gaussian noise with standard deviation 025The approximation was performed by twofold 3D-SSAsmoothing and therefore it cannot be expressed by anequation

Inside and outside views on the nuclear hulls are shownin Figures 6(a)ndash6(d) where colors correspond to expressionlevels in the topographic scale described in the caption ofFigure 1 It can be seen that the coloring is variegated signi-fying the presence of noise

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 7: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 7

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 3 CAP-2EXP the nuclear hulls coloring based on original and pattern expression intensities

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 4 CAP-2EXP nuclei colored an outside view

the generated noise and demonstrating how the process candistinguish between for example additive Poissonian andmultiplicative noise in datasets

42 Bell-Shaped Expression Pattern We now test the shaped3D-SSA process on a different intensity pattern using thesame nuclear data gsc ntla wt t008 ch01csv We constructa bell-shaped intensity pattern (referred to as CAP-BELL) asan approximation of 0-1 expression indicators in the data

plus white Gaussian noise with standard deviation 025The approximation was performed by twofold 3D-SSAsmoothing and therefore it cannot be expressed by anequation

Inside and outside views on the nuclear hulls are shownin Figures 6(a)ndash6(d) where colors correspond to expressionlevels in the topographic scale described in the caption ofFigure 1 It can be seen that the coloring is variegated signi-fying the presence of noise

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 8: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

8 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 5 CAP-2EXP nuclei colored an inside view

x

y

z

(a) Original an outside view with shadow

x

y

z

(b) Original an inside view with shadow

x

y

z

(c) Original an outside view without shadow

x

y

z

(d) Original an inside view without shadow

x

y

z

(e) Pattern an outside view without shadow

x

y

z

(f) Pattern an inside views without shadow

Figure 6 CAP-BELL the nuclear hulls coloring based on original and pattern expression intensities

The method described in Section 3 produces the patterndepicted in Figures 6(e) and 6(f) it can be seen that the 3D-SSA procedure has removed the original noise

For better visual representation the nuclei themselves canbe depicted see Figure 7 As before the color of a nucleusreflects the expression level in the topographic scale describedin Figure 1 The pattern (trend) can clearly be seen in thedenoised data

Estimating the noise model by the Park method returnsan = 0005 close to zero indicates additive noise theprocess has recovered the simulated Gaussian white noise

43 The Chosen Parameters After flattening as described inSection 32 a parallelepiped with length 119897 asymp 1030 width 119908 asymp

885 and depth119889 asymp 50 Since the shape of the flattened nuclear

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 9: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 9

x

y

z

(a) Original expression intensities

x

y

z

(b) Reconstructed pattern

Figure 7 CAP-BELL nuclei colored an inside view

Eigenvectors

1 (9671) [00048 0006] 2 (067) [minus0013 00079] 3 (005) [minus0013 0021]

4 (004) [minus0015 0016] 5 (004) [minus001 0016] 6 (003) [minus0014 0012]

Figure 8 CAP-2EXP 2D slices of 3D eigenarrays at 119889 = 11987132

cloud is similar to a spherical cap the equidistant azimuthalprojection was applied

The stepsize was chosen the same in all directions toobtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in a parallel-epiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are equal to 40

of the original image sizes The total number of nuclei

in the data file is 3595 while the chosen window coversapproximately 160 nuclei on average The number of nucleicovered by all positions of the chosen window is 3306 thatis a few side nuclei were not considered

To identify pattern components let us examine 2D slicesof 3D eigenarrays and elementary reconstructed componentsFor example for the CAP-2EXP data (Section 41) Figure 8shows slices of the six leading eigenarrays at a depth value of

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 10: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

10 BioMed Research International

Reconstructions

Original [17 11] F1 [35 58] F2 [minus024 24] F3 [minus049 17]

F4 [minus05 077] F5 [minus013 031] F6 [minus011 019] Residuals [minus37 5]

Figure 9 CAP-2EXP elementary component reconstructions slices

11987132The first two ellipsoidal eigenarrays are smooth and thethird one has some oscillationsTherefore we choose the firsttwo components for pattern reconstruction

The six leading elementary components which are gener-ated by the six eigenarrays depicted in Figure 8 together withthe original and residual 2D slices are shown in Figure 9

Figure 9 confirms the choice of the two leading compo-nents (ie 119903 = 2) as corresponding to the pattern

The pattern is reconstructed from the leading eigenarrayson the regular grid points then interpolated back to flattened(irregular) nuclei and then transformed to the originalnuclear positions on the zebrafish egg

A similar choice of pattern components can be made forthe CAP-BELL data Here there are 119903 = 3 smooth patterncomponents not surprising due to the more complex form ofthe pattern

The reconstruction result is quite robust to the windowchoice For example the pattern reconstruction is approxi-mately the same if we choose a window size of 35 instead of40 of the original size However formore complex patternssmaller window sizes are preferable see discussion regardingthe choice of window in [34] (1D case) and [31] (2D case)

44 Check of Proper Pattern Extraction Inspection of thereconstructed images clearly demonstrates the noise removalfor both test patterns However this does not prove that

we have reconstructed the whole pattern We now show onthe CAP-2EXP example that the pattern reconstruction iscomplete

Let us consider 2D slices of the reconstructed values onthe regular grid of flattened nuclei fixing the depth Thevertical axis will represent expression values (color mappedin eg Figure 3) and the horizontal axes correspond to 120601 120595nuclear or grid-point positional coordinates

Since the nuclei may not be located exactly on the slicewe consider nuclei from the layer plus-minus 10 to eachside The extracted pattern on the nuclear slice is depictedby a solid surface with nuclear expression values shown asindividual dots see Figure 10(a) In Figure 10(b) the residualsare obtained by subtraction of the pattern from the nuclearvaluesThe even scatter of the residuals around the zero ldquo119909119910rdquo-plane indicates a good fit of the reconstruction to the data

For a more refined analysis we can construct 1D slicesFor example fixing the depth at 825 and the width (120593)at 50 we consider the two-exponential pattern for nucleifrom a thin layer of (plusmn10 around this depth and width)Theextracted pattern on the nuclear slice is depicted by a solidline Figure 11 confirms the quality of the reconstruction withthe residuals between the data and the reconstruction evenlyspaced around the zero plane Note that for estimation of thenoisemodel the choice of 1 2 3 or 4 leading components haslittle effect on the results

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 11: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 11

00

005005

Valu

es

minus005 minus005120595

120601

(a)

0

2

0 0050005

Resid

uals

minus005 minus005

minus2

120595120601

(b)

Figure 10 CAP-2EXP pattern 2D slice with expression values depth 119889 = 825 (a) the surface depicts the reconstructed pattern (in theregular grid points) the points show expression values in individual nuclei from a plusmn10 layer (b) nuclear residuals from the reconstructionare scattered evenly around the zero plane

Expr

essio

n

2

4

6

8

000 005 010minus010 minus005

Spatial coordinate 120595

(a)

Expr

essio

n

0

2

4

000 005minus005

minus2

minus4

Spatial coordinate 120595

(b)

Figure 11 CAP-2EXP pattern 1D slice at 825 depth (119889) and 50width (120593) (a) Pattern on the grid (curve) and original values on the nuclei(points) (b) Residuals between the reconstruction and the nuclear data values

45 Graphical Analysis of Residual Model To check thecorrectness of the model determined by the Park method wecan do a visual inspection of plots of the residuals divided bythe trend in a degree on the vertical against the trend on thehorizontal In such plots the normalized residuals 119903()

119894will be

most evenly distributed around 119910 = 0 (homoscedastic) for -values closest to real 120572

Figure 12 shows such plots for = 0 05 1 Figure 12(c)shows the most homoscedastic residuals (also note that themoving median of the absolute residual values magenta is

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 12: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

12 BioMed Research International

|trend|

Resid

uals

0

5

4 6 8 10 12

120572 = 0

minus5

(a)

Resid

uals

trend

^05

0

1

2

|trend|4 6 8 10 12

120572 = 05

minus2

minus1

(b)

|trend|4 6 8 10 12

Resid

uals

trend

00

05

120572 = 1

minus05

(c)

Figure 12 CAP-2EXP pattern normalized residuals (a) Test for the additive ( = 0) noise model (b) Test for Poissonian noise ( = 05)(c) Test for multiplicative noise ( = 1) The homoscedasticity of the residuals on (c) indicates that the noise is multiplicative confirming thenoise model used in simulated generation of the data (3)

the most horizontal) this indicates that 1 is the best estimatefor 120572 in this case as expected for the multiplicative noisemodel in the CAP-2EXP simulated data (3)

5 Example for Equatorial StripNuclear Pattern

In this section we analyze a qualitatively different expressionpattern at a different stage of zebrafish development We usethe file 120618a Localn0t1emb from the ldquoAtlasITrdquo tool [4]labelled for the ntla gene 63 hours after fertilization Expres-sion is labelled in binary as above These data belong not to

a particular embryo but to a common 3D template in theatlas Either individual embryos or common templates can beused since we are using experimental nuclear positions notintensities

The nuclei cover a half-sphere but the gene-expressingnuclei (those labelled ldquo1rdquo) are located in a narrow strip nearthe equator Equatorial strips can be nonconvex and of vary-ing width hence we needmore sophisticated processing thanabove

As before we will start with pattern extraction and thenexplain the specifics of the procedure and the parametervalues

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 13: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 13

y

x

z

Figure 13 STRIP-2EXP pattern (3) nuclear hull coloring based on the original simulated data outside view with shadow

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 14 STRIP-2EXP colored nuclear hull outside views without shadow

51 Two-Exponential Expression Pattern We model the pat-tern of gene expression along the strip (STRIP-2EXP) as

V (119909 119910 119911) = 119890(119910minus50)250

+ 6119890(50minus119910)500 + 120576 (5)where 120576 is white Gaussian noise with standard deviation 025

Outside views on the nuclei hull are shown in Figures13 and 14(a) with colors representing expression levels inthe scale defined in Figure 1 Color variegation indicates thepresence of noise also note that the data area is not convex

The method described in Section 3 produces the patterndepicted in Figure 14(b) The noise removal can clearly beseen The area of the reconstructed pattern is visibly smallerthan the original one since the ellipsoidal window can notreach narrow parts of the original area

Figures 15(a) and 15(b) compare the reconstruction pat-tern to the original expression data plotted on separatenuclei

52 Parameter Selection and Result Validation After theflattening procedure described in Section 32 we apply themethod ldquoalphashaperdquo [45] with a parameter equal to 5000for construction of a nonconvex hull The bounding box haslength 119897 asymp 450 width 119908 asymp 85 and depth 119889 asymp 30 For thenuclear cloud in a strip near the equator we applied theequidistant cylindrical projection

As in Section 4 equal step size in all directions was usedto obtain 106 grid points

For the shaped 3D-SSA algorithm described inSection 34 the ellipsoid window was inscribed in aparallelepiped of size 1198711 times 1198712 times 1198713 where the 119871

119894are 35 of

the original image sizes The total number of nuclei is equalto 492 the chosen window covers approximately 19 nuclei onaverage The number of nuclei covered by all positions of thechosen window is 360 that is one quarter of the nuclei werenot considered

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 14: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

14 BioMed Research International

y

x

z

(a) Original expression intensities

y

x

z

(b) Reconstructed pattern

Figure 15 STRIP-2EXP colored nuclei outside views without shadow

As in Section 4 analysis of elementary components helpsto detect the pattern (trend) components 1D slices (Figure 16)confirm that the pattern is described by the two smoothleading components while the third and fourth componentscan reasonably be assigned to the residuals

Due to the small number of nuclei in the equatorial stripswe do not estimate the noise model

6 Discussion and Conclusions

What is the biological impact of extracting denoised (andregularized) data for 3D gene expression In particular whatcan be learned from 3D noise filtering and noise analysis

61 Gene ExpressionVariability andNoise Aswith 1D expres-sion data (profiles) and 2D data (expression surfaces) [26 3135 52] 3D expression data demonstrates pattern variabilityfrom embryo to embryo as well as expression noise betweenneighboring nuclei in the same embryo

Gene Expression VariabilityAs observed by Castro-Gonzalezand colleagues [4] 3D gene expression volumes can vary sig-nificantly from embryo to embryo at the same developmentalstageThis canmanifest as additional rows of cells around thecore gene-expressing domain ([4 Figure S16])

Gene Expression Noise Within-embryo expression noises(differences between neighboring nuclei) are observable bothin qualitative and quantitative representations of the geneexpression data [4] Figure 17 shows qualitative-type noisein the mixing of expressing (green) and nonexpressing (red)nuclei along a pattern boundary

Binary (onoff) data allows us to observe some degreeof noise but only as ldquoruggednessrdquo at the expression domainboundary With quantitative data such as in Figure S10from [4] noise between neighboring nuclei can be observedthroughout the expression domain

62 Prospects for GRN Modeling The work in [4] studied9 genes which are components of the axial mesendodermgene regulatory network (GRN) proposed by Chan et al[53] In particular [53] proposed that a subset of 3 of thesegenes (Foxh1 sox32 and gsc) forms the regulatorymechanismfor territorial exclusion during endomesoderm specificationthe common activator Foxh1 activates both the endodermtranscription factor sox32 and the mesoderm transcriptionfactor gsc and sox32 then turns off expression of the meso-derm activator (gsc) in endoderm-lineage cells [53]

The next step in a systems biology approach would be todevelop a mathematical model for such a GRN motif whichcould generate the proper gene expression patterns in thebiological tissue geometries As we have shown in our testcases the 3D-SSA approach described here can extract suchexpression patterns from complex biological geometries Inaddition correspondence between the mathematical modeland the expression data can be made at an intermediate stageof the procedure for example on the flattened regular gridwhich may be much easier than operating in the originalcoordinates

63 Prospects to Analyze 3D Expression Noise Mathematicalformulations of GRNs are most commonly developed asdeterministic models and as discussed above 3D-SSA canprovide clear pattern trends for matching and developingsuch models Increasingly as a next step models are alsobeing formulated for gene expression noise such stochasticmodelling can by treating the variation as well as the meanof the expression serve to greatly reduce the potential modeldynamics and parameters as well as to characterize howbiophysical mechanisms modulate noise (eg [27]) SSA iswell-established and has been used effectively for extractingand analyzing noise from expression data in 1D (subsectionbelow) and 2D (below) for a number of years This hasallowed for estimations of the model of noise (120572 values) as

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 15: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 15

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(a)Ex

pres

sion

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(b)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(c)

Expr

essio

n

60

65

70

75

80

000 005 010minus015 minus010 minus005

Spatial coordinate 120601

(d)

Figure 16 STRIP-2EXP 1D slices for 50 depth (119889) and width (120595) signal reconstructed by 1 (a) 2 (b) 3 (c) and 4 (d) components

well as providing data against which to test stochastic modeloutput (eg [26])

Extraction of 1D Expression Profiles and Noise While 1Dexpression data (spatial profiles) have been publicly availablefor more than a decade (ldquoFlyExrdquo [54] and ldquoBID BDTNPrdquo[2 19] Web resources) few publications have been devotedto decomposition of the data into biological trend and noisecomponents Wu and coauthors [55] extracted the ldquoBicoidrdquo(Bcd) transcription factor noise to compare to stochasticsimulations but relied on parameterizing the Bcd profileas an exponential However a nonparametric SSA-relatedapproach showed a much closer fit for the sum of two

exponentials [35] SSA was also used to show that Bcd noisefollows an additive model for the considered dataset [36]while the multiplicative model was correct for the hb factornoise [26] On the Drosophila gene hb SSA showed that amajor component of the total biological noise was ldquotexturenoiserdquo from the precellular compartmentalization of theembryo [52]

2D Expression Surfaces and Expression NoiseThere have beenless studies on expression data in 2D (expression surfaces)He and coauthors [56] studied nucleus-to-nucleus Bcd dif-ferences and found a signal-dependent rise in variability forlower Bcd intensities ([56 Figure S1]) Both cited publications

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 16: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

16 BioMed Research International

y

x

z

Figure 17The distribution of nuclei (red and green dots) in a shield stage early zebrafish embryo with green dots corresponding to the nucleiexpressing gene ntla Higher magnification in the insert shows mixing of green and red (ntla off) nuclei along the boundary of the expressiondomain

on the Bcd noise [55 56] used 2D data that is stripesfrom the confocal images with several hundred nuclei andthose stripes do have not only a length but a width tooFurther analysis of the inherently 2D data as 1D ignoring thewidth can introduce substantial and unavoidable source ofnoise and bias conclusions on the biological noise Howeverapplication of nonparametric 2D-SSA directly to 2D datatakes into account a spatial distribution of expression inten-sities and therefore diminishes a possible bias In particularon both live and fixed marker techniques for Bcd 2D-SSA indicated a signal-independence of the noise [36] Thissuggests that Bcd one of the most studied developmentalgenes deserves further analytical study

3D Expression Signals and Expression NoiseThe groundworklaid in the 1D and 2D applications of SSA indicates that thepresent extension to 3D can be an effective way to processdata from expression data in complex geometries

For developmental patterning in geometries which aretruly 3D such as the zebrafish embryo or which become 3D(as with Drosophila after gastrulation) 3D-SSA is promisingfor extracting trends and noise from expression data Itcould for example serve as a critical tool for developing adata-driven model for the Foxh1 sox32 and gsc motif bothdeterministic and stochastic

In conclusion our work with 3D-SSA resolves severalproblems in the study of 3D gene expression These are (i)representation of the data in a geometrically ldquoflattenedrdquo formsuitable for further analysis (ii) interpolation of the data toa regular grid (iii) decomposition of the data into signaland noise and (iv) addressing some of the issues (slicing)

for visualizing and evaluating results with four-dimensionaldata (3D + gene expression intensity) We present 3D-SSAas an adaptable and powerful technique for processing andanalyzing the growing amount of gene expression data fromtruly 3D developmental events Since there is an extensionof 3D-SSA to 4D-SSA and even for general nD-SSA theSSA family of methods is potentially able to analyze data ofdifferent spatial and temporal nature

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the NG13-083 Grant of DynastyFoundation US NIH Grant R01-GM072022 and The Rus-sian Foundation for Basic Research Grants 13-04-02137 and15-04-06480

References

[1] N Gorfinkiel S Schamberg and G B Blanchard ldquoIntegrativeapproaches to morphogenesis lessons from dorsal closurerdquoGenesis vol 49 no 7 pp 522ndash533 2011

[2] C C Fowlkes C L L Hendriks S V E Keranen et al ldquoA quan-titative spatiotemporal atlas of gene expression in the Dro-sophila blastodermrdquo Cell vol 133 no 2 pp 364ndash374 2008

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 17: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

BioMed Research International 17

[3] G B Blanchard A J Kabla N L Schultz et al ldquoTissue tect-onics morphogenetic strain rates cell shape change and inter-calationrdquo Nature Methods vol 6 no 6 pp 458ndash464 2009

[4] C Castro-Gonzalez M A Luengo-Oroz L Duloquin et al ldquoAdigital framework to build visualize and analyze a gene expres-sion atlas with cellular resolution in zebrafish early embry-ogenesisrdquo PLoS Computational Biology vol 10 no 6 Article IDe1003670 2014

[5] M A Luengo-Oroz M J Ledesma-Carbayo N Peyrieras andA Santos ldquoImage analysis for understanding embryo develop-ment a bridge frommicroscopy to biological insightsrdquo CurrentOpinion in Genetics andDevelopment vol 21 no 5 pp 630ndash6372011

[6] W Supatto T V Truong D Debarre and E BeaurepaireldquoAdvances in multiphoton microscopy for imaging embryosrdquoCurrent Opinion in Genetics and Development vol 21 no 5 pp538ndash548 2011

[7] D M Chudakov M V Matz S Lukyanov and K A LukyanovldquoFluorescent proteins and their applications in imaging livingcells and tissuesrdquo Physiological Reviews vol 90 no 3 pp 1103ndash1163 2010

[8] A C Oates N Gorfinkiel M Gonzalez-Gaitan and C-PHeisenberg ldquoQuantitative approaches in developmental biol-ogyrdquo Nature Reviews Genetics vol 10 no 8 pp 517ndash530 2009

[9] D Muzzey and A van Oudenaarden ldquoQuantitative time-lapsefluorescence microscopy in single cellsrdquo Annual Review of Celland Developmental Biology vol 25 pp 301ndash327 2009

[10] N Olivier M A Luengo-Oroz L Duloquin et al ldquoCell lineagereconstruction of early zebrafish embryos using label-free non-linearmicroscopyrdquo Science vol 329 no 5994 pp 967ndash971 2010

[11] T V Truong and W Supatto ldquoToward high-contenthigh-throughput imaging and analysis of embryonic morphogene-sisrdquo Genesis vol 49 no 7 pp 555ndash569 2011

[12] E Quesada-Hernandez L Caneparo S Schneider et al ldquoSter-eotypical cell division orientation controls neural rod midlineformation in zebrafishrdquo Current Biology vol 20 no 21 pp1966ndash1972 2010

[13] M A Luengo-Oroz D Pastor-Escuredo C Castro-Gonzalez etal ldquo3D + t morphological processing applications to embryo-genesis image analysisrdquo IEEE Transactions on Image Processingvol 21 no 8 pp 3518ndash3530 2012

[14] C Castro-Gonzalez M J Ledesma-Carbayo N Peyrieras andA Santos ldquoAssembling models of embryo development imageanalysis and the construction of digital atlasesrdquo Birth DefectsResearch Part C Embryo Today Reviews vol 96 no 2 pp 109ndash120 2012

[15] M A Luengo-Oroz J L Rubio-Guivernau E Faure et alldquoMethodology for reconstructing early zebrafish developmentfrom in vivo multiphoton microscopyrdquo IEEE Transactions onImage Processing vol 21 no 4 pp 2335ndash2340 2012

[16] J L Rubio-guivernau V Gurchenkov M A Luengo-oroz et alldquoWavelet-based image fusion in multi-view three-dimensionalmicroscopyrdquo Bioinformatics vol 28 no 2 pp 238ndash245 2012

[17] F Long H Peng X Liu S K Kim and E Myers ldquoA 3D digitalatlas of C elegans and its application to single-cell analysesrdquoNature Methods vol 6 no 9 pp 667ndash672 2009

[18] EPIC Expression patterns in Caenorhabditis httpepicgswashingtonedu

[19] Berkeley Drosophila transcription network project httpbdtnplblgovFly-Net

[20] BioEmergences httpbioemergencesiscpiffr

[21] W J Blake M Kaeligrn C R Cantor and J J Collins ldquoNoise ineukaryotic gene expressionrdquoNature vol 422 no 6932 pp 633ndash637 2003

[22] M B Elowitz A J Levine E D Siggia and P S Swain ldquoSto-chastic gene expression in a single cellrdquo Science vol 297 no5584 pp 1183ndash1186 2002

[23] J M Raser and E K OrsquoShea ldquoNoise in gene expression originsconsequences and controlrdquo Science vol 309 no 5743 pp 2010ndash2013 2005

[24] J E Ladbury and S T Arold ldquoNoise in cellular signaling path-ways causes and effectsrdquo Trends in Biochemical Sciences vol 37no 5 pp 173ndash178 2012

[25] A N Boettiger and M Levine ldquoSynchronous and stochasticpatterns of gene activation in the Drosophila embryordquo Sciencevol 325 no 5939 pp 471ndash473 2009

[26] D M Holloway F J Lopes L da Fontoura Costa et al ldquoGeneexpression noise in spatial patterning hunchback promoterstructure affects noise amplitude and distribution inDrosophilasegmentationrdquo PLoS Computational Biology vol 7 no 2 ArticleID e1001069 18 pages 2011

[27] D M Holloway and A V Spirov ldquoMid-embryo patterning andprecision in Drosophila segmentation Kruppel dual regulationof hunchbackrdquo PLOS ONE vol 10 no 3 Article ID e01184502015

[28] A N Boettiger ldquoAnalytic approaches to stochastic gene expres-sion in multicellular systemsrdquo Biophysical Journal vol 105 no12 pp 2629ndash2640 2013

[29] F Jug T Pietzsch S Preibisch and P Tomancak ldquoBioimageInformatics in the context ofDrosophila researchrdquoMethods vol68 no 1 pp 60ndash73 2014

[30] E Wait M Winter C Bjornsson et al ldquoVisualization and cor-rection of automated segmentation tracking and lineagingfrom 5-D stem cell image sequencesrdquo BMC Bioinformatics vol15 no 1 p 328 2014

[31] A Shlemov N Golyandina D Holloway and A SpirovldquoShaped singular spectrum analysis for quantifying geneexpression with application to the early Drosophila embryo rdquoBioMed Research International vol 2015 Article ID 689745 14pages 2015

[32] D S Broomhead andG P King ldquoExtracting qualitative dynam-ics from experimental datardquo Physica D Nonlinear Phenomenavol 20 no 2-3 pp 217ndash236 1986

[33] N Golyandina V Nekrutkin and A Zhigljavsky Analysis ofTime Series Structure SSA and Related Techniques vol 90 ofMonographs on Statistics and Applied Probability Chapman ampHallCRC Boca Raton Fla USA 2001

[34] N Golyandina and A Zhigljavsky Singular Spectrum Analysisfor Time Series Springer Briefs in Statistics Springer Heidel-berg Germany 2013

[35] T Alexandrov N Golyandina and A Spirov ldquoSingular spec-trum analysis of gene expression profiles of early Drosophilaembryo Exponential-in-distance patterns rdquo Research Letters inSignal Processing vol 2008 Article ID 825758 5 pages 2008

[36] N E Golyandina DM Holloway F J Lopes A V Spirov E NSpirova and K D Usevich ldquoMeasuring gene expression noisein early Drosophila embryos nucleus-tonucleus variabilityrdquoProcedia Computer Science vol 9 pp 373ndash382 2012

[37] N Golyandina A Korobeynikov A Shlemov and K Use-vich ldquoMultivariate and 2D extensions of singular spectrumanalysis with the Rssa packagerdquo Journal of Statistical Softwarehttparxivorgabs13095050

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008

Page 18: Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application ... · 2016-06-14 · Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression,

18 BioMed Research International

[38] J P Snyder Map Projections A Working Manual GeologicalSurvey Bulletin Series US Geological Survey 1987

[39] A Shlemov and N Golyandina ldquoShaped extensions of singularspectrum analysisrdquo in Proceedings of the 21st InternationalSymposium on Mathematical Theory of Networks and Systemspp 1813ndash1820 Groningen The Netherlands July 2014

[40] J P Lewis K Anjyo and F Pighin ldquoScattered data interpolationand approximation for computer graphicsrdquo inACMSIGGRAPHASIA Courses (SArsquo10) pp 271ndash2769 December 2010

[41] S Pion andM Teillaud ldquo3D triangulationsrdquo in CGAL User andReference Manual CGAL Editorial Board 45th edition 2014

[42] B Mourrain and V Y Pan ldquoMultivariate polynomials dualityand structured matricesrdquo Journal of Complexity vol 16 no 1pp 110ndash180 2000

[43] R E Park ldquoEstimation with heteroscedastic error termsrdquo Eco-nometrica vol 34 no 4 p 888 1966

[44] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2014

[45] H Edelsbrunner and E P Mucke ldquoThree-dimensional alphashapesrdquo ACM Transactions on Graphics vol 13 no 1 pp 43ndash72 1994

[46] C B Barber D P Dobkin and H Huhdanpaa ldquoThe quickhullalgorithm for convex hullsrdquoACMTransactions onMathematicalSoftware vol 22 no 4 pp 469ndash483 1996

[47] T Lafarge and B Pateiro-Lopez alphashape3d Implementationof the 3D Alpha-Shape for the Reconstruction of 3D Sets from aPoint Cloud R package version 11 2014

[48] K Habel R Grasman A Stahel and D C Sterratt GeometryMesh generation and surface tesselation R package version 03-52014

[49] D Kelley oce Analysis of Oceanographic Data R package ver-sion 09-14 2014

[50] A Korobeynikov A Shlemov K Usevich and N GolyandinaRssa A Collection of Methods for Singular Spectrum Analysis RPackage Version 011 2014

[51] A Korobeynikov ldquoComputation- and space-efficient imple-mentation of SSArdquo Statistics and Its Interface vol 3 no 3 pp357ndash368 2010

[52] A V Spirov N E Golyandina D M Holloway T AlexandrovE N Spirova and F J P Lopes ldquoMeasuring gene expressionnoise in early Drosophila embryos the highly dynamic com-partmentalized micro-environment of the blastoderm is oneof the main sources of noiserdquo in Evolutionary ComputationMachine Learning and Data Mining in Bioinformatics vol 7246of Lecture Notes in Computer Science pp 177ndash188 SpringerBerlin Germany 2012

[53] T-M Chan W Longabaugh H Bolouri et al ldquoDevelopmentalgene regulatory networks in the zebrafish embryordquo Biochimicaet Biophysica ActamdashGene Regulatory Mechanisms vol 1789 no4 pp 279ndash298 2009

[54] E Poustelnikova A Pisarev M Blagov M Samsonova and JReinitz ldquoA database for management of gene expression data insiturdquo Bioinformatics vol 20 no 14 pp 2212ndash2221 2004

[55] Y F Wu E Myasnikova and J Reinitz ldquoMaster equation simu-lation analysis of immunostained Bicoid morphogen gradientrdquoBMC Systems Biology vol 1 article 52 2007

[56] F He Y Wen J Deng et al ldquoProbing intrinsic properties of arobust morphogen gradient inDrosophilardquoDevelopmental Cellvol 15 no 4 pp 558ndash567 2008