parameter estimation for a deformable template model

Statistics and Computing 11: 337–346, 2001. © 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Parameter estimation for a deformable template model

MERRILEE HURN∗, INGELIN STEINSLAND† and HÅVARD RUE†

∗Mathematical Sciences, University of Bath, Bath BA2 7AY
†Department of Mathematical Sciences, NTNU

Received September 1, 1999 and accepted July 4, 2000

In recent years, a number of statistical models have been proposed for the purposes of high-level image analysis tasks such as object recognition. However, in general, these models remain hard to use in practice, partly as a result of their complexity, partly through lack of software. In this paper we concentrate on a particular deformable template model which has proved potentially useful for locating and labelling cells in microscope slides (Rue and Hurn 1999). This model requires the specification of a number of rather non-intuitive parameters which control the shape variability of the deformed templates. Our goal is to arrange the estimation of these parameters in such a way that the microscope user’s expertise is exploited to provide the necessary training data graphically by identifying a number of cells displayed on a computer screen, but that no additional statistical input is required. In this paper we use maximum likelihood estimation incorporating the error structure in the generation of our training data.

Keywords: Bayesian image analysis, confocal microscopy, deformable templates, object recognition, Markov chain Monte Carlo, parameter estimation

1. Introduction

The use of high-level models in image analysis has increased in popularity during the last few years as the problem of interpretation and understanding of images has become more important. The confocal microscopy image in Fig. 1 is one such application, showing an optical section through cartilage growth plate in which two types of cell are present; one stage of the analysis of these data is to identify and label cells on the basis of their shape and size. Manual segmentation of each cell would be tedious and time consuming, especially as this image forms just one of a large number of similar images, and so there is a great need for some automated procedure. A number of different approaches, both statistical and non-statistical, exist for tackling the task; Rue and Hurn (1999) concentrate on a high-level statistical model which represents each cell by a polygon outline model, embedding these deformable templates into a marked point process to handle the unknown number of objects. Although there has been progress in applying such models recently, they remain somewhat inaccessible due in part to the difficulty in specifying many of the associated parameters. A big step towards the routine use of these techniques by microscopists would be a way of automating parameter estimation, both for the parameters describing the stochastic variations in shape for the objects and for the parameters in the data model. The aim of this paper is to provide such a solution in a way that the only interaction between the microscope user and the estimation procedures is purely graphical; the end-user supplies computer-drawn templates of the cells and the training data for parameter estimation then take the form of polygons fitted on the screen to a display of image data.

In Section 2 we briefly review the deformable template model in the context of confocal microscopy, identifying the parameters to be tackled. In Section 3 we derive our estimation procedure; Section 4 provides an investigation of our methods using both simulated and real data.

2. A High-level Bayesian model for object recognition

The deformable template approach was pioneered by Grenander and his co-authors, and recent sources for references can be found in Grenander and Miller (1994), Jain, Zhong and Lakshmanan (1996), Baddeley and Van Lieshout (1993), Blake and Isard (1998) and Dryden and Mardia (1999). We will denote the scene by x and the recorded pixellated data by y, the posterior of interest then being π(x | y) ∝ π(y | x) π(x). In this section, we will review the specification of the components of this model, describing how some parameters may be treated in a fully Bayesian way. Inference for the resulting model will require Markov chain Monte Carlo techniques (refer to Gilks, Richardson and Spiegelhalter (1996) for an introduction and list of references).

Fig. 1. An optical section through cartilage growth plate

0960-3174 © 2001 Kluwer Academic Publishers

2.1. The object prior π(x)

2.1.1. The template model for a single cell

There are a number of possible ways in which to describe a stochastically deformable shape, and Kent, Dryden and Anderson (1998) describe several in detail. We will concentrate on the deformable edge model used by Grenander and Miller (1994) and Rue and Hurn (1999). Imagine that a prototypical cell is described by an n-sided template defined by a set of vectors $g_0, g_1, \ldots, g_{n-1}$ which give the edges of the polygon (Fig. 2(a)). For example, if one type of cell is characteristically circular, then these edges describe a polygonal approximation to a circle. The template does not have any location information, and we will consider its first vertex to be located at the origin, the second to be located at $g_0$, and so on (Fig. 2(b)). In order to accommodate scaling and rotational effects, the template may be globally scaled by a scalar R and rotated through an angle α (Fig. 2(c)). These parameters R and α can represent properties of a population of cells. Then to model natural shape variability occurring between cells, each edge $g_i$ is subject to a stochastic deformation which incorporates the scaling R, the rotation α plus an additional edge-wise Gaussian deformation in length and direction (Fig. 2(d)). This final edge-wise effect describes the change in length and direction between the undeformed $g_i$ and the new edge prior to global scaling and rotation. Denoting the deformed edge as $s_i^{R\alpha} g_i$, where $s_i^{R\alpha}$ is the 2 × 2 matrix representing these changes, we can thus write

$$\frac{1}{R}\begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} s_i^{R\alpha} g_i - g_i = r_i \begin{bmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{bmatrix} g_i .$$

Letting $t_i^{(0)} = R r_i \cos(\alpha + \theta_i)$ and $t_i^{(1)} = R r_i \sin(\alpha + \theta_i)$ determines that

$$s_i^{R\alpha} = \begin{bmatrix} R\cos\alpha + t_i^{(0)} & R\sin\alpha + t_i^{(1)} \\ -R\sin\alpha - t_i^{(1)} & R\cos\alpha + t_i^{(0)} \end{bmatrix}.$$
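As a numerical check on the two displays above, the following minimal Python sketch builds $s_i^{R\alpha}$ for one edge and confirms that it satisfies the defining identity. The function names and the test values of R, α, $r_i$, $\theta_i$ and $g_i$ are illustrative assumptions, not taken from the paper.

```python
import math

def deformation_matrix(R, alpha, r, theta):
    """Return the 2x2 matrix s_i^{R alpha} for one edge as nested lists."""
    t0 = R * r * math.cos(alpha + theta)   # t_i^(0)
    t1 = R * r * math.sin(alpha + theta)   # t_i^(1)
    return [[R * math.cos(alpha) + t0, R * math.sin(alpha) + t1],
            [-R * math.sin(alpha) - t1, R * math.cos(alpha) + t0]]

def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def identity_residual(R, alpha, r, theta, g):
    """Largest absolute difference between the two sides of the identity."""
    s_g = matvec(deformation_matrix(R, alpha, r, theta), g)
    undo = [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]
    lhs = [x / R - gi for x, gi in zip(matvec(undo, s_g), g)]
    edge_rot = [[math.cos(theta), math.sin(theta)],
                [-math.sin(theta), math.cos(theta)]]
    rhs = [r * x for x in matvec(edge_rot, g)]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

# Hypothetical values chosen only for illustration.
residual = identity_residual(R=1.5, alpha=0.3, r=0.2, theta=-0.7, g=[1.0, 0.5])
```

The residual is zero up to rounding error, which reflects the fact that $s_i^{R\alpha}$ is the sum of a global scaled rotation and an edge-specific scaled rotation of the same form.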

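Conditional on fixed values of R and α, Grenander, Chow and Keenan (1991) suggest a first order cyclic Markov structure on the $\{t_i^{(0)}\}$ and the $\{t_i^{(1)}\}$ independently, with each having an n-dimensional Gaussian distribution with mean 0 and circulant inverse covariance matrix incorporating the scaling

$$\Sigma_R^{-1} = \frac{1}{R^2}\begin{bmatrix} \beta & \delta & & & & \delta \\ \delta & \beta & \delta & & & \\ & \delta & \beta & \delta & & \\ & & \ddots & \ddots & \ddots & \\ & & & \delta & \beta & \delta \\ \delta & & & & \delta & \beta \end{bmatrix}, \tag{1}$$

where all other entries are zero. Define the vector $t = (t_0^{(0)}, t_1^{(0)}, \ldots, t_{n-1}^{(0)}, t_0^{(1)}, \ldots, t_{n-1}^{(1)})^T$, then

$$t \sim N_{2n}\left(0, \begin{bmatrix} \Sigma_R & 0 \\ 0 & \Sigma_R \end{bmatrix}\right). \tag{2}$$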
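The structure of (1) can be made concrete with a short Python sketch. It builds the circulant inverse covariance as nested lists and evaluates its eigenvalues using the standard closed form for a symmetric circulant matrix, $(\beta + 2\delta\cos(2\pi k/n))/R^2$ for $k = 0, \ldots, n-1$, which is a general linear-algebra fact rather than a result from the paper. The function names are hypothetical; the parameter values are those used later in the artificial example.

```python
import math

def circulant_precision(n, beta, delta, R=1.0):
    """n x n inverse covariance of equation (1): beta on the diagonal,
    delta on the two cyclic off-diagonals, all scaled by 1/R^2 (n >= 3)."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = beta / R**2
        q[i][(i + 1) % n] += delta / R**2
        q[i][(i - 1) % n] += delta / R**2
    return q

def circulant_eigenvalues(n, beta, delta, R=1.0):
    """Closed-form eigenvalues of the matrix returned above."""
    return [(beta + 2.0 * delta * math.cos(2.0 * math.pi * k / n)) / R**2
            for k in range(n)]

n, beta, delta = 8, 100.0, -40.0
Q = circulant_precision(n, beta, delta)
eigs = circulant_eigenvalues(n, beta, delta)
is_positive_definite = min(eigs) > 0.0
```

Under this formula the matrix is positive definite whenever β > 2|δ|, and it becomes singular at β = 2|δ| with δ < 0, which is the degenerate boundary mentioned in the numerical experiments.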

Imposing the constraint that the deformed template must be closed will of course destroy the simple structure of (2). However, as it stands, knowledge of (2) is sufficient for the implementation of an MCMC sampler (see Rue and Hurn (1999)).

Although R and α are fairly clearly interpretable as the size and rotation of a cell (or population of cells) in comparison to the prototype template, the specification of β and δ is far less clear. Ideally it would be possible to treat all four parameters as hyperparameters in a Bayesian context. Unfortunately to do so would require knowing the ratio of the posterior distribution at different values of the parameters (in order to implement an MCMC sampling algorithm) and this is infeasible because they each affect the analytically intractable normalising constant of the posterior. It is the parameters β, δ, R and α which we propose to estimate from user-supplied graphical training data.


Fig. 2. The stages of data formation

2.1.2. A marked point process for multiple objects

The location-free shape of the previous section can be incorporated into a model for an unknown number of cells by using the marked point process framework of Baddeley and Van Lieshout (1993). Essentially a hard-core marked point process is set up where the points of the process represent cell locations (the location of the first cell vertex) and the marks represent the other variables required to describe the cells (the deformations of the template). It is possible to allow objects of different types by forming a mixture model using different basic templates with different deformation parameters, where the weights $\{p_i\}$ in the mixture model represent the relative frequencies of occurrence; refer to Rue and Hurn (1999) for details.

The use of a hard-core interaction for the marked point process means that the normalising constant will be too complex for a fully Bayesian approach. However, previous experience in object recognition problems suggests that the results are not particularly sensitive to the specification of this part of the model. We propose to use the naive estimators, that is the observed proportions of the different cell types in the training data for the $\{p_i\}$, and the observed frequency per unit area for the parameter of the point process. These estimators will be least accurate in situations where there is significant packing observed in the training data. If this is the case, then an alternative is to simulate from the prior model (having first estimated the deformation parameters) searching over a range of parameter values to match the observed and the simulated frequencies. We also note some recent work by Baddeley and Turner (1998) who use maximum pseudo-likelihood methods to fit various point processes, although such an approach here would be considerably more complex.
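The naive estimators described above amount to simple counting. A minimal Python sketch follows; the function name, the cell-type labels, the counts and the window size are all hypothetical and serve only to show the arithmetic.

```python
from collections import Counter

def naive_point_process_estimates(cell_types, window_area):
    """Observed type proportions (mixture weights p_i) and the observed
    number of cells per unit area (point process parameter)."""
    counts = Counter(cell_types)
    total = len(cell_types)
    weights = {label: count / total for label, count in counts.items()}
    intensity = total / window_area
    return weights, intensity

# Hypothetical training set: 60 cells of one type and 40 of another,
# identified in a window of 512 x 512 pixels.
labels = ["circular"] * 60 + ["oblong"] * 40
weights, intensity = naive_point_process_estimates(labels, window_area=512 * 512)
```

As the text notes, these counts ignore the hard-core interaction, so they are expected to be least reliable when the training cells are tightly packed.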

2.2. A Poisson likelihood model π(y | x) for confocal microscopy

Confocal microscopes measure emitted photons from a fluorescing specimen. We begin by assuming that cells all have a typical fluorescence level µ, while the background emits no fluorescence. The signal is measured on an 8 bit scale (i.e. on the integers 0 to 255) and in order to minimise clipping, the signal is operator adjusted at the experimental stage by a scale factor a and a shift b, so that the mean background value is now b, while the mean foreground value is b + aµ. In addition, the recording process involves additive instrument white noise. We will approximate this combination of a scaled Poisson count and a Gaussian perturbation at pixel p by a Gaussian random variable with mean equal to the black-level b plus the expected scaled Poisson count for p, and with variance equal to the variance of the instrument noise $\sigma^2$ again plus the scaled expected Poisson count variance

$$y_p \sim N\left(a\mu I_{[p \in \text{cell}]} + b,\; a^2\mu I_{[p \in \text{cell}]} + \sigma^2\right),$$

where $I_{[p \in \text{cell}]}$ is the indicator for whether the pixel is labelled as cell or background. This mean-variance relationship may be faintly discerned from Fig. 1 where the variability in grey level is higher within cells than that in the background. The $\{y_p\}$ are assumed to be conditionally independent given the configuration x.
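The Gaussian approximation above leads directly to a log-likelihood that sums over pixels. The following Python sketch evaluates it for a flat list of grey levels and a matching list of cell/background labels; the function name, argument layout and example values are assumptions made for illustration.

```python
import math

def pixel_log_likelihood(y, in_cell, a, mu, b, sigma2):
    """Sum of Gaussian log densities with mean a*mu*I + b and
    variance a^2*mu*I + sigma2, where I flags pixels inside a cell."""
    total = 0.0
    for y_p, inside in zip(y, in_cell):
        indicator = 1.0 if inside else 0.0
        mean = a * mu * indicator + b
        var = a * a * mu * indicator + sigma2
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y_p - mean) ** 2 / var
    return total

# Hypothetical two-pixel example: one background pixel, one cell pixel.
loglik = pixel_log_likelihood(y=[12.0, 110.0], in_cell=[False, True],
                              a=2.0, mu=50.0, b=10.0, sigma2=4.0)
```

Because the pixels are conditionally independent given the configuration x, changing the label of a single pixel changes only one term of the sum, which is convenient for local MCMC updates of the cell outlines.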

In this case it is possible to treat µ, a, b and $\sigma^2$ as hyperparameters. Our choice of prior distributions is intended mainly to impose the constraints that the variables are positive, with 0 ≤ aµ + b ≤ 255. The noise variance rather than the precision is given a $\chi^2(1)$ prior distribution since it is expected to be of the order of a few squared pixel units. The mean fluorescence is given a fairly diffuse $\chi^2(255)$ prior reflecting that it is a positive variable expected to take a reasonably large value, although this will vary considerably between specimens. As the data carry considerable information regarding the parameters, it is intended that these priors will not exert much influence. In addition, this quantity of information helps separate the effects of a and µ. Refer to Hurn and Rue (1997) and Hurn (1998) for more discussion in a similar setting,

$$\sigma^2 \sim \chi^2(1), \qquad \left.\begin{aligned} \mu &\propto \chi^2(255) \\ a &\propto U(0, 255) \\ b &\propto U(0, 255) \end{aligned}\right\}\; I[0 \le a\mu + b \le 255].$$

The joint posterior distribution of interest is

$$\pi(x, \sigma^2, \mu, a, b \mid y) \propto \pi(y \mid x, \sigma^2, \mu, a, b)\,\pi(x)\,\pi(\sigma^2)\,\pi(\mu, a, b).$$

Markov chain Monte Carlo methods are needed for the sampling, which generally will be done one parameter at a time, requiring the easily obtained conditional posterior distributions of each parameter.
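For single-parameter updates, the hyperprior enters only through its unnormalised log density. A minimal Python sketch of that term is given below, assuming the chi-squared kernel $x^{k/2-1}e^{-x/2}$ and treating the indicator as a hard constraint; the function names and the test values are hypothetical.

```python
import math

def log_chi2_kernel(x, k):
    """Unnormalised log density of a chi-squared(k) variable at x."""
    if x <= 0.0:
        return -math.inf
    return (k / 2.0 - 1.0) * math.log(x) - x / 2.0

def log_hyperprior(sigma2, mu, a, b):
    """Unnormalised log of pi(sigma^2) * pi(mu, a, b) with the constraint
    0 <= a*mu + b <= 255; returns -inf outside the support."""
    if not (0.0 < a < 255.0 and 0.0 < b < 255.0):
        return -math.inf
    if sigma2 <= 0.0 or mu <= 0.0:
        return -math.inf
    if not (0.0 <= a * mu + b <= 255.0):
        return -math.inf
    return log_chi2_kernel(sigma2, 1) + log_chi2_kernel(mu, 255)

inside_support = log_hyperprior(sigma2=4.0, mu=50.0, a=2.0, b=10.0)
outside_support = log_hyperprior(sigma2=4.0, mu=200.0, a=2.0, b=10.0)
```

In a Metropolis-type update, a proposal that returns minus infinity here would simply be rejected, which is one straightforward way to respect the constraint on aµ + b.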

3. Estimation of the deformed template parameters

We now turn to the main goal of this paper, the estimation of the deformation parameters for the template prior. We suggest maximum likelihood-based methods which work with training data generated by the microscopist. The first stage in generating these data is for the microscopist to draw on the screen an n-sided polygonal template representing the ideal cell; the number and positioning of the vertices carry information about the ways in which real cells may vary from the idealised version. The second stage requires the microscopist to identify a number of cells in slices of image data and approximate each one by a deformation of the n-sided polygon; the data are thus mouse-input vertex positions. Although some templates may be invariant to cyclic permutation of the vertices, not all are, and so vertex ordering is important here. The approach has the benefit that the microscopist’s expertise is being used in the data collection, plus there will be no further user-interaction required in the estimation procedure.

3.1. The deformation parameters β and δ and tracking errors

We derive the likelihood function for β and δ based on the vertex locations of polygons observed with error. Estimation in the context of shape models has generally assumed that the shape information is perfectly observed (see for example Kent et al. (1998)); however, given our intended manner of collecting data, this seems unrealistic. The effects of non-inclusion of tracking error will be considered in Section 4.

We begin by transforming from the deformation model for the polygon edges to the model for the corresponding vertex locations. Recall that the first vertex defines the location of the entire polygon,

$$v_j = v_0 + \sum_{i=0}^{j-1} s_i^{R\alpha} g_i, \quad j = 1, \ldots, n.$$

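Note that there are n + 1 vertices in the non-closed polygon. Considering the x and y components separately, the vertices can be written

$$\begin{bmatrix} v_1^x \\ v_2^x \\ \vdots \\ v_n^x \\ v_1^y \\ v_2^y \\ \vdots \\ v_n^y \end{bmatrix} = \begin{bmatrix} g_0^x & & & & g_0^y & & & \\ g_0^x & g_1^x & & & g_0^y & g_1^y & & \\ \vdots & \vdots & \ddots & & \vdots & \vdots & \ddots & \\ g_0^x & g_1^x & \cdots & g_{n-1}^x & g_0^y & g_1^y & \cdots & g_{n-1}^y \\ g_0^y & & & & -g_0^x & & & \\ g_0^y & g_1^y & & & -g_0^x & -g_1^x & & \\ \vdots & \vdots & \ddots & & \vdots & \vdots & \ddots & \\ g_0^y & g_1^y & \cdots & g_{n-1}^y & -g_0^x & -g_1^x & \cdots & -g_{n-1}^x \end{bmatrix} \begin{bmatrix} t_0^{(0)} \\ t_1^{(0)} \\ \vdots \\ t_{n-1}^{(0)} \\ t_0^{(1)} \\ t_1^{(1)} \\ \vdots \\ t_{n-1}^{(1)} \end{bmatrix} + \mathcal{I} v_0 + v^{R\alpha} \tag{3}$$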

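where $\mathcal{I}$ is the 2n × 2 matrix whose first column consists of n 1’s followed by n 0’s, and whose second column is n 0’s followed by n 1’s, and $v^{R\alpha}$ is the vector of vertex x and y positions of the undeformed template with first vertex at zero, rotated through α and scaled by R. We write (3) in the form $v = Gt + \mathcal{I} v_0 + v^{R\alpha}$. The n vertices of the closed polygon are recorded with error, the recorded positions being denoted $\tilde{v}_i$,

$$\tilde{v}_i = v_i + \varepsilon_i, \quad i = 0, \ldots, n-1 \tag{4}$$

and we further assume that these errors are normally distributed with

$$(\varepsilon_0^x, \ldots, \varepsilon_{n-1}^x, \varepsilon_0^y, \ldots, \varepsilon_{n-1}^y)^T \sim N_{2n}(0, \Phi).$$

Since the true location of the polygon is unobservable, we begin by finding the unconstrained distribution of v given R, α, Φ, $\varepsilon_0$ and the observable $\tilde{v}_0$, which by (3) and (4) is the distribution of $v = Gt + \mathcal{I}(\tilde{v}_0 - \varepsilon_0) + v^{R\alpha}$. Denoting the marginal variance of $(\varepsilon_0^x, \varepsilon_0^y)^T$ by $\Phi_{22}$, the unconstrained distribution is

$$v \mid (R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0) \sim N_{2n}\left(\mathcal{I}(\tilde{v}_0 - \varepsilon_0) + v^{R\alpha},\; G \begin{bmatrix} \Sigma_R & 0 \\ 0 & \Sigma_R \end{bmatrix} G^T\right). \tag{5}$$

To find the distribution of $(v_1, \ldots, v_{n-1})^T \mid v_n = v^*, R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0$ for an arbitrary $v^*$, we reorder the components of v from x then y components to the vertex pairs, rewriting (5) as

$$(v_1, v_2, \ldots, v_n)^T \mid (R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0) \sim N_{2n}\left(\begin{bmatrix} \mu_1 \\ \tilde{v}_0 - \varepsilon_0 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{bmatrix}\right), \tag{6}$$

where the partitioning of the mean and variance corresponds to partitioning into $(v_1, \ldots, v_{n-1})^T$ and $v_n$. Note that $E(v_n \mid (R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0)) = \tilde{v}_0 - \varepsilon_0$ by closure of the undeformed template $v^{R\alpha}$. Denote the partitioned inverse of the variance matrix

$$\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{12}^T & \Psi_{22} \end{bmatrix}. \tag{7}$$

We will also rearrange the ordering of the $\{\varepsilon_i\}$ so that

$$(\varepsilon_1, \ldots, \varepsilon_{n-1}, \varepsilon_0)^T \sim N_{2n}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Phi_{11} & \Phi_{12} \\ \Phi_{12}^T & \Phi_{22} \end{bmatrix}\right).$$

Then $(v_1, \ldots, v_{n-1})^T \mid (v_n = v^*, R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0) \sim N_{2n-2}(\mu_1 + \Psi_{11}^{-1}\Psi_{12}(\tilde{v}_0 - \varepsilon_0 - v^*), \Psi_{11}^{-1})$ (see the Appendix). The particular $v^*$ of interest is $v_0$, i.e. $\tilde{v}_0 - \varepsilon_0$, in which case

$$(v_1, \ldots, v_{n-1})^T \mid (v_n = \tilde{v}_0 - \varepsilon_0, R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0) \sim N_{2n-2}(\mu_1, \Psi_{11}^{-1}). \tag{8}$$

To find the distribution of the vertices under the closure constraint, we use the known marginal distribution $\varepsilon_0 \sim N_2(0, \Phi_{22})$ together with (8), integrating out $\varepsilon_0$ to show in the Appendix that

$$(v_1, \ldots, v_{n-1})^T \mid (\text{closure}, R, \alpha, \Phi, \tilde{v}_0) \sim N_{2n-2}\left(\mu_2,\; \Psi_{11}^{-1} + \bar{\mathcal{I}}\Phi_{22}\bar{\mathcal{I}}^T\right), \tag{9}$$

where $\mu_2$ is the first 2n − 2 components of the reordered $\mathcal{I}\tilde{v}_0 + v^{R\alpha}$ and $\bar{\mathcal{I}}$ is the first 2n − 2 rows of the reordered $\mathcal{I}$. The distribution of $(\tilde{v}_1, \ldots, \tilde{v}_{n-1})^T$ under the same conditioning uses (4). Since $(\varepsilon_1, \ldots, \varepsilon_{n-1})^T$ may not be independent of $\varepsilon_0$, the resulting distribution is

$$(\tilde{v}_1, \ldots, \tilde{v}_{n-1})^T \mid (\text{closure}, R, \alpha, \Phi, \tilde{v}_0) \sim N_{2n-2}\left(\mu_2,\; \Psi_{11}^{-1} + \bar{\mathcal{I}}\Phi_{22}\bar{\mathcal{I}}^T - \bar{\mathcal{I}}\Phi_{12}^T - \Phi_{12}\bar{\mathcal{I}}^T + \Phi_{11}\right). \tag{10}$$

The distribution of $\tilde{v}_0$ arises as the convolution of the distribution of $v_0$ with the distribution of $\varepsilon_0$. Assuming that $v_0$ is uniformly distributed in the observation window, and that the variance of $\varepsilon_0$ is small in comparison to the window size, $\tilde{v}_0$ is taken to be also uniformly distributed in the window. Finally, under an assumption of independence of the polygon shapes, the likelihood for m cells will be the product of the individual likelihoods.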
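The linear representation (3) on which this derivation rests can be checked numerically. The Python sketch below builds G from the template edges and compares the vertices obtained from $Gt + \mathcal{I}v_0 + v^{R\alpha}$ with those obtained by accumulating the deformed edges $s_i^{R\alpha} g_i$ directly. All function names and the square template, deformation values, location, scale and rotation are illustrative assumptions.

```python
import math

def build_G(gx, gy):
    """2n x 2n matrix G of equation (3), in the block form
    [[Lx, Ly], [Ly, -Lx]] with Lx, Ly lower-triangular cumulative blocks."""
    n = len(gx)
    G = [[0.0] * (2 * n) for _ in range(2 * n)]
    for j in range(n):            # row j corresponds to vertex v_{j+1}
        for i in range(j + 1):    # edges g_0, ..., g_j contribute
            G[j][i] = gx[i]
            G[j][n + i] = gy[i]
            G[n + j][i] = gy[i]
            G[n + j][n + i] = -gx[i]
    return G

def vertices_from_equation_3(gx, gy, t0, t1, v0, R, alpha):
    """Stacked (x_1..x_n, y_1..y_n) computed as G t + I v0 + v^{R alpha}."""
    n = len(gx)
    G, t = build_G(gx, gy), list(t0) + list(t1)
    c, s = math.cos(alpha), math.sin(alpha)
    base_x, base_y, cx, cy = [], [], 0.0, 0.0
    for i in range(n):  # undeformed template, scaled and rotated, first vertex at 0
        cx += R * (c * gx[i] + s * gy[i])
        cy += R * (-s * gx[i] + c * gy[i])
        base_x.append(cx)
        base_y.append(cy)
    offset = [v0[0] + b for b in base_x] + [v0[1] + b for b in base_y]
    return [sum(G[r][k] * t[k] for k in range(2 * n)) + offset[r]
            for r in range(2 * n)]

def vertices_from_edges(gx, gy, t0, t1, v0, R, alpha):
    """The same vertices obtained by accumulating the deformed edges s_i g_i."""
    c, s = math.cos(alpha), math.sin(alpha)
    xs, ys, x, y = [], [], v0[0], v0[1]
    for i in range(len(gx)):
        a, b = R * c + t0[i], R * s + t1[i]   # entries of s_i
        x += a * gx[i] + b * gy[i]
        y += -b * gx[i] + a * gy[i]
        xs.append(x)
        ys.append(y)
    return xs + ys

# Hypothetical four-sided template and deformation values.
gx, gy = [1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]
t0, t1 = [0.10, -0.05, 0.02, 0.00], [0.03, 0.01, -0.04, 0.02]
args = (gx, gy, t0, t1, (5.0, 7.0), 2.0, 0.4)
max_gap = max(abs(p - q) for p, q in zip(vertices_from_equation_3(*args),
                                          vertices_from_edges(*args)))
```

The two constructions agree to rounding error. Since v is linear in the Gaussian vector t, this is what makes the distribution (5) Gaussian with variance $G\,\mathrm{diag}(\Sigma_R, \Sigma_R)\,G^T$.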

3.2. Treatment of scaling R and rotation α

In many applications it will not be reasonable to assume that the scale and rotation of each cell are known; the model allows each cell i to have an associated scaling $R_i$ and rotation $\alpha_i$. We will concentrate on the case where there is a common unknown R for all cells (as would be used in an example where size was a distinguishing characteristic in cell recognition). Treating scale and rotation as additional nuisance parameters adds parameters $R, \alpha_1, \ldots, \alpha_m$ to the optimisation in order to obtain the maximum-likelihood estimates. However in the Appendix, we show that the rotation optimisation may be done exactly for any set of β, δ, R, Φ, thus reducing the computational burden. By introducing a rotation parameter for each cell, the number of parameters is increasing as the number of cells increases, and so the MLE could in theory be inconsistent. Although we have no theoretical results for this problem, our numerical studies have not indicated poor large-sample properties of the MLE.

3.3. Variable numbers of sides n

Rue and Hurn (1999) note that for efficient mixing of MCMC algorithms in this context, it is important to extend the template model to allow it to have a variable number of sides. The parameters β and δ then have different interpretations depending on n. Kent, Mardia and Walder (1996) propose a parameterisation which relates the values at different values of n via two new parameters $a_0$ and $a_1$,

$$\beta_n = a_0/n + 2a_1 n^2, \qquad \delta_n = -a_1 n^2. \tag{11}$$

The invariance of maximum likelihood estimates allows us to estimate $a_0$ and $a_1$ from the estimates of β and δ and thus to find estimates of $\beta_n$ and $\delta_n$ for any other required n.
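The conversion described here is a two-step calculation: invert (11) at the training value of n, then re-evaluate it at the new n. The Python sketch below takes (11) exactly as printed above; the function names are hypothetical and the numerical values are those of the artificial example in Section 4.

```python
def to_a0_a1(n, beta_n, delta_n):
    """Invert equation (11) at a given n to recover (a0, a1)."""
    a1 = -delta_n / n**2
    a0 = n * (beta_n - 2.0 * a1 * n**2)
    return a0, a1

def beta_delta_for(n, a0, a1):
    """Evaluate equation (11) at a (possibly different) number of sides n."""
    return a0 / n + 2.0 * a1 * n**2, -a1 * n**2

# Estimates for an eight-sided template, re-expressed for sixteen sides.
a0, a1 = to_a0_a1(8, 100.0, -40.0)
beta16, delta16 = beta_delta_for(16, a0, a1)
```

With β = 100 and δ = −40 at n = 8, this gives $a_0 = 160$ and $a_1 = 0.625$, and hence $\beta_{16} = 330$ and $\delta_{16} = -160$.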

4. Numerical experiments

4.1. An artificial example

We begin with an artificial example. The basic template for our experiments is an eight-sided regular approximation to a circle of radius 1, and the deformation model takes the parameters β = 100 and δ = −40. Matlab (The Math Works Inc 1994) has been used for the necessary numerical optimisation; Matlab also has the functionality for recording of vertex locations using the mouse.


4.1.1. The effect of model misspecification

To assess the importance of deriving the maximum likelihood estimates under the assumption of measurement error in recording the vertices, m = 50 simulated realisations from the deformed template model have been generated using independent measurement errors with variance $\phi_1^2$, and estimates of β and δ have then been found from these data when misspecifying that $\phi_1^2 = 0$. Table 1 shows the mean and median estimates resulting from 250 repetitions of this procedure to assess the variability of the estimates. Not surprisingly, as the true variance increases, the estimates become severely biased; β and δ both decrease since the observed shapes demonstrate greater variability and less smoothness than expected.

Similarly, it is possible to fit a model with measurement error but where the scale R and rotation α are assumed known but are incorrectly specified. The result is that the observed polygons have the expected shapes but that in attempting to fit the incorrectly specified template, $\phi_1^2$ is overestimated and the balance of β to δ is perturbed to the point of degeneracy (β = 2|δ|). This latter effect is due to the extremely high correlations required between deformations in order to mimic a rotation or a rescale (for example, in the case where there is no measurement error consider which deformations must be applied to the four edges in Fig. 2(a) in order to match it to Fig. 2(c)).

4.1.2. Estimation under different error models

Having seen the importance of including measurement error in the modelling, we now investigate different possible models for the tracking errors. It seems unlikely that the errors will in practice be independent and so we consider the following three structures:

Model 1 Independent errors, $\Phi = \phi_1 I_{2n}$

Model 2 First-order structure for the variance, $\Phi = \begin{bmatrix} \Phi_1 & 0 \\ 0 & \Phi_1 \end{bmatrix}$ where $\Phi_1$ has a circulant structure with $\phi_1$ down the diagonal, and $\phi_2$ on the off-diagonals

Model 3 First-order structure for the inverse variance, $\Phi^{-1} = \begin{bmatrix} \Phi_1 & 0 \\ 0 & \Phi_1 \end{bmatrix}$ where $\Phi_1$ is as above, although note that $\phi_1$ and $\phi_2$ have different interpretations under models 2 and 3.
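The three error structures can be written down explicitly with a few lines of Python. This sketch assumes that "off-diagonals" means the two cyclic first off-diagonals, by analogy with (1); the function names are hypothetical, and the parameter values are the true values quoted for the simulation study.

```python
def circulant(n, diag, off):
    """n x n circulant matrix with `diag` on the diagonal and `off` on the
    two cyclic first off-diagonals (n >= 3)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = diag
        m[i][(i + 1) % n] += off
        m[i][(i - 1) % n] += off
    return m

def block_diagonal(block):
    """2n x 2n matrix with the same n x n block for x and y components."""
    n = len(block)
    out = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = block[i][j]
            out[n + i][n + j] = block[i][j]
    return out

def noise_structure(model, n, phi1, phi2=0.0):
    """Return (matrix, kind); kind is 'variance' for models 1 and 2 and
    'inverse variance' for model 3."""
    if model == 1:
        return block_diagonal(circulant(n, phi1, 0.0)), "variance"
    if model == 2:
        return block_diagonal(circulant(n, phi1, phi2)), "variance"
    if model == 3:
        return block_diagonal(circulant(n, phi1, phi2)), "inverse variance"
    raise ValueError("model must be 1, 2 or 3")

phi_model2, kind2 = noise_structure(2, n=8, phi1=0.0025, phi2=0.0005)
prec_model3, kind3 = noise_structure(3, n=8, phi1=500.0, phi2=-100.0)
```

Model 1 has one noise parameter and models 2 and 3 have two each, which matters when the fits are compared by an information criterion.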

Our training data are a set of m simulated polygonal cell outlines, where m takes the values 25, 50 or 100. In the simplest noise model, we have also found estimates when the scale and rotation are taken to be unknown (the nuisance parameter R takes the value 1). Table 2 summarises the results of the numerical experiments, while Fig. 3 shows the pairwise scatterplots of the estimated parameters for the case m = 50 under the independent noise model 1 with unknown scale and rotation. Both mean and median estimates have been quoted to highlight the skewed nature of the sampling distributions, a feature which can also be seen in Fig. 3. Figure 3 also demonstrates the high correlation structure within the estimates, particularly between β and δ and between the deformation parameters and the observation error variance. Such relationships might be expected by noting that in the unconstrained model whose inverse variance is given by (1), the variance of any particular $t_i^{(0)}$ given the remaining $t_{-i}^{(0)}$ components is 1/β while the correlation of two consecutive $t_i^{(0)}$ given the remaining components is −δ/β. The near non-identifiability due to the partitioning of the variability of the observed vertex locations into that due to the deformations and that due to the observational error is one of the reasons why this estimation problem is particularly difficult.

Table 1. Parameter estimates constraining $\phi_1^2 = 0$ for m = 50, 250 repetitions with known R and α, true β = 100 and δ = −40 under different true values of $\phi_1^2$

| True noise value | Mean β | Mean δ | Median β | Median δ |
|---|---|---|---|---|
| $\phi_1^2 = (0.0)^2$ | 99.974 | −39.856 | 99.411 | −39.904 |
| $\phi_1^2 = (0.01)^2$ | 94.623 | −37.003 | 94.140 | −36.923 |
| $\phi_1^2 = (0.05)^2$ | 42.253 | −9.668 | 42.347 | −9.583 |
| $\phi_1^2 = (0.1)^2$ | 20.446 | −2.008 | 18.459 | 0.287 |

Table 2. The simulation results for various scenarios

| m | Statistic | Noise model 1, $\phi_1 = 0.0025$, known R and α | Noise model 1, $\phi_1 = 0.0025$, unknown R and α | Noise model 2, $\phi_1 = 0.0025$, $\phi_2 = 0.0005$ | Noise model 3, $\phi_1 = 500$, $\phi_2 = -100$ |
|---|---|---|---|---|---|
| 25 | Mean β | 104.028 | 103.159 | 119.363 | 157.173 |
| 25 | Median β | 98.915 | 98.940 | 98.242 | 94.870 |
| 25 | Mean δ | −41.318 | −30.520 | −49.168 | −68.168 |
| 25 | Median δ | −39.042 | −30.042 | −39.638 | −37.088 |
| 25 | Mean $\phi_1$ | 0.00248 | 0.00242 | 0.00249 | 1279.08 |
| 25 | Median $\phi_1$ | 0.00245 | 0.00246 | 0.00239 | 542.011 |
| 25 | Mean $\phi_2$ | n/a | n/a | 0.000510 | 265.256 |
| 25 | Median $\phi_2$ | n/a | n/a | 0.000473 | −68.513 |
| 25 | Mean R | n/a | 1.0036 | n/a | n/a |
| 25 | Median R | n/a | 1.0043 | n/a | n/a |
| 50 | Mean β | 103.755 | 98.510 | 104.926 | 116.685 |
| 50 | Median β | 101.173 | 97.065 | 96.726 | 101.414 |
| 50 | Mean δ | −41.180 | −28.797 | −42.282 | −47.685 |
| 50 | Median δ | −40.806 | −27.740 | −38.360 | −40.567 |
| 50 | Mean $\phi_1$ | 0.00252 | 0.00240 | 0.00246 | 695.541 |
| 50 | Median $\phi_1$ | 0.00250 | 0.00239 | 0.00245 | 523.072 |
| 50 | Mean $\phi_2$ | n/a | n/a | 0.000488 | −5.1085 |
| 50 | Median $\phi_2$ | n/a | n/a | 0.000504 | −88.9432 |
| 50 | Mean R | n/a | 1.0029 | n/a | n/a |
| 50 | Median R | n/a | 1.0037 | n/a | n/a |
| 100 | Mean β | 101.602 | 96.371 | 103.048 | 111.526 |
| 100 | Median β | 99.121 | 96.344 | 101.340 | 102.223 |
| 100 | Mean δ | −40.758 | −28.281 | −41.630 | −45.361 |
| 100 | Median δ | −39.783 | −28.038 | −40.511 | −40.444 |
| 100 | Mean $\phi_1$ | 0.00248 | 0.00239 | 0.00247 | 561.077 |
| 100 | Median $\phi_1$ | 0.00246 | 0.00240 | 0.00247 | 501.423 |
| 100 | Mean $\phi_2$ | n/a | n/a | 0.000476 | −61.844 |
| 100 | Median $\phi_2$ | n/a | n/a | 0.000462 | −90.032 |
| 100 | Mean R | n/a | 1.0025 | n/a | n/a |
| 100 | Median R | n/a | 1.0030 | n/a | n/a |

Fig. 3. The m = 50 case under noise model 1 with unknown scale and rotation

Fig. 4. The two template types and the training data

Table 3. The parameter estimates under different noise models

| | Parameter | Noise model 1 | Noise model 2 | Noise model 3 |
|---|---|---|---|---|
| Circular template | β | 12.641 | 12.957 | 16.891 |
| Circular template | δ | −3.914 | −2.633 | 0.167 |
| Circular template | R | 35.503 | 35.161 | 35.586 |
| Oblong template | β | 12.660 | 17.349 | 16.003 |
| Oblong template | δ | −2.329 | 3.917 | 0.082 |
| Oblong template | R | 59.995 | 59.155 | 59.048 |
| Noise parameters | $\phi_1$ | 2.612 | 6.760 | 0.215 |
| Noise parameters | $\phi_2$ | n/a | 3.202 | −0.108 |
| −2 ln(max likelihood) | | 5188.2 | 5178.7 | 5206.6 |
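The conditional variance 1/β and conditional correlation −δ/β quoted above follow from a standard property of Gaussian models specified through their inverse variance: the conditional distribution of a subset of components given the rest has inverse variance equal to the corresponding sub-block. A minimal Python sketch of the calculation, with a hypothetical function name and taking R = 1 by default, is:

```python
def conditional_summaries(beta, delta, R=1.0):
    """Conditional variance of one t_i given all others, and conditional
    correlation of two consecutive t_i given the remaining components,
    read off from the inverse variance in equation (1)."""
    p11, p12 = beta / R**2, delta / R**2     # 2x2 sub-block [[p11, p12], [p12, p11]]
    det = p11 * p11 - p12 * p12
    var_pair = p11 / det                      # diagonal of the inverted sub-block
    cov_pair = -p12 / det                     # off-diagonal of the inverted sub-block
    single_variance = 1.0 / p11               # 1x1 sub-block inverted
    return single_variance, cov_pair / var_pair

single_variance, pair_correlation = conditional_summaries(100.0, -40.0)
```

For the true values β = 100 and δ = −40 this gives a conditional variance of 0.01 and a conditional correlation of 0.4, so neighbouring edge deformations are positively associated. Note that scaling β and δ together leaves the correlation unchanged while shrinking the variance, which gives some intuition for why β and δ are estimated with such high correlation.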

4.2. Estimation for imaging data

Figure 4 shows the two template types together with 100 polygonal cell outlines traced by one of the authors from a series of images of cell cartilage. For display purposes, the outlines have been relocated on a regular grid; the scalings and rotations have not been altered. Using these training data, the proposed estimation technique has been applied assuming unknown scale and rotation and a common measurement error for both template types. The same three measurement noise models have been considered. Table 3 gives the resulting parameter estimates together with −2 times the natural logarithm of the likelihood at those parameter estimates; this quantity is used to determine which of the three models is most appropriate using the Akaike Information Criterion (noise model 3 in this case). Figure 5 shows some simulations from this fitted model; the top row shows 50 realisations from each of the two deformed template models, the bottom row shows the same realisations convolved with realisations from the fitted noise model. Note that the rotations here have been set to the value zero for displaying these outlines, which consequently appear less ordered than those in Fig. 4 as a result of the natural structure of the cartilage.

Fig. 5. Simulations using the estimated parameter values (top row without measurement error, bottom row the same realisations convolved with measurement error)

5. Discussion

User-friendly techniques for parameter estimation in deformable template models are an important issue if such models are to find more widespread acceptance in applied work. We have demonstrated that it is possible to treat all the parameters necessary for a Bayesian object recognition approach in either a fully Bayesian way or requiring only graphical intervention by the microscope user. We hope that approaches such as these will facilitate further work in the area of statistical image analysis in collaboration with the end-users of such techniques.

6. Appendix: Deriving the likelihood function

A.1. Imposing the constraint $v_n = v^*$

Recalling the notation used in (6), denote $v_1, \ldots, v_{n-1}$ by v, and $(R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0)$ by “· · ·”, where $\tilde{v}_0$ is the recorded position of the first vertex; then

$$\pi(v, v_n \mid \cdots) = (2\pi)^{-n} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} \begin{bmatrix} v - \mu_1 \\ v_n - \tilde{v}_0 + \varepsilon_0 \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} v - \mu_1 \\ v_n - \tilde{v}_0 + \varepsilon_0 \end{bmatrix} \right\}$$

$$\pi(v_n \mid \cdots) = (2\pi)^{-1} |\Sigma_{22}|^{-1/2} \exp\left\{ -\frac{1}{2} [v_n - \tilde{v}_0 + \varepsilon_0]^T \Sigma_{22}^{-1} [v_n - \tilde{v}_0 + \varepsilon_0] \right\}.$$

Using the same partitioning as in (7)

$$\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{bmatrix} \begin{bmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{12}^T & \Psi_{22} \end{bmatrix} = \begin{bmatrix} I_{2n-2} & 0_{2n-2,2} \\ 0_{2n-2,2}^T & I_2 \end{bmatrix}. \tag{12}$$

Then the conditional distribution of interest is

$$\frac{\pi(v, v_n \mid \cdots)|_{v_n = v^*}}{\pi(v_n \mid \cdots)|_{v_n = v^*}} = (2\pi)^{-(n-1)} \left( \frac{|\Sigma|}{|\Sigma_{22}|} \right)^{-1/2} \times \frac{\exp\left(-\frac{1}{2}\left((v - \mu_1)^T \Psi_{11} (v - \mu_1) + 2 (v - \mu_1)^T \Psi_{12} (v^* - \tilde{v}_0 + \varepsilon_0) + (v^* - \tilde{v}_0 + \varepsilon_0)^T \Psi_{22} (v^* - \tilde{v}_0 + \varepsilon_0)\right)\right)}{\exp\left(-\frac{1}{2} (v^* - \tilde{v}_0 + \varepsilon_0)^T \Sigma_{22}^{-1} (v^* - \tilde{v}_0 + \varepsilon_0)\right)}$$

To see that this is the density of a $N_{2n-2}(\mu_1 + \Psi_{11}^{-1}\Psi_{12}(\tilde{v}_0 - \varepsilon_0 - v^*), \Psi_{11}^{-1})$, we need two identities: the first is that $|\Sigma|/|\Sigma_{22}| = |\Psi_{11}^{-1}|$ and the second is that $\Psi_{22} - \Sigma_{22}^{-1} = \Psi_{12}^T \Psi_{11}^{-1} \Psi_{12}$. We begin with the second, using two of the matrix equations of (12),

$$\Psi_{12}^T \Psi_{11}^{-1} \Psi_{12} = \left(-\Sigma_{22}^{-1} \Sigma_{12}^T \Psi_{11}\right) \Psi_{11}^{-1} \Psi_{12} = -\Sigma_{22}^{-1}\left(I_2 - \Sigma_{22}\Psi_{22}\right) = \Psi_{22} - \Sigma_{22}^{-1} \tag{13}$$

For the first, we apply Frobenius’ theorem for the determinant of a partitioned matrix to $\Sigma^{-1}$ and (13)

$$|\Sigma^{-1}| = |\Psi_{11}| \left|\Psi_{22} - \Psi_{12}^T \Psi_{11}^{-1} \Psi_{12}\right| = |\Psi_{11}| \left|\Sigma_{22}^{-1}\right|.$$

A.2. The closure constraint

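Since $v \mid (v_n = \tilde{v}_0 - \varepsilon_0, R, \alpha, \Phi, \tilde{v}_0, \varepsilon_0) \sim N_{2n-2}(\mu_1, \Psi_{11}^{-1})$, to find the distribution of v under closure, we use the distribution of $\varepsilon_0$, integrating it out (i.e. considering all possible closures),

$$\pi(v \mid \cdots)\,\pi(\varepsilon_0) = \frac{\left|\Psi_{11}^{-1}\right|^{-1/2} |\Phi_{22}|^{-1/2}}{(2\pi)^n} \exp\left\{ -\frac{1}{2}\left((v - \mu_1)^T \Psi_{11} (v - \mu_1) + \varepsilon_0^T \Phi_{22}^{-1} \varepsilon_0\right) \right\}.$$

Use the notation $\mu_2$ to denote the first 2n − 2 rows of the reordered $\mathcal{I}\tilde{v}_0 + v^{R\alpha}$ and $\bar{\mathcal{I}}$ to denote the first 2n − 2 rows of the reordered $\mathcal{I}$, so that by (5) $\mu_1 = \mu_2 - \bar{\mathcal{I}}\varepsilon_0$; then

$$\begin{aligned} &(v - \mu_1)^T \Psi_{11} (v - \mu_1) + \varepsilon_0^T \Phi_{22}^{-1} \varepsilon_0 \\ &\quad = \varepsilon_0^T \left(\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right) \varepsilon_0 + 2\varepsilon_0^T \bar{\mathcal{I}}^T \Psi_{11} (v - \mu_2) + (v - \mu_2)^T \Psi_{11} (v - \mu_2) \\ &\quad = \left(\varepsilon_0 + \left(\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right)^{-1} \bar{\mathcal{I}}^T \Psi_{11} (v - \mu_2)\right)^T \left(\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right) \left(\varepsilon_0 + \left(\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right)^{-1} \bar{\mathcal{I}}^T \Psi_{11} (v - \mu_2)\right) \\ &\qquad + (v - \mu_2)^T \left(\Psi_{11} - \Psi_{11} \bar{\mathcal{I}} \left(\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right)^{-1} \bar{\mathcal{I}}^T \Psi_{11}\right) (v - \mu_2) \end{aligned}$$

Since $\varepsilon_0$ can be seen to be Gaussian, integrating it out leaves the following terms

$$\frac{(2\pi) \left|\Psi_{11}^{-1}\right|^{-1/2} |\Phi_{22}|^{-1/2}}{(2\pi)^n \left|\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right|^{1/2}} \exp\left\{ -\frac{1}{2} (v - \mu_2)^T \left(\Psi_{11} - \Psi_{11} \bar{\mathcal{I}} \left(\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right)^{-1} \bar{\mathcal{I}}^T \Psi_{11}\right) (v - \mu_2) \right\}$$

To see that this is the required density (9), first note by direct evaluation that

$$\Psi_{11} - \Psi_{11} \bar{\mathcal{I}} \left(\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}\right)^{-1} \bar{\mathcal{I}}^T \Psi_{11} = \left(\Psi_{11}^{-1} + \bar{\mathcal{I}} \Phi_{22} \bar{\mathcal{I}}^T\right)^{-1}.$$

It then remains to show that $|\Psi_{11}^{-1} + \bar{\mathcal{I}} \Phi_{22} \bar{\mathcal{I}}^T|\,|\bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}} + \Phi_{22}^{-1}|^{-1} = |\Psi_{11}^{-1}|\,|\Phi_{22}|$. Consider the constructed partitioned matrix

$$A = \begin{bmatrix} \Psi_{11}^{-1} & \bar{\mathcal{I}} \\ \bar{\mathcal{I}}^T & -\Phi_{22}^{-1} \end{bmatrix}$$

and apply Frobenius’ formula for the determinant of a partitioned matrix twice, first treating the top left quadrant as the first factor, and then the bottom right quadrant as the first factor. This implies that

$$\left|\Psi_{11}^{-1}\right| \left|-\Phi_{22}^{-1} - \bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}}\right| = |A| = \left|-\Phi_{22}^{-1}\right| \left|\Psi_{11}^{-1} + \bar{\mathcal{I}} \Phi_{22} \bar{\mathcal{I}}^T\right|$$

or, as required, $|\Psi_{11}^{-1}|\,|\Phi_{22}| = |\Psi_{11}^{-1} + \bar{\mathcal{I}} \Phi_{22} \bar{\mathcal{I}}^T|\,|\Phi_{22}^{-1} + \bar{\mathcal{I}}^T \Psi_{11} \bar{\mathcal{I}}|^{-1}$.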
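The two matrix identities used in this section can be checked numerically in the smallest non-trivial case. The Python sketch below assumes n = 2, so that $\Psi_{11}$ and $\Phi_{22}$ are both 2 × 2 and the first 2n − 2 rows of the reordered $\mathcal{I}$ form the 2 × 2 identity. The helper functions and the matrix entries are arbitrary illustrative choices.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def mul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add2(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def sub2(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

# Arbitrary symmetric positive definite choices for Psi_11 and Phi_22.
psi11 = [[3.0, 1.0], [1.0, 2.0]]
phi22 = [[0.5, 0.1], [0.1, 0.4]]

# With the truncated I equal to the identity, I^T Psi11 I + Phi22^{-1} is:
inner = add2(psi11, inv2(phi22))

# Identity 1: Psi11 - Psi11 I (inner)^{-1} I^T Psi11 = (Psi11^{-1} + I Phi22 I^T)^{-1}
lhs = sub2(psi11, mul2(mul2(psi11, inv2(inner)), psi11))
rhs = inv2(add2(inv2(psi11), phi22))
inverse_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

# Identity 2: |Psi11^{-1} + I Phi22 I^T| / |inner| = |Psi11^{-1}| |Phi22|
det_gap = abs(det2(add2(inv2(psi11), phi22)) / det2(inner)
              - det2(inv2(psi11)) * det2(phi22))
```

Both gaps are zero up to rounding error. This special case does not replace the general argument above, but it is a quick guard against sign or transpose slips when implementing (9).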

A.3. The rotation parameter

Notice that in the log likelihood, the rotation of cell i occurs only through the $v^{R\alpha}$ contribution to the mean of $(\tilde{v}_1, \ldots, \tilde{v}_{n-1})^T$ for that particular cell. Considering one particular cell, denote the variance matrix in (10) by V. Then the term to be minimised over α for a given β, δ, R and Φ is $(\tilde{v} - \mu_2)^T V^{-1} (\tilde{v} - \mu_2)$, where α occurs only through $\mu_2$ (the first 2n − 2 components of the reordered $\mathcal{I}\tilde{v}_0 + v^{R\alpha}$). The scaled and rotated template $v^{R\alpha}$ with the constrained vertex at zero can be written as a rotation applied to an already scaled template:

$$(v^{R\alpha})_i = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} (v^R)_i = \begin{bmatrix} (v^R)_i^x & -(v^R)_i^y \\ (v^R)_i^y & (v^R)_i^x \end{bmatrix} \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$$

Writing this in the form $(v^{R\alpha})_{j=1,\ldots,n-1} = v^R P_\alpha$ with $P_\alpha = (\cos\alpha, \sin\alpha)^T$, the square to be minimised may be re-expressed as $(\check{v} - v^R P_\alpha)^T V^{-1} (\check{v} - v^R P_\alpha)$, where $\check{v}$ is the first 2n − 2 components of $\tilde{v} - \mathcal{I}\tilde{v}_0$. Letting $Q = [Q_1 \; Q_2]$ denote $-2\check{v}^T V^{-1} v^R$ and $T = \begin{bmatrix} T_1 & T_2 \\ T_2 & T_3 \end{bmatrix}$ denote $(v^R)^T V^{-1} v^R$, α minimises

$$S = Q P_\alpha + P_\alpha^T T P_\alpha = Q_1 \cos\alpha + Q_2 \sin\alpha + T_1 \cos^2\alpha + T_3 \sin^2\alpha + 2 T_2 \cos\alpha \sin\alpha.$$

Differentiating with respect to α, and using the tan-transform t = tan(α/2), leads to a quartic in t

$$(2T_2 - Q_2)t^4 + (4(T_1 - T_3) - 2Q_1)t^3 - 12T_2 t^2 + (4(T_3 - T_1) - 2Q_1)t + (Q_2 + 2T_2) = 0$$

which may be readily solved in Matlab to find the minimising α.
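The relationship between S and the quartic can be illustrated with a short Python sketch. Instead of a polynomial root finder, it locates the minimum of S by a coarse grid followed by a ternary search (a substitute technique chosen here to stay within the standard library), and then confirms that the quartic vanishes at t = tan(α/2) for the minimiser. The function names and the coefficient values are hypothetical.

```python
import math

def objective(alpha, Q1, Q2, T1, T2, T3):
    """S as a function of the rotation angle alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return Q1 * c + Q2 * s + T1 * c * c + T3 * s * s + 2.0 * T2 * c * s

def quartic(t, Q1, Q2, T1, T2, T3):
    """Left-hand side of the quartic in t = tan(alpha / 2)."""
    return ((2.0 * T2 - Q2) * t**4 + (4.0 * (T1 - T3) - 2.0 * Q1) * t**3
            - 12.0 * T2 * t**2 + (4.0 * (T3 - T1) - 2.0 * Q1) * t
            + (Q2 + 2.0 * T2))

def minimise_alpha(Q1, Q2, T1, T2, T3, grid=3600, iterations=200):
    """Coarse grid over [-pi, pi), then ternary search around the best cell."""
    coeffs = (Q1, Q2, T1, T2, T3)
    step = 2.0 * math.pi / grid
    best = min(range(grid), key=lambda k: objective(-math.pi + k * step, *coeffs))
    lo = -math.pi + (best - 1) * step
    hi = -math.pi + (best + 1) * step
    for _ in range(iterations):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if objective(m1, *coeffs) < objective(m2, *coeffs):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Hypothetical coefficients, chosen only for illustration.
coeffs = (-3.0, 1.0, 2.0, 0.3, 1.5)
alpha_hat = minimise_alpha(*coeffs)
t_hat = math.tan(alpha_hat / 2.0)
# The quartic equals (1 + t^2)^2 times dS/dalpha, so this is the derivative.
stationarity = quartic(t_hat, *coeffs) / (1.0 + t_hat**2) ** 2
```

The quartic has up to four real roots, corresponding to all stationary points of S, so a root-finding implementation would still need to evaluate S at each real root and keep the smallest. The root t = ±∞ (α = ±π) is not represented by a finite t and may need separate handling.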

Acknowledgments

MAH and HR are grateful to the TMR Spatial and Computational Statistics Network (ERB-FMRX-CT96-0096). We thank Dr N. White and Dr R. Errington of the Plant Sciences Department of Oxford University for providing the confocal data.

References

Baddeley A. and Turner R. 2000. Practical maximum pseudolikelihood for spatial point patterns. Australian and New Zealand Journal of Statistics 42: 283–322.

Baddeley A.J. and Van Lieshout M.N.M. 1993. Stochastic geometry models in high-level vision. In: Mardia K.V. and Kanji G.K. (Eds.), Statistics and Images, Vol. 20. Carfax Publishing, Abingdon, Chapter 11, pp. 235–256.

Blake A. and Isard M. 1998. Active Contours. Springer-Verlag, Berlin.

Dryden I. and Mardia K. 1999. Statistical Shape Analysis. Wiley, Chichester.

Gilks W.R., Richardson S., and Spiegelhalter D.J. 1996. Markov Chain Monte Carlo in Practice. Chapman & Hall, London.

Grenander U., Chow Y., and Keenan D.M. 1991. Hands: A Pattern Theoretic Study of Biological Shapes, Research Notes on Neural Computing. Springer, Berlin.

Grenander U. and Miller M.I. 1994. Representations of knowledge in complex systems (with discussion). Journal of the Royal Statistical Society, Series B 56(4): 549–603.

Hurn M.A. 1998. Confocal fluorescence microscopy of leaf cells: An application of Bayesian image analysis. Journal of the Royal Statistical Society, Series C 47: 361–377.

Hurn M.A. and Rue H. 1997. High-level image priors in confocal microscopy applications. In: Mardia K.V., Gill C.A. and Aykroyd R. (Eds.), The Art and Science of Bayesian Image Analysis. University of Leeds, Leeds, pp. 36–43.

Jain A.K., Zhong Y., and Lakshmanan S. 1996. Object matching using deformable templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(3): 267–278.

Kent J.T., Dryden I.L., and Anderson C.R. 2000. Using circulant symmetry to model featureless objects. Biometrika 87: 527–544.

Kent J.T., Mardia K.V., and Walder A.N. 1996. Conditional cyclic Markov random fields. Advances in Applied Probability (SGSA) 28: 1–12.

Rue H. and Hurn M.A. 1999. Bayesian object identification. Biometrika 86(3): 649–660.

The Math Works Inc. 1994. MATLAB User’s Guide, Version 4.2c. The Math Works Inc, Natick.
