Mathematical Theory of Recursive Symbol Systems

Peter Fletcher
E-mail: [email protected]

Technical Report GEN-0401
Department of Computer Science
School of Computing and Mathematics
Keele University
Keele, Staffs, ST5 5BG
U.K.
Tel: +44 1782 583260
Fax: +44 1782 713082

November 2004




Abstract

This paper introduces a new type of graph grammar for representing recursively structured two-dimensional geometric patterns, for the purposes of syntactic pattern recognition. The grammars are noise-tolerant, in the sense that they can accommodate patterns that contain pixel noise, variability in the geometric relations between symbols, missing parts, or ambiguous parts, and that overlap other patterns. Geometric variability is modelled using fleximaps, drawing on concepts from the theory of Lie algebras and tensor calculus.

The formalism introduced here is intended to be generalisable to all problem domains in which complex, many-layered structures occur in the presence of noise.

Keywords: graph grammars, context-free grammars, recursive symbol processing, parsing, syntactic pattern recognition, noise tolerance, affine invariance.


Table of Contents

1. Introduction
2. Algebraic and analytic preliminaries
3. Affine transformations
4. Matching a template to an image
5. Fleximaps
6. Networks and homomorphisms
7. The recognition problem
References


1. Introduction

1.1 Aims

This work is an exercise in syntactic pattern recognition. The aim is to represent complex geometric patterns as symbol systems, in a robust way that can cope with missing or deformed symbols, variability in the geometric relations between parts of a symbol, and other types of noise. More generally, the long-term aim of my research is to combine the expressive power of recursive symbol systems (such as formal grammars) with the noise-tolerance of massively parallel, distributed computational systems (such as neural networks).

A symbol system is a structure consisting of finitely many elements of various types, called symbols; a symbol may contain other symbols, and may stand in certain relations to neighbouring symbols. The permitted relations of containment and neighbourhood are specified by a grammar, which is also a symbol system. The grammar is finite, but by means of recursion it can describe infinitely many symbol systems. The most familiar examples of grammars are those used to describe programming languages and natural languages; these are called string grammars, because they describe sequences of symbols. Graph grammars are more powerful than string grammars, as they can represent more complex, non-sequential configurations of symbols.

The importance of recursive symbol systems was emphasised in the so-called `classical' tradition of artificial intelligence and cognitive science; it became the subject of controversy between the `classical' tradition and connectionism in the 1980s and 1990s (Smolensky, 1988; Fodor & Pylyshyn, 1988; Barnden & Pollack, 1991; Dinsmore, 1992; Horgan & Tienson, 1996; Garfield, 1997; Sougné, 1998; Hadley, 1999). There is no doubt that recursive symbol systems have an expressive power and computational flexibility that is unmatched by any other type of representation. Their big weakness is their `brittleness': their inability to cope with noise, vagueness, incomplete or contradictory information, ambiguity, context-dependency, and so on. The aim of my work is to overcome this brittleness.

In this paper I shall set up a mathematical theory of noisy symbol systems and of how they may be used to represent complex, recursively structured two-dimensional geometric patterns. (My choice of geometric patterns as a problem domain is simply for the sake of convenience; the theory is intended to be of wider application.)

The grammars I shall use are essentially a kind of graph grammar. In my previous work (Fletcher, 2001) I showed how regular graph grammars describing two-dimensional geometric patterns could be parsed and learned from positive examples. The present paper goes beyond this in the following ways.
• The grammatical framework is extended from regular to context-free graph grammars.
• The full geometry of the plane is used, whereas previously the plane was represented by a discrete square grid.
• Noise is permitted, in the form of pixel noise, variability in the geometric relations between parts, incomplete patterns, and overlapping patterns.


1.2 Terminology

We must distinguish between symbol types and symbol tokens. The Roman alphabet consists of 26 letters, in upper and lower case, which gives 52 symbol types. A particular occurrence of a symbol type at a particular position, orientation and size is a symbol token. Thus a text document may contain thousands of symbol tokens, each of which is an instance of one of the 52 symbol types (ignoring punctuation marks and other non-letter symbols).

A grammar is a system of symbol types. A pattern is the system of all symbol

tokens present in a given image. A grammar specifies a (usually infinite) set of possible

patterns.

The shape of a symbol type is described by a template, which depicts how a token

of that type would look, occurring at a standard position, orientation and size, without

noise. Each symbol type has a template.

The position, orientation, size, and degree of stretching and shearing of a symbol token in an image are specified by an affine transformation, known as the embedding of the token. It is a mapping from the template of the symbol token's type into the image plane.

1.3 The contents of this paper

§2 provides necessary algebraic and analytic preliminaries from the theory of real finite-dimensional vector spaces and linear transformations. All the material here is standard, but to ensure that this paper is self-contained and rigorous I have re-derived all results in precisely the form in which I shall need them in the later sections.

§3 sets up the basic theory of affine transformations, which are used to describe the embedding of symbol tokens in the image plane and the relations of symbol tokens to one another. I use coordinate-free concepts from the theory of Lie algebras and tensor calculus but also give coordinate-based calculation techniques for use in a computational implementation. (See Price (1977), Hausner & Schwartz (1968) and Boothby (1986) for more details on these mathematical concepts.)

§4 deals with templates, which describe the concrete appearance of symbols in the image plane. The goodness of match of a template with an image is measured by a correlation function; I calculate the derivative of this function so that it can be maximised.

§5 defines the concept of a fleximap, which is used to represent a flexible affine transformation, i.e., one that can be deformed from its nominal value along certain degrees of freedom.

§6 defines the central concept of a network, which is used to represent symbol systems (both grammars and patterns). The parsing of a pattern under a grammar is represented by a homomorphism between networks (rather than by a parse tree, as in conventional grammatical formalisms).


§7 sets up the problem of recognising the pattern in a given image. I state the recognition problem formally with the help of a Match function and show how the embedding of the pattern can be optimised by gradient ascent on the Match function.

The next step (to be dealt with in a future paper) is to introduce a parsing algorithm to find the best pattern and homomorphism. The problem of learning the grammar from a given set of example images is left for future work.

In this paper lemmas and theorems will be numbered consecutively 1, 2, 3, … within each section, and will be cited outside the section by adding the section number: e.g., lemma 2.3 is lemma 3 of §2.


2. Algebraic and Analytic Preliminaries

2.1 The norm of a vector

Let $V$ be a real finite-dimensional vector space. Choose a basis $\{e_1, \dots, e_n\}$, and define a norm on $V$ by

$$\Big|\sum_{i=1}^n v^i e_i\Big| = \sqrt{\sum_{i=1}^n (v^i)^2}.$$

This norm is said to be induced by the basis.

LEMMA 1. The norm satisfies the usual axioms:
(i) $\forall u, v \in V \; |u + v| \le |u| + |v|$
(ii) $\forall k \in \mathbb{R} \; \forall v \in V \; |kv| = |k||v|$
(iii) $v \ne 0 \Rightarrow |v| > 0$

Proof. (i) For any $u = \sum_{i=1}^n u^i e_i$ and any $v = \sum_{i=1}^n v^i e_i$, by the Cauchy-Schwarz inequality we have

$$|u + v|^2 = \sum_{i=1}^n (u^i + v^i)^2 = \sum_{i=1}^n (u^i)^2 + \sum_{i=1}^n (v^i)^2 + 2\sum_{i=1}^n u^i v^i \le \sum_{i=1}^n (u^i)^2 + \sum_{i=1}^n (v^i)^2 + 2\sqrt{\sum_{i=1}^n (u^i)^2}\sqrt{\sum_{i=1}^n (v^i)^2} = |u|^2 + |v|^2 + 2|u||v| = (|u| + |v|)^2$$

so $|u + v| \le |u| + |v|$.
(ii) For any $k \in \mathbb{R}$ and any $v = \sum_{i=1}^n v^i e_i$,

$$|kv|^2 = \sum_{i=1}^n (kv^i)^2 = k^2 \sum_{i=1}^n (v^i)^2 = k^2|v|^2$$

so $|kv| = |k||v|$.
(iii) For any $v = \sum_{i=1}^n v^i e_i$, if $v \ne 0$ then we must have at least one $v^i \ne 0$, so $|v|^2 = \sum_{i=1}^n (v^i)^2 > 0$.
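As a concrete illustration (mine, not part of the paper's formalism): in coordinates the induced norm is just the Euclidean norm of the coordinate vector, and lemma 1(i) can be spot-checked numerically. `induced_norm` is a hypothetical helper name.

```python
import math

def induced_norm(coords):
    # |sum_i v^i e_i| = sqrt(sum_i (v^i)^2), the norm induced by the chosen basis
    return math.sqrt(sum(c * c for c in coords))

u = [3.0, 4.0]
v = [1.0, -2.0]
s = [a + b for a, b in zip(u, v)]  # coordinates of u + v
print(induced_norm(u))  # 5.0
print(induced_norm(s) <= induced_norm(u) + induced_norm(v))  # True (lemma 1(i))
```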

LEMMA 2. If a norm $|\cdot|$ is induced by a basis $\{e_1, \dots, e_n\}$, and a second norm $|\cdot|'$ is induced by another basis $\{f_1, \dots, f_n\}$, then the two norms are related by

$$\exists \alpha \in \mathbb{R} \; \forall v \in V \; |v| \le \alpha|v|', \qquad \exists \beta \in \mathbb{R} \; \forall v \in V \; |v|' \le \beta|v|.$$

Proof. Since $\{e_1, \dots, e_n\}$ is a basis, every vector $f_j$ can be expressed as a linear combination $f_j = \sum_{i=1}^n M^i_j e_i$, for some real numbers $M^i_j$. Any vector $v \in V$ can be expressed in terms of the basis $\{f_1, \dots, f_n\}$ as $\sum_{j=1}^n b^j f_j$. Then

$$v = \sum_{j=1}^n b^j f_j = \sum_{j=1}^n \sum_{i=1}^n b^j M^i_j e_i = \sum_{i=1}^n \Big(\sum_{j=1}^n b^j M^i_j\Big) e_i$$

so, by the definition of the norms and the Cauchy-Schwarz inequality,

$$|v|^2 = \sum_{i=1}^n \Big(\sum_{j=1}^n b^j M^i_j\Big)^2 \le \sum_{i=1}^n \Big(\sum_{j=1}^n (b^j)^2\Big)\Big(\sum_{j=1}^n (M^i_j)^2\Big) = |v|'^2 \sum_{i,j=1}^n (M^i_j)^2$$

so taking $\alpha = \sqrt{\sum_{i,j=1}^n (M^i_j)^2}$ verifies the first half of the lemma. The other half is proved similarly.


2.2 Convergence

The norm induced by a basis induces a topology on $V$:

$$x_k \to l \text{ as } k \to \infty \quad \text{iff} \quad \forall \varepsilon > 0 \; \exists N \; \forall k > N \; |x_k - l| < \varepsilon.$$

By lemma 2, the topology is independent of the choice of basis.

2.3 Vanishing and limited functions

Let $V$ and $W$ be real finite-dimensional vector spaces, each with a norm induced by a basis.
• A function $f: V \to W$ is vanishing iff $\forall \varepsilon > 0 \; \exists \delta > 0 \; \forall v \in V \; |v| < \delta \Rightarrow |f(v)| \le \varepsilon|v|$.
• A function $f: V \to W$ is limited iff $\exists \varepsilon > 0 \; \exists \delta > 0 \; \forall v \in V \; |v| < \delta \Rightarrow |f(v)| \le \varepsilon|v|$.
By lemma 2, these concepts are independent of the choice of bases. In the lemmas that follow, $U$, $V$ and $W$ are any real finite-dimensional vector spaces, each with a norm induced by a basis.
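To make the first definition concrete (an illustration of mine, not from the paper): on $V = \mathbb{R}$, the function $f(x) = x^2$ is vanishing, since $|f(x)| = |x|\,|x| \le \varepsilon|x|$ whenever $|x| < \delta = \varepsilon$.

```python
def f(x):
    # f(x) = x^2: |f(x)| = |x|*|x| <= eps*|x| once |x| < delta = eps
    return x * x

eps = 1e-3
delta = eps
xs = [delta * k / 10.0 for k in range(-9, 10)]  # sample points with |x| < delta
print(all(abs(f(x)) <= eps * abs(x) for x in xs))  # True
```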

LEMMA 3. Any vanishing function is limited.

Proof. Immediate from the definitions.

LEMMA 4. Any linear function is limited and continuous.
Proof. Let $F: V \to W$ be linear. Let $\{e_1, \dots, e_m\}$ be the basis on $V$ and $\{f_1, \dots, f_n\}$ the basis on $W$ used to induce the norms. Let $A$ be the $n \times m$ matrix representing $F$, i.e., $F(e_i) = \sum_{j=1}^n A^j_i f_j$. Choose $\varepsilon = \sqrt{\sum_{i=1}^m \sum_{j=1}^n (A^j_i)^2}$. Then, for any $v = \sum_{i=1}^m v^i e_i$ in $V$, $F(v) = \sum_{j=1}^n \big(\sum_{i=1}^m A^j_i v^i\big) f_j$, and, by the Cauchy-Schwarz inequality,

$$|F(v)|^2 = \sum_{j=1}^n \Big(\sum_{i=1}^m A^j_i v^i\Big)^2 \le \sum_{j=1}^n \Big(\sum_{i=1}^m (A^j_i)^2\Big)\Big(\sum_{i=1}^m (v^i)^2\Big) = \varepsilon^2|v|^2$$

so $|F(v)| \le \varepsilon|v|$. This verifies that $F$ is limited ($\delta$ may be chosen arbitrarily), and also implies that $F$ is continuous, since

$$\forall x, a \in V \; |F(x) - F(a)| = |F(x - a)| \le \varepsilon|x - a|.$$

LEMMA 5. If $f: V \to W$ is limited then $f(0) = 0$ and $f$ is continuous at 0.

Proof. There exist positive $\varepsilon_0, \delta_0$ such that

$$\forall v \in V \; |v| < \delta_0 \Rightarrow |f(v)| \le \varepsilon_0|v|.$$

Taking $v = 0$ gives $|f(0)| \le \varepsilon_0|0| = 0$, so $f(0) = 0$ as required.
To show that $f$ is continuous at 0, for any $\varepsilon > 0$, choose $\delta = \min(\delta_0, \varepsilon/\varepsilon_0)$; then

$$\forall v \; |v| < \delta \Rightarrow |f(v)| \le \varepsilon_0|v| < \varepsilon_0\delta \le \varepsilon$$

as required.


LEMMA 6. If $f, g: V \to W$ are vanishing and $k \in \mathbb{R}$ then $f + g$ and $kf$ are also vanishing.
Proof. For any $\varepsilon > 0$, we know that there exist $\delta_1 > 0$ and $\delta_2 > 0$ such that

$$\forall v \; |v| < \delta_1 \Rightarrow |f(v)| \le \tfrac{\varepsilon}{2}|v|, \qquad \forall v \; |v| < \delta_2 \Rightarrow |g(v)| \le \tfrac{\varepsilon}{2}|v|.$$

Now, choose $\delta = \min(\delta_1, \delta_2)$. Then

$$\forall v \; |v| < \delta \Rightarrow |(f + g)(v)| \le |f(v)| + |g(v)| \le \varepsilon|v|$$

so $f + g$ is vanishing.
Secondly, for any $\varepsilon > 0$, we know that there exists $\delta > 0$ such that

$$\forall v \; |v| < \delta \Rightarrow |f(v)| \le \frac{\varepsilon}{|k| + 1}|v|.$$

Hence

$$\forall v \; |v| < \delta \Rightarrow |(kf)(v)| = |k||f(v)| \le \frac{\varepsilon|k|}{|k| + 1}|v| < \varepsilon|v|$$

so $kf$ is vanishing.

LEMMA 7. If $f, g: V \to W$ are limited and $k \in \mathbb{R}$ then $f + g$ and $kf$ are also limited.
Proof. We know that there exist positive $\varepsilon_1, \delta_1, \varepsilon_2, \delta_2$ such that

$$\forall v \; |v| < \delta_1 \Rightarrow |f(v)| \le \varepsilon_1|v|, \qquad \forall v \; |v| < \delta_2 \Rightarrow |g(v)| \le \varepsilon_2|v|.$$

Now, choose $\varepsilon = \varepsilon_1 + \varepsilon_2$ and $\delta = \min(\delta_1, \delta_2)$. Then

$$\forall v \; |v| < \delta \Rightarrow |(f + g)(v)| \le |f(v)| + |g(v)| \le \varepsilon|v|$$

so $f + g$ is limited.
For the second part, choose $\varepsilon = (|k| + 1)\varepsilon_1 > 0$. Then

$$\forall v \; |v| < \delta_1 \Rightarrow |(kf)(v)| = |k||f(v)| \le \varepsilon|v|$$

so $kf$ is limited.

LEMMA 8. If $f: V \to W$ is vanishing and $g: U \to V$ is limited then $f \circ g$ is vanishing.
Proof. We know that there exist positive $\varepsilon_1, \delta_1$ such that

$$\forall u \in U \; |u| < \delta_1 \Rightarrow |g(u)| \le \varepsilon_1|u|.$$

Also, for any $\varepsilon > 0$, we know that there exists $\delta_2 > 0$ such that

$$\forall v \in V \; |v| < \delta_2 \Rightarrow |f(v)| \le \frac{\varepsilon}{\varepsilon_1}|v|.$$

Choose $\delta = \min(\delta_1, \frac{\delta_2}{\varepsilon_1})$. Then

$$\forall u \; |u| < \delta \Rightarrow |g(u)| \le \varepsilon_1|u| < \varepsilon_1\delta \le \delta_2 \Rightarrow |f(g(u))| \le \frac{\varepsilon}{\varepsilon_1}|g(u)| \le \varepsilon|u|$$

as required.


LEMMA 9. If $f: V \to W$ is limited and $g: U \to V$ is vanishing then $f \circ g$ is vanishing.
Proof. We know that there exist positive $\varepsilon_2, \delta_2$ such that

$$\forall v \in V \; |v| < \delta_2 \Rightarrow |f(v)| \le \varepsilon_2|v|.$$

Also, for any $\varepsilon > 0$, we know that there exists $\delta_1 > 0$ such that

$$\forall u \in U \; |u| < \delta_1 \Rightarrow |g(u)| \le \min\Big(\frac{\varepsilon}{\varepsilon_2}, \delta_2\Big)|u|.$$

Choose $\delta = \min(\delta_1, 1)$. Then, for any $u$ with $|u| < \delta$, we have $|u| < \delta_1$, so $|g(u)| \le \min(\frac{\varepsilon}{\varepsilon_2}, \delta_2)|u|$, so $|g(u)| \le \delta_2|u| < \delta_2$ (since $|u| < 1$). Hence $|f(g(u))| \le \varepsilon_2|g(u)|$. But we also have $|g(u)| \le \frac{\varepsilon}{\varepsilon_2}|u|$, so $|f(g(u))| \le \varepsilon|u|$ as required.

2.4 Application of the theory to linear transformations

For any real finite-dimensional vector spaces $V$ and $W$, let $L(V, W)$ be the vector space of all linear transformations from $V$ to $W$. A basis $\{e_1, \dots, e_m\}$ for $V$ and a basis $\{f_1, \dots, f_n\}$ for $W$ induce a basis $\{h^i_j \mid 1 \le i \le m, 1 \le j \le n\}$ for $L(V, W)$, where

$$h^i_j(e_k) = \begin{cases} f_j & \text{if } i = k \\ 0 & \text{if } i \ne k \end{cases}$$

Then any $A \in L(V, W)$ can be represented in terms of this basis by

$$A = \sum_{i=1}^m \sum_{j=1}^n A^j_i h^i_j$$

giving

$$A(e_i) = \sum_{j=1}^n A^j_i f_j.$$

For any element $v = \sum_{i=1}^m v^i e_i$ of $V$ we have

$$A(v) = \sum_{j=1}^n \Big(\sum_{i=1}^m A^j_i v^i\Big) f_j.$$

Assume that $V$, $W$ and $L(V, W)$ have the norms induced by these bases. For any $A = \sum_{i=1}^m \sum_{j=1}^n A^j_i h^i_j$ we have

$$|A|^2 = \sum_{i=1}^m \sum_{j=1}^n (A^j_i)^2 = \sum_{i=1}^m |A(e_i)|^2.$$

In the lemmas that follow, let $U$, $V$, $W$ be any real finite-dimensional vector spaces with norms induced by bases.

LEMMA 10. If $v \in V$ and $A \in L(V, W)$ then $|Av| \le |A||v|$.
Proof. Let $v = \sum_{i=1}^m v^i e_i$. Then

$$|A(v)|^2 = \Big|\sum_{i=1}^m v^i A(e_i)\Big|^2 \le \Big(\sum_{i=1}^m |v^i||A(e_i)|\Big)^2 \le \sum_{i=1}^m (v^i)^2 \sum_{i=1}^m |A(e_i)|^2 = |v|^2|A|^2$$

using the triangle inequality and the Cauchy-Schwarz inequality. Hence $|A(v)| \le |A||v|$.


LEMMA 11. If $A \in L(V, W)$ and $B \in L(U, V)$ then $|AB| \le |A||B|$.
Proof. Let $\{d_1, \dots, d_l\}$ be the basis on $U$ that induces the norm. By the previous lemma,

$$|AB|^2 = \sum_{k=1}^l |AB(d_k)|^2 \le \sum_{k=1}^l |A|^2|B(d_k)|^2 = |A|^2|B|^2$$

so $|AB| \le |A||B|$.

LEMMA 12. If $(A_k)$ is a sequence in $L(V, W)$ and $L \in L(V, W)$ then

$$A_k \to L \text{ as } k \to \infty \quad \text{iff} \quad \forall v \in V \; A_k(v) \to L(v) \text{ as } k \to \infty.$$

(A special case of this is: $\sum_{k=1}^\infty V_k = L$ iff $\forall v \in V \; \sum_{k=1}^\infty V_k(v) = L(v)$.)
Proof. If $A_k \to L$ as $k \to \infty$ then, for any $v \in V$,

$$|A_k(v) - L(v)| = |(A_k - L)(v)| \le |A_k - L||v|$$

so, by comparison, $A_k(v) \to L(v)$ as $k \to \infty$, as required.
Conversely, suppose $\forall v \; A_k(v) \to L(v)$ as $k \to \infty$. Let $\{e_1, \dots, e_m\}$ be the basis that induces the norm on $V$. Then

$$|A_k - L|^2 = \sum_{i=1}^m |(A_k - L)(e_i)|^2 \to 0 \text{ as } k \to \infty$$

as required.

For any normed vector space $S$ and any $\varepsilon > 0$, let $B_\varepsilon(S) = \{v \in S \mid |v| < \varepsilon\}$.

LEMMA 13. If $(c_n)$ is a real sequence, and $\sum_{n=0}^\infty c_n l^n$ is absolutely convergent for some positive real $l$, then $\sum_{n=0}^\infty c_n A^n$ is absolutely convergent for all $A \in B_l(L(V, V))$, and if $f$ is a function satisfying

$$\forall A \in B_l(L(V, V)) \; f(A) = \sum_{n=0}^\infty c_n A^n$$

then $f$ is continuous on $B_l(L(V, V))$.
Proof. Consider any $A \in B_l(L(V, V))$. Let $a = |A| < l$, $\theta = \frac{l+a}{2}$, and $\varepsilon = \frac{l-a}{2}$. For all $n$, we have $|c_n A^n| \le |c_n a^n| \le |c_n l^n|$, so $\sum_{n=0}^\infty c_n A^n$ is absolutely convergent by comparison with $\sum_{n=0}^\infty c_n l^n$.
Also, for any $X \in B_\varepsilon(L(V, V))$, we have

$$|(A + X)^n - A^n| = |R_n(A, X)| \le R_n(a, x)$$

where $x = |X|$, $R_n(A, X)$ is the sum of the $2^n - 1$ terms in the expansion of $(A + X)^n$ other than $A^n$, and $R_n(a, x)$ is the same sum of terms with $A$ and $X$ replaced by the real numbers $a$ and $x$.


Now,

$$0 \le R_n(a, x) = (a + x)^n - a^n = x\big((a + x)^{n-1} + (a + x)^{n-2}a + \dots + (a + x)a^{n-2} + a^{n-1}\big) \le x(n\theta^{n-1})$$

since $a < \theta$ and $a + x < \theta$; and

$$x(n\theta^{n-1}) \le \frac{x(\theta + \varepsilon)^n}{\varepsilon} = \frac{x}{\varepsilon}l^n$$

since $(\theta + \varepsilon)^n = \theta^n + n\theta^{n-1}\varepsilon + \dots + \varepsilon^n$. Hence $\sum_{n=0}^\infty |c_n R_n(a, x)|$ exists, by comparison with $\sum_{n=0}^\infty |c_n l^n|$, with

$$\sum_{n=0}^\infty |c_n R_n(a, x)| \le \frac{x}{\varepsilon}\sum_{n=0}^\infty |c_n l^n| = Mx$$

where $M = \frac{1}{\varepsilon}\sum_{n=0}^\infty |c_n l^n|$. Hence, by comparison, $\sum_{n=0}^\infty |c_n(A + X)^n - c_n A^n|$ also exists, with

$$\sum_{n=0}^\infty |c_n(A + X)^n - c_n A^n| \le \sum_{n=0}^\infty |c_n R_n(a, x)| \le Mx.$$

Now, we have $A, A + X \in B_l(L(V, V))$, so $f(A) = \sum_{n=0}^\infty c_n A^n$ and $f(A + X) = \sum_{n=0}^\infty c_n(A + X)^n$, so

$$|f(A + X) - f(A)| = \Big|\sum_{n=0}^\infty c_n(A + X)^n - c_n A^n\Big| \le \sum_{n=0}^\infty |c_n(A + X)^n - c_n A^n| \le Mx.$$

Hence $f$ is continuous at $A$, as required.

LEMMA 14. If $(c_n)$ is a real sequence, and $\sum_{n=2}^\infty c_n l^n$ is absolutely convergent for some positive real $l$, and $f: L(V, V) \to L(V, V)$ satisfies

$$\forall A \in B_l(L(V, V)) \; f(A) = \sum_{n=2}^\infty c_n A^n,$$

then $f$ is vanishing.
Proof. For any $A$ in $B_l(L(V, V))$ and any $n \ge 2$, we have

$$|c_n A^n| \le |c_n||A|^n \le \frac{|A|^2}{l^2}|c_n l^n|$$

so $\sum_{n=2}^\infty |c_n A^n|$ converges, by comparison with $\sum_{n=2}^\infty |c_n l^n|$, and

$$|f(A)| = \Big|\sum_{n=2}^\infty c_n A^n\Big| \le \sum_{n=2}^\infty |c_n A^n| \le \sum_{n=2}^\infty \frac{|A|^2}{l^2}|c_n l^n| = M|A|^2$$

where $M = \frac{1}{l^2}\sum_{n=2}^\infty |c_n l^n|$. Hence we can show that $f$ is vanishing: for any $\varepsilon > 0$, choose $\delta = \min(l, \frac{\varepsilon}{M+1})$, giving

$$\forall A \; |A| < \delta \Rightarrow |f(A)| \le M|A|^2 \le M\delta|A| \le \varepsilon|A|$$

as required.


LEMMA 15. If $f, g: V \to L(W, W)$ are limited functions, and $h: V \to L(W, W)$ is defined by $\forall v \in V \; h(v) = f(v)g(v)$, then $h$ is vanishing.
Proof. We know that there exist positive $\varepsilon_1, \delta_1, \varepsilon_2, \delta_2$ such that

$$\forall v \; |v| < \delta_1 \Rightarrow |f(v)| \le \varepsilon_1|v|, \qquad \forall v \; |v| < \delta_2 \Rightarrow |g(v)| \le \varepsilon_2|v|.$$

Now, for any $\varepsilon > 0$, choose $\delta = \min(\delta_1, \delta_2, \frac{\varepsilon}{\varepsilon_1\varepsilon_2})$. Then

$$\forall v \; |v| < \delta \Rightarrow |h(v)| \le |f(v)||g(v)| \le \varepsilon_1\varepsilon_2|v|^2 \le \varepsilon_1\varepsilon_2\delta|v| \le \varepsilon|v|$$

so $h$ is vanishing.

LEMMA 16. In $L(V, V)$, if $x_n \to X$ and $y_n \to Y$ as $n \to \infty$ then $x_n y_n \to XY$ as $n \to \infty$.
Proof. For any $\varepsilon > 0$, set $\varepsilon' = \min\{1, \frac{\varepsilon}{2(|X| + |Y| + 1)}\}$; there exists $N_1$ such that

$$\forall n > N_1 \; |x_n - X| < \varepsilon',$$

and there exists $N_2$ such that

$$\forall n > N_2 \; |y_n - Y| < \varepsilon'.$$

Now, choose $N = \max\{N_1, N_2\}$. Then

$$\forall n > N \; |x_n y_n - XY| = |x_n(y_n - Y) + (x_n - X)Y| \le |x_n||y_n - Y| + |x_n - X||Y| \le (|X| + \varepsilon')\varepsilon' + \varepsilon'|Y| \le \varepsilon'(|X| + |Y| + 1) \le \tfrac{\varepsilon}{2} < \varepsilon,$$

so $x_n y_n \to XY$ as $n \to \infty$, as required.

LEMMA 17. (Cauchy product) In $L(V, V)$, if $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ converge absolutely then $\sum_{n=0}^\infty c_n$ also converges absolutely, where $c_n = \sum_{i=0}^n a_i b_{n-i}$, and

$$\sum_{n=0}^\infty a_n \sum_{n=0}^\infty b_n = \sum_{n=0}^\infty c_n.$$

Proof. Let $A_0 = \sum_{n=0}^\infty |a_n|$, $B_0 = \sum_{n=0}^\infty |b_n|$, $A = \sum_{n=0}^\infty a_n$, $B = \sum_{n=0}^\infty b_n$. Assume $A_0 > 0$ and $B_0 > 0$, since otherwise the result is immediate. First I shall show that $\sum_{n=0}^\infty c_n$ converges absolutely. For any $n$, we have

$$\sum_{k=0}^n |c_k| = \sum_{k=0}^n \Big|\sum_{i=0}^k a_i b_{k-i}\Big| \le \sum_{k=0}^n \sum_{i=0}^k |a_i||b_{k-i}| \le \sum_{i=0}^n \sum_{j=0}^n |a_i||b_j| = \sum_{i=0}^n |a_i| \sum_{j=0}^n |b_j| \le A_0 B_0,$$

so $\sum_{k=0}^\infty |c_k|$ exists, as required.
Next I shall derive the formula for $\sum_{n=0}^\infty c_n$. For any $n$,

$$\Big|\sum_{k=0}^n c_k - \sum_{i=0}^n a_i \sum_{j=0}^n b_j\Big| = \Big|\sum_{(i,j) \in S} a_i b_j\Big| \le \sum_{(i,j) \in S} |a_i||b_j|$$


where $S = \{(i, j) \mid 0 \le i \le n, 0 \le j \le n, i + j > n\}$. Now, if $i, j \le \lfloor\frac{n}{2}\rfloor$ then $2i, 2j \le n$, so $i + j \le n$, so $(i, j) \notin S$. Thus

$$S \subseteq \big(\{0, \dots, n\} \times \{0, \dots, n\}\big) \setminus \big(\{0, \dots, \lfloor\tfrac{n}{2}\rfloor\} \times \{0, \dots, \lfloor\tfrac{n}{2}\rfloor\}\big)$$

and hence

$$\sum_{(i,j) \in S} |a_i||b_j| \le \sum_{i,j=0}^n |a_i||b_j| - \sum_{i,j=0}^{\lfloor n/2 \rfloor} |a_i||b_j| = \sum_{i=0}^n |a_i| \sum_{j=0}^n |b_j| - \sum_{i=0}^{\lfloor n/2 \rfloor} |a_i| \sum_{j=0}^{\lfloor n/2 \rfloor} |b_j|.$$

For any $\varepsilon > 0$, set $\varepsilon' = \min\{A_0, B_0, \frac{\varepsilon}{A_0 + B_0 + 1}\} > 0$. We know that there exists $N_1$ such that

$$\forall n > N_1 \; A_0 - \varepsilon' < \sum_{i=0}^n |a_i| \le A_0$$

and there exists $N_2$ such that

$$\forall n > N_2 \; B_0 - \varepsilon' < \sum_{j=0}^n |b_j| \le B_0.$$

We also have $\big(\sum_{i=0}^n a_i\big)\big(\sum_{j=0}^n b_j\big) \to AB$ as $n \to \infty$, by lemma 16, so there exists $N_3$ such that

$$\forall n > N_3 \; \Big|\sum_{i=0}^n a_i \sum_{j=0}^n b_j - AB\Big| < \varepsilon'.$$

Choose $N = \max(2N_1 + 1, 2N_2 + 1, N_3)$. Then, for any $n > N$, we have $\lfloor\frac{n}{2}\rfloor > N_1, N_2$, so

$$\Big|\sum_{k=0}^n c_k - AB\Big| \le \Big|\sum_{k=0}^n c_k - \sum_{i=0}^n a_i \sum_{j=0}^n b_j\Big| + \Big|\sum_{i=0}^n a_i \sum_{j=0}^n b_j - AB\Big| \le \sum_{i=0}^n |a_i| \sum_{j=0}^n |b_j| - \sum_{i=0}^{\lfloor n/2 \rfloor} |a_i| \sum_{j=0}^{\lfloor n/2 \rfloor} |b_j| + \varepsilon' < A_0 B_0 - (A_0 - \varepsilon')(B_0 - \varepsilon') + \varepsilon' = \varepsilon'(A_0 + B_0 - \varepsilon' + 1) < \varepsilon'(A_0 + B_0 + 1) \le \varepsilon.$$

Thus $\sum_{k=0}^\infty c_k = AB$, as required.

2.5 Application of the theory to $\mathbb{R}^n$ and matrices

$\mathbb{R}^n$ is a finite-dimensional vector space. If we choose a basis consisting of unit vectors then this induces the norm

$$\left|\begin{pmatrix} x^1 \\ \vdots \\ x^n \end{pmatrix}\right| = \sqrt{\sum_{i=1}^n (x^i)^2}.$$

Therefore all the above theory for finite-dimensional vector spaces applies to $\mathbb{R}^n$ with this basis and norm.


Let $M_{nm}$ be the vector space of all $n \times m$ matrices. $M_{nm}$ is isomorphic to $L(\mathbb{R}^m, \mathbb{R}^n)$ in the obvious way: a matrix $A$ with components $A^j_i$ corresponds to the linear transformation $\sum_{i=1}^m \sum_{j=1}^n A^j_i h^i_j$. Hence we can transfer across from $L(\mathbb{R}^m, \mathbb{R}^n)$ to $M_{nm}$ the basis and the induced norm, giving

$$|A| = \sqrt{\sum_{i=1}^m \sum_{j=1}^n (A^j_i)^2}.$$

Thanks to the isomorphism, lemmas 10-17 continue to hold when the $L(V, W)$ spaces are replaced by $M_{nm}$ spaces. In the following sections I shall apply these lemmas both to $L(V, W)$ spaces and to $M_{nm}$ spaces.
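The matrix norm above is what numerical libraries call the Frobenius norm, and lemma 11's submultiplicativity $|AB| \le |A||B|$ is easy to check numerically. A sketch of mine, with hypothetical helper names `frob` and `matmul`:

```python
import math
import random

def frob(M):
    # |A| = sqrt(sum of squared entries), the norm transferred to M_nm
    return math.sqrt(sum(x * x for row in M for x in row))

def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 2x3
B = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3x4
print(frob(matmul(A, B)) <= frob(A) * frob(B))  # True (lemma 11)
```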

2.6 Notation

In the following sections I shall sometimes use the notation $o(x)$ to mean a term of the form $R(x)$, for some vanishing function $R$, and the notation $O(x)$ to mean a term of the form $R(x)$, for some limited function $R$.



3. Affine Transformations

3.1 Affine transformations and affine matrices

Any two-dimensional affine transformation $G: \mathbb{R}^2 \to \mathbb{R}^2$ may be represented by a matrix

$$\underline{G} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 1 \end{pmatrix}$$

such that, if we represent a point $(x_1, x_2) \in \mathbb{R}^2$ by a column vector

$$\underline{(x_1, x_2)} = \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix},$$

then application of an affine transformation to a point is represented by matrix multiplication: $\underline{G(x)} = \underline{G}\,\underline{x}$. The homogeneous part of the transformation is represented by $h_{11}, h_{12}, h_{21}, h_{22}$, and the translation part is represented by $t_1, t_2$. I shall refer to such a matrix $\underline{G}$ as an affine matrix. I shall write the combination of two affine transformations as $G_1 \circ G_2$, but shall represent matrix multiplication by juxtaposition; thus, $\underline{G_1 \circ G_2} = \underline{G_1}\,\underline{G_2}$.
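In coordinates, applying an affine transformation is a single $3 \times 3$ matrix-vector product on the homogeneous column $(x_1, x_2, 1)^T$. A minimal sketch (the helper names are mine, not the paper's):

```python
def affine_matrix(h11, h12, h21, h22, t1, t2):
    # affine matrix: homogeneous part h, translation part t, bottom row (0 0 1)
    return [[h11, h12, t1],
            [h21, h22, t2],
            [0.0, 0.0, 1.0]]

def apply_affine(G, point):
    x1, x2 = point
    col = [x1, x2, 1.0]  # homogeneous column vector
    y = [sum(G[i][j] * col[j] for j in range(3)) for i in range(3)]
    return (y[0], y[1])

G = affine_matrix(2.0, 0.0, 0.0, 2.0, 1.0, -1.0)  # scale by 2, translate by (1, -1)
print(apply_affine(G, (3.0, 4.0)))  # (7.0, 7.0)
```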

3.2 The Lie group and the Lie algebra

Let $\mathcal{G}$ be the Lie group of all non-singular two-dimensional affine transformations and let $\mathcal{A}$ be the corresponding Lie algebra. Any member $A$ of $\mathcal{A}$ may be represented by a matrix $\underline{A}$ of the form

$$\underline{A} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 0 \end{pmatrix}$$

These matrices are called affine generator matrices. Let $M_{33}$ be the vector space of all $3 \times 3$ matrices, and let $\mathcal{A}'$ be the subspace consisting of all affine generator matrices. $\mathcal{A}'$ is closed under the commutator operation $[M, N] = MN - NM$ and so can be considered as a Lie algebra; we can use this to define the Lie bracket $[A, B]$ on $\mathcal{A}$:

$$\underline{[A, B]} = [\underline{A}, \underline{B}] = \underline{A}\,\underline{B} - \underline{B}\,\underline{A}.$$

A norm is defined on $M_{33}$ by

$$\forall M \in M_{33} \; |M| = \sqrt{\sum_{a,b=1}^3 (M_{ab})^2},$$

and this induces a norm on $\mathcal{A}'$ and a norm on $\mathcal{A}$:

$$\forall A \in \mathcal{A} \; |A| = |\underline{A}| = \sqrt{\sum_{a=1}^2 \sum_{b=1}^3 (\underline{A}_{ab})^2}.$$

Thus $\mathcal{A}$ and $\mathcal{A}'$ are isomorphic as Lie algebras and normed vector spaces.


There is an exponential function $\exp: M_{33} \to M_{33}$, defined by

$$\forall M \in M_{33} \; \exp M = \sum_{n=0}^\infty \frac{M^n}{n!}.$$

This series converges absolutely for all $M$. Note that, for any affine generator matrix $M$, $\exp M$ is an affine matrix. The inverse function $\log$ is defined on a subset of $M_{33}$ including all affine matrices such that

$$h_{11}h_{22} - h_{12}h_{21} > 0 \quad \text{and} \quad h_{11} + h_{22} > -2\sqrt{h_{11}h_{22} - h_{12}h_{21}}.$$

$\log$ maps all such affine matrices to affine generator matrices. For all $M \in M_{33}$ with $|M| < 1$, $\log(I + M)$ exists and is given by the absolutely convergent power series

$$\log(I + M) = \sum_{n=1}^\infty \frac{(-1)^{n+1}M^n}{n},$$

where $I$ is the identity matrix.
There is also an exponential function $\exp: \mathcal{A} \to \mathcal{G}$, with an inverse function $\log$ defined on a subset of $\mathcal{G}$, related to the $\exp$ and $\log$ functions on $M_{33}$ by

$$\underline{\exp A} = \exp \underline{A}, \qquad \underline{\log A} = \log \underline{A}.$$

To calculate $\exp$ and $\log$ efficiently without using power series, we first consider the problem of exponentiating $2 \times 2$ matrices.
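A truncated version of the power series is already enough to check the closure property stated above: the exponential of an affine generator matrix (bottom row zero) is an affine matrix (bottom row $(0\ 0\ 1)$). The series code below is a sketch of mine, not the paper's implementation:

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(3)) for j in range(3)]
            for i in range(3)]

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def exp_series(M, terms=30):
    # exp M = sum_{n>=0} M^n / n!, truncated; converges quickly for small |M|
    total = [row[:] for row in I3]
    power = [row[:] for row in I3]
    for n in range(1, terms):
        power = mat_mul(power, M)
        coeff = 1.0 / math.factorial(n)
        total = [[total[i][j] + coeff * power[i][j] for j in range(3)]
                 for i in range(3)]
    return total

M = [[0.1, 0.2, 0.5], [-0.3, 0.0, 1.0], [0.0, 0.0, 0.0]]  # affine generator matrix
E = exp_series(M)
print(E[2])  # [0.0, 0.0, 1.0] -- bottom row of an affine matrix
```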

3.3 Exponential of a $2 \times 2$ matrix

Let $X$ be any $2 \times 2$ matrix. Let $X_0 = X - \frac{1}{2}\mathrm{tr}(X)I$, the traceless part of $X$.
Case 1: $\det(X_0) > 0$. Then $\exp X = e^{\mathrm{tr}(X)/2}\big(\frac{\sin\theta}{\theta}X_0 + \cos\theta \cdot I\big)$, where $\theta = \sqrt{\det(X_0)}$.
Case 2: $\det(X_0) = 0$. Then $\exp X = e^{\mathrm{tr}(X)/2}(X_0 + I)$.
Case 3: $\det(X_0) < 0$. Then $\exp X = e^{\mathrm{tr}(X)/2}\big(\frac{\sinh\theta}{\theta}X_0 + \cosh\theta \cdot I\big)$, where $\theta = \sqrt{-\det(X_0)}$.
Note that $\exp$ is injective on the set of matrices $X$ with $\det(X - \frac{1}{2}\mathrm{tr}(X)I) < \pi^2$.
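The three cases translate directly into code. As a sanity check (mine, not the paper's), a traceless generator $\begin{pmatrix} 0 & -\theta \\ \theta & 0 \end{pmatrix}$ falls under case 1 and exponentiates to a rotation by $\theta$:

```python
import math

def exp2x2(X):
    # closed-form exp via the traceless part X0 = X - (tr X / 2) I
    t = X[0][0] + X[1][1]
    X0 = [[X[0][0] - t / 2, X[0][1]],
          [X[1][0], X[1][1] - t / 2]]
    d = X0[0][0] * X0[1][1] - X0[0][1] * X0[1][0]  # det(X0)
    if d > 0:                                       # case 1
        th = math.sqrt(d); a, b = math.sin(th) / th, math.cos(th)
    elif d < 0:                                     # case 3
        th = math.sqrt(-d); a, b = math.sinh(th) / th, math.cosh(th)
    else:                                           # case 2
        a, b = 1.0, 1.0
    s = math.exp(t / 2)
    return [[s * (a * X0[0][0] + b), s * a * X0[0][1]],
            [s * a * X0[1][0], s * (a * X0[1][1] + b)]]

th = 0.7
R = exp2x2([[0.0, -th], [th, 0.0]])  # rotation generator
ok = abs(R[0][0] - math.cos(th)) < 1e-12 and abs(R[1][0] - math.sin(th)) < 1e-12
print(ok)  # True
```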

3.4 Logarithm of a $2 \times 2$ matrix

Let $Y$ be any $2 \times 2$ matrix satisfying $\det(Y) > 0$ and $\mathrm{tr}(Y) > -2\sqrt{\det(Y)}$. Let $Y_1 = Y/\sqrt{\det(Y)}$.
Case 1: $-2 < \mathrm{tr}(Y_1) < 2$. Then $\log Y = \frac{\theta}{\sin\theta}\big(Y_1 - \frac{1}{2}\mathrm{tr}(Y_1)I\big) + \frac{1}{2}\ln(\det(Y))I$, where $\theta = \cos^{-1}\big(\frac{1}{2}\mathrm{tr}(Y_1)\big) \in (0, \pi)$.
Case 2: $\mathrm{tr}(Y_1) = 2$. Then $\log Y = Y_1 - \frac{1}{2}\mathrm{tr}(Y_1)I + \frac{1}{2}\ln(\det(Y))I$.
Case 3: $2 < \mathrm{tr}(Y_1)$. Then $\log Y = \frac{\theta}{\sinh\theta}\big(Y_1 - \frac{1}{2}\mathrm{tr}(Y_1)I\big) + \frac{1}{2}\ln(\det(Y))I$, where $\theta = \cosh^{-1}\big(\frac{1}{2}\mathrm{tr}(Y_1)\big)$.
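Again the cases are directly implementable; a check of mine that the logarithm of a rotation matrix recovers the rotation generator:

```python
import math

def log2x2(Y):
    # closed-form log, assuming det(Y) > 0 and tr(Y) > -2*sqrt(det(Y))
    d = Y[0][0] * Y[1][1] - Y[0][1] * Y[1][0]
    Y1 = [[v / math.sqrt(d) for v in row] for row in Y]  # Y / sqrt(det Y)
    t1 = Y1[0][0] + Y1[1][1]
    if -2 < t1 < 2:                                      # case 1
        th = math.acos(t1 / 2); a = th / math.sin(th)
    elif t1 == 2:                                        # case 2
        a = 1.0
    else:                                                # case 3
        th = math.acosh(t1 / 2); a = th / math.sinh(th)
    c = 0.5 * math.log(d)
    return [[a * (Y1[0][0] - t1 / 2) + c, a * Y1[0][1]],
            [a * Y1[1][0], a * (Y1[1][1] - t1 / 2) + c]]

th = 0.7
R = [[math.cos(th), -math.sin(th)], [math.sin(th), math.cos(th)]]
L = log2x2(R)
ok = abs(L[1][0] - th) < 1e-12 and abs(L[0][0]) < 1e-12
print(ok)  # True
```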

Page 18: Mathematical Theory of Recursive Symbol Systems

3.5 Exponential of an affine generator matrix

Let $M$ be any affine generator matrix. Thus $M$ is of the form

$$M = \begin{pmatrix} X & x \\ 0 & 0 \end{pmatrix}$$

where $X$ is $2 \times 2$ and $x$ is $2 \times 1$. Then

$$\exp M = \begin{pmatrix} \exp X & y \\ 0 & 1 \end{pmatrix}$$

where $\exp X$ is as given above and $y$ is given as follows.
Case 1: $\det(X) \ne 0$. Then $y = (\exp X - I)X^{-1}x$.
Case 2: $\det(X) = 0$ and $\mathrm{tr}(X) \ne 0$. Then $y = \Big(I + \frac{e^{\mathrm{tr}(X)} - 1 - \mathrm{tr}(X)}{\mathrm{tr}(X)^2}X\Big)x$.
Case 3: $\det(X) = 0$ and $\mathrm{tr}(X) = 0$. Then $y = \big(I + \frac{1}{2}X\big)x$.
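Case 3 is easy to verify by hand, since there the generator is nilpotent ($M^3 = 0$) and the exponential series terminates exactly. A check of mine:

```python
def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(3)) for j in range(3)]
            for i in range(3)]

X = [[0.0, 1.0], [0.0, 0.0]]  # det X = 0, tr X = 0 (case 3)
x = [1.0, 2.0]
M = [[X[0][0], X[0][1], x[0]],
     [X[1][0], X[1][1], x[1]],
     [0.0, 0.0, 0.0]]

# exact exponential: M^3 = 0, so exp M = I + M + M^2/2
M2 = mat_mul(M, M)
expM = [[(1.0 if i == j else 0.0) + M[i][j] + M2[i][j] / 2 for j in range(3)]
        for i in range(3)]

# case-3 formula: y = (I + X/2) x
y = [x[0] + 0.5 * (X[0][0] * x[0] + X[0][1] * x[1]),
     x[1] + 0.5 * (X[1][0] * x[0] + X[1][1] * x[1])]
print(y == [expM[0][2], expM[1][2]])  # True
```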

3.6 Logarithm of an affine matrix

Let $M$ be any affine matrix. Thus $M$ is of the form

$$M = \begin{pmatrix} Y & y \\ 0 & 1 \end{pmatrix}$$

where $Y$ is $2 \times 2$ and $y$ is $2 \times 1$. Then $\log M$ exists iff $\log Y$ exists, in which case

$$\log M = \begin{pmatrix} X & x \\ 0 & 0 \end{pmatrix}$$

where $X = \log Y$ and $x$ is given as follows.
Case 1: $\det(X) \ne 0$. Then $x = X(Y - I)^{-1}y$.
Case 2: $\det(X) = 0$ and $\mathrm{tr}(X) \ne 0$. Then $x = \Big(I + \Big(\frac{1}{e^{\mathrm{tr}(X)} - 1} - \frac{1}{\mathrm{tr}(X)}\Big)X\Big)y$.
Case 3: $\det(X) = 0$ and $\mathrm{tr}(X) = 0$. Then $x = \big(I - \frac{1}{2}X\big)y$.
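In case 3 the logarithm formula is the exact inverse of the case-3 exponential, because $(I - \frac{1}{2}X)(I + \frac{1}{2}X) = I - \frac{1}{4}X^2 = I$ when $X^2 = 0$. A round-trip check of mine:

```python
X = [[0.0, 1.0], [0.0, 0.0]]  # nilpotent: X^2 = 0, det X = tr X = 0

def apply2(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

x = [1.0, 2.0]
Xx = apply2(X, x)
y = [x[0] + 0.5 * Xx[0], x[1] + 0.5 * Xx[1]]   # y = (I + X/2) x  (exp, case 3)
Xy = apply2(X, y)
xb = [y[0] - 0.5 * Xy[0], y[1] - 0.5 * Xy[1]]  # x = (I - X/2) y  (log, case 3)
print(xb)  # [1.0, 2.0]
```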

LEMMA 1. $\log(\exp A) = A$ for all $A \in \mathcal{A}$ with $|A| < \sqrt{2}\,\pi$.
Proof. For any $A \in \mathcal{A}$ with $|A| < \sqrt{2}\,\pi$, we have

$$\underline{A} = \begin{pmatrix} X & x \\ 0 & 0 \end{pmatrix}, \quad \text{with } X = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix},$$

and hence

$$\det\big(X - \tfrac{1}{2}\mathrm{tr}(X)I\big) = -\tfrac{1}{4}(h_{11} - h_{22})^2 - h_{12}h_{21} \le -h_{12}h_{21} \le \tfrac{1}{2}(h_{12}^2 + h_{21}^2) \le \tfrac{1}{2}|A|^2 < \pi^2.$$

Hence $\log(\exp X) = X$, by the log construction for $2 \times 2$ matrices, and $\log(\exp A) = A$ by the log construction for affine matrices.


3.7 The dual vector space, $\mathcal{F}$

Let $\mathcal{F}$ be the dual vector space to $\mathcal{A}$: that is, $\mathcal{F}$ consists of all linear transformations from $\mathcal{A}$ to $\mathbb{R}$.
Given any basis $\{a_1, \dots, a_6\}$ for $\mathcal{A}$, we can define a dual basis $\{f^1, \dots, f^6\}$ for $\mathcal{F}$ by

$$f^i(a_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Any vectors $A \in \mathcal{A}$ and $F \in \mathcal{F}$ can be expressed in terms of the bases as $A = \sum_{i=1}^6 A^i a_i$ and $F = \sum_{i=1}^6 F_i f^i$, for unique sequences of real numbers $A^1, \dots, A^6$ and $F_1, \dots, F_6$. Then we have $F_i = F(a_i)$, $A^i = f^i(A)$, and $F(A) = \sum_{i=1}^6 F_i A^i$.
For example, one possible basis $\{a_1, \dots, a_6\}$ for $\mathcal{A}$ is defined by

$$\underline{a_1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \underline{a_2} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \underline{a_3} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

$$\underline{a_4} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \underline{a_5} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \underline{a_6} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Then any element $A \in \mathcal{A}$, where

$$\underline{A} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 0 \end{pmatrix},$$

may be expressed uniquely in terms of this basis as $A = \sum_{i=1}^6 A^i a_i$, where

$$A^1 = h_{11}, \quad A^2 = h_{12}, \quad A^3 = t_1, \quad A^4 = h_{21}, \quad A^5 = h_{22}, \quad A^6 = t_2.$$

The corresponding basis for $\mathcal{F}$ is $\{f^1, \dots, f^6\}$, where

$$f^1(A) = h_{11}, \quad f^2(A) = h_{12}, \quad f^3(A) = t_1, \quad f^4(A) = h_{21}, \quad f^5(A) = h_{22}, \quad f^6(A) = t_2,$$

and any $F \in \mathcal{F}$ operates on any $A \in \mathcal{A}$ by the rule

$$F(A) = \sum_{i=1}^6 F_i A^i = F_1 h_{11} + F_2 h_{12} + F_3 t_1 + F_4 h_{21} + F_5 h_{22} + F_6 t_2.$$

Using this basis, $F$ can be represented by a matrix

$$\underline{F} = \begin{pmatrix} F_1 & F_4 & a \\ F_2 & F_5 & b \\ F_3 & F_6 & c \end{pmatrix}$$

where $a, b, c$ are arbitrary; when I say that $\underline{F}$ represents $F$, what I mean is that $\underline{F}$ and $F$ are related by the formula $\forall A \in \mathcal{A} \; F(A) = \mathrm{tr}(\underline{F}\,\underline{A})$.
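The trace formula is easy to check numerically. Below, `trace_prod` (a helper name of mine) computes $\mathrm{tr}(\underline{F}\,\underline{A})$, and the padding entries $a, b, c$ are set to an arbitrary value to confirm they do not matter, since they multiply the zero row of $\underline{A}$:

```python
def trace_prod(Fm, Am):
    # tr(F A) = sum_{i,j} F[i][j] * A[j][i]
    return sum(Fm[i][j] * Am[j][i] for i in range(3) for j in range(3))

h11, h12, t1, h21, h22, t2 = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
F1, F2, F3, F4, F5, F6 = 0.5, -1.0, 2.0, 0.25, 1.5, -0.5

Am = [[h11, h12, t1], [h21, h22, t2], [0.0, 0.0, 0.0]]
Fm = [[F1, F4, 9.0], [F2, F5, 9.0], [F3, F6, 9.0]]  # a = b = c = 9.0, arbitrary

direct = F1*h11 + F2*h12 + F3*t1 + F4*h21 + F5*h22 + F6*t2
print(trace_prod(Fm, Am) == direct)  # True
```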


The norm induced on $\mathcal{A}$ by $\{a_1, \dots, a_6\}$ (see §2.1) is the one we have already defined:

$$|A|^2 = \Big|\sum_{i=1}^6 A^i a_i\Big|^2 = \sum_{i=1}^6 (A^i)^2 = \sum_{i=1}^6 f^i(A)^2.$$

Similarly, a norm is induced on $\mathcal{F}$ by $\{f^1, \dots, f^6\}$:

$$|F|^2 = \Big|\sum_{i=1}^6 F_i f^i\Big|^2 = \sum_{i=1}^6 (F_i)^2 = \sum_{i=1}^6 F(a_i)^2.$$

Given any linear transformation $T: \mathcal{A} \to \mathcal{A}$, the adjoint linear transformation $T^\dagger: \mathcal{F} \to \mathcal{F}$ is defined by

$$\forall F \in \mathcal{F} \; \forall A \in \mathcal{A} \; T^\dagger(F)(A) = F(T(A)).$$

The basis $\{a_1, \dots, a_6\}$ induces a basis on $L(\mathcal{A}, \mathcal{A})$, the vector space of linear transformations from $\mathcal{A}$ to $\mathcal{A}$, as described in §2.4. This basis induces a norm on $L(\mathcal{A}, \mathcal{A})$:

$$|T|^2 = \sum_{i,j=1}^6 \big(f^j(T(a_i))\big)^2 = \sum_{i=1}^6 |T(a_i)|^2.$$

Likewise the basis $\{f^1, \dots, f^6\}$ induces a basis on $L(\mathcal{F}, \mathcal{F})$, which induces a norm on $L(\mathcal{F}, \mathcal{F})$:

$$|S|^2 = \sum_{i,j=1}^6 \big(S(f^i)(a_j)\big)^2 = \sum_{i=1}^6 |S(f^i)|^2.$$

LEMMA 2. For any $T \in L(\mathcal{A}, \mathcal{A})$, $|T^\dagger| = |T|$.
Proof. $|T^\dagger|^2 = \sum_{i,j=1}^6 \big(T^\dagger(f^i)(a_j)\big)^2 = \sum_{i,j=1}^6 \big(f^i(T(a_j))\big)^2 = |T|^2$.

3.8 Inner automorphisms of the Lie algebra

Define $\mathrm{Ad}: \mathcal{G} \to (\mathcal{A} \to \mathcal{A})$, in terms of matrix representations, by

$$\forall G \in \mathcal{G} \; \forall A \in \mathcal{A} \; \underline{\mathrm{Ad}(G)(A)} = \underline{G}\,\underline{A}\,\underline{G}^{-1}.$$

Note that, for any $G \in \mathcal{G}$, $\mathrm{Ad}(G)$ is a linear transformation on $\mathcal{A}$, and that $\mathrm{Ad}(G_1) \circ \mathrm{Ad}(G_2) = \mathrm{Ad}(G_1G_2)$.

LEMMA 3. For any $G \in \mathcal{G}$ and $A \in \mathcal{A}$, $\exp(\mathrm{Ad}(G)(A)) = G \circ \exp(A) \circ G^{-1}$.
Proof. In terms of the matrix representations,

$$\underline{\exp(\mathrm{Ad}(G)(A))} = \exp \underline{\mathrm{Ad}(G)(A)} = \exp(\underline{G}\,\underline{A}\,\underline{G}^{-1}) = \sum_{n=0}^\infty \frac{1}{n!}(\underline{G}\,\underline{A}\,\underline{G}^{-1})^n = \underline{G}\Big(\sum_{n=0}^\infty \frac{1}{n!}\underline{A}^n\Big)\underline{G}^{-1} = \underline{G}\,\exp(\underline{A})\,\underline{G}^{-1} = \underline{G}\;\underline{\exp(A)}\;\underline{G}^{-1} = \underline{G \circ \exp(A) \circ G^{-1}}.$$


LEMMA 4. For any $G \in \mathcal{G}$ and $F \in \mathcal{F}$, if $\underline{F}$ is a matrix representing $F$ then the matrix $\underline{G}^{-1}\,\underline{F}\,\underline{G}$ represents $\mathrm{Ad}(G)^\dagger(F)$.
Proof. For any $A \in \mathcal{A}$,

$$\mathrm{Ad}(G)^\dagger(F)(A) = F(\mathrm{Ad}(G)(A)) = \mathrm{tr}\big(\underline{F}\;\underline{\mathrm{Ad}(G)(A)}\big) = \mathrm{tr}(\underline{F}\,\underline{G}\,\underline{A}\,\underline{G}^{-1}) = \mathrm{tr}(\underline{G}^{-1}\,\underline{F}\,\underline{G}\,\underline{A})$$

which shows that $\underline{G}^{-1}\,\underline{F}\,\underline{G}$ represents $\mathrm{Ad}(G)^\dagger(F)$.

Define a linear function $\mathrm{ad}: \mathcal{A} \to (\mathcal{A} \to \mathcal{A})$ by

$$\forall A, V \in \mathcal{A} \; \mathrm{ad}(A)(V) = [A, V].$$

Note that, for any $A$, $\mathrm{ad}(A): \mathcal{A} \to \mathcal{A}$ is a linear transformation.

LEMMA 5. For any $A \in \mathcal{A}$, $|\mathrm{ad}(A)| \le \sqrt{5}|A|$.
Proof. Let $\{a_1, \dots, a_6\}$ be the basis that induces the norm on $\mathcal{A}$. The representing matrix $\underline{A}$ of any $A \in \mathcal{A}$ can be written in the form

$$\underline{A} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 0 \end{pmatrix}.$$

Then we have

$$|\mathrm{ad}(A)|^2 = \sum_{i=1}^6 |\mathrm{ad}(A)(a_i)|^2 = \sum_{i=1}^6 |[A, a_i]|^2 = \sum_{i=1}^6 \big|\underline{[A, a_i]}\big|^2 = \sum_{i=1}^6 \big|[\underline{A}, \underline{a_i}]\big|^2 = 2(h_{11} - h_{22})^2 + h_{11}^2 + h_{22}^2 + 5h_{12}^2 + 5h_{21}^2 + 2t_1^2 + 2t_2^2 \le 5h_{11}^2 + 5h_{22}^2 + 5h_{12}^2 + 5h_{21}^2 + 2t_1^2 + 2t_2^2 \le 5|\underline{A}|^2 = 5|A|^2$$

so $|\mathrm{ad}(A)| \le \sqrt{5}|A|$.
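The bound $|\mathrm{ad}(A)| \le \sqrt{5}|A|$ can be checked numerically with a random generator (a sketch of mine; note that the norm on $\mathcal{A}$ uses only the top two rows of the representing matrix, and all helper names are hypothetical):

```python
import math
import random

def mat_mul3(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(3)) for j in range(3)]
            for i in range(3)]

def bracket(A, B):
    # [A, B] = AB - BA
    AB, BA = mat_mul3(A, B), mat_mul3(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]

def gen(h11, h12, t1, h21, h22, t2):
    # affine generator matrix for coordinates (A^1, ..., A^6)
    return [[h11, h12, t1], [h21, h22, t2], [0.0, 0.0, 0.0]]

def norm(M):
    # |A| uses only the two (possibly) nonzero rows of the generator matrix
    return math.sqrt(sum(v * v for row in M[:2] for v in row))

BASIS = [gen(*[1.0 if i == k else 0.0 for i in range(6)]) for k in range(6)]

random.seed(1)
A = gen(*[random.uniform(-1.0, 1.0) for _ in range(6)])
ad_norm = math.sqrt(sum(norm(bracket(A, a)) ** 2 for a in BASIS))
print(ad_norm <= math.sqrt(5) * norm(A))  # True
```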

LEMMA 6. For any $A \in \mathcal{A}$ and $F \in \mathcal{F}$, if $\underline{F}$ is a matrix representing $F$ then the matrix $[\underline{F}, \underline{A}]$ represents $\mathrm{ad}(A)^\dagger(F)$.
Proof. For any $V \in \mathcal{A}$,

$$\mathrm{ad}(A)^\dagger(F)(V) = F(\mathrm{ad}(A)(V)) = \mathrm{tr}\big(\underline{F}\;\underline{\mathrm{ad}(A)(V)}\big) = \mathrm{tr}\big(\underline{F}\,[\underline{A}, \underline{V}]\big) = \mathrm{tr}\big([\underline{F}, \underline{A}]\,\underline{V}\big)$$

which shows that $[\underline{F}, \underline{A}]$ represents $\mathrm{ad}(A)^\dagger(F)$.

3.9 Approximation lemmas for exp and log

LEMMA 7. For all $A\in\mathcal{A}_0$, $\exp A = I + A + R_E(A)$, where $R_E\colon\mathcal{A}_0\to\mathcal{A}_0$ is vanishing.

Proof. Define $R\colon M_{33}\to M_{33}$ by $\forall A\in M_{33}\ R(A) = \exp A - I - A$. By the power series expansion for exp, we have
$$\forall A\in M_{33}\quad R(A) = \sum_{n=2}^{\infty}\frac{A^n}{n!}.$$
The corresponding real power series $\sum_{n=2}^{\infty}t^n/n!$ is absolutely convergent for all $t\in\mathbb{R}$, so, by lemma 2.14 (transferred to $M_{33}$, as explained in §2.5), $R$ is vanishing. Note that $R(\mathcal{A}_0)\subseteq\mathcal{A}_0$, so we can define $R_E$ as the restriction of $R$ to $\mathcal{A}_0$.


LEMMA 8. For all $A\in\mathcal{A}_0$ with $|A|<1$, $\log(I+A) = A + R_L(A)$, where $R_L\colon\mathcal{A}_0\to\mathcal{A}_0$ is vanishing.

Proof. Let $N$ be the set of all $A\in M_{33}$ such that $|A|<1$. Define $R\colon M_{33}\to M_{33}$ such that $\forall A\in N\ R(A) = \log(I+A) - A$ (the values of $R$ outside $N$ are arbitrary). By the power series expansion for log, we have
$$\forall A\in N\quad R(A) = \sum_{n=2}^{\infty}\frac{(-1)^{n+1}A^n}{n}.$$
The corresponding real power series $\sum_{n=2}^{\infty}(-1)^{n+1}t^n/n$ is absolutely convergent for all $t\in(-1,1)$, so, by lemma 2.14 (transferred to $M_{33}$), $R$ is vanishing. Note that $R(\mathcal{A}_0)\subseteq\mathcal{A}_0$, so we can define $R_L$ as the restriction of $R$ to $\mathcal{A}_0$.

LEMMA 9. If $S$ is any finite-dimensional vector space, $F\colon S\to\mathcal{A}_0$ is linear, and $R_1\colon S\to\mathcal{A}_0$ is vanishing, then there is a vanishing function $R_2\colon S\to\mathcal{A}_0$ such that, for all $V\in S$ in a neighbourhood of 0,
$$I + F(V) + R_1(V) = \exp(F(V) + R_2(V)).$$

Proof. $F$ is linear and hence limited, $R_1$ is vanishing and hence limited, so $F+R_1$ is limited and hence continuous at 0, with $(F+R_1)(0)=0$ (lemmas 2.3, 2.4, 2.5, 2.7). Therefore there is a neighbourhood $N$ of 0 for which
$$\forall V\in N\quad |F(V)+R_1(V)| < 1.$$
For all $V\in N$, we have
$$\log(I + F(V) + R_1(V)) = F(V) + R_1(V) + R_L(F(V)+R_1(V))$$
by lemma 8. Define $R_2 = R_1 + R_L\circ(F+R_1)$, giving
$$\forall V\in N\quad I + F(V) + R_1(V) = \exp(F(V)+R_2(V)).$$
Now, $R_L$ is vanishing and $F+R_1$ is limited, so $R_L\circ(F+R_1)$ is vanishing, so $R_2$ is vanishing (lemmas 2.6, 2.8), as required.

LEMMA 10. For any $A,B\in\mathcal{A}$, there is a vanishing function $R\colon\mathbb{R}\to\mathcal{A}$, such that, for all real $\varepsilon$ in a neighbourhood of 0,
$$\exp(\varepsilon A)\circ\exp(\varepsilon B) = \exp(\varepsilon(A+B) + R(\varepsilon)).$$

Proof. By lemma 7,
$$\forall\varepsilon\quad \exp(\varepsilon\underline{A})\exp(\varepsilon\underline{B}) = (I + \varepsilon\underline{A} + R_E(\varepsilon\underline{A}))(I + \varepsilon\underline{B} + R_E(\varepsilon\underline{B})) = I + \varepsilon(\underline{A}+\underline{B}) + R_1(\varepsilon)$$
where $R_1\colon\mathbb{R}\to\mathcal{A}_0$ is defined by
$$\forall\varepsilon\quad R_1(\varepsilon) = R_E(\varepsilon\underline{A}) + R_E(\varepsilon\underline{B}) + \varepsilon^2\underline{A}\,\underline{B} + \varepsilon\underline{A}\,R_E(\varepsilon\underline{B}) + R_E(\varepsilon\underline{A})\,\varepsilon\underline{B} + R_E(\varepsilon\underline{A})\,R_E(\varepsilon\underline{B}).$$
If we define linear functions $L_1,L_2\colon\mathbb{R}\to\mathcal{A}_0$ by $\forall\varepsilon\ L_1(\varepsilon) = \varepsilon\underline{A}$, $L_2(\varepsilon) = \varepsilon\underline{B}$, then $L_1,L_2$ are limited, so $R_E\circ L_1$, $R_E\circ L_2$ are vanishing, so $R_1$ is vanishing (lemmas 2.3, 2.4, 2.6, 2.8, 2.15).

Then, by lemma 9, there is a vanishing function $R_2\colon\mathbb{R}\to\mathcal{A}_0$ such that, for all $\varepsilon$ in a neighbourhood of 0,
$$I + \varepsilon(\underline{A}+\underline{B}) + R_1(\varepsilon) = \exp(\varepsilon(\underline{A}+\underline{B}) + R_2(\varepsilon))$$
giving
$$\exp(\varepsilon\underline{A})\exp(\varepsilon\underline{B}) = \exp(\varepsilon(\underline{A}+\underline{B}) + R_2(\varepsilon)).$$
Using the isomorphism between $\mathcal{A}_0$ and $\mathcal{A}$ we can obtain from $R_2$ a vanishing function $R\colon\mathbb{R}\to\mathcal{A}$ such that, for all $\varepsilon$ in a neighbourhood of 0,
$$\exp(\varepsilon A)\circ\exp(\varepsilon B) = \exp(\varepsilon(A+B) + R(\varepsilon))$$
as required.
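The vanishing behaviour of $R(\varepsilon)$ in lemma 10 can be observed numerically: $R(\varepsilon) = \log(\exp(\varepsilon A)\exp(\varepsilon B)) - \varepsilon(A+B)$ should shrink faster than $\varepsilon$ itself (in fact like $\varepsilon^2$, the leading Baker–Campbell–Hausdorff correction). A sketch under the same matrix-representation assumptions as before, with arbitrary sample matrices:

```python
import numpy as np

def expm(M, terms=40):
    out, term = np.eye(3), np.eye(3)
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

def logm(M, terms=60):
    # log(I + X) series; valid when X = M - I is small.
    X = M - np.eye(3)
    out, P = np.zeros((3, 3)), np.eye(3)
    for n in range(1, terms):
        P = P @ X
        out = out + ((-1) ** (n + 1)) * P / n
    return out

A = np.array([[0.3, 0.1, 0.2], [-0.2, 0.1, 0.4], [0, 0, 0]])
B = np.array([[0.1, -0.3, 0.1], [0.2, 0.2, -0.1], [0, 0, 0]])

def remainder(eps):
    # R(eps) = log(exp(eps A) exp(eps B)) - eps (A + B)
    return logm(expm(eps * A) @ expm(eps * B)) - eps * (A + B)

# R is vanishing: R(eps)/eps -> 0, so halving eps should roughly quarter |R|.
r1 = np.linalg.norm(remainder(1e-2))
r2 = np.linalg.norm(remainder(5e-3))
assert r2 < r1 / 2
```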

3.10 Derivatives of functions to and from $\mathcal{G}$

Let $S$ be any finite-dimensional vector space. The derivative of a partial function $f\colon\mathcal{G}\to S$ is the unique maximal partial function $f'\colon\mathcal{G}\to(\mathcal{A}\to S)$ such that, for every $G\in\mathcal{G}$ for which $f'(G)$ is defined, $f'(G)$ is a linear transformation and there is a vanishing function $R\colon\mathcal{A}\to S$ such that, for all $V\in\mathcal{A}$ in a neighbourhood of 0,
$$f(G\circ\exp V) = f(G) + f'(G)(V) + R(V).$$
$f$ is differentiable at $G$ iff $f'(G)$ is defined.

The derivative of a partial function $f\colon S\to\mathcal{G}$ is the unique maximal partial function $f'\colon S\to(S\to\mathcal{A})$ such that, for every $x\in S$ for which $f'(x)$ is defined, $f'(x)$ is a linear transformation and there is a vanishing function $R\colon S\to\mathcal{A}$ such that, for all $v\in S$ in a neighbourhood of 0,
$$f(x+v) = f(x)\circ\exp(f'(x)(v) + R(v)).$$
$f$ is differentiable at $x$ iff $f'(x)$ exists.

LEMMA 11. If $G\in\mathcal{G}$, $f\colon\mathcal{G}\to S$ is differentiable at $G$, and $R_1\colon\mathbb{R}\to\mathcal{A}$ is vanishing, then there is a vanishing function $R_2\colon\mathbb{R}\to S$ such that, for all $V\in\mathcal{A}$ and for all real $\varepsilon$ in a neighbourhood of 0,
$$f(G\circ\exp(\varepsilon V + R_1(\varepsilon))) = f(G) + \varepsilon f'(G)(V) + R_2(\varepsilon).$$

Proof. Since $f$ is differentiable at $G$, there is a vanishing function $R\colon\mathcal{A}\to S$ and a neighbourhood $N\subseteq\mathcal{A}$ of 0 such that
$$\forall A\in N\quad f(G\circ\exp A) = f(G) + f'(G)(A) + R(A).$$
Given any $V\in\mathcal{A}$, define the linear function $L\colon\mathbb{R}\to\mathcal{A}$ by $\forall\varepsilon\in\mathbb{R}\ L(\varepsilon) = \varepsilon V$. Now, $L$ and $R_1$ are limited, so $L+R_1$ is limited and hence continuous at 0, with $(L+R_1)(0)=0$ (lemmas 2.3, 2.4, 2.5, 2.7). Hence there is a neighbourhood $N'\subseteq\mathbb{R}$ of 0 such that, for all $\varepsilon\in N'$, we have $\varepsilon V + R_1(\varepsilon) = (L+R_1)(\varepsilon)\in N$ and hence
$$f(G\circ\exp(\varepsilon V + R_1(\varepsilon))) = f(G) + f'(G)(\varepsilon V + R_1(\varepsilon)) + R(\varepsilon V + R_1(\varepsilon)) = f(G) + \varepsilon f'(G)(V) + R_2(\varepsilon)$$
where $R_2 = f'(G)\circ R_1 + R\circ(L+R_1)$. Now, $f'(G)$ is linear and hence limited, so $f'(G)\circ R_1$ is vanishing. Also, $R\circ(L+R_1)$ is vanishing. Thus $R_2$ is vanishing, as required (lemmas 2.4, 2.6, 2.8, 2.9).

3.11 Calculation of the derivative of exp in terms of matrices

For this section only, we shall need a function $\mathrm{ad}\colon\mathcal{A}_0\to(\mathcal{A}_0\to\mathcal{A}_0)$ defined by $\forall A,V\in\mathcal{A}_0\ \mathrm{ad}(A)(V) = [A,V]$, analogous to the function $\mathrm{ad}\colon\mathcal{A}\to(\mathcal{A}\to\mathcal{A})$ already defined.

LEMMA 12. For any natural number $n$ and $A,V\in\mathcal{A}_0$,
$$VA^n = \sum_{i=0}^{n}(-1)^i\binom{n}{i}A^{n-i}\,\mathrm{ad}(A)^i(V).$$

Proof. By induction on $n$.

THEOREM 13. For any $A\in\mathcal{A}$,
$$\exp'(A) = \sum_{n=0}^{\infty}\frac{(-1)^n\,\mathrm{ad}(A)^n}{(n+1)!}$$
and thus, for any $A,V\in\mathcal{A}$,
$$\exp'(A)(V) = \sum_{n=0}^{\infty}\frac{(-1)^n\,\mathrm{ad}(A)^n(V)}{(n+1)!} = V - \tfrac{1}{2}[A,V] + \tfrac{1}{6}[A,[A,V]] - \tfrac{1}{24}[A,[A,[A,V]]] + \cdots.$$
Both series converge absolutely for all $A$ and $V$.

Proof. I shall prove the theorem for $A,V\in\mathcal{A}_0$ and then transfer the result to $A,V\in\mathcal{A}$, using the isomorphism between $\mathcal{A}_0$ and $\mathcal{A}$.

For any $A,V\in\mathcal{A}_0$,
$$\exp(A+V) = \sum_{n=0}^{\infty}\frac{1}{n!}(A+V)^n = \sum_{n=0}^{\infty}\bigl(X_n(A) + Y_n(A,V) + Z_n(A,V)\bigr)$$
where $X_n(A) = \frac{1}{n!}A^n$, $Y_n(A,V) = \frac{1}{n!}\sum_{r=1}^{n}A^{n-r}VA^{r-1}$, and $Z_n(A,V)$ is the sum of the remaining terms in the expansion of $\frac{1}{n!}(A+V)^n$. Note that $Z_n(A,V)$ consists of a sum of products of $A$ and $V$, with positive coefficients. For every $n$ we have a similar expansion of $\frac{1}{n!}(|A|+|V|)^n$:
$$\frac{1}{n!}(|A|+|V|)^n = X_n(|A|) + Y_n(|A|,|V|) + Z_n(|A|,|V|)$$
with
$$|X_n(A)| \le X_n(|A|) = \frac{1}{n!}|A|^n,\qquad |Y_n(A,V)| \le Y_n(|A|,|V|) \le \frac{1}{n!}(|A|+|V|)^n,\qquad |Z_n(A,V)| \le Z_n(|A|,|V|) \le \frac{1}{n!}(|A|+|V|)^n.$$
Hence $\sum_{n=0}^{\infty}X_n(A)$, $\sum_{n=0}^{\infty}Y_n(A,V)$, and $\sum_{n=0}^{\infty}Z_n(A,V)$ all converge absolutely, by comparison with $\sum_{n=0}^{\infty}\frac{1}{n!}(|A|+|V|)^n = e^{|A|+|V|}$. Hence
$$\exp(A+V) = \sum_{n=0}^{\infty}X_n(A) + \sum_{n=0}^{\infty}Y_n(A,V) + \sum_{n=0}^{\infty}Z_n(A,V).$$
We already know the sum of the first series: $\sum_{n=0}^{\infty}X_n(A) = \exp A$. As for the third series, for fixed $A$, define $R_A\colon\mathcal{A}_0\to\mathcal{A}_0$ by
$$\forall V\in\mathcal{A}_0\quad R_A(V) = \sum_{n=0}^{\infty}Z_n(A,V).$$
I shall show that $R_A$ is vanishing. As pointed out above, we have
$$|Z_n(A,V)| \le Z_n(|A|,|V|) = \frac{1}{n!}(|A|+|V|)^n - X_n(|A|) - Y_n(|A|,|V|) = \frac{1}{n!}\bigl((|A|+|V|)^n - |A|^n - n|A|^{n-1}|V|\bigr)$$
so
$$|R_A(V)| \le \sum_{n=0}^{\infty}|Z_n(A,V)| \le \sum_{n=0}^{\infty}\frac{1}{n!}\bigl((|A|+|V|)^n - |A|^n - n|A|^{n-1}|V|\bigr) = e^{|A|+|V|} - e^{|A|} - e^{|A|}|V| = e^{|A|}f(|V|)$$
where $f\colon\mathbb{R}\to\mathbb{R}$ is defined by $\forall t\in\mathbb{R}\ f(t) = \sum_{n=2}^{\infty}t^n/n!$. This real power series is absolutely convergent for all $t$, so, by lemma 2.14 (transferred to $M_{11}$, i.e., $\mathbb{R}$), $f$ is vanishing. Hence $R_A$ is also vanishing.

Returning to the second series, $\sum_{n=0}^{\infty}Y_n(A,V)$, we have $Y_0(A,V) = 0$ and, for any $n>0$, by lemma 12,
$$Y_n(A,V) = \frac{1}{n!}\sum_{r=1}^{n}A^{n-r}VA^{r-1} = \frac{1}{n!}\sum_{r=1}^{n}\sum_{i=0}^{r-1}(-1)^i\binom{r-1}{i}A^{n-i-1}\,\mathrm{ad}(A)^i(V)$$
$$= \frac{1}{n!}\sum_{i=0}^{n-1}\sum_{r=i+1}^{n}(-1)^i\binom{r-1}{i}A^{n-i-1}\,\mathrm{ad}(A)^i(V) = \frac{1}{n!}\sum_{i=0}^{n-1}(-1)^i\binom{n}{i+1}A^{n-i-1}\,\mathrm{ad}(A)^i(V) = \sum_{i=0}^{n-1}\frac{A^{n-i-1}}{(n-i-1)!}\cdot\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!}.$$

I wish to apply the Cauchy product formula (lemma 2.17, transferred to $M_{33}$) to this expression to evaluate $\sum_{n=1}^{\infty}Y_n(A,V)$. This is justified since the series $\sum_{j=0}^{\infty}A^j/j!$ and $\sum_{i=0}^{\infty}(-1)^i\,\mathrm{ad}(A)^i(V)/(i+1)!$ are absolutely convergent: in the case of the latter series we have
$$\Bigl|\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!}\Bigr| \le \frac{(\sqrt{5}\,|A|)^i}{(i+1)!}\,|V|$$
by lemma 5. The Cauchy product formula gives
$$\sum_{n=1}^{\infty}Y_n(A,V) = \sum_{n=1}^{\infty}\sum_{i=0}^{n-1}\frac{A^{n-i-1}}{(n-i-1)!}\cdot\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} = \sum_{m=0}^{\infty}\sum_{i=0}^{m}\frac{A^{m-i}}{(m-i)!}\cdot\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} = \sum_{j=0}^{\infty}\frac{A^j}{j!}\,\sum_{i=0}^{\infty}\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} = \exp(A)\sum_{i=0}^{\infty}\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!}.$$

Putting all three pieces together gives
$$\exp(A+V) = \exp(A) + \exp(A)\sum_{i=0}^{\infty}\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} + R_A(V) = \exp(A)\Bigl(I + \sum_{i=0}^{\infty}\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} + E_A(R_A(V))\Bigr)$$
where $E_A\colon\mathcal{A}_0\to\mathcal{A}_0$ is the linear transformation defined by $\forall X\in\mathcal{A}_0\ E_A(X) = \exp(-A)\,X$. By lemma 2.4, $E_A$ is limited, so, by lemma 2.9, $E_A\circ R_A$ is vanishing. Hence, by lemma 9, there is a vanishing function $R\colon\mathcal{A}_0\to\mathcal{A}_0$ such that, for all $V$ in a neighbourhood of 0,
$$\exp(A+V) = \exp(A)\exp\Bigl(\sum_{i=0}^{\infty}\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} + R(V)\Bigr).$$
This equation has been proved for $A,V\in\mathcal{A}_0$. Since $\mathcal{A}_0$ is isomorphic to $\mathcal{A}$ the equation also holds for $A,V\in\mathcal{A}$. This gives $\exp'(A)(V) = \sum_{i=0}^{\infty}\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!}$ and hence $\exp'(A) = \sum_{i=0}^{\infty}\frac{(-1)^i\,\mathrm{ad}(A)^i}{(i+1)!}$ by lemma 2.12, as required. As indicated above, these series are absolutely convergent by comparison with $\sum_{i=0}^{\infty}\frac{(\sqrt{5}|A|)^i}{(i+1)!}|V|$ and $\sum_{i=0}^{\infty}\frac{(\sqrt{5}|A|)^i}{(i+1)!}$.

3.12 Calculation of $\log'$

Define a real sequence $(a_n)$ by
$$a_n = \sum_{m=0}^{n}\ \sum_{\substack{r_1,\ldots,r_m \ge 1 \\ r_1+\cdots+r_m = n}}\frac{(-1)^{n-m}}{(r_1+1)!\cdots(r_m+1)!}.$$

LEMMA 14. The real power series $\sum_{n=0}^{\infty}a_n x^n$ converges absolutely if $|x|<1$.

Proof. We have
$$|a_n| \le \sum_{m=0}^{n}\ \sum_{\substack{r_1,\ldots,r_m\ge 1 \\ r_1+\cdots+r_m=n}}\frac{1}{(r_1+1)!\cdots(r_m+1)!} \le \sum_{m=0}^{n}\Bigl(\sum_{r=1}^{n}\frac{1}{(r+1)!}\Bigr)^m < \sum_{m=0}^{n}(e-2)^m < \frac{1}{1-(e-2)} = \frac{1}{3-e}$$
so the series $\sum_{n=0}^{\infty}|a_n x^n|$ converges by comparison with $\frac{1}{3-e}\sum_{n=0}^{\infty}|x|^n$, if $|x|<1$.

In fact, the convergence of the power series is much better than this: the first few terms of $(a_n)$ are $(1,\ \tfrac{1}{2},\ \tfrac{1}{12},\ 0,\ -\tfrac{1}{720},\ 0,\ \tfrac{1}{30240},\ 0,\ -\tfrac{1}{1209600},\ \ldots)$. The next lemma shows that $\sum_{n=0}^{\infty}a_n x^n$ is the inverse of $\sum_{n=0}^{\infty}\frac{(-1)^n}{(n+1)!}x^n$, in the sense of the Cauchy product of series.

LEMMA 15. For any $p\ge 0$,
$$\sum_{n=0}^{p}\frac{(-1)^n}{(n+1)!}\,a_{p-n} = \begin{cases}1 & \text{if } p=0\\ 0 & \text{if } p>0.\end{cases}$$

Proof. Define $S_n = \frac{(-1)^{n+1}}{(n+1)!}$. Then the definition of $a_n$ may be rewritten as
$$a_n = \sum_{m=0}^{n}b^n_m,\quad\text{where } b^n_m = \sum_{\substack{r_1,\ldots,r_m\ge 1\\ r_1+\cdots+r_m=n}}S_{r_1}S_{r_2}\cdots S_{r_m}.$$
Note that $S_0 = -1$, $b^0_0 = 1$, $b^p_0 = 0$ if $p>0$, and $b^n_m = 0$ if $m>n$. Then
$$\sum_{n=0}^{p}\frac{(-1)^n}{(n+1)!}\,a_{p-n} = \sum_{n=0}^{p}(-S_n)\sum_{m=0}^{p-n}b^{p-n}_m = -\sum_{n=0}^{p}\sum_{m=0}^{p}S_n b^{p-n}_m$$
$$= -\sum_{m=0}^{p}S_0 b^p_m - \sum_{n=1}^{p}\sum_{m=0}^{p}S_n b^{p-n}_m = \sum_{m=0}^{p}b^p_m - \sum_{m=0}^{p}\sum_{n=1}^{p}S_n b^{p-n}_m$$
$$= \sum_{m=0}^{p}b^p_m - \sum_{m=0}^{p}b^p_{m+1} = b^p_0 - b^p_{p+1} = \begin{cases}1 & \text{if } p=0\\ 0 & \text{if } p>0.\end{cases}$$
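Lemma 15 also gives a practical way to compute the coefficients $(a_n)$: since the relation is triangular with leading coefficient 1, each $a_p$ is determined by the earlier ones. The following sketch (not from the paper; exact arithmetic via the standard library) recovers the values listed above:

```python
from fractions import Fraction

def a_sequence(N):
    # Lemma 15 determines (a_n) recursively from
    #   sum_{n=0}^p (-1)^n/(n+1)! * a_{p-n} = 1 if p == 0 else 0.
    fact = [Fraction(1)]
    for k in range(1, N + 2):
        fact.append(fact[-1] * k)
    c = [Fraction((-1) ** n) / fact[n + 1] for n in range(N + 1)]  # c_n = (-1)^n/(n+1)!
    a = []
    for p in range(N + 1):
        s = sum(c[n] * a[p - n] for n in range(1, p + 1))
        a.append(Fraction(1 if p == 0 else 0) - s)   # c_0 = 1
    return a

a = a_sequence(8)
assert a[:3] == [Fraction(1), Fraction(1, 2), Fraction(1, 12)]
assert a[3] == 0 and a[4] == Fraction(-1, 720) and a[5] == 0
assert a[6] == Fraction(1, 30240) and a[7] == 0 and a[8] == Fraction(-1, 1209600)
```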

THEOREM 16. For any $A\in\mathcal{A}$ with $|A| < \frac{1}{\sqrt{5}}$,
$$\log'(\exp A) = \sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n;$$
thus, for any $V\in\mathcal{A}$,
$$\log'(\exp A)(V) = \sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n(V) = V + \tfrac{1}{2}[A,V] + \tfrac{1}{12}[A,[A,V]] - \tfrac{1}{720}[A,[A,[A,[A,V]]]] + \tfrac{1}{30240}[A,[A,[A,[A,[A,[A,V]]]]]] - \tfrac{1}{1209600}[A,[A,[A,[A,[A,[A,[A,[A,V]]]]]]]] + \cdots.$$
Both series converge absolutely for $|A| < \frac{1}{\sqrt{5}}$.

Proof. For $|A| < \frac{1}{\sqrt{5}}$ we have $|\mathrm{ad}(A)| < 1$, by lemma 5, so $\sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n$ and $\sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n(V)$ converge absolutely by lemma 14 and lemma 2.13. I intend to apply the inverse function theorem, so I must show that exp is continuously differentiable and $\exp'(A)$ is invertible.

Theorem 13 shows that exp is differentiable, with $\exp' = F\circ\mathrm{ad}$, where $F\colon L(\mathcal{A},\mathcal{A})\to L(\mathcal{A},\mathcal{A})$ is defined by
$$\forall T\in L(\mathcal{A},\mathcal{A})\quad F(T) = \sum_{n=0}^{\infty}\frac{(-1)^n}{(n+1)!}\,T^n.$$
The corresponding real power series $\sum_{n=0}^{\infty}\frac{(-1)^n}{(n+1)!}x^n$ is absolutely convergent for all $x\in\mathbb{R}$, so by lemma 2.13 the power series for $F$ is absolutely convergent for all $T$ and $F$ is continuous. The linear function ad is also continuous, by lemma 2.4, so $\exp' = F\circ\mathrm{ad}$ is continuous.

Since the series $\sum_{n=0}^{\infty}\frac{(-1)^n}{(n+1)!}\mathrm{ad}(A)^n$ and $\sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n$ are absolutely convergent (for $|A|<\frac{1}{\sqrt{5}}$), we can compose them, using the Cauchy product formula (lemma 2.17) and lemma 15, to give
$$\sum_{n=0}^{\infty}\frac{(-1)^n}{(n+1)!}\mathrm{ad}(A)^n \circ \sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n = \sum_{p=0}^{\infty}\Bigl(\sum_{n=0}^{p}\frac{(-1)^n}{(n+1)!}\,a_{p-n}\Bigr)\mathrm{ad}(A)^p = 1\cdot\mathrm{ad}(A)^0 + 0\cdot\mathrm{ad}(A)^1 + 0\cdot\mathrm{ad}(A)^2 + \cdots = I$$
where $I$ is the identity function. This shows that $\exp'(A)$ is invertible and that $\sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n$ is its inverse.

Hence, by the inverse function theorem, exp maps an open neighbourhood of $A$ bijectively onto an open neighbourhood of $\exp A$, and its inverse log is continuously differentiable. It follows that $\log'(\exp A)$ is the inverse of $\exp'(A)$, which is $\sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n$. Applying lemma 2.12 gives $\log'(\exp A)(V) = \sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n(V)$.
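The inverse relationship at the heart of theorem 16 can be verified numerically by representing $\mathrm{ad}(A)$ as a $6\times 6$ matrix acting on the coordinates of the Lie algebra and composing the two truncated series. A sketch (not from the paper; numpy assumed, coefficients $a_0,\ldots,a_8$ taken from §3.12, sample element chosen with $|A| < 1/\sqrt{5}$):

```python
import numpy as np

def aff(v):
    h11, h12, t1, h21, h22, t2 = v
    return np.array([[h11, h12, t1], [h21, h22, t2], [0, 0, 0]], dtype=float)

def coords(M):
    return M[:2, :].flatten()

# ad(A) as a 6x6 matrix acting on Lie-algebra coordinates.
A = aff([0.05, -0.1, 0.15, 0.1, 0.05, -0.05])
adA = np.column_stack([coords(A @ aff(e) - aff(e) @ A) for e in np.eye(6)])

expd = np.zeros((6, 6))     # exp'(A) = sum_n (-1)^n ad(A)^n/(n+1)!
P, fact = np.eye(6), 1.0
for n in range(30):
    fact *= (n + 1)
    expd += ((-1) ** n) * P / fact
    P = P @ adA

a = [1, 1/2, 1/12, 0, -1/720, 0, 1/30240, 0, -1/1209600]
logd = sum(a[n] * np.linalg.matrix_power(adA, n) for n in range(len(a)))  # log'(exp A)

assert np.allclose(expd @ logd, np.eye(6), atol=1e-6)
```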

3.13 The dual mapping, $\log^*$

We can define a function $\log^*\colon\mathcal{G}\to(\mathcal{F}\to\mathcal{F})$ dual to $\log'\colon\mathcal{G}\to(\mathcal{A}\to\mathcal{A})$ by
$$\forall G\in\mathcal{G}\quad \log^*(G) = (\log'(G))^\dagger$$
or, more explicitly,
$$\forall G\in\mathcal{G}\ \forall F\in\mathcal{F}\ \forall V\in\mathcal{A}\quad \log^*(G)(F)(V) = F(\log'(G)(V)).$$

THEOREM 17. For any $A\in\mathcal{A}$ with $|A| < \frac{1}{\sqrt{5}}$,
$$\log^*(\exp A) = \sum_{n=0}^{\infty}a_n\,(\mathrm{ad}(A)^\dagger)^n$$
where the real sequence $(a_n)$ is as above; thus, for any $F\in\mathcal{F}$,
$$\log^*(\exp A)(F) = \sum_{n=0}^{\infty}a_n\,(\mathrm{ad}(A)^\dagger)^n(F).$$
Both series converge absolutely for $|A| < \frac{1}{\sqrt{5}}$.

Proof. For any $V\in\mathcal{A}$ we have
$$\log'(\exp A)(V) = \sum_{n=0}^{\infty}a_n\,\mathrm{ad}(A)^n(V)$$
by theorem 16. So, for any $F\in\mathcal{F}$,
$$F(\log'(\exp A)(V)) = \sum_{n=0}^{\infty}a_n\,F(\mathrm{ad}(A)^n(V))$$
since $F$ is continuous by lemma 2.4. Hence
$$\log^*(\exp A)(F)(V) = F(\log'(\exp A)(V)) = \sum_{n=0}^{\infty}a_n\,F(\mathrm{ad}(A)^n(V)) = \sum_{n=0}^{\infty}a_n\,(\mathrm{ad}(A)^\dagger)^n(F)(V).$$
Applying lemma 2.12 gives $\log^*(\exp A)(F) = \sum_{n=0}^{\infty}a_n\,(\mathrm{ad}(A)^\dagger)^n(F)$ and $\log^*(\exp A) = \sum_{n=0}^{\infty}a_n\,(\mathrm{ad}(A)^\dagger)^n$, as required. These series converge absolutely for $|A| < \frac{1}{\sqrt{5}}$ since $|\mathrm{ad}(A)^\dagger| = |\mathrm{ad}(A)| < 1$, by lemma 2 and lemma 5.


4. Matching a Template to an Image

4.1 Images and templates

An image is a function $I\colon\mathbb{R}^2\to[0,\infty)$. For any point $p\in\mathbb{R}^2$, $I(p)$ is the image intensity at the pixel $p$. The domain of $I$ is called the image plane.

As mentioned in §1, each symbol type has a template, depicting the appearance of any token of that symbol type at a standard position, size and orientation, in the absence of noise. Formally, a template is a differentiable function $T\colon\mathbb{R}^2\to[0,\infty)$ such that the set $\{x\in\mathbb{R}^2 \mid T(x)>0\}$ is bounded. The domain of $T$ is called the template plane.

Suppose that we believe that a token of this symbol type occurs in the image at a

certain position, orientation and size (with a certain degree of stretching and shearing).

We can describe this by an affine transformation G from the template plane into the

image plane, which is called the embedding of the symbol token in the image. If the

symbol token really is present where we think it is, there will be a good match between

the functions T and I ∘ G, at least in the region where T is non-zero.

The aim of this section is to measure the goodness of match by a correlation function

between T and I ∘ G, and to calculate its derivative so that we can adjust G incrementally

to maximise the correlation.

Recall from §3.1 that an affine transformation $G$ can be represented by a $3\times 3$ matrix $\underline{G}$, and a point $u\in\mathbb{R}^2$ can be represented by a $3\times 1$ column vector $\underline{u}$, such that $\underline{G(u)} = \underline{G}\,\underline{u}$. For any template $T$ we can define a modified function $\bar{T}$ on $3\times 1$ column vectors, such that $\forall u\in\mathbb{R}^2\ \bar{T}(\underline{u}) = T(u)$.

4.2 The correlation function

Given $T$, $I$ and $G$, define the correlation $\rho_{I,T}(G)$ between $T$ and $I\circ G$ by
$$\rho_{I,T}(G) = |\det(\underline{G})|\int T(u)\,(I(G(u)) - I_0)\,d^2u = \int T(G^{-1}(x))\,(I(x) - I_0)\,d^2x$$
where $I_0$ is a positive real constant associated with $T$. The integrals are over the whole of $\mathbb{R}^2$, or equivalently over a large enough region to include in its interior all the points where $T$ is non-zero.

LEMMA 1. For any template $T$, any image $I$, and any affine transformations $G$, $G'$,
$$\rho_{I,T\circ G'}(G\circ G') = \rho_{I,T}(G).$$

Proof.
$$\rho_{I,T\circ G'}(G\circ G') = \int (T\circ G')((G\circ G')^{-1}(x))\,(I(x)-I_0)\,d^2x = \int T(G'(G'^{-1}(G^{-1}(x))))\,(I(x)-I_0)\,d^2x = \int T(G^{-1}(x))\,(I(x)-I_0)\,d^2x = \rho_{I,T}(G).$$
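A discretised version of the correlation is straightforward to implement. The sketch below is not from the paper: it assumes numpy, approximates the integral by a Riemann sum on a pixel grid, and uses a Gaussian blob as a stand-in template (approximately compactly supported). It checks the intuitive behaviour that $\rho$ is larger when the embedding $G$ places the template over the matching part of the image:

```python
import numpy as np

def rho(T, I, G, I0, lo=-3.0, hi=3.0, n=200):
    # Discretised correlation rho_{I,T}(G) = |det G| * integral T(u) (I(G(u)) - I0) d^2 u.
    # T and I are functions R^2 -> [0, inf); G is a 3x3 affine matrix.
    xs = np.linspace(lo, hi, n)
    du = (xs[1] - xs[0]) ** 2
    U1, U2 = np.meshgrid(xs, xs)
    Gu1 = G[0, 0] * U1 + G[0, 1] * U2 + G[0, 2]
    Gu2 = G[1, 0] * U1 + G[1, 1] * U2 + G[1, 2]
    return abs(np.linalg.det(G)) * np.sum(T(U1, U2) * (I(Gu1, Gu2) - I0)) * du

# A Gaussian-blob template and an image containing that blob shifted by (1, 0.5).
T = lambda x, y: np.exp(-(x**2 + y**2))
I0 = 0.1
I = lambda x, y: np.exp(-((x - 1)**2 + (y - 0.5)**2)) + I0

G_right = np.array([[1, 0, 1], [0, 1, 0.5], [0, 0, 1]])  # embeds template on the blob
G_wrong = np.eye(3)
assert rho(T, I, G_right, I0) > rho(T, I, G_wrong, I0)
```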


4.3 Derivative of the correlation function

Let $G$ be an affine transformation, and consider the effect on $\rho_{I,T}(G)$ of making a small change in $G$, i.e., replacing $G$ by $G\circ\exp A$, for some $A\in\mathcal{A}$. We have
$$\rho_{I,T}(G\circ\exp A) = \int T((G\circ\exp A)^{-1}(x))\,(I(x)-I_0)\,d^2x = \int \bar{T}((\underline{G}\exp\underline{A})^{-1}\underline{x})\,(I(x)-I_0)\,d^2x = \int \bar{T}(\exp(-\underline{A})\,\underline{G}^{-1}\underline{x})\,(I(x)-I_0)\,d^2x$$
$$= \int \bar{T}((I - \underline{A} + o(\underline{A}))\,\underline{G}^{-1}\underline{x})\,(I(x)-I_0)\,d^2x = \rho_{I,T}(G) - \int (\nabla T(G^{-1}(x)))\,\underline{A}\,\underline{G}^{-1}\underline{x}\,(I(x)-I_0)\,d^2x + o(A)$$
where $\nabla T$ is defined by $\nabla T(u) = \bigl(\tfrac{\partial T(u)}{\partial u_1}\ \ \tfrac{\partial T(u)}{\partial u_2}\ \ 0\bigr)$.

Hence the derivative $\rho_{I,T}'\colon\mathcal{G}\to(\mathcal{A}\to\mathbb{R})$ is given by
$$\forall G\in\mathcal{G}\ \forall A\in\mathcal{A}\quad \rho_{I,T}'(G)(A) = -\int (\nabla T(G^{-1}(x)))\,\underline{A}\,\underline{G}^{-1}\underline{x}\,(I(x)-I_0)\,d^2x = -|\det(\underline{G})|\int (\nabla T(u))\,\underline{A}\,\underline{u}\,(I(G(u))-I_0)\,d^2u.$$
$\rho_{I,T}'(G)$ is a linear transformation from $\mathcal{A}$ to $\mathbb{R}$, and hence is an element of $\mathcal{F}$; it is called the force exerted by $I$ and $T$ on $G$.

4.4 Matrix representation of the force

LEMMA 2. Using the matrix representation of $\mathcal{F}$ (see §3.7), the force $\rho_{I,T}'(G)$ is represented by the matrix
$$\underline{\rho_{I,T}'(G)} = -|\det(\underline{G})|\int \begin{pmatrix}u_1\\ u_2\\ 1\end{pmatrix}\bigl(\tfrac{\partial T(u)}{\partial u_1}\ \ \tfrac{\partial T(u)}{\partial u_2}\ \ 0\bigr)\,(I(G(u))-I_0)\,d^2u = -|\det(\underline{G})|\int \underline{u}\,(\nabla T(u))\,(I(G(u))-I_0)\,d^2u.$$

Proof. Let $M$ be the matrix on the right-hand side of the equation. For any $A\in\mathcal{A}$, we have
$$\mathrm{tr}(M\underline{A}) = -|\det(\underline{G})|\int \mathrm{tr}(\underline{u}\,(\nabla T(u))\,\underline{A})\,(I(G(u))-I_0)\,d^2u = -|\det(\underline{G})|\int \mathrm{tr}((\nabla T(u))\,\underline{A}\,\underline{u})\,(I(G(u))-I_0)\,d^2u$$
$$= -|\det(\underline{G})|\int (\nabla T(u))\,\underline{A}\,\underline{u}\,(I(G(u))-I_0)\,d^2u = \rho_{I,T}'(G)(A)$$
showing that $M$ does indeed represent $\rho_{I,T}'(G)$.
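The force formula can be cross-checked against a finite difference of the discretised correlation. This sketch is not from the paper; it assumes numpy, uses smooth Gaussian template and image so that both integrals are well approximated on a grid, evaluates the force at $G$ = identity, and uses an arbitrary trace-free generator $A$:

```python
import numpy as np

def expm(M, terms=40):
    out, term = np.eye(3), np.eye(3)
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

xs = np.linspace(-4.0, 4.0, 400)
du = (xs[1] - xs[0]) ** 2
U1, U2 = np.meshgrid(xs, xs)

T = np.exp(-(U1**2 + U2**2))          # Gaussian template (approximately compact support)
dT1, dT2 = -2 * U1 * T, -2 * U2 * T   # partial derivatives of T
I0 = 0.1
Ifun = lambda x, y: np.exp(-0.5 * ((x - 0.5)**2 + y**2)) + I0

def rho(G):
    g1 = G[0, 0] * U1 + G[0, 1] * U2 + G[0, 2]
    g2 = G[1, 0] * U1 + G[1, 1] * U2 + G[1, 2]
    return abs(np.linalg.det(G)) * np.sum(T * (Ifun(g1, g2) - I0)) * du

A = np.array([[0.0, 0.3, 0.1], [-0.3, 0.0, 0.2], [0, 0, 0]])  # rotation + translation generator
Au1 = A[0, 0] * U1 + A[0, 1] * U2 + A[0, 2]
Au2 = A[1, 0] * U1 + A[1, 1] * U2 + A[1, 2]
# Analytic force at G = identity: -integral (grad T(u)) A u (I(u) - I0) d^2 u.
force = -np.sum((dT1 * Au1 + dT2 * Au2) * (Ifun(U1, U2) - I0)) * du

h = 1e-5
fd = (rho(expm(h * A)) - rho(np.eye(3))) / h   # finite-difference derivative
assert abs(fd - force) < 1e-3
```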

4.5 The mass of a symbol token

The mass, $m$, of a symbol token, whose type has template $T$, and which is embedded in the image $I$ by $G$, is defined by
$$m = |\det(\underline{G})|\int T(u)\,d^2u = \int T(G^{-1}(x))\,d^2x.$$


5. Fleximaps

5.1 Introduction

A symbol contains other symbols as parts. The geometric relation between a part and

the whole, or between two neighbouring parts, is not absolutely rigid but is allowed to

vary along certain degrees of freedom, by certain amounts. For example, figure 1 shows

a symbol token of type `A', with three line tokens as parts. The figure shows the nominal

relationships between the parts, i.e., the relationships in the absence of noise. But the

relationship between two parts σ1 and σ2 may vary in several ways: σ1 may rotate (by

a limited angle) around a certain point on σ2; σ1 may slide up or down σ2 (by a limited

distance); and σ1 may be shifted across σ2 (but only by a very small amount).

Figure 1. The geometric relation between symbol tokens σ1 and σ2 may vary along several degrees of freedom.

The purpose of a fleximap is to represent both the nominal geometric relationship

and the permitted degrees of variation. The actual geometric relationship G between

the symbol tokens σ1 and σ2 may be expressed in the form G = F ∘ exp A, where F is

the nominal relationship and A is an element of the Lie algebra A, representing the

deviation of the actual relationship from the nominal one.

We wish to impose `soft' constraints on the deviation A. For example, we would like

to say, not `you may rotate up to ±θ0 radians and no further', but `there is a penalty of kθ² for a rotation by θ radians'; the smaller the value of the coefficient k, the more

rotation is tolerated. In general, the penalty function is a quadratic function depending

on six independent variables (e.g., angle of rotation around a certain point, distance of

translation in a certain direction, amount of dilation about a certain centre, etc.); there

must always be six degrees of freedom because A is a six­dimensional vector space. This

quadratic penalty function is technically known as a metric tensor.

The fleximap, then, is a pair (F, g), where F is the nominal relationship and g is

the metric tensor specifying the penalty for deviations from the nominal.

5.2 Metric tensors

A metric tensor is a function $g\colon\mathcal{A}\to(\mathcal{A}\to\mathbb{R})$ such that
• $\forall c_1,c_2\in\mathbb{R}\ \forall A,B_1,B_2\in\mathcal{A}\quad g(A)(c_1B_1+c_2B_2) = c_1 g(A)(B_1) + c_2 g(A)(B_2)$,
• $\forall A,B\in\mathcal{A}\quad g(A)(B) = g(B)(A)$,
• $\forall A\in\mathcal{A}\quad A\ne 0 \Rightarrow g(A)(A) > 0$.
Note that, as a consequence of the first condition, $\forall A\in\mathcal{A}\ g(A)\in\mathcal{F}$; in fact, $g$ is an isomorphism from $\mathcal{A}$ to $\mathcal{F}$.

5.3 Fleximaps

A fleximap is a pair $(F,g)$, where $F\in\mathcal{G}$ and $g$ is a metric tensor. Let Flex be the set of all fleximaps. Associated with any fleximap $\tau = (F,g)$ is an energy function $E_\tau\colon\mathcal{G}\to\mathbb{R}$ defined by
$$\forall G\in\mathcal{G}\quad E_\tau(G) = g(\log(F^{-1}\circ G))(\log(F^{-1}\circ G)).$$

The informal interpretation of a fleximap $\tau = (F,g)$ is as indicated above. Any affine transformation $G$ not too far from $F$ can be represented in the form $G = F\circ\exp A$, for some $A\in\mathcal{A}$; then we have $E_\tau(G) = g(A)(A)$. $F$ is called the nominal transformation, $A$ measures how far $G$ deviates from $F$, and $E_\tau(G)$ is the penalty incurred by the deviation. The way in which this works can be seen most clearly if we adopt a basis for $\mathcal{A}$ under which $g$ is diagonal: then, in terms of components, we have $g(A)(A) = \sum_{i=1}^{6}g_{ii}(A^i)^2$. For example, the six basis elements might represent a rotation around a certain point, a dilation about a certain centre, a one-dimensional stretch in a certain direction, a shear in a certain direction, a translation in a certain direction, and a translation in another direction; these types of deviation would be penalised at the rates $g_{11}, g_{22}, g_{33}, g_{44}, g_{55}, g_{66}$ respectively. A metric tensor is a convenient way of summing up the six independent degrees of freedom and the six penalty coefficients $g_{11},\ldots,g_{66}$ associated with them.

The derivative of $E_\tau$ may be calculated as follows. For any $G\in\mathcal{G}$ and any $V\in\mathcal{A}$, we have
$$\log(F^{-1}\circ G\circ\exp V) = \log(F^{-1}\circ G) + \log'(F^{-1}\circ G)(V) + o(V)$$
(see §3.10), so
$$E_\tau(G\circ\exp V) = g(\log(F^{-1}\circ G\circ\exp V))(\log(F^{-1}\circ G\circ\exp V)) = g(\log(F^{-1}\circ G))(\log(F^{-1}\circ G)) + 2g(\log(F^{-1}\circ G))(\log'(F^{-1}\circ G)(V)) + o(V) = E_\tau(G) + 2\log^*(F^{-1}\circ G)(g(\log(F^{-1}\circ G)))(V) + o(V)$$
so $E_\tau'(G) = 2\log^*(F^{-1}\circ G)(g(\log(F^{-1}\circ G)))$.

Given a fleximap $(F,g)$ and a transformation $G\in\mathcal{G}$, we can define the composition of the two by
$$G\circ(F,g) = (G\circ F,\ g)$$
$$(F,g)\circ G = (F\circ G,\ g')$$
where $g'$ is the metric defined by
$$\forall A,B\in\mathcal{A}\quad g'(A)(B) = g(\mathrm{Ad}(G)(A))(\mathrm{Ad}(G)(B)).$$

LEMMA 1. For any fleximap $\tau$ and any $G,G'\in\mathcal{G}$,
$$E_{G'\circ\tau}(G'\circ G) = E_\tau(G) = E_{\tau\circ G'}(G\circ G').$$

Proof. Let $\tau = (F,g)$. For the first equation,
$$E_{G'\circ\tau}(G'\circ G) = g(\log((G'\circ F)^{-1}\circ G'\circ G))(\log((G'\circ F)^{-1}\circ G'\circ G)) = g(\log(F^{-1}\circ G))(\log(F^{-1}\circ G)) = E_\tau(G).$$
For the second equation,
$$E_{\tau\circ G'}(G\circ G') = g'(\log((F\circ G')^{-1}\circ G\circ G'))(\log((F\circ G')^{-1}\circ G\circ G')) = g'(\log(G'^{-1}\circ F^{-1}\circ G\circ G'))(\log(G'^{-1}\circ F^{-1}\circ G\circ G')) = g'(\mathrm{Ad}(G'^{-1})(\log(F^{-1}\circ G)))(\mathrm{Ad}(G'^{-1})(\log(F^{-1}\circ G))) = g(\log(F^{-1}\circ G))(\log(F^{-1}\circ G)) = E_\tau(G)$$
using lemma 3.3.
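The energy function and the left-composition invariance of lemma 1 can be checked numerically. This sketch is not from the paper: it assumes numpy, works in the $3\times 3$ matrix representation, and makes the simplifying assumption that the metric $g$ is diagonal on the six matrix coordinates:

```python
import numpy as np

def expm(M, terms=40):
    out, term = np.eye(3), np.eye(3)
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

def logm(M, terms=80):
    X = M - np.eye(3)
    out, P = np.zeros((3, 3)), np.eye(3)
    for n in range(1, terms):
        P = P @ X
        out += ((-1) ** (n + 1)) * P / n
    return out

def coords(M):
    return M[:2, :].flatten()

def energy(F, g, G):
    # E_tau(G) = g(log(F^{-1} G))(log(F^{-1} G)), with g diagonal on coordinates.
    A = coords(logm(np.linalg.inv(F) @ G))
    return float(A @ (g * A))

g = np.array([4.0, 1.0, 0.5, 1.0, 4.0, 0.5])   # penalty coefficients g_ii
F = expm(np.array([[0.1, 0, 0.2], [0, 0.1, -0.1], [0, 0, 0]]))   # nominal transformation
G = F @ expm(np.array([[0.05, 0.1, 0.1], [-0.1, 0, 0.2], [0, 0, 0]]))

assert energy(F, g, F) < 1e-12   # no deviation, no penalty
# Left-composition invariance (lemma 1): E_{G' o tau}(G' o G) = E_tau(G).
Gp = expm(np.array([[0, 0.2, 0.3], [-0.2, 0, 0.1], [0, 0, 0]]))
assert abs(energy(Gp @ F, g, Gp @ G) - energy(F, g, G)) < 1e-8
```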


6. Networks and Homomorphisms

6.1 Introduction

The central concept of this paper is that of a network, which represents a symbol system.

Networks are used for two purposes: a grammar is represented as a network of symbol

types, and a particular pattern is represented as a network of symbol tokens. The rela­

tionship between a pattern and a grammar is represented by a homomorphism from the

pattern network to the grammar network. The task of constructing the homomorphism

is called parsing. (All this is very different from conventional grammatical formalisms,

in which a grammar is a set of production rules and the relationship between a pattern

and a grammar is represented by a derivation or a parse tree.)

A pattern is the system of all symbol tokens present (or believed to be present)

in a given image. The position of each symbol token in the image is expressed by its

embedding, an affine transformation. The embeddings of all the symbol tokens of the

pattern are combined together into a single mathematical object, called the embedding

token of the pattern.

The grammar includes constraints on the embedding token, expressed in terms of

fleximaps. These constraints are combined together into a single mathematical object,

called the embedding type of the grammar.

6.2 Networks

A network is a 9-tuple (Σ, N, H, E, W, P, A, F, S), where Σ, N, H, E are disjoint sets, and W, P: N → Σ, A: H → N and F, S: E → H. The elements of Σ, N, H, E are called symbols, nodes, hooks and edges, respectively. The purpose of a node is to represent a part-whole

relation between symbols: if a symbol σ1 is a part of a symbol σ2 then there is a node

n with P(n) = σ1 and W(n) = σ2. Each node has zero or more hooks attached to it; if a

hook h is attached to a node n then A(h) = n. The purpose of an edge is to represent

neighbourhood relations between two parts of a common whole: if symbols σ1 and σ2 are

both parts of a symbol σ3, and n1 and n2 are the nodes representing the two part­whole

relations, then there may be an edge e from one of the hooks of n1 to one of the hooks

of n2; this will be represented by F(e) = h1 and S(e) = h2, where h1 is a hook of n1 and

h2 is a hook of n2.

A network is coherent iff W is the coequaliser of A ∘ F and A ∘ S in the category of sets. This is equivalent to the conjunction of the following two conditions:
• W ∘ A ∘ F = W ∘ A ∘ S,
• for every σ ∈ Σ, the graph whose set of nodes is W⁻¹({σ}) and whose set of edges is F⁻¹(A⁻¹(W⁻¹({σ}))), where each edge e connects A(F(e)) to A(S(e)), is connected.

A network is definite iff the following conditions hold:
• ∀h ∈ H  |F⁻¹({h})| + |S⁻¹({h})| = 1,
• ∀σ ∈ Σ  |P⁻¹({σ})| ≤ 1.

A grammar is required to be coherent but not necessarily definite. A pattern is required

to be both coherent and definite at the end of the recognition process; but while it is

being constructed it is not required to be coherent or definite.
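The definiteness conditions are easy to express directly. The following sketch is not from the paper; it represents a network with plain Python dicts and sets (all concrete names are illustrative), and tests the definiteness conditions on a tiny two-part pattern:

```python
# Definiteness test of section 6.2: every hook is the end of exactly one edge,
# and every symbol is a part of at most one whole.
def is_definite(H, E, P, F, S, symbols):
    for h in H:
        ends = sum(1 for e in E if F[e] == h) + sum(1 for e in E if S[e] == h)
        if ends != 1:
            return False
    return all(sum(1 for n in P if P[n] == s) <= 1 for s in symbols)

# Symbol 'A' with two line parts via nodes n1, n2, joined by one edge between hooks h1, h2.
symbols = {'A', 'line1', 'line2'}
P = {'n1': 'line1', 'n2': 'line2'}    # P: node -> part symbol
H = {'h1', 'h2'}                      # one hook per node
E = {'e1'}
F, S = {'e1': 'h1'}, {'e1': 'h2'}     # edge e1 joins h1 to h2
assert is_definite(H, E, P, F, S, symbols)

# Removing the edge leaves both hooks bare, so the network is no longer definite.
assert not is_definite(H, set(), P, {}, {}, symbols)
```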

6.3 Homomorphisms

The parse of a pattern is represented by a homomorphism between the pattern and the grammar. If 𝒩 = (Σ, N, H, E, W, P, A, F, S) and 𝒩′ = (Σ′, N′, H′, E′, W′, P′, A′, F′, S′) are networks, a homomorphism f: 𝒩 → 𝒩′ is a function from Σ ∪ N ∪ H ∪ E to Σ′ ∪ N′ ∪ H′ ∪ E′ such that
• f|Σ: Σ → Σ′, f|N: N → N′, f|H: H → H′, f|E: E → E′,
• W′ ∘ f = f ∘ W, P′ ∘ f = f ∘ P, F′ ∘ f = f ∘ F, S′ ∘ f = f ∘ S, and
• the square

      H --f|H--> H′
      |A         |A′
      v          v
      N --f|N--> N′

  is a pullback (in the category of sets).

(Note that the pullback condition means that, for every n ∈ N, f maps the hooks of n bijectively onto the hooks of f(n).)

An isomorphism is a homomorphism that is a bijection. An automorphism is an isomorphism from a network 𝒩 to itself.

6.4 Embeddings

An embedding token for a network (Σ, N, H, E, W, P, A, F, S) is a function u: Σ ∪ N → 𝒢. It specifies the way a pattern is embedded in the image plane: u(σ) is the embedding of a symbol σ, and u(n) is the embedding of a node n. If P(n) = σ then u(σ) and u(n) will be very closely related: they will differ only by one of σ's symmetry transformations (this constraint is called the symmetry condition; see §7.3 below).

Given two embedding tokens, u1 and u2, for the same network, we can define a new embedding token u1 · u2 by
∀x ∈ Σ ∪ N  (u1 · u2)(x) = u1(x) ∘ u2(x).

Given a homomorphism f: 𝒩 → 𝒩′ and an embedding token u for 𝒩′, clearly u ∘ f is an embedding token for 𝒩.

An embedding type for a network (Σ, N, H, E, W, P, A, F, S) is a quintuple (con, rel, symm, tem, in), in which con: N → Flex, rel: E → Flex, symm: Σ → sub(𝒢), tem: Σ → Tem, and in: Σ → Flex, where Flex is the set of all fleximaps, sub(𝒢) is the set of all subgroups of 𝒢, and Tem is the set of all templates; for every σ ∈ Σ, the fleximap in(σ) must have nominal part equal to the identity transformation.

The purpose of an embedding type is to impose constraints on the embedding token. For any node n, con(n) is a fleximap describing the relationship between the embeddings of a part and a whole (or, more precisely, of the node n and the symbol W(n)). For any edge e, rel(e) is a fleximap describing the relationship between the embeddings of two neighbouring parts (or, more precisely, of their nodes A(F(e)) and A(S(e))). For any symbol σ, symm(σ) is its symmetry group, tem(σ) is its template, and in(σ) is (I, g), where I is the identity transformation and g is called the inertial metric of σ and determines how σ moves in response to forces (see §7.7).

The Match function, defined below in §7.3, defines how well a given embedding token matches a given embedding type.

Given an embedding type v = (con1, rel1, symm1, tem1, in1) and an embedding token u for a network (Σ, N, H, E, W, P, A, F, S), we can define an embedding type v · u for the same network by v · u = (con2, rel2, symm1, tem2, in2), where
∀n ∈ N  con2(n) = u(W(n))⁻¹ ∘ con1(n) ∘ u(n)
∀e ∈ E  rel2(e) = u(A(S(e)))⁻¹ ∘ rel1(e) ∘ u(A(F(e)))
∀σ ∈ Σ  tem2(σ) = tem1(σ) ∘ u(σ)
∀σ ∈ Σ  in2(σ) = u(σ)⁻¹ ∘ in1(σ) ∘ u(σ)

Given an embedding type v = (con, rel, symm, tem, in) for a network 𝒩′ and a homomorphism f: 𝒩 → 𝒩′, we can define an embedding type v ∘ f for 𝒩 by
v ∘ f = (con ∘ f, rel ∘ f, symm ∘ f, tem ∘ f, in ∘ f).

LEMMA 1. If f: 𝒩 → 𝒩′ is a homomorphism, v is an embedding type for 𝒩′, and u is an embedding token for 𝒩′, then
(v · u) ∘ f = (v ∘ f) · (u ∘ f).

Proof. Let 𝒩 = (Σ, N, H, E, W, P, A, F, S), 𝒩′ = (Σ′, N′, H′, E′, W′, P′, A′, F′, S′) and v = (con1, rel1, symm1, tem1, in1). Then v · u = (con2, rel2, symm1, tem2, in2), where
∀n ∈ N′  con2(n) = u(W′(n))⁻¹ ∘ con1(n) ∘ u(n)
∀e ∈ E′  rel2(e) = u(A′(S′(e)))⁻¹ ∘ rel1(e) ∘ u(A′(F′(e)))
∀σ ∈ Σ′  tem2(σ) = tem1(σ) ∘ u(σ)
∀σ ∈ Σ′  in2(σ) = u(σ)⁻¹ ∘ in1(σ) ∘ u(σ)
and hence
(v · u) ∘ f = (con2 ∘ f, rel2 ∘ f, symm1 ∘ f, tem2 ∘ f, in2 ∘ f).
We also have v ∘ f = (con1 ∘ f, rel1 ∘ f, symm1 ∘ f, tem1 ∘ f, in1 ∘ f), and hence (v ∘ f) · (u ∘ f) = (con3, rel3, symm1 ∘ f, tem3, in3), where
∀n ∈ N  con3(n) = (u ∘ f)(W(n))⁻¹ ∘ (con1 ∘ f)(n) ∘ (u ∘ f)(n) = u(f(W(n)))⁻¹ ∘ con1(f(n)) ∘ u(f(n)) = u(W′(f(n)))⁻¹ ∘ con1(f(n)) ∘ u(f(n)) = con2(f(n)) = (con2 ∘ f)(n)
∀e ∈ E  rel3(e) = (u ∘ f)(A(S(e)))⁻¹ ∘ (rel1 ∘ f)(e) ∘ (u ∘ f)(A(F(e))) = u(f(A(S(e))))⁻¹ ∘ rel1(f(e)) ∘ u(f(A(F(e)))) = u(A′(S′(f(e))))⁻¹ ∘ rel1(f(e)) ∘ u(A′(F′(f(e)))) = rel2(f(e)) = (rel2 ∘ f)(e)
∀σ ∈ Σ  tem3(σ) = (tem1 ∘ f)(σ) ∘ (u ∘ f)(σ) = tem1(f(σ)) ∘ u(f(σ)) = tem2(f(σ)) = (tem2 ∘ f)(σ)
∀σ ∈ Σ  in3(σ) = (u ∘ f)(σ)⁻¹ ∘ (in1 ∘ f)(σ) ∘ (u ∘ f)(σ) = u(f(σ))⁻¹ ∘ in1(f(σ)) ∘ u(f(σ)) = in2(f(σ)) = (in2 ∘ f)(σ)
which shows that (v · u) ∘ f = (v ∘ f) · (u ∘ f).


7. The Recognition Problem

7.1 Introduction

Given an image, we wish to detect all the symbol tokens present in it, determine how

they are embedded in it and how they are related to one another, and relate the system

of symbol tokens to the grammar. This is the recognition problem. To restate this using

the terminology of §6: we need to construct the pattern, determine the embedding token

of the pattern, and construct the homomorphism from the pattern to the grammar. The

recognition problem will be stated formally in §7.4.

While the pattern is being constructed it may temporarily be in an inconsistent state: it may be incoherent or indefinite (these terms were defined in §6.2). Also, the symbols, nodes and edges of the pattern may be only `tentatively present': this is represented by inclusion functions (e.g., i(σ) is the degree to which it is believed that a symbol token σ should be present in the pattern). However, by the end of the recognition process the

pattern must become coherent and definite, and every symbol, node and edge must be

unequivocally present.

The recognition process is governed by a Match function, which measures how well

a pattern matches the given image and grammar. The aim of recognition is to maximise

this function. Only one aspect of the recognition problem is solved in this paper: for a

given pattern and homomorphism, the optimal embedding token for the pattern is found

by a process of gradient descent on the Match function (see §7.7).

7.2 Inclusion functions

During the course of recognition, the pattern is accompanied by inclusion functions i: Σ ∪ N ∪ E → [0, 1] and j: Σ ∪ N ∪ H → [0, 1], which are subject to the conditions
$$\forall\sigma\in\Sigma\quad i(\sigma) = j(\sigma) + \sum_{n\in P^{-1}(\{\sigma\})} i(n)$$
$$\forall n\in N\quad i(W(n)) = j(n) + i(n)$$
$$\forall h\in H\quad i(A(h)) = j(h) + \sum_{e\in F^{-1}(\{h\})\cup S^{-1}(\{h\})} i(e)$$

These functions reflect the uncertainty during recognition concerning which parts of the pattern should really be present: i(σ) represents the degree of belief that σ should be present, and similarly for i(n) and i(e) (there is no need for an inclusion value for hooks because a hook is present iff its node is). The j function is concerned with missing parts: j(h) represents the degree of belief that the hook h should be present but that the correct edge to be connected to it has not yet been found; j(n) represents the degree of belief that the symbol W(n) should be present but the node n should not be; j(σ) represents the degree of belief that σ should be present but as a top-level symbol, i.e., with P⁻¹({σ}) = ∅.

At the end of recognition we must have ∀x ∈ Σ ∪ N ∪ E, i(x) = 1, i.e., complete certainty.

In addition there is a function B: H → ℝ that imposes a penalty on bare hooks: if a hook h has no edges attached to it then a penalty of B(h) is incurred. When a node is first formed its hooks have low values of B(h), since they cannot be expected to have acquired edges yet; but if they remain bare the value of B(h) is increased to force them to acquire edges. At the end of the recognition process every hook must have an edge, to satisfy the definiteness condition.
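The three consistency conditions on i and j can be checked mechanically. Below is a minimal Python sketch on an entirely invented toy pattern (one whole symbol with two parts joined by a single edge); the structure maps W, P, A, F, S follow the notation of §6, and every name and belief value is an illustrative assumption, not data from the paper.

```python
# Toy check of the three inclusion-function consistency conditions of section 7.2.
# All symbol/node/hook/edge names and belief values are invented for illustration.

W = {'n1': 's0', 'n2': 's0'}   # node -> whole symbol
P = {'n1': 's1', 'n2': 's2'}   # node -> part symbol
A = {'h1': 'n1', 'h2': 'n2'}   # hook -> the node it belongs to
F = {'e1': 'h1'}               # edge -> first hook
S = {'e1': 'h2'}               # edge -> second hook

# Degrees of belief mid-recognition: i on symbols, nodes and edges;
# j on symbols, nodes and hooks.
i = {'s0': 0.8, 's1': 0.9, 's2': 0.8, 'n1': 0.7, 'n2': 0.8, 'e1': 0.6}
j = {'s0': 0.8,              # s0 believed present as a top-level symbol
     's1': 0.2, 's2': 0.0,   # s1 partly believed present independently of n1
     'n1': 0.1, 'n2': 0.0,
     'h1': 0.1, 'h2': 0.2}   # hooks whose correct edge may not yet be found

def check_inclusions(i, j, W, P, A, F, S, tol=1e-9):
    symbols = set(W.values()) | set(P.values())
    ok = True
    # (1) i(sigma) = j(sigma) + sum of i(n) over nodes n with P(n) = sigma
    for s in symbols:
        ok &= abs(i[s] - j[s] - sum(i[n] for n in P if P[n] == s)) < tol
    # (2) i(W(n)) = j(n) + i(n)
    for n in W:
        ok &= abs(i[W[n]] - j[n] - i[n]) < tol
    # (3) i(A(h)) = j(h) + sum of i(e) over edges incident on hook h
    for h in A:
        incident = [e for e in F if F[e] == h] + [e for e in S if S[e] == h]
        ok &= abs(i[A[h]] - j[h] - sum(i[e] for e in incident)) < tol
    return ok

print(check_inclusions(i, j, W, P, A, F, S))  # -> True
```

Any attempt to raise i(e1) here without adjusting j(h1) and j(h2) violates condition (3), which is exactly the bookkeeping the recognition process must maintain.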

7.3 The match function between an embedding type and an embedding token

Given a network 𝒩 = (Σ, N, H, E, W, P, A, F, S), an image I, an embedding token u for 𝒩, and an embedding type v = (con, rel, symm, tem, in) for 𝒩, the match function, which measures how well u matches v and I, is defined by

\[
\begin{aligned}
\mathrm{Match}(I, \mathcal{N}, u, v) = {} & \sum_{\sigma \in \Sigma} \bigl( i(\sigma)\,\rho_{I,\mathrm{tem}(\sigma)}(u(\sigma)) - j(\sigma)\,\theta \bigr) \\
& - \sum_{n \in N} i(n)\, E_{\mathrm{con}(n)}\bigl(u(W(n))^{-1} \cdot u(n)\bigr) - \sum_{h \in H} j(h)\, B(h) \\
& - \sum_{e \in E} i(e) \Bigl( E_{\mathrm{rel}(e)}\bigl(u(A(S(e)))^{-1} \cdot u(A(F(e)))\bigr) \\
& \qquad\qquad + E_{\mathrm{in}(W(A(F(e))))}\bigl(u(W(A(S(e))))^{-1} \cdot u(W(A(F(e))))\bigr) \Bigr)
\end{aligned}
\]

where θ is a positive real constant. The symmetry condition for 𝒩, u, v is

\[
\forall n \in N \quad u(P(n))^{-1} \cdot u(n) \in \mathrm{symm}(P(n)).
\]

The terms in the definition of Match can be explained as follows. For any symbol σ, u(σ) is the embedding of σ in the image plane. For any node n, u(n) is equal to u(P(n)), the embedding of the symbol P(n), possibly adjusted by a symmetry transformation (a member of symm(P(n))). The term ρ_{I,tem(σ)}(u(σ)) measures how well the template of σ, embedded in the image by u(σ), matches the image (see §4). The term θ is a fixed penalty incurred by any top-level symbol, i.e., any symbol that is not a part of any other symbol. The term E_{con(n)}(u(W(n))⁻¹ · u(n)) measures how well the part-whole relationship between P(n) and W(n) matches the fleximap con(n). The term B(h) is a penalty incurred by a hook h with no incident edges (see §7.2). The term E_{rel(e)}(u(A(S(e)))⁻¹ · u(A(F(e)))) measures how well the neighbourhood relation between the symbols P(A(F(e))) and P(A(S(e))) matches the fleximap rel(e). The final term, E_{in(W(A(F(e))))}(u(W(A(S(e))))⁻¹ · u(W(A(F(e))))), is only non-zero if the network is not coherent: if an edge e has W(A(S(e))) ≠ W(A(F(e))) then this term penalises the difference in embedding between W(A(S(e))) and W(A(F(e))). By the end of the recognition process, the symbols W(A(S(e))) and W(A(F(e))) will either have been forced together by this term and merged, or the edge will have been removed.

All these terms are weighted by inclusion functions, i(σ), j(σ), i(n), j(h), i(e), to reflect the different degrees to which σ, etc., are believed to be present.
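To make the shape of the Match sum concrete, here is a deliberately simplified evaluation in Python. The embeddings are pure 2-D translations (so the group product is vector addition and a⁻¹ · b is b − a), each fleximap energy E is replaced by a squared deviation from a preferred offset, and the image score ρ is a constant stand-in. All of these simplifications, and every name below, are illustrative assumptions of mine, not the paper's actual definitions of ρ and E.

```python
# Simplified Match evaluation (section 7.3), with translations in place of
# affine embeddings and squared-error stand-ins for the fleximap energies E.

def sub(a, b):                     # b^-1 . a in the translation group
    return (a[0] - b[0], a[1] - b[1])

def quad(mean):                    # stand-in energy around a preferred offset
    return lambda d: (d[0] - mean[0]) ** 2 + (d[1] - mean[1]) ** 2

def match(rho, net, u, v, i, j, theta, B):
    Sigma, N, H, E, W, P, A, F, S = net
    total = sum(i[s] * rho(s, u[s]) - j[s] * theta for s in Sigma)
    total -= sum(i[n] * v['con'][n](sub(u[n], u[W[n]])) for n in N)
    total -= sum(j[h] * B[h] for h in H)
    total -= sum(i[e] * (v['rel'][e](sub(u[A[F[e]]], u[A[S[e]]]))
                         + v['in'][W[A[F[e]]]](sub(u[W[A[F[e]]]], u[W[A[S[e]]]])))
                 for e in E)
    return total

# One whole s0 with parts s1, s2 at nodes n1, n2, joined by edge e1.
net = (['s0', 's1', 's2'], ['n1', 'n2'], ['h1', 'h2'], ['e1'],
       {'n1': 's0', 'n2': 's0'}, {'n1': 's1', 'n2': 's2'},
       {'h1': 'n1', 'h2': 'n2'}, {'e1': 'h1'}, {'e1': 'h2'})
v = {'con': {'n1': quad((1, 0)), 'n2': quad((-1, 0))},
     'rel': {'e1': quad((2, 0))},
     'in':  {'s0': quad((0, 0)), 's1': quad((0, 0)), 's2': quad((0, 0))}}
i = dict.fromkeys(['s0', 's1', 's2', 'n1', 'n2', 'e1'], 1.0)
j = {'s0': 1.0, 's1': 0.0, 's2': 0.0, 'h1': 0.0, 'h2': 0.0}
B = {'h1': 0.0, 'h2': 0.0}
rho = lambda s, pos: 1.0           # constant stand-in image score

good = {'s0': (0, 0), 's1': (1, 0), 's2': (-1, 0), 'n1': (1, 0), 'n2': (-1, 0)}
bad = dict(good, s1=(1.5, 0), n1=(1.5, 0))
print(match(rho, net, good, v, i, j, 0.5, B))  # ideal offsets -> 2.5
print(match(rho, net, bad, v, i, j, 0.5, B))   # distorted part -> 2.0
```

The "good" embedding incurs only the top-level penalty j(s0)·θ; moving s1 (and its node n1) out of place is charged once by the part-whole term con(n1) and once by the neighbourhood term rel(e1), so its Match value is lower.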

LEMMA 1. For any image I, network 𝒩, embedding tokens u, u′ for 𝒩, and embedding type v for 𝒩,

\[
\mathrm{Match}(I, \mathcal{N}, u \cdot u', v \cdot u') = \mathrm{Match}(I, \mathcal{N}, u, v),
\]

provided u′ ∘ W ∘ A ∘ F = u′ ∘ W ∘ A ∘ S.


Proof. Let v = (con₁, rel₁, symm₁, tem₁, in₁); then v · u′ = (con₂, rel₂, symm₁, tem₂, in₂), where

\[
\begin{aligned}
\forall n \in N \quad & \mathrm{con}_2(n) = u'(W(n))^{-1} \cdot \mathrm{con}_1(n) \cdot u'(n) \\
\forall e \in E \quad & \mathrm{rel}_2(e) = u'(A(S(e)))^{-1} \cdot \mathrm{rel}_1(e) \cdot u'(A(F(e))) \\
\forall \sigma \in \Sigma \quad & \mathrm{tem}_2(\sigma) = \mathrm{tem}_1(\sigma) \circ u'(\sigma) \\
\forall \sigma \in \Sigma \quad & \mathrm{in}_2(\sigma) = u'(\sigma)^{-1} \cdot \mathrm{in}_1(\sigma) \cdot u'(\sigma)
\end{aligned}
\]

Now, first, for any σ ∈ Σ,

\[
\rho_{I,\mathrm{tem}_2(\sigma)}\bigl((u \cdot u')(\sigma)\bigr) = \rho_{I,\mathrm{tem}_1(\sigma) \circ u'(\sigma)}\bigl(u(\sigma) \cdot u'(\sigma)\bigr) = \rho_{I,\mathrm{tem}_1(\sigma)}(u(\sigma))
\]

using lemma 4.1.

Secondly, for any n ∈ N,

\[
\begin{aligned}
E_{\mathrm{con}_2(n)}\bigl((u \cdot u')(W(n))^{-1} \cdot (u \cdot u')(n)\bigr)
&= E_{u'(W(n))^{-1} \cdot \mathrm{con}_1(n) \cdot u'(n)}\bigl(u'(W(n))^{-1} \cdot u(W(n))^{-1} \cdot u(n) \cdot u'(n)\bigr) \\
&= E_{\mathrm{con}_1(n)}\bigl(u(W(n))^{-1} \cdot u(n)\bigr)
\end{aligned}
\]

by lemma 5.1.

Thirdly, for any e ∈ E, using the abbreviations n₁ = A(F(e)) and n₂ = A(S(e)),

\[
\begin{aligned}
E_{\mathrm{rel}_2(e)}\bigl((u \cdot u')(n_2)^{-1} \cdot (u \cdot u')(n_1)\bigr)
&= E_{u'(n_2)^{-1} \cdot \mathrm{rel}_1(e) \cdot u'(n_1)}\bigl(u'(n_2)^{-1} \cdot u(n_2)^{-1} \cdot u(n_1) \cdot u'(n_1)\bigr) \\
&= E_{\mathrm{rel}_1(e)}\bigl(u(n_2)^{-1} \cdot u(n_1)\bigr)
\end{aligned}
\]

and, using the abbreviations σ₁ = W(A(F(e))) and σ₂ = W(A(S(e))),

\[
\begin{aligned}
E_{\mathrm{in}_2(\sigma_1)}\bigl((u \cdot u')(\sigma_2)^{-1} \cdot (u \cdot u')(\sigma_1)\bigr)
&= E_{u'(\sigma_1)^{-1} \cdot \mathrm{in}_1(\sigma_1) \cdot u'(\sigma_1)}\bigl(u'(\sigma_2)^{-1} \cdot u(\sigma_2)^{-1} \cdot u(\sigma_1) \cdot u'(\sigma_1)\bigr) \\
&= E_{\mathrm{in}_1(\sigma_1)}\bigl(u(\sigma_2)^{-1} \cdot u(\sigma_1)\bigr)
\end{aligned}
\]

since u′(σ₁) = u′(σ₂).

Thus each term of Match(I, 𝒩, u · u′, v · u′) equals the corresponding term of Match(I, 𝒩, u, v).

7.4 The recognition problem

Given a coherent network 𝒩₀ (representing a grammar), an embedding type v for 𝒩₀, and an image I, the recognition problem is to find a coherent and definite network 𝒩₁ (representing a pattern), with inclusion function i satisfying ∀x ∈ Σ₁ ∪ N₁ ∪ E₁, i(x) = 1, a homomorphism p: 𝒩₁ → 𝒩₀ (representing a parse of the pattern according to the grammar), and an embedding token u for 𝒩₁, that maximises Match(I, 𝒩₁, u, v ∘ p), subject to the symmetry condition for 𝒩₁, u, v ∘ p.


7.5 Symmetries

A symmetry of a network 𝒩 = (Σ, N, H, E, W, P, A, F, S), with respect to the embedding type v = (con, rel, symm, tem, in), is an automorphism a: 𝒩 → 𝒩 together with an embedding token s for 𝒩, such that

\[
\forall \sigma \in \Sigma \quad s(\sigma) \in \mathrm{symm}(\sigma), \qquad
\forall n \in N \quad s(n) \in \mathrm{symm}(P(n)), \qquad
v \cdot s = v \circ a.
\]

LEMMA 2. (Global application of a symmetry.) Given a symmetry a, s of the grammar 𝒩₀ with respect to the embedding type v, a homomorphism p: 𝒩₁ → 𝒩₀, and an embedding token u for 𝒩₁, we can produce a new parse p′ = a ∘ p: 𝒩₁ → 𝒩₀ and embedding token u′ = u · (s ∘ p). Provided the grammar is coherent, this gives

\[
\mathrm{Match}(I, \mathcal{N}_1, u', v \circ p') = \mathrm{Match}(I, \mathcal{N}_1, u, v \circ p),
\]

and the symmetry condition holds for 𝒩₁, u′, v ∘ p′ iff it holds for 𝒩₁, u, v ∘ p.

Proof. Let 𝒩₀ = (Σ₀, N₀, H₀, E₀, W₀, P₀, A₀, F₀, S₀) and 𝒩₁ = (Σ₁, N₁, H₁, E₁, W₁, P₁, A₁, F₁, S₁). Applying the definition of a symmetry, followed by lemma 6.1 and lemma 1, gives

\[
\begin{aligned}
\mathrm{Match}(I, \mathcal{N}_1, u', v \circ p')
&= \mathrm{Match}\bigl(I, \mathcal{N}_1,\, u \cdot (s \circ p),\, v \circ a \circ p\bigr) \\
&= \mathrm{Match}\bigl(I, \mathcal{N}_1,\, u \cdot (s \circ p),\, (v \cdot s) \circ p\bigr) \\
&= \mathrm{Match}\bigl(I, \mathcal{N}_1,\, u \cdot (s \circ p),\, (v \circ p) \cdot (s \circ p)\bigr) \\
&= \mathrm{Match}(I, \mathcal{N}_1, u, v \circ p).
\end{aligned}
\]

Note that the application of lemma 1 is justified since we have s ∘ p ∘ W₁ ∘ A₁ ∘ F₁ = s ∘ W₀ ∘ A₀ ∘ F₀ ∘ p = s ∘ W₀ ∘ A₀ ∘ S₀ ∘ p = s ∘ p ∘ W₁ ∘ A₁ ∘ S₁, since 𝒩₀ is coherent.

The symmetry condition is checked as follows. Let v = (con, rel, symm, tem, in). For any n ∈ N₁, we have

\[
u'(P_1(n))^{-1} \cdot u'(n) = s(p(P_1(n)))^{-1} \cdot u(P_1(n))^{-1} \cdot u(n) \cdot s(p(n)).
\]

Note that, since a, s is a symmetry of 𝒩₀, we have s(p(P₁(n))) ∈ symm(p(P₁(n))) and s(p(n)) ∈ symm(P₀(p(n))) = symm(p(P₁(n))), and symm = symm ∘ a. Now, the symmetry condition for 𝒩₁, u, v ∘ p requires

\[
u(P_1(n))^{-1} \cdot u(n) \in \mathrm{symm}(p(P_1(n))),
\]

whereas the symmetry condition for 𝒩₁, u′, v ∘ p′ requires

\[
u'(P_1(n))^{-1} \cdot u'(n) \in \mathrm{symm}(a(p(P_1(n)))).
\]

From the above information we see that these two conditions are equivalent.


7.6 The derivative of the match function

Given a network 𝒩 = (Σ, N, H, E, W, P, A, F, S), an image I, an embedding token u for 𝒩, and an embedding type v = (con, rel, symm, tem, in) for 𝒩, we can consider the derivative of the Match function with respect to the embeddings of the symbols: that is, we can consider the change in Match(I, 𝒩, u, v) produced by a small change in u.

Let us introduce the following abbreviations:

\[
\begin{aligned}
\forall n \in N \quad & S_n = u(P(n))^{-1} \cdot u(n) \\
\forall n \in N \quad & C_n = u(W(n))^{-1} \cdot u(n) \\
\forall e \in E \quad & R_e = u(A(S(e)))^{-1} \cdot u(A(F(e))) \\
\forall e \in E \quad & R'_e = u(W(A(S(e))))^{-1} \cdot u(W(A(F(e))))
\end{aligned}
\]

This gives

\[
\begin{aligned}
\mathrm{Match}(I, \mathcal{N}, u, v) = {} & \sum_{\sigma \in \Sigma} \bigl( i(\sigma)\,\rho_{I,\mathrm{tem}(\sigma)}(u(\sigma)) - j(\sigma)\,\theta \bigr) - \sum_{n \in N} i(n)\, E_{\mathrm{con}(n)}(C_n) \\
& - \sum_{h \in H} j(h)\, B(h) - \sum_{e \in E} i(e) \bigl( E_{\mathrm{rel}(e)}(R_e) + E_{\mathrm{in}(W(A(F(e))))}(R'_e) \bigr)
\end{aligned}
\]

and the symmetry condition is ∀n ∈ N, Sₙ ∈ symm(P(n)).

Now, consider the effect of making a small change in the embedding token from u to u · ∆u, where

\[
\forall \sigma \in \Sigma \quad \Delta u(\sigma) = \exp(\varepsilon V_\sigma), \qquad
\forall n \in N \quad \Delta u(n) = \exp(\varepsilon V_n),
\]

where ε is a real number in a neighbourhood of 0 and V_σ, V_n ∈ 𝒜. The movement V_σ of each symbol σ may be chosen arbitrarily, but, in order to preserve the symmetry condition, the movement V_n of each node n must then be V_n = Ad(S_n⁻¹)(V_{P(n)}), thus leaving S_n invariant.
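For matrix Lie groups, Ad(g)(V) = gVg⁻¹, and the two manipulations the proof below relies on can be checked numerically: an exponential can be pushed past a fixed group element C at the cost of an Ad, and a product of exponentials of O(ε) elements collapses into a single exponential up to o(ε) (the role played by lemma 3.10). The following self-contained sketch does this for 3×3 affine generators; all matrices are invented for the test, and the truncated Taylor series for exp is an implementation convenience, not the paper's construction.

```python
# Numerical check, for 3x3 affine matrices, of two identities used in sec. 7.6:
#   exp(-eps*V) . C = C . exp(-eps*Ad(C^-1)(V)),  Ad(g)(V) = g V g^-1,
#   exp(A) . exp(B) = exp(A + B + o(eps))  for A, B of order eps.

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def expm(X, terms=40):
    """Matrix exponential by truncated Taylor series (fine for small norms)."""
    n = len(X)
    out = [[float(r == c) for c in range(n)] for r in range(n)]
    term = [row[:] for row in out]
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, X))
        out = add(out, term)
    return out

def maxdiff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

eps = 1e-3
# Invented Lie-algebra elements (last row zero: generators of the affine group).
Vw = [[0, -1, 0.5], [1, 0, 0.2], [0, 0, 0]]      # rotation + translation
Vn = [[0.3, 0, 0.1], [0, 0.3, -0.2], [0, 0, 0]]  # scaling + translation
Z  = [[0, 0.2, 1.0], [-0.2, 0, 0.5], [0, 0, 0]]
C, Cinv = expm(Z), expm(scale(-1, Z))            # C^-1 = exp(-Z)

AdCinvVw = matmul(Cinv, matmul(Vw, C))           # Ad(C^-1)(Vw)

lhs = matmul(expm(scale(-eps, Vw)), matmul(C, expm(scale(eps, Vn))))
exact = matmul(C, matmul(expm(scale(-eps, AdCinvVw)), expm(scale(eps, Vn))))
first_order = matmul(C, expm(add(scale(-eps, AdCinvVw), scale(eps, Vn))))

print(maxdiff(lhs, exact))        # rounding error only: an exact identity
print(maxdiff(lhs, first_order))  # of order eps^2: the o(eps) discrepancy
```

Halving ε should roughly quarter the second discrepancy, confirming that the error in collapsing the exponentials is second order, which is all the derivative computation needs.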

THEOREM 3. For ∆u defined as above,

\[
\mathrm{Match}(I, \mathcal{N}, u \cdot \Delta u, v) = \mathrm{Match}(I, \mathcal{N}, u, v) + \varepsilon \sum_{\sigma \in \Sigma} F_\sigma(V_\sigma) + o(\varepsilon)
\]

where, for each σ ∈ Σ, n ∈ N and e ∈ E,

\[
\begin{aligned}
F_\sigma &= F^0_\sigma + F^1_\sigma + F^2_\sigma + F^3_\sigma \in \mathcal{F} \\
F^0_\sigma &= i(\sigma)\, \rho_{I,\mathrm{tem}(\sigma)}'(u(\sigma)) \\
F^1_\sigma &= -\sum_{n \in W^{-1}(\{\sigma\})} \mathrm{Ad}(C_n^{-1})^\dagger(F_n) + \sum_{n \in P^{-1}(\{\sigma\})} \mathrm{Ad}(S_n^{-1})^\dagger(F_n) \\
F^2_\sigma &= \sum_{n \in P^{-1}(\{\sigma\})} \mathrm{Ad}(S_n^{-1})^\dagger \Bigl( -\sum_{e \in S^{-1}(A^{-1}(\{n\}))} \mathrm{Ad}(R_e^{-1})^\dagger(F_e) + \sum_{e \in F^{-1}(A^{-1}(\{n\}))} F_e \Bigr) \\
F^3_\sigma &= -\sum_{e \in S^{-1}(A^{-1}(W^{-1}(\{\sigma\})))} \mathrm{Ad}(R_e'^{-1})^\dagger(F'_e) + \sum_{e \in F^{-1}(A^{-1}(W^{-1}(\{\sigma\})))} F'_e \\
F_n &= -i(n)\, E_{\mathrm{con}(n)}'(C_n) \\
F_e &= -i(e)\, E_{\mathrm{rel}(e)}'(R_e) \\
F'_e &= -i(e)\, E_{\mathrm{in}(W(A(F(e))))}'(R'_e)
\end{aligned}
\]


Proof. The effect of the change u ↦ u · ∆u on C_n is

\[
\begin{aligned}
C_n &\mapsto \exp(-\varepsilon V_{W(n)}) \cdot C_n \cdot \exp(\varepsilon V_n) \\
&= C_n \cdot \exp\bigl(-\varepsilon\, \mathrm{Ad}(C_n^{-1})(V_{W(n)})\bigr) \cdot \exp(\varepsilon V_n) \\
&= C_n \cdot \exp\bigl(-\varepsilon\, \mathrm{Ad}(C_n^{-1})(V_{W(n)}) + \varepsilon V_n + o(\varepsilon)\bigr)
\end{aligned}
\]

using lemma 3.10. The effect of the change u ↦ u · ∆u on R_e is

\[
\begin{aligned}
R_e &\mapsto \exp(-\varepsilon V_{A(S(e))}) \cdot R_e \cdot \exp(\varepsilon V_{A(F(e))}) \\
&= R_e \cdot \exp\bigl(-\varepsilon\, \mathrm{Ad}(R_e^{-1})(V_{A(S(e))})\bigr) \cdot \exp(\varepsilon V_{A(F(e))}) \\
&= R_e \cdot \exp(\varepsilon V_e + o(\varepsilon))
\end{aligned}
\]

where V_e = −Ad(R_e⁻¹)(V_{A(S(e))}) + V_{A(F(e))}, using lemma 3.10 again. The effect of the change u ↦ u · ∆u on R′_e is

\[
\begin{aligned}
R'_e &\mapsto \exp(-\varepsilon V_{W(A(S(e)))}) \cdot R'_e \cdot \exp(\varepsilon V_{W(A(F(e)))}) \\
&= R'_e \cdot \exp\bigl(-\varepsilon\, \mathrm{Ad}(R_e'^{-1})(V_{W(A(S(e)))})\bigr) \cdot \exp(\varepsilon V_{W(A(F(e)))}) \\
&= R'_e \cdot \exp(\varepsilon V'_e + o(\varepsilon))
\end{aligned}
\]

where V′_e = −Ad(R′_e⁻¹)(V_{W(A(S(e)))}) + V_{W(A(F(e)))}, by lemma 3.10. We are interested in calculating the consequent change in the value of the match function. Using lemma 3.11,

\[
\begin{aligned}
\mathrm{Match}(I, \mathcal{N}, u \cdot \Delta u, v) = {} & \mathrm{Match}(I, \mathcal{N}, u, v) + \varepsilon \sum_{\sigma \in \Sigma} i(\sigma)\, \rho_{I,\mathrm{tem}(\sigma)}'(u(\sigma))(V_\sigma) \\
& - \varepsilon \sum_{n \in N} i(n)\, E_{\mathrm{con}(n)}'(C_n)\bigl(-\mathrm{Ad}(C_n^{-1})(V_{W(n)}) + V_n\bigr) \\
& - \varepsilon \sum_{e \in E} i(e) \bigl( E_{\mathrm{rel}(e)}'(R_e)(V_e) + E_{\mathrm{in}(W(A(F(e))))}'(R'_e)(V'_e) \bigr) + o(\varepsilon). \qquad (*)
\end{aligned}
\]

Using the above definition of F_n, the sum over nodes in equation (∗) can be rewritten:

\[
\begin{aligned}
-\varepsilon \sum_{n \in N} i(n)\, E_{\mathrm{con}(n)}'(C_n)\bigl(-\mathrm{Ad}(C_n^{-1})(V_{W(n)}) + V_n\bigr)
&= -\varepsilon \sum_{n \in N} F_n\bigl(\mathrm{Ad}(C_n^{-1})(V_{W(n)})\bigr) + \varepsilon \sum_{n \in N} F_n(V_n) \\
&= -\varepsilon \sum_{n \in N} F_n\bigl(\mathrm{Ad}(C_n^{-1})(V_{W(n)})\bigr) + \varepsilon \sum_{n \in N} F_n\bigl(\mathrm{Ad}(S_n^{-1})(V_{P(n)})\bigr) \\
&= -\varepsilon \sum_{n \in N} \mathrm{Ad}(C_n^{-1})^\dagger(F_n)(V_{W(n)}) + \varepsilon \sum_{n \in N} \mathrm{Ad}(S_n^{-1})^\dagger(F_n)(V_{P(n)}) \\
&= \varepsilon \sum_{\sigma \in \Sigma} F^1_\sigma(V_\sigma).
\end{aligned}
\]

Similarly, using the above definition of F_e, we can rewrite part of the sum over edges in equation (∗) as

\[
\begin{aligned}
-\varepsilon \sum_{e \in E} i(e)\, E_{\mathrm{rel}(e)}'(R_e)(V_e)
&= \varepsilon \sum_{e \in E} F_e\bigl(-\mathrm{Ad}(R_e^{-1})(V_{A(S(e))}) + V_{A(F(e))}\bigr) \\
&= -\varepsilon \sum_{e \in E} F_e\bigl(\mathrm{Ad}(R_e^{-1})(V_{A(S(e))})\bigr) + \varepsilon \sum_{e \in E} F_e(V_{A(F(e))}) \\
&= -\varepsilon \sum_{e \in E} F_e\Bigl(\mathrm{Ad}(R_e^{-1})\bigl(\mathrm{Ad}(S_{A(S(e))}^{-1})(V_{P(A(S(e)))})\bigr)\Bigr) + \varepsilon \sum_{e \in E} F_e\bigl(\mathrm{Ad}(S_{A(F(e))}^{-1})(V_{P(A(F(e)))})\bigr) \\
&= -\varepsilon \sum_{e \in E} \mathrm{Ad}(S_{A(S(e))}^{-1})^\dagger\bigl(\mathrm{Ad}(R_e^{-1})^\dagger(F_e)\bigr)(V_{P(A(S(e)))}) + \varepsilon \sum_{e \in E} \mathrm{Ad}(S_{A(F(e))}^{-1})^\dagger(F_e)(V_{P(A(F(e)))}) \\
&= \varepsilon \sum_{\sigma \in \Sigma} F^2_\sigma(V_\sigma).
\end{aligned}
\]

Using the above definition of F′_e, we can rewrite the other part of the sum over edges in equation (∗):

\[
\begin{aligned}
-\varepsilon \sum_{e \in E} i(e)\, E_{\mathrm{in}(W(A(F(e))))}'(R'_e)(V'_e)
&= \varepsilon \sum_{e \in E} F'_e\bigl(-\mathrm{Ad}(R_e'^{-1})(V_{W(A(S(e)))}) + V_{W(A(F(e)))}\bigr) \\
&= -\varepsilon \sum_{e \in E} F'_e\bigl(\mathrm{Ad}(R_e'^{-1})(V_{W(A(S(e)))})\bigr) + \varepsilon \sum_{e \in E} F'_e(V_{W(A(F(e)))}) \\
&= -\varepsilon \sum_{e \in E} \mathrm{Ad}(R_e'^{-1})^\dagger(F'_e)(V_{W(A(S(e)))}) + \varepsilon \sum_{e \in E} F'_e(V_{W(A(F(e)))}) \\
&= \varepsilon \sum_{\sigma \in \Sigma} F^3_\sigma(V_\sigma).
\end{aligned}
\]

Hence equation (∗) finally reduces to

\[
\mathrm{Match}(I, \mathcal{N}, u \cdot \Delta u, v) = \mathrm{Match}(I, \mathcal{N}, u, v) + \varepsilon \sum_{\sigma \in \Sigma} F_\sigma(V_\sigma) + o(\varepsilon)
\]

as required.

Thus F_σ ∈ ℱ measures the effect on the value of the Match function, to first order in ε, of the change in the embedding of σ. F_σ is called the force on σ, and consists of four components: F⁰_σ, the force exerted by σ's template; F¹_σ, the force exerted by the part-whole fleximaps in which σ participates; F²_σ, the force exerted by the neighbourhood fleximaps in which σ participates; and F³_σ, the force that pulls two symbols W(A(F(e))) and W(A(S(e))) together, whenever any edge e has W(A(F(e))) ≠ W(A(S(e))).

7.7 Gradient ascent on the match function

The aim of gradient ascent is to make a small change to the embedding token u of

the pattern so as to increase Match(I,N1, u, v Æ p) as much as possible (while pre­

serving the symmetry condition). Let N1 = (Σ1, N1, H1, E1, W1, P1, A1, F1, S1) and v =(con, rel, symm, tem, in). As shown in theorem 3, the effect of changing the embeddings

of the symbol tokens by 8σ2Σ1 u(σ) 7! u(σ) � exp(εVσ)

(with a corresponding change in the embeddings of the nodes) is to produce a change in

Match(I,N1, u, v Æ p) of εP

σ2Σ1Fσ(Vσ), to first order in ε.

The cost of making this change is defined as 12

εP

σ2Σ1mσgσ(Vσ)(Vσ), where mσ is

the mass of σ (see x4.5) and (I, gσ) = in(p(σ)), i.e., gσ is the inertial metric tensor for σprovided by v Æ p. Hence we wish to choose the Vσ's to maximise

εXσ2Σ1

Fσ(Vσ)� 12

εXσ2Σ1

mσgσ(Vσ)(Vσ)= 12

εXσ2Σ1

1mσ

Fσ(g�1σ (Fσ))� 1

2εXσ2Σ1

1mσ

(mσgσ(Vσ)� Fσ)(mσVσ � g�1σ (Fσ))

This is maximised by taking Vσ = 1mσ

g�1σ (Fσ), for each σ.45
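The optimality of V_σ = (1/m_σ) g_σ⁻¹(F_σ) can be checked directly in a two-dimensional toy case. In the sketch below, g is an invented symmetric positive-definite metric, F an invented force covector and m an invented mass; the gain F(V) − ½m g(V)(V) (with the common factor ε dropped) is evaluated at the optimal V and compared against random perturbations.

```python
import random

# Toy verification (2-D, invented numbers) that V = g^-1(F)/m maximises
# F(V) - (1/2) m g(V)(V), the first-order gain minus the cost of section 7.7.

def solve2(g, F):
    """Solve g V = F for a symmetric 2x2 metric g, i.e. V = g^-1(F)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return ((g[1][1] * F[0] - g[0][1] * F[1]) / det,
            (g[0][0] * F[1] - g[1][0] * F[0]) / det)

def gain(V, F, g, m):
    quad = g[0][0] * V[0] ** 2 + 2 * g[0][1] * V[0] * V[1] + g[1][1] * V[1] ** 2
    return F[0] * V[0] + F[1] * V[1] - 0.5 * m * quad

g = [[2.0, 0.5], [0.5, 1.0]]    # invented symmetric positive-definite metric
F = (1.0, -0.5)                 # invented force covector
m = 3.0                         # invented mass

gi = solve2(g, F)
V_star = (gi[0] / m, gi[1] / m)

random.seed(0)
best = gain(V_star, F, g, m)
assert all(gain((V_star[0] + random.uniform(-1, 1),
                 V_star[1] + random.uniform(-1, 1)), F, g, m) <= best
           for _ in range(1000))
print(best)
```

Because the gain is strictly concave whenever g is positive definite and m > 0, the completing-the-square identity in the text guarantees that no perturbation can beat V_star, which is what the random search confirms.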


7.8 The grand conclusion

Given a grammar 𝒩₀, an embedding type v for 𝒩₀, and an image I, the recognition problem is to construct a pattern 𝒩₁, a parse homomorphism p: 𝒩₁ → 𝒩₀, and an embedding token u for 𝒩₁. This paper has solved one part of the problem: it has shown that u can be optimised incrementally by applying the gradient ascent rule

\[
\forall \sigma \in \Sigma_1 \quad u(\sigma) \mapsto u(\sigma) \cdot \exp(\varepsilon V_\sigma), \qquad
\forall n \in N_1 \quad u(n) \mapsto u(n) \cdot \exp\bigl(\varepsilon\, \mathrm{Ad}(S_n^{-1})(V_{P(n)})\bigr),
\]

where

\[
V_\sigma = \frac{1}{m_\sigma}\, g_\sigma^{-1}(F_\sigma)
\]

and F_σ is the force on σ, defined in theorem 3.
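As a final illustration, here is a toy run of this update rule for a single symbol whose embedding is a pure 2-D translation, so the matrix exponential is exact (a translation generator X satisfies X·X = 0, hence exp(X) = I + X). The match score, mass and metric are stand-ins of mine: a quadratic pull toward an invented target position, with m_σ = 1 and g_σ the identity metric.

```python
# Toy gradient-ascent loop (section 7.8) for one symbol with a translation-only
# embedding.  The "force" is the gradient of the stand-in score
# -0.5 * ||t - target||^2, and V = g^-1(F)/m with g = I and m = 1.

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def exp_translation(vx, vy):
    # exp of a pure translation generator is exact: X @ X = 0, so exp(X) = I + X.
    return [[1, 0, vx], [0, 1, vy], [0, 0, 1]]

u = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # initial embedding: the identity
target = (2.0, -1.0)                     # invented target position
eps = 0.2

for _ in range(100):
    tx, ty = u[0][2], u[1][2]            # current translation part of u
    F = (target[0] - tx, target[1] - ty) # force on the symbol
    u = matmul(u, exp_translation(eps * F[0], eps * F[1]))

print(round(u[0][2], 6), round(u[1][2], 6))  # converges to 2.0 -1.0
```

Each step shrinks the error by a factor (1 − ε), so after 100 iterations the embedding has converged to the target to well within rounding; with full affine embeddings the same loop would apply exp of a general Lie-algebra element on the right, as in the rule above.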

