Mathematical Theory of Recursive Symbol Systems
Peter Fletcher
Email: [email protected]
Technical Report GEN0401
Department of Computer Science
School of Computing and Mathematics
Keele University
Keele, Staffs, ST5 5BG
U.K.
Tel: +44 1782 583260
Fax: +44 1782 713082
November 2004
Abstract
This paper introduces a new type of graph grammar for representing recursively structured two-dimensional geometric patterns, for the purposes of syntactic pattern recognition. The grammars are noise-tolerant, in the sense that they can accommodate patterns that contain pixel noise, variability in the geometric relations between symbols, missing parts, or ambiguous parts, and that overlap other patterns. Geometric variability is modelled using fleximaps, drawing on concepts from the theory of Lie algebras and tensor calculus.
The formalism introduced here is intended to be generalisable to all problem domains in which complex, many-layered structures occur in the presence of noise.
Keywords: graph grammars, context-free grammars, recursive symbol processing, parsing, syntactic pattern recognition, noise tolerance, affine invariance.
Table of Contents
1. Introduction
2. Algebraic and analytic preliminaries
3. Affine transformations
4. Matching a template to an image
5. Fleximaps
6. Networks and homomorphisms
7. The recognition problem
References
1. Introduction
1.1 Aims
This work is an exercise in syntactic pattern recognition. The aim is to represent complex
geometric patterns as symbol systems, in a robust way that can cope with missing or
deformed symbols, variability in the geometric relations between parts of a symbol, and
other types of noise. More generally, the long-term aim of my research is to combine
the expressive power of recursive symbol systems (such as formal grammars) with the
noise-tolerance of massively parallel, distributed computational systems (such as neural
networks).
A symbol system is a structure consisting of finitely many elements of various types,
called symbols; a symbol may contain other symbols, and may stand in certain relations
to neighbouring symbols. The permitted relations of containment and neighbourhood
are specified by a grammar, which is also a symbol system. The grammar is finite,
but by means of recursion it can describe infinitely many symbol systems. The most
familiar examples of grammars are those used to describe programming languages and
natural languages; these are called string grammars, because they describe sequences
of symbols. Graph grammars are more powerful than string grammars, as they can
represent more complex, nonsequential configurations of symbols.
The importance of recursive symbol systems was emphasised in the so-called 'classical' tradition of artificial intelligence and cognitive science; it became the subject of controversy between the 'classical' tradition and connectionism in the 1980s and 1990s
(Smolensky, 1988; Fodor & Pylyshyn, 1988; Barnden & Pollack, 1991; Dinsmore, 1992;
Horgan & Tienson, 1996; Garfield, 1997; Sougné, 1998; Hadley, 1999). There is no
doubt that recursive symbol systems have an expressive power and computational flexibility that is unmatched by any other type of representation. Their big weakness is their 'brittleness': their inability to cope with noise, vagueness, incomplete or contradictory information, ambiguity, context-dependency, and so on. The aim of my work is to overcome this brittleness.
In this paper I shall set up a mathematical theory of noisy symbol systems and
of how they may be used to represent complex, recursively structured two-dimensional
geometric patterns. (My choice of geometric patterns as a problem domain is simply for
the sake of convenience; the theory is intended to be of wider application.)
The grammars I shall use are essentially a kind of graph grammar. In my previous work (Fletcher, 2001) I showed how regular graph grammars describing two-dimensional geometric patterns could be parsed and learned from positive examples. The present paper goes beyond this in the following ways.
• The grammatical framework is extended from regular to context-free graph grammars.
• The full geometry of the plane is used, whereas previously the plane was represented by a discrete square grid.
• Noise is permitted, in the form of pixel noise, variability in the geometric relations between parts, incomplete patterns, and overlapping patterns.
1.2 Terminology
We must distinguish between symbol types and symbol tokens. The Roman alphabet
consists of 26 letters, in upper and lower case, which gives 52 symbol types. A particular
occurrence of a symbol type at a particular position, orientation and size is a symbol
token. Thus a text document may contain thousands of symbol tokens, each of which
is an instance of one of the 52 symbol types (ignoring punctuation marks and other non-letter symbols).
A grammar is a system of symbol types. A pattern is the system of all symbol
tokens present in a given image. A grammar specifies a (usually infinite) set of possible
patterns.
The shape of a symbol type is described by a template, which depicts how a token
of that type would look, occurring at a standard position, orientation and size, without
noise. Each symbol type has a template.
The position, orientation, size, and degree of stretching and shearing of a symbol
token in an image are specified by an affine transformation, known as the embedding of
the token. It is a mapping from the template of the symbol token's type into the image
plane.
1.3 The contents of this paper

§2 provides necessary algebraic and analytic preliminaries from the theory of real finite-dimensional vector spaces and linear transformations. All the material here is standard, but to ensure that this paper is self-contained and rigorous I have rederived all results in precisely the form in which I shall need them in the later sections.

§3 sets up the basic theory of affine transformations, which are used to describe the embedding of symbol tokens in the image plane and the relations of symbol tokens to one another. I use coordinate-free concepts from the theory of Lie algebras and tensor calculus but also give coordinate-based calculation techniques for use in a computational implementation. (See Price (1977), Hausner & Schwartz (1968) and Boothby (1986) for more details on these mathematical concepts.)

§4 deals with templates, which describe the concrete appearance of symbols in the image plane. The goodness of match of a template with an image is measured by a correlation function; I calculate the derivative of this function so that it can be maximised.

§5 defines the concept of a fleximap, which is used to represent a flexible affine transformation, i.e., one that can be deformed from its nominal value along certain degrees of freedom.

§6 defines the central concept of a network, which is used to represent symbol systems (both grammars and patterns). The parsing of a pattern under a grammar is represented by a homomorphism between networks (rather than by a parse tree, as in conventional grammatical formalisms).
§7 sets up the problem of recognising the pattern in a given image. I state the recognition problem formally with the help of a Match function and show how the embedding of the pattern can be optimised by gradient ascent on the Match function.
The next step (to be dealt with in a future paper) is to introduce a parsing algorithm to
find the best pattern and homomorphism. The problem of learning the grammar from
a given set of example images is left for future work.
In this paper lemmas and theorems will be numbered consecutively 1, 2, 3, ... within each section, and will be cited outside the section by adding the section number: e.g., lemma 2.3 is lemma 3 of §2.
2. Algebraic and Analytic Preliminaries
2.1 The norm of a vector
Let $V$ be a real finite-dimensional vector space. Choose a basis $\{e_1, \dots, e_n\}$, and define a norm on $V$ by
$$\Bigl|\sum_{i=1}^n v^i e_i\Bigr| = \sqrt{\sum_{i=1}^n (v^i)^2}.$$
This norm is said to be induced by the basis.
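Concretely, the induced norm is just the Euclidean norm of the coordinate vector $(v^1, \dots, v^n)$. A minimal numerical sketch (plain Python; the function name `induced_norm` is mine, not the paper's):

```python
import math

def induced_norm(coords):
    # |sum_i v^i e_i| = sqrt(sum_i (v^i)^2), computed from the coordinates
    # of the vector in the chosen basis.
    return math.sqrt(sum(v * v for v in coords))

u = [3.0, -4.0]
v = [1.0, 2.0]
# Triangle inequality, lemma 1(i):
assert induced_norm([a + b for a, b in zip(u, v)]) <= induced_norm(u) + induced_norm(v)
```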
LEMMA 1. The norm satisfies the usual axioms:
(i) $\forall u, v \in V \quad |u + v| \le |u| + |v|$
(ii) $\forall k \in \mathbb{R}\ \forall v \in V \quad |kv| = |k|\,|v|$
(iii) $v \ne 0 \Rightarrow |v| > 0$
Proof. (i) For any $u = \sum_{i=1}^n u^i e_i$ and any $v = \sum_{i=1}^n v^i e_i$, by the Cauchy–Schwarz inequality we have
$$|u + v|^2 = \sum_{i=1}^n (u^i + v^i)^2 = \sum_{i=1}^n (u^i)^2 + \sum_{i=1}^n (v^i)^2 + 2\sum_{i=1}^n u^i v^i \le \sum_{i=1}^n (u^i)^2 + \sum_{i=1}^n (v^i)^2 + 2\sqrt{\sum_{i=1}^n (u^i)^2}\,\sqrt{\sum_{i=1}^n (v^i)^2} = |u|^2 + |v|^2 + 2|u|\,|v| = (|u| + |v|)^2$$
so $|u + v| \le |u| + |v|$.
(ii) For any $k \in \mathbb{R}$ and any $v = \sum_{i=1}^n v^i e_i$,
$$|kv|^2 = \sum_{i=1}^n (k v^i)^2 = k^2 \sum_{i=1}^n (v^i)^2 = k^2 |v|^2$$
so $|kv| = |k|\,|v|$.
(iii) For any $v = \sum_{i=1}^n v^i e_i$, if $v \ne 0$ then we must have at least one $v^i \ne 0$, so $|v|^2 = \sum_{i=1}^n (v^i)^2 > 0$.
LEMMA 2. If a norm $|\cdot|$ is induced by a basis $\{e_1, \dots, e_n\}$, and a second norm $|\cdot|'$ is induced by another basis $\{f_1, \dots, f_n\}$, then the two norms are related by
$$\exists \alpha \in \mathbb{R}\ \forall v \in V \quad |v| \le \alpha |v|', \qquad \exists \beta \in \mathbb{R}\ \forall v \in V \quad |v|' \le \beta |v|.$$
Proof. Since $\{e_1, \dots, e_n\}$ is a basis, every vector $f_j$ can be expressed as a linear combination $f_j = \sum_{i=1}^n M^i_j e_i$, for some real numbers $M^i_j$.
Any vector $v \in V$ can be expressed in terms of the basis $\{f_1, \dots, f_n\}$ as $\sum_{j=1}^n b^j f_j$. Then
$$v = \sum_{j=1}^n b^j f_j = \sum_{j=1}^n \sum_{i=1}^n b^j M^i_j e_i = \sum_{i=1}^n \Bigl(\sum_{j=1}^n b^j M^i_j\Bigr) e_i$$
so, by the definition of the norms and the Cauchy–Schwarz inequality,
$$|v|^2 = \sum_{i=1}^n \Bigl(\sum_{j=1}^n b^j M^i_j\Bigr)^2 \le \sum_{i=1}^n \Bigl(\sum_{j=1}^n (b^j)^2\Bigr)\Bigl(\sum_{j=1}^n (M^i_j)^2\Bigr) = |v|'^2 \sum_{i,j=1}^n (M^i_j)^2$$
so taking $\alpha = \sqrt{\sum_{i,j=1}^n (M^i_j)^2}$ verifies the first half of the lemma. The other half is proved similarly.
2.2 Convergence
The norm induced by a basis induces a topology on $V$:
$$x_k \to l \text{ as } k \to \infty \iff \forall \varepsilon > 0\ \exists N\ \forall k > N \quad |x_k - l| < \varepsilon.$$
By lemma 2, the topology is independent of the choice of basis.
2.3 Vanishing and limited functions
Let V and W be real finitedimensional vector spaces, each with a norm induced by a
basis.� A function f : V ! W is vanishing iff 8ε > 0 9δ > 0 8v2V jvj < δ ) jf (v)j � εjvj.� A function f : V ! W is limited iff 9ε > 0 9δ > 0 8v2V jvj < δ ) jf (v)j � εjvj.By lemma 2, these concepts are independent of the choice of bases. In the lemmas that
follow, U, V and W are any real finitedimensional vector spaces, each with a norm
induced by a basis.
LEMMA 3. Any vanishing function is limited.
Proof. Immediate from the definitions.
LEMMA 4. Any linear function is limited and continuous.
Proof. Let $F: V \to W$ be linear. Let $\{e_1, \dots, e_m\}$ be the basis on $V$ and $\{f_1, \dots, f_n\}$ the basis on $W$ used to induce the norms. Let $A$ be the $n \times m$ matrix representing $F$, i.e., $F(e_i) = \sum_{j=1}^n A^j_i f_j$. Choose $\varepsilon = \sqrt{\sum_{i=1}^m \sum_{j=1}^n (A^j_i)^2}$. Then, for any $v = \sum_{i=1}^m v^i e_i$ in $V$, $F(v) = \sum_{j=1}^n \bigl(\sum_{i=1}^m A^j_i v^i\bigr) f_j$, and, by the Cauchy–Schwarz inequality,
$$|F(v)|^2 = \sum_{j=1}^n \Bigl(\sum_{i=1}^m A^j_i v^i\Bigr)^2 \le \sum_{j=1}^n \Bigl(\sum_{i=1}^m (A^j_i)^2\Bigr)\Bigl(\sum_{i=1}^m (v^i)^2\Bigr) = \varepsilon^2 |v|^2$$
so $|F(v)| \le \varepsilon |v|$. This verifies that $F$ is limited ($\delta$ may be chosen arbitrarily), and also implies that $F$ is continuous, since
$$\forall x, a \in V \quad |F(x) - F(a)| = |F(x - a)| \le \varepsilon |x - a|.$$
LEMMA 5. If $f: V \to W$ is limited then $f(0) = 0$ and $f$ is continuous at $0$.
Proof. There exist positive $\varepsilon_0, \delta_0$ such that
$$\forall v \in V \quad |v| < \delta_0 \Rightarrow |f(v)| \le \varepsilon_0 |v|.$$
Taking $v = 0$ gives $|f(0)| \le \varepsilon_0 |0| = 0$, so $f(0) = 0$ as required.
To show that $f$ is continuous at $0$, for any $\varepsilon > 0$, choose $\delta = \min(\delta_0, \varepsilon/\varepsilon_0)$; then
$$\forall v \quad |v| < \delta \Rightarrow |f(v)| \le \varepsilon_0 |v| < \varepsilon_0 \delta \le \varepsilon$$
as required.
LEMMA 6. If $f, g: V \to W$ are vanishing and $k \in \mathbb{R}$ then $f + g$ and $kf$ are also vanishing.
Proof. For any $\varepsilon > 0$, we know that there exist $\delta_1 > 0$ and $\delta_2 > 0$ such that
$$\forall v \quad |v| < \delta_1 \Rightarrow |f(v)| \le \tfrac{\varepsilon}{2} |v|, \qquad \forall v \quad |v| < \delta_2 \Rightarrow |g(v)| \le \tfrac{\varepsilon}{2} |v|.$$
Now, choose $\delta = \min(\delta_1, \delta_2)$. Then
$$\forall v \quad |v| < \delta \Rightarrow |(f + g)(v)| \le |f(v)| + |g(v)| \le \varepsilon |v|$$
so $f + g$ is vanishing.
Secondly, for any $\varepsilon > 0$, we know that there exists $\delta > 0$ such that
$$\forall v \quad |v| < \delta \Rightarrow |f(v)| \le \tfrac{\varepsilon}{|k| + 1} |v|.$$
Hence
$$\forall v \quad |v| < \delta \Rightarrow |(kf)(v)| = |k|\,|f(v)| \le \tfrac{\varepsilon |k|}{|k| + 1} |v| < \varepsilon |v|$$
so $kf$ is vanishing.
LEMMA 7. If $f, g: V \to W$ are limited and $k \in \mathbb{R}$ then $f + g$ and $kf$ are also limited.
Proof. We know that there exist positive $\varepsilon_1, \delta_1, \varepsilon_2, \delta_2$ such that
$$\forall v \quad |v| < \delta_1 \Rightarrow |f(v)| \le \varepsilon_1 |v|, \qquad \forall v \quad |v| < \delta_2 \Rightarrow |g(v)| \le \varepsilon_2 |v|.$$
Now, choose $\varepsilon = \varepsilon_1 + \varepsilon_2$ and $\delta = \min(\delta_1, \delta_2)$. Then
$$\forall v \quad |v| < \delta \Rightarrow |(f + g)(v)| \le |f(v)| + |g(v)| \le \varepsilon |v|$$
so $f + g$ is limited.
For the second part, choose $\varepsilon = (|k| + 1)\varepsilon_1 > 0$. Then
$$\forall v \quad |v| < \delta_1 \Rightarrow |(kf)(v)| = |k|\,|f(v)| \le \varepsilon |v|$$
so $kf$ is limited.
LEMMA 8. If $f: V \to W$ is vanishing and $g: U \to V$ is limited then $f \circ g$ is vanishing.
Proof. We know that there exist positive $\varepsilon_1, \delta_1$ such that
$$\forall u \in U \quad |u| < \delta_1 \Rightarrow |g(u)| \le \varepsilon_1 |u|.$$
Also, for any $\varepsilon > 0$, we know that there exists $\delta_2 > 0$ such that
$$\forall v \in V \quad |v| < \delta_2 \Rightarrow |f(v)| \le \tfrac{\varepsilon}{\varepsilon_1} |v|.$$
Choose $\delta = \min(\delta_1, \tfrac{\delta_2}{\varepsilon_1})$. Then
$$\forall u \quad |u| < \delta \Rightarrow |g(u)| \le \varepsilon_1 |u| < \varepsilon_1 \delta \le \delta_2 \Rightarrow |f(g(u))| \le \tfrac{\varepsilon}{\varepsilon_1} |g(u)| \le \varepsilon |u|$$
as required.
LEMMA 9. If $f: V \to W$ is limited and $g: U \to V$ is vanishing then $f \circ g$ is vanishing.
Proof. We know that there exist positive $\varepsilon_2, \delta_2$ such that
$$\forall v \in V \quad |v| < \delta_2 \Rightarrow |f(v)| \le \varepsilon_2 |v|.$$
Also, for any $\varepsilon > 0$, we know that there exists $\delta_1 > 0$ such that
$$\forall u \in U \quad |u| < \delta_1 \Rightarrow |g(u)| \le \min\bigl(\tfrac{\varepsilon}{\varepsilon_2}, \delta_2\bigr) |u|.$$
Choose $\delta = \min(\delta_1, 1)$. Then, for any $u$ with $|u| < \delta$, we have $|u| < \delta_1$, so $|g(u)| \le \min\bigl(\tfrac{\varepsilon}{\varepsilon_2}, \delta_2\bigr)|u|$, so $|g(u)| \le \delta_2 |u| < \delta_2$ (since $|u| < 1$). Hence $|f(g(u))| \le \varepsilon_2 |g(u)|$. But we also have $|g(u)| \le \tfrac{\varepsilon}{\varepsilon_2} |u|$, so $|f(g(u))| \le \varepsilon |u|$ as required.
2.4 Application of the theory to linear transformations
For any real finite-dimensional vector spaces $V$ and $W$, let $L(V, W)$ be the vector space of all linear transformations from $V$ to $W$. A basis $\{e_1, \dots, e_m\}$ for $V$ and a basis $\{f_1, \dots, f_n\}$ for $W$ induce a basis $\{h^i_j \mid 1 \le i \le m,\ 1 \le j \le n\}$ for $L(V, W)$, where
$$h^i_j(e_k) = \begin{cases} f_j & \text{if } i = k \\ 0 & \text{if } i \ne k \end{cases}$$
Then any $A \in L(V, W)$ can be represented in terms of this basis by
$$A = \sum_{i=1}^m \sum_{j=1}^n A^j_i h^i_j$$
giving
$$A(e_i) = \sum_{j=1}^n A^j_i f_j.$$
For any element $v = \sum_{i=1}^m v^i e_i$ of $V$ we have
$$A(v) = \sum_{j=1}^n \Bigl(\sum_{i=1}^m A^j_i v^i\Bigr) f_j.$$
Assume that $V$, $W$ and $L(V, W)$ have the norms induced by these bases. For any $A = \sum_{i=1}^m \sum_{j=1}^n A^j_i h^i_j$ we have
$$|A|^2 = \sum_{i=1}^m \sum_{j=1}^n (A^j_i)^2 = \sum_{i=1}^m |A(e_i)|^2.$$
In the lemmas that follow, let $U$, $V$, $W$ be any real finite-dimensional vector spaces with norms induced by bases.
LEMMA 10. If $v \in V$ and $A \in L(V, W)$ then $|Av| \le |A|\,|v|$.
Proof. Let $v = \sum_{i=1}^m v^i e_i$. Then
$$|A(v)|^2 = \Bigl|\sum_{i=1}^m v^i A(e_i)\Bigr|^2 \le \Bigl(\sum_{i=1}^m |v^i|\,|A(e_i)|\Bigr)^2 \le \sum_{i=1}^m (v^i)^2 \sum_{i=1}^m |A(e_i)|^2 = |v|^2 |A|^2$$
using the triangle inequality and the Cauchy–Schwarz inequality. Hence $|A(v)| \le |A|\,|v|$.
LEMMA 11. If $A \in L(V, W)$ and $B \in L(U, V)$ then $|AB| \le |A|\,|B|$.
Proof. Let $\{d_1, \dots, d_l\}$ be the basis on $U$ that induces the norm. By the previous lemma,
$$|AB|^2 = \sum_{k=1}^l |AB(d_k)|^2 \le \sum_{k=1}^l |A|^2 |B(d_k)|^2 = |A|^2 |B|^2$$
so $|AB| \le |A|\,|B|$.
LEMMA 12. If $(A_k)$ is a sequence in $L(V, W)$ and $L \in L(V, W)$ then
$$A_k \to L \text{ as } k \to \infty \iff \forall v \in V \quad A_k(v) \to L(v) \text{ as } k \to \infty.$$
(A special case of this is: $\sum_{k=1}^\infty V_k = L$ iff $\forall v \in V\ \sum_{k=1}^\infty V_k(v) = L(v)$.)
Proof. If $A_k \to L$ as $k \to \infty$ then, for any $v \in V$,
$$|A_k(v) - L(v)| = |(A_k - L)(v)| \le |A_k - L|\,|v|$$
so, by comparison, $A_k(v) \to L(v)$ as $k \to \infty$, as required.
Conversely, suppose $\forall v\ A_k(v) \to L(v)$ as $k \to \infty$. Let $\{e_1, \dots, e_m\}$ be the basis that induces the norm on $V$. Then
$$|A_k - L|^2 = \sum_{i=1}^m |(A_k - L)(e_i)|^2 \to 0 \text{ as } k \to \infty$$
as required.
For any normed vector space $S$ and any $\varepsilon > 0$, let $B_\varepsilon(S) = \{ v \in S \mid |v| < \varepsilon \}$.
LEMMA 13. If $(c_n)$ is a real sequence, and $\sum_{n=0}^\infty c_n l^n$ is absolutely convergent for some positive real $l$, then $\sum_{n=0}^\infty c_n A^n$ is absolutely convergent for all $A \in B_l(L(V, V))$, and if $f$ is a function satisfying
$$\forall A \in B_l(L(V, V)) \quad f(A) = \sum_{n=0}^\infty c_n A^n$$
then $f$ is continuous on $B_l(L(V, V))$.
Proof. Consider any $A \in B_l(L(V, V))$. Let $a = |A| < l$, $\theta = \frac{l + a}{2}$, and $\varepsilon = \frac{l - a}{2}$. For all $n$, we have $|c_n A^n| \le |c_n a^n| \le |c_n l^n|$, so $\sum_{n=0}^\infty c_n A^n$ is absolutely convergent by comparison with $\sum_{n=0}^\infty c_n l^n$.
Also, for any $X \in B_\varepsilon(L(V, V))$, we have
$$|(A + X)^n - A^n| = |R_n(A, X)| \le R_n(a, x)$$
where $x = |X|$, $R_n(A, X)$ is the sum of the $2^n - 1$ terms in the expansion of $(A + X)^n$ other than $A^n$, and $R_n(a, x)$ is the same sum of terms with $A$ and $X$ replaced by the real numbers $a$ and $x$.
Now,
$$0 \le R_n(a, x) = (a + x)^n - a^n = x\bigl((a + x)^{n-1} + (a + x)^{n-2} a + \dots + (a + x) a^{n-2} + a^{n-1}\bigr)$$
$$\le x (n \theta^{n-1}) \quad \text{since } a < \theta \text{ and } a + x < \theta$$
$$\le x \frac{(\theta + \varepsilon)^n}{\varepsilon} \quad \text{since } (\theta + \varepsilon)^n = \theta^n + n \theta^{n-1} \varepsilon + \dots + \varepsilon^n$$
$$= \frac{x}{\varepsilon} l^n.$$
Hence $\sum_{n=0}^\infty |c_n R_n(a, x)|$ exists, by comparison with $\sum_{n=0}^\infty |c_n l^n|$, with
$$\sum_{n=0}^\infty |c_n R_n(a, x)| \le \frac{x}{\varepsilon} \sum_{n=0}^\infty |c_n l^n| = M x$$
where $M = \frac{1}{\varepsilon} \sum_{n=0}^\infty |c_n l^n|$. Hence, by comparison, $\sum_{n=0}^\infty |c_n (A + X)^n - c_n A^n|$ also exists, with
$$\sum_{n=0}^\infty |c_n (A + X)^n - c_n A^n| \le \sum_{n=0}^\infty |c_n R_n(a, x)| \le M x.$$
Now, we have $A, A + X \in B_l(L(V, V))$, so $f(A) = \sum_{n=0}^\infty c_n A^n$ and $f(A + X) = \sum_{n=0}^\infty c_n (A + X)^n$, so
$$|f(A + X) - f(A)| = \Bigl|\sum_{n=0}^\infty c_n (A + X)^n - c_n A^n\Bigr| \le \sum_{n=0}^\infty |c_n (A + X)^n - c_n A^n| \le M x.$$
Hence $f$ is continuous at $A$, as required.
LEMMA 14. If $(c_n)$ is a real sequence, and $\sum_{n=2}^\infty c_n l^n$ is absolutely convergent for some positive real $l$, and $f: L(V, V) \to L(V, V)$ satisfies
$$\forall A \in B_l(L(V, V)) \quad f(A) = \sum_{n=2}^\infty c_n A^n,$$
then $f$ is vanishing.
Proof. For any $A$ in $B_l(L(V, V))$ and any $n \ge 2$, we have
$$|c_n A^n| \le |c_n|\,|A|^n \le \frac{|A|^2}{l^2} |c_n l^n|$$
so $\sum_{n=2}^\infty |c_n A^n|$ converges, by comparison with $\sum_{n=2}^\infty |c_n l^n|$, and
$$|f(A)| = \Bigl|\sum_{n=2}^\infty c_n A^n\Bigr| \le \sum_{n=2}^\infty |c_n A^n| \le \sum_{n=2}^\infty \frac{|A|^2}{l^2} |c_n l^n| = M |A|^2$$
where $M = \frac{1}{l^2} \sum_{n=2}^\infty |c_n l^n|$. Hence we can show that $f$ is vanishing: for any $\varepsilon > 0$, choose $\delta = \min(l, \frac{\varepsilon}{M + 1})$, giving
$$\forall A \quad |A| < \delta \Rightarrow |f(A)| \le M |A|^2 \le M \delta |A| \le \varepsilon |A|$$
as required.
LEMMA 15. If $f, g: V \to L(W, W)$ are limited functions, and $h: V \to L(W, W)$ is defined by $\forall v \in V\ h(v) = f(v) g(v)$, then $h$ is vanishing.
Proof. We know that there exist positive $\varepsilon_1, \delta_1, \varepsilon_2, \delta_2$ such that
$$\forall v \quad |v| < \delta_1 \Rightarrow |f(v)| \le \varepsilon_1 |v|, \qquad \forall v \quad |v| < \delta_2 \Rightarrow |g(v)| \le \varepsilon_2 |v|.$$
Now, for any $\varepsilon > 0$, choose $\delta = \min(\delta_1, \delta_2, \frac{\varepsilon}{\varepsilon_1 \varepsilon_2})$. Then
$$\forall v \quad |v| < \delta \Rightarrow |h(v)| \le |f(v)|\,|g(v)| \le \varepsilon_1 \varepsilon_2 |v|^2 \le \varepsilon_1 \varepsilon_2 \delta |v| \le \varepsilon |v|$$
so $h$ is vanishing.
LEMMA 16. In $L(V, V)$, if $x_n \to X$ and $y_n \to Y$ as $n \to \infty$ then $x_n y_n \to XY$ as $n \to \infty$.
Proof. For any $\varepsilon > 0$, set $\varepsilon' = \min\bigl\{1, \frac{\varepsilon}{2(|X| + |Y| + 1)}\bigr\}$; there exists $N_1$ such that
$$\forall n > N_1 \quad |x_n - X| < \varepsilon',$$
and there exists $N_2$ such that
$$\forall n > N_2 \quad |y_n - Y| < \varepsilon'.$$
Now, choose $N = \max\{N_1, N_2\}$. Then
$$\forall n > N \quad |x_n y_n - XY| = |x_n (y_n - Y) + (x_n - X) Y| \le |x_n|\,|y_n - Y| + |x_n - X|\,|Y| \le (|X| + \varepsilon')\varepsilon' + \varepsilon' |Y| \le \varepsilon'(|X| + |Y| + 1) \le \frac{\varepsilon}{2} < \varepsilon,$$
so $x_n y_n \to XY$ as $n \to \infty$, as required.
LEMMA 17. (Cauchy product) In $L(V, V)$, if $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ converge absolutely then $\sum_{n=0}^\infty c_n$ also converges absolutely, where $c_n = \sum_{i=0}^n a_i b_{n-i}$, and
$$\sum_{n=0}^\infty a_n \sum_{n=0}^\infty b_n = \sum_{n=0}^\infty c_n.$$
Proof. Let $A' = \sum_{n=0}^\infty |a_n|$, $B' = \sum_{n=0}^\infty |b_n|$, $A = \sum_{n=0}^\infty a_n$, $B = \sum_{n=0}^\infty b_n$. Assume $A' > 0$ and $B' > 0$, since otherwise the result is immediate. First I shall show that $\sum_{n=0}^\infty c_n$ converges absolutely. For any $n$, we have
$$\sum_{k=0}^n |c_k| = \sum_{k=0}^n \Bigl|\sum_{i=0}^k a_i b_{k-i}\Bigr| \le \sum_{k=0}^n \sum_{i=0}^k |a_i|\,|b_{k-i}| \le \sum_{i=0}^n \sum_{j=0}^n |a_i|\,|b_j| = \sum_{i=0}^n |a_i| \sum_{j=0}^n |b_j| \le A' B',$$
so $\sum_{k=0}^\infty |c_k|$ exists, as required.
Next I shall derive the formula for $\sum_{n=0}^\infty c_n$. For any $n$,
$$\Bigl|\sum_{k=0}^n c_k - \sum_{i=0}^n a_i \sum_{j=0}^n b_j\Bigr| = \Bigl|\sum_{(i,j) \in S} a_i b_j\Bigr| \le \sum_{(i,j) \in S} |a_i|\,|b_j|$$
where $S = \{ (i, j) \mid 0 \le i \le n,\ 0 \le j \le n,\ i + j > n \}$. Now, if $i, j \le \lfloor\frac{n}{2}\rfloor$ then $2i, 2j \le n$, so $i + j \le n$, so $(i, j) \notin S$. Thus
$$S \subseteq \bigl(\{0, \dots, n\} \times \{0, \dots, n\}\bigr) \setminus \bigl(\{0, \dots, \lfloor\tfrac{n}{2}\rfloor\} \times \{0, \dots, \lfloor\tfrac{n}{2}\rfloor\}\bigr)$$
and hence
$$\sum_{(i,j) \in S} |a_i|\,|b_j| \le \sum_{i,j=0}^n |a_i|\,|b_j| - \sum_{i,j=0}^{\lfloor n/2 \rfloor} |a_i|\,|b_j| = \sum_{i=0}^n |a_i| \sum_{j=0}^n |b_j| - \sum_{i=0}^{\lfloor n/2 \rfloor} |a_i| \sum_{j=0}^{\lfloor n/2 \rfloor} |b_j|.$$
For any $\varepsilon > 0$, set $\varepsilon' = \min\bigl\{A', B', \frac{\varepsilon}{A' + B' + 1}\bigr\} > 0$. We know that there exists $N_1$ such that
$$\forall n > N_1 \quad A' - \varepsilon' < \sum_{i=0}^n |a_i| \le A'$$
and there exists $N_2$ such that
$$\forall n > N_2 \quad B' - \varepsilon' < \sum_{j=0}^n |b_j| \le B'.$$
We also have $\bigl(\sum_{i=0}^n a_i\bigr)\bigl(\sum_{j=0}^n b_j\bigr) \to AB$ as $n \to \infty$, by lemma 16, so there exists $N_3$ such that
$$\forall n > N_3 \quad \Bigl|\sum_{i=0}^n a_i \sum_{j=0}^n b_j - AB\Bigr| < \varepsilon'.$$
Choose $N = \max(2N_1 + 1, 2N_2 + 1, N_3)$. Then, for any $n > N$, we have $\lfloor\frac{n}{2}\rfloor > N_1, N_2$, so
$$\Bigl|\sum_{k=0}^n c_k - AB\Bigr| \le \Bigl|\sum_{k=0}^n c_k - \sum_{i=0}^n a_i \sum_{j=0}^n b_j\Bigr| + \Bigl|\sum_{i=0}^n a_i \sum_{j=0}^n b_j - AB\Bigr| \le \sum_{i=0}^n |a_i| \sum_{j=0}^n |b_j| - \sum_{i=0}^{\lfloor n/2 \rfloor} |a_i| \sum_{j=0}^{\lfloor n/2 \rfloor} |b_j| + \varepsilon' < A' B' - (A' - \varepsilon')(B' - \varepsilon') + \varepsilon' = \varepsilon'(A' + B' - \varepsilon' + 1) < \varepsilon'(A' + B' + 1) \le \varepsilon.$$
Thus $\sum_{k=0}^\infty c_k = AB$, as required.
2.5 Application of the theory to $\mathbb{R}^n$ and matrices
$\mathbb{R}^n$ is a finite-dimensional vector space. If we choose a basis consisting of unit vectors then this induces the norm
$$\left|\begin{pmatrix} x^1 \\ \vdots \\ x^n \end{pmatrix}\right| = \sqrt{\sum_{i=1}^n (x^i)^2}.$$
Therefore all the above theory for finite-dimensional vector spaces applies to $\mathbb{R}^n$ with this basis and norm.
Let $M_{nm}$ be the vector space of all $n \times m$ matrices. $M_{nm}$ is isomorphic to $L(\mathbb{R}^m, \mathbb{R}^n)$ in the obvious way: a matrix $A$ with components $A^j_i$ corresponds to the linear transformation $\sum_{i=1}^m \sum_{j=1}^n A^j_i h^i_j$. Hence we can transfer across from $L(\mathbb{R}^m, \mathbb{R}^n)$ to $M_{nm}$ the basis and the induced norm, giving
$$|A| = \sqrt{\sum_{i=1}^m \sum_{j=1}^n (A^j_i)^2}.$$
Thanks to the isomorphism, lemmas 10–17 continue to hold when the $L(V, W)$ spaces are replaced by $M_{nm}$ spaces. In the following sections I shall apply these lemmas both to $L(V, W)$ spaces and to $M_{nm}$ spaces.
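As a numerical sanity check on lemmas 10 and 11 in this matrix setting, the following sketch (plain Python; the helper names are mine) verifies $|Av| \le |A|\,|v|$ and $|AB| \le |A|\,|B|$ for random matrices under the induced (Frobenius-type) norm:

```python
import math
import random

def frob(M):
    # Induced norm on M_nm: square root of the sum of squared entries.
    return math.sqrt(sum(x * x for row in M for x in row))

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
B = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
v = [random.uniform(-1, 1) for _ in range(3)]
assert frob([matvec(A, v)]) <= frob(A) * frob([v]) + 1e-12   # lemma 10
assert frob(matmul(A, B)) <= frob(A) * frob(B) + 1e-12       # lemma 11
```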
2.6 Notation
In the following sections I shall sometimes use the notation o(x) to mean a term of the
form R(x), for some vanishing function R, and the notation O(x) to mean a term of the
form R(x), for some limited function R.
3. Affine Transformations
3.1 Affine transformations and affine matrices
Any two-dimensional affine transformation $G: \mathbb{R}^2 \to \mathbb{R}^2$ may be represented by a matrix
$$\overline{G} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 1 \end{pmatrix}$$
such that, if we represent a point $(x_1, x_2) \in \mathbb{R}^2$ by a column vector
$$\overline{(x_1, x_2)} = \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix},$$
then application of an affine transformation to a point is represented by matrix multiplication: $\overline{G(x)} = \overline{G}\,\overline{x}$. The homogeneous part of the transformation is represented by $h_{11}, h_{12}, h_{21}, h_{22}$, and the translation part is represented by $t_1, t_2$. I shall refer to such a matrix $\overline{G}$ as an affine matrix. I shall write the combination of two affine transformations as $G_1 \circ G_2$, but shall represent matrix multiplication by juxtaposition; thus, $\overline{G_1 \circ G_2} = \overline{G_1}\,\overline{G_2}$.
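The representation can be sketched as follows (NumPy; the function names are mine, not the paper's). The point is that composition of affine transformations becomes ordinary matrix multiplication of their affine matrices:

```python
import numpy as np

def affine_matrix(h11, h12, h21, h22, t1, t2):
    # 3x3 affine matrix representing the map x -> Hx + t.
    return np.array([[h11, h12, t1],
                     [h21, h22, t2],
                     [0.0, 0.0, 1.0]])

def apply_affine(G, x):
    # Apply the transformation by multiplying the column vector (x1, x2, 1).
    p = G @ np.array([x[0], x[1], 1.0])
    return p[:2]

# Composing transformations = multiplying their affine matrices:
G1 = affine_matrix(2.0, 0.0, 0.0, 2.0, 1.0, 0.0)   # scale by 2, then translate
G2 = affine_matrix(0.0, -1.0, 1.0, 0.0, 0.0, 0.0)  # rotate by 90 degrees
x = np.array([1.0, 1.0])
assert np.allclose(apply_affine(G1 @ G2, x), apply_affine(G1, apply_affine(G2, x)))
```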
3.2 The Lie group and the Lie algebra
Let $\mathcal{G}$ be the Lie group of all nonsingular two-dimensional affine transformations and let $\mathcal{A}$ be the corresponding Lie algebra. Any member $A$ of $\mathcal{A}$ may be represented by a matrix $\overline{A}$ of the form
$$\overline{A} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 0 \end{pmatrix}$$
These matrices are called affine generator matrices. Let $M_{33}$ be the vector space of all $3 \times 3$ matrices, and let $\mathcal{A}'$ be the subspace consisting of all affine generator matrices. $\mathcal{A}'$ is closed under the commutator operation $[M, N] = MN - NM$ and so can be considered as a Lie algebra; we can use this to define the Lie bracket $[A, B]$ on $\mathcal{A}$,
$$\overline{[A, B]} = [\overline{A}, \overline{B}] = \overline{A}\,\overline{B} - \overline{B}\,\overline{A}.$$
A norm is defined on $M_{33}$ by
$$\forall M \in M_{33} \quad |M| = \sqrt{\sum_{a,b=1}^3 (M_{ab})^2},$$
and this induces a norm on $\mathcal{A}'$ and a norm on $\mathcal{A}$:
$$\forall A \in \mathcal{A} \quad |A| = |\overline{A}| = \sqrt{\sum_{a=1}^2 \sum_{b=1}^3 (A_{ab})^2}.$$
Thus $\mathcal{A}$ and $\mathcal{A}'$ are isomorphic as Lie algebras and normed vector spaces.
There is an exponential function $\exp: M_{33} \to M_{33}$, defined by
$$\forall M \in M_{33} \quad \exp M = \sum_{n=0}^\infty \frac{M^n}{n!}.$$
This series converges absolutely for all $M$. Note that, for any affine generator matrix $M$, $\exp M$ is an affine matrix. The inverse function $\log$ is defined on a subset of $M_{33}$ including all affine matrices such that
$$h_{11} h_{22} - h_{12} h_{21} > 0 \quad \text{and} \quad h_{11} + h_{22} > -2\sqrt{h_{11} h_{22} - h_{12} h_{21}}.$$
$\log$ maps all such affine matrices to affine generator matrices. For all $M \in M_{33}$ with $|M| < 1$, $\log(I + M)$ exists and is given by the absolutely convergent power series
$$\log(I + M) = \sum_{n=1}^\infty \frac{(-1)^{n+1} M^n}{n},$$
where $I$ is the identity matrix.
There is also an exponential function $\exp: \mathcal{A} \to \mathcal{G}$, with an inverse function $\log$ defined on a subset of $\mathcal{G}$, related to the $\exp$ and $\log$ functions on $M_{33}$ by
$$\overline{\exp A} = \exp \overline{A}, \qquad \overline{\log A} = \log \overline{A}.$$
To calculate $\exp$ and $\log$ efficiently without using power series, we first consider the problem of exponentiating $2 \times 2$ matrices.
3.3 Exponential of a $2 \times 2$ matrix
Let $X$ be any $2 \times 2$ matrix. Let $X_0 = X - \frac{1}{2}\operatorname{tr}(X) I$, the traceless part of $X$.
Case 1: $\det(X_0) > 0$. Then $\exp X = e^{\operatorname{tr}(X)/2}\bigl(\frac{\sin\theta}{\theta} X_0 + \cos\theta \cdot I\bigr)$, where $\theta = \sqrt{\det(X_0)}$.
Case 2: $\det(X_0) = 0$. Then $\exp X = e^{\operatorname{tr}(X)/2}(X_0 + I)$.
Case 3: $\det(X_0) < 0$. Then $\exp X = e^{\operatorname{tr}(X)/2}\bigl(\frac{\sinh\theta}{\theta} X_0 + \cosh\theta \cdot I\bigr)$, where $\theta = \sqrt{-\det(X_0)}$.
Note that $\exp$ is injective on the set of matrices $X$ with $\det(X - \frac{1}{2}\operatorname{tr}(X) I) < \pi^2$.
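The three cases can be implemented directly and checked against the defining power series (NumPy sketch; function names are mine):

```python
import numpy as np

def exp2x2(X):
    # Closed-form exponential of a 2x2 matrix via its traceless part X0.
    t = np.trace(X)
    X0 = X - 0.5 * t * np.eye(2)
    d = np.linalg.det(X0)
    if d > 0:                              # case 1
        th = np.sqrt(d)
        M = (np.sin(th) / th) * X0 + np.cos(th) * np.eye(2)
    elif d == 0:                           # case 2
        M = X0 + np.eye(2)
    else:                                  # case 3
        th = np.sqrt(-d)
        M = (np.sinh(th) / th) * X0 + np.cosh(th) * np.eye(2)
    return np.exp(t / 2.0) * M

def exp_series(X, terms=40):
    # Reference exponential: truncated sum of X^n / n!.
    S, P = np.eye(2), np.eye(2)
    for n in range(1, terms):
        P = P @ X / n
        S = S + P
    return S

X = np.array([[0.3, -1.1], [0.8, 0.2]])
assert np.allclose(exp2x2(X), exp_series(X))
```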
3.4 Logarithm of a $2 \times 2$ matrix
Let $Y$ be any $2 \times 2$ matrix satisfying $\det(Y) > 0$ and $\operatorname{tr}(Y) > -2\sqrt{\det(Y)}$. Let $Y_1 = Y / \sqrt{\det(Y)}$.
Case 1: $-2 < \operatorname{tr}(Y_1) < 2$. Then $\log Y = \frac{\theta}{\sin\theta}\bigl(Y_1 - \frac{1}{2}\operatorname{tr}(Y_1) I\bigr) + \frac{1}{2}\ln(\det(Y)) I$, where $\theta = \cos^{-1}\bigl(\frac{1}{2}\operatorname{tr}(Y_1)\bigr) \in (0, \pi)$.
Case 2: $\operatorname{tr}(Y_1) = 2$. Then $\log Y = Y_1 - \frac{1}{2}\operatorname{tr}(Y_1) I + \frac{1}{2}\ln(\det(Y)) I$.
Case 3: $2 < \operatorname{tr}(Y_1)$. Then $\log Y = \frac{\theta}{\sinh\theta}\bigl(Y_1 - \frac{1}{2}\operatorname{tr}(Y_1) I\bigr) + \frac{1}{2}\ln(\det(Y)) I$, where $\theta = \cosh^{-1}\bigl(\frac{1}{2}\operatorname{tr}(Y_1)\bigr)$.
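A corresponding sketch of the logarithm (NumPy; names are mine), checked by the round trip $\log(\exp X) = X$ on a small matrix, with $\exp$ computed by a truncated power series:

```python
import numpy as np

def log2x2(Y):
    # Cases of section 3.4; requires det(Y) > 0 and tr(Y) > -2*sqrt(det(Y)).
    d = np.linalg.det(Y)
    Y1 = Y / np.sqrt(d)
    t1 = np.trace(Y1)
    I = np.eye(2)
    base = 0.5 * np.log(d) * I
    if -2 < t1 < 2:                        # case 1
        th = np.arccos(0.5 * t1)
        return (th / np.sin(th)) * (Y1 - 0.5 * t1 * I) + base
    if t1 == 2:                            # case 2
        return Y1 - 0.5 * t1 * I + base
    th = np.arccosh(0.5 * t1)              # case 3: tr(Y1) > 2
    return (th / np.sinh(th)) * (Y1 - 0.5 * t1 * I) + base

def exp_series(X, terms=40):
    # Reference exponential: truncated sum of X^n / n!.
    S, P = np.eye(2), np.eye(2)
    for n in range(1, terms):
        P = P @ X / n
        S = S + P
    return S

X = np.array([[0.2, -0.5], [0.7, 0.1]])
assert np.allclose(log2x2(exp_series(X)), X)
```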
3.5 Exponential of an affine generator matrix
Let $M$ be any affine generator matrix. Thus $M$ is of the form
$$M = \begin{pmatrix} X & x \\ 0 & 0 \end{pmatrix}$$
where $X$ is $2 \times 2$ and $x$ is $2 \times 1$. Then
$$\exp M = \begin{pmatrix} \exp X & y \\ 0 & 1 \end{pmatrix}$$
where $\exp X$ is as given above and $y$ is given as follows.
Case 1: $\det(X) \ne 0$. Then $y = (\exp X - I) X^{-1} x$.
Case 2: $\det(X) = 0$ and $\operatorname{tr}(X) \ne 0$. Then $y = \Bigl(I + \frac{e^{\operatorname{tr}(X)} - 1 - \operatorname{tr}(X)}{\operatorname{tr}(X)^2} X\Bigr) x$.
Case 3: $\det(X) = 0$ and $\operatorname{tr}(X) = 0$. Then $y = (I + \frac{1}{2} X) x$.
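A sketch of this construction (NumPy; names are mine), with the $2 \times 2$ block exponential computed here by a truncated series for brevity, checked against the series exponential of the full $3 \times 3$ generator:

```python
import numpy as np

def sum_series(A, terms=40):
    # Reference exponential: truncated sum of A^n / n! (any square size).
    S, P = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        P = P @ A / n
        S = S + P
    return S

def exp_affine_generator(M):
    # exp of an affine generator matrix [[X, x], [0, 0]] (section 3.5 cases).
    X, x = M[:2, :2], M[:2, 2]
    eX = sum_series(X)
    d, t = np.linalg.det(X), np.trace(X)
    I = np.eye(2)
    if d != 0:                                   # case 1
        y = (eX - I) @ np.linalg.inv(X) @ x
    elif t != 0:                                 # case 2
        y = (I + (np.exp(t) - 1 - t) / t**2 * X) @ x
    else:                                        # case 3
        y = (I + 0.5 * X) @ x
    out = np.eye(3)
    out[:2, :2], out[:2, 2] = eX, y
    return out

M = np.array([[0.2, -0.9, 0.5],
              [0.6, 0.1, -0.3],
              [0.0, 0.0, 0.0]])
assert np.allclose(exp_affine_generator(M), sum_series(M))
```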
3.6 Logarithm of an affine matrix
Let $M$ be any affine matrix. Thus $M$ is of the form
$$M = \begin{pmatrix} Y & y \\ 0 & 1 \end{pmatrix}$$
where $Y$ is $2 \times 2$ and $y$ is $2 \times 1$. Then $\log M$ exists iff $\log Y$ exists, in which case
$$\log M = \begin{pmatrix} X & x \\ 0 & 0 \end{pmatrix}$$
where $X = \log Y$ and $x$ is given as follows.
Case 1: $\det(X) \ne 0$. Then $x = X (Y - I)^{-1} y$.
Case 2: $\det(X) = 0$ and $\operatorname{tr}(X) \ne 0$. Then $x = \Bigl(I + \bigl(\frac{1}{e^{\operatorname{tr}(X)} - 1} - \frac{1}{\operatorname{tr}(X)}\bigr) X\Bigr) y$.
Case 3: $\det(X) = 0$ and $\operatorname{tr}(X) = 0$. Then $x = (I - \frac{1}{2} X) y$.
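The logarithm cases can be sketched and round-trip tested in the same way (NumPy; names are mine): exponentiate a known affine generator matrix by a truncated series, then recover it:

```python
import numpy as np

def log2x2(Y):
    # Cases of section 3.4; requires det(Y) > 0 and tr(Y) > -2*sqrt(det(Y)).
    d = np.linalg.det(Y)
    Y1 = Y / np.sqrt(d)
    t1 = np.trace(Y1)
    I = np.eye(2)
    base = 0.5 * np.log(d) * I
    if -2 < t1 < 2:
        th = np.arccos(0.5 * t1)
        return (th / np.sin(th)) * (Y1 - 0.5 * t1 * I) + base
    if t1 == 2:
        return Y1 - 0.5 * t1 * I + base
    th = np.arccosh(0.5 * t1)
    return (th / np.sinh(th)) * (Y1 - 0.5 * t1 * I) + base

def log_affine(M):
    # log of an affine matrix [[Y, y], [0, 1]] (section 3.6 cases).
    Y, y = M[:2, :2], M[:2, 2]
    X = log2x2(Y)
    d, t = np.linalg.det(X), np.trace(X)
    I = np.eye(2)
    if d != 0:                                   # case 1
        x = X @ np.linalg.inv(Y - I) @ y
    elif t != 0:                                 # case 2
        x = (I + (1.0 / (np.exp(t) - 1) - 1.0 / t) * X) @ y
    else:                                        # case 3
        x = (I - 0.5 * X) @ y
    out = np.zeros((3, 3))
    out[:2, :2], out[:2, 2] = X, x
    return out

def exp_series(M, terms=40):
    S, P = np.eye(3), np.eye(3)
    for n in range(1, terms):
        P = P @ M / n
        S = S + P
    return S

A = np.array([[0.3, -0.8, 0.4],
              [0.5, -0.1, 0.2],
              [0.0, 0.0, 0.0]])
assert np.allclose(log_affine(exp_series(A)), A)
```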
LEMMA 1. $\log(\exp A) = A$ for all $A \in \mathcal{A}$ with $|A| < \sqrt{2}\,\pi$.
Proof. For any $A \in \mathcal{A}$ with $|A| < \sqrt{2}\,\pi$, we have
$$\overline{A} = \begin{pmatrix} X & x \\ 0 & 0 \end{pmatrix}, \quad \text{with} \quad X = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix},$$
and hence
$$\det\bigl(X - \tfrac{1}{2}\operatorname{tr}(X) I\bigr) = -\tfrac{1}{4}(h_{11} - h_{22})^2 - h_{12} h_{21} \le -h_{12} h_{21} \le \tfrac{1}{2}(h_{12}^2 + h_{21}^2) \le \tfrac{1}{2}|A|^2 < \pi^2.$$
Hence $\log(\exp X) = X$, by the $\log$ construction for $2 \times 2$ matrices, and $\log(\exp A) = A$ by the $\log$ construction for affine matrices.
3.7 The dual vector space, $\mathcal{F}$
Let $\mathcal{F}$ be the dual vector space to $\mathcal{A}$: that is, $\mathcal{F}$ consists of all linear transformations from $\mathcal{A}$ to $\mathbb{R}$.
Given any basis $\{a_1, \dots, a_6\}$ for $\mathcal{A}$, we can define a dual basis $\{f^1, \dots, f^6\}$ for $\mathcal{F}$ by
$$f^i(a_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$
Any vectors $A \in \mathcal{A}$ and $F \in \mathcal{F}$ can be expressed in terms of the bases as $A = \sum_{i=1}^6 A^i a_i$ and $F = \sum_{i=1}^6 F_i f^i$, for unique sequences of real numbers $A^1, \dots, A^6$ and $F_1, \dots, F_6$. Then we have $F_i = F(a_i)$, $A^i = f^i(A)$, and $F(A) = \sum_{i=1}^6 F_i A^i$.
For example, one possible basis $\{a_1, \dots, a_6\}$ for $\mathcal{A}$ is defined by
$$a_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad a_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad a_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
$$a_4 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad a_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad a_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$
Then any element $A \in \mathcal{A}$, where
$$\overline{A} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 0 \end{pmatrix},$$
may be expressed uniquely in terms of this basis as $A = \sum_{i=1}^6 A^i a_i$, where
$$A^1 = h_{11}, \quad A^2 = h_{12}, \quad A^3 = t_1, \quad A^4 = h_{21}, \quad A^5 = h_{22}, \quad A^6 = t_2.$$
The corresponding basis for $\mathcal{F}$ is $\{f^1, \dots, f^6\}$, where
$$f^1(A) = h_{11}, \quad f^2(A) = h_{12}, \quad f^3(A) = t_1, \quad f^4(A) = h_{21}, \quad f^5(A) = h_{22}, \quad f^6(A) = t_2,$$
and any $F \in \mathcal{F}$ operates on any $A \in \mathcal{A}$ by the rule
$$F(A) = \sum_{i=1}^6 F_i A^i = F_1 h_{11} + F_2 h_{12} + F_3 t_1 + F_4 h_{21} + F_5 h_{22} + F_6 t_2.$$
Using this basis, $F$ can be represented by a matrix
$$\overline{F} = \begin{pmatrix} F_1 & F_4 & a \\ F_2 & F_5 & b \\ F_3 & F_6 & c \end{pmatrix}$$
where $a, b, c$ are arbitrary; when I say that $\overline{F}$ represents $F$, what I mean is that $F$ and $\overline{F}$ are related by the formula $\forall A \in \mathcal{A}\ F(A) = \operatorname{tr}(\overline{F}\,\overline{A})$.
The norm induced on $\mathcal{A}$ by $\{a_1, \dots, a_6\}$ (see §2.1) is the one we have already defined:
$$|A|^2 = \Bigl|\sum_{i=1}^6 A^i a_i\Bigr|^2 = \sum_{i=1}^6 (A^i)^2 = \sum_{i=1}^6 f^i(A)^2.$$
Similarly, a norm is induced on $\mathcal{F}$ by $\{f^1, \dots, f^6\}$:
$$|F|^2 = \Bigl|\sum_{i=1}^6 F_i f^i\Bigr|^2 = \sum_{i=1}^6 (F_i)^2 = \sum_{i=1}^6 F(a_i)^2.$$
Given any linear transformation $T: \mathcal{A} \to \mathcal{A}$, the adjoint linear transformation $T^\dagger: \mathcal{F} \to \mathcal{F}$ is defined by
$$\forall F \in \mathcal{F}\ \forall A \in \mathcal{A} \quad T^\dagger(F)(A) = F(T(A)).$$
The basis $\{a_1, \dots, a_6\}$ induces a basis on $L(\mathcal{A}, \mathcal{A})$, the vector space of linear transformations from $\mathcal{A}$ to $\mathcal{A}$, as described in §2.4. This basis induces a norm on $L(\mathcal{A}, \mathcal{A})$:
$$|T|^2 = \sum_{i,j=1}^6 \bigl(f^j(T(a_i))\bigr)^2 = \sum_{i=1}^6 |T(a_i)|^2.$$
Likewise the basis $\{f^1, \dots, f^6\}$ induces a basis on $L(\mathcal{F}, \mathcal{F})$, which induces a norm on $L(\mathcal{F}, \mathcal{F})$:
$$|S|^2 = \sum_{i,j=1}^6 \bigl(S(f^i)(a_j)\bigr)^2 = \sum_{i=1}^6 |S(f^i)|^2.$$
LEMMA 2. For any $T \in L(\mathcal{A}, \mathcal{A})$, $|T^\dagger| = |T|$.
Proof. $|T^\dagger|^2 = \sum_{i,j=1}^6 \bigl(T^\dagger(f^i)(a_j)\bigr)^2 = \sum_{i,j=1}^6 \bigl(f^i(T(a_j))\bigr)^2 = |T|^2$.
3.8 Inner automorphisms of the Lie algebra
Define $\operatorname{Ad}: \mathcal{G} \to (\mathcal{A} \to \mathcal{A})$, in terms of matrix representations, by
$$\forall G \in \mathcal{G}\ \forall A \in \mathcal{A} \quad \overline{\operatorname{Ad}(G)(A)} = \overline{G}\,\overline{A}\,\overline{G}^{-1}.$$
Note that, for any $G \in \mathcal{G}$, $\operatorname{Ad}(G)$ is a linear transformation on $\mathcal{A}$, and that $\operatorname{Ad}(G_1) \circ \operatorname{Ad}(G_2) = \operatorname{Ad}(G_1 G_2)$.
LEMMA 3. For any $G \in \mathcal{G}$ and $A \in \mathcal{A}$, $\exp(\operatorname{Ad}(G)(A)) = G \circ \exp(A) \circ G^{-1}$.
Proof. In terms of the matrix representations,
$$\overline{\exp(\operatorname{Ad}(G)(A))} = \exp \overline{\operatorname{Ad}(G)(A)} = \exp(\overline{G}\,\overline{A}\,\overline{G}^{-1}) = \sum_{n=0}^\infty \frac{1}{n!} (\overline{G}\,\overline{A}\,\overline{G}^{-1})^n = \overline{G} \Bigl(\sum_{n=0}^\infty \frac{1}{n!} \overline{A}^n\Bigr) \overline{G}^{-1} = \overline{G} \exp(\overline{A})\, \overline{G}^{-1} = \overline{G}\,\overline{\exp(A)}\,\overline{G}^{-1} = \overline{G \circ \exp(A) \circ G^{-1}}.$$
LEMMA 4. For any $G \in \mathcal{G}$ and $F \in \mathcal{F}$, if $\mathbf{F}$ is a matrix representing $F$ then the matrix $\mathbf{G}^{-1}\mathbf{F}\,\mathbf{G}$ represents $\mathrm{Ad}(G)^\dagger(F)$.

Proof. For any $A \in \mathcal{A}$, using the cyclic property of the trace,
$$\mathrm{Ad}(G)^\dagger(F)(A) = F(\mathrm{Ad}(G)(A)) = \operatorname{tr}(\mathbf{F}\,\mathbf{G}\,\mathbf{A}\,\mathbf{G}^{-1}) = \operatorname{tr}(\mathbf{G}^{-1}\mathbf{F}\,\mathbf{G}\,\mathbf{A}),$$
which shows that $\mathbf{G}^{-1}\mathbf{F}\,\mathbf{G}$ represents $\mathrm{Ad}(G)^\dagger(F)$.
Define a linear function $\mathrm{ad}: \mathcal{A} \to (\mathcal{A} \to \mathcal{A})$ by
$$\forall A, V \in \mathcal{A}\quad \mathrm{ad}(A)(V) = [A, V].$$
Note that, for any $A$, $\mathrm{ad}(A): \mathcal{A} \to \mathcal{A}$ is a linear transformation.
LEMMA 5. For any $A \in \mathcal{A}$, $|\mathrm{ad}(A)| \le \sqrt{5}\,|A|$.

Proof. Let $\{a_1, \ldots, a_6\}$ be the basis that induces the norm on $\mathcal{A}$. The representing matrix $\mathbf{A}$ of any $A \in \mathcal{A}$ can be written in the form
$$\mathbf{A} = \begin{pmatrix} h_{11} & h_{12} & t_1 \\ h_{21} & h_{22} & t_2 \\ 0 & 0 & 0 \end{pmatrix}.$$
Then we have
$$|\mathrm{ad}(A)|^2 = \sum_{i=1}^{6} |\mathrm{ad}(A)(a_i)|^2 = \sum_{i=1}^{6} |[A, a_i]|^2 = \sum_{i=1}^{6} \bigl|[\mathbf{A}, \mathbf{a}_i]\bigr|^2 = 2(h_{11} - h_{22})^2 + h_{11}^2 + h_{22}^2 + 5h_{12}^2 + 5h_{21}^2 + 2t_1^2 + 2t_2^2 \le 5h_{11}^2 + 5h_{22}^2 + 5h_{12}^2 + 5h_{21}^2 + 2t_1^2 + 2t_2^2 \le 5|\mathbf{A}|^2 = 5|A|^2,$$
so $|\mathrm{ad}(A)| \le \sqrt{5}\,|A|$.
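The bound can be spot-checked numerically. The following sketch (with an arbitrary illustrative $A$, not from the paper) computes $|\mathrm{ad}(A)|^2$ directly as the sum of $|[\mathbf{A}, \mathbf{a}_i]|^2$ over the six basis matrices and compares it with $5|A|^2$.

```python
# Sketch: check |ad(A)|^2 <= 5|A|^2 on an arbitrary element of the algebra.
def bracket(X, Y):  # matrix commutator [X, Y] = XY - YX for 3x3 matrices
    XY = [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    YX = [[sum(Y[i][k]*X[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return [[XY[i][j] - YX[i][j] for j in range(3)] for i in range(3)]

def frob2(X):  # squared Frobenius norm, which is the induced norm |.|^2
    return sum(X[i][j]**2 for i in range(3) for j in range(3))

def elem(i, j):  # elementary matrix E_ij
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(3)] for r in range(3)]

# a_1..a_6 correspond to the components h11, h12, t1, h21, h22, t2
basis_mats = [elem(0, 0), elem(0, 1), elem(0, 2), elem(1, 0), elem(1, 1), elem(1, 2)]
A = [[0.4, -1.1, 0.8], [2.0, 0.9, -0.3], [0.0, 0.0, 0.0]]
ad_norm2 = sum(frob2(bracket(A, a)) for a in basis_mats)
assert ad_norm2 <= 5.0 * frob2(A) + 1e-12
```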
LEMMA 6. For any $A \in \mathcal{A}$ and $F \in \mathcal{F}$, if $\mathbf{F}$ is a matrix representing $F$ then the matrix $[\mathbf{F}, \mathbf{A}]$ represents $\mathrm{ad}(A)^\dagger(F)$.

Proof. For any $V \in \mathcal{A}$,
$$\mathrm{ad}(A)^\dagger(F)(V) = F(\mathrm{ad}(A)(V)) = \operatorname{tr}(\mathbf{F}\,[\mathbf{A}, \mathbf{V}]) = \operatorname{tr}([\mathbf{F}, \mathbf{A}]\,\mathbf{V}),$$
which shows that $[\mathbf{F}, \mathbf{A}]$ represents $\mathrm{ad}(A)^\dagger(F)$.
3.9 Approximation lemmas for exp and log
LEMMA 7. For all $A \in \mathcal{A}_0$, $\exp A = I + A + R_E(A)$, where $R_E: \mathcal{A}_0 \to \mathcal{A}_0$ is vanishing.

Proof. Define $R: M_{33} \to M_{33}$ by $\forall A \in M_{33}\; R(A) = \exp A - I - A$. By the power series expansion for $\exp$, we have
$$\forall A \in M_{33}\quad R(A) = \sum_{n=2}^{\infty} \frac{A^n}{n!}.$$
The corresponding real power series $\sum_{n=2}^{\infty} \frac{t^n}{n!}$ is absolutely convergent for all $t \in \mathbb{R}$, so, by lemma 2.14 (transferred to $M_{33}$, as explained in §2.5), $R$ is vanishing. Note that $R(\mathcal{A}_0) \subseteq \mathcal{A}_0$, so we can define $R_E$ as the restriction of $R$ to $\mathcal{A}_0$.
LEMMA 8. For all $A \in \mathcal{A}_0$ with $|A| < 1$, $\log(I + A) = A + R_L(A)$, where $R_L: \mathcal{A}_0 \to \mathcal{A}_0$ is vanishing.

Proof. Let $N$ be the set of all $A \in M_{33}$ such that $|A| < 1$. Define $R: M_{33} \to M_{33}$ such that $\forall A \in N\; R(A) = \log(I + A) - A$ (the values of $R$ outside $N$ are arbitrary). By the power series expansion for $\log$, we have
$$\forall A \in N\quad R(A) = \sum_{n=2}^{\infty} \frac{(-1)^{n+1} A^n}{n}.$$
The corresponding real power series $\sum_{n=2}^{\infty} \frac{(-1)^{n+1} t^n}{n}$ is absolutely convergent for all $t \in (-1, 1)$, so, by lemma 2.14 (transferred to $M_{33}$), $R$ is vanishing. Note that $R(\mathcal{A}_0) \subseteq \mathcal{A}_0$, so we can define $R_L$ as the restriction of $R$ to $\mathcal{A}_0$.

LEMMA 9. If $S$ is any finite-dimensional vector space, $F: S \to \mathcal{A}_0$ is linear, and $R_1: S \to \mathcal{A}_0$ is vanishing, then there is a vanishing function $R_2: S \to \mathcal{A}_0$ such that, for all $V \in S$ in a neighbourhood of 0,
$$I + F(V) + R_1(V) = \exp(F(V) + R_2(V)).$$

Proof. $F$ is linear and hence limited; $R_1$ is vanishing and hence limited; so $F + R_1$ is limited and hence continuous at 0, with $(F + R_1)(0) = 0$ (lemmas 2.3, 2.4, 2.5, 2.7). Therefore there is a neighbourhood $N$ of 0 for which
$$\forall V \in N\quad |F(V) + R_1(V)| < 1.$$
For all $V \in N$, we have
$$\log(I + F(V) + R_1(V)) = F(V) + R_1(V) + R_L(F(V) + R_1(V))$$
by lemma 8. Define $R_2 = R_1 + R_L \circ (F + R_1)$, giving
$$\forall V \in N\quad I + F(V) + R_1(V) = \exp(F(V) + R_2(V)).$$
Now, $R_L$ is vanishing and $F + R_1$ is limited, so $R_L \circ (F + R_1)$ is vanishing, so $R_2$ is vanishing (lemmas 2.6, 2.8), as required.
LEMMA 10. For any $A, B \in \mathcal{A}$, there is a vanishing function $R: \mathbb{R} \to \mathcal{A}$ such that, for all real $\varepsilon$ in a neighbourhood of 0,
$$\exp(\varepsilon A) \circ \exp(\varepsilon B) = \exp(\varepsilon(A + B) + R(\varepsilon)).$$

Proof. By lemma 7,
$$\forall \varepsilon\quad \exp(\varepsilon \mathbf{A}) \exp(\varepsilon \mathbf{B}) = (I + \varepsilon \mathbf{A} + R_E(\varepsilon \mathbf{A}))(I + \varepsilon \mathbf{B} + R_E(\varepsilon \mathbf{B})) = I + \varepsilon(\mathbf{A} + \mathbf{B}) + R_1(\varepsilon)$$
where $R_1: \mathbb{R} \to \mathcal{A}_0$ is defined by
$$\forall \varepsilon\quad R_1(\varepsilon) = R_E(\varepsilon \mathbf{A}) + R_E(\varepsilon \mathbf{B}) + \varepsilon^2 \mathbf{A}\mathbf{B} + \varepsilon \mathbf{A}\, R_E(\varepsilon \mathbf{B}) + R_E(\varepsilon \mathbf{A})\, \varepsilon \mathbf{B} + R_E(\varepsilon \mathbf{A})\, R_E(\varepsilon \mathbf{B}).$$
If we define linear functions $L_1, L_2: \mathbb{R} \to \mathcal{A}_0$ by
$$\forall \varepsilon\quad L_1(\varepsilon) = \varepsilon \mathbf{A}, \qquad L_2(\varepsilon) = \varepsilon \mathbf{B},$$
then $L_1, L_2$ are limited, so $R_E \circ L_1$ and $R_E \circ L_2$ are vanishing, so $R_1$ is vanishing (lemmas 2.3, 2.4, 2.6, 2.8, 2.15).

Then, by lemma 9, there is a vanishing function $R_2: \mathbb{R} \to \mathcal{A}_0$ such that, for all $\varepsilon$ in a neighbourhood of 0,
$$I + \varepsilon(\mathbf{A} + \mathbf{B}) + R_1(\varepsilon) = \exp(\varepsilon(\mathbf{A} + \mathbf{B}) + R_2(\varepsilon)),$$
giving
$$\exp(\varepsilon \mathbf{A}) \exp(\varepsilon \mathbf{B}) = \exp(\varepsilon(\mathbf{A} + \mathbf{B}) + R_2(\varepsilon)).$$
Using the isomorphism between $\mathcal{A}_0$ and $\mathcal{A}$ we can obtain from $R_2$ a vanishing function $R: \mathbb{R} \to \mathcal{A}$ such that, for all $\varepsilon$ in a neighbourhood of 0,
$$\exp(\varepsilon A) \circ \exp(\varepsilon B) = \exp(\varepsilon(A + B) + R(\varepsilon)),$$
as required.
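Lemma 10 can be illustrated numerically: since the remainder $R(\varepsilon)$ is vanishing, the discrepancy between $\exp(\varepsilon \mathbf{A})\exp(\varepsilon \mathbf{B})$ and $\exp(\varepsilon(\mathbf{A}+\mathbf{B}))$, divided by $\varepsilon$, should shrink as $\varepsilon \to 0$. The following sketch (arbitrary illustrative matrices, truncated power-series exponential) checks this at two scales.

```python
# Sketch: the normalised discrepancy between exp(eA)exp(eB) and exp(e(A+B))
# shrinks with e, as lemma 10 predicts.
def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(3)] for i in range(3)]

def scale(c, X):
    return [[c * X[i][j] for j in range(3)] for i in range(3)]

def mexp(X, terms=25):  # matrix exponential via truncated power series
    out = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in out]
    for n in range(1, terms):
        term = scale(1.0 / n, matmul(term, X))
        out = madd(out, term)
    return out

def fnorm(X):
    return sum(X[i][j]**2 for i in range(3) for j in range(3)) ** 0.5

A = [[0.2, -0.5, 1.0], [0.7, 0.1, -0.4], [0.0, 0.0, 0.0]]
B = [[-0.3, 0.6, 0.2], [0.4, -0.8, 0.5], [0.0, 0.0, 0.0]]

def discrepancy(e):
    lhs = matmul(mexp(scale(e, A)), mexp(scale(e, B)))
    rhs = mexp(scale(e, madd(A, B)))
    return fnorm(madd(lhs, scale(-1.0, rhs))) / e

assert discrepancy(1e-3) < 0.5 * discrepancy(1e-2)
```

Here the discrepancy is dominated by the first commutator term of the Baker–Campbell–Hausdorff expansion, which is of order $\varepsilon^2$, so dividing by $\varepsilon$ leaves something of order $\varepsilon$.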
3.10 Derivatives of functions to and from $\mathcal{G}$

Let $S$ be any finite-dimensional vector space.
The derivative of a partial function $f: \mathcal{G} \to S$ is the unique maximal partial function $f': \mathcal{G} \to (\mathcal{A} \to S)$ such that, for every $G \in \mathcal{G}$ for which $f'(G)$ is defined, $f'(G)$ is a linear transformation and there is a vanishing function $R: \mathcal{A} \to S$ such that, for all $V \in \mathcal{A}$ in a neighbourhood of 0,
$$f(G \circ \exp V) = f(G) + f'(G)(V) + R(V).$$
$f$ is differentiable at $G$ iff $f'(G)$ is defined.

The derivative of a partial function $f: S \to \mathcal{G}$ is the unique maximal partial function $f': S \to (S \to \mathcal{A})$ such that, for every $x \in S$ for which $f'(x)$ is defined, $f'(x)$ is a linear transformation and there is a vanishing function $R: S \to \mathcal{A}$ such that, for all $v \in S$ in a neighbourhood of 0,
$$f(x + v) = f(x) \circ \exp(f'(x)(v) + R(v)).$$
$f$ is differentiable at $x$ iff $f'(x)$ exists.

LEMMA 11. If $G \in \mathcal{G}$, $f: \mathcal{G} \to S$ is differentiable at $G$, and $R_1: \mathbb{R} \to \mathcal{A}$ is vanishing, then there is a vanishing function $R_2: \mathbb{R} \to S$ such that, for all $V \in \mathcal{A}$ and for all real $\varepsilon$ in a neighbourhood of 0,
$$f(G \circ \exp(\varepsilon V + R_1(\varepsilon))) = f(G) + \varepsilon f'(G)(V) + R_2(\varepsilon).$$

Proof. Since $f$ is differentiable at $G$, there is a vanishing function $R: \mathcal{A} \to S$ and a neighbourhood $N \subseteq \mathcal{A}$ of 0 such that
$$\forall A \in N\quad f(G \circ \exp A) = f(G) + f'(G)(A) + R(A).$$
Given any $V \in \mathcal{A}$, define the linear function $L: \mathbb{R} \to \mathcal{A}$ by $\forall \varepsilon \in \mathbb{R}\; L(\varepsilon) = \varepsilon V$. Now, $L$ and $R_1$ are limited, so $L + R_1$ is limited and hence continuous at 0, with $(L + R_1)(0) = 0$ (lemmas 2.3, 2.4, 2.5, 2.7). Hence there is a neighbourhood $N' \subseteq \mathbb{R}$ of 0 such that, for all $\varepsilon \in N'$, we have $\varepsilon V + R_1(\varepsilon) = (L + R_1)(\varepsilon) \in N$ and hence
$$f(G \circ \exp(\varepsilon V + R_1(\varepsilon))) = f(G) + f'(G)(\varepsilon V + R_1(\varepsilon)) + R(\varepsilon V + R_1(\varepsilon)) = f(G) + \varepsilon f'(G)(V) + R_2(\varepsilon)$$
where $R_2 = f'(G) \circ R_1 + R \circ (L + R_1)$. Now, $f'(G)$ is linear and hence limited, so $f'(G) \circ R_1$ is vanishing. Also, $R \circ (L + R_1)$ is vanishing. Thus $R_2$ is vanishing, as required (lemmas 2.4, 2.6, 2.8, 2.9).
3.11 Calculation of the derivative of exp in terms of matrices
For this section only, we shall need a function $\mathrm{ad}: \mathcal{A}_0 \to (\mathcal{A}_0 \to \mathcal{A}_0)$ defined by $\forall A, V \in \mathcal{A}_0\; \mathrm{ad}(A)(V) = [A, V]$, analogous to the function $\mathrm{ad}: \mathcal{A} \to (\mathcal{A} \to \mathcal{A})$ already defined.

LEMMA 12. For any natural number $n$ and $A, V \in \mathcal{A}_0$,
$$V A^n = \sum_{i=0}^{n} (-1)^i \binom{n}{i} A^{n-i}\, \mathrm{ad}(A)^i(V).$$
Proof. By induction on n.
THEOREM 13. For any $A \in \mathcal{A}$,
$$\exp'(A) = \sum_{n=0}^{\infty} \frac{(-1)^n\, \mathrm{ad}(A)^n}{(n+1)!}$$
and thus, for any $A, V \in \mathcal{A}$,
$$\exp'(A)(V) = \sum_{n=0}^{\infty} \frac{(-1)^n\, \mathrm{ad}(A)^n(V)}{(n+1)!} = V - \tfrac{1}{2}[A, V] + \tfrac{1}{6}[A, [A, V]] - \tfrac{1}{24}[A, [A, [A, V]]] + \cdots.$$
Both series converge absolutely for all $A$ and $V$.
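Before the proof, the series can be checked numerically against the definition of the derivative in §3.10: writing $S = \sum_n (-1)^n \mathrm{ad}(A)^n(V)/(n+1)!$, we should have $\exp(A + tV) \approx \exp(A)\exp(tS)$ with an error that is $o(t)$. The following sketch (arbitrary illustrative matrices, truncated series) tests this at two scales.

```python
# Sketch: exp(A + tV) ~ exp(A) exp(t S) where S is the series of theorem 13.
from math import factorial

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(3)] for i in range(3)]

def scale(c, X):
    return [[c * X[i][j] for j in range(3)] for i in range(3)]

def mexp(X, terms=30):
    out = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in out]
    for n in range(1, terms):
        term = scale(1.0 / n, matmul(term, X))
        out = madd(out, term)
    return out

def bracket(X, Y):
    return madd(matmul(X, Y), scale(-1.0, matmul(Y, X)))

def fnorm(X):
    return sum(X[i][j]**2 for i in range(3) for j in range(3)) ** 0.5

A = [[0.3, -0.6, 0.9], [0.5, 0.2, -0.1], [0.0, 0.0, 0.0]]
V = [[-0.4, 0.8, 0.1], [0.2, -0.7, 0.6], [0.0, 0.0, 0.0]]

# S = sum over n of (-1)^n ad(A)^n (V) / (n+1)!
S = [[0.0]*3 for _ in range(3)]
term = [row[:] for row in V]          # ad(A)^0 (V)
for n in range(20):
    S = madd(S, scale((-1.0)**n / factorial(n + 1), term))
    term = bracket(A, term)           # ad(A)^{n+1} (V)

def discrepancy(t):
    lhs = mexp(madd(A, scale(t, V)))
    rhs = matmul(mexp(A), mexp(scale(t, S)))
    return fnorm(madd(lhs, scale(-1.0, rhs))) / t

assert discrepancy(1e-3) < 0.5 * discrepancy(1e-2)
```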
Proof. I shall prove the theorem for $A, V \in \mathcal{A}_0$ and then transfer the result to $A, V \in \mathcal{A}$, using the isomorphism between $\mathcal{A}_0$ and $\mathcal{A}$.

For any $A, V \in \mathcal{A}_0$,
$$\exp(A + V) = \sum_{n=0}^{\infty} \frac{1}{n!}(A + V)^n = \sum_{n=0}^{\infty} \bigl(X_n(A) + Y_n(A, V) + Z_n(A, V)\bigr)$$
where $X_n(A) = \frac{1}{n!}A^n$, $Y_n(A, V) = \frac{1}{n!}\sum_{r=1}^{n} A^{n-r} V A^{r-1}$, and $Z_n(A, V)$ is the sum of the remaining terms in the expansion of $\frac{1}{n!}(A + V)^n$. Note that $Z_n(A, V)$ consists of a sum of products of $A$ and $V$, with positive coefficients. For every $n$ we have a similar expansion of $\frac{1}{n!}(|A| + |V|)^n$:
$$\frac{1}{n!}(|A| + |V|)^n = X_n(|A|) + Y_n(|A|, |V|) + Z_n(|A|, |V|)$$
with
$$|X_n(A)| \le X_n(|A|) = \frac{1}{n!}|A|^n, \qquad |Y_n(A, V)| \le Y_n(|A|, |V|) \le \frac{1}{n!}(|A| + |V|)^n, \qquad |Z_n(A, V)| \le Z_n(|A|, |V|) \le \frac{1}{n!}(|A| + |V|)^n.$$
Hence $\sum_{n=0}^{\infty} X_n(A)$, $\sum_{n=0}^{\infty} Y_n(A, V)$, and $\sum_{n=0}^{\infty} Z_n(A, V)$ all converge absolutely, by comparison with $\sum_{n=0}^{\infty} \frac{1}{n!}(|A| + |V|)^n = e^{|A|+|V|}$. Hence
$$\exp(A + V) = \sum_{n=0}^{\infty} X_n(A) + \sum_{n=0}^{\infty} Y_n(A, V) + \sum_{n=0}^{\infty} Z_n(A, V).$$
We already know the sum of the first series: $\sum_{n=0}^{\infty} X_n(A) = \exp A$. As for the third series, for fixed $A$, define $R_A: \mathcal{A}_0 \to \mathcal{A}_0$ by
$$\forall V \in \mathcal{A}_0\quad R_A(V) = \sum_{n=0}^{\infty} Z_n(A, V).$$
I shall show that $R_A$ is vanishing. As pointed out above, we have
$$|Z_n(A, V)| \le Z_n(|A|, |V|) = \frac{1}{n!}(|A| + |V|)^n - X_n(|A|) - Y_n(|A|, |V|) = \frac{1}{n!}\bigl((|A| + |V|)^n - |A|^n - n|A|^{n-1}|V|\bigr)$$
so
$$|R_A(V)| \le \sum_{n=0}^{\infty} |Z_n(A, V)| \le \sum_{n=0}^{\infty} \frac{1}{n!}\bigl((|A| + |V|)^n - |A|^n - n|A|^{n-1}|V|\bigr) = e^{|A|+|V|} - e^{|A|} - e^{|A|}|V| = e^{|A|} f(|V|)$$
where $f: \mathbb{R} \to \mathbb{R}$ is defined by $\forall t \in \mathbb{R}\; f(t) = \sum_{n=2}^{\infty} \frac{t^n}{n!}$. This real power series is absolutely convergent for all $t$, so, by lemma 2.14 (transferred to $M_{11}$, i.e., $\mathbb{R}$), $f$ is vanishing. Hence $R_A$ is also vanishing.

Returning to the second series, $\sum_{n=0}^{\infty} Y_n(A, V)$, we have $Y_0(A, V) = 0$ and, for any $n > 0$, by lemma 12,
$$Y_n(A, V) = \frac{1}{n!}\sum_{r=1}^{n} A^{n-r} V A^{r-1} = \frac{1}{n!}\sum_{r=1}^{n}\sum_{i=0}^{r-1} (-1)^i \binom{r-1}{i} A^{n-i-1}\,\mathrm{ad}(A)^i(V)$$
$$= \frac{1}{n!}\sum_{i=0}^{n-1}\sum_{r=i+1}^{n} (-1)^i \binom{r-1}{i} A^{n-i-1}\,\mathrm{ad}(A)^i(V) = \frac{1}{n!}\sum_{i=0}^{n-1} (-1)^i \binom{n}{i+1} A^{n-i-1}\,\mathrm{ad}(A)^i(V) = \sum_{i=0}^{n-1} \frac{A^{n-i-1}}{(n-i-1)!}\cdot\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!}.$$
I wish to apply the Cauchy product formula (lemma 2.17, transferred to $M_{33}$) to this expression to evaluate $\sum_{n=1}^{\infty} Y_n(A, V)$. This is justified since the series $\sum_{j=0}^{\infty} A^j/j!$ and $\sum_{i=0}^{\infty} (-1)^i\, \mathrm{ad}(A)^i(V)/(i+1)!$ are absolutely convergent: in the case of the latter series we have
$$\Bigl|\frac{(-1)^i\, \mathrm{ad}(A)^i(V)}{(i+1)!}\Bigr| \le \frac{|\sqrt{5}A|^i}{(i+1)!}\,|V|$$
by lemma 5. The Cauchy product formula gives
$$\sum_{n=1}^{\infty} Y_n(A, V) = \sum_{n=1}^{\infty} \sum_{i=0}^{n-1} \frac{A^{n-i-1}}{(n-i-1)!}\cdot\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} = \sum_{m=0}^{\infty} \sum_{i=0}^{m} \frac{A^{m-i}}{(m-i)!}\cdot\frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} = \sum_{j=0}^{\infty} \frac{A^j}{j!}\, \sum_{i=0}^{\infty} \frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} = \exp(A) \sum_{i=0}^{\infty} \frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!}.$$
Putting all three pieces together gives
$$\exp(A + V) = \exp(A) + \exp(A) \sum_{i=0}^{\infty} \frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} + R_A(V) = \exp(A) \Bigl(I + \sum_{i=0}^{\infty} \frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} + E_A(R_A(V))\Bigr)$$
where $E_A: \mathcal{A}_0 \to \mathcal{A}_0$ is the linear transformation defined by $\forall X \in \mathcal{A}_0\; E_A(X) = \exp(-A)\,X$. By lemma 2.4, $E_A$ is limited, so, by lemma 2.9, $E_A \circ R_A$ is vanishing. Hence, by lemma 9, there is a vanishing function $R: \mathcal{A}_0 \to \mathcal{A}_0$ such that, for all $V$ in a neighbourhood of 0,
$$\exp(A + V) = \exp(A) \exp\Bigl(\sum_{i=0}^{\infty} \frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!} + R(V)\Bigr).$$
This equation has been proved for $A, V \in \mathcal{A}_0$. Since $\mathcal{A}_0$ is isomorphic to $\mathcal{A}$ the equation also holds for $A, V \in \mathcal{A}$. This gives $\exp'(A)(V) = \sum_{i=0}^{\infty} \frac{(-1)^i\,\mathrm{ad}(A)^i(V)}{(i+1)!}$ and hence $\exp'(A) = \sum_{i=0}^{\infty} \frac{(-1)^i\,\mathrm{ad}(A)^i}{(i+1)!}$ by lemma 2.12, as required. As indicated above, these series are absolutely convergent by comparison with $\sum_{i=0}^{\infty} \frac{|\sqrt{5}A|^i}{(i+1)!}|V|$ and $\sum_{i=0}^{\infty} \frac{|\sqrt{5}A|^i}{(i+1)!}$.
3.12 Calculation of $\log'$

Define a real sequence $(a_n)$ by
$$a_n = \sum_{m=0}^{n} \sum_{\substack{r_1, \ldots, r_m \ge 1 \\ r_1 + \cdots + r_m = n}} \frac{(-1)^{n-m}}{(r_1 + 1)! \cdots (r_m + 1)!}.$$

LEMMA 14. The real power series $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely if $|x| < 1$.
Proof. We have
$$|a_n| \le \sum_{m=0}^{n} \sum_{\substack{r_1, \ldots, r_m \ge 1 \\ r_1 + \cdots + r_m = n}} \frac{1}{(r_1 + 1)! \cdots (r_m + 1)!} \le \sum_{m=0}^{n} \Bigl(\sum_{r=1}^{n} \frac{1}{(r+1)!}\Bigr)^m < \sum_{m=0}^{n} (e - 2)^m < \frac{1}{1 - (e - 2)} = \frac{1}{3 - e},$$
so the series $\sum_{n=0}^{\infty} |a_n x^n|$ converges by comparison with $\frac{1}{3-e}\sum_{n=0}^{\infty} |x|^n$, if $|x| < 1$.

In fact, the convergence of the power series is much better than this: the first few terms of $(a_n)$ are $(1, \tfrac{1}{2}, \tfrac{1}{12}, 0, -\tfrac{1}{720}, 0, \tfrac{1}{30240}, 0, -\tfrac{1}{1209600}, \ldots)$. The next lemma shows that $\sum_{n=0}^{\infty} a_n x^n$ is the inverse of $\sum_{n=0}^{\infty} \frac{(-1)^n}{(n+1)!} x^n$, in the sense of the Cauchy product of series.
LEMMA 15. For any $p \ge 0$,
$$\sum_{n=0}^{p} \frac{(-1)^n}{(n+1)!}\, a_{p-n} = \begin{cases} 1 & \text{if } p = 0 \\ 0 & \text{if } p > 0 \end{cases}$$

Proof. Define $S_n = \frac{(-1)^{n+1}}{(n+1)!}$. Then the definition of $a_n$ may be rewritten as
$$a_n = \sum_{m=0}^{n} b^n_m, \quad\text{where}\quad b^n_m = \sum_{\substack{r_1, \ldots, r_m \ge 1 \\ r_1 + \cdots + r_m = n}} S_{r_1} S_{r_2} \cdots S_{r_m}.$$
Note that $S_0 = -1$, $b^0_0 = 1$, $b^p_0 = 0$ if $p > 0$, and $b^n_m = 0$ if $m > n$. Then
$$\sum_{n=0}^{p} \frac{(-1)^n}{(n+1)!}\, a_{p-n} = \sum_{n=0}^{p} (-S_n) \sum_{m=0}^{p-n} b^{p-n}_m = -\sum_{n=0}^{p} \sum_{m=0}^{p} S_n b^{p-n}_m = -\sum_{m=0}^{p} S_0 b^p_m - \sum_{n=1}^{p} \sum_{m=0}^{p} S_n b^{p-n}_m = \sum_{m=0}^{p} b^p_m - \sum_{m=0}^{p} \sum_{n=1}^{p} S_n b^{p-n}_m = \sum_{m=0}^{p} b^p_m - \sum_{m=0}^{p} b^p_{m+1} = b^p_0 - b^p_{p+1} = \begin{cases} 1 & \text{if } p = 0 \\ 0 & \text{if } p > 0 \end{cases}$$
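Lemma 15 also gives a convenient recurrence for computing $(a_n)$: since the $n = 0$ term of the sum is $a_p$ itself, we get $a_0 = 1$ and $a_p = -\sum_{n=1}^{p} \frac{(-1)^n}{(n+1)!} a_{p-n}$ for $p > 0$. A sketch with exact rational arithmetic reproduces the values quoted after lemma 14:

```python
# Sketch: compute a_n from the Cauchy-inverse recurrence of lemma 15.
from fractions import Fraction
from math import factorial

def a_seq(count):
    a = [Fraction(1)]
    for p in range(1, count):
        s = sum(Fraction((-1)**n, factorial(n + 1)) * a[p - n]
                for n in range(1, p + 1))
        a.append(-s)
    return a

a = a_seq(9)
assert a == [Fraction(1), Fraction(1, 2), Fraction(1, 12), Fraction(0),
             Fraction(-1, 720), Fraction(0), Fraction(1, 30240), Fraction(0),
             Fraction(-1, 1209600)]
```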
THEOREM 16. For any $A \in \mathcal{A}$ with $|A| < \frac{1}{\sqrt{5}}$,
$$\log'(\exp A) = \sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n;$$
thus, for any $V \in \mathcal{A}$,
$$\log'(\exp A)(V) = \sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n(V) = V + \tfrac{1}{2}[A, V] + \tfrac{1}{12}[A, [A, V]] - \tfrac{1}{720}[A, [A, [A, [A, V]]]] + \tfrac{1}{30240}[A, [A, [A, [A, [A, [A, V]]]]]] - \tfrac{1}{1209600}[A, [A, [A, [A, [A, [A, [A, [A, V]]]]]]]] + \cdots.$$
Both series converge absolutely for $|A| < \frac{1}{\sqrt{5}}$.

Proof. For $|A| < \frac{1}{\sqrt{5}}$ we have $|\mathrm{ad}(A)| < 1$, by lemma 5, so $\sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n$ and $\sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n(V)$ converge absolutely by lemma 14 and lemma 2.13. I intend to apply the inverse function theorem, so I must show that $\exp$ is continuously differentiable and $\exp'(A)$ is invertible.

Theorem 13 shows that $\exp$ is differentiable, with $\exp' = F \circ \mathrm{ad}$, where $F: L(\mathcal{A},\mathcal{A}) \to L(\mathcal{A},\mathcal{A})$ is defined by
$$\forall T \in L(\mathcal{A},\mathcal{A})\quad F(T) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(n+1)!}\, T^n.$$
The corresponding real power series $\sum_{n=0}^{\infty} \frac{(-1)^n}{(n+1)!} x^n$ is absolutely convergent for all $x \in \mathbb{R}$, so by lemma 2.13 the power series for $F$ is absolutely convergent for all $T$ and $F$ is continuous. The linear function $\mathrm{ad}$ is also continuous, by lemma 2.4, so $\exp' = F \circ \mathrm{ad}$ is continuous.

Since the series $\sum_{n=0}^{\infty} \frac{(-1)^n}{(n+1)!}\, \mathrm{ad}(A)^n$ and $\sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n$ are absolutely convergent (for $|A| < \frac{1}{\sqrt{5}}$), we can compose them, using the Cauchy product formula (lemma 2.17) and lemma 15, to give
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{(n+1)!}\, \mathrm{ad}(A)^n \circ \sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n = \sum_{p=0}^{\infty} \Bigl(\sum_{n=0}^{p} \frac{(-1)^n}{(n+1)!}\, a_{p-n}\Bigr) \mathrm{ad}(A)^p = 1 \cdot \mathrm{ad}(A)^0 + 0 \cdot \mathrm{ad}(A)^1 + 0 \cdot \mathrm{ad}(A)^2 + \cdots = I,$$
where $I$ is the identity function. This shows that $\exp'(A)$ is invertible and that $\sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n$ is its inverse.

Hence, by the inverse function theorem, $\exp$ maps an open neighbourhood of $A$ bijectively onto an open neighbourhood of $\exp A$, and its inverse $\log$ is continuously differentiable. It follows that $\log'(\exp A)$ is the inverse of $\exp'(A)$, which is $\sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n$. Applying lemma 2.12 gives $\log'(\exp A)(V) = \sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n(V)$.
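The key inverse relation in this proof can be verified numerically: for a small $A$, applying the series $\sum_n a_n\, \mathrm{ad}(A)^n$ to some $V$ and then applying $\sum_n \frac{(-1)^n}{(n+1)!}\, \mathrm{ad}(A)^n$ to the result should return $V$. A sketch with arbitrary illustrative matrices:

```python
# Sketch: the two truncated series compose to (approximately) the identity.
from fractions import Fraction
from math import factorial

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(3)] for i in range(3)]

def scale(c, X):
    return [[c * X[i][j] for j in range(3)] for i in range(3)]

def bracket(X, Y):
    return madd(matmul(X, Y), scale(-1.0, matmul(Y, X)))

# a_n via the recurrence implied by lemma 15
a = [Fraction(1)]
for p in range(1, 30):
    a.append(-sum(Fraction((-1)**n, factorial(n + 1)) * a[p - n]
                  for n in range(1, p + 1)))

A = [[0.05, -0.08, 0.1], [0.07, 0.03, -0.06], [0.0, 0.0, 0.0]]  # small: |A| < 1/sqrt(5)
V = [[0.4, -0.9, 0.2], [0.6, 0.1, -0.5], [0.0, 0.0, 0.0]]

def apply_series(coeffs, X, V0, terms=30):
    out = [[0.0]*3 for _ in range(3)]
    term = [row[:] for row in V0]
    for n in range(terms):
        out = madd(out, scale(float(coeffs(n)), term))
        term = bracket(X, term)
    return out

U = apply_series(lambda n: a[n], A, V)                                    # log'(exp A)(V)
back = apply_series(lambda n: Fraction((-1)**n, factorial(n + 1)), A, U)  # exp'(A)(U)
err = max(abs(back[i][j] - V[i][j]) for i in range(3) for j in range(3))
assert err < 1e-9
```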
3.13 The dual mapping, $\log^*$
We can define a function $\log^*: \mathcal{G} \to (\mathcal{F} \to \mathcal{F})$ dual to $\log': \mathcal{G} \to (\mathcal{A} \to \mathcal{A})$ by
$$\forall G \in \mathcal{G}\quad \log^*(G) = (\log'(G))^\dagger$$
or, more explicitly,
$$\forall G \in \mathcal{G}\;\forall F \in \mathcal{F}\;\forall V \in \mathcal{A}\quad \log^*(G)(F)(V) = F(\log'(G)(V)).$$

THEOREM 17. For any $A \in \mathcal{A}$ with $|A| < \frac{1}{\sqrt{5}}$,
$$\log^*(\exp A) = \sum_{n=0}^{\infty} a_n\, (\mathrm{ad}(A)^\dagger)^n$$
where the real sequence $(a_n)$ is as above; thus, for any $F \in \mathcal{F}$,
$$\log^*(\exp A)(F) = \sum_{n=0}^{\infty} a_n\, (\mathrm{ad}(A)^\dagger)^n(F).$$
Both series converge absolutely for $|A| < \frac{1}{\sqrt{5}}$.

Proof. For any $V \in \mathcal{A}$ we have
$$\log'(\exp A)(V) = \sum_{n=0}^{\infty} a_n\, \mathrm{ad}(A)^n(V)$$
by theorem 16. So, for any $F \in \mathcal{F}$,
$$F(\log'(\exp A)(V)) = \sum_{n=0}^{\infty} a_n F(\mathrm{ad}(A)^n(V))$$
since $F$ is continuous by lemma 2.4. Hence
$$\log^*(\exp A)(F)(V) = F(\log'(\exp A)(V)) = \sum_{n=0}^{\infty} a_n F(\mathrm{ad}(A)^n(V)) = \sum_{n=0}^{\infty} a_n\, (\mathrm{ad}(A)^\dagger)^n(F)(V).$$
Applying lemma 2.12 gives $\log^*(\exp A)(F) = \sum_{n=0}^{\infty} a_n\, (\mathrm{ad}(A)^\dagger)^n(F)$ and $\log^*(\exp A) = \sum_{n=0}^{\infty} a_n\, (\mathrm{ad}(A)^\dagger)^n$, as required. These series converge absolutely for $|A| < \frac{1}{\sqrt{5}}$ since $|\mathrm{ad}(A)^\dagger| = |\mathrm{ad}(A)| < 1$, by lemma 2 and lemma 5.
4. Matching a Template to an Image
4.1 Images and templates
An image is a function $I: \mathbb{R}^2 \to [0, \infty)$. For any point $p \in \mathbb{R}^2$, $I(p)$ is the image intensity at the pixel $p$. The domain of $I$ is called the image plane.

As mentioned in §1, each symbol type has a template, depicting the appearance of any token of that symbol type at a standard position, size and orientation, in the absence of noise. Formally, a template is a differentiable function $T: \mathbb{R}^2 \to [0, \infty)$ such that the set $\{x \in \mathbb{R}^2 \mid T(x) > 0\}$ is bounded. The domain of $T$ is called the template plane.
Suppose that we believe that a token of this symbol type occurs in the image at a certain position, orientation and size (with a certain degree of stretching and shearing). We can describe this by an affine transformation $G$ from the template plane into the image plane, which is called the embedding of the symbol token in the image. If the symbol token really is present where we think it is, there will be a good match between the functions $T$ and $I \circ G$, at least in the region where $T$ is non-zero.

The aim of this section is to measure the goodness of match by a correlation function between $T$ and $I \circ G$, and to calculate its derivative so that we can adjust $G$ incrementally to maximise the correlation.
Recall from §3.1 that an affine transformation $G$ can be represented by a $3 \times 3$ matrix $\mathbf{G}$, and a point $u \in \mathbb{R}^2$ can be represented by a $3 \times 1$ column vector $\mathbf{u}$, such that the column vector representing $G(u)$ is $\mathbf{G}\,\mathbf{u}$. For any template $T$ we can define a modified function $\mathbf{T}$ on $3 \times 1$ column vectors, such that $\forall u \in \mathbb{R}^2\; \mathbf{T}(\mathbf{u}) = T(u)$.
4.2 The correlation function
Given $T$, $I$ and $G$, define the correlation $\rho_{I,T}(G)$ between $T$ and $I \circ G$ by
$$\rho_{I,T}(G) = |\det(\mathbf{G})| \int T(u)\,(I(G(u)) - I_0)\, d^2u = \int T(G^{-1}(x))\,(I(x) - I_0)\, d^2x$$
where $I_0$ is a positive real constant associated with $T$. The integrals are over the whole of $\mathbb{R}^2$, or equivalently over a large enough region to include in its interior all the points where $T$ is non-zero.

LEMMA 1. For any template $T$, any image $I$, and any affine transformations $G$, $G'$, $\rho_{I, T \circ G'}(G \circ G') = \rho_{I,T}(G)$.

Proof.
$$\rho_{I, T \circ G'}(G \circ G') = \int (T \circ G')((G \circ G')^{-1}(x))\,(I(x) - I_0)\, d^2x = \int T(G'(G'^{-1}(G^{-1}(x))))\,(I(x) - I_0)\, d^2x = \int T(G^{-1}(x))\,(I(x) - I_0)\, d^2x = \rho_{I,T}(G).$$
4.3 Derivative of the correlation function

Let $G$ be an affine transformation, and consider the effect on $\rho_{I,T}(G)$ of making a small change in $G$, i.e., replacing $G$ by $G \circ \exp A$, for some $A \in \mathcal{A}$. We have
$$\rho_{I,T}(G \circ \exp A) = \int T((G \circ \exp A)^{-1}(x))\,(I(x) - I_0)\, d^2x = \int \mathbf{T}((\mathbf{G} \exp \mathbf{A})^{-1}\mathbf{x})\,(I(x) - I_0)\, d^2x = \int \mathbf{T}(\exp(-\mathbf{A})\,\mathbf{G}^{-1}\mathbf{x})\,(I(x) - I_0)\, d^2x = \int \mathbf{T}((I - \mathbf{A} + o(\mathbf{A}))\,\mathbf{G}^{-1}\mathbf{x})\,(I(x) - I_0)\, d^2x = \rho_{I,T}(G) - \int (\nabla T(G^{-1}(x)))\,\mathbf{A}\,\mathbf{G}^{-1}\mathbf{x}\,(I(x) - I_0)\, d^2x + o(A)$$
where $\nabla T$ is the row vector defined by $\nabla T(u) = \bigl(\frac{\partial T(u)}{\partial u_1}\;\; \frac{\partial T(u)}{\partial u_2}\;\; 0\bigr)$.

Hence the derivative $\rho_{I,T}': \mathcal{G} \to (\mathcal{A} \to \mathbb{R})$ is given by
$$\forall G \in \mathcal{G}\;\forall A \in \mathcal{A}\quad \rho_{I,T}'(G)(A) = -\int (\nabla T(G^{-1}(x)))\,\mathbf{A}\,\mathbf{G}^{-1}\mathbf{x}\,(I(x) - I_0)\, d^2x = -|\det(\mathbf{G})| \int (\nabla T(u))\,\mathbf{A}\,\mathbf{u}\,(I(G(u)) - I_0)\, d^2u.$$
$\rho_{I,T}'(G)$ is a linear transformation from $\mathcal{A}$ to $\mathbb{R}$, and hence is an element of $\mathcal{F}$; it is called the force exerted by $I$ and $T$ on $G$.
4.4 Matrix representation of the force
LEMMA 2. Using the matrix representation of F (see x3.7), the force ρI,T �(G) is repre
sented by the matrix
ρI,T �(G) = �jdet(G )jZ 8>>>>: u1
u2
1
9>>>>; ( �T(u)�u1
�T(u)�u20 ) (I(G(u))� I0) d
2u= �jdet(G )jZ u (rT(u)) (I(G(u))� I0) d
2u.
Proof. Let M be the matrix on the righthand side of the equation. For any A2A, we
have
tr(MA ) = �jdet(G )jZ tr(u (rT(u))A) (I(G(u))� I0) d2u= �jdet(G )jZ tr((rT(u))A u ) (I(G(u))� I0) d
2u= �jdet(G )jZ (rT(u))A u (I(G(u))� I0) d
2u= ρI,T �(G)(A)
showing that M does indeed represent ρI,T �(G).31
4.5 The mass of a symbol token

The mass, $m$, of a symbol token, whose type has template $T$, and which is embedded in the image $I$ by $G$, is defined by
$$m = |\det(\mathbf{G})| \int T(u)\, d^2u = \int T(G^{-1}(x))\, d^2x.$$
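In a discrete implementation the integral becomes a sum over a pixel grid. The following sketch (not from the paper: a toy box template standing in for a differentiable one, a hypothetical uniform scaling $G$, and a midpoint Riemann sum) shows the scaling behaviour $m = |\det(\mathbf{G})| \int T$.

```python
# Sketch: discrete approximation of the mass m = |det G| * integral of T.
N = 200                      # grid resolution per axis
h = 1.0 / N                  # cell side; the grid covers [0,1]^2 where T > 0

def T(u1, u2):               # toy template: indicator of the unit square
    return 1.0 if 0.0 <= u1 < 1.0 and 0.0 <= u2 < 1.0 else 0.0

det_G = 2.0 * 2.0            # |det G| for the scaling G(u) = 2u
integral = sum(T((i + 0.5) * h, (j + 0.5) * h) * h * h
               for i in range(N) for j in range(N))
mass = det_G * integral
assert abs(mass - 4.0) < 1e-6
```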
5. Fleximaps
5.1 Introduction
A symbol contains other symbols as parts. The geometric relation between a part and
the whole, or between two neighbouring parts, is not absolutely rigid but is allowed to
vary along certain degrees of freedom, by certain amounts. For example, figure 1 shows
a symbol token of type `A', with three line tokens as parts. The figure shows the nominal
relationships between the parts, i.e., the relationships in the absence of noise. But the
relationship between two parts σ1 and σ2 may vary in several ways: σ1 may rotate (by
a limited angle) around a certain point on σ2; σ1 may slide up or down σ2 (by a limited
distance); and σ1 may be shifted across σ2 (but only by a very small amount).
Figure 1. The geometric relation between symbol tokens σ1 and σ2 may vary
along several degrees of freedom.
The purpose of a fleximap is to represent both the nominal geometric relationship and the permitted degrees of variation. The actual geometric relationship $G$ between the symbol tokens $\sigma_1$ and $\sigma_2$ may be expressed in the form $G = F \circ \exp A$, where $F$ is the nominal relationship and $A$ is an element of the Lie algebra $\mathcal{A}$, representing the deviation of the actual relationship from the nominal one.
We wish to impose `soft' constraints on the deviation $A$. For example, we would like to say, not `you may rotate up to $\pm\theta_0$ radians and no further', but `there is a penalty of $k\theta^2$ for a rotation by $\theta$ radians'; the smaller the value of the coefficient $k$, the more rotation is tolerated. In general, the penalty function is a quadratic function depending on six independent variables (e.g., angle of rotation around a certain point, distance of translation in a certain direction, amount of dilation about a certain centre, etc.); there must always be six degrees of freedom because $\mathcal{A}$ is a six-dimensional vector space. This quadratic penalty function is technically known as a metric tensor.
The fleximap, then, is a pair (F, g), where F is the nominal relationship and g is
the metric tensor specifying the penalty for deviations from the nominal.
5.2 Metric tensors

A metric tensor is a function $g: \mathcal{A} \to (\mathcal{A} \to \mathbb{R})$ such that

- $\forall c_1, c_2 \in \mathbb{R}\;\forall A, B_1, B_2 \in \mathcal{A}\quad g(A)(c_1 B_1 + c_2 B_2) = c_1 g(A)(B_1) + c_2 g(A)(B_2)$,
- $\forall A, B \in \mathcal{A}\quad g(A)(B) = g(B)(A)$,
- $\forall A \in \mathcal{A}\quad A \ne 0 \Rightarrow g(A)(A) > 0$.

Note that, as a consequence of the first condition, $\forall A \in \mathcal{A}\; g(A) \in \mathcal{F}$; in fact, $g$ is an isomorphism from $\mathcal{A}$ to $\mathcal{F}$.
5.3 Fleximaps
A fleximap is a pair $(F, g)$, where $F \in \mathcal{G}$ and $g$ is a metric tensor. Let Flex be the set of all fleximaps. Associated with any fleximap $\tau = (F, g)$ is an energy function $E_\tau: \mathcal{G} \to \mathbb{R}$ defined by
$$\forall G \in \mathcal{G}\quad E_\tau(G) = g(\log(F^{-1} \circ G))(\log(F^{-1} \circ G)).$$
The informal interpretation of a fleximap $\tau = (F, g)$ is as indicated above. Any affine transformation $G$ not too far from $F$ can be represented in the form $G = F \circ \exp A$, for some $A \in \mathcal{A}$; then we have $E_\tau(G) = g(A)(A)$. $F$ is called the nominal transformation, $A$ measures how far $G$ deviates from $F$, and $E_\tau(G)$ is the penalty incurred by the deviation.

The way in which this works can be seen most clearly if we adopt a basis for $\mathcal{A}$ under which $g$ is diagonal: then, in terms of components, we have $g(A)(A) = \sum_{i=1}^{6} g_{ii}(A^i)^2$. For example, the six basis elements might represent a rotation around a certain point, a dilation about a certain centre, a one-dimensional stretch in a certain direction, a shear in a certain direction, a translation in a certain direction, and a translation in another direction; these types of deviation would be penalised at the rates $g_{11}, g_{22}, g_{33}, g_{44}, g_{55}, g_{66}$ respectively. A metric tensor is a convenient way of summing up the six independent degrees of freedom and the six penalty coefficients $g_{11}, g_{22}, g_{33}, g_{44}, g_{55}, g_{66}$ associated with them.
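In the diagonalising basis the energy is just a weighted sum of squared deviation components. A minimal sketch (all coefficients and deviations are hypothetical illustrative numbers):

```python
# Sketch: energy of a deviation under a diagonal metric tensor,
# E = sum_i g_ii * (A^i)^2. Large g_ii = stiff direction, small = tolerant.
def energy(g_diag, deviation):
    # g_diag: the six penalty rates g_11..g_66; deviation: components A^1..A^6
    return sum(g * a * a for g, a in zip(g_diag, deviation))

# e.g. rotation and dilation cheap, some directions heavily penalised
g_diag = [0.5, 0.5, 100.0, 100.0, 1.0, 50.0]
A = [0.2, 0.0, 0.01, 0.0, 0.1, 0.0]
assert abs(energy(g_diag, A) - (0.5*0.04 + 100.0*0.0001 + 1.0*0.01)) < 1e-9
```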
The derivative of $E_\tau$ may be calculated as follows. For any $G \in \mathcal{G}$ and any $V \in \mathcal{A}$, we have
$$\log(F^{-1} \circ G \circ \exp V) = \log(F^{-1} \circ G) + \log'(F^{-1} \circ G)(V) + o(V)$$
(see §3.10), so
$$E_\tau(G \circ \exp V) = g(\log(F^{-1} \circ G \circ \exp V))(\log(F^{-1} \circ G \circ \exp V)) = g(\log(F^{-1} \circ G))(\log(F^{-1} \circ G)) + 2g(\log(F^{-1} \circ G))(\log'(F^{-1} \circ G)(V)) + o(V) = E_\tau(G) + 2\log^*(F^{-1} \circ G)(g(\log(F^{-1} \circ G)))(V) + o(V),$$
so $E_\tau'(G) = 2\log^*(F^{-1} \circ G)(g(\log(F^{-1} \circ G)))$.

Given a fleximap $(F, g)$ and a transformation $G \in \mathcal{G}$, we can define the composition of the two by
$$G \cdot (F, g) = (G \circ F,\; g)$$
$$(F, g) \cdot G = (F \circ G,\; g')$$
where $g'$ is the metric defined by
$$\forall A, B \in \mathcal{A}\quad g'(A)(B) = g(\mathrm{Ad}(G)(A))(\mathrm{Ad}(G)(B)).$$

LEMMA 1. For any fleximap $\tau$ and any $G, G' \in \mathcal{G}$,
$$E_{G' \cdot \tau}(G' \circ G) = E_\tau(G) = E_{\tau \cdot G'}(G \circ G').$$
Proof. Let $\tau = (F, g)$. For the first equation,
$$E_{G' \cdot \tau}(G' \circ G) = g(\log((G' \circ F)^{-1} \circ G' \circ G))(\log((G' \circ F)^{-1} \circ G' \circ G)) = g(\log(F^{-1} \circ G))(\log(F^{-1} \circ G)) = E_\tau(G).$$
For the second equation,
$$E_{\tau \cdot G'}(G \circ G') = g'(\log((F \circ G')^{-1} \circ G \circ G'))(\log((F \circ G')^{-1} \circ G \circ G')) = g'(\log(G'^{-1} \circ F^{-1} \circ G \circ G'))(\log(G'^{-1} \circ F^{-1} \circ G \circ G')) = g'(\mathrm{Ad}(G'^{-1})(\log(F^{-1} \circ G)))(\mathrm{Ad}(G'^{-1})(\log(F^{-1} \circ G))) = g(\log(F^{-1} \circ G))(\log(F^{-1} \circ G)) = E_\tau(G),$$
using lemma 3.3.
6. Networks and Homomorphisms
6.1 Introduction
The central concept of this paper is that of a network, which represents a symbol system.
Networks are used for two purposes: a grammar is represented as a network of symbol
types, and a particular pattern is represented as a network of symbol tokens. The rela
tionship between a pattern and a grammar is represented by a homomorphism from the
pattern network to the grammar network. The task of constructing the homomorphism
is called parsing. (All this is very different from conventional grammatical formalisms,
in which a grammar is a set of production rules and the relationship between a pattern
and a grammar is represented by a derivation or a parse tree.)
A pattern is the system of all symbol tokens present (or believed to be present)
in a given image. The position of each symbol token in the image is expressed by its
embedding, an affine transformation. The embeddings of all the symbol tokens of the
pattern are combined together into a single mathematical object, called the embedding
token of the pattern.
The grammar includes constraints on the embedding token, expressed in terms of
fleximaps. These constraints are combined together into a single mathematical object,
called the embedding type of the grammar.
6.2 Networks
A network is a 9-tuple $(\Sigma, N, H, E, W, P, A, F, S)$, where $\Sigma, N, H, E$ are disjoint sets, and $W, P: N \to \Sigma$, $A: H \to N$ and $F, S: E \to H$. The elements of $\Sigma, N, H, E$ are called symbols, nodes, hooks and edges, respectively. The purpose of a node is to represent a part-whole relation between symbols: if a symbol $\sigma_1$ is a part of a symbol $\sigma_2$ then there is a node $n$ with $P(n) = \sigma_1$ and $W(n) = \sigma_2$. Each node has zero or more hooks attached to it; if a hook $h$ is attached to a node $n$ then $A(h) = n$. The purpose of an edge is to represent neighbourhood relations between two parts of a common whole: if symbols $\sigma_1$ and $\sigma_2$ are both parts of a symbol $\sigma_3$, and $n_1$ and $n_2$ are the nodes representing the two part-whole relations, then there may be an edge $e$ from one of the hooks of $n_1$ to one of the hooks of $n_2$; this will be represented by $F(e) = h_1$ and $S(e) = h_2$, where $h_1$ is a hook of $n_1$ and $h_2$ is a hook of $n_2$.

A network is coherent iff $W$ is the coequaliser of $A \circ F$ and $A \circ S$ in the category of sets. This is equivalent to the conjunction of the following two conditions:

- $W \circ A \circ F = W \circ A \circ S$,
- for every $\sigma \in \Sigma$, the graph whose set of nodes is $W^{-1}(\{\sigma\})$ and whose set of edges is $F^{-1}(A^{-1}(W^{-1}(\{\sigma\})))$, where each edge $e$ connects $A(F(e))$ to $A(S(e))$, is connected.

A network is definite iff the following conditions hold:

- $\forall h \in H\quad |F^{-1}(\{h\})| + |S^{-1}(\{h\})| = 1$,
- $\forall \sigma \in \Sigma\quad |P^{-1}(\{\sigma\})| \le 1$.

A grammar is required to be coherent but not necessarily definite. A pattern is required to be both coherent and definite at the end of the recognition process; but while it is being constructed it is not required to be coherent or definite.
![Page 37: Mathematical Theory of Recursive Symbol Systems](https://reader031.vdocuments.mx/reader031/viewer/2022021820/620e903023bb5e271b70f52a/html5/thumbnails/37.jpg)
6.3 Homomorphisms

The parse of a pattern is represented by a homomorphism between the pattern and the grammar. If $\mathcal{N} = (\Sigma, N, H, E, W, P, A, F, S)$ and $\mathcal{N}' = (\Sigma', N', H', E', W', P', A', F', S')$ are networks, a homomorphism $f: \mathcal{N} \to \mathcal{N}'$ is a function from $\Sigma \cup N \cup H \cup E$ to $\Sigma' \cup N' \cup H' \cup E'$ such that
$$f|_\Sigma: \Sigma \to \Sigma', \quad f|_N: N \to N', \quad f|_H: H \to H', \quad f|_E: E \to E',$$
$$W' \circ f = f \circ W, \quad P' \circ f = f \circ P, \quad F' \circ f = f \circ F, \quad S' \circ f = f \circ S,$$
and the square
$$\begin{array}{ccc} H & \xrightarrow{\;f|_H\;} & H' \\ \downarrow {\scriptstyle A} & & \downarrow {\scriptstyle A'} \\ N & \xrightarrow{\;f|_N\;} & N' \end{array}$$
is a pullback (in the category of sets). (Note that the pullback condition means that, for every $n \in N$, $f$ maps the hooks of $n$ bijectively onto the hooks of $f(n)$.)
An isomorphism is a homomorphism that is a bijection. An automorphism is an isomor
phism from a network N to itself.
6.4 Embeddings
An embedding token for a network $(\Sigma, N, H, E, W, P, A, F, S)$ is a function $u: \Sigma \cup N \to \mathcal{G}$. It specifies the way a pattern is embedded in the image plane: $u(\sigma)$ is the embedding of a symbol $\sigma$, and $u(n)$ is the embedding of a node $n$. If $P(n) = \sigma$ then $u(\sigma)$ and $u(n)$ will be very closely related: they will differ only by one of $\sigma$'s symmetry transformations (this constraint is called the symmetry condition; see §7.3 below).

Given two embedding tokens, $u_1$ and $u_2$, for the same network, we can define a new embedding token $u_1 \cdot u_2$ by
$$\forall x \in \Sigma \cup N\quad (u_1 \cdot u_2)(x) = u_1(x) \circ u_2(x).$$
Given a homomorphism $f: \mathcal{N} \to \mathcal{N}'$ and an embedding token $u$ for $\mathcal{N}'$, clearly $u \circ f$ is an embedding token for $\mathcal{N}$.

An embedding type for a network $(\Sigma, N, H, E, W, P, A, F, S)$ is a quintuple $(\mathit{con}, \mathit{rel}, \mathit{symm}, \mathit{tem}, \mathit{in})$, in which $\mathit{con}: N \to \mathrm{Flex}$, $\mathit{rel}: E \to \mathrm{Flex}$, $\mathit{symm}: \Sigma \to \mathrm{sub}(\mathcal{G})$, $\mathit{tem}: \Sigma \to \mathrm{Tem}$, and $\mathit{in}: \Sigma \to \mathrm{Flex}$, where Flex is the set of all fleximaps, $\mathrm{sub}(\mathcal{G})$ is the set of all subgroups of $\mathcal{G}$, and Tem is the set of all templates; for every $\sigma \in \Sigma$, the fleximap $\mathit{in}(\sigma)$ must have nominal part equal to the identity transformation.

The purpose of an embedding type is to impose constraints on the embedding token. For any node $n$, $\mathit{con}(n)$ is a fleximap describing the relationship between the embeddings of a part and a whole (or, more precisely, of the node $n$ and the symbol $W(n)$). For any edge $e$, $\mathit{rel}(e)$ is a fleximap describing the relationship between the embeddings of two neighbouring parts (or, more precisely, of their nodes $A(F(e))$ and $A(S(e))$). For any symbol $\sigma$, $\mathit{symm}(\sigma)$ is its symmetry group, $\mathit{tem}(\sigma)$ is its template, and $\mathit{in}(\sigma)$ is $(I, g)$, where $I$ is the identity transformation and $g$ is called the inertial metric of $\sigma$ and determines how $\sigma$ moves in response to forces (see §7.7).

The Match function, defined below in §7.3, defines how well a given embedding token matches a given embedding type.
xGiven an embedding type v = (con1, rel1, symm1, tem1, in1) and an embedding token
u for a network (Σ, N, H, E, W, P, A, F, S), we can define an embedding type v � u for the
same network by v � u = (con2, rel2, symm1, tem2, in2), where8n2N con2(n) = u(W(n))�1 � con1(n) � u(n)8e2E rel2(e) = u(A(S(e)))�1 � rel1(e) � u(A(F(e)))8σ2Σ tem2(σ) = tem1(σ) Æ u(σ)8σ2Σ in2(σ) = u(σ)�1 � in1(σ) � u(σ)
Given an embedding type v = (con, rel, symm, tem, in) for a network N 0 and a homo
morphism f :N ! N 0, we can define an embedding type v Æ f for N by
v Æ f = (con Æ f , rel Æ f , symm Æ f , tem Æ f , in Æ f ).
LEMMA 1. If f :N ! N 0 is a homomorphism, v is an embedding type for N 0, and u is an
embedding token for N 0, then
(v � u) Æ f = (v Æ f ) � (u Æ f ).
Proof. Let N = (Σ, N, H, E, W, P, A, F, S), N′ = (Σ′, N′, H′, E′, W′, P′, A′, F′, S′) and v = (con₁, rel₁, symm₁, tem₁, in₁). Then v · u = (con₂, rel₂, symm₁, tem₂, in₂), where

    ∀n∈N′  con₂(n) = u(W′(n))⁻¹ · con₁(n) · u(n)
    ∀e∈E′  rel₂(e) = u(A′(S′(e)))⁻¹ · rel₁(e) · u(A′(F′(e)))
    ∀σ∈Σ′  tem₂(σ) = tem₁(σ) ∘ u(σ)
    ∀σ∈Σ′  in₂(σ) = u(σ)⁻¹ · in₁(σ) · u(σ)

and hence

    (v · u) ∘ f = (con₂ ∘ f, rel₂ ∘ f, symm₁ ∘ f, tem₂ ∘ f, in₂ ∘ f).

We also have v ∘ f = (con₁ ∘ f, rel₁ ∘ f, symm₁ ∘ f, tem₁ ∘ f, in₁ ∘ f), and hence (v ∘ f) · (u ∘ f) = (con₃, rel₃, symm₁ ∘ f, tem₃, in₃), where

    ∀n∈N  con₃(n) = (u ∘ f)(W(n))⁻¹ · (con₁ ∘ f)(n) · (u ∘ f)(n)
                  = u(f(W(n)))⁻¹ · con₁(f(n)) · u(f(n))
                  = u(W′(f(n)))⁻¹ · con₁(f(n)) · u(f(n))
                  = con₂(f(n)) = (con₂ ∘ f)(n)
    ∀e∈E  rel₃(e) = (u ∘ f)(A(S(e)))⁻¹ · (rel₁ ∘ f)(e) · (u ∘ f)(A(F(e)))
                  = u(f(A(S(e))))⁻¹ · rel₁(f(e)) · u(f(A(F(e))))
                  = u(A′(S′(f(e))))⁻¹ · rel₁(f(e)) · u(A′(F′(f(e))))
                  = rel₂(f(e)) = (rel₂ ∘ f)(e)
    ∀σ∈Σ  tem₃(σ) = (tem₁ ∘ f)(σ) ∘ (u ∘ f)(σ)
                  = tem₁(f(σ)) ∘ u(f(σ))
                  = tem₂(f(σ)) = (tem₂ ∘ f)(σ)
    ∀σ∈Σ  in₃(σ) = (u ∘ f)(σ)⁻¹ · (in₁ ∘ f)(σ) · (u ∘ f)(σ)
                  = u(f(σ))⁻¹ · in₁(f(σ)) · u(f(σ))
                  = in₂(f(σ)) = (in₂ ∘ f)(σ)

which shows that (v · u) ∘ f = (v ∘ f) · (u ∘ f). ∎
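The lemma can be checked numerically in a simple special case. The sketch below (all names hypothetical) represents embeddings as 3×3 homogeneous affine matrices and, for illustration only, treats the in component of an embedding type as a single affine map rather than a fleximap; it then evaluates (v · u) ∘ f and (v ∘ f) · (u ∘ f) on that component by the two different routes and compares the results.

```python
import numpy as np

rng = np.random.default_rng(0)

def affine():
    # a random invertible affine map of the plane, as a 3x3 homogeneous matrix
    M = np.eye(3)
    M[:2, :2] = rng.normal(size=(2, 2)) + 2 * np.eye(2)
    M[:2, 2] = rng.normal(size=2)
    return M

inv = np.linalg.inv

Sigma  = ["a", "b"]                   # symbols of N
SigmaP = ["x", "y", "z"]              # symbols of N'
f = {"a": "x", "b": "z"}              # symbol part of a homomorphism f: N -> N'

u   = {t: affine() for t in SigmaP}   # embedding token for N'
in1 = {t: affine() for t in SigmaP}   # 'in' component of an embedding type v for N'

# route 1: form the action v . u over N', then compose with f
in2 = {t: inv(u[t]) @ in1[t] @ u[t] for t in SigmaP}
lhs = {s: in2[f[s]] for s in Sigma}

# route 2: pull v and u back along f, then form (v o f) . (u o f) over N
rhs = {s: inv(u[f[s]]) @ in1[f[s]] @ u[f[s]] for s in Sigma}

agree = all(np.allclose(lhs[s], rhs[s]) for s in Sigma)
```

Both routes produce the same map for every symbol of N, as the lemma predicts for the in component.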
7. The Recognition Problem
7.1 Introduction
Given an image, we wish to detect all the symbol tokens present in it, determine how
they are embedded in it and how they are related to one another, and relate the system
of symbol tokens to the grammar. This is the recognition problem. To restate this using
the terminology of §6: we need to construct the pattern, determine the embedding token of the pattern, and construct the homomorphism from the pattern to the grammar. The recognition problem will be stated formally in §7.4.

While the pattern is being constructed it may temporarily be in an inconsistent state: it may be incoherent or indefinite (these terms were defined in §6.2). Also, the symbols,
nodes and edges of the pattern may be only `tentatively present': this is represented by
inclusion functions (e.g., i(σ) is the degree to which it is believed that a symbol token σ should be present in the pattern). However, by the end of the recognition process the
pattern must become coherent and definite, and every symbol, node and edge must be
unequivocally present.
The recognition process is governed by a Match function, which measures how well
a pattern matches the given image and grammar. The aim of recognition is to maximise
this function. Only one aspect of the recognition problem is solved in this paper: for a
given pattern and homomorphism, the optimal embedding token for the pattern is found
by a process of gradient ascent on the Match function (see §7.7).
7.2 Inclusion functions
During the course of recognition, the pattern is accompanied by inclusion functions
i: Σ ∪ N ∪ E → [0, 1] and j: Σ ∪ N ∪ H → [0, 1], which are subject to the conditions

    ∀σ∈Σ  i(σ) = j(σ) + Σ_{n∈P⁻¹({σ})} i(n)
    ∀n∈N  i(W(n)) = j(n) + i(n)
    ∀h∈H  i(A(h)) = j(h) + Σ_{e∈F⁻¹({h})∪S⁻¹({h})} i(e)
These functions reflect the uncertainty during recognition concerning which parts of the
pattern should really be present: i(σ) represents the degree of belief that σ should be
present, and similarly for i(n) and i(e) (there is no need for an inclusion value for hooks
because a hook is present iff its node is). The j function is concerned with missing
parts: j(h) represents the degree of belief that the hook h should be present but that
the correct edge to be connected to it has not yet been found; j(n) represents the degree
of belief that the symbol W(n) should be present but the node n should not be; j(σ) represents the degree of belief that σ should be present but as a top-level symbol, i.e., with P⁻¹({σ}) = ∅.

At the end of recognition we must have ∀x∈Σ∪N∪E i(x) = 1, i.e., complete certainty.
In addition there is a function B: H → ℝ that imposes a penalty on bare hooks: if a hook h has no edges attached to it then a penalty of B(h) is incurred. When a node
is first formed, its hooks have low values of B(h), since they cannot be expected to have acquired edges yet; but if they remain bare, the value of B(h) is increased to force them
to acquire edges. At the end of the recognition process every hook must have an edge,
to satisfy the definiteness condition.
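As a concrete illustration of the constraints in §7.2, here is a minimal sketch (all network data hypothetical): one whole symbol with two part nodes joined by a single edge, and i, j as they must stand at the end of recognition.

```python
# Tiny network: whole symbol "A" with part nodes n1 (symbol "B") and n2
# (symbol "C"); each node has one hook, and one edge e1 joins the two hooks.
Sigma = ["A", "B", "C"]
N = ["n1", "n2"]
H = ["h1", "h2"]
E = ["e1"]
P = {"n1": "B", "n2": "C"}      # symbol instantiated by each node
W = {"n1": "A", "n2": "A"}      # whole symbol owning each node
A = {"h1": "n1", "h2": "n2"}    # node owning each hook
F = {"e1": "h1"}                # first hook of each edge
S = {"e1": "h2"}                # second hook of each edge

def consistent(i, j, tol=1e-9):
    """Check the three conditions relating the inclusion functions i and j."""
    ok = all(abs(i[s] - j[s] - sum(i[n] for n in N if P[n] == s)) < tol
             for s in Sigma)
    ok = ok and all(abs(i[W[n]] - j[n] - i[n]) < tol for n in N)
    ok = ok and all(abs(i[A[h]] - j[h]
                        - sum(i[e] for e in E if F[e] == h or S[e] == h)) < tol
                    for h in H)
    return ok

# End of recognition: every symbol, node and edge unequivocally present.
# j("A") = 1 records that A is present as a top-level symbol.
i = {"A": 1.0, "B": 1.0, "C": 1.0, "n1": 1.0, "n2": 1.0, "e1": 1.0}
j = {"A": 1.0, "B": 0.0, "C": 0.0, "n1": 0.0, "n2": 0.0, "h1": 0.0, "h2": 0.0}
```

Here consistent(i, j) holds; it fails if, say, the edge is made only tentatively present (i(e1) = 0.5) without a compensating change to j on its hooks.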
7.3 The match function between an embedding type and an embedding token
Given a network N = (Σ, N, H, E, W, P, A, F, S), an image I, an embedding token u for N, and an embedding type v = (con, rel, symm, tem, in) for N, the match function, which measures how well u matches v and I, is defined by

    Match(I, N, u, v) = Σ_{σ∈Σ} [ i(σ) ρ_{I,tem(σ)}(u(σ)) − j(σ) θ ]
                      − Σ_{n∈N} i(n) E_{con(n)}(u(W(n))⁻¹ · u(n))
                      − Σ_{h∈H} j(h) B(h)
                      − Σ_{e∈E} i(e) [ E_{rel(e)}(u(A(S(e)))⁻¹ · u(A(F(e)))) + E_{in(W(A(F(e))))}(u(W(A(S(e))))⁻¹ · u(W(A(F(e))))) ]

where θ is a positive real constant. The symmetry condition for N, u, v is

    ∀n∈N  u(P(n))⁻¹ · u(n) ∈ symm(P(n)).
The terms in the definition of Match can be explained as follows. For any symbol σ, u(σ) is the embedding of σ in the image plane. For any node n, u(n) is equal to u(P(n)), the embedding of the symbol P(n), possibly adjusted by a symmetry transformation (a member of symm(P(n))). The term ρ_{I,tem(σ)}(u(σ)) measures how well the template of σ, embedded in the image by u(σ), matches the image (see §4). The term θ is a fixed penalty incurred by any top-level symbol, i.e., any symbol that is not a part of any other symbol. The term E_{con(n)}(u(W(n))⁻¹ · u(n)) measures how well the part–whole relationship between P(n) and W(n) matches the fleximap con(n). The term B(h) is a penalty incurred by a hook h with no incident edges (see §7.2). The term E_{rel(e)}(u(A(S(e)))⁻¹ · u(A(F(e)))) measures how well the neighbourhood relation between the symbols P(A(F(e))) and P(A(S(e))) matches the fleximap rel(e). The final term, E_{in(W(A(F(e))))}(u(W(A(S(e))))⁻¹ · u(W(A(F(e))))), is only nonzero if the network is not coherent: if an edge e has W(A(S(e))) ≠ W(A(F(e))), then this term penalises the difference in embedding between W(A(S(e))) and W(A(F(e))). By the end of the recognition process, the symbols W(A(S(e))) and W(A(F(e))) will either have been forced together by this term and merged, or the edge will have been removed.

All these terms are weighted by the inclusion functions i(σ), j(σ), i(n), j(h), i(e), to reflect the different degrees to which σ, etc., are believed to be present.
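The shape of this objective can be sketched as follows (a schematic only: the callables rho, E_con, E_rel and E_in stand in for the template-match term ρ_{I,tem(σ)}(u(σ)) and the three fleximap-energy terms, with the image, embedding token and embedding type folded into them; all names are hypothetical).

```python
def match(Sigma, N, H, E, i, j, B, theta, rho, E_con, E_rel, E_in):
    """Schematic of the Match function of section 7.3: a template-match reward
    per symbol, minus penalties for top-level symbols, part-whole mismatch,
    bare hooks, and neighbourhood/incoherence mismatch, each weighted by the
    inclusion functions i and j."""
    total = sum(i[s] * rho(s) - j[s] * theta for s in Sigma)
    total -= sum(i[n] * E_con(n) for n in N)
    total -= sum(j[h] * B[h] for h in H)
    total -= sum(i[e] * (E_rel(e) + E_in(e)) for e in E)
    return total

# A perfectly matching toy configuration: one whole "A" with one part node n
# whose hook h is still bare (j[h] = 1), zero fleximap energies, rho = 1.
value = match(
    Sigma=["A", "B"], N=["n"], H=["h"], E=[],
    i={"A": 1.0, "B": 1.0, "n": 1.0},
    j={"A": 1.0, "B": 0.0, "h": 1.0},
    B={"h": 0.5}, theta=0.25,
    rho=lambda s: 1.0, E_con=lambda n: 0.0,
    E_rel=lambda e: 0.0, E_in=lambda e: 0.0,
)
# value = (1 - 0.25) + 1 - 0.5 = 1.25: the top-level penalty on "A" and the
# bare-hook penalty on h are the only deductions.
```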
LEMMA 1. For any image I, network N, embedding tokens u, u′ for N, and embedding type v for N,

    Match(I, N, u · u′, v · u′) = Match(I, N, u, v),

provided u′ ∘ W ∘ A ∘ F = u′ ∘ W ∘ A ∘ S.
Proof. Let v = (con₁, rel₁, symm₁, tem₁, in₁); then v · u′ = (con₂, rel₂, symm₁, tem₂, in₂), where

    ∀n∈N  con₂(n) = u′(W(n))⁻¹ · con₁(n) · u′(n)
    ∀e∈E  rel₂(e) = u′(A(S(e)))⁻¹ · rel₁(e) · u′(A(F(e)))
    ∀σ∈Σ  tem₂(σ) = tem₁(σ) ∘ u′(σ)
    ∀σ∈Σ  in₂(σ) = u′(σ)⁻¹ · in₁(σ) · u′(σ)

Now, first, for any σ∈Σ,

    ρ_{I,tem₂(σ)}((u · u′)(σ)) = ρ_{I,tem₁(σ)∘u′(σ)}(u(σ) · u′(σ)) = ρ_{I,tem₁(σ)}(u(σ))

using lemma 4.1.

Secondly, for any n∈N,

    E_{con₂(n)}((u · u′)(W(n))⁻¹ · (u · u′)(n))
      = E_{u′(W(n))⁻¹·con₁(n)·u′(n)}(u′(W(n))⁻¹ · u(W(n))⁻¹ · u(n) · u′(n))
      = E_{con₁(n)}(u(W(n))⁻¹ · u(n))

by lemma 5.1.

Thirdly, for any e∈E, using the abbreviations n₁ = A(F(e)) and n₂ = A(S(e)),

    E_{rel₂(e)}((u · u′)(n₂)⁻¹ · (u · u′)(n₁))
      = E_{u′(n₂)⁻¹·rel₁(e)·u′(n₁)}(u′(n₂)⁻¹ · u(n₂)⁻¹ · u(n₁) · u′(n₁))
      = E_{rel₁(e)}(u(n₂)⁻¹ · u(n₁))

and, using the abbreviations σ₁ = W(A(F(e))) and σ₂ = W(A(S(e))),

    E_{in₂(σ₁)}((u · u′)(σ₂)⁻¹ · (u · u′)(σ₁))
      = E_{u′(σ₁)⁻¹·in₁(σ₁)·u′(σ₁)}(u′(σ₂)⁻¹ · u(σ₂)⁻¹ · u(σ₁) · u′(σ₁))
      = E_{in₁(σ₁)}(u(σ₂)⁻¹ · u(σ₁))

since u′(σ₁) = u′(σ₂).

Thus each term of Match(I, N, u · u′, v · u′) equals the corresponding term of Match(I, N, u, v). ∎
7.4 The recognition problem
Given a coherent network N₀ (representing a grammar), an embedding type v for N₀, and an image I, the recognition problem is to find a coherent and definite network N₁ (representing a pattern), with inclusion function i satisfying ∀x∈Σ₁∪N₁∪E₁ i(x) = 1, a homomorphism p: N₁ → N₀ (representing a parse of the pattern according to the grammar), and an embedding token u for N₁, that maximises Match(I, N₁, u, v ∘ p), subject to the symmetry condition for N₁, u, v ∘ p.
7.5 Symmetries

A symmetry of a network N = (Σ, N, H, E, W, P, A, F, S), with respect to the embedding type v = (con, rel, symm, tem, in), is an automorphism a: N → N together with an embedding token s for N, such that

    ∀σ∈Σ  s(σ) ∈ symm(σ),
    ∀n∈N  s(n) ∈ symm(P(n)),
    v · s = v ∘ a.
LEMMA 2. (Global application of a symmetry.) Given a symmetry a, s of the grammar N₀ with respect to the embedding type v, a homomorphism p: N₁ → N₀, and an embedding token u for N₁, we can produce a new parse p′ = a ∘ p: N₁ → N₀ and embedding token u′ = u · (s ∘ p). Provided the grammar is coherent, this gives

    Match(I, N₁, u′, v ∘ p′) = Match(I, N₁, u, v ∘ p),

and the symmetry condition holds for N₁, u′, v ∘ p′ iff it holds for N₁, u, v ∘ p.
Proof. Let N₀ = (Σ₀, N₀, H₀, E₀, W₀, P₀, A₀, F₀, S₀) and N₁ = (Σ₁, N₁, H₁, E₁, W₁, P₁, A₁, F₁, S₁). Applying the definition of a symmetry, followed by lemma 6.1 and lemma 1, gives

    Match(I, N₁, u′, v ∘ p′) = Match(I, N₁, u · (s ∘ p), v ∘ a ∘ p)
                             = Match(I, N₁, u · (s ∘ p), (v · s) ∘ p)
                             = Match(I, N₁, u · (s ∘ p), (v ∘ p) · (s ∘ p))
                             = Match(I, N₁, u, v ∘ p).

Note that the application of lemma 1 is justified since we have s ∘ p ∘ W₁ ∘ A₁ ∘ F₁ = s ∘ W₀ ∘ A₀ ∘ F₀ ∘ p = s ∘ W₀ ∘ A₀ ∘ S₀ ∘ p = s ∘ p ∘ W₁ ∘ A₁ ∘ S₁, since N₀ is coherent.

The symmetry condition is checked as follows. Let v = (con, rel, symm, tem, in). For any n∈N₁, we have

    u′(P₁(n))⁻¹ · u′(n) = s(p(P₁(n)))⁻¹ · u(P₁(n))⁻¹ · u(n) · s(p(n)).

Note that, since a, s is a symmetry of N₀, we have s(p(P₁(n))) ∈ symm(p(P₁(n))) and s(p(n)) ∈ symm(P₀(p(n))) = symm(p(P₁(n))), and symm = symm ∘ a. Now, the symmetry condition for N₁, u, v ∘ p requires

    u(P₁(n))⁻¹ · u(n) ∈ symm(p(P₁(n))),

whereas the symmetry condition for N₁, u′, v ∘ p′ requires

    u′(P₁(n))⁻¹ · u′(n) ∈ symm(a(p(P₁(n)))).

From the above information we see that these two conditions are equivalent. ∎
7.6 The derivative of the match function

Given a network N = (Σ, N, H, E, W, P, A, F, S), an image I, an embedding token u for N, and an embedding type v = (con, rel, symm, tem, in) for N, we can consider the derivative of the Match function with respect to the embeddings of the symbols: that is, we can consider the change in Match(I, N, u, v) produced by a small change in u.

Let us introduce the following abbreviations:

    ∀n∈N  S_n = u(P(n))⁻¹ · u(n)
    ∀n∈N  C_n = u(W(n))⁻¹ · u(n)
    ∀e∈E  R_e = u(A(S(e)))⁻¹ · u(A(F(e)))
    ∀e∈E  R′_e = u(W(A(S(e))))⁻¹ · u(W(A(F(e))))

This gives

    Match(I, N, u, v) = Σ_{σ∈Σ} [ i(σ) ρ_{I,tem(σ)}(u(σ)) − j(σ) θ ]
                      − Σ_{n∈N} i(n) E_{con(n)}(C_n)
                      − Σ_{h∈H} j(h) B(h)
                      − Σ_{e∈E} i(e) [ E_{rel(e)}(R_e) + E_{in(W(A(F(e))))}(R′_e) ]

and the symmetry condition is ∀n∈N S_n ∈ symm(P(n)).
Now, consider the effect of making a small change in the embedding token from u to u · ∆u, where

    ∀σ∈Σ  ∆u(σ) = exp(εV_σ)
    ∀n∈N  ∆u(n) = exp(εV_n)

where ε is a real number in a neighbourhood of 0 and V_σ, V_n ∈ A. The movement V_σ of each symbol σ may be chosen arbitrarily, but, in order to preserve the symmetry condition, the movement V_n of each node n must then be V_n = Ad(S_n⁻¹)(V_{P(n)}), thus leaving S_n invariant.
THEOREM 3. For ∆u defined as above,

    Match(I, N, u · ∆u, v) = Match(I, N, u, v) + ε Σ_{σ∈Σ} F_σ(V_σ) + o(ε)

where, for each σ∈Σ, n∈N and e∈E,

    F_σ  = F⁰_σ + F¹_σ + F²_σ + F³_σ ∈ F
    F⁰_σ = i(σ) ρ′_{I,tem(σ)}(u(σ))
    F¹_σ = − Σ_{n∈W⁻¹({σ})} Ad(C_n⁻¹)†(F_n) + Σ_{n∈P⁻¹({σ})} Ad(S_n⁻¹)†(F_n)
    F²_σ = Σ_{n∈P⁻¹({σ})} Ad(S_n⁻¹)†( − Σ_{e∈S⁻¹(A⁻¹({n}))} Ad(R_e⁻¹)†(F_e) + Σ_{e∈F⁻¹(A⁻¹({n}))} F_e )
    F³_σ = − Σ_{e∈S⁻¹(A⁻¹(W⁻¹({σ})))} Ad(R′_e⁻¹)†(F′_e) + Σ_{e∈F⁻¹(A⁻¹(W⁻¹({σ})))} F′_e
    F_n  = −i(n) E′_{con(n)}(C_n)
    F_e  = −i(e) E′_{rel(e)}(R_e)
    F′_e = −i(e) E′_{in(W(A(F(e))))}(R′_e)
Proof. The effect of the change u ↦ u · ∆u on C_n is

    C_n ↦ exp(−εV_{W(n)}) · C_n · exp(εV_n)
        = C_n · exp(−ε Ad(C_n⁻¹)(V_{W(n)})) · exp(εV_n)
        = C_n · exp( −ε Ad(C_n⁻¹)(V_{W(n)}) + εV_n + o(ε) )

using lemma 3.10. The effect of the change u ↦ u · ∆u on R_e is

    R_e ↦ exp(−εV_{A(S(e))}) · R_e · exp(εV_{A(F(e))})
        = R_e · exp(−ε Ad(R_e⁻¹)(V_{A(S(e))})) · exp(εV_{A(F(e))})
        = R_e · exp(εV_e + o(ε))

where V_e = −Ad(R_e⁻¹)(V_{A(S(e))}) + V_{A(F(e))}, using lemma 3.10 again. The effect of the change u ↦ u · ∆u on R′_e is

    R′_e ↦ exp(−εV_{W(A(S(e)))}) · R′_e · exp(εV_{W(A(F(e)))})
         = R′_e · exp(−ε Ad(R′_e⁻¹)(V_{W(A(S(e)))})) · exp(εV_{W(A(F(e)))})
         = R′_e · exp(εV′_e + o(ε))

where V′_e = −Ad(R′_e⁻¹)(V_{W(A(S(e)))}) + V_{W(A(F(e)))}, by lemma 3.10. We are interested in calculating the consequent change in the value of the match function. Using lemma 3.11,

    Match(I, N, u · ∆u, v) = Match(I, N, u, v)
        + ε Σ_{σ∈Σ} i(σ) ρ′_{I,tem(σ)}(u(σ))(V_σ)
        − ε Σ_{n∈N} i(n) E′_{con(n)}(C_n)(−Ad(C_n⁻¹)(V_{W(n)}) + V_n)
        − ε Σ_{e∈E} i(e) [ E′_{rel(e)}(R_e)(V_e) + E′_{in(W(A(F(e))))}(R′_e)(V′_e) ]
        + o(ε).                                                        (∗)
Using the above definition of F_n, the sum over nodes in equation (∗) can be rewritten:

    −ε Σ_{n∈N} i(n) E′_{con(n)}(C_n)(−Ad(C_n⁻¹)(V_{W(n)}) + V_n)
      = −ε Σ_{n∈N} F_n(Ad(C_n⁻¹)(V_{W(n)})) + ε Σ_{n∈N} F_n(V_n)
      = −ε Σ_{n∈N} F_n(Ad(C_n⁻¹)(V_{W(n)})) + ε Σ_{n∈N} F_n(Ad(S_n⁻¹)(V_{P(n)}))
      = −ε Σ_{n∈N} Ad(C_n⁻¹)†(F_n)(V_{W(n)}) + ε Σ_{n∈N} Ad(S_n⁻¹)†(F_n)(V_{P(n)})
      = ε Σ_{σ∈Σ} F¹_σ(V_σ).

Similarly, using the above definition of F_e, we can rewrite part of the sum over edges in equation (∗) as

    −ε Σ_{e∈E} i(e) E′_{rel(e)}(R_e)(V_e)
      = ε Σ_{e∈E} F_e(−Ad(R_e⁻¹)(V_{A(S(e))}) + V_{A(F(e))})
      = −ε Σ_{e∈E} F_e(Ad(R_e⁻¹)(V_{A(S(e))})) + ε Σ_{e∈E} F_e(V_{A(F(e))})
      = −ε Σ_{e∈E} F_e(Ad(R_e⁻¹)(Ad(S_{A(S(e))}⁻¹)(V_{P(A(S(e)))}))) + ε Σ_{e∈E} F_e(Ad(S_{A(F(e))}⁻¹)(V_{P(A(F(e)))}))
      = −ε Σ_{e∈E} Ad(S_{A(S(e))}⁻¹)†(Ad(R_e⁻¹)†(F_e))(V_{P(A(S(e)))}) + ε Σ_{e∈E} Ad(S_{A(F(e))}⁻¹)†(F_e)(V_{P(A(F(e)))})
      = ε Σ_{σ∈Σ} F²_σ(V_σ).
Using the above definition of F′_e, we can rewrite the other part of the sum over edges in equation (∗):

    −ε Σ_{e∈E} i(e) E′_{in(W(A(F(e))))}(R′_e)(V′_e)
      = ε Σ_{e∈E} F′_e(−Ad(R′_e⁻¹)(V_{W(A(S(e)))}) + V_{W(A(F(e)))})
      = −ε Σ_{e∈E} F′_e(Ad(R′_e⁻¹)(V_{W(A(S(e)))})) + ε Σ_{e∈E} F′_e(V_{W(A(F(e)))})
      = −ε Σ_{e∈E} Ad(R′_e⁻¹)†(F′_e)(V_{W(A(S(e)))}) + ε Σ_{e∈E} F′_e(V_{W(A(F(e)))})
      = ε Σ_{σ∈Σ} F³_σ(V_σ).

Hence equation (∗) finally reduces to

    Match(I, N, u · ∆u, v) = Match(I, N, u, v) + ε Σ_{σ∈Σ} F_σ(V_σ) + o(ε)

as required. ∎
Thus F_σ ∈ F measures the effect on the value of the Match function, to first order in ε, of the change in the embedding of σ. F_σ is called the force on σ, and consists of four components: F⁰_σ, the force exerted by σ's template; F¹_σ, the force exerted by the part–whole fleximaps in which σ participates; F²_σ, the force exerted by the neighbourhood fleximaps in which σ participates; and F³_σ, the force that pulls two symbols W(A(F(e))) and W(A(S(e))) together whenever an edge e has W(A(F(e))) ≠ W(A(S(e))).
7.7 Gradient ascent on the match function

The aim of gradient ascent is to make a small change to the embedding token u of the pattern so as to increase Match(I, N₁, u, v ∘ p) as much as possible (while preserving the symmetry condition). Let N₁ = (Σ₁, N₁, H₁, E₁, W₁, P₁, A₁, F₁, S₁) and v = (con, rel, symm, tem, in). As shown in theorem 3, the effect of changing the embeddings of the symbol tokens by

    ∀σ∈Σ₁  u(σ) ↦ u(σ) · exp(εV_σ)

(with a corresponding change in the embeddings of the nodes) is to produce a change in Match(I, N₁, u, v ∘ p) of ε Σ_{σ∈Σ₁} F_σ(V_σ), to first order in ε.

The cost of making this change is defined as ½ε Σ_{σ∈Σ₁} m_σ g_σ(V_σ)(V_σ), where m_σ is the mass of σ (see §4.5) and (I, g_σ) = in(p(σ)), i.e., g_σ is the inertial metric tensor for σ provided by v ∘ p. Hence we wish to choose the V_σ's to maximise

    ε Σ_{σ∈Σ₁} F_σ(V_σ) − ½ε Σ_{σ∈Σ₁} m_σ g_σ(V_σ)(V_σ)
      = ½ε Σ_{σ∈Σ₁} (1/m_σ) F_σ(g_σ⁻¹(F_σ)) − ½ε Σ_{σ∈Σ₁} (1/m_σ) (m_σ g_σ(V_σ) − F_σ)(m_σ V_σ − g_σ⁻¹(F_σ)).

This is maximised by taking V_σ = (1/m_σ) g_σ⁻¹(F_σ) for each σ.
7.8 The grand conclusion

Given a grammar N₀, an embedding type v for N₀, and an image I, the recognition problem is to construct a pattern N₁, a parse homomorphism p: N₁ → N₀, and an embedding token u for N₁. This paper has solved one part of the problem: it has shown that u can be optimised incrementally by applying the gradient ascent rule

    ∀σ∈Σ₁  u(σ) ↦ u(σ) · exp(εV_σ)
    ∀n∈N₁  u(n) ↦ u(n) · exp(ε Ad(S_n⁻¹)(V_{P(n)})),

where

    V_σ = (1/m_σ) g_σ⁻¹(F_σ)

and F_σ is the force on σ, defined in theorem 3.