(45) HPP-73-3 CS-73-361
AN ALGORITHM FOR THE CONSTRUCTION OF THE GRAPHS OF ORGANIC MOLECULES
by
HAROLD BROWN
LARRY MASINTER
STAN-CS-73-361
May 1973
COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY
ABSTRACT:
A description and a formal proof of an efficient computer implemented
algorithm for the construction of graphs is presented. This algorithm,
which is part of a program for the automated analysis of organic
compounds, constructs all of the non-isomorphic, connected multi-graphs
which are based on a given degree sequence of nodes and which arise from
a relatively small "catalog" of certain canonical graphs. For the graphs
of the more common organic molecules, a catalog of most of the canonical
graphs is known, and the algorithm can produce all of the distinct
valence isomers of these organic molecules.
This work was supported in part by ARPA Contract SD-183 and NSF Grant GP
I. Introduction. The usual problem in analytical organic chemistry
is to determine the molecular structure of an unknown organic compound.
The heuristic DENDRAL program for explaining empirical data [1,2] is a
machine implemented program which applies artificial intelligence
techniques to this problem of molecular structure determination. The
primary input to this program is the mass spectrum of the unknown
compound. Secondary inputs, if available, include the nuclear magnetic
resonance spectrum and the elementary formula of the compound. The output
of the program is a list consisting of one or more molecular graphs, each
of which represents a heuristically plausible explanation of the given
input. These graphs are rank ordered with respect to their relative
plausibility scores. Central to this DENDRAL program are algorithms which
generate all of the distinct valence isomers of a given set of atoms. It
is these algorithms that we will describe in this paper.
In graph theoretical terms, the problem that we consider is the
following: Given a degree sequence of nodes which corresponds to the
valences of the atoms of a given organic compound, algorithmically construct
a representative set of the distinct isomorphism classes of the connected,
loop-free multi-graphs based on that degree sequence. (For brevity, such
graphs will be referred to as clf-graphs.) Moreover, for pragmatic
reasons, the algorithm should:
a) Be machine implementable in a reasonably efficient manner.
b) Be able to accept as additional input certain constraints
arising from the chemical heuristics of the situation,
for example, generate only those graphs which contain a certain
substructure.
c) Generate no isomorphic graphs in the intermediate stages of the
algorithm.
This last restriction is necessary because of the time and the storage
limitations when using a machine.
In section II of the paper we define the concept of a superatom, and
we briefly describe the subalgorithms used by the main algorithm. In
section III, first a conceptual description and then a formal description
of the main algorithm is given. Section IV contains the graph theoretical
results on which the main algorithm is based, and in section V a formal
proof of the completeness and the correctness of the main algorithm is
presented.
II. Terminology and Subalgorithms. The concept of a superatom is used
throughout the paper. A superatom is a collection of atoms bonded
together into a cyclic structure which possesses at least one unassigned
valence, i.e., at least one free valence. Graph theoretically, a
superatom is a connected, loop-free multi-graph with no isthmuses, i.e.,
with no edge whose deletion disconnects the graph, and with at least one
unassigned valence. If a superatom A has k free valences, then in forming
molecular structures which include A, A behaves almost like an atom of
valence k. The difference between forming structures including A and forming
structures including an atom of valence k is the following: The k free
valences on an atom of valence k are, as edge endpoints in a graph,
indistinguishable, i.e., barring optical isomerism, the free valences on
the atom implicitly admit as symmetry group the group $S_k$, the full
permutation group on k objects. However, the k free valences on the
superatom A usually are distinguishable from a symmetry viewpoint, and
the free valences on A will, in general, admit only a subgroup of $S_k$.
For example, the superatom in Figure 1 has three free valences, but the
group of symmetries of these free valences consists of only two
permutations, the identity and a single transposition.
Thus, associated with each superatom with k free valences is a subgroup
of $S_k$, namely, the group of symmetries on its free valences.
By the term tree we will always mean a graph whose nodes can
represent either atoms or superatoms and whose edges are all isthmuses.
Also, throughout the paper, a collection of atoms given by their
valences, i.e., a degree sequence, will be represented as an integral
n-vector $V = (a_1, a_2, \ldots, a_n)$, where $a_i$ is the number of atoms of
valence i in the collection.
For a degree sequence $V = (a_1, a_2, \ldots, a_n)$, the unsaturation of V,
U(V), is defined as $U(V) = \frac{1}{2}\bigl(2 + \sum_{i=1}^{n} (i-2)a_i\bigr)$. If there exists at
least one clf-graph based on V, i.e., a clf-graph with $a_i$ nodes of
valence i, then U(V) is precisely the minimal number of edges of any
clf-graph based on V which must be deleted in order to make the graph
into a tree of atoms.
The description of the main algorithm makes use of the following
subalgorithms:
a) A tree generator subalgorithm, TG. TG accepts as input
a list of valences corresponding to atoms and superatoms, each
superatom with its associated free valence group, and outputs all
non-isomorphic trees based on these atoms and superatoms. Such a
tree generator algorithm is described in [3].
b) A labeler subalgorithm, LAB. LAB accepts as input an ordered list
of m distinct symbols, a group H of permutations on these symbols,
and a list of m not necessarily distinct labels. It outputs all
non-equivalent, with respect to H, labelings of the given symbols
with the given labels subject to a set, possibly empty, of constraints
of the form that certain types of symbols must be labeled with certain
types of labels. Such a labeling algorithm is described in [4].
c) A "catalog", CAT, containing certain connected, loop-free, isthmus-free
multi-graphs (for brevity, clfif-graphs) underlying, in a sense
to be made explicit later, the graphs of organic molecules. A list
of these underlying graphs is known for most of the common organic
molecules, and consists of a relatively small number of graphs.
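As an illustration of the unsaturation U(V) defined above, the following is a minimal Python sketch. The function name `unsaturation` and the tuple encoding of V (position i-1 holds $a_i$) are assumptions made for this example, not part of the original program. The two sample vectors encode the atoms of benzene and hexane by valence.

```python
from fractions import Fraction


def unsaturation(v):
    """Return U(V) = (2 + sum_i (i - 2) * a_i) / 2 for V = (a_1, ..., a_n).

    v[i - 1] holds a_i, the number of atoms of valence i (hypothetical encoding).
    A Fraction is returned so that a non-integral U(V) stays visible.
    """
    total = 2 + sum((i - 2) * a for i, a in enumerate(v, start=1))
    return Fraction(total, 2)


# Six univalent H atoms and six tetravalent C atoms (benzene, C6H6).
benzene = (6, 0, 0, 6)
# Fourteen univalent H atoms and six tetravalent C atoms (hexane, C6H14).
hexane = (14, 0, 0, 6)

print(unsaturation(benzene))    # 4
print(unsaturation(hexane))     # 0
print(unsaturation((0, 0, 1)))  # 3/2, so no graph can be based on this vector
```

An unsaturation of 0 corresponds to the tree case, and a non-integral value signals a degree sequence on which no graph can be based.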
III. Main Algorithm. We give now a brief conceptual description of the main
algorithm. The input to the algorithm is a degree sequence, i.e., an integral
n-vector V.
1. Determine all distinct allowable partitions of V into atoms and
superatom sets with assigned free valences. These partitions
are based on the unsaturation of V.
2. For each superatom set, determine all the distinct allowable
allocations of the free valences to the atoms of the set.
3. For each such free valence allocation, determine recursively the
allowable sets of atoms remaining after the deletion of the
bivalent atoms and the "pruning" of any resulting loops. This
recursion is done until: a) the remaining bivalent atoms in any
clfif-graph based on the set must all be on edges, or b) one of
two special cases is encountered.
4. For each such set of atoms, if condition (a) terminates the recursion,
"look up" in the "catalog" all the clfif-graphs based on the
non-bivalent atoms in the set and for each such graph label the edges
with the bivalent atoms. If condition (b) terminates the recursion,
directly "write down" the allowable graphs.
5. For each such graph, recursively label the atoms with loops and the
loops and edges with bivalent atoms.
6. For each graph so obtained, label the atoms with the free valences.
7. For each set of atoms and superatoms obtained as above, use the tree
generator to construct all the non-isomorphic clf-graphs based on
these atoms and superatoms.
The following formal description of the main algorithm is presented as a
sequence of subalgorithms:
Definitions. Given $V = (a_1, a_2, \ldots, a_n)$, define
a) $U(V) = \bigl(2 + \sum_{i=1}^{n} (i-2)a_i\bigr)/2$.
b) $TD(V) = \sum_{i=1}^{n} i\,a_i$.
c) $N(V) = \sum_{i=1}^{n} a_i$.
d) $P(V) = a_1$.
e) $M(V) = \max\{\, 1 \le i \le n \mid a_i \ne 0 \,\}$.
Note that $U(V) = (2 + TD(V) - 2N(V))/2$.
A. Superatom Partitioner (SAP).
1. (Test) If U(V) is a non-negative integer and
$2M(V) \le TD(V)$, continue; else STOP.
2. If U(V) = 0, go to TG.
3. Do 12 for each non-trivial summand $\bar{V}$ of V such that
$P(\bar{V}) = 0$ and $N(\bar{V}) \ge 2$.
4. Do 11 for $1 \le k \le \lfloor N(\bar{V})/2 \rfloor$.
5. Do 10 for each distinct k-term ordered partition
of U(V), say $U(V) = u_1 + u_2 + \ldots + u_k$ where
$1 \le u_1 \le u_2 \le \ldots \le u_k$.
6. Do 9 for each distinct set $\{\, (V_i, u_i) \mid 1 \le i \le k \,\}$
where $\sum_{i=1}^{k} V_i = \bar{V}$ and $N(V_i) \ge 2$, $1 \le i \le k$.
7. Compute $FV_i = 2(U(V_i) - u_i)$, $1 \le i \le k$.
8. If $FV_i < \max\{1,\ 2M(V_i) - TD(V_i)\}$ for any
$1 \le i \le k$, go to 6.
9. Put $\{V; (V_1, FV_1), \ldots, (V_k, FV_k)\}$ on the list
GLSA(V).
10. Continue.
11. Continue.
12. Continue.
13. If P(V) = 0, put $\{V; (V, 0)\}$ on the list GLSA(V).
B. Free Valence Partitioner (FVP).
Input: $(V_i = (0, v_2, \ldots, v_n),\ FV_i)$ where $N(V_i) \ge 2$,
$2(U(V_i) - u_i) = FV_i$, $u_i \ge 1$, and
$FV_i \ge \max\{1,\ 2M(V_i) - TD(V_i)\}$.
1. Determine all sets of non-negative integers
$\{\, f_{jt} \mid c_j \le t \le j-2,\ j = 2, \ldots, n \,\}$, where
$c_j = \max\{0,\ j + (FV_i - TD(V_i))/2\}$, satisfying:
i) $\sum_{t=c_j}^{j-2} f_{jt} = v_j$, $j = 2, \ldots, n$.
ii) $\sum_{j=2}^{n} \sum_{t=c_j}^{j-2} t\,f_{jt} = FV_i$.
(Note that $f_{20} = v_2$.)
2. For each such set, form
$W = W(V_i, FV_i, \{f_{jt}\}) = (0, w_2, \ldots, w_n)$ where
$w_s = \sum_{s \le h \le n,\ h \ge c_h + s} f_{h,(h-s)}$, and put W on the list $GLFV(V_i, FV_i)$.
C. Loop-Bivalent Partitioner (LBP).
(LBP is applied recursively until either $m_1(W) = m_0(W) = 0$
or $m_1(W) < m_0(W)$.)
Input: $W = (0, w_2, \ldots, w_n)$ where $N(W) \ge 2$, U(W) is a
positive integer, and $2M(W) \le TD(W)$.
1. Compute $m_1(W) = \min\{\, w_2,\ \sum_{j=4}^{n} \lfloor (j-2)/2 \rfloor w_j \,\}$ and
$m_0(W) = \max\{\, 0,\ w_2 + (2M(W) - TD(W))/2 \,\}$.
2. If $m_0(W) > m_1(W)$, go to Special Case.
3. If $m_0(W) = m_1(W) = 0$, go to Catalog.
4. Do 9 for $p = m_0(W), \ldots, m_1(W)$.
5. Determine all sets of non-negative integers
$\{\, p_{jt} \mid 3 \le j \le n,\ 0 \le t \le \lfloor (j-2)/2 \rfloor \,\}$ satisfying:
i) $\sum_{t=0}^{\lfloor (j-2)/2 \rfloor} p_{jt} = w_j$, $3 \le j \le n$.
ii) $\sum_{j=3}^{n} \sum_{t=0}^{\lfloor (j-2)/2 \rfloor} t\,p_{jt} = p$.
(Note that $p_{30} = w_3$.)
6. Do 8 for each such set $\{p_{jt}\}$.
7. Form $Y = Y(W, p, \{p_{jt}\}) = (0, y_2, \ldots, y_n)$ where
$y_s = \sum_{m} p_{m,(m-s)/2}$, the sum taken over $s \le m \le n$ with $m - s$ divisible by 2.
(Note that $p_{20} = 0$.)
8. If $2M(Y) \le TD(Y)$, put Y on the list GLLB(W,p).
9. Continue.
D. Special Case (SC).
(Here $m_1(W) < m_0(W)$.)
1. If $m_1(W) = 0$, put the graph consisting of the ring
with N(W) bivalent nodes and its associated node
symmetry group on the list GLGPH(W) and go to Loop Labeler;
else go to 2.
2. If $m_1(W) = s > 0$, put the graph consisting of one
node of degree 2(s+1) with s+1 loops attached and
its associated edge-loop symmetry group on the list GLGPH(W)
and go to Bivalent Labeler.
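The branching in steps 1 to 4 of the Loop-Bivalent Partitioner can be sketched as follows. This is a hedged illustration only: the function names and the tuple encoding of W (position j-1 holds $w_j$) are assumptions of this example, and the sketch presumes the LBP input conditions hold, in particular that TD(W) is even.

```python
def total_degree(w):
    """TD(W) = sum_j j * w_j."""
    return sum(j * a for j, a in enumerate(w, start=1))


def max_valence(w):
    """M(W) = largest j with w_j != 0."""
    return max(j for j, a in enumerate(w, start=1) if a != 0)


def lbp_bounds(w):
    """Return (m0, m1) for W = (0, w_2, ..., w_n), assuming TD(W) is even."""
    w2 = w[1]
    m1 = min(w2, sum(((j - 2) // 2) * w[j - 1] for j in range(4, len(w) + 1)))
    m0 = max(0, w2 + (2 * max_valence(w) - total_degree(w)) // 2)
    return m0, m1


def lbp_branch(w):
    """Name the branch taken by LBP and, when it recurses, the loop counts p."""
    m0, m1 = lbp_bounds(w)
    if m0 > m1:
        return "special case", []
    if m0 == 0 and m1 == 0:
        return "catalog", []
    return "partition", list(range(m0, m1 + 1))


print(lbp_branch((0, 4)))        # ring of four bivalent nodes
print(lbp_branch((0, 2, 0, 1)))  # one tetravalent node, two bivalent nodes
print(lbp_branch((0, 0, 2)))     # two trivalent nodes
print(lbp_branch((0, 1, 2, 1)))
```

The first two vectors fall into the Special Case, the third goes to the catalog, and the last continues with p = 0 and p = 1.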
E. Catalog (CAT).
Input: $Y = (0, y_2, \ldots, y_n)$ where $m_1(Y) = m_0(Y) = 0$.
1. If $y_2 \ne 0$, "look up" all non-isomorphic clfif trivalent
graphs G with $y_3$ nodes and their associated edge
symmetry groups, EGRP(G); else go to 6.
2. Do 4 for each such graph G.
3. Determine all distinct ordered partitions of $y_2$
into TD(G)/2 non-negative integral terms.
4. For each such partition, say $b_1 + \ldots + b_{TD(G)/2} = y_2$,
use the general labeling subalgorithm to determine
all distinct, with respect to EGRP(G), labelings
of the edges of G with the labels $b_1, \ldots, b_{TD(G)/2}$,
replace the label $b_i$ on the labeled graphs by
$b_i$ bivalent nodes, and place the resultant graphs
and their node symmetry groups on the list GLGPH(Y).
5. Go to Loop-Bivalent Labeler.
6. If $y_2 = 0$, "look up" all non-isomorphic clfif-graphs
based on Y, and put these graphs and their associated
node symmetry groups on the list GLGPH(Y).
7. Go to Loop-Bivalent Labeler.
F. Loop-Bivalent Labeler (LBL).
(LBL is applied recursively to each graph G on GLGPH(Y)
up the path of vectors from which G arose until reaching
the $W = W(V_i, FV_i, \{f_{jt}\})$ on $GLFV(V_i, FV_i)$ heading the path.)
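Step 3 of CAT can be sketched as below. The sketch reads "ordered partition" as a partition whose terms are listed in non-decreasing order, by analogy with step 5 of SAP; that reading, the function name, and the signature are assumptions of this example. Distributing the resulting labels over the edges up to symmetry is the job of the labeler LAB and is not attempted here.

```python
def ordered_partitions(total, parts, smallest=0):
    """Yield every way to write `total` as `parts` non-negative terms
    b_1 <= b_2 <= ... <= b_parts, each term at least `smallest`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    if parts == 1:
        if total >= smallest:
            yield (total,)
        return
    # The first term is the smallest one, so it cannot exceed total / parts.
    for first in range(smallest, total // parts + 1):
        for rest in ordered_partitions(total - first, parts - 1, first):
            yield (first,) + rest


# Two trivalent nodes joined by a triple edge: TD(G)/2 = 3 edges.
# Distribute y_2 = 2 bivalent nodes over those 3 edges.
for labels in ordered_partitions(2, 3):
    print(labels)
```

For this example the generator yields (0, 0, 2) and (0, 1, 1): both bivalent nodes on one edge, or one each on two edges.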
Loop Labeler (LL).
Input: A clf-graph G, its node group NGRP(G) and the
vector $Y = Y(W, p, \{p_{jt}\})$ on GLL(W,p) from which
G arose.
1. Compute $n_t = \sum_{j=3}^{n} p_{jt}$, $t = 0, \ldots, \lfloor (n-2)/2 \rfloor$.
2. Determine all distinct, with respect to NGRP(G),
labelings of the nodes of G with $n_t$ t's subject to
the constraint that $p_{jt}$ t's must go on nodes of
valence j-2t.
3. For each such labeling,
replace the label t by t loops.
Bivalent Labeler (BL).
Input: A connected graph H with p loops, the edge-loop
group of H, ELGRP(H), and $w_2$, the number of bivalent
nodes in the vector W from which H arose.
1. Compute h = p + number of (non-loop) edges in H.
2. Do 4 for each ordered partition of $w_2$ into h
non-negative terms, say $w_2 = b_1 + \ldots + b_h$,
$b_1 \le \ldots \le b_h$, such that at least p of the $b_i$'s
are non-zero.
3. Determine all distinct, with respect to ELGRP(H),
labelings of the edges and loops of H with the
labels $\{\, b_i \mid 1 \le i \le h \,\}$ subject to the constraint
that each loop gets a positive label.
4. For each such labeling, replace the label $b_i$ by
$b_i$ bivalent nodes, $1 \le i \le h$, and place the
resultant graph and its node group on the list GLGPH(W).
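Steps 1 and 2 of the Bivalent Labeler can be illustrated with a small brute-force sketch. The function name and arguments are hypothetical, the label multisets are listed in non-decreasing order, and the symmetry reduction of step 3 (performed by LAB with respect to ELGRP(H)) is deliberately left out.

```python
from itertools import combinations_with_replacement


def bivalent_label_multisets(w2, non_loop_edges, loops):
    """Yield the label multisets (b_1 <= ... <= b_h) of BL step 2.

    h = loops + non_loop_edges; the labels sum to w2, and at least `loops`
    of them are non-zero so that every loop can receive a positive label.
    """
    h = loops + non_loop_edges
    for labels in combinations_with_replacement(range(w2 + 1), h):
        nonzero = sum(1 for b in labels if b > 0)
        if sum(labels) == w2 and nonzero >= loops:
            yield labels


# Special-case graph for s = 1: one node of degree 4 with two loops,
# no non-loop edges, and three bivalent nodes to place.
print(list(bivalent_label_multisets(3, 0, 2)))
```

With two loops and three bivalent nodes the only admissible multiset is (1, 2), since neither loop may be left without a bivalent node.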
G. Free Valence Labeler (FVL).
Input: A graph G, its node group NGRP(G), and the vector
$W = W(V_i, FV_i, \{f_{jt}\})$ from which G arose.
1. Compute $f_t = \sum_{j=2}^{n} f_{jt}$, $t = 0, \ldots, n + (FV_i - TD(V_i))/2$.
2. Determine all distinct, with respect to NGRP(G),
labelings of the nodes of G with $f_t$ t's subject to
the constraint that $f_{jt}$ t's must go on nodes of
valence j-t.
3. For each such labeling, replace each label t by t
free valences and put the resultant graph and its
free valence symmetry group on the list $GLGPH(V_i, FV_i)$.
Input for Tree Generator.
For $\{V; (V_1, FV_1), \ldots, (V_k, FV_k)\}$ on the list GLSA(V),
input to TG:
a) $V - \bar{V}$. (The atoms)
b) A k-tuple of graphs $(G_1, \ldots, G_k)$ and their
associated free valence groups where $G_i$ is on the list
$GLGPH(V_i, FV_i)$, $1 \le i \le k$. (The superatoms)
IV. Theoretical Results. We now present the graph-theoretical results on
which the main algorithm is based. Even though some of these results are
known, we give proofs for the sake of clarity and completeness. Throughout
this section, $V = (a_1, \ldots, a_n)$ denotes an integral n-vector.
Lemma 1. If there is at least one clf-graph G based on $V = (a_1, \ldots, a_n)$,
then U(V) is a non-negative integer.
Proof. A spanning tree of G has N(V)-1 edges, and G has TD(V)/2 edges.
Hence U(V) = TD(V)/2 - (N(V)-1) is a non-negative integer.
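The identity used in the proof of Lemma 1 follows from the definition of U(V) in one line. The worked instance uses the atoms of benzene, six univalent and six tetravalent, which is an example chosen here and not taken from the original text.

```latex
\[
U(V) = \frac{1}{2}\Bigl(2 + \sum_{i=1}^{n} i\,a_i - 2\sum_{i=1}^{n} a_i\Bigr)
     = \frac{2 + TD(V) - 2N(V)}{2}
     = \frac{TD(V)}{2} - \bigl(N(V) - 1\bigr).
\]
% Example: V = (6, 0, 0, 6) gives TD(V) = 6 + 24 = 30 and N(V) = 12,
% so the graph has 15 edges, a spanning tree has 11, and U(V) = 15 - 11 = 4.
```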
Lemma 2. If $2M(V) \le TD(V)$ and U(V) is a non-negative integer, then there
is at least one clf-graph based on V.
Proof. Let m = M(V), the maximum valence of V. Given non-negative integers
$b_2, \ldots, b_m$, $b_m \ne 0$, let $b_1 = L(b_2, \ldots, b_m)$ be the least non-negative integer
such that there is at least one clf-graph based on $B = (b_1, b_2, \ldots, b_m)$. Since
for sufficiently large $b_1$, e.g., $b_1 = \sum_{i=2}^{m} i\,b_i - 2\bigl(\sum_{i=2}^{m} b_i - 1\bigr)$, such a graph exists,
$L(b_2, \ldots, b_m)$ is well-defined.
Assume $b_1 = L(b_2, \ldots, b_m) \ge 2$, and let G be a clf-graph based
on B. If there were two distinct nodes $N_1$ and $N_2$ of G both of
which had adjacent nodes of valence 1, we could delete these valence
1 nodes and add the edge $(N_1, N_2)$ to G, obtaining a clf-graph with
a lesser number of nodes of valence 1. Thus all valence 1 nodes of
G must be adjacent to the same node, say N. If there was a node
$N_1$ adjacent to N and a node $N_2 \ne N$ adjacent to $N_1$, we could delete
the edge $(N_1, N_2)$ and two of the valence 1 nodes and add the edges
$(N_1, N)$ and $(N, N_2)$, obtaining a clf-graph with a lesser number of
valence 1 nodes. (See Figure 2). Thus all nodes adjacent to N are
adjacent only to N, and since G is connected, all nodes must be
adjacent to N. (See Figure 3). Hence, N must be the unique node of
valence m, i.e., $b_m = 1$, and $b_1 = m - \sum_{i=2}^{m-1} i\,b_i$.
Now assume the lemma is false, i.e., $2m \le TD(V)$, U(V)
is a non-negative integer and there is no clf-graph based on $V = (a_1, \ldots, a_m)$.
Then $a_1 < L(a_2, \ldots, a_m)$. If $L(a_2, \ldots, a_m) \ge 2$, then $a_1 < m - \sum_{i=2}^{m-1} i\,a_i$,
and TD(V) < 2m, a contradiction. If $L(a_2, \ldots, a_m) = 0$, then $a_1 < 0$,
a contradiction. If $L(a_2, \ldots, a_m) = 1$, then $a_1 = 0$. Since a
clf-graph based on $(1, a_2, \ldots, a_m)$ does exist, by Lemma 1, $c = 1 + \sum_{i=2}^{m} (i-2)a_i$
is even. Hence U(V) = (c+1)/2 is not integral, a contradiction.
Theorem 1. There exists at least one clf-graph based on V if and
only if $2M(V) \le TD(V)$ and U(V) is a non-negative integer.
Proof. The sufficiency is just Lemma 2. For the necessity, let
m = M(V). By Lemma 1, U(V) is a non-negative integer. Assume that
2m > TD(V). Then $m > a_1 + 2a_2 + \ldots + (m-1)a_{m-1}$. Hence $a_m = 1$
and $m > \sum_{i=1}^{m-1} i\,a_i$. This contradicts the loop-free assumption.
Corollary 1. V determines a loop-free tree if and only if U(V) = 0.
Moreover, in this case every graph based on V is a loop-free tree.
Theorem 2. There is a clfif-graph based on V if and only if U(V)
is a positive integer and $2M(V) \le TD(V)$.
Proof. If there is a clfif-graph based on V, then by Theorem 1
and Corollary 1, U(V) is a positive integer and $2M(V) \le TD(V)$.
Conversely, by Theorem 1 there exist clf-graphs based on V.
Assume that every such graph has at least one isthmus. Let G be
a graph based on V with a minimal number of isthmuses, i.e., a
"least criminal." We will show that G can be modified to a clf-graph
based on V with a lesser number of isthmuses, a contradiction to
our assumption.
Let the edge (A,B) be an isthmus of G, and let X and Y be the
connected components of $G \setminus (A,B)$. Since U(V) > 0, not both X and Y
are trees. Say X is not a tree and A is in X. Let $\mathcal{C}$ be an elementary
circuit in X, and let C be a node on $\mathcal{C}$ of minimal path length from
A, where we consider here a multiple edge as a circuit. (Note that
we may have A = C). By the choice of C, there is a path
from A to C which is edge-disjoint from $\mathcal{C}$. Let D be a node on $\mathcal{C}$
adjacent to C. (See Figure 4).
Case 1. Y is a tree.
Since G is loop-free, there must be a valence 1 node N in Y. Say
N is attached to the node E of Y.
Then, delete (C,D), add (D,E), and move N from E to C. (See Figure 5).
Case 2. Y is not a tree.
Let $\mathcal{E}$ be an elementary circuit in Y, E a node of $\mathcal{E}$ of minimal path
length from B and F a node of $\mathcal{E}$ adjacent to E. Then, delete (C,D)
and (E,F), and add (D,F) and (C,E). (See Figure 6).
In both cases, the resulting graph is a clf-graph based on
V and has at least one less isthmus than G. Hence our assumption
was false, and there is at least one clfif-graph based on V.
Lemma 3. Let $\bar{V}$ and $FV_i$, $i = 1, \ldots, k$, be as in Superatom Partitioner.
Then $\sum_{i=1}^{k} FV_i = 2(k - U(V - \bar{V}))$.
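The existence criteria of Theorems 1 and 2 reduce to two arithmetic tests on V, which can be sketched as follows. The function names and the tuple encoding of V are assumptions of this example, and the vector is assumed to contain at least one node.

```python
def _summary(v):
    """Return (2 * U(V), M(V), TD(V)) for V = (a_1, ..., a_n)."""
    if sum(v) == 0:
        raise ValueError("the degree vector must contain at least one node")
    td = sum(i * a for i, a in enumerate(v, start=1))
    m = max(i for i, a in enumerate(v, start=1) if a != 0)
    twice_u = 2 + td - 2 * sum(v)  # from U(V) = (2 + TD(V) - 2 N(V)) / 2
    return twice_u, m, td


def clf_graph_exists(v):
    """Theorem 1: U(V) is a non-negative integer and 2 M(V) <= TD(V)."""
    twice_u, m, td = _summary(v)
    return twice_u >= 0 and twice_u % 2 == 0 and 2 * m <= td


def clfif_graph_exists(v):
    """Theorem 2: U(V) is a positive integer and 2 M(V) <= TD(V)."""
    twice_u, m, td = _summary(v)
    return twice_u > 0 and twice_u % 2 == 0 and 2 * m <= td


print(clf_graph_exists((6, 0, 0, 6)))   # atoms of benzene
print(clf_graph_exists((0, 1, 0, 1)))   # 2M(V) = 8 exceeds TD(V) = 6
print(clfif_graph_exists((0, 0, 2)))    # two trivalent nodes, triple edge
print(clfif_graph_exists((2,)))         # a single edge is a tree, U(V) = 0
```

The second vector fails because a tetravalent node cannot spend all four valences on one bivalent neighbour without a loop.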
Lemma 4. In a loop-free graph based on V, there are $TD(V)/2 - a_1$
edges between non-univalent nodes.
Theorem 3. Let $V = (0, a_2, \ldots, a_n)$, $N(V) \ge 2$, U(V) a positive
integer, and $2M(V) \le TD(V)$. Let $m_0(V) = \max\{0,\ a_2 + (2M(V) - TD(V))/2\}$
and $m_1(V) = \min\{a_2,\ \sum_{i=4}^{n} \lfloor (i-2)/2 \rfloor a_i\}$. Then $m_0(V) \le m_1(V)$
except in precisely the following two cases:
a) $m_1(V) = 0$ and $m_0(V) = 2$
b) $m_1(V) = s > 0$ and $m_0(V) = s + 1$
Moreover, case (a) holds if and only if $\sum_{i=3}^{n} a_i = 0$, and case (b)
holds if and only if $a_{2(s+1)} = 1 = \sum_{i=3}^{n} a_i$.
Proof. Let $m_0(V) > m_1(V)$. If $m_1(V) = a_2$, then 2M(V) - TD(V) > 0,
a contradiction. Hence $m_1(V) = \sum_{i=4}^{n} \lfloor (i-2)/2 \rfloor a_i$. Assume $m_1(V) = 0$.
Then $a_j = 0$, j > 3. By Theorem 1, a clf-graph based on V exists.
Hence $a_3$ is even. From $0 < m_0(V) = (2M(V) - 3a_3)/2$, it follows
that $a_3 = 0$ and $m_0(V) = 2$. The converse is immediate. Assume that
$m_1(V) = s > 0$. Then, $a_2 + (2M(V) - TD(V))/2 > s > 0$, and
$M(V) > 2s + 3a_3 + \ldots + (M(V)-1)a_{M(V)-1} \ge 2s$. Hence $a_{M(V)} = 1$.
Also, $s = m_1(V) = \sum_{i=4}^{M(V)} \lfloor (i-2)/2 \rfloor a_i$ implies that $2s \ge M(V) - 3$.
Thus, $2s + 3 \ge M(V) > 2s$. This implies that $a_i = 0$, $3 \le i < M(V)$.
Since a clf-graph based on V exists, M(V) must be even. Thus
M(V) = 2(s+1). A direct computation gives that $m_0(V) = s + 1$.
The converse follows directly from $2M(V) \le TD(V)$.
Lemma 5. Let V, $m_0(V)$, and $m_1(V)$ be as in Theorem 3. Assume that
$m_0(V) = m_1(V) = 0$ and $a_2 \ne 0$. Then $a_3 \ne 0$ and is even and $a_i = 0$
for i > 3.
Proof. $a_2 \ne 0$ implies that $m_1(V) = \sum_{i=4}^{n} \lfloor (i-2)/2 \rfloor a_i = 0$. Thus $a_i = 0$,
i > 3. Now $m_0(V) = 0$ implies that $2M(V) - 3a_3 \le 0$. Hence $a_3 \ne 0$.
Since a clf-graph based on V exists, $a_3$ must be even.
V. Proof of Main Algorithm.
A. Every clf-graph based on V occurs. Let G be a clf-graph based
on V. We will show that the algorithm produces a graph isomorphic
to G.
Since G exists, by Theorem 1, U(V) is a non-negative integer
and $2M(V) \le TD(V)$. If U(V) = 0, the tree generator produces a
graph isomorphic to G. Assume U(V) > 0. Let B be the set of
isthmuses in G, and let $G_1, \ldots, G_k$ be those connected components
of $G \setminus B$ having at least two nodes. Since U(V) > 0, G is not a tree
and $k \ge 1$. Also, no $G_i$ has univalent nodes. Let $V_i = (0, a_2^i, \ldots, a_n^i)$,
where $a_j^i$ is the number of valence j nodes of $G_i$, and let $FV_i$ be the
number of elements of B connected to $G_i$, $1 \le i \le k$. Set
$\bar{V} = \sum_{i=1}^{k} V_i$. Note that k, $\bar{V}$, $V_i$, $FV_i$
and $V - \bar{V}$ depend only on the isomorphism
class of G. Now $P(\bar{V}) = 0$ and $N(\bar{V}) \ge 2$. Thus $\bar{V}$ is an allowable
summand of V in SAP. Also, $N(V_i) \ge 2$, $1 \le i \le k$, and $1 \le k \le \lfloor N(\bar{V})/2 \rfloor$.
Set $u_i = (2U(V_i) - FV_i)/2$. Let $\tilde{G}_i$ be the graph obtained from G
by replacing in G each of the $FV_i$ connected components of $G \setminus B$
connected to $G_i$ by a univalent node. Then $\tilde{G}_i$ is a clf-graph which
is not a tree. Let $\tilde{V}_i$ be the n-vector associated with $\tilde{G}_i$. Then
$U(\tilde{V}_i) = u_i$ and is a positive integer. Moreover, by Lemma 3,
$\sum_{i=1}^{k} u_i = U(V)$. Since $\tilde{G}_i$ is a clf-graph, by Theorem 1,
$2M(\tilde{V}_i) = 2M(V_i) \le TD(\tilde{V}_i) = TD(V_i) + FV_i$. Thus $FV_i \ge 2M(V_i) - TD(V_i)$,
and, after suitable reindexing, $\{V; (V_1, FV_1), \ldots, (V_k, FV_k)\}$
is on the list GLSA(V).
For a fixed i, $1 \le i \le k$, let $f_{jt}$ be the number of valence
j nodes of $\tilde{G}_i$ with t univalent nodes attached, $2 \le j \le n$, $0 \le t \le j$.
Since $\tilde{G}_i$ is a clf-graph and has isthmuses only to univalent nodes,
$f_{jt} = 0$ for t > j-2. Also, by Lemma 4, $\tilde{G}_i$ has $(TD(V_i) - FV_i)/2$
edges between non-univalent nodes. Hence, any node of valence j
in $\tilde{G}_i$ has at least $c_j = \max\{0,\ j + (FV_i - TD(V_i))/2\}$ univalent
nodes attached, i.e., $f_{jt} = 0$ for $t < c_j$. The set
$\{\, f_{jt} \mid 2 \le j \le n,\ c_j \le t \le j-2 \,\}$ satisfies the conditions of FVP.
Hence, $W(V_i, FV_i, \{f_{jt}\})$ is on the list $GLFV(V_i, FV_i)$ and depends only on
the isomorphism class of $G_i$.
Let $G_i'$ be the graph obtained from $\tilde{G}_i$ by deleting the $FV_i$
univalent nodes and their connecting edges. Then
$W = W(V_i, FV_i, \{f_{jt}\}) = (0, w_2, \ldots, w_n)$ is the n-vector associated
with $G_i'$, and $G_i'$ is a clfif-graph. Assume that $m_0(W) \le m_1(W)$ where
$m_0$ and $m_1$ are as in LP. By Theorem 3, $G_i'$ is not one of the special
case graphs, and hence deleting the bivalent nodes of $G_i'$ leaves
a connected, isthmus-free graph $G_i''$ with at least two nodes and say
$t_i$ loops. Since $G_i'$ is isthmus-free, each node of $G_i''$ must have at
least two non-loop edges attached. Hence
$t_i \le \min\{\, w_2,\ \sum_{j=4}^{n} \lfloor (j-2)/2 \rfloor w_j \,\} = m_1(W)$. Let $G_i'''$ be the graph
obtained from $G_i''$ by deleting the $t_i$ loops. $G_i'''$ is a clfif-graph.
Let Y be the n-vector associated with $G_i'''$. Then $M(W) \le M(Y) + 2t_i$.
Hence, by Theorem 1, $2M(W) - 4t_i \le 2M(Y) \le TD(Y) = TD(W) - 2w_2 - 2t_i$,
and $w_2 + (2M(W) - TD(W))/2 \le t_i$.
Thus $t_i \ge m_0(W)$. Let $p_{jt}$ be the
number of valence j nodes of $G_i''$
with t loops, $4 \le j \le n$, $0 \le t \le \lfloor (j-2)/2 \rfloor$. Then $\{p_{jt}\}$ satisfies the
conditions of LP and
Y is precisely the n-vector $Y(W, t_i, \{p_{jt}\})$. Since $G_i'''$ is a clf-graph,
$2M(Y) \le TD(Y)$ and Y is on the list $GLLB(W, t_i)$. Moreover, Y depends only on
the isomorphism class of $G_i$.
Recursive application of the above process yields a sequence
of graphs $G_{i0} = G_i'$, $G_{i1} = G_i'''$, $G_{i2}, \ldots, G_{is_i}$, and associated n-vectors
$Y_0 = W(V_i, FV_i, \{f_{jt}\})$, $Y_1 = Y(W, t_i, \{p_{jt}\})$, $Y_2, \ldots, Y_{s_i}$, where
$Y_j$ is on $GLL(Y_{j-1}, p_{j-1})$ for some allowable $p_{j-1}$ and either
(i) $m_0(Y_{s_i}) = m_1(Y_{s_i}) = 0$, or (ii) $m_0(Y_{s_i}) > m_1(Y_{s_i})$. Moreover, $Y_j$
depends only on the isomorphism class of $G_{i,j-1}$. Assume (i) holds
for $Y = Y_{s_i} = (0, y_2, \ldots, y_n)$. Now by Lemma 5, Y has only bivalent
and trivalent nodes or no bivalent nodes. In the first case deleting
the bivalent nodes of $G_{is_i}$ leaves a proper clfif, trivalent graph.
In the second case $G_{is_i}$ is a graph with no bivalent nodes. In either
case, a graph $G_{is_i}^*$ isomorphic to $G_{is_i}$ is produced by CAT. If (ii)
holds, by Theorem 3, $G_{is_i}$ is either the ring with $y_2$ bivalent nodes
or $G_{is_i}$ with its bivalent nodes deleted is a single node of valence
$2m_0(Y)$ with $m_0(Y)$ loops attached, and a graph $G_{is_i}^*$ isomorphic to
$G_{is_i}$ is produced by BL. Hence we have that a graph $G_{is_i}^*$ isomorphic
to $G_{is_i}$ and the vector $Y_{s_i}$ are produced in the algorithm as input
to LLBL. Since $G_{i,s_i-1}$ induces in the obvious manner a loop and/or
bivalent node labeling of $G_{is_i}$ arising from $Y_{s_i-1}$, $G_{i,s_i-1}$ is isomorphic
to one of the graphs produced by LLBL with input $(G_{is_i}^*, Y_{s_i})$.
Recursively applying LLBL up the sequence $\{Y_j\}$, we obtain a graph
$G_i^*$ isomorphic to $G_i'$ with associated n-vector $W_i = W(V_i, FV_i, \{f_{jt}\})$
in the output, $i = 1, \ldots, k$. With the input $(G_i^*, W_i)$, FVL produces
a graph $G_i^{**}$ isomorphic to $\tilde{G}_i$. Hence, $(V - \bar{V}, G_1^{**}, \ldots, G_k^{**})$ is among
the inputs to TG arising from V, and with this input TG produces
a graph isomorphic to G.
B. The algorithm produces no redundancies. Let G and H be two graphs
produced by the algorithm with input V. We will show that G and H
are not isomorphic, i.e., $G \not\cong H$.
Assume $G \cong H$. TG is irredundant. Hence if G
and H arise from the input to TG, $(V - \bar{V}, G_1, \ldots, G_k)$ and
$(V - \bar{V}', H_1, \ldots, H_{k'})$, respectively, where the $G_i$ and $H_i$ are superatoms
with attached free valences, we must have that $\bar{V} = \bar{V}'$, $k = k'$, and
the $H_j$ can be so indexed such that $H_j \cong G_j$, $1 \le j \le k$. Hence, $H_j$
and $G_j$ both must have the same free valence $FV_j$, and G and H must
arise from the same SAP of V, say $\{V; (V_1, FV_1), \ldots, (V_k, FV_k)\}$.
Now since the labeler is irredundant, so is FVL. Hence from $H_i \cong G_i$
it follows that the graphs $G_i'$ and $H_i'$ defined as in (A) must be
isomorphic. Also, $G_i'$ and $H_i'$ must arise from the same FVP of $(V_i, FV_i)$,
say $W_i = W(V_i, FV_i, \{f_{jt}\})$, $1 \le i \le k$. Since BL is irredundant,
it follows that the graphs $G_i''$ and $H_i''$ defined as in (A) must be
isomorphic. Similarly, since LL is irredundant, the graphs $G_i'''$
and $H_i'''$ must be isomorphic. Moreover, $G_i'''$ and $H_i'''$ must arise from the
same LP of $W_i$, say $Y_i = Y(W_i, p_i, \{p_{jt}\})$, $1 \le i \le k$. Applying the
above argument recursively to $G_i'''$ and $H_i'''$, we have that the CAT
"look-up" (or SC determination) graphs, say $G_i^*$ and $H_i^*$, respectively,
must be isomorphic and that the sequence of loop partitions leading
from $Y_i$ to the partitions determining $G_i^*$ and $H_i^*$ must be identical.
Now $G_i^* \cong H_i^*$ if and only if $G_i^*$ and $H_i^*$ are equal. Thus $G_i^* = H_i^*$,
$1 \le i \le k$.
Hence we have that if $G \cong H$, they arise from the identical
sequence of vector inputs to the various labelers and TG, and that
$G_i^* = H_i^*$, $1 \le i \le k$. Since for a given input, the labelers produce
only non-isomorphic graphs, we must have, successively, that
$G_i''' = H_i'''$, $G_i'' = H_i''$, $G_i' = H_i'$, and $G_i = H_i$, $1 \le i \le k$. Hence if $G \cong H$,
they both arise from the same input to TG. But for a given input,
TG produces only non-isomorphic graphs, a contradiction.
C. The partitioners produce only allowable partitions of V. We
will now show that every sequence
of partitions of V produced by
the algorithm yields at least one
clf-graph based on V.
Let $Y = Y(W, p, \{p_{jt}\})$ be a member of the list GLL(W, p). By direct
computation, U(Y) = U(W) - p. Also, $p \le \sum_{i=4}^{n} (i-2)w_i/2 < U(W)$.
Thus U(Y) is a positive integer. By construction, $2M(Y) \le TD(Y)$.
Since $m_1(W) \ge m_0(W)$, it follows from Theorem 3 that $N(Y) \ge 2$.
Hence by Theorem 2, there is a clfif-graph based on Y. In the
case $m_1(W) < m_0(W)$, Theorem 3 assures us that a clfif-graph based
on W exists. Thus LP produces only allowable output.
Let $W = W(V_i, FV_i, \{f_{jt}\})$ be a member of the list $GLFV(V_i, FV_i)$. By
direct computation, $U(W) = U(V_i) - FV_i/2 = u_i$, a positive integer.
Also, $TD(W) = TD(V_i) - FV_i$. Now $M(W) = \max\{\, j-t \mid 3 \le j \le n,\ c_j \le t \le j-2,\ f_{jt} \ne 0 \,\}$,
and $j - t \le (TD(V_i) - FV_i)/2$. Hence,
$2M(W) \le TD(V_i) - FV_i = TD(W)$. By Theorem 2 there is a clfif-graph
based on W, and FVP produces only allowable output.
Let $\{V; (V_1, FV_1), \ldots, (V_k, FV_k)\}$ be a member of the list GLSA(V),
say $V_i = (0, a_2^i, \ldots, a_n^i)$, $1 \le i \le k$. Let $H_i = (FV_i, a_2^i, \ldots, a_n^i)$.
Now $2U(H_i) = 2U(V_i) - FV_i = 2u_i$, and $U(H_i)$ is a positive integer.
Also, $P(H_i) = FV_i \ge 1$, and $TD(H_i) = TD(V_i) + FV_i \ge 2M(V_i) = 2M(H_i)$.
Thus, by Theorem 2, there is a clfif-graph based on $H_i$, $1 \le i \le k$.
Hence, at least one set $G_1, \ldots, G_k$ of superatoms based on the
given partition of V exists. If $V'$ is the n-vector corresponding
to $V - \bar{V}$ and the free valences of the $G_i$, then a direct computation
using Lemma 3 gives that $U(V') = 0$. By Corollary 1, a loop-free
tree based on $V - \bar{V}$ and the $G_i$ exists. Thus the sequence of partitions
of V does yield at least one clf-graph based on V.
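The two "direct computations" of U invoked in part C can be written out from the identity $U(V) = (2 + TD(V) - 2N(V))/2$. The node counts N(W) and N(Y) and the value of TD(Y) used below are read off from the constructions of W in FVP and Y in LBP; they are not stated explicitly in the text, so this is a sketch of the intended calculation.

```latex
% FVP: W keeps every node of V_i and removes the FV_i free valences, so
% N(W) = N(V_i) and TD(W) = TD(V_i) - FV_i. With FV_i = 2(U(V_i) - u_i):
\[
U(W) = \frac{2 + TD(V_i) - FV_i - 2N(V_i)}{2} = U(V_i) - \frac{FV_i}{2} = u_i .
\]
% LBP: Y drops the w_2 bivalent nodes and p loops, so
% N(Y) = N(W) - w_2 and TD(Y) = TD(W) - 2w_2 - 2p:
\[
U(Y) = \frac{2 + TD(W) - 2w_2 - 2p - 2N(W) + 2w_2}{2} = U(W) - p .
\]
```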
VI. Acknowledgements. The DENDRAL concept and its applications to
organic chemistry were originally conceived by Professor Joshua
Lederberg. The authors thank Professor Lederberg for his guidance
and his critical suggestions which have made this work possible.
Computer Science Department
Stanford University
Stanford, California 94305
REFERENCES
1. E. A. Feigenbaum, B. G. Buchanan and J. Lederberg, On generality
and problem solving: A case study using the DENDRAL program, in:
Machine Intelligence 6 (Edinburgh University Press, Edinburgh, 1971)
165-190.
2. B. G. Buchanan, G. L. Sutherland and E. A. Feigenbaum, A program
for generating explanatory hypotheses in organic chemistry, in:
Machine Intelligence 4 (Edinburgh University Press, Edinburgh, 1969) 209-254.
3. J. Lederberg, et al., A tree generation algorithm, To appear.
4. H. Brown, L. Hjelmeland and L. Masinter, Constructive graph
labeling using double cosets, To appear, Discrete Math.
FOOTNOTES.
1. More formally, if X is an m-element set and H is a group of
permutations on X, then a labeling of X with $n_1$ labels $a_1$,
$n_2$ labels $a_2$, ..., $n_k$ labels $a_k$, $n_1 + n_2 + \ldots + n_k = m$,
is a mapping $\varphi: X \to \{a_1, a_2, \ldots, a_k\}$
such that $|\varphi^{-1}(a_i)| = n_i$. Two such mappings $\varphi$ and $\psi$ are
equivalent with respect to H if there is an $\eta \in H$ such that
$\psi = \varphi\eta$.
Figure 1.
Figure 2.
Figure 6.