structure generation on the basis of bct ......structure analysis are in£erence of constituent...

39
lS E-T R-81-21 ダ燕. A-fillX: /二こAx STRUCTURE GENERATION ON THE BA REPRESENTATION OF CHEMICAL by Takashi Nakayama and Yuzuru Fujiwara“ July 22,1981

Upload: others

Post on 26-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

lS E-T R-81-21

ダ燕.唇     婆

    A-fillX:   /二こAx

STRUCTURE GENERATION ON THE BASIS OF BCT

 REPRESENTATION OF CHEMICAL STRUCTURES

by

Takashi Nakayama

     and

Yuzuru Fujiwara“

July 22,1981

Page 2: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

Structure Generation on the Basis of BCT

 Representation of Chemical Structuxes

                      abstract

     A method of structure generation based on BCT

(bユock ・一 cu’ヒpoint tree ) represen’ヒation of chemical

structure has been developed. The generation pro-

gram is a part of the automatic structure analysis

system of mass spectra( ASASMAS ), and is used

when a set of the inierred substructures are given

as input data. The input substructures are repre-

sented by means of BCT.

Page 3: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

                   1NTRODUCTZON

     The major parts of the process of chemical

structure analysis are in£erence of constituent

substructureS and structure generation by cornbin-

ing those infe:rred subst:ructures。  A me’ヒhod of

structure generation £rom substructures already

inferred is described in this paper. Various

schemes for stxuctuye generation have been devis一一

   1-10        each of these methods.is based onNa rnethodedr

of representation of chemical s’ヒruc’ヒures, and it

may be said that ’ヒhe meヒhod of representa’ヒion of

chemical structures determines the method oE

struc’ヒure g’enera’ヒio:n. AIl of these me・ヒhods i:n-

cluding the present one pursue the govexning prin-

ciple of reducing the number of combinationsi. :, n

this paperr the chemical structure is represented

in terms・f BCT(b1。ck-cu七P。int tree),U which

clarifies the hierarchy in chemical structures                                                 3wholely so that the idea of the connectivity stack

and the supedr-atom5 are included naturally in our

method presented. Combinatorial problems that

occur in the process of s●ヒruc’ture generation are

thus partitioned into 一qubproblems and’are classifi-

ed into strages.

一 1 一

Page 4: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

     Struc・ヒure generation is regarded as a problem

in combi:natorial analysis. The:re must be neither

duplications nor.omissions in the genera’ヒed struc一

                                          コturesr and the method of st:ructur母 gene:rat::Lo・n

should be e:Eficien’t∫ processing ti:me shou:Ld be

short in prac’ヒice.  The prqcessing ’ヒime depends

::Largely on the method of representation of chemi-

cal structures.  The constituent unit of ’ヒhe BCT

representation of ohemical struc’ヒures is a block

( abiconnected component of a g:raph )r and this

makes it possible to reduce ’ヒhe nu皿ber of combina一

                                              サtions greatly by omitting atom-by-atoln processユ:ng.

Another feature of the rne’ヒhod of s・ヒructure genera・一

tion described here is the availabi].ity of a graph

database. The various graphs which appear in ’ヒhe

process of s七ruc・ヒure generation are no’ヒ generated

for each case, but ins’ヒead are.:re●ヒrieved from the

g:raph da剛ヒabase.

一 2 一一

Page 5: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

       REPRESENTAT1ON OF CHEb41CAL STRUCTURES

     Chemical structures are represented by means

of graphs, and atoms and bonds of’a chemical

structure correspond to vertices and edges of a

graph respectively. However, it is inefficient to

process chemical struc’ヒures at the atomic leveユ,

and sometimes a superatom/ring assembZy is used

as the processing unit. Xn ASASMAS r the concept

of ’ヒhe superat:om is ex’ヒend,ed ’ヒ。 rep:resent the

chemicaZ structures in a way that is consistent

with BCT.・ The following is a brief description of

                                          12BCT rep:resentation of chemical s・ヒructures.

     Let  u∈V  be a ver七ex of a connected graph

G=( V,E ). A vertex u is a ”cutpoint” if the

rernoval of u yields the disconnectivi’ヒy of the

graph G. A ”block” of a graph G is a maxirnai sub一一

graph of G which contains no cutpoints. Nowr

bc(G) ( biock-cutpoint graph of G ) is defined:

T=( W,F ) is a bc(G) if (1) W=AuB is a set of

vertices where A=’ o ai, ... ,an } is a set of

all cutpoints of G, and B =’ o bl, e” ,bm } iS a

sev of all biocks of G, and (2) F i{ flr ’“’ ,fz }

i{( ai,bj .)1 ai E b」r ai E Ar bj E B }, where

ai Ebゴmeans that a cutp・int ai is a member。f a

一 3 一

Page 6: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

set of vertices of block bj e A bc(G) has the

following properties: 〈i) it is a bipartite graph

of subsets A and B, (2) it is a tree regardless of

the original structurer 〈3) terminal vertices

correspond to biocks of G, and (4) the distance

between any pair of 『ヒe:rminal vert二ces is an even

number. A bc(G) is called a block一一cutpoint tree

( BCT ) because of property (2). A tree is a BCT

if and onlY if it possesses p:roper’ヒy (4), and this

property is used for generating BCT. Examples of

the BCT representa’ヒion of chernical s’ヒ:rucut:re・s are

shown in Figu:re l.  The in’ヒernal structure of each

block is fUed in the block dictionary.

一 4 一

Page 7: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

                  GRAPH DATABASE

     The graph generation p:rocedu:re which wouユd

usually be performed in the process of struc’ヒure

genera’ヒion is :replaced by the ret:rieval of object

graphs from the graph database. As it is impossi-

ble to prepare aU of the graphs, a graph which

canno’ヒ be found in ’ヒhe graph database is generated

when it is first required and this generated graph

is also added ’ヒ。 the database. There are five ma-

jor fUes in ASAS!YLAS: (i) a compound file, (2) a

table’ 盾?@correspondence between substructures and

mass spectral components ( CSSC ), (3) a bloek

file, (4) a BCT fiie, and (5) a spectrai data file.

Acompound file and a CSSC con’ヒain’ヒhe chemical

st:ructures or substruc’ヒures in the form of BCT

representation r and are Used for inferring sub-

s七ruc’ヒures in ’ヒhe process of s’ヒruc’ヒure analysis.

The i:n七ernal structures of block in tha’ヒ BCT rep-

resentation are stored in a block fiie. A BCT

fUe is a set of BCTs which are organized accord-

ing to parameters ( mrn )r where rn is the number

of blocks and n is the number of cutpoints of a

BCT r and is used when needed du:ring ’ヒhe process of

structure generation. 1Vhe records of a BCT fUe

一 5 一

Page 8: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

are trees which possess the BCT’s property (4)

described previous sec.tion and furthermore satisfy

・ヒhe condition;deg u≦4, where u is a cutpoint of

a BCT t because the degree of u cannot exceed the

valence of u which is considered to be an atorp

such as carbQn, oxygen, or nit:rogen・  The ou’ヒユine

of ’ヒhe system ASASMAS (especially the rela’ヒion

between programs and these five data files ) is

shown in Figure 2.

一 6 一

Page 9: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

Dv!ETHOD OF STRUCTIL]RE

9ty!12i1ne一..gS.s1i11Lg1iig!z2一一gguergS!9一.utline of structure t

GENERAT 1 ON

         Prograrn

GENERAIr工ON is initia’ヒed when a se’ヒ of subs’ヒruc一・

tures bl, 一e e. tbm is given as input data, where

blr ’” , bm are simple blocks that have been

i:nferred by program ANALYS工S. GENERAT工ON genera’ヒes

all the possibZe eombinations of blocks bl, e’e r

bm with neither onission nor duplication. The

outline of the procedure.is: .(i) all the BC[Vs of

mblocks and n cutpoin’ヒs a:re gene:radヒed, whe:re

lsn一くm-17 (2) blocks blr ... rbm are assigned

to block vertices of a generated BCT ( labeling );

(3) cutpoints are assigned to vertices in each

block of ’ヒhe labeled BCT: and (4) obゴect s’ヒ:ruc一一

’ヒure『 are obtained by connec’ヒing blocks according

to the cutpoint assignment.

     Generation of ( rn,n )BCT.. A BC7’ of m blocks

and n cutpoints i.s denoted by ( mrn’ )BCT. The

first. step of・ structure generation which generates

( m,n )BCT is for the most part accoMplished by

retrieving them from the BCT f“e. This reduces

the pxocessing time of structure generation.. When.

m blocks are given, the number of cutpoints n var-

ies from l ’ヒ。 m-1, i.e., 1≦n≦m’一1。 The case of

一一 7 一一

Page 10: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

n=l  corresponds to a ’1 star te whose center is a

cutp。int’and n=m闘l is the case that the der

gree of every cutpoint is equal to two・.

     The procedu:re of (m,n )BCT generation is to

collect・ヒhe trees that ha▽e m block vertices and

n cutpoint vertices.  Let :N-tree denote the tree

 コ                                   ロ                                                                                                                       ロ

wユth N vert■ces, where N=・m+n. An N ・一’ヒree ユs

genera’ヒed from an (N-1)一tree by adding one ▽ertex.

N。w’let T={tl’t2’…’tZ}beaset。fall

(N-1)一trees’where ti represents an(N-1)一tree’

and le七 T. be a set. of trees which are generated         =L

fr・m ti・T by adding。ne vertex’T’ = Tl u [D2 u

・∴uTZ is a set。f N-trees・T#eref。re the

generati。P pr。cedure。f N-trees is divided int。

tw。 parts・the generati。n。f Ti(i=1’…rZ)

and the genera’ヒion of the union T 1..

(1)Generati。n。f Ti・Let ti =(V’E)’ti・T・

:Fi:rst a set of▽ertices V is divided in七〇〇rbits

of per:muta’ヒion g:roup on V.  The element of this

permutati・n gr。up is aut。m。rphism。f ti’that is

to sayt the perrnutation on a vertex se’ヒ▽which

preserves the adjacency relationship among ver一

 コt:Lces:

一一 8 一

Page 11: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

V=VI U’V2 U’@’” U Vr, V垂獅u早=早@for any p¢q.

The vertices in a subset▽j(ゴ=・’∴・rr)are

topologically equivalent. Secondv an’ m-tree is.

generated from ti by adding one vertex to the

arbitrary ve「tex in Vゴ(」 :1’… C’「)・The「e-

fore the number of generated N一一trees is the number

of oxbits, iee., iTi l=re

(2) Generation Qf T’. T’=Tl u T2 U “’“ U 1Vz

gives七he t。tal N騨t「ees’but Ti∩Tゴφd。se n。t

always hold。  The duplications ( i.e., ’ヒhe e1-

ements of T. n T. ) should be eliminated. This            ユ  コ

problem is so1▽ed by introducing linear order in’ヒo

the set Of N-t:rees; an :N-tree is coded in.ヒ。 numer一一

ical sequence by Edmonds’ rnethod,13 and is stored

in the appropriate vertex of binary tree which

should represent the set of N一一trees.14 Duplica-

tion is detected when a generated N-tree is found

in七he ve:rt:ex in which i七 should be s’ヒ。:red.  The

set of N-trees is generated in ’ヒhis :manne:r, but it

consists of BCTs and non一一BCTs. An N-tree is a BCT

if and only if it satisfies the BCT’s property (4)

described in sec’ヒion 2. An :N-BCT is a bipar’ヒite

graph w:hose vertex se’ヒ consists of block and

一 9 一

Page 12: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

cutpoint二ver’ヒices, and are classified according to

the parameter ( mfn )r where m and n are the num-

bers of block ver’ヒices and cu’ヒpoint ve:rtices re-

spectively. The BCTs which are stored in the BCCr

file are restricted to those in which the degrees

of all cutpoint vertices do not exceed four r

because the ceiling of the free vaience may be set

to four for most organic compounds. These are

calユed ▽aユid BCTs. Table 工 shows the number of

BCTs in each class corresponding to ( mrn ). The

valid ( 6,n )BCTs ( n=2, e.. ,5 ) are shown in

:Figure 3. The to.ヒal number of valid (.6,n )BCTs

is 20.

     Labeling vertices of ( mrn )BCT. A BCT is

a bipa:tite graph whose vertex set consists of

b10Ck and CUqヒpOin’ヒ Ve:r’ヒiCeS, and the セe:rminal

ver’ヒices are all block ve:rtices. Therefore, it is

easy to distinguish block vertices from cutpoint

vertices. The second step of the structure genp.

eration is to assign the elemehts bl, ’” ,bm

of the given block Set to the block vertices of the

( mrn )BCT. This procedure is called labeling.

     Now let Cl, ’” rCk denote the colors of

blocks, and let n, be the number of blocks whose                 ユ

一 io 一一

Page 13: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

coior is Ci ( ’堰≠P,... , k ), where it is assumed

thaV nl S n2 S’”・Snk without ioss of generality.

Labeユ.ing is in block number orde:r; the first is C                                                lr

the second C               and so on.            2r

(1) if nl=n2=’”=nk-i=i, Ci (i=lr...r-

k-1 ) is assigned to ’ヒhe :represen’ヒative block

verteX of a class partitioned on the basis of.

’ヒopochromatic equivale:ncy. .If block ▽ertices of

an ( mrn )BCT are partitioned into Z classes

gl,’” r gz, there are Z ways of assignments of ci.

When’ bi is assigned to the representative of a

class, b梶@r the foUowing condition should be sat-

isfied between ’ヒhe degree of ・ヒhe representa・ヒive in

the ( m,n )BCT ( deg bj(Ci) ) and the number of

the constituent vertices of the block Ci ( I Ci l・ ):

deg bj(ci) s ICi l (A)

If there are no classes which satisfy the condi一一

tion (A), it means that the assigrment of

Clr’”,Ci一.1 Which resulted in the assignment of

the Ci should be forbidden. After the assignment

Of Ci,’”rCk’D.P is finishedr Ck is assigned to

’ヒhe remaining block vertices、under condition (A).

一 il 一

Page 14: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

This labeling Procedu:re can be :represented in the

f。rm。f a tree’and an assignrnent・f C1’●●●’Ck

corresponds ’ヒ。 a path in ヒhe tree which connects

’ヒhe roo’ヒ with a 七er:minal ve:rtex.  Therefo:re the

number of labeling methods is equivalent to the

number of terminal ▽er’ヒices of dep’ヒh (m一ユ ) i:n

・ヒhe ・ヒree, and there a:re no duplica’ヒes.  This セree

is calユed a genera’ヒion ’ヒ:ree・

(2) When  N.≧2  for some  ゴ<k, the ve:r‘ヒices           ⊃

                     j-1            j ・一l

whose dep七h’s fe.m p三、np t。(nゴ’+pl、np)

f。rゴ≧2’。rfr。mOt・(nゴ1)f。「ゴ= lra「e

the same coユor in the genera’ヒion tree, so if ’ヒhe

procedure described abo▽e is applied, there will

be duplica’ヒion.  To avoid the duplication r color

Cゴis assigned nj times t。 bl。ck ve「tices:Fi「st’

the block ver’ヒices to which colors are no’ヒ ye’ヒ

assigned are pa:rtitioned according to topochromat-

ic equi▽a:Lency. This partition is not limited to

・ヒhe simple elassifica・ヒion based on orbits, but is

further subdivided。 Those classes are organi2:ed

hierarchically.  The subdi▽ision is pe:rfor:med

                                   ロ       ロaccording to ’ヒhe deg』ree of consangu■n■ty a:mong

vertices.  If the dis’ヒance between vertices u and

一 ユ2 一

Page 15: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

v of the same orbit in a rooted ’ヒree is k, ・ヒhe

degree of consanguinity between u and v is k.

This is denoted by san( u,v )=k. The relation

introduced into vertices of the sa.me orbit as the

degree of consanguinity within k ( san( u,v )sk )

is an equivalence relation ( denoted by ”:” ).

The following apply if u, v a:nd w are ver’ヒices

which be工ong to ’ヒhe same orbit:  :L. a reflexive

law: u 1 u ( san( u,u )=O ). 2. a syrnmetric law:

if u E vr then v i u. 3. a transitive law: if

u≡▽ and v…wr’ヒhen u…w. (The transit=i▽e law

is shown as follows: assuming that u:wr and ×

is the closest co㎜on ancestor of u and w,七here

should exis七 two paths which connec’ヒvwi●ヒh y or z

(xx) which are on the path between u and w. Then

’ヒhere.exis’ヒs a cycle which contain verヒices x, y

and z, and’ヒhis contradicts that x, y and z a:re

’ヒhe ver.ヒices、of a tree. )  If partitions of

san(u,v)skl and of san(urv)一くk2 are ob一一

tained f。r kl and k2 when klくk2’the f。rmer must

be a subdi▽ision of the la’ヒter.  The:refo:re 七he

hierarchical partitioning structure is obtained

based on the relation of t!ne degree of consanguin-

ity ( see Figure 4 ).

一 13 一

Page 16: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

     This par・ヒi七ioning s’ヒructure can be :represent-

ed in ’ヒhe fo:rm of a tree (called a part:itioni:ng

tree, as in Figure 4 ).  Then ’ヒhe :Labeling problem

at this step is equivalent to enumerating all the

ways。f assignment。f the same things(c。1。「Cゴ)

・ヒ。 ’ヒhe rooms (vertices in a leve1 ) stepwise

in order of dep’ヒh.  The capacity of each.roo:m

( i.e., the rnaximum number of colors to be adlni’ピー

’ヒed ) is de’ヒermined・for every depth.  That is to

say r the capacity of a room is equivalent to the

suTin of the nurnbers o£ members of classes corres-

ponding ‘ヒ。 ’ヒhe descendant lea▽es of the ver’ヒex

( a roorn ).

     Afte「c。1。「Cゴis assigned t。 the leaves。f

a partitioning tree, specific block vertices to

which color C」 is to be assigned are seiected

arbitrarily from leaf classes by the .number of

assigned Inembers.  In the genera’ヒion t:ree t label-

inq paths a「e g「。wn by c。nnec七ing Nゴmembe「s se.

lected seriaily with the paxent. This probZem is

equivalent to the problem of partition of an in-

teger under some restriction.

     The example of iabeling one of the ( 6 r 4 )BCT

is as follows: A ( 6,4 )BCT and block colors Cl,

一 14 一

Page 17: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

C2 and C3 which sh。uld be assigned. t。 it are sh・wn

in F4gure 5(a)・  Block vertices of the ( 6,4 )BCT

a「enumbe「edfr。rP l t。6・As nゴ1’n2’=2

and n3=3(i・e・’n1<n2<n3)’Cl’C2 and C3 are

assigned in this order. And

     #n。de(C・)== 3’#node(C2)=6’#n。de(93)=2

and

     degl=deg2=deg4 t deg5=deg6 = 1;deg3=4

so c。lo「s Cl and C3 cann。一t be assigned t。 vertex 3

(condi’ヒion (A) does no’ヒhold ).

(1)Assignment。f Cl・Bユ。ck vertices are parti-

ti。ned int。 three・。rbitS’gl i{3}’92 =’{1’2}

and 93={4・5’6}’but gl is rejected by・c・ndi-

ti。n(A)・τheref。re there、 are tw。・kinds・f assign-

ment。f Cl ・’92 and g3・It is・arbitrary t。 select

           コ                               ロ

representative vertユces from each class, and the

selecting vertex l fr・m 92 and vertex 4 fr。m g3

c。「「esDP。nds to the r。。ts。f generati・n trees Tl

and T2 sh。wn in Figure 5(b)・

(2)Assignment。f C2・In case。f generati。n tree

Tlr the remaining vertices are partiti。ned int。

three・rbits 91 =’ o2}’92 ={3}and

                      -15 一

Page 18: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

g3={ 4,5,6 }・ Since further subdivision based on

the degree of consanguinity dose not exist for

these classesr n2 (= two) of the color C2 are

assigned to gl, g2 and g3. There are four sets of

assignments, { glrg2 },{ gl,g3 }t{ g2,g3 } and

{ g3 }. The number of members assigned to each

class ( each element of a se’ヒ calculated above )

are { 1,1 } r { 1,1 } ,{ 1,1 } and { 2 }.

     And finallyr selection of specific members

from each class is arbitrary, so we get { 2 r 3 } ,

{ 2,4 },{ 3,4 } and { 4,5 } as examples.

     Procedures for generation tree T2 are simi1ar

and straightforward.

(3) Assignrnent of C3. This is the final color to

be assigned, and the only w.ay is to assign C3 to

ail the rernaining vertices. However r it should be

noted that some paths are blocked by condition (A)

upon assigning C3; color Cl and C3 cannot be as-

sig:ned ’ヒ。 ver’ヒex 3, and ’ヒhis is shown by X in

Figure 5(d).  Final generation ’ヒ:rees are ob七ained

as in Figure 5(d) when C3’s are assigned.

     Specific labelings corresponding to paths

connecting a root with leaves of depth 5 are shown

in Figure 5(e)

一 16 一

Page 19: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

     一 工tis not clearhow those labeled biock vertices are connected at

the atomic level at this stage. The detailed

structures whSch specify the connectivity among

blocks at ’ヒhe a’ヒomic level are generated by alloc-

ating cu・ヒpoints in each block of ’ヒhe labeled BCT.

:For examp二Le, in the ( 6,4 )BCT in Figure 5(a),

cutpoints al from biock 1 and 2, cutpoints al, a2,

a3 and a4 frorn block 3, cutpoint a2 frorn b1ock 4,

cutp。int a3 fr。m bユ。ck 5’and cutp。int a4 fr。m

block 6 are selected and the connectivity among

blocks at the atomic level is determined.

     The procedure of selecting cutpoints

al, ’” ,ap from a block bi is ’≠刀@follows: let

j= 1.

(1) The constituent vertices of bi are partitioned

into orbits: gi, ’” , gq’

(2) An arbitrary vertex of gk (k=1, e e. , q ) is

selected as a candida七e fo「aゴ・The「e a「e q kinds

of selections of aj for a given selection of

alr e1’@la梶|le

(3) 1f j=pr then end, Otherwise, go to (4).

(4) The selected vertex is regarded as a verヒex

which is given a new coior a」; go to (i) with

                       一 17 一

Page 20: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

ゴ←」+1・

     This procedure is als・o implemented i:n the

fo:rm of genera’ヒion tree described in the p:revious

Xab61ing procedu:re fo:【’BCT。

     The example of cutpoint assignment for a ben-

zene ring is shown in Figure 6。

                             コ                                          ロ

     Connecting blocks accord:Lng to cutpo:Ln’ヒ as.

                                     コ                           ロsignment.   Using the me’ヒhod of cutpoint ass■gn一

:ment for each block ve:r’ヒex of a labeied BCT,

structures are genera・ヒed by connecting blocks with

othe:r blocks in turn, fusing the cutpoints which

should be ’ヒhe same atom in ’ヒhe gene:rated s’ヒruc・・一・

tures.

     工f there are no symme’ヒries in a labeled BCT

      ヨ

( i.e., if ’ヒhere are no block vertices which are

equivalent topochromatically ), al]_the cOmbina-

tions of cutpoin・ヒ assignmen・ヒs for blOck vertices

gi▽e differen’ヒ assemb二Led structures, i.e., there

are no duplicates among them. The number of

                                         ロ

。。mbinati。ns・f cutp。int assignment is ll Zi’                                        i=1

where there are Z. kinds of cutpoint assignment                 ユ.

in bユ。ck bi・H・weVer’if there is a sy㎜etry in

a labeled BCT, ’ヒhere are duplicates in

一 18 一

Page 21: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

  ロ

  IT Zi structures・T・a▽・id these duplicates’the i=1

 p:roblem of connecting the bユocks is conside:red in

 the same way as the proble:m of coloring blook

 vertices.  This is equivalent・to selecting a kind

 of cutpoin’ヒ assignment for each block vertex,

 where the kinds.of colors are equivalent to the

 kinds of cutpoin’ヒ assignme:nt inherent ●ヒ。 the

 blocks.  The:refore, ’t二his problern is solved sirn-

 ilarly ’ヒ。 the problem of BCT labeling. The struc-

 tures gene:rated by connec’ヒing blocks of Figure 5(e)ノ

 are shown in :Figure 7. Six structures shown in

 the lowes’ヒ part of tt j gure 7 correspond 七〇 the

 first and last three assignments for six-ring (C2)

 shown above theln. There is only one labeled BCT

 which could be cormect:ed at ’ヒhe a’ヒomic level.

 The other th:ree BCTs are excluded by the condition

thatl 狽??@degree。f a cutp。int must n。t be greater

 than four.

      Partitioning Ve:rtices in’ヒ。 o:rbits。  When

block ▽er七:ices of ( m,n )BCT’are labeled, or cu・ヒー

points are selec’ヒed from cons’ヒituent vertices of

a block, it is necessary to partition these ver-

tices into orbits according to topochromatic

一 19 一

Page 22: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

equivalency. Vertices u and v of a labeled

graph ,, G are equivalent toporochromatieally if

f(u)=v fdr some au・ヒomorphism f of G, where an

au・tomorphism of a graph  G  is a permu’ヒa’ヒion on a

set of vertices of  G  which prese:rves the adゴa-

cency among vertices. A set of aZl automorphisms

of G forms a group. The orbits of this group

are the classes partitioned according to topo-

chromatic equivalency.

     Therefore, if all the autornorphisrns are

found, it is easy to par’ヒi’ヒion vertices.  In prac-

tice, it is easy to find all the automorph±srns for

a g:raph ’ヒha’ヒ is not highly symmetrical by using

backtrack search. The procedure of finding auto-

morphisms is as follows:

     :Le・ヒ V={1,2』, …  ,n} be a se・ヒof ver・ヒices

of G. The procedure checks whether the adjacency

relationship among vertices is preserved for a

permutation7

           f=〈1’i i’2 111 2’n>

Let f(k)=( il, ’” , ik ) denot.e the left por一

・ヒion of a permu・ヒatiOn  f  correspoDding ’ヒ。  k

一 20 一

Page 23: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

     ロ                       ロ

vert■ces, =L。e.,

                      k+1  …    n

           f=  f(:k)                       コ                                                

                      ■k÷1 ’●● ㌔

Assulning that  f(k)  is given,  f(k+1)  is de『ヒer-

mined as f。ユ1。ws・(1)ik+1・V 一’{il’…’ik}・

(2)B。th(k+1)and ik+1 bel。ng t。 the same

orbit.  (.3) The、 adjacency relationship formed by

▽er’ヒices ’{ 1, …   ,k,k+1 }  shOuld be formed by

{i、’∵・’ik’ik+、}’whete it is guaranteed that

the adゴacency rela・ヒidn白hip formed by ’{ 1, …   ,

k}is f。rmed bジ{il’…’ik}’s・it。nly has

’ヒ。 be guaran’ヒeed that the adゴacency be’ヒween、 i                                              :k+l

and{il’…’ik}is the same as the。neわe-

tween ( k+1 ) and ’{ 1, 。・・ ,k }. Au・ヒomorphisms

are obtained when the procedure reaches  f(n).

If the:re are no ▽er ti ce$ that satisfy the condi一

・ヒions (1), (2) and (3), then ’ヒhere are no auto-

morphisms which contain  f(k)  as a part of a

permutation(see Figure 8).

     It is not required necessariユy to find alユ

’ヒhe au’ヒomorphisms fo:r partitioning vertices into

orbits, because verdヒices can be classified into

some classes ( these a:re integ:rated ・ヒ。 some orbits )

一 21 一

Page 24: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

when an autpmorphism is found, and the autornpr-

phism that should be found next is conditioned

’ヒhat it has to con’ヒain a’ヒ least one cycle that

maps a vertex ( member ) in some ciass to another

cZass’s member. Tf there is no automorphism that

satisfies ’ヒhe condition, ’ヒhe :resulted classes give

the orbits.  The program ’ヒha’ヒ is i:mplemented ac-

co:rding to this algori’ヒhm is also prepared, and is

useful for getting orbits for highly symmetrical

graphs.

     partitioning vertices o£ a tree into. p.rb-i.一t.一s.!

The pr.ograms that give partitioning. of vertices

are prepared for both highly and not-so-highly.

sy皿metrical graphs by finding au’ヒomorphisms de-

fined on a set of vertices. On the other handr

thd particular partitioning progedure that dose

not use automorphisms explici’ヒely is prepared for

trees.

     This parti’ヒioninq is accomplished by using

Edmonds’ canonical form of a tree. Assuming that

there is one ”center” in a given tree. Then the

onユy vertex ’ヒhat is topologically equiva].ent to

the center is itself. Regarding this tree as a

rooted tree whose root is ’ヒhe center vertex r

一 22 一

Page 25: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

vertices u and v are equivalent if and only if

they satisfy the following three condi’ヒions:

(i) Vertices u and v are the same distance frorn

・ヒhe center.

(2) The parents of u and v are the sarRe vertex r or

an equivalent one.

(3) The subtrees whose roots are u or v are iso一一

morphic, where a subtree is defined as a subgraph

七hat consis’しs of a ver’ヒex ( u or・▽ ) and its all

the descendants in the given rooted tree. Since

these conditions are checked inmediately in terms

of Ed:monds, canonical form of a t:ree, par’ヒitioning

vertices is given by investigating vertices for

e▽ery dis’ヒance f:rom the :root to leaves of a maxi-

rltal distance. The classes partitioned in this

fashion agree wi’ヒh ’ヒhe orbits of .ヒhe automorphism

group described previously‘

     When there are two centers in a given tree,

it can be coped with by inserting a dummy vertex

between ’the two vertices, converting the case of

one’ モ?獅狽?秩@( see Figure 9 ).

一 23 ・一

Page 26: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

                    CONCLUSION

     Ame’ヒhod to genera’ヒe chemical s’ヒ:ructu:res is

p:resen’ヒedr and it is based on BCT represen’ヒa’ヒニLon

of chemical s・ヒructu:resr on labeling nodes of bi・一#

partite graphs ( BCT ) corresponding to blocks and

on allocating cutpoints in the blocks.

     The programs were im,plemented in FORTRAN on a

ACOS一一900 cornputer, and the spectraZ data were

edi’ヒed from EPA/N工H Mass Spec’ヒral Database ( 1975

edition, abou’ヒ l1300 records )。  BCT file is or-

ganized according to parame’ヒe:r (m,n ) as shown in

TabZe 1. The execution speed of structure genera-

tion depends on the times of orbi’ヒ compu七a’ヒion in

the generation s’ヒages such as BCT labelingr cu’ヒ・一

point assignmeht and then connecting blocks.

Apparen’ヒly o:rbits should be computed fo:ビ each

node of a generation ’ヒree except lea▽es, so ’ヒhe

numbe:r of orbi’ヒ computa’ヒion equals 七〇 ’ヒhe 孕umber

of nodes ( not leaves ) of generation trees.

The execution time of orbi’ヒ computa七ion depends on

the nuruber of vertices and symmetric degree o£ the

graph, so we prepared three types of programs for

orbit computation. This rnethod of structure gen-

eration is useful for autornatic structure elucida一一

一 24 一

Page 27: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

’ヒユon, and consti’ヒutes a part of ASASbtEAS。

一 25 一

Page 28: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

(1 O)

(u)

              REFERENCES

NelsonrD.B.; MunkrM.E.; GashrK.B.;

Hexald,D.L.Jr. 」.Org.Chem. 1969, 34, 3800.

AntoszrF.J.; NelsontD.B.; Herald.D.L.Jr.;

Munk,M.E. J.A:n.Chem.Soc. i970r 92, 4933.

ShellyrC.A.7 MunkrT“1.E. Anal.Chim.Acta.

1978, 103r 245・

Shelly,C.A.7 Munk,M.E. 」。Chem.工nf。Comput.

Sci. 1979, 19, 247.

Kudo,Y.; Sasaki,S. J.Chem.Doc. 1974, 14r

200.

KudorY.; Sasaki,S. 」.Chern.Znf.Comput.Sci.

1976r 16r 43・

Masinter,L.M.; Sridharan,N.S.; Lederberg,」.;

SMithrDeH・ J・AM・CheM・SOC・ 1974r 96, 7702・

rylasinter,L.M.; Sridharan,N.S.7 CarhartrR.E.;

Smith,D.H. J.Arn.Chern.Soc. 1974, 96r 7714.

Carhart,R.E.; SmithrD.H.; Brown rH.;

DjerassirC. 」.Am.Chern.Soc. 1975, 97, 5755.

Carhart,R.E.; SmithrD.H.; Brown,H.;

Sridharan,N.S. J.Chem.rnf.Cornput.Sci.

:L975, ユ.5r l24.

Nakayama r T.7 Fuj iwara,Y. 」.Chem.1nf.Comput.

SCi’. ’1980r 20r 23・

一一 26 一

t

Page 29: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

(1 2)

(Z 3)

(1 4)

Harary r F・ ”Graph Theory”; Addison-Wesley:

Reading, Massachusetts, i969.

Busacker,R.G.; Saaty,T.L. tv Fini’ヒe G:raphs and

:Netwo:rks: An 工nt:roduction wi・ヒh ApPlications ev,

McGraw-Hill, 1965; Chapter 6.

Aho,A.V.7 Hopcroft r J.E.; Ullman r J.D. ”The

Design and Analysis of Computer趾gorit㎞”,

Addison-Wesiey: Reading, b4assachusetts,

1974e

一 27 一

Page 30: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

        F工GURE LEGENDS

BCT representation of chemical s’ヒruc-

tures。

Block diagra:m of the au’ヒomatic struc-

ture analysis system of mass spectra

(ASASMAS )7 GENERAMON is the program

                          set for s’ヒ:ructu:re generatユon・

‘▽alid (m,:n)BCTs for m=6.

Par’ヒitioning of ▽ertices of a t:ree

                                コ      ロincluding『一ヒhe degree of consangu■n:L’ヒy.

(a)As:kele●ヒon of a(6,4)BCT and

    blocks.

(b)Assignment。f Cl t・(6’4)BCT and

    roots of corresponding generation

    trees Tl and T2・

 (c) Grow七h of generation ’ヒrees ’ヒhrough

    assignment of C2・

(d)C。mpユeti。n。f generati。n tree Tl

    a:ndT2・

(e).F・ur kinds。f labeling c。rresp。nd鱒

     ing 七〇 four pa’ヒhs of genera’ヒion

     ’ヒrees.

 (a) Parti・ヒioning of vertices・

(b)Assignment。f Cl and f。U。wing ’

            一 28 一

Page 31: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

 ロ:Fユqure

 ロ:F:Lqure

7.

8.

Figure 9.

Tabユe 工.

    par七i・ti on,in9.

(c)Assigrment・f C2 and f・ll。wing

        の      ロ             ロ    partユtユon■ng・

(d)Assignment。f C3・

Connecting blocks at the atomic le▽el e’

Examp:Le of ▽er’ヒex par’ヒitioning.  There

are 32 au’ヒomo:rphisms on the set of

verdヒices of ’ヒhis graph.

Example of tree ve:rtex par’ヒitioning.

The:re are 9! au’tomorphisms on the se’ヒ

of ▽er七ices of this tree.  In prac’ヒice,

it is irnpossible and unnecessary to get

all of them for ’ヒhe pu:rpose of par’ヒi一

 コ           ロ

            のtユon:Lng vert■ces.

N umber of BCTs classified according to

parameter(ln,:n).

一29 一

Page 32: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

一・・一j〉

一一”堰

oe一.p一一〇一一e一一〇一“H)

Figure i

GENERATION

ANALYSIS

structurat data

compound data

B C  T

btock dictionary

しEARNING

c s s c

spectral data

   Figure 2

Page 33: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

(m,n)=(6,2)

(m,n)=(6,3)

(m,n)=(6,4)

(m,n)=(6,5) o{ro-eK>・e>一〇一e-o一)>o

Figure 3

1 2 4 5 8   10111 3141 161718

sah《。,.)≦4   sah(…)≦2   san(u,v) S 6

          一一一一一一一一一一一一 san(v,v)〈一6

                        {1,2.’r’ .18 }

                 一一一一一 san(u.v)〈一4

                        〈1 .’”・6}・ {7・ ’” s12}’ e {13ih・18}

                  一一一一 san(u.v) (一2

                        {i.2.3}.{4.5.6},…,{16.i7,18}

tree representation ot partitioning structure

                     Figure 4

Page 34: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

1

2

    4

 3fila2 E

al Y a3   a4

    6

cl CuP

c2く!)P

C3 O一一一〇

ni = 1

n2= 2

n3= 3

Figure 5(a)

1

2

C13

T1

4

6

51

2

3

e) 4

T2

6

5

(i) ”””’Cl一・“一一一@

Figure 5(b)

4

5 3

T1

1

2

””M“””””@Cl e一一””””..”“”

    3) 一一… C2 一一一一一一一 Cl

‘c,) (4) 一一一一 C2 一一一 (2) (3

T2

4

5

3

5

5

6

Figure 5(c)

Page 35: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

Tl T2

1 一一e一一一一一一一一一一一一一 Cl 一“一e一一ee一一e-ee一一一一 4

4 2 3 “ee-eC2 一一一一一一一一 1 3 5

5 3 4 4 魯●●●りC2 ’・一. 2 3 5 5 6

x4

x2 e一一“e

c3 一一・””K.

2x

1x

5 5 一一ee一C3 ・一一一一一・一 5 2

6 6 一一e一一C3 ”’‘一・一”.. 6 6

Figure 5(d)

C3

Cl Cl

C2 C3 c C3

C3

C3

Cl Cl

C2

C2 c

C3 C3

C3

Figure 5(e)

Page 36: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

   1

1〔つ;

   1

   Cl

l(e)1

   3

Figure 6(a) Figure 6(b)

   Cl

l〔ゴ

    3

   Cl

l(慧   Cl

l(g)1

   C2

Figure 6(c)

   Cl

     C2C3

   Cl

C3

C2

  Cl

ce3 2

  Cl

  Cl

  C3

  Cl

“,2

  C3

          Cl  Cl

  Cl

  C2

  Cl

Q,3

  C2

Figure 6(d)

Page 37: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

1

2

3

4

一一 v

cC’o・・一一一一・

…αく

一一bh一

口〇一

C3

c

Cl

C3

c

c

 2. .41<=>

 3

163 3. .21く=S・

 4i 一2

1《隠

 3. .4

iO2

 2i 21b“ 3’ 3 2A 2i〈 :>3

    4   4

 3. L2       3

く}1、

く:S・ i

    3  4

,b2 ie    4   4

Q, ,

,b,

督    3

〈=〉・

 2. 一4 2

    3

之8“ 4. .2

103 1

 4  2

1タ>1

・4 一3・

1<=>21

1( >34

 2. i3

04く季・

    2

“3 4

く⇒・

 2

1擬

づ・

   2

1費1つ・

Figure 7

4

4

3

3

3

3

2

2

1 5

6

6

7

7

7

7

8

8

Figure 8

Page 38: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

a112

6

133 2

11

5G

3

2

2

7Q

1

1

4

3

2

Y8

3

 103

al 9

nodecode

 ロ             ヨ

 12567389104111213

.13 411141114111  」一■一一圏一」L-r■一一」一

   SI  S2  S3

S1竃S2翼S3 罵〉‘2翼3耀4

sim鷲arly

5嵐6■7口8■9臓10ロ11●12●13

Figure 9

n霜 ユ 2 3 4 5 6 7

2 ユ

3 1 ユ

4 ユ 1 2

5 1 2 3 3

6 ユ 2 6 7 6

7 ユ 3 9 17 ユ8 ユユ

8 1 3 ユ3 30 51 44 23

9 ユ 4 ユ7 53 109 ユ48

ユ0 1、 4 23 79 2ユ3

11 1 5 28 ユユ9

ユ2 1 5 35

ユ3 1 6

ユ4 1

Tabie 1. Niumber of BCTs classified according

to paraneter ( m t n ).

Page 39: STRUCTURE GENERATION ON THE BASIS OF BCT ......structure analysis are in£erence of constituent substructureS and structure generation by cornbin- ing those infe:rred subst:ructures。

INSTITUTE OF INFORMATION SCIENCES AND ELECTRONICS             UNIVERSITY OF TSUKUBA

 SAKURA-nURA, 麗II}一{ARI・一GUN, 18ARA}く1 305     JAPAN

REP(⊃R丁D(X:二Ufl ENTAT I ON PAG EREPOR[if NU蟹BER

 工SE-TR-81-21

TZTLE   Structure Generation on  Chemical Structures

the Basis of BCT Representation of

AUTHOR (s)

   Yuzuru Fuコ■wara

Takashi Nakayama

[工nstitute of 工nformation SciencestUniversity of Tsukuba]

[Central Research Laboratory,Kuraray Co. Ltd.]

REPQRユl DA〔でE

   July 22, 1981

MA工N CATEGORY

   3.7 lnformation Retrieval

NTUtYIBER eF

       37

PAGES

CR CATEGORIES

   3.79

KEY WORD S

   Graph database/graph generation/BCT graph/ASASMAS/learning system

ABS[rRACT

     A method of structure generation based on BCT (block cutpoint tree)

   representation of chemical strucVure .has been developed. The generation

   program is a part of the automatic structure analysis system of mass

   spectra (ASASMAS), and is used when a set of the inferred substructures

   are given as imput data. The input substructures are represented by

   means of BCT.

SUPPLEMENTARY NOtll]ES