Computer-aided molecular design using Tabu search


  • Computers and Chemical Engineering 29 (2005) 337–347

    B. Lin a,1, S. Chavali b, K. Camarda b, D.C. Miller a,*

    a Department of Chemical Engineering, Rose-Hulman Institute of Technology, 5500 Wabash Avenue, Terre Haute, IN 47803, USA
    b Department of Chemical and Petroleum Engineering, University of Kansas, 1530 W 15th, 4132 Learned Hall, Lawrence, KS 66045, USA

    Received 11 December 2003; received in revised form 25 October 2004; accepted 26 October 2004

    Abstract

    A detailed implementation of the Tabu search (TS) algorithm for computer-aided molecular design (CAMD) of transition metal catalysts is presented in this paper. Previous CAMD research has applied deterministic methods or genetic algorithms to the solution of the optimization problems which arise from the search for a molecule satisfying a set of property targets. In this work, properties are estimated using correlations based on connectivity indices, which allows the TS algorithm to use several novel operators to generate neighbors, such as swap and move, which would have no effect with a traditional group contribution-based approach. In addition, the formulation of the neighbor generation process guarantees that molecular valency and connectivity constraints are met, resulting in a complete molecular structure. Results on two case studies using TS are compared with a deterministic approach and show that TS is able to provide a list of good candidate molecules while using a much smaller amount of computation time. © 2004 Elsevier Ltd. All rights reserved.

    Keywords: Tabu search; Computer-aided molecular design; Computation time; Transition metal catalysts

    * Corresponding author. Tel.: +1 812 877 8506; fax: +1 812 877 8992.
    E-mail addresses: bal@kt.dtu.dk (B. Lin), david.c.miller@rose-hulman.edu (D.C. Miller).
    1 Present address: CAPEC, Department of Chemical Engineering, Technical University of Denmark, Lyngby, DK-2800, Denmark.
    0098-1354/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.compchemeng.2004.10.008

    1. Introduction

    Computer-aided molecular design (CAMD) has the potential to greatly decrease the time and effort required to develop new molecular entities by reducing the need for costly and time-consuming trial-and-error experiments. Instead, a slate of candidate molecules which are predicted to have the desired properties can be used as a starting point for more focused experimental synthesis. This general method is applicable to such varied products as catalysts, polymers, solvents, and detergents. Hairston (1998) recently reported that a computational algorithm has been successfully used to design a new cancer-fighting pharmaceutical.

    The CAMD methodology consists of solving both a forward and a backward problem (Venkatasubramanian, Chan, & Caruthers, 1994). The forward problem predicts properties based on molecule structure; the backward step identifies a structure to obtain a molecule with a given set of target properties. Property prediction is usually based on either group contribution methods or topological indices. The group contribution approach has been most widely reported (Gani, Nielsen, & Fredenslund, 1991; Harper & Gani, 2000; Harper, Hostrup, & Gani, 2003; Sahinidis & Tawarmalani, 2000; Sahinidis, Tawarmalani, & Yu, 2003; Vaidyanathan & El-Halwagi, 1996; Venkatasubramanian et al., 1994). Constantinou and Gani (1994) and Constantinou, Gani, and O'Connell (1995) described a two-level group contribution method which utilizes molecular structure information to estimate the physical and thermodynamic properties of pure components. A three-level group contribution method proposed by Marrero and Gani (2001) exhibits improved accuracy and applicability to deal with bio-chemically and environmentally-related compounds. Because most group contribution methods cannot adequately account for steric effects (Wang & Milne, 1994), several researchers have begun using topological
    indices in an effort to obtain more accurate property prediction (Camarda & Maranas, 1999; Kier & Hall, 1976; Raman & Maranas, 1998; Siddhaye, Camarda, Topp, & Southard, 2000). Whichever approach is chosen, existing structure-property information is regressed to form an empirical model relating the molecule structure to the properties of interest. The inverse problem is essentially a mixed integer optimization problem, whether it is solved explicitly and deterministically as in Maranas (1996), stochastically via a genetic algorithm-based approach as in Venkatasubramanian et al. (1994), or via a generate and test approach such as those of Friedler, Fan, Kalotai, and Dallos (1998); Gordeeva, Molchanova, and Zefirov (1990); Harper and Gani (2000) and Harper et al. (2003).

    Nomenclature

    $f_{i,j,k}$         a partitioned binary adjacency matrix showing when basic groups i and j are bonded with a kth-multiplicity bond
    $x$                 best new neighbor solution
    $x^*$               the best solution ever found
    $z_i$               an existence vector showing whether the ith group exists in the molecule
    Elec                sum of the electronegativities of all the groups
    Elec(i)             electronegativity of group i in electronvolts (eV)
    LC50                lethal concentration killing 50% of the test population, a measure of toxicity
    $N_{Mo}$            the number of rows allotted to all the molybdenum groups
    $N_{Tot}$           total number of basic groups
    $N_{Bond}$          total number of types of bonds of the basic groups
    $N_{max,i}$         maximum number of basic groups allowed in molecule
    Oxid                sum of the oxidation states of all the Mo groups
    Oxid(i)             the oxidation state of the ith group
    $P_m$               the estimated value of property m
    $P_m^{scale}$       a scale factor used to weight the importance of one property relative to another
    $P_m^{target}$      the target value for property m
    $R$                 set of all targeted properties

    Greek letters
    $^0\chi$            zero-order molecular connectivity indices
    $^0\chi^v$          zero-order molecular valence connectivity indices
    $^1\chi$            first-order molecular connectivity indices
    $^1\chi^v$          first-order molecular valence connectivity indices
    $^2\chi$            second-order molecular connectivity indices
    $^2\chi^v$          second-order molecular valence connectivity indices
    $\delta$            simple atomic connectivity indices that refer to the number of bonds which can be formed with other groups
    $\delta^v$          atomic valence connectivity indices that describe the electronic structure of each basic group

    Many researchers have reported solutions to the backward problem to determine the molecular structure. Combinatorial and heuristic-based enumeration approaches have been reported (Gani & Brignole, 1983; Joback & Stephanopoulos, 1989). Kier, Lowell, and Frazer (1993) used a graph reconstruction approach to determine feasible molecular structures with bounded physical property values, while Vaidyanathan and El-Halwagi (1996) described an interval analysis approach for the computer-aided synthesis of polymers and blends. CAMD problems have recently been formulated as mixed-integer linear/non-linear programming (MILP/MINLP) problems. Maranas (1996) transformed the non-convex MINLP formulation into a tractable MILP model by expressing integer variables as a linear combination of binary variables and replacing the products of continuous and binary variables with linear inequality constraints. Using connectivity indices, Camarda and Maranas (1999) described a convex MINLP representation for solving several polymer design problems. Churi and Achenie (1996) solved a refrigerant design problem with an augmented penalty-outer approximation (AP/OA) algorithm. A reduced dimension branch-and-bound algorithm was presented by Ostrovsky, Achenie, and Sinha (2003) to design optimal solvents used as cleaning agents in the printing industry. Sahinidis and Tawarmalani (2000) and Sahinidis et al. (2003) reported a branch-and-reduce algorithm for identifying a replacement of Freon. Stochastic optimization approaches have also been developed as alternate strategies for rigorous deterministic methods. Venkatasubramanian et al. (1994) employed a genetic algorithm for polymer design in which properties are estimated via group contribution methods. Marcoulaki and Kokossis (1998) described a simulated annealing (SA) approach for the design of refrigerants and liquid-liquid extraction solvents. Wang and Achenie (2002) presented a hybrid global optimization approach that combines the OA and SA algorithms for several solvent design problems.

    Tabu search (TS) is a heuristic approach for solving combinatorial optimization problems by using a guided, local search procedure to explore the entire solution space without becoming easily trapped in local optima. It differs from other stochastic optimization techniques by maintaining lists of previous solutions (usually termed memory) that help guide the search process. These lists are useful for CAMD since they provide a direct method to track near-optimal solutions. The ability of TS to efficiently find a set of near-optimal solutions is particularly useful since all property prediction algorithms have limited accuracy and problem formulations normally do not include all relevant properties. Thus, the determination of the global optimum is not as critical as finding
    a set of near-optimal solutions. Deterministic approaches can generate such a list by multiple MINLP solves and application of integer cuts, but for large problems this becomes prohibitively computationally expensive. Other stochastic approaches, as well as generate and test strategies, can also generate such a near-optimal candidate list. By identifying a range of potential target molecules, TS avoids missing potentially useful molecules and allows the use of other criteria (such as ease of synthesis) to perform a final ranking of the candidates.

    Transition metal catalysts are an important class of molecule for creating other molecules efficiently with minimal impact on the environment; however, the catalysts themselves are often extremely toxic and harmful to the environment. For example, most propylene oxide is produced using homogeneous catalysts containing molybdenum, which are effective but also harmful to the environment. Since in many cases a significant amount of such catalyst is lost to the environment (Allen & Shonnard, 2002), new materials which show improved catalytic activity and less toxicity are highly desired. This paper describes a framework for using TS to design transition-metal catalysts.

    2. Property prediction via connectivity indices

    In order for a molecular design algorithm to be successful, physical properties of interest, such as density, toxicity, solubility, or reactivity within a given system must be estimated with a reasonable accuracy using only a very small computational effort. Once the properties of a transition metal catalyst can be predicted from structure, the problem of designing a new catalyst can be formulated as an optimization problem. We employ connectivity indices, which are numerical values that describe the electronic structure of a molecule, to characterize the molecule and to correlate its internal structure with physical properties of interest. These indices connect the molecular structure to the properties of interest by providing a mathematical description of the molecule as a whole.

    Molecular connectivity indices were first introduced by Randic (1975). Kier and Hall (1976) reported correlations between connectivity indices and many key properties of organic compounds, such as density, solubility, and toxicity. Furthermore, these indices can be computed with a minimum of computational effort. Many of the applications of connectivity indices have been reviewed by Trinajstic (1983). Kier et al. (1993) and Gordeeva et al. (1990) reported the use of connectivity indices within a molecular design context, while Raman and Maranas (1998) first incorporated connectivity indices into an optimization framework. Camarda and Maranas (1999) used connectivity indices as property descriptors to design polymers that have pre-specified property values. Siddhaye et al. (2000) employed connectivity indices to predict the physical properties for the design of novel pharmaceutical products via combinatorial optimization.

    In earlier molecular design work, group contribution methods were used to estimate the values of physical properties, as in Gani et al. (1991), Venkatasubramanian et al. (1994), and Maranas (1996). The works of Constantinou and Gani (1994) and Marrero and Gani (2001) extend the basic group contribution approach by considering combinations of functional groups and thus take into account second-order effects when used to predict physical properties. Connectivity indices, however, take into account the entire molecular structure of a compound. By using higher-order connectivity indices, third-order and higher structural effects can be included in structure-property relations. Harper and Gani (2000) and Harper et al. (2003) have combined group contribution with connectivity indices to include these higher-order effects. Raman and Maranas (1998) and Harper et al. (2003) have shown that these higher-order effects are able to give more accurate property descriptions. Furthermore, when a molecular design problem is solved using these indices, a complete molecular structure is obtained, and no secondary problem must be solved to recover the final molecular structure. Harper and Gani (2000) and Meniai, Newsham and Khalfaoui (1998) have also reported the design of complete molecular structures. Connectivity indices also provide a method to compute the properties of mixtures, since the connectivity indices of individual compounds can be combined to estimate mixture properties in certain cases (Johnson, Lin, Miller, & Camarda, 2002; Kim, Min, Lee, Kim, & Jeong, 1992).

    Computational property estimation algorithms first decompose a molecule into smaller units. Then, a set of basic groups that can potentially be part of the candidate molecules are defined. A basic group is defined as a single non-hydrogen atom in a given valence state, bonded to some number of hydrogen atoms. Atomic connectivity indices are defined over each basic group.

    Table 1
    Atomic connectivity values of groups in CHCl2F

    No.           1     2     3
    Group         CH    F     Cl
    $\delta$      3     1     1
    $\delta^v$    3     7     0.77778

    Basic groups of the molecule CHCl2F are shown in Table 1. The $\delta$ values are the simple atomic connectivity indices that refer to the number of bonds which can be formed with other groups. The $\delta^v$ values are atomic valence connectivity indices that describe the electronic structure of each basic group, including lone-pair electrons and electronegativity. For transition metals, which can assume multiple valence states, the definition of $\delta^v$ is based on the number of electrons participating in the bonding, instead of those present in the outer shell. Once the atomic connectivity indices of basic groups are defined, the molecular connectivity indices can be computed for the entire molecule.

    Molecule structure is expressed with a hydrogen-suppressed graph (as shown in Fig. 1). The zero-order molecular connectivity indices $^0\chi$ and $^0\chi^v$ are the sum of each basic group (the sum over all vertices), which describe the identity of the groups in a given molecule (see Table 2).
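    As a small numerical illustration of the zero-order indices, the sketch below sums the atomic index contributions over the four basic groups of CHCl2F, using the Table 1 values. It assumes the standard Kier and Hall form, in which each vertex contributes its atomic index raised to the power -1/2; the dictionary and function names are hypothetical and are not taken from the original work.

```python
from math import sqrt

# Table 1 values for the basic groups of CHCl2F: group -> (delta, delta_v).
ATOMIC_INDICES = {"CH": (3.0, 3.0), "F": (1.0, 7.0), "Cl": (1.0, 0.77778)}


def zero_order_indices(groups):
    """Return (chi0, chi0_v): sums of delta**-0.5 and delta_v**-0.5 over all vertices."""
    chi0 = sum(1.0 / sqrt(ATOMIC_INDICES[g][0]) for g in groups)
    chi0_v = sum(1.0 / sqrt(ATOMIC_INDICES[g][1]) for g in groups)
    return chi0, chi0_v


# Hydrogen-suppressed graph of CHCl2F: one CH vertex, one F and two Cl vertices.
chi0, chi0_v = zero_order_indices(["CH", "F", "Cl", "Cl"])
print(f"chi0 = {chi0:.4f}, chi0_v = {chi0_v:.4f}")
```

    Because the zero-order sum runs over vertices only, it does not depend on how the groups are bonded; bonding information enters through the higher-order indices.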

    Table 2
    Calculation of molecular connectivity indices

    Order     Simple index                                                                Valence index                                                                        Summation set
    Zero      $^0\chi = \sum_{i \in V} \delta_i^{-1/2}$                                   $^0\chi^v = \sum_{i \in V} (\delta_i^v)^{-1/2}$                                      $V$: the set of all vertices
    First     $^1\chi = \sum_{(i,j) \in E} (\delta_i \delta_j)^{-1/2}$                    $^1\chi^v = \sum_{(i,j) \in E} (\delta_i^v \delta_j^v)^{-1/2}$                       $E$: the set of all edges
    Second    $^2\chi = \sum_{(i,j,k) \in T} (\delta_i \delta_j \delta_k)^{-1/2}$         $^2\chi^v = \sum_{(i,j,k) \in T} (\delta_i^v \delta_j^v \delta_k^v)^{-1/2}$          $T$: the set of all triplets

    Fig. 1. Hydrogen-suppressed graph of CHCl2F.

    Higher-order connectivity indices can also be defined and have been used to give a more precise description of molecular structure (Kier and Hall, 1976). In this work, we have used the second-order connectivity indices, $^2\chi$ and $^2\chi^v$, to give added accuracy to correlations. These two indices are sums over each of the triplets in the molecule, that is, over each possible combination of three bonded groups (see Fig. 1).

    Once the equations defining the (molecular) connectivity indices are in place, these indices can be used in empirical correlations to predict the physical properties of novel transition-metal catalysts. Table 3 shows the basic groups employed in the design of a molybdenum catalyst for an epoxidation reaction. (To limit the search space, we set an upper bound on the maximum number of groups in a molecule.) The correlation derived in this work for density is:

    [Eq. (1): density as a polynomial in the six connectivity indices; coefficients and signs are garbled in this transcript: "= 5535+1784+6490 ... 1 + 758000 76630v + 4090111v 720462 6072v 24695(0)2 / +8.9( ... 0v 12271(1)2 65.4(1v)2 1793(2)2 / 2v)2 + 7232312"]    (1)

    This correlation was developed from regressing 23 Mo-centered complexes and resulted in a correlation coefficient of 0.962 and a sum of squared error of 0.944. Using this correlation, an optimization problem has been formulated whose optimal solutions are molecules which most closely match a set of target property values.

    Table 3
    Basic groups and atomic connectivity values of case 1

    No.            1          2          3      4          5     6      7      8
    Group          Mo (a)     Mo (a)     OH     Cl         O     CH3    CH2    NH2
    $\delta$       5          6          1      1          2     1      2      1
    $\delta^v$     0.13889    0.17143    5      0.77778    6     1      2      3
    $N_{max,i}$    2          2          3      3          3     3      3      3

    (a) Molybdenum groups; their structural symbols are not reproduced in this transcript.

    3. Problem formulation

    When zeroth-, first- and second-order connectivity indices are employed for property estimation, a molecule is represented mathematically using sets of binary variables. First, a vector of binary variables, $z_i$, is defined. An element of this vector equals one if the ith group exists in the molecule; otherwise, it equals zero. Second, a partitioned adjacency matrix with elements $f_{i,j,k}$ is determined. When basic groups i and j are bonded with a kth-multiplicity bond, $f_{i,j,k} = 1$; otherwise, $f_{i,j,k} = 0$. These sets of variables provide sufficient information to compute the connectivity indices and, thus, estimate molecular properties. Along with these definitions, property correlations using the connectivity indices are included in the overall formulation.

    The objective is to determine the molecule that minimizes the difference between the estimated and target property values. This can be written as:

    $\min \; Obj = \sum_{m \in R} \frac{1}{P_m^{scale}} \left| P_m - P_m^{target} \right|$    (2)

    where $R$ is the set of all targeted properties, $P_m$ the estimated value of property m, $P_m^{scale}$ a scale factor used to weight the importance of one property relative to another, and $P_m^{target}$ the target value for property m.

    Structural constraints are added to ensure the obtained molecules are fully connected and satisfy valency by being connected with the appropriate types of bonds. Other constraints include bounds on the variables and property values. The resulting formulation is a large, non-convex MINLP model for CAMD.

    Table 3 shows the set of 8 basic group types used for designing a molybdenum catalyst for an epoxidation reaction. The total number of available groups is $\sum_{i=1}^{8} N_{max,i} = 22$. Since only single bonds are involved, the size of the adjacency matrix $f$ is $22 \cdot 22 \cdot 1 = 484$ elements. Each element, $f_{i,j,k}$, can be either 0 or 1, and the total number of independent variables is $\sum_{i=1}^{22-1} i + 22 = 253$. In order to apply TS effectively, the original MINLP model of Siddhaye et al. (2000) has been altered to handle basic groups and the connectivity between them rather than dealing with the elements of the adjacency matrix directly.
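    A minimal sketch of how a candidate structure could be scored under this formulation, assuming the molecule is stored as a hydrogen-suppressed graph (a list of atomic indices plus a list of bonds) rather than as the full matrix $f_{i,j,k}$. It sums over edges and over bonded triplets to obtain first- and second-order indices, and evaluates the scaled absolute deviation of Eq. (2). The function names and the property values in the usage lines are hypothetical, and the property correlations themselves are not reproduced here.

```python
from itertools import combinations
from math import sqrt


def first_and_second_order(delta, edges):
    """Return (chi1, chi2) for a hydrogen-suppressed graph.

    delta: atomic index of each vertex (simple or valence values).
    edges: list of (i, j) vertex pairs, one per bond.
    """
    neighbors = {i: set() for i in range(len(delta))}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # First order: one term per edge.
    chi1 = sum(1.0 / sqrt(delta[i] * delta[j]) for i, j in edges)
    # Second order: one term per triplet i-j-k, i.e. per pair of neighbors of j.
    chi2 = sum(
        1.0 / sqrt(delta[i] * delta[j] * delta[k])
        for j in neighbors
        for i, k in combinations(sorted(neighbors[j]), 2)
    )
    return chi1, chi2


def objective(estimated, target, scale):
    """Scaled sum of absolute deviations from the property targets, as in Eq. (2)."""
    return sum(abs(estimated[m] - target[m]) / scale[m] for m in target)


# CHCl2F: vertex 0 is CH (delta = 3), vertices 1-3 are F, Cl, Cl (delta = 1).
chi1, chi2 = first_and_second_order([3.0, 1.0, 1.0, 1.0], [(0, 1), (0, 2), (0, 3)])

# Hypothetical property values, used only to exercise the objective.
obj = objective(
    estimated={"density": 3.0, "toxicity": 10.0},
    target={"density": 4.0, "toxicity": 8.0},
    scale={"density": 2.0, "toxicity": 4.0},
)
```

    Working on the bond list directly mirrors the idea of handling basic groups and their connectivity instead of individual adjacency-matrix elements.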

    4. TS implementation

    TS begins by determining an initial solution. Additional solutions (termed neighbors) are generated by modifying the existing solution through a sequence of moves. The best new neighbor ($x$) is used as the starting point for the next iteration unless it is on a Tabu list. Thus, even if no neighbor solutions are better than the initial solution, the best one is still chosen as the starting point for the next iteration. A record of the best solution ever found ($x^*$) is separately maintained. In addition, the Tabu lists provide an adaptive memory that guides the search by taking advantage of historical information. This memory enables TS to make strategic choices and achieve responsive exploration.

    The standard TS algorithm (Glover & Laguna, 1997; Lin, Chavali, Camarda & Miller, 2003; Lin & Miller, 2004b) is adapted to represent the molecule efficiently. Each solution is a molecule consisting of a series of fully connected groups. Therefore, the initial solution is constructed by connecting basic groups together. In generating neighbor solutions, operators are also designed to handle basic groups. The procedure of building an initial solution is described using the basic groups listed in Table 3. The progression of the building process is shown in Table 4. In the CAMD framework, a mainchain is defined as the list within a molecule with the largest number of groups. It is determined during the process of constructing a molecule. The length of sidechains (list of groups connecting to the mainchain) is constrained to be less than or equal to that of the mainchain. For all the catalyst design test cases, at least one Mo group must be present.

    Table 4
    Procedures for building an initial solution

    Step    Solution
    1       NH2
    2-6     [structure drawings not reproduced in this transcript]

    4.1. Building an initial solution

    Step 1. One of the basic groups is selected as a root group. For simplicity, a group with only one bond, NH2, is chosen.

    Step 2. Another group is added to the root group. If the second group has only one bond, i.e., OH, there are no empty bonds in the molecule. Since the molecule is fully connected, the initial solution, hydroxylamine (H2N-OH), would be obtained; however, this violates the constraint requiring at least one Mo atom. Thus, another group must be chosen. In this example, this step adds basic group 1 from Table 3, resulting in the molecule fragment for step 2 as shown in Table 4.

    Step 3. Now, the molecule consists of two groups. Additional groups are needed to connect to the empty bonds. Since the length of the mainchain is currently 2, the length of the sidechains can be either 1 or 2. When the length is 1, another group with a single bond, OH, is attached as shown in Table 4. As sidechains are added to the molecule, the mainchain identity and length are updated so that it is always the longest chain in the molecule.

    Step 4. The length of the second sidechain is 2, resulting in a branch that consists of two groups. Thus, -O-Cl are added.

    Step 5. Another sidechain of length 1, CH3, is added. One free bond still exists in the molecule.

    Step 6. If a group with only one bond is selected, i.e. OH, the molecule is fully connected, and the initial solution is obtained as shown in Table 4. If the selected group has more than one bond, steps 2-5 are repeated until the molecule is fully connected.

    A set of candidate solutions will be generated and evaluated at each iteration. These candidates (or neighbor solutions) will be constructed by modifying the current molecule.

    4.2. Neighbor generation operators

    Two sets of operators are defined: local and global. Local operators maintain the main backbone of the current molecule, and are implemented for the purpose of localized search; global operators generate neighbor solutions that are significantly different from the current one in order to diversify the solutions at each iteration. There are six local operators:

    1. Replace
       This operator replaces a group in the current molecule with another available basic group. If OH is chosen, it
       can be replaced by Cl, OH, or CH3. If the group O is replaced by CH2, a new molecule will be obtained (see Table 5).

    2. Insert
       A basic group that is available will be inserted in front of the selected group in the current molecule. Suppose the group, CH3, is chosen; a group with two single bonds can be inserted on the sidechain. The new molecule obtained by inserting CH2 is shown in Table 5.

    3. Delete
       This operator deletes the selected group in the current solution. Therefore, the operator cannot be applied to a group with only a single bond, since it will result in an infeasible molecule. In this example, only the molybdenum or the oxygen group can be deleted. The molecule in Table 5 shows the result of the deletion of O.

    4. Swap
       Two groups of the current molecule are exchanged by applying the swap operator. Suppose the group CH3 is selected; another group with only a single bond will be chosen. The molecule after swapping Cl with CH3 is shown in Table 5. Since molecules obtained from swapping two groups connecting to the same parent group are equivalent, groups to be swapped are required to be different and not connected to the same parent.

    5. Move
       This operator moves the selected group within the existing molecule. Suppose the group O is selected to be moved after the group NH2. A new molecule is obtained as shown in Table 5.

    6. Combination
       This operator combines the 5 operators to form a new one. A random number between 1 and 5 is first generated. For example, 3, then 3 of the 5 operators, such as Move, Insert, Swap, will be applied sequentially to the current molecule and generate a neighbor solution.

    Table 5
    Neighbor generation operators

    Step       Solution
    Replace    [structure drawing not reproduced in this transcript]
    Insert     [structure drawing not reproduced in this transcript]
    Delete     [structure drawing not reproduced in this transcript]
    Swap       [structure drawing not reproduced in this transcript]
    Move       [structure drawing not reproduced in this transcript]

    The following three global operators help TS perform a diversified search to investigate the whole solution space sufficiently:

    7. SideChain Rebuild
       If the selected group is located on a sidechain of the molecule, the sidechain will be replaced by a newly-generated one.

    8. MainChain Rebuild
       If a mainchain group is selected, all groups that are located after it will be deleted and the rest of the molecule is reconstructed.

    9. Total Rebuild
       If the first or the last group of the molecule is selected, the whole molecule will be discarded and replaced by a new molecule that is built following the steps of generating an initial solution.

    The two sets of operators are selected based on the stage of locating a solution. Global operators are more frequently selected at the starting stage to favor a diversified search and local operators are selected more frequently at the end of the search process to locate the final solution precisely.

    After obtaining a molecule, all elements of the adjacency matrix can be determined according to the connectivity among groups. Then, with the provided simple atomic indices and valence indices, the molecular connectivity indices can be calculated, and the properties of this molecule are estimated with the property-structure correlations. At each iteration, certain solutions are classified as Tabu (forbidden) and added to Tabu lists. At the same time, the Tabu property of other solutions will expire, and they will be removed from the Tabu lists. In this way, Tabu lists are updated continuously and adapt to the current state of the search. The use of adaptive memory enables TS to exhibit learning and creates a more flexible and effective search.

    4.3. Tabu lists

    Fig. 2. A sample molecule for case 1 located by TS.

    In the TS algorithm, each molecule consists of a set of fully connected groups, which are classified into mainchain and sidechain groups. Fig. 2 shows a solution located by TS based on basic groups listed in Table 3; the mainchain is identified as: Cl-CH2-Mo-Cl with 3 sidechains that are
  • B. Lin et al. / Computers and Chemical Engineering 29 (2005) 337347 343

    Table 6Recency-based Tabu list

    No. Tabu tenure Objective function Group list

    1 22

    2 20

    3 15

    4 14

    5 10

    6 9

    all of lengtsidechain gcluded in Twill be timfrequency-groups.

    The recetions, and ithe best neifar (x*), it isTabu list. TTabu tenurber of basifor this cas

    A segmTable 6. Thing which alist. The firobjective fu

    It will ba best neigthis periodthe improv1997).

    The freqory and kemost frequean index. Windex willdecreases.process has

    Table 7list. The 10.4848 has

    Table 7Frequency-based Tabu list

    No. Freque

    0.025

    0.103

    0.085

    12.75

    0.590

    3.600

    ced byhe 4t

    een v5) is mecauses, the

    lgorithhism,me as

    lwaysabu lisf a ne

    Intens

    ased on knowledge of the current search status as pro-0.0036

    0.0699

    0.5752

    0.5666

    0.5041

    0.0094

    h 1, the group OH. If both the mainchain androups, as well as connectivity relations are in-abu lists, the update and maintenance of the listse-consuming. Therefore, both recency-based andbased Tabu lists only keep track of mainchain

    ncy-based Tabu list records recently visited solu-s called short-term memory. At each iteration, ifghbor is not better than the best solution found soclassified as Tabu and added to the recency-basedhe Tabu property remains active throughout the

    [...] which is empirically set equal to the total number of basic groups available for building the molecule (22 [...]). The content of the recency-based Tabu list is shown in [...]. The first column shows the number of iterations during which a solution will be kept on the recency-based Tabu list. The first solution on the recency-based Tabu list, with an objective function of 0.0036, has the following mainchain:

    [Mainchain structure not recoverable]

    [...] be released from the list after 22 iterations, unless a neighbor with the same mainchain is located within [...]. In this case, the Tabu property is overridden by the [...]ed-best aspiration criterion (Glover and Laguna, [...]).

    The frequency-based Tabu list provides long-term memory and keeps track of the solutions that have been visited frequently. The frequency Tabu property is denoted with [...]. When a previous solution is revisited, its frequency index will be incremented; otherwise, the frequency index [...]. This allows TS to determine whether the search has become trapped in a specific area for a long time.

    [...] shows six solutions on the frequency-based Tabu list. The [...]st solution, with an objective function value of [...], has the lowest frequency index. This solution will be replaced by new solutions if the frequency-based Tabu list is full. The [...]h solution, with the following mainchain structure:

    [Mainchain structure not recoverable]

    has been visited most frequently, since its frequency index (12.7[...]) is much larger than that of the other solutions.

    Frequency index    Objective function    Group list
    [...]              0.4848                [...]
    [...]              0.1405                [...]
    [...]              0.0648                [...]
    [...]              0.0094                [...]
    [...]              0.4879                [...]
    [...]              0.5666                [...]

    Because the Tabu lists track the occurrence of mainchain groups, the danger of auto isomorphism negatively impacting the algorithm is reduced. Even without considering auto isomorphism, it is possible that some neighbor solutions will be the sa[...] others. To avoid becoming stuck with a best solution a[...] being a rearranged version of the same molecule, the Tabu list will recognize the mainchain and force the selection of a new starting point for the next generation of neighbors.

    4.4. Intensification and diversification

    [...] provided through the Tabu lists, intensification and diversification strategies are used to control the search area. Intensification is carried out by generating neighbor solutions employing the local operators. Global operators increase diversification and are implemented based on frequency-based Tabu lists. If the maximum frequency index of solutions on the frequency-based Tabu list is larger than a predefined threshold (in this case, the threshold is empirically set to 22), TS is deemed to have been searching around a specific solution too often. Thus, the current solution will be replaced by a new randomly generated solution by applying the global operators, and the search process will be restarted. This diversification strategy is similar to the restart mechanism used in other stochastic optimization approaches; however, the restart is not purely random. Instead, it is guided by historical information.
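The two memories and the frequency-triggered restart described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `TabuMemory`, the function `random_restart`, and the representation of a mainchain as a tuple of group names are all assumptions made for the example. The integer visit counter is a simplification, since the frequency index quoted in the text appears to take non-integer values, and the restart stand-in ignores the valence and connectivity checks that the real global operators perform.

```python
import random
from collections import Counter


class TabuMemory:
    """Short- and long-term memory keyed by mainchain (assumed: a tuple of group names)."""

    def __init__(self, tenure=22, restart_threshold=22):
        self.tenure = tenure                        # iterations a mainchain stays Tabu
        self.restart_threshold = restart_threshold  # visit count that triggers a restart
        self.recency = {}                           # mainchain -> iterations left on the list
        self.frequency = Counter()                  # mainchain -> number of visits

    def record(self, mainchain):
        """Call once per iteration with the mainchain of the accepted solution."""
        # Age every entry by one iteration and release the expired ones.
        self.recency = {m: left - 1 for m, left in self.recency.items() if left > 1}
        self.recency[mainchain] = self.tenure
        self.frequency[mainchain] += 1

    def is_tabu(self, mainchain):
        return mainchain in self.recency

    def needs_restart(self):
        """True when some mainchain has been visited more often than the threshold."""
        return any(count > self.restart_threshold for count in self.frequency.values())


def random_restart(basic_groups, max_groups, rng):
    """Hypothetical stand-in for the global operators: draw a random group sequence."""
    length = rng.randint(1, max_groups)
    return tuple(rng.choice(basic_groups) for _ in range(length))
```

In a search loop, `record` would be called after each accepted move, `is_tabu` would filter candidate neighbors (subject to the aspiration criterion), and `needs_restart` would decide when to replace the current solution with the output of the global operators.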


    In some cases, TS cannot locate improved solutions after the restart operation, and additional procedures are required. For example, if the current best solution has an objective function of 0.0094 and TS initiates a restart, but the best neighbor after several iterations is not less than 0.0094, another intensification strategy pulls the search back to the formerly promising area by assigning the current solution to the best neighbor recorded before the restart operation.

    4.5. Aspiration criterion

    An aspiration criterion based on the sigmoid function is employed to invalidate the Tabu property in certain cases and helps to maintain an appropriate balance between diversification and intensification (Lin and Miller, 2004a, 2004b). Since an intensified search is favored at the end, and a broadly diversified search at the beginning, the aspiration criterion allows the Tabu property to be overridden more towards the end of the search. This helps to ensure that the Tabu lists properly encourage a broad search early on.

    4.6. Constraint handling

    In this application, constraints are handled in a way that guarantees the feasibility of the molecule. Since TS operates on basic groups directly, both valence and connectivity constraints are handled during the process of neighbor generation, as previously described. For example, both local and global operators ensure the proper type of bond (single, double or triple) to be connected. In addition, the number of empty bonds is checked to guarantee that valency will be satisfied in the current solution. Groups that would cause a violation of either connectivity or valency are not allowed. Otherwise, all groups can interconnect.

    5. Case studies

    The effectiveness of the TS algorithm for CAMD is demonstrated with the following two case studies. The parameters for each case are as follows:

    1. the number of neighbor solutions generated at each iteration is equal to the product of the number of basic groups and the maximum number of groups in a molecule;
    2. the number of iterations is 200;
    3. the length of the Tabu list is defined to be the same as the maximum number of groups in a molecule.

    The TS algorithm is compiled using the gcc compiler on a Pentium III 1.0 GHz CPU, 1024 MB memory PC running Redhat Linux 7.1.

    5.1. Case 1

    The first case is the design of a molybdenum catalyst for an epoxidation reaction. Eight basic groups (see Table 3) are used to define the search space. The purpose of this case is to evaluate the effectiveness of the formulation and to compare TS with a deterministic solution method. Thus, only a single target property, density, is used. While density is an important property of a homogeneous catalyst, it does not define the catalytic activity or any other critical properties and, thus, cannot alone define an effective catalyst. However, density is closely related to solubility for these systems, and solubility defines the amount of potential uptake by humans. Therefore, a combination of density and toxicity would be needed to assess exposure risk. The target density value is 4172 kg/m3, which is the density of a commonly used homogeneous transition-metal catalyst. The correlation for this property is given by Eq. (1).

    The number of neighbor solutions is 8 × 22 = 176. The 5 best molecules and the probability that a molecule is found are obtained from 100 trials. Sub-optimal solutions are either the final output of a test or an intermediate result of a certain iteration. If the sub-optimal solution is the final result of a trial, the probability is incremented by 1%. Since intermediate solutions of some trials may be even better than the final results of other trials, the probability of such sub-optimal solutions is denoted as [...]

    [...]


    Table 8
    Results of case 1 with TS approach

    No.  Structure  N   Objective value  Probability (%)
    0    [...]      18  0.000111         1
    1    [...]      13  0.000115
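The bookkeeping behind the case-study parameters and the probability column can be sketched as follows. This is an illustrative sketch only: the function names `neighborhood_size` and `tally_final_results` are hypothetical, and the trial outcomes in the usage note are placeholder labels rather than data from the paper. The sketch covers only the rule stated for final results (each trial contributes 1% when 100 trials are run) and does not treat intermediate solutions.

```python
from collections import Counter


def neighborhood_size(n_basic_groups, max_groups):
    """Neighbors per iteration: number of basic groups times the maximum groups per molecule."""
    return n_basic_groups * max_groups


def tally_final_results(final_results):
    """Return {structure: percentage of trials whose final result was that structure}."""
    counts = Counter(final_results)
    n_trials = len(final_results)
    return {structure: 100.0 * count / n_trials for structure, count in counts.items()}
```

For instance, `neighborhood_size(8, 22)` reproduces the 176 neighbors quoted for case 1, and `tally_final_results(["A"] * 40 + ["B"] * 60)` assigns 40% to the placeholder structure "A" and 60% to "B".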


    5.2. Case 2

    [...] groups:

    Oxid = \sum_{i=1}^{N_{Mo}} z_i Oxid(i)    (4)

    where Oxid(i) is the oxidation state of the ith group and z_i is an element of the existence vector. The target value of the oxidation state is the set {6, 7, 8}. Toxicity is determined with the following correlation for the LC50:

    log(LC50) = 27.8 + 5.490+ 10.60v 6.6512.931v 4.402v + 0.3682v 0.710(0v)2 + 5.59(1)2 0.0761(1v)2 2.100v1+ 0.5160v1v 0.02950v2v    (5)

    The LC50 was developed using 34 data points obtained from Syracuse Research Corporation (2000). The resulting correlation has a correlation coefficient of 0.953 and a sum of squared errors of 2.083.

    In this example, the objective function is the weighted summation of these properties:

    Obj = 0.6 LC50 + 0.3 Elec + 0.1 Oxid    (6)

    The weights on the properties are freely adjustable and could be based on the preference of the designer. For this case study, these weights are simply examples and do not directly impact the conclusions of the paper.

    The number of neighbor solutions is 10 × 20 = 200. The length of the Tabu list is 20. The 5 best molecules obtained from 100 TS trials are shown in Table 10. The optimal solution, with an objective function of 1.286401, is obtained in 40 trials, and the second-best solution is located the remaining 60% of the time. The difference between the optimal solution and the 10th sub-optimal solution (shown below):

    Cl NH O Mo O Mo NH CH2 Cl

    is smaller than 1%. (The value of its objective function is 1.297251.) In addition, more than 60 near-optimal solutions are obtained for this case study, which shows that TS can locate a large number of near-optimal solutions for further experimental verification.

    In comparison, employing outer approximation via the DICOPT solver in GAMS, only an integer feasible solution was found after 20 min of CPU time on a Sun Ultra 10 workstation. The structure (see Fig. 4) resulted in an objective function of 5.55. While the deterministic algorithm may be able to find and guarantee the global optimum to this problem after expending a large amount of CPU time, we see that TS can generate a list of near-optimal solutions within a short amount of time. This is very useful to a researcher searching for novel alternatives to current catalysts, since the list of near-optimal solutions provides options which can be narrowed down by employing other factors such as cost, ease of synthesis, and estimated values of physical and chemical properties not included in the original design.

    Fig. 4. Integer solution for case 2 found with OA algorithm.

    6. Conclusions

    A detailed implementation of the TS algorithm to CAMD problems is presented in this paper. Although other optimization approaches have been applied to CAMD with properties predicted using group contribution techniques, the TS algorithm implemented with novel neighbor-generating operators and combined with property prediction via connectivity index-based correlations provides a powerful technique for generating lists of near-optimal molecular candidates for a given application. In addition, the Tabu lists help TS search the solution space both in a diversified way, to cover the entire search space, and in an intensified manner, to locate the final solution precisely. Moreover, TS is able to locate a large number of near-optimal solutions within a short time, as shown in the two case studies.

    Acknowledgements

    This project was supported by the National Science Foundation through grant CTS-0224887. B. Lin acknowledges additional support from the Department of Chemical Engineering of Rose-Hulman Institute of Technology.

    References

    Allen, D., & Shonnard, D. (2002). Green Engineering: Environmentally Conscious Design of Chemical Processes. Upper Saddle River, New Jersey: Prentice-Hall.

    Camarda, K. V., & Maranas, C. D. (1999). Optimization in polymer design using connectivity indices. Industrial and Engineering Chemistry Research, 38, 1884-1892.

    Churi, N., & Achenie, L. E. K. (1996). A novel mathematical programming model for computer aided molecular design. Industrial and Engineering Chemistry Research, 35(10), 3788-3794.

    Constantinou, L., & Gani, R. (1994). New group contribution method for estimating properties of pure components. AIChE Journal, 40(10), 1697-1710.

    Constantinou, L., Gani, R., & O'Connell, J. P. (1995). Estimation of the acentric factor and the liquid molar volume at 298 K through a new group contribution method. Fluid Phase Equilibria, 103, 11-22.

    Friedler, F., Fan, L. T., Kalotai, L., & Dallos, A. (1998). A combinatorial approach for generating candidate molecules with desired properties based on group contribution. Computers and Chemical Engineering, 22(6), 809-817.

    Gani, R., & Brignole, E. A. (1983). Molecular design of solvents for liquid extraction based on UNIFAC. Fluid Phase Equilibria, 13, 331-340.

    Gani, R., Nielsen, B., & Fredenslund, A. (1991). A group contribution approach to computer-aided molecular design. AIChE Journal, 37(9), 1318-1332.


    Table 10
    Solutions of TS for case 2

    No.  Structure                  NGrp  Objective  Probability (%)
    0    Cl Mo O Mo O CH2 CH2 Cl    8     1.286401   40
    1    Cl O Mo O Mo NH CH2 Cl     8     1.288636   60
    2    Cl CH2 O Mo O Mo NH Cl     8     1.288698
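The weighted-sum objective of Eq. (6) and the "smaller than 1%" comparison quoted for case 2 can be checked with a short sketch. The function names `weighted_objective` and `relative_gap_percent` are hypothetical, and the sketch assumes that each property enters Eq. (6) as a single precomputed term (for example, a scaled deviation from its target); the exact form of those terms is not recoverable from this transcript. Only the weights and the two objective values are taken from the text.

```python
# Weights quoted in Eq. (6); per the text they are examples and freely adjustable.
WEIGHTS = {"LC50": 0.6, "Elec": 0.3, "Oxid": 0.1}


def weighted_objective(terms, weights=WEIGHTS):
    """Weighted sum of per-property terms (assumed precomputed, one value per property)."""
    if terms.keys() != weights.keys():
        raise ValueError("terms and weights must name the same properties")
    return sum(weights[name] * terms[name] for name in weights)


def relative_gap_percent(best, other):
    """Relative difference between two objective values, in percent of the best one."""
    return 100.0 * abs(other - best) / abs(best)


# Objective values quoted in the text: optimal solution and 10th sub-optimal solution.
gap = relative_gap_percent(1.286401, 1.297251)
print(f"gap between optimum and 10th solution: {gap:.3f}%")
```

The computed gap is below 1%, which agrees with the statement in the text.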
