Computer-aided molecular design using Tabu search

Post on 26-Jun-2016


<ul><li><p>Computers and Chemical Engineering 29 (2005) 337–347</p><p>Computer-aided molecular design using Tabu search</p><p>B. Lin <sup>a,1</sup>, S. Chavali <sup>b</sup>, K. Camarda <sup>b</sup>, D.C. Miller <sup>a,*</sup></p><p><sup>a</sup> Department of Chemical Engineering, Rose-Hulman Institute of Technology, 5500 Wabash Avenue, Terre Haute, IN 47803, USA</p><p><sup>b</sup> Department of Chemical and Petroleum Engineering, University of Kansas, 1530 W 15th, 4132 Learned Hall, Lawrence, KS 66045, USA</p><p>Received 11 December 2003; received in revised form 25 October 2004; accepted 26 October 2004</p><p>Abstract</p><p>A detailed implementation of the Tabu search (TS) algorithm for computer-aided molecular design (CAMD) of transition metal catalysts is presented in this paper. Previous CAMD research has applied deterministic methods or genetic algorithms to the solution of the optimization problems which arise from the search for a molecule satisfying a set of property targets. In this work, properties are estimated using correlations based on connectivity indices, which allows the TS algorithm to use several novel operators to generate neighbors, such as swap and move, which would have no effect with a traditional group contribution-based approach. In addition, the formulation of the neighbor generation process guarantees that molecular valency and connectivity constraints are met, resulting in a complete molecular structure. Results on two case studies using TS are compared with a deterministic approach and show that TS is able to provide a list of good candidate molecules while using a much smaller amount of computation time. © 2004 Elsevier Ltd. All rights reserved.</p><p>Keywords: Tabu search; Computer-aided molecular design; Computation time; Transition metal catalysts</p><p>* Corresponding author. Tel.: +1 812 877 8506; fax: +1 812 877 8992. E-mail addresses: bal@kt.dtu.dk (B. Lin), david.c.miller@rose-hulman.edu (D.C. Miller).</p><p><sup>1</sup> Present address: CAPEC, Department of Chemical Engineering, Technical University of Denmark, Lyngby, DK-2800, Denmark.</p><p>0098-1354/$, see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.compchemeng.2004.10.008</p></li><li><p>Nomenclature</p><p>f<sub>i,j,k</sub>: a partitioned binary adjacency matrix showing when basic groups i and j are bonded with a kth-multiplicity bond</p><p>x: best new neighbor solution</p><p>x*: the best solution ever found</p><p>z<sub>i</sub>: an existence vector showing whether the ith group exists in the molecule</p><p>Elec: sum of the electronegativities of all the groups</p><p>Elec(i): electronegativity of group i in electronvolts (eV)</p><p>LC<sub>50</sub>: lethal concentration killing 50% of the test population, a measure of toxicity</p><p>N<sub>Mo</sub>: the number of rows allotted to all the molybdenum groups</p><p>N<sub>Tot</sub>: total number of basic groups</p><p>N<sub>Bond</sub>: total number of types of bonds of the basic groups</p><p>N<sub>max,i</sub>: maximum number of basic groups allowed in molecule</p><p>Oxid: sum of the oxidation states of all the Mo groups</p><p>Oxid(i): the oxidation state of the ith group</p><p>P<sub>m</sub>: the estimated value of property m</p><p>P<sub>m</sub><sup>scale</sup>: a scale factor used to weight the importance of one property relative to another</p><p>P<sub>m</sub><sup>target</sup>: the target value for property m</p><p>R: set of all targeted properties</p><p>Greek letters</p><p><sup>0</sup>χ: zero-order molecular connectivity indices</p><p><sup>0</sup>χ<sup>v</sup>: zero-order molecular valence connectivity indices</p><p><sup>1</sup>χ: first-order molecular connectivity indices</p><p><sup>1</sup>χ<sup>v</sup>: first-order molecular valence connectivity indices</p><p><sup>2</sup>χ: second-order molecular connectivity indices</p><p><sup>2</sup>χ<sup>v</sup>: second-order molecular valence connectivity indices</p><p>δ: simple atomic connectivity indices that refer to the number of bonds which can be formed with other groups</p><p>δ<sup>v</sup>: atomic valence connectivity indices that describe the electronic structure of each basic group</p></li><li><p>1. Introduction</p><p>Computer-aided molecular design (CAMD) has the potential to greatly decrease the time and effort required to develop new molecular entities by reducing the need for costly and time-consuming trial-and-error experiments. Instead, a slate of candidate molecules which are predicted to have the desired properties can be used as a starting point for more focused experimental synthesis. This general method is applicable to such varied products as catalysts, polymers, solvents, and detergents. Hairston (1998) recently reported that a computational algorithm has been successfully used to design a new cancer-fighting pharmaceutical.</p><p>The CAMD methodology consists of solving both a forward and a backward problem (Venkatasubramanian, Chan, &amp; Caruthers, 1994). The forward problem predicts properties based on molecule structure; the backward step identifies a structure to obtain a molecule with a given set of target properties. Property prediction is usually based on either group contribution methods or topological indices. The group contribution approach has been most widely reported (Gani, Nielsen, &amp; Fredenslund, 1991; Harper &amp; Gani, 2000; Harper, Hostrup, &amp; Gani, 2003; Sahinidis &amp; Tawarmalani, 2000; Sahinidis, Tawarmalani, &amp; Yu, 2003; Vaidyanathan &amp; El-Halwagi, 1996; Venkatasubramanian et al., 1994). Constantinou and Gani (1994) and Constantinou, Gani, and O'Connell (1995) described a two-level group contribution method which utilizes molecular structure information to estimate the physical and thermodynamic properties of pure components. A three-level group contribution method proposed by Marrero and Gani (2001) exhibits improved accuracy and applicability to deal with bio-chemically and environmentally-related compounds. Because most group contribution methods cannot adequately account for steric effects (Wang &amp; Milne, 1994), several researchers have begun using topological indices in an effort to obtain more accurate property prediction (Camarda &amp; Maranas, 1999; Kier &amp; Hall, 1976; Raman &amp; Maranas, 1998; Siddhaye, Camarda, Topp, &amp; Southard, 2000). Whichever approach is chosen, existing structure-property information is regressed to form an empirical model relating the molecule structure to the properties of interest. The inverse problem is essentially a mixed integer optimization problem, whether it is solved explicitly and deterministically as in Maranas (1996), stochastically via a genetic algorithm-based approach as in Venkatasubramanian et al. (1994), or via a generate and test approach such as those of Friedler, Fan, Katotai, and Dallos (1998); Gordeeva, Molchanova, and Zefirov (1990); Harper and Gani (2000) and Harper et al. (2003).</p><p>Many researchers have reported solutions to the backward problem to determine the molecular structure. Combinatorial and heuristic-based enumeration approaches have been reported (Gani &amp; Brignole, 1983; Joback &amp; Stephanopoulos, 1989). Kier, Lowell, and Frazer (1993) used a graph reconstruction approach to determine feasible molecular structures with bounded physical property values, while Vaidyanathan and El-Halwagi (1996) described an interval analysis approach for the computer-aided synthesis of polymers and blends. CAMD problems have recently been formulated as mixed-integer linear/non-linear programming (MILP/MINLP) problems. Maranas (1996) transformed the non-convex MINLP formulation into a tractable MILP model by expressing integer variables as a linear combination of binary variables and replacing the products of continuous and binary variables with linear inequality constraints. Using connectivity indices, Camarda and Maranas (1999) described a convex MINLP representation for solving several polymer design problems. Churi and Achenie (1996) solved a refrigerant design problem with an augmented penalty-outer approximation (AP/OA) algorithm. A reduced dimension branch-and-bound algorithm was presented by Ostrovsky, Achenie, and Sinha (2003) to design optimal solvents used as cleaning agents in the printing industry. Sahinidis and Tawarmalani (2000) and Sahinidis et al. (2003) reported a branch-and-reduce algorithm for identifying a replacement for Freon. Stochastic optimization approaches have also been developed as alternate strategies for rigorous deterministic methods. Venkatasubramanian et al. (1994) employed a genetic algorithm for polymer design in which properties are estimated via group contribution methods. Marcoulaki and Kokossis (1998) described a simulated annealing (SA) approach for the design of refrigerants and liquid–liquid extraction solvents. Wang and Achenie (2002) presented a hybrid global optimization approach that combines the OA and SA algorithms for several solvent design problems.</p><p>Tabu search (TS) is a heuristic approach for solving combinatorial optimization problems by using a guided, local search procedure to explore the entire solution space without becoming easily trapped in local optima. It differs from other stochastic optimization techniques by maintaining lists of previous solutions (usually termed memory) that help guide the search process. These lists are useful for CAMD since they provide a direct method to track near-optimal solutions. The ability of TS to efficiently find a set of near-optimal solutions is particularly useful since all property prediction algorithms have limited accuracy and problem formulations normally do not include all relevant properties. Thus, the determination of the global optimum is not as critical as finding a set of near-optimal solutions. Deterministic approaches can generate such a list by multiple MINLP solves and application of integer cuts, but for large problems this becomes prohibitively computationally expensive. Other stochastic approaches, as well as generate and test strategies, can also generate such a near-optimal candidate list. By identifying a range of potential target molecules, TS avoids missing potentially useful molecules and allows the use of other criteria (such as ease of synthesis) to perform a final ranking of the candidates.</p><p>Transition metal catalysts are an important class of molecule for creating other molecules efficiently with minimal impact on the environment; however, the catalysts themselves are often extremely toxic and harmful to the environment. For example, most propylene oxide is produced using homogeneous catalysts containing molybdenum, which are effective but also harmful to the environment. Since in many cases a significant amount of such catalyst is lost to the environment (Allen &amp; Shonnard, 2002), new materials which show improved catalytic activity and less toxicity are highly desired. This paper describes a framework for using TS to design transition-metal catalysts.</p><p>2. Property prediction via connectivity indices</p><p>In order for a molecular design algorithm to be successful, physical properties of interest, such as density, toxicity, solubility, or reactivity within a given system, must be estimated with a reasonable accuracy using only a very small computational effort. Once the properties of a transition metal catalyst can be predicted from structure, the problem of designing a new catalyst can be formulated as an optimization problem. We employ connectivity indices, which are numerical values that describe the electronic structure of a molecule, to characterize the molecule and to correlate its internal structure with physical properties of interest. These indices connect the molecular structure to the properties of interest by providing a mathematical description of the molecule as a whole.</p><p>Molecular connectivity indices were first introduced by Randic (1975). Kier and Hall (1976) reported correlations between connectivity indices and many key properties of organic compounds, such as density, solubility, and toxicity. Furthermore, these indices can be computed with a minimum of computational effort. Many of the applications of connectivity indices have been reviewed by Trinajstic (1983). Kier et al. (1993) and Gordeeva et al. (1990) reported the use of connectivity indices within a molecular design context, while Raman and Maranas (1998) first incorporated connectivity indices into an optimization framework. Camarda and Maranas (1999) used connectivity indices as property descriptors to design polymers that have pre-specified property values. Siddhaye et al. (2000) employed connectivity indices to predict the physical properties for the design of novel pharmaceutical products via combinatorial optimization.</p><p>In earlier molecular design work, group contribution methods were used to estimate the values of physical properties, as in Gani et al. (1991), Venkatasubramanian et al. (1994), and Maranas (1996). The works of Constantinou and Gani (1994) and Marrero and Gani (2001) extend the basic group contribution approach by considering combinations of functional groups and thus take into account second-order effects when used to predict physical properties. Connectivity indices, however, take into account the entire molecular structure of a compound. By using higher-order connectivity indices, third-order and higher structural effects can be included in structure property relations. Harper and Gani (2000) and Harper et al. (2003) have combined group contribution with connectivity indices to include these higher-order effects. Raman and Maranas (1998) and Harper et al. (2003) have shown that these higher order effects are able to give more accurate property descriptions. Furthermore, when a molecular design problem is solved using these indices, a complete molecular structure is obtained, and no secondary problem must be solved to recover the final molecular structure. Harper and Gani (2000) and Meniai, Newsham and Khalfaoui (1998) have also reported the design of complete molecular structures. Connectivity indices also provide a method to compute the properties of mixtures, since the connectivity indices of individual compounds can be combined to estimate mixture properties in certain cases (Johnson, Lin, Miller, &amp; Camarda, 2002; Kim, Min, Lee, Kim, &amp; Jeong, 1992).</p><p>Computational property estimation algorithms first decompose a molecule into smaller units. Then, a set of basic groups that can potentially be part of the candidate molecules are defined. A basic group is defined as a single non-hydrogen atom in a given valence state, bonded to some number of hydrogen atoms. Atomic connectivity indices are defined over each basic group.</p><p>Table 1. Atomic connectivity values of groups in CHCl<sub>2</sub>F</p><table><tr><th>Group</th><th>1: CH</th><th>2: F</th><th>3: Cl</th></tr><tr><td>δ</td><td>3</td><td>1</td><td>1</td></tr><tr><td>δ<sup>v</sup></td><td>3</td><td>7</td><td>0.77778</td></tr></table><p>Basic groups of the molecule CHCl<sub>2</sub>F are shown in Table 1. The δ values are the simple atomic connectivity indices that refer to the number of bonds which can be formed with other groups. The δ<sup>v</sup> values are atomic valence connectivity indices that describe the electronic structure of each basic group, including lone-pair electrons and electronegativity. For transition metals, which can assume multiple valence states, the definition of δ<sup>v</sup> is based on the number of electrons participating in the ... the outer shell. Once the atomic connectivity indices of basic groups are defined, the molecular connectivity indices can be computed for the entire molecule.</p><p>Molecule structure is expressed with a hydrogen-suppressed graph (as shown in Fig. 1). The zero-order molecular connectivity indices <sup>0</sup>χ and <sup>0</sup>χ<sup>v</sup> are the sum of each basic ...</p></li></ul>
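To make the index calculation concrete, the following minimal sketch computes zero- and first-order connectivity indices for CHCl2F from the Table 1 values. The transcript breaks off before the paper states its formulas, so the sketch assumes the standard Kier and Hall definitions: the zero-order index sums 1/sqrt(delta) over basic groups, and the first-order index sums 1/sqrt(delta_i * delta_j) over bonds. The names TABLE_1, GROUPS, BONDS, zero_order and first_order are illustrative and do not come from the paper.

```python
import math

# (delta, delta_v) per basic group, copied from Table 1.
TABLE_1 = {"CH": (3.0, 3.0), "F": (1.0, 7.0), "Cl": (1.0, 0.77778)}

# Hydrogen-suppressed graph of CHCl2F: the CH group (vertex 0) is
# bonded to one F and two Cl groups. This encoding is an assumption.
GROUPS = ["CH", "F", "Cl", "Cl"]
BONDS = [(0, 1), (0, 2), (0, 3)]


def _delta(group, valence):
    """Return delta_v if valence is True, otherwise the simple delta."""
    simple, val = TABLE_1[group]
    return val if valence else simple


def zero_order(groups, valence=False):
    """Zero-order index: sum of 1/sqrt(delta) over all basic groups."""
    return sum(1.0 / math.sqrt(_delta(g, valence)) for g in groups)


def first_order(groups, bonds, valence=False):
    """First-order index: sum of 1/sqrt(delta_i * delta_j) over all bonds."""
    return sum(
        1.0 / math.sqrt(_delta(groups[i], valence) * _delta(groups[j], valence))
        for i, j in bonds
    )


if __name__ == "__main__":
    print("0chi  =", round(zero_order(GROUPS), 4))
    print("0chiv =", round(zero_order(GROUPS, valence=True), 4))
    print("1chi  =", round(first_order(GROUPS, BONDS), 4))
    print("1chiv =", round(first_order(GROUPS, BONDS, valence=True), 4))
```

Under these assumed definitions, swapping or moving groups within the graph changes the first-order index even when the set of groups is unchanged, which is consistent with the abstract's remark that swap and move operators would have no effect under a pure group contribution model.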