

DNA Codewords Design Using Ant Colony Optimization Algorithm

Xinjin Wang, Yongpeng Shen, Xuncai Zhang, Guangzhao Cui, Yanfeng Wang∗

Henan Key Lab of Information-based Electrical Appliances
Zhengzhou University of Light Industry
5 Dongfeng Road, Zhengzhou, 450002, China

Abstract—Before a DNA computation can be performed, a set of specific DNA sequences is required. However, designing such a set is a burdensome task because many constraints must be satisfied. In this paper, an ant colony algorithm is applied to the problem of DNA codewords design. Inspired by the traveling salesman problem, a city matrix with $T$ rows and $S$ columns is first designed, in which every city denotes a DNA sequence. The artificial ants then search for an optimal route based on DNA thermodynamic and combinatorial constraints. Finally, the shortest route, with $S$ sequences, is the desired set of DNA codewords. The simulation results show that the proposed approach has better convergence and can provide reliable and effective codewords for controllable DNA computation.

Keywords-DNA Computation; DNA Codewords Design; Ant Colony Optimization Algorithm; Genetic Algorithm

I. INTRODUCTION

Inspired by the behavior of real ants, a population-based meta-heuristic called the ant colony optimization algorithm (ACOA) was initially proposed by M. Dorigo in 1992 in his PhD thesis [1]. The first algorithm aimed to search for an optimal path in a graph and was based on the behavior of ants seeking a path between their colony and a source of food. ACOA was first successfully applied to the famous traveling salesman problem (TSP). As more researchers have studied the algorithm in recent years, ACOA has been successfully applied to the quadratic assignment problem (QAP), the job-shop scheduling problem (JSP), the degree-constrained minimum spanning tree problem (DCMST), the vehicle routing problem (VRP), integer programming problems, the graph coloring problem (GCP), etc., with a series of good experimental results.

DNA computation performs calculations through specific biochemical reactions between different DNA sequences via Watson-Crick complementary base pairing, which affords a number of useful properties such as massive parallelism and a huge memory capacity [2], [3]. In the past 10 years, many interesting applications have been demonstrated in areas such as the Hamiltonian path problem [4], associative memory [5], breaking DES [6], logic circuits [7], data encryption [8], finite state machines [9], machine learning models [10] and medical diagnosis [11].

Despite these achievements, as studies of DNA computation become deeper and wider, DNA computation faces hurdles due to the technological difficulty of handling biochemical processes and to unknown factors. To overcome these drawbacks, much work has focused on the design of DNA sequences to reduce the possibility of illegal reactions [12]. In concrete terms, DNA codewords design aims to minimize the emergence of false hybridization in the DNA computation process through the optimization of DNA sequences.

In this paper, ACOA is used to solve the DNA codewords design problem. The paper is organized as follows. Section II gives a brief introduction to DNA codewords design and ACOA. Section III gives an overview of the ant colony system (ACS) and its use for DNA codewords design. In Section IV, the sequence generation results are shown, and finally a brief conclusion is drawn in Section V.

II. PROBLEM STATEMENT

A. Problem of DNA Codewords Design

The encoding problem for DNA computation can be simply defined as mapping the instances of an algorithmic problem in a systematic manner onto specific DNA molecules so that the subsequent chemical reactions avoid all sources of error, and the resulting products contain, with a high degree of reliability, enough DNA molecules encoding the answers to the problem’s instances to enable a successful extraction [13]. For this purpose, various combinatorial and thermodynamic constraints such as H-measure, secondary structure, continuity, melting temperature and GC content are introduced to optimize the DNA sequences [14]. The existing methods for designing DNA codewords include the exhaustive search algorithm developed by Hartemink et al. [15], the random search algorithm designed by Penchovsky and Ackermann [16], the template-map strategy proposed by Frutos et al. [17], the directed graph method offered by Feldkamp et al. [18], the simulated annealing algorithm presented by Tanaka et al. [19], the dynamic programming method given by Marathe et al. [20], the biologically inspired methods produced by Deaton et al. [21], [22] and Heitsch et al. [23], and the evolutionary algorithms implemented by Deaton [24], [25], Zhang et al. [26], Shin et al. [27] and Cui et al. [28].

2010 International Conference on Artificial Intelligence and Computational Intelligence

978-0-7695-4225-6/10 $26.00 © 2010 IEEE

DOI 10.1109/AICI.2010.342


B. Principle and Model of ACOA

ACOA studies artificial systems that take inspiration from the behavior of real ant colonies, which are capable of finding the shortest path from a food source to the nest, and uses them to solve discrete optimization problems [1]. The original idea comes from observing how ants exploit food resources: ants with individually limited cognitive abilities are collectively able to find the shortest path between a food source and the nest. Ants use pheromone as a medium of communication. They exchange information indirectly by depositing pheromone that details the status of their “work”. The information exchanged has a local scope: only an ant located where the pheromone was left has a notion of it. This system is called “stigmergy” and occurs in many social animal societies. This mechanism for solving a problem too complex to be addressed by single ants is a good example of a self-organized system. Theoretically, if the quantity of pheromone remained the same over time on all edges, no route would be chosen. However, because of feedback, a slight variation on an edge is amplified and thus allows an edge to be chosen. The algorithm moves from an unstable state in which no edge is stronger than another to a stable state where the route is composed of the strongest edges.

ACOA was first applied to the TSP [29]. Here we introduce the model of ACOA through the TSP. Let $m$ be the number of ants and $d_{ij}$ the distance between city $i$ and city $j$. $\tau_{ij}(t)$ is the pheromone trail, which expresses the desirability of visiting city $j$ directly after city $i$ at time $t$. The heuristic information is chosen as $\eta_{ij} = 1/d_{ij}$; that is, the heuristic desirability of going from city $i$ directly to city $j$ is inversely proportional to the distance between the two cities.

In ACOA, $m$ ants, indexed by $k$ ($k = 1, 2, \cdots, m$), concurrently build a tour of the TSP. Initially, the ants are placed on randomly chosen cities. At each construction step, ant $k$ applies a probabilistic action choice rule, called the random proportional rule, to decide which city to visit next. In particular, the probability with which ant $k$, currently at city $i$, chooses to go to city $j$ is

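$$P^{k}_{ij}(t) = \begin{cases} \dfrac{\tau^{\alpha}_{ij}(t) \cdot \eta^{\beta}_{ij}(t)}{\sum_{s \in J^{k}_{i}} \tau^{\alpha}_{is}(t) \cdot \eta^{\beta}_{is}(t)} & \text{if } j \in J^{k}_{i} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$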

where $\eta_{ij} = 1/d_{ij}$ is a heuristic value that is available a priori, $\alpha$ and $\beta$ are two parameters that determine the relative influence of the pheromone trail and the heuristic information, and $J^{k}_{i}$ is the feasible neighborhood of ant $k$ at city $i$, that is, the set of cities that ant $k$ has not yet visited.
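The random proportional rule of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the default values of `alpha` and `beta`, and the toy distances are all assumptions made for the example.

```python
import random

def transition_probabilities(tau_i, eta_i, feasible, alpha=1.0, beta=2.0):
    """Return {j: P_ij} over the feasible cities, following Eq. (1).

    tau_i[j] and eta_i[j] are the pheromone and heuristic values on edge (i, j).
    Cities outside `feasible` implicitly get probability 0.
    """
    weights = {j: (tau_i[j] ** alpha) * (eta_i[j] ** beta) for j in feasible}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next_city(tau_i, eta_i, feasible, alpha=1.0, beta=2.0, rng=random):
    """Sample the next city with probability proportional to tau^alpha * eta^beta."""
    probs = transition_probabilities(tau_i, eta_i, feasible, alpha, beta)
    cities = list(probs)
    return rng.choices(cities, weights=[probs[j] for j in cities], k=1)[0]

# Ant at city 0; cities 1 and 2 are unvisited, city 3 is already in the tour.
tau_0 = [0.0, 1.0, 1.0, 1.0]      # equal pheromone on every edge (toy values)
dist_0 = [0.0, 2.0, 4.0, 1.0]     # toy distances d_0j
eta_0 = [0.0] + [1.0 / d for d in dist_0[1:]]
probs = transition_probabilities(tau_0, eta_0, feasible=[1, 2], alpha=1.0, beta=1.0)
# With equal pheromone and beta = 1, city 1 (half the distance) is twice as likely as city 2.
print(probs)
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which is convenient when comparing parameter settings.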

After each iteration of the algorithm, i.e., when all ants have completed a tour, the trails are updated by means of the following formulas,

$$\tau_{ij}(t+n) = \rho \cdot \tau_{ij}(t) + (1-\rho) \cdot \Delta\tau_{ij} \tag{2}$$

$$\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau^{k}_{ij} \tag{3}$$

where $\Delta\tau_{ij}$ represents the sum of the contributions of all ants that used move $(i, j)$ to construct their solutions, and $\rho$ ($0 \le \rho \le 1$) is a user-defined parameter called the evaporation coefficient.

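$$\Delta\tau^{k}_{ij} = \begin{cases} \dfrac{Q}{L_k} & \text{if ant } k \text{ uses arc } (i, j) \text{ in its tour} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$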

where $Q$ is a constant parameter and $L_k$ is the length of the tour found by ant $k$.
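The trail update of Eqs. (2)-(4) can be illustrated as follows. The sketch assumes that Eq. (2) weights the previous trail $\tau_{ij}(t)$ by $\rho$ and the new deposit by $1-\rho$, and that arcs are directed and tours are closed; the function names and the toy tours and lengths are hypothetical.

```python
def pheromone_deposit(tours, lengths, n_cities, Q=1.0):
    """Eqs. (3)-(4): add Q / L_k to every arc (i, j) used by ant k."""
    delta = [[0.0] * n_cities for _ in range(n_cities)]
    for tour, length in zip(tours, lengths):
        # Close the tour so the arc from the last city back to the first counts.
        for i, j in zip(tour, tour[1:] + tour[:1]):
            delta[i][j] += Q / length
    return delta

def update_trails(tau, delta, rho=0.5):
    """Eq. (2), read as tau <- rho * tau + (1 - rho) * delta (an assumption)."""
    n = len(tau)
    return [[rho * tau[i][j] + (1.0 - rho) * delta[i][j] for j in range(n)]
            for i in range(n)]

tours = [[0, 1, 2], [0, 2, 1]]    # two ants on a three-city instance
lengths = [6.0, 3.0]              # hypothetical tour lengths L_k
tau = [[1.0] * 3 for _ in range(3)]
delta = pheromone_deposit(tours, lengths, n_cities=3, Q=3.0)
tau = update_trails(tau, delta, rho=0.5)
# The shorter tour deposits more: arc (0, 2) now carries more pheromone than arc (0, 1).
print(tau[0][1], tau[0][2])
```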

The ant system simply iterates a main loop in which $m$ ants construct their solutions in parallel and then update the trail levels. The performance of the algorithm depends on the correct tuning of several parameters, namely $\alpha$ and $\beta$, the relative importance of trail and attractiveness; $\rho$, the trail persistence; $\tau_{ij}(0)$, the initial trail level; $m$, the number of ants; and $Q$, a constant that scales the pheromone deposit so that high-quality, low-cost solutions are rewarded.

III. DNA CODEWORDS DESIGN BY USING ACOA

A. Construct Cities

Assume we want to obtain a set of DNA codewords with $S$ sequences. First, $T \times S$ sequences are generated randomly ($T$ is a constant parameter that determines the time complexity of the computation). We can then construct a city matrix with $T$ rows and $S$ columns in which every sequence is a city, as Figure 1 shows.

Figure 1. Construct 𝑇 × 𝑆 cities

Among these $T \times S$ cities, ants can only travel from the $(k-1)$th column to the $k$th column (this is realized through the design of the tabu list).
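The city construction can be sketched as below. This is only an illustration under stated assumptions: sequences are drawn uniformly over the four bases, indices are 0-based, and the helper `feasible_cities` stands in for the tabu list by exposing only the cities of the next column. All names are hypothetical.

```python
import random

BASES = "ACGT"

def build_city_matrix(T, S, length, rng):
    """Return a T x S matrix (a list of T rows) of random DNA sequences."""
    return [["".join(rng.choice(BASES) for _ in range(length)) for _ in range(S)]
            for _ in range(T)]

def feasible_cities(city, column):
    """Cities an ant may move to next: every row of the given column (0-based)."""
    return [(row, column) for row in range(len(city))]

rng = random.Random(0)            # seeded only to make the example reproducible
city = build_city_matrix(T=4, S=3, length=20, rng=rng)
# An ant standing in column 0 may only choose among the T cities of column 1.
print(feasible_cities(city, 1))
```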

B. State Transition Rule

During the construction of a new solution, the state transition rule is the phase in which each ant decides which state to move to next. Here, $\tau^{k}_{ab}$ is the pheromone between city $a$ in the $(k-1)$th column and city $b$ in the $k$th column, the number of ants is $N$, and $T(n, m)$ is the $m$th step during a tour of ant $n$ ($n = 1, 2, \cdots, N$).

First, we initialize every $\tau^{k}_{ab}$ with a small value $\tau_0$, and every ant sets off from a city that belongs to column one, i.e., $T(n, 1) = \mathrm{City}[i][1]$ ($i = 1, 2, \cdots, T$).

Every ant decides the next state according to the following equation,

$$T(n, k) = \begin{cases} \text{Randomly} & \text{if all } \tau^{k}_{ab} \text{ are equal} \\ \arg\max \{\tau^{k}_{ab}\} & \text{otherwise} \end{cases} \tag{5}$$

where “Randomly” denotes that the next state is chosen at random.
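A small sketch of the rule in Eq. (5) follows. It assumes that the comparison and the maximum are taken over the trails leaving the ant's current city $a$, which the equation does not state explicitly; `tau_k[a][b]` and the function name are hypothetical.

```python
import random

def next_city(tau_k, a, rng=random):
    """Eq. (5): tau_k[a][b] is the trail from row a of column k-1 to row b of column k."""
    row = tau_k[a]
    if all(value == row[0] for value in row):
        # All trails are equal, so the next city is chosen at random.
        return rng.randrange(len(row))
    # Otherwise follow the strongest trail.
    return max(range(len(row)), key=row.__getitem__)

print(next_city([[0.1, 0.5, 0.2]], a=0))   # the trail to row 1 is the strongest
```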

When all ants have completed a tour and reached a city in the $S$th column, the algorithm has completed one iteration. Stagnation is the undesirable situation in which all ants repeatedly construct the same solutions, making any further exploration in the search process impossible. To avoid stagnation, we use a local pheromone update rule that the ants apply immediately after having crossed an arc $(i, j)$ during the tour construction,

$$\tau^{k}_{T(n,k-1),T(n,k)} = (1-\zeta) \times \tau^{k}_{T(n,k-1),T(n,k)} + \zeta \tau_0 \tag{6}$$

where $\zeta$ ($0 < \zeta < 1$) and $\tau_0$ are two parameters. The value of $\tau_0$ is set to be the same as the initial value of the pheromone trails. A good value for $\zeta$ was determined by experiment.
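To see why the local update of Eq. (6) counteracts stagnation, the sketch below applies it repeatedly to one arc. The nested-list layout `tau[k][a][b]`, the function name and the numeric values are assumptions chosen for the example.

```python
def local_update(tau, k, a, b, zeta, tau0):
    """Eq. (6): pull the trail on the arc just crossed back toward tau0."""
    tau[k][a][b] = (1.0 - zeta) * tau[k][a][b] + zeta * tau0
    return tau[k][a][b]

tau0 = 0.1
tau = [[[1.0, tau0], [tau0, tau0]]]   # one column transition, two rows, toy values
first = local_update(tau, k=0, a=0, b=0, zeta=0.7, tau0=tau0)
for _ in range(49):                   # the same arc is crossed again and again
    last = local_update(tau, k=0, a=0, b=0, zeta=0.7, tau0=tau0)
print(first, last)
```

Each crossing moves the trail a fraction $\zeta$ of the way toward $\tau_0$, so a heavily used arc loses its advantage and later ants are more likely to explore other arcs.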

C. Evaluate the Route

So far, every ant has constructed a route from the first column to the $S$th column, and how to evaluate the length of this route is very important. Here, we regard the $S$ cities in a route as a set of DNA codewords, and evaluate the DNA codewords from the following four aspects.

1) Melting Temperature: The melting temperature is an important factor in the efficiency of the reaction. It is the temperature at which, in equilibrium, 50% of the oligonucleotides are hybridized to their perfect complement and 50% are separated. The accurate prediction of $T_m$ is particularly critical in the case of the polymerase chain reaction (PCR). Large errors in the $T_m$ estimation can lead to the amplification of non-specific products or to inappropriate hybridization performance in general. There are many equations for calculating the melting temperature, such as the Wallace 2-4 rule [30], the GC% method [31], and the nearest-neighbor model [32]. It has been demonstrated that the method and thermodynamic parameters provided by SantaLucia perform well in predicting the experimental $T_m$ of short single-stranded DNA sequences [32]. The $T_m$ calculation is performed according to the following equation,

$$T_m(x_i) = \frac{\Delta H}{\Delta S + R \times \ln(C/4)} - 273.15 \tag{7}$$

where $T_m(x_i)$ is the melting temperature of one DNA sequence, $R$ is the gas constant (1.987 cal/(mol·K)), $C$ is the concentration, and $\Delta S$ and $\Delta H$ denote the entropy change and the enthalpy change between adjacent bases at a certain temperature, respectively.
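As a worked illustration of Eq. (7), the sketch below evaluates $T_m$ for given totals of $\Delta H$ and $\Delta S$. The units (cal/mol, cal/(mol·K), mol/L), the function name and the input values are assumptions; in practice $\Delta H$ and $\Delta S$ would be accumulated from the nearest-neighbor parameters of [32], which are not reproduced here.

```python
import math

R_GAS = 1.987  # gas constant in cal/(mol*K), as given in the text

def melting_temperature(delta_h, delta_s, conc):
    """Eq. (7): melting temperature in degrees Celsius.

    delta_h is assumed to be in cal/mol, delta_s in cal/(mol*K), conc in mol/L.
    """
    return delta_h / (delta_s + R_GAS * math.log(conc / 4.0)) - 273.15

# Hypothetical totals for one sequence, used only to exercise the formula.
tm = melting_temperature(delta_h=-150000.0, delta_s=-400.0, conc=1e-6)
print(round(tm, 2))
```

Subtracting 273.15 converts the result from kelvin to degrees Celsius, which matches the range of the $T_m$ values reported for the generated sequences.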

The final form of the evaluation function for melting temperature is as follows,

$$F_{T_m}(R) = \sum_{i=1}^{S} f_{T_m}(x_i) \tag{8}$$

$$f_{T_m}(x_i) = [T_m(x_i) - T_m]^{2} \tag{9}$$

where $T_m$ is the desired value defined by the user.

2) GC Content: The GC content, $GC(x_i)$, is the fraction of the bases in sequence $x_i$ that are G (guanine) or C (cytosine). Because there are three hydrogen bonds between G and C but only two between A (adenine) and T (thymine), the GC content is very important for maintaining the chemical stability of a sequence. To effectively reduce the probability of non-specific hybridization, the GC content must be limited to a certain range. $GC(x_i)$ is written as

$$GC(x_i) = \frac{\#G + \#C}{|x|} \tag{10}$$

where $\#G$ and $\#C$ denote the numbers of G and C bases in sequence $x$, respectively, and $|x|$ indicates the number of bases in sequence $x$. Normally, $GC(x) \in [0.4, 0.6]$.
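Eq. (10) and the usual range check translate directly into code. The sketch below is illustrative; the function names are hypothetical and the bounds default to the interval $[0.4, 0.6]$ mentioned above.

```python
def gc_content(seq):
    """Eq. (10): fraction of the bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_in_range(seq, low=0.4, high=0.6):
    """Check whether the GC content lies in the usual interval [low, high]."""
    return low <= gc_content(seq) <= high

seq = "CCATGACCGAGGATCCAGCT"      # 12 of its 20 bases are G or C
print(gc_content(seq), gc_in_range(seq))
```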

The final form of the evaluation function for GC content is as follows,

$$F_{GC}(R) = \sum_{i=1}^{S} f_{GC}(x_i) \tag{11}$$

$$f_{GC}(x_i) = [GC(x_i) - GC]^{2} \tag{12}$$

where $GC$ is the desired value defined by the user.

3) Continuity Constraint: If the same base appears continuously (such as “AAAAA”), the structure of the DNA becomes unstable. For more controllable experiments, we calculate the degree of successive occurrence of the same base with the following equation,

$$F_{Con}(R) = \sum_{i=1}^{n} \left[ \max_{j=1,\cdots,|x|} \sum_{k=j+1}^{|x|} u(x_{i_j}, x_{i_k}) - Con' + 1 \right]^{2} \tag{13}$$

where $Con'$ is defined by the user and denotes the maximum number of successive occurrences of the same base that the user can tolerate.

$$u(x_{i_j}, x_{i_k}) = \begin{cases} 1 & \text{if } x_{i_j} = x_{i_k} \\ \text{break} & \text{otherwise} \end{cases} \tag{14}$$


where “break” means terminating the calculation of the current step.
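Read literally, the inner sum of Eq. (13) with the “break” of Eq. (14) counts how many bases directly after position $j$ repeat the base at $j$, so its maximum plus one is the length of the longest run of a single base. The sketch below implements that reading; the function names are hypothetical, and the outer sum is assumed to run over the sequences of one route. Note that, taken literally, the squared term is also positive for sequences whose longest run is shorter than $Con'$.

```python
def longest_run(seq):
    """Length of the longest stretch of one repeated base in a non-empty sequence."""
    best = run = 1
    for previous, current in zip(seq, seq[1:]):
        run = run + 1 if current == previous else 1
        best = max(best, run)
    return best

def continuity_term(seq, con_max):
    """Bracketed term of Eq. (13), squared: (longest run - Con')^2."""
    return (longest_run(seq) - con_max) ** 2

def continuity_score(sequences, con_max):
    """Eq. (13): sum of the squared terms over all sequences of a route."""
    return sum(continuity_term(s, con_max) for s in sequences)

print(longest_run("AAAAACGT"), continuity_score(["AAAAACGT", "ACGGGT"], con_max=3))
```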

4) H-measure: In this paper, we introduce the Hamming distance and the reverse Hamming distance to prevent mis-hybridization,

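$$F_{Hd}(R) = U - \sum_{i=1}^{n} f_{Hd}(x_i) \tag{15}$$

$$f_{Hd}(x_i) = \sum_{j=1}^{n} \left( Hd(x_i, x_j) + Hd(x_i, \overrightarrow{x_j}) \right) \tag{16}$$

$$Hd(x_i, x_j) = \sum_{m=1}^{|x|} d(x_{i_m}, x_{j_m}) \tag{17}$$

$$d(u, v) = \begin{cases} 1 & \text{if } u \neq v \\ 0 & \text{otherwise} \end{cases} \tag{18}$$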

where $U$ is a constant parameter defined by the user and $\overrightarrow{x_j}$ is the reverse sequence of $x_j$.
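Eqs. (15)-(18) can be sketched as follows, assuming the usual Hamming convention in which $d(u, v)$ counts positions where the two bases differ, and assuming the sums run over all sequences of the route, including $j = i$. The function names and the two toy sequences are hypothetical.

```python
def hamming(x, y):
    """Eq. (17): number of positions at which x and y differ."""
    return sum(1 for u, v in zip(x, y) if u != v)

def f_hd(i, sequences):
    """Eq. (16): distance from sequence i to every sequence and every reversed sequence."""
    x_i = sequences[i]
    return sum(hamming(x_i, x_j) + hamming(x_i, x_j[::-1]) for x_j in sequences)

def F_hd(sequences, U):
    """Eq. (15): U minus the summed distances, so larger distances give a smaller value."""
    return U - sum(f_hd(i, sequences) for i in range(len(sequences)))

codewords = ["ACGT", "TTTT"]
print(f_hd(0, codewords), f_hd(1, codewords), F_hd(codewords, U=100))
```

Because the evaluation function is minimized, subtracting the summed distances from $U$ rewards sets whose members are far apart in both orientations.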

Based on the above analysis, we design a weighted evaluation function,

$$F(R) = \sum_{i} \varpi_i \cdot F_i \tag{19}$$

where $i \in \{T_m, GC, Con, Hd\}$ and $\varpi_i$ are the weights of the four criteria; experimentally, they are set to 0.2311, 0.1347, 0.3100 and 0.3242, respectively.
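The weighted sum of Eq. (19) is straightforward to express in code. In the sketch below the weights are the ones quoted in the text, while the dictionary layout, the function name and the per-criterion scores are hypothetical.

```python
# Weights of the four criteria as quoted in the text; they sum to 1.
WEIGHTS = {"Tm": 0.2311, "GC": 0.1347, "Con": 0.3100, "Hd": 0.3242}

def evaluate_route(scores, weights=WEIGHTS):
    """Eq. (19): weighted sum of the per-criterion evaluation values F_i."""
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical criterion values for one route, used only to exercise the formula.
scores = {"Tm": 10.0, "GC": 0.0, "Con": 0.0, "Hd": 100.0}
total = evaluate_route(scores)
print(total)
```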

So far, we can evaluate a route objectively through equation (19), and can then find the ant that has the shortest route,

$$n_{\min} = \arg\min \{F(R_j)\} \tag{20}$$

At last, we perform the global pheromone update according to the optimal route from (20). The global pheromone update is given by (21),

$$\tau^{k}_{ij} = (1-\alpha) \times \tau^{k}_{ij} + \alpha \cdot F(R_{n_{\min}})^{-1} \tag{21}$$

where $\alpha \in (0, 1)$ is the global pheromone update parameter, $i = T(n_{\min}, k-1)$, $j = T(n_{\min}, k)$, and $k \in [2, S]$.
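The selection of the best ant in (20) and the global update in (21) can be sketched together. The data layout is an assumption made for the example: `routes[n][k]` holds the row visited by ant `n` in column `k` (0-based), `tau[k][a][b]` holds the trail from row `a` of column `k-1` to row `b` of column `k`, and `costs[n]` is a positive value of $F(R_n)$.

```python
def global_update(tau, routes, costs, alpha):
    """Eqs. (20)-(21): reinforce only the arcs of the lowest-cost route.

    tau[0] is unused because no arc enters the first column.
    """
    n_min = min(range(len(routes)), key=costs.__getitem__)      # Eq. (20)
    best = routes[n_min]
    for k in range(1, len(best)):
        a, b = best[k - 1], best[k]
        tau[k][a][b] = (1.0 - alpha) * tau[k][a][b] + alpha / costs[n_min]  # Eq. (21)
    return n_min

# Toy instance: T = 2 rows, S = 3 columns, two ants.
tau = [None] + [[[0.1, 0.1], [0.1, 0.1]] for _ in range(2)]
routes = [[0, 1, 0], [1, 1, 0]]
costs = [4.0, 2.0]                # hypothetical values of F(R_n)
n_min = global_update(tau, routes, costs, alpha=0.6)
print(n_min, tau[1][1][1], tau[2][1][0])
```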

D. Crossover and Mutation

To enhance the search ability of the ACO, we introduce crossover and mutation operations.

1) Crossover: Inspired by the genetic algorithm, we introduce a crossover operator to enhance the global search ability of the ant colony algorithm. The crossover operators are divided into individual-level operations and sequence-level operations [14]. An individual-level operation is regarded as an exchange of member sequences between two individuals, and a sequence-level operation is the same as crossover in simple genetic algorithms. These two processes are illustrated in Figure 2 and Figure 3, respectively.

Figure 2. Crossover between individuals

Figure 3. Crossover between sequences
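Since the figures are not reproduced here, the following sketch shows one plausible form of the two crossover levels. It assumes a single random cut point in both cases, which the text does not specify; the function names are hypothetical.

```python
import random

def individual_crossover(set_a, set_b, rng):
    """Individual level: swap the member sequences after a random cut point."""
    cut = rng.randrange(1, len(set_a))
    return set_a[:cut] + set_b[cut:], set_b[:cut] + set_a[cut:]

def sequence_crossover(seq_a, seq_b, rng):
    """Sequence level: one-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(seq_a))
    return seq_a[:cut] + seq_b[cut:], seq_b[:cut] + seq_a[cut:]

rng = random.Random(0)            # seeded only to make the example reproducible
child_1, child_2 = sequence_crossover("AAAAAAAA", "CCCCCCCC", rng)
set_1, set_2 = individual_crossover(["AAAA", "CCCC", "GGGG"], ["TTTT", "ACGT", "TGCA"], rng)
print(child_1, child_2)
print(set_1, set_2)
```

The individual-level operator leaves every member sequence intact and only redistributes sequences between two codeword sets, whereas the sequence-level operator creates new sequences.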

2) Mutation: Introducing the mutation operation can improve the local search ability of the ant colony algorithm. Considering the convergence speed, we apply mutation only to individual bases, which is the same as the binary mutation operation in the basic genetic algorithm [33].
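A base-level mutation can be sketched by analogy with bit-flip mutation. The sketch assumes that each base mutates independently with a given rate and is replaced by one of the three other bases; the text does not specify these details, and the function name is hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Replace each base, with probability `rate`, by a different random base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(0)            # seeded only to make the example reproducible
original = "GTGGACGGGATTCCGGACTA"
mutated = mutate(original, rate=0.05, rng=rng)
print(original)
print(mutated)
```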

E. The Flow Chart

The flow chart of the ant colony algorithm for solving the DNA codewords design problem is illustrated in Figure 4.

IV. EXPERIMENTAL RESULTS

The above algorithm was written in Matlab 7.0 and run on a Pentium Dual E2104 (1.6 GHz) with 512 MB under Microsoft XP. The parameter settings are as follows: the length of the sequences is 20, $S = 20$, $T = 20$, the number of ants is $N = 30$, $\zeta = 0.7$, $\alpha = 0.6$, the individual crossover rate is $\gamma_1 = 0.05$, the sequence crossover rate is $\gamma_2 = 0.2$, the crossover rate is $\psi = 0.01$, and the number of evolutionary generations is $G = 400$. Table I lists the sequences generated by the ant colony algorithm and Figure 5 shows the convergence of the ant colony algorithm.

We compared the codewords generated by the ACO with those of [19] and [27]. In terms of secondary structure, Hamming distance and melting temperature, the ACO shows better performance. Figure 5 shows that the ACO has a good convergence rate.

V. CONCLUSION

In this paper, we have proposed an ant colony algorithm for designing suitable DNA codewords that satisfy the definition of the encoding problem in DNA computation. We introduced the crossover and mutation operators to the ant colony


Figure 4. The flow chart of the ACO

Table I. SEQUENCES GENERATED BY ANT COLONY ALGORITHM

Sequence (5′ → 3′)      Con   GC     Hm    Tm
TTATGTGGCGAGTGGTTCCT    0     0.55   268   65.9902
CCATGACCGAGGATCCAGCT    0     0.60   256   68.5855
GTATTGAACAGACCTAGACG    0     0.45   287   63.5028
AATAGGGCGGAGAAGAGGCT    2     0.50   258   67.4031
CGCTACTATATGACACCGCT    0     0.50   254   64.6527
GTCGCGGACTGAGATAGCCA    0     0.55   289   65.4302
CCATAACCGAACCGACTGTA    0     0.50   245   63.8384
CGTGACAAGTTCTATCCATG    0     0.45   294   64.7618
GCTCATCAGTGTGCTACTCT    0     0.50   248   64.7618
CCTCTACCAGCCAATGATGC    0     0.55   284   65.2329
TGGCCCAAGAAGAGCTTTGA    2     0.50   263   63.2197
ATCATACTCCGGAGACTACC    0     0.50   302   65.9024
GCAATGCCAGGAGAATGACT    0     0.50   282   63.8206
ACCGCGTCTACCGAAGAATT    0     0.50   257   62.1413
CGCTACAGAATGGATAGAGA    0     0.45   291   65.2278
GCAAAGTGGGTGATAAAGCG    3     0.50   312   62.3880
GATGGTTCAGTATACGTGTA    0     0.40   252   65.6801
CGTTACGGTCCTCTTACTGA    0     0.50   280   63.8206
TTCTACTGTGTAGCAGTAGA    0     0.40   276   67.3120
GTGGACGGGATTCCGGACTA    1     0.60   276   69.1254

algorithm. This improves the algorithm’s search ability and helps to maintain population diversity. The ACO provides a new way of solving the DNA encoding problem. The DNA codewords designed by the ACO have been compared with those designed by other existing sequence design methods. The results show the feasibility and validity of this method.

Figure 5. The convergency of the ACO

ACKNOWLEDGMENT

The work for this paper was supported by the National Science Foundation of China (Grant No. 60573190, 60773122, 60970084), the Basic and Frontier Technology Research Program of Henan (Grant No. 082300413203, 092300410166), and the Innovation Scientists and Technicians Troop Construction Projects of Henan (Grant No. 094100510022).

REFERENCES

[1] M. Dorigo, “Optimization, learning and natural algorithms,” Ph.D. dissertation, Dept. of Electronics, Politecnico di Milano, Italy, 1992.

[2] C. C. Maley, “DNA computation: Theory, practice, and prospects,” Evolutionary Computation, vol. 6, no. 3, pp. 201–229, 1998.

[3] M. Garzon and R. Deaton, “Biomolecular computing and programming,” IEEE Transactions on Evolutionary Computation, vol. 3, no. 3, pp. 236–250, 1999.

[4] L. Adleman, “Molecular computation of solutions to combinatorial problems,” Science, vol. 266, no. 5187, pp. 1021–1024, 1994.

[5] E. B. Baum, “Building an associative memory vastly larger than the brain,” Science, vol. 268, no. 5210, pp. 583–585, 1995.

[6] D. Boneh, C. Dunworth, and R. J. Lipton, “Breaking DES using a molecular computer,” in Proceedings of the 1st DIMACS Workshop on DNA Based Computers, R. J. Lipton and E. B. Baum, Eds., Princeton University, Princeton, NJ, USA, 1995, pp. 37–66.

[7] F. Guarnieri, M. Fliss, and C. Bancroft, “Making DNA add,” Science, vol. 273, no. 5272, pp. 220–223, 1996.

[8] L. M. Adleman, P. W. K. Rothemund, S. Roweis, and E. Winfree, “On applying molecular computation to the data encryption standard,” Journal of Computational Biology, vol. 6, no. 1, pp. 53–63, 1999.


[9] M. Stojanovic and D. Stefanovic, “A deoxyribozyme-based molecular automaton,” Nature Biotechnology, vol. 21, no. 9, pp. 1069–1074, 2003.

[10] H. W. Lim, H. M. Jang, S. M. Ha, Y. G. Chai, S. I. Yoo, and B. T. Zhang, “A lab-on-a-chip module for bead separation in DNA-based concept learning,” Lecture Notes in Computer Science, vol. 2943, pp. 1–10, 2004.

[11] Y. Benenson, B. Gil, U. Ben-Dor, R. Adar, and E. Shapiro, “An autonomous molecular computer for logical control of gene expression,” Nature, vol. 429, no. 6990, pp. 423–429, 2004.

[12] A. Brenneman and A. E. Condon, “Strand design for biomolecular computation,” Theoretical Computer Science, vol. 287, no. 1, pp. 39–58, 2002.

[13] M. H. Garzon and R. J. Deaton, “Codeword design and information encoding in DNA ensembles,” Natural Computing, vol. 3, no. 3, pp. 253–292, 2004.

[14] S.-Y. Shin, I.-H. Lee, D. Kim, and B.-T. Zhang, “Multi-objective evolutionary optimization of DNA sequences for reliable DNA computing,” IEEE Transactions on Evolutionary Computation, vol. 9, no. 2, pp. 143–158, 2005.

[15] A. J. Hartemink, D. K. Gifford, and J. Khodor, “Automated constraint based nucleotide sequence selection for DNA computation,” Biosystems, vol. 52, no. 1-3, pp. 227–235, 1999.

[16] R. Penchovsky and J. Ackermann, “DNA library design for molecular computation,” Journal of Computational Biology, vol. 10, no. 2, pp. 215–230, 2003.

[17] A. G. Frutos, Q. Liu, A. J. Thiel, A. M. W. Sanner, A. E. Condon, L. M. Smith, and R. M. Corn, “Demonstration of a word design strategy for DNA computing on surfaces,” Nucleic Acids Research, vol. 25, no. 23, pp. 4748–4757, 1997.

[18] U. Feldkamp, W. Banzhaf, and H. Rauhe, “A DNA sequence compiler,” in Proceedings of the 6th DIMACS Workshop on DNA Based Computers, A. Condon and G. Rozenberg, Eds., University of Leiden, Netherlands, June 2000, p. 253.

[19] F. Tanaka, M. Nakatsugawa, M. Yamamoto, T. Shiba, and A. Ohuchi, “Toward a general-purpose sequence design system in DNA computing,” in Proceedings of the IEEE Congress of Evolutionary Computation (CEC’02), Honolulu, HI, USA, 2002, pp. 73–78.

[20] A. Marathe, A. E. Condon, and R. M. Corn, “On combinatorial DNA word design,” in Proceedings of the 5th DIMACS Workshop on DNA Based Computers, E. Winfree and D. K. Gifford, Eds., Massachusetts Institute of Technology, Cambridge, MA, USA, 1999, pp. 75–89.

[21] R. Deaton, J. Chen, H. Bi, M. Garzon, H. Rubin, and D. H. Wood, “A PCR-based protocol for in vitro selection of non-crosshybridizing oligonucleotides,” Lecture Notes in Computer Science, vol. 2568, pp. 196–204, 2003.

[22] R. Deaton, J. Chen, H. Bi, and J. A. Rose, “A software tool for generating non-crosshybridizing libraries of DNA oligonucleotides,” Lecture Notes in Computer Science, vol. 2568, pp. 252–261, 2003.

[23] C. E. Heitsch, A. E. Condon, and H. H. Hoos, “From RNA secondary structure to coding theory: A combinatorial approach,” Lecture Notes in Computer Science, vol. 2568, pp. 215–228, 2003.

[24] R. Deaton, R. C. Murphy, J. A. Rose, M. Garzon, D. R. Franceschetti, and S. E. Stevens, Jr., “A DNA based implementation of an evolutionary search for good encodings for DNA computation,” in Proceedings of IEEE International Conference on Evolutionary Computation, Indianapolis, IN, USA, Apr. 1997, pp. 267–271.

[25] R. Deaton, R. C. Murphy, M. Garzon, D. R. Franceschetti, and S. E. Stevens, Jr., “Good encodings for DNA-based solutions to combinatorial problems,” in Proceedings of the 2nd DIMACS Workshop on DNA-based Computers, Princeton University, USA, June 1996, pp. 159–171.

[26] B.-T. Zhang and S.-Y. Shin, “Molecular algorithms for efficient and reliable DNA computing,” in Genetic Programming 1998: Proceedings of the Third Annual Conference, University of Wisconsin, Madison, Wisconsin, USA, 1998, pp. 735–744.

[27] S. Y. Shin, D. M. Kim, I. H. Lee, and B. T. Zhang, “Evolutionary sequence generation for reliable DNA computing,” in Proceedings of the IEEE Congress of Evolutionary Computation (CEC’02), Honolulu, HI, USA, May 2002, pp. 79–84.

[28] G. Cui, Y. Niu, Y. Wang, X. Zhang, and L. Pan, “A new approach based on PSO algorithm to find good computational encoding sequences,” Progress in Natural Science, vol. 17, no. 6, pp. 712–716, 2007.

[29] M. Dorigo and L. M. Gambardella, “Ant colony system: A cooperative learning approach to the traveling salesman problem,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 53–66, 1997.

[30] R. Wallace, J. Shaffer, R. F. Murphy, J. Bonner, T. Hirose, and K. Itakura, “Hybridization of synthetic oligodeoxyribonucleotides to phi chi 174 DNA: the effect of single base pair mismatch,” Nucleic Acids Research, vol. 6, no. 11, pp. 3543–3547, 1979.

[31] J. G. Wetmur, “DNA probes: Applications of the principles of nucleic acid hybridization,” Critical Reviews in Biochemistry and Molecular Biology, vol. 26, no. 3-4, pp. 227–259, 1991.

[32] J. SantaLucia, “A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics,” Proc. Natl Acad. Sci. USA, vol. 95, no. 4, pp. 1460–1465, 1998.

[33] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs (3rd ed.). New York: Springer-Verlag, 1999.
