biomolecular final1
TRANSCRIPT
-
8/13/2019 Biomolecular Final1
1/9
Biomolecular Computing
Molecular computing is a discipline that aims at harnessing individual molecules at nanoscales for computational purposes. The best-
studied molecules for this purpose to date have been DNA and bacteriorhodopsin. Biomolecular computing allows one to realistically
entertain, for the first time in history, the possibility of exploiting the massive parallelism at nanoscales inherent in natural phenomena to solve computational problems. The implementation of evolutionary algorithms in biomolecules would bring full circle the biological analogy and present an attractive alternative to meet large demands for computational power.
INTRODUCTION
The notion of harnessing individual molecules at nanoscales for computational purposes is an idea that can be traced back at least to the time when electronic computers were being constructed in the 1940s. Electrons are, in fact, orders of magnitude smaller than molecules, but enormous numbers of them are required just to communicate a carriage return to a conventional processor. The idea of improving the efficiency of hardware utilization using biomolecules is attractive for several reasons. First, biomolecular hardware is inherently parallel, and parallelism is a good way to handle computational bottlenecks. Second, biomolecules occur abundantly in nature, for example, inside all known living cells, both with (eukaryotes) and without (prokaryotes) nuclei, and constitute the basic substratum of life. Consequently, they have developed a structure that enables them to solve a number of difficulties for parallel computing, such as massive communication over noisy media and load-balancing problems, by mechanisms that we may not even be aware of. Furthermore, short biomolecules can now be synthesized at low cost.
THE ORIGINS OF MOLECULAR COMPUTING
Lately, advances in computer science have been characterized by the computational implementation of well-established biological paradigms. Notable advances are artificial neural nets, inspired by the brain and its obvious connection to natural intelligence, and evolutionary computation, inspired by the Darwinian paradigm of natural selection. Early ideas of molecular computing attempted to emulate conventional electronic implementations in other media, e.g., implementing Boolean gates in a variety of ways. A fundamental breakthrough characteristic of a new era was made by Adleman's 1994 paper, where he reports an experiment performed with molecules of fundamental importance for life, DNA (deoxyribonucleic acid) molecules, to solve a computational problem known to be difficult for ordinary computers, namely the Hamiltonian path problem (HPP). This problem is typical of an elite set of problems in the well-known complexity class NP that exemplify the computational difficulty of the search procedures that plague a number of very important applications in combinatorial optimization, operations research, and numerical computation. Adleman's experiment ushered in a new computational paradigm in molecular computing for several reasons. First, it showed that it is indeed possible to orchestrate individual molecules to perform computational tasks. Second, it showed the enormous potential of DNA molecules for solving problems beyond the reach of conventional computers that have been or may be developed in the future based on solid-state electronics. Shortly after, in 1995, the first conference on DNA-based computing was organized at Princeton University, and several events have been held annually since.
A) Adleman's Landmark Experiment
In this section, we present the essential technical details of Adleman's experiment. The HPP is defined precisely as follows.
Instance: a directed graph and two vertices, source and destination;
Question: yes/no, is there a path following arcs in the graph connecting the source to the destination vertices and passing through every other vertex exactly once?
As mentioned before, this problem is NP-complete, i.e., it is representative of many of the difficulties that afflict conventional computers for solving very important problems in combinatorial optimization and operations research. Each complete problem in NP contains all problems in the class NP as special cases after some rewording and is characterized by the fact that their solutions are easily verifiable, but extremely difficult to find in a reasonable amount of search time. The best-known general techniques to apply to these problems amount essentially to an exhaustive search through all possible solutions, looking for satisfaction of the constraints required by the problem. It is therefore an ideal candidate for a brand-new computational approach using molecules.
Adleman's brilliant insight was to carefully arrange a set of DNA molecules so that the chemistry that they naturally follow would perform the brunt of the computational process. The key operations in this chemistry are sticking operations that allow the basic nucleotides of nucleic acids to form larger structures through the processes of ligation and hybridization (more below in Section III-A). The first DNA-based molecular computation is summarized in Fig. 1. Specifically, Adleman assigned (well-chosen) unique
single-stranded molecules to represent the vertices, used Watson-Crick complements of the corresponding halves to represent edges joining two vertices, and synthesized a picomol of each.
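One common presentation of this encoding can be sketched in code. The snippet below is only illustrative: the vertex names and randomly generated 20-mers are hypothetical, not the oligos of the actual 1994 experiment, and it assumes the variant of the scheme in which an edge strand is built from half-codes of its endpoints while vertex complements act as splints.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wc_complement(s):
    """Watson-Crick complement, read 5'->3' (i.e., reverse complement)."""
    return "".join(COMPLEMENT[b] for b in reversed(s))

def random_oligo(n=20):
    return "".join(random.choice("ACGT") for _ in range(n))

# One unique 20-mer per vertex (hypothetical vertex names).
vertices = {v: random_oligo() for v in ["s", "a", "b", "d"]}

# An edge u->v is encoded by the second half of u's code followed by
# the first half of v's code; complements of the vertex strands then
# act as "splints" that hybridize across consecutive edge oligos.
def edge_oligo(u, v):
    return vertices[u][10:] + vertices[v][:10]

e = edge_oligo("s", "a")
splint = wc_complement(vertices["a"])  # bridges edges s->a and a->next
```

In the tube, ligation of splinted edge oligos assembles longer strands that each spell out one candidate path through the graph.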
SOME SUCCESS STORIES
In this section, we give a brief description of some of the problems for which molecular protocols have been or are being implemented successfully. Each story that follows either illustrates a basic technique in molecular computing or has successfully marked definite progress in the lab. Before proceeding further, however, we need to give a more precise description of the molecular biological background as well as a characterization of the basic methodology employed in molecular computing.
A) Parallel Overlap Assembly
Perhaps the foremost advantage of computing with molecules is the ready parallelism in which molecular operations take place. The best way to exploit this parallelism is to perform the same operation simultaneously on many molecules. Adleman's basic technique can be characterized as a generate-and-filter technique, i.e., generate all possible paths and filter out those that are not Hamiltonian. In this approach, one must be sure to generate all the possible solutions to the problem, akin to making sure that the data structure for a
chromosome captures all possible solutions in an evolutionary algorithm. Many protocols in molecular computing exploit this method, for example, Boolean formula evaluation (see the next section for some references).
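The generate-and-filter idea can be stated as a conventional algorithm. The sketch below is an in-silico analogue, not a model of the chemistry: it generates every candidate vertex ordering (the "tube" of random paths) and then filters for those that start at the source, end at the destination, and follow only existing arcs.

```python
from itertools import permutations

def hamiltonian_path_exists(n, arcs, src, dst):
    """Brute-force generate-and-filter: enumerate all orderings of the
    intermediate vertices, keep those consistent with the arc set."""
    arc_set = set(arcs)
    middle = [v for v in range(n) if v not in (src, dst)]
    for perm in permutations(middle):
        path = (src,) + perm + (dst,)
        # Filter step: every consecutive pair must be an arc.
        if all((path[i], path[i + 1]) in arc_set for i in range(n - 1)):
            return True
    return False
```

The factorial blow-up of `permutations` is exactly the exhaustive search that the massive parallelism of DNA strands is meant to absorb.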
It is therefore important to be able to generate all potential solutions to a problem. A procedure called parallel overlap assembly has been used in molecular biology for gene reconstruction and DNA shuffling. It has been successfully used by Ouyang et al. in a lab experiment to solve an instance of another NP-complete graph problem, MAX-CLIQUE. The procedure consists of iterations of thermal cycles that anneal given shorter DNA segments in random ways to produce larger molecules representing potential solutions. Related procedures have been used to improve solutions to HPP by Arita et al.
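A caricature of parallel overlap assembly can be written as a string-merging loop. This is a deliberately crude model under simplifying assumptions: annealing is reduced to exact end-overlap of `k` characters, and a thermal cycle to one round of random pairing, ignoring real hybridization thermodynamics.

```python
import random

def overlap(a, b, k):
    """True if the last k bases of a equal the first k bases of b
    (standing in for an annealing overlap between two strands)."""
    return a[-k:] == b[:k]

def parallel_overlap_assembly(pool, k=4, cycles=10):
    """Toy POA: in each 'thermal cycle', fuse pairs of strands whose
    ends overlap by k bases; unfused strands carry over."""
    pool = list(pool)
    for _ in range(cycles):
        random.shuffle(pool)          # random encounters in the tube
        used, merged = set(), []
        for i in range(len(pool)):
            if i in used:
                continue
            for j in range(len(pool)):
                if j != i and j not in used and overlap(pool[i], pool[j], k):
                    merged.append(pool[i] + pool[j][k:])
                    used.update((i, j))
                    break
        merged.extend(pool[i] for i in range(len(pool)) if i not in used)
        pool = merged
    return pool
```

Iterating the cycles grows progressively longer molecules out of the short input segments, which is the essence of the assembly step.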
B) Boolean Circuit Evaluation
Another important approach in molecular computing is the implementation of Boolean circuits, because that would allow importing to the world of molecules the vast progress that has been made on information processing in electronic computers. A successful implementation of Boolean circuits would lead to the construction of ordinary computers in biomolecules, particularly of parallel computers. Lipton presented an early proposal for Boolean circuit evaluation as a solution to SAT (Boolean formula satisfiability) and thereby problems in the class NP. Ogihara and Ray have suggested protocols for implementing Boolean circuits that run in an amount of time that is proportional to the size of the circuit. Amos et al. improved the implementation to have a run time that is proportional to the depth of the circuit. In the protocol suggested by the latter, for example, input bits are represented by well-chosen l-mers that are present in the tube if and only if the corresponding Boolean variables have value one. The gates are represented by l-mers that contain segments that are complementary to the input molecules and the output of the gate. (Without loss of generality, one can assume all gates are simply NAND gates, since this operator is logically complete.) A typical evaluation of a NAND is made by placing in a tube the values of the inputs equal to one and allowing the formation of a double strand that represents the evaluation, as illustrated in the figure.
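The logical content of such a protocol, stripped of the chemistry, is ordinary gate-by-gate evaluation. The sketch below models the tube as a dictionary of wire values, with value 1 standing in for the presence of the corresponding strand; gate and wire names are hypothetical.

```python
def nand(a, b):
    return 1 - (a & b)

def eval_circuit(inputs, gates):
    """Evaluate a NAND circuit. `gates` is a list of (out, in1, in2)
    triples in topological order; a wire value of 1 stands for the
    presence of the corresponding strand in the tube."""
    wires = dict(inputs)
    for out, a, b in gates:
        wires[out] = nand(wires[a], wires[b])
    return wires

# NAND alone is logically complete: NOT x = NAND(x, x),
# x AND y = NOT NAND(x, y), etc.
w = eval_circuit({"x": 1, "y": 0},
                 [("n", "x", "x"),   # n = NOT x
                  ("t", "x", "y"),   # t = NAND(x, y)
                  ("o", "t", "t")])  # o = x AND y
```

The depth-proportional run time of the Amos et al. protocol corresponds to evaluating all gates at the same depth in one parallel (tube) step rather than one at a time as in this serial loop.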
D) Finite-State and Turing Machine Implementation
There were other simultaneous attempts to implement state machines, particularly finite-state machines (FSMs). Garzon et al. suggested an implementation of nondeterministic finite-state machines that are self-controlled and fault tolerant. The figure shows an FSM where various moves are possible from a particular state on the same input zero. Nondeterministic computation is at the core of the difficulties that Adleman's original experiment was designed to overcome. On the other hand, nondeterminism is supposed to be well understood in the context of finite-state machines, because a so-called subset construction produces a deterministic equivalent of a given nondeterministic FSM. It is conceivable that greater insight about the virtues of molecular computing may be gained by looking for ways to implement nondeterminism efficiently as a native mode of computation in a fault-tolerant and efficient way.
The implementation requires a dynamic molecule to represent the changing states of the FSM that is capable of detecting its inputs in its environment. It can be a double-stranded molecule containing a segment encoding the current state and another segment encoding the last symbol read that led to the current state. Other molecules representing inputs are added with appropriate overhangs, which, upon hybridization, create restriction sites that allow cleaving with appropriate enzymes to detach the old state and create a new molecule that reflects the new state. Nondeterministic transitions occur because of the various possibilities in the hybridization process. If run uncontrolled, the protocol will soon produce too many copies of the finite control in the same state and thereby thwart the efficiency of the computation. The key to the success of the subset construction in determinizing an FSM is that whenever two nondeterministic copies of the machine find themselves in the same state, one can safely discard one of them, since their runs will be identical thereafter. It is desirable to have a protocol that renders the implementation efficient in the tube, i.e., one that will self-regulate to produce approximately equal concentrations of the molecules representing the various states.
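The subset construction itself is a short algorithm. In the sketch below, DFA states are frozensets of NFA states, so two nondeterministic copies of the machine that reach the same state are merged automatically; the example NFA (with two possible moves from state 0 on input '0') is hypothetical.

```python
def subset_construction(alphabet, delta, start, accept):
    """Determinize an NFA given by a dict delta: (state, symbol) -> set
    of states. Returns the DFA transition table, start state, and
    accepting states; DFA states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta, seen, frontier = {}, {start_set}, [start_set]
    while frontier:
        S = frontier.pop()
        for a in alphabet:
            # Union of all NFA moves from any state in S on symbol a.
            T = frozenset(q for s in S for q in delta.get((s, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                frontier.append(T)
    dfa_accept = {S for S in seen if S & accept}
    return dfa_delta, start_set, dfa_accept

# NFA with two possible moves from state 0 on input '0'.
nfa = {(0, "0"): {0, 1}, (1, "0"): {2}}
dd, s0, acc = subset_construction(["0"], nfa, 0, {2})
```

The merging of duplicate subsets is the in-silico counterpart of the self-regulation discussed above: copies of the machine in the same state need not be tracked separately.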
E) Cellular Automata Runs
A new direction originated in Winfree's attempts to show that abstract tilings, used earlier to show computational universality, can actually be implemented in the lab. He has used the XOR rule in a two-dimensional
been created. Naturally, other devices were required to detect these tiny structures at the nanoscales in which they exist, namely atomic-force microscopy.
F) Other Applications: Is There a "Killer Application"?
Molecular computing has generally aimed, so far, at solving the same ordinary algorithmic problems that are commonly posed for conventional VLSI-based computers, albeit by an entirely different type of operational process. None of them has exhibited the kind of practical success that would be considered satisfactory to answer Lipton's impromptu call for a "killer app" at the second DNA-based workshop in Princeton. Such an application would suit well the nature of biomolecules, beat current and perhaps even future solid-state electronics, and would establish beyond the shadow of a doubt the power of the new computational paradigm. Landweber et al. have proposed that DNA sequencing, DNA fingerprinting, and/or DNA population screening are good candidates. They would require a new approach in the way we conceive and do molecular computation now.
Specifically, thus far a practitioner is assumed to know the composition (a digital string) of the molecules that initially encode information and their subsequent transformations in the tube. This methodology requires going back and forth between the digital and analog DNA worlds, by sequencing (when compositions are unknown), which is an expensive step, and the converse step, synthesis of DNA. The proposed approach bypasses digitizing by operating directly on unknown segments x of DNA, using known molecules, in order to compute a predetermined function f(x) that specifies the computational task. The fact that the first five years of work in the field have not, however, produced one such killer application would make some people think that perhaps fundamental scientific and/or technological difficulties have to be overcome before one effectively appears on the scene. These proposals can thus be regarded as challenges, rather than established results, and will be discussed in the following section.
GRAND CHALLENGES FOR MOLECULAR COMPUTING
A) Reliability, Efficiency, and Scalability
Reliability, efficiency, and scalability are perhaps the three most burning issues for molecular computing. The reliability of a protocol, i.e., a DNA computation, is the degree of confidence with which a lab experiment provides a true answer to the given problem. The efficiency of the protocol refers to the intended and effective use of the molecules that intervene in it. The scalability of a lab experiment is the effective reproducibility of the experiment with longer molecules that can encode larger problem instances while still obtaining equally reliable results under comparable efficiency. These three are distinct but clearly interrelated problems. Biologists have not really faced these problems in their work because in that field the definition of success is different than in computer science. (When a biologist claims that she has cloned an organism, for example, the contention is that one experiment was successful, regardless of how many were previously not or whether only one clone was actually produced.) Research on these problems in molecular computing has just begun. Most work has concentrated on reliability, and we proceed to sketch it, in the guise of a more basic and important problem: the encoding problem. This is a good example in which molecular computing will probably have a feedback effect on the notions of efficiency and scalability in biology.
B) The Encoding Problem
Once the encoding molecules for the input of a problem have been chosen, a molecular computer scientist is at the mercy of the chemistry, even though she may still have some control over the protocols that she may perform with them in the laboratory execution. If the encodings are prone to errors, the experiment can be repeated any number of times and always provide the same (erroneous) results, as evidenced in the literature. This fact lessens the effectiveness of the standard method of increasing the reliability of a probabilistic computation with a nonzero probability of errors by iteration. A different analysis of the problem was initiated by Baum, where it is assumed that undesirable errors will occur only if repetitions or complementary substrands of a certain minimum sticking length k appeared in the encoding. The problem is that the uncertain nature of hybridizations may plague the separators that are used to prevent the problem, so a more thorough approach appears to be necessary. A mismatched hybridization is a bound pair of oligonucleotides that contains at least one mismatched pair. In addition to frame-shift errors, in which the n-mers are shifted relative to each other, mismatches leading to false positives include hairpin mismatches, bulges, and partial hybridizations. The encoding problem for DNA computing thus consists of mapping the instances of an algorithmic problem in a systematic manner onto specific molecules so that: a) the chemical protocols avoid all these sources of error and b) the resulting products contain, with a high degree of reliability, enough molecules encoding the answers to the problem's instances to enable a successful extraction.
An optimal encoding would maximize the likelihood of desired hybridizations while minimizing the occurrence of undesirable hybridizations, and furthermore, would lead to equilibrium reaction conditions that are favorable for retrieving the solution of the problem in the extraction phase. Clearly, the encoding of a problem for a molecular solution has to be decided beforehand, by means presumably different from DNA computation.
Thus, in its full generality, we have the following algorithmic problem.
The function reflects a desirable quality criterion for the protocol. Solving the encoding problem requires identifying appropriate criteria that capture the relevant chemistry and, moreover, giving algorithms to produce good encodings that will satisfy constraints a) and b). The most natural and fitting criteria can be found in the thermodynamics that governs the hybridization and ligation processes. Ultimately, it comes down to the Gibbs free energy that nucleotides release during hybridization in passing to a lower energy state of a bound pair. The thermodynamics of hybridizations is fairly well known (see Wetmur for a survey of relevant facts, as well as SantaLucia et al.). The basic quantity is the melting temperature of a given double strand, which is defined as the temperature at which half of a homogeneous population of such double strands will have denatured into single strands. The controlling parameters of a melting temperature are strand composition, strand concentration, and various other solvent properties, such as the pH of the solution. Despite some fundamental work, this approach based on melting temperatures has not really produced a systematic and productive way to produce good encodings. Such encodings can actually be obtained through evolutionary searches, either in vitro or in silico, that utilize fitness functions based on one or some of these factors, or through the use of heuristics for special-purpose encodings. Finding appropriate general metrics in oligonucleotide space and practical solutions to the corresponding restriction of DNA ENCODING is an important problem for DNA-based computing. In general, even for a single good choice of quality criterion, the encoding problem as stated is very likely to be NP-complete, i.e., as difficult as the problem it is supposed to help solve, and so it would not admit general solutions. Relaxations of the problem need to be considered.
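As a concrete, if crude, illustration of a melting-temperature criterion, the classical Wallace rule of thumb estimates Tm for short oligos as 2 C per A/T base plus 4 C per G/C base. This is only a first-order screen, not the nearest-neighbor thermodynamic models (e.g., SantaLucia's) that serious encoding design would rely on.

```python
def wallace_tm(oligo):
    """Wallace rule-of-thumb melting temperature for short oligos:
    2 C per A or T, 4 C per G or C. A coarse first-order screen only;
    nearest-neighbor models are needed for real encoding design."""
    at = sum(oligo.count(b) for b in "AT")
    gc = sum(oligo.count(b) for b in "GC")
    return 2 * at + 4 * gc

# Screen candidate codewords for comparable melting behavior.
candidates = ["ACGTACGTACGT", "AATTAATTAATT", "GGCCGGCCGGCC"]
tms = {c: wallace_tm(c) for c in candidates}
```

Codewords with widely differing Tm values would denature at different temperatures during thermocycling, one simple reason a uniform composition constraint is often imposed on encodings.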
C) Error-Preventing Codes
It is conceivable that a more principled computational approach can produce solutions of the encoding problem that capture physico-chemical conditions well enough to be validated by lab experiments. Perhaps the best example is the combinatorial approach proposed by the molecular computing group in Memphis. The crux of the approach is to regard an experiment for a molecular computation as the transmission of a message from the protocol to the experimentalist through a noisy channel, namely the tube(s) in which the reactions take place. The theory of communication introduced by Shannon has found effective ways to handle this problem by introducing redundancy to protect against noise. The solutions are the so-called error-correcting codes for data transmission that information theorists have spent the last 50 years designing. The mathematical framework is the metric space of Boolean hypercubes with the standard binary Hamming metric. In the case of information encodings in biomolecules, one can easily generalize the Hamming distance to the four-letter alphabet {A, C, G, T} using Watson-Crick complementarity. This generalized Hamming metric gives some quantification of the hybridization likelihood of the molecules in the reaction. This possibility has been explored in several papers. The problem is that oligos at a large Hamming distance can still hybridize perfectly at the overlap after a shift, as in the case of two such strands lined up off-register. The physico-chemical reality of the tube makes it clear that the Hamming distance is not an adequate measure of hybridization likelihood except in very special circumstances. Nonetheless, frame shifts appear to be accountable for, at the expense of technical complications in the Hamming metric, by a generalization, the so-called h-metric, introduced by Garzon et al. This metric may capture enough of the reality of reaction conditions and the complexity of test-tube hybridizations to frame and solve the encoding problem appropriately. The h-measure is defined as the minimum of all Hamming distances obtained by successively shifting and lining up the WC-complement of y against x; the h-metric is defined for so-called poligos, namely equivalence classes of n-mers at an h-measure of zero from each other. (The h-measure is not, strictly speaking, a metric.)
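A simplified version of the h-measure can be computed directly. This sketch is an interpretation, not Garzon et al.'s exact definition (details such as how overhanging, uncovered positions are scored vary); here uncovered positions of x count as mismatches, so a value of 0 means y can bind x perfectly at some alignment.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wc(s):
    """Watson-Crick (reverse) complement."""
    return "".join(COMPLEMENT[b] for b in reversed(s))

def h_measure(x, y):
    """Minimum, over all relative shifts, of the Hamming distance
    between x and the shifted WC-complement of y; uncovered positions
    of x count as mismatches (a simplifying assumption)."""
    yc = wc(y)
    n = len(x)
    best = n
    for shift in range(-(len(yc) - 1), n):
        lo, hi = max(0, shift), min(n, shift + len(yc))
        if hi <= lo:
            continue
        mism = sum(x[i] != yc[i - shift] for i in range(lo, hi))
        best = min(best, mism + (n - (hi - lo)))
    return best
```

Unlike the plain Hamming distance, this quantity stays small for strands that bind perfectly only after a frame shift, which is exactly the failure mode discussed above.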
D) Building and Programming Molecular Computers
For several reasons, the greatest engineering and technological challenge posed by molecular computing is perhaps the construction of
a molecular computer. In a molecular computer, one would expect to find the basic features that are evident in a conventional electronic computer in an integrated system, namely information storage, programmability, and information processing. Such features are obviously desirable, but whether they are actually realizable is not very clear. Early papers have suggested abstract architectures (a notable example is the sticker architecture of Roweis et al.), but as acknowledged by many authors, critical engineering challenges remain unresolved about the issues of reliability discussed earlier. It is now clear that such issues present the most important difficulties. The best effort to date is being conducted by the surface-computing research group at the University of Wisconsin-Madison.
The instruction set consists of three primitive operations: mark, unmark, and destroy. Successful implementation of these operations would permit, in principle, building a general-purpose molecular computer. Given the difficulties with implementing traditional algorithms in DNA and their potential for evolutionary-style computation, DNA computers apparently follow Michael Conrad's trade-off principle: "a computing system cannot at the same time have high programmability, high computational efficiency, and high evolutionary adaptability." He describes programmability as the ability to communicate programs to the computing system exactly, with a finite alphabet, in a finite number of steps. The efficiency of a computing system is defined as the ratio of the number of interactions in the system that are used for computing to the total number of interactions possible in the system, and evolutionary adaptability is defined as the ability of the system to change in response to uncertainty. It is clear that biomolecules offer, by the nature of their function, a good answer to adaptability. If Conrad's principle holds here, there is good evidence that molecular programming will be a great challenge.
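The mark/unmark/destroy instruction set can be rendered as a toy model on bit strings, with the tube as a list of candidate solutions. This is only an abstraction of the surface-chemistry operations (marking a strand there is a chemical tagging step, not a bit assignment), and the strands below are hypothetical.

```python
def mark(tube, bit):
    """Set the given bit on every strand in the tube (standing in for
    annealing a tag at that position)."""
    return [s[:bit] + "1" + s[bit + 1:] for s in tube]

def unmark(tube, bit):
    """Clear the given bit on every strand in the tube."""
    return [s[:bit] + "0" + s[bit + 1:] for s in tube]

def destroy(tube, bit, value):
    """Remove all strands whose bit has the given value."""
    return [s for s in tube if s[bit] != value]

# A tube holding all 2-bit candidate solutions plus one status bit.
tube = ["000", "010", "100", "110"]
tube = destroy(tube, 0, "0")   # keep only strands with first bit 1
tube = mark(tube, 2)           # flag the survivors
```

Chaining such filtering steps over a combinatorial library of strands is what would make the scheme a general-purpose (if brute-force) computer.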
E) Implementing Evolutionary Computation
Evolutionary computation is based on analogies of biological processes, implemented in electronics, to arrive at computer programs that sometimes outperform software designed by standard methodologies. The most common analogy used is natural selection, or survival of the fittest.
The various methodologies include genetic algorithms, genetic programming, evolution strategies, evolutionary programming, and immune systems. These algorithms use a generate-and-evaluate strategy: a population of possible solutions is maintained (usually generated at random); individuals are then selected from the population based upon their fitness, i.e., how well they satisfy an external constraint; the population is then updated by replacing less-fit individuals with combinations of hopefully fitter individuals through some variation operations such as crossover and mutation. The basic evolution program (EP) algorithm is shown in Fig. 4. Through successive generations, the fitness of individuals is improved, and better solutions are found that may converge to a good-enough solution. The key ingredients in an evolutionary algorithm are selection pressure (provided by the fitness function) and variation pressure (provided by the genetic operations). Variation guarantees a fairly thorough opportunity for each solution to access the population of solutions and thereby a chance to be evaluated; selection guarantees that evaluation does produce better and better solutions.
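The generate-and-evaluate loop just described can be sketched in a few lines. This is a minimal evolution program under stated assumptions (bit-string chromosomes, tournament selection, one-point crossover, per-bit mutation), not the specific EP of the figure.

```python
import random

def evolve(fitness, length=12, pop_size=20, generations=60,
           mut_rate=0.05, seed=1):
    """Minimal generate-and-evaluate loop: tournament selection,
    one-point crossover, per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p, q = tournament(), tournament()     # selection pressure
            cut = rng.randrange(1, length)        # crossover point
            child = p[:cut] + q[cut:]
            child = [bit ^ (rng.random() < mut_rate)  # mutation
                     for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax: fitness is simply the number of ones in the chromosome.
best = evolve(sum)
```

Selection (the tournament) and variation (crossover plus mutation) are exactly the two pressures named in the text; removing either stalls the search.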
F) Autonomy and Self-Assembly
It is folk knowledge now that human intervention is a bottleneck in molecular computing, i.e., it will be necessary to automate molecular protocols as much as possible. Fully automated protocols are usually referred to as "single-pot" protocols, after Winfree. These concerns have been addressed in one way or another in several works, particularly Winfree's self-assembling reactions for tilings, fault tolerance in error-preventing codes, and self-control of nondeterminism, molecule formation, and reaction efficiency. Jonoska and Karl show how many computations can be simplified by constructing appropriate graphs in DNA molecules. Hagiya has further reiterated the importance of self-controlled and autonomous protocols that would eliminate human intervention and so reveal more about the true power of molecular computing. Garzon et al. provide a self-assembly protocol for a family of graphs, the Cayley graphs of so-called automatic groups, that exploits the symmetry of the graphs and good encodings to make self-assembly possible by the type of thermocycling effective in whiplash PCR computations. Given the increasing importance of reliability for molecular programming, self-assembly and self-regulation are important tools to achieve a solution to the autonomy problem of molecular computers.
Applications
While the development of DNA computational methods may have many directly applicable applications, the biggest contribution of research in this area may be much more fundamental and will likely fuel many indirect benefits. In many papers, it is stressed that high levels of collaboration between academic disciplines will be essential to effect progress in DNA computing. Such collaboration may very well lead to the development of a DNA computer with practical advantages over a conventional computer, but it has an even greater likelihood of contributing to an increased understanding of DNA and other biological mechanisms. The need for additional precision could effect progress in biomolecular techniques by placing demands on biochemists and their tools that might not otherwise be considered.
A particular area within the natural and applied sciences that may benefit from advances in DNA computation is combinatorial chemistry. Combinatorial chemistry involves the construction of enzymes, sequences of RNA, and other molecules, particularly for use in biomolecular engineering or medicine. Adleman describes this process as being similar to "classic" models of DNA computation, as combinatorial chemistry involves generating large sets of random RNA sequences and searching for molecules with the desired properties. Advances in either area could easily benefit the other field, or even pave a way to combining the two fields, producing both products and related computational results in parallel.
Several papers also extend the use of biomolecular computing into applications in the emerging science of nanotechnology, specifically nanofabrication, making use of both the small-scale computational abilities of DNA and the manufacturing abilities of RNA. Since both fields are still very embryonic, the practical or even experimental implementation of this use is still highly speculative but promising.
Applying the techniques of DNA