biomolecular final1
TRANSCRIPT
-
8/13/2019 Biomolecular Final1
1/9
Biomolecular Computing
Molecular computing is a discipline that aims at harnessing individual molecules at nanoscales for computational purposes. The best-
studied molecules for this purpose to date have been DNA and bacteriorhodopsin. Biomolecular computing allows one to realistically
entertain, for the first time in history, the possibility of exploiting the massive parallelism at nanoscales inherent in natural phenomena to solve computational problems. The implementation of evolutionary algorithms in biomolecules would bring full circle the biological analogy and present an attractive alternative to meet large demands for computational power.
INTRODUCTION
The notion of harnessing individual molecules at nanoscales for computational purposes is an idea that can be traced back at least to the time when electronic computers were being constructed in the 1940s. Electrons are, in fact, orders of magnitude smaller than molecules, but enormous numbers of them are required just to communicate a carriage return to a conventional processor. The idea of improving the efficiency of hardware utilization using biomolecules is attractive for several reasons. First, biomolecular hardware is inherently parallel, and parallelism is a good way to handle computational bottlenecks. Second, biomolecules occur abundantly in nature, for example, inside all known living cells, both with (eukaryotes) and without (prokaryotes) nuclei, and constitute the basic substratum of life. Consequently, they have developed a structure that enables them to solve a number of difficulties for parallel computing, such as massive communication over noisy media and load-balancing problems, by mechanisms that we may not even be aware of. Furthermore, short biomolecules can now be synthesized at low cost.
THE ORIGINS OF MOLECULAR COMPUTING
Lately, advances in computer science have been characterized by the computational implementation of well-established biological paradigms. Notable advances are artificial neural nets, inspired by the brain and its obvious connection to natural intelligence, and evolutionary computation, inspired by the Darwinian paradigm of natural selection. Early ideas of molecular computing attempted to emulate conventional electronic implementations in other media, e.g., implementing Boolean gates in a variety of ways. A fundamental breakthrough characteristic of a new era was made by Adleman's 1994 paper, where he reports an experiment performed with molecules of fundamental importance for life, DNA (deoxyribonucleic acid) molecules, to solve a computational problem known to be difficult for ordinary computers, namely the Hamiltonian path problem (HPP). This problem is typical of an elite set of problems in the well-known complexity class NP that exemplify the computational difficulty of the search procedures that plague a number of very important applications in combinatorial optimization, operations research, and numerical computation. Adleman's experiment ushered in a new computational paradigm in molecular computing for several reasons. First, it showed that it is indeed possible to orchestrate individual molecules to perform computational tasks. Second, it showed the enormous potential of DNA molecules for solving problems beyond the reach of conventional computers that have been or may be developed in the future based on solid-state electronics. Shortly after, in 1995, the first conference on DNA-based computing was organized at Princeton University, and several events have been held annually since.
A) Adleman's Landmark Experiment
In this section, we present the essential technical details of Adleman's experiment. The HPP is defined precisely as follows.
Instance: a directed graph and two vertices, source and destination;
Question: yes/no, is there a path following arcs in the graph connecting the source to the destination vertices and passing through every other vertex exactly once?
As mentioned before, this problem is NP-complete, i.e., it is representative of many of the difficulties that afflict conventional computers for solving very important problems in combinatorial optimization and operations research. Each complete problem in NP contains all problems in the class NP as special cases after some rewording and is characterized by the fact that their solutions are easily verifiable, but extremely difficult to find in a reasonable amount of search time. The best-known general techniques to apply to these problems amount essentially to an exhaustive search through all possible solutions, looking for satisfaction of the constraints required by the problem. It is therefore an ideal candidate for a brand-new computational approach using molecules.
Adleman's brilliant insight was to carefully arrange a set of DNA molecules so that the chemistry that they naturally follow would perform the brunt of the computational process. The key operations in this chemistry are sticking operations that allow the basic nucleotides of nucleic acids to form larger structures through the processes of ligation and hybridization (more below in Section III-A). The first DNA-based molecular computation is summarized in Fig. 1. Specifically, Adleman assigned (well-chosen) unique
single-stranded molecules to represent the vertices, used Watson-Crick complements of the corresponding halves to represent edges joining two vertices, and synthesized a picomol of each.
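One common presentation of this encoding can be sketched in code. The snippet below is only illustrative: the vertex names and randomly generated 20-mers are hypothetical, not the oligos of the actual 1994 experiment, and it assumes the variant of the scheme in which an edge strand is built from half-codes of its endpoints while vertex complements act as splints.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wc_complement(s):
    """Watson-Crick complement, read 5'->3' (i.e., reverse complement)."""
    return "".join(COMPLEMENT[b] for b in reversed(s))

def random_oligo(n=20):
    return "".join(random.choice("ACGT") for _ in range(n))

# One unique 20-mer per vertex (hypothetical vertex names).
vertices = {v: random_oligo() for v in ["s", "a", "b", "d"]}

# An edge u->v is encoded by the second half of u's code followed by
# the first half of v's code; complements of the vertex strands then
# act as "splints" that hybridize across consecutive edge oligos.
def edge_oligo(u, v):
    return vertices[u][10:] + vertices[v][:10]

e = edge_oligo("s", "a")
splint = wc_complement(vertices["a"])  # bridges edges s->a and a->next
```

In the tube, ligation of splinted edge oligos assembles longer strands that each spell out one candidate path through the graph.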
SOME SUCCESS STORIES
In this section, we give a brief description of some of the problems for which molecular protocols have been or are being implemented successfully. Each story that follows either illustrates a basic technique in molecular computing or has successfully marked definite progress in the lab. Before proceeding further, however, we need to give a more precise description of the molecular biological background as well as a characterization of the basic methodology employed in molecular computing.
A) Parallel Overlap Assembly
Perhaps the foremost advantage of computing with molecules is the ready parallelism in which molecular operations take place. The best way to exploit this parallelism is to perform the same operation simultaneously on many molecules. Adleman's basic technique can be characterized as a generate-and-filter technique, i.e., generate all possible paths and filter out those that are not Hamiltonian. In this approach, one must be sure to generate all the possible solutions to the problem, akin to making sure that the data structure for a
chromosome captures all possible solutions in an evolutionary algorithm. Many protocols in molecular computing exploit this method, for example, Boolean formula evaluation (see the next section for some references).
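The generate-and-filter idea can be stated as a conventional algorithm. The sketch below is an in-silico analogue, not a model of the chemistry: it generates every candidate vertex ordering (the "tube" of random paths) and then filters for those that start at the source, end at the destination, and follow only existing arcs.

```python
from itertools import permutations

def hamiltonian_path_exists(n, arcs, src, dst):
    """Brute-force generate-and-filter: enumerate all orderings of the
    intermediate vertices, keep those consistent with the arc set."""
    arc_set = set(arcs)
    middle = [v for v in range(n) if v not in (src, dst)]
    for perm in permutations(middle):
        path = (src,) + perm + (dst,)
        # Filter step: every consecutive pair must be an arc.
        if all((path[i], path[i + 1]) in arc_set for i in range(n - 1)):
            return True
    return False
```

The factorial blow-up of `permutations` is exactly the exhaustive search that the massive parallelism of DNA strands is meant to absorb.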
It is therefore important to be able to generate all potential solutions to a problem. A procedure called parallel overlap assembly has been used in molecular biology for gene reconstruction and DNA shuffling. It has been successfully used by Ouyang et al. in a lab experiment to solve an instance of another NP-complete graph problem, MAX-CLIQUE. The procedure consists of iterations of thermal cycles that anneal given shorter DNA segments in random ways to produce larger molecules representing potential solutions. Related procedures have been used to improve solutions to HPP by Arita et al.
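A caricature of parallel overlap assembly can be written as a string-merging loop. This is a deliberately crude model under simplifying assumptions: annealing is reduced to exact end-overlap of `k` characters, and a thermal cycle to one round of random pairing, ignoring real hybridization thermodynamics.

```python
import random

def overlap(a, b, k):
    """True if the last k bases of a equal the first k bases of b
    (standing in for an annealing overlap between two strands)."""
    return a[-k:] == b[:k]

def parallel_overlap_assembly(pool, k=4, cycles=10):
    """Toy POA: in each 'thermal cycle', fuse pairs of strands whose
    ends overlap by k bases; unfused strands carry over."""
    pool = list(pool)
    for _ in range(cycles):
        random.shuffle(pool)          # random encounters in the tube
        used, merged = set(), []
        for i in range(len(pool)):
            if i in used:
                continue
            for j in range(len(pool)):
                if j != i and j not in used and overlap(pool[i], pool[j], k):
                    merged.append(pool[i] + pool[j][k:])
                    used.update((i, j))
                    break
        merged.extend(pool[i] for i in range(len(pool)) if i not in used)
        pool = merged
    return pool
```

Iterating the cycles grows progressively longer molecules out of the short input segments, which is the essence of the assembly step.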
B) Boolean Circuit Evaluation
Another important approach in molecular computing is the implementation of Boolean circuits, because that would allow importing to the world of molecules the vast progress that has been made on information processing in electronic computers. A successful implementation of Boolean circuits would lead to the construction of ordinary computers in biomolecules, particularly of parallel computers. Lipton presented an early proposal for Boolean circuit evaluation as a solution to SAT (Boolean formula satisfiability) and thereby problems in the class NP. Ogihara and Ray have suggested protocols for implementing Boolean circuits that run in an amount of time that is proportional to the size of the circuit. Amos et al. improved the implementation to have a run time that is proportional to the depth of the circuit. In the protocol suggested by the latter, for example, input bits are represented by well-chosen l-mers that are present in the tube if and only if the corresponding Boolean variables have value one. The gates are represented by l-mers that contain segments that are complementary to the input molecules and the output of the gate. (Without loss of generality, one can assume all gates are simply NAND gates, since this operator is logically complete.) A typical evaluation of a NAND is made by placing in a tube the values of the inputs equal to one and allowing the formation of a double strand that represents the evaluation, as illustrated in the figure.
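The logical content of such a protocol, stripped of the chemistry, is ordinary gate-by-gate evaluation. The sketch below models the tube as a dictionary of wire values, with value 1 standing in for the presence of the corresponding strand; gate and wire names are hypothetical.

```python
def nand(a, b):
    return 1 - (a & b)

def eval_circuit(inputs, gates):
    """Evaluate a NAND circuit. `gates` is a list of (out, in1, in2)
    triples in topological order; a wire value of 1 stands for the
    presence of the corresponding strand in the tube."""
    wires = dict(inputs)
    for out, a, b in gates:
        wires[out] = nand(wires[a], wires[b])
    return wires

# NAND alone is logically complete: NOT x = NAND(x, x),
# x AND y = NOT NAND(x, y), etc.
w = eval_circuit({"x": 1, "y": 0},
                 [("n", "x", "x"),   # n = NOT x
                  ("t", "x", "y"),   # t = NAND(x, y)
                  ("o", "t", "t")])  # o = x AND y
```

The depth-proportional run time of the Amos et al. protocol corresponds to evaluating all gates at the same depth in one parallel (tube) step rather than one at a time as in this serial loop.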
D) Finite-State and Turing Machine Implementation
There were other simultaneous attempts to implement state machines, particularly finite-state machines (FSMs). Garzon et al. suggested an implementation of nondeterministic finite-state machines that are self-controlled and fault tolerant. The figure shows an FSM where various moves are possible from a particular state on the same input zero. Nondeterministic computation is at the core of the difficulties that Adleman's original experiment was designed to overcome. On the other hand, nondeterminism is supposed to be well understood in the context of finite-state machines, because a so-called subset construction produces a deterministic equivalent of a given nondeterministic FSM. It is conceivable that greater insight about the virtues of molecular computing may be gained by looking for ways to implement nondeterminism efficiently as a native mode of computation in a fault-tolerant and efficient way.
The implementation requires a dynamic molecule to represent the changing states of the FSM that is capable of detecting its inputs in its environment. It can be a double-stranded molecule containing a segment encoding the current state and another segment encoding the last symbol read that led to the current state. Other molecules representing inputs are added with appropriate overhangs, which, upon hybridization, create restriction sites that allow cleaving with appropriate enzymes to detach the old state and create a new molecule that reflects the new state. Nondeterministic transitions occur because of the various possibilities in the hybridization process. If run uncontrolled, the protocol will soon produce too many copies of the finite control in the same state and thereby thwart the efficiency of the computation. The key to the success of the subset construction in determinizing an FSM is that whenever two nondeterministic copies of the machine find themselves in the same state, one can safely discard one of them, since their runs will be identical thereafter. It is desirable to have a protocol that renders the implementation efficient in the tube, i.e., one that will self-regulate to produce approximately equal concentrations of the molecules representing the various states.
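The subset construction itself is a short algorithm. In the sketch below, DFA states are frozensets of NFA states, so two nondeterministic copies of the machine that reach the same state are merged automatically; the example NFA (with two possible moves from state 0 on input '0') is hypothetical.

```python
def subset_construction(alphabet, delta, start, accept):
    """Determinize an NFA given by a dict delta: (state, symbol) -> set
    of states. Returns the DFA transition table, start state, and
    accepting states; DFA states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta, seen, frontier = {}, {start_set}, [start_set]
    while frontier:
        S = frontier.pop()
        for a in alphabet:
            # Union of all NFA moves from any state in S on symbol a.
            T = frozenset(q for s in S for q in delta.get((s, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                frontier.append(T)
    dfa_accept = {S for S in seen if S & accept}
    return dfa_delta, start_set, dfa_accept

# NFA with two possible moves from state 0 on input '0'.
nfa = {(0, "0"): {0, 1}, (1, "0"): {2}}
dd, s0, acc = subset_construction(["0"], nfa, 0, {2})
```

The merging of duplicate subsets is the in-silico counterpart of the self-regulation discussed above: copies of the machine in the same state need not be tracked separately.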
E) Cellular Automata Runs
A new direction originated in Winfree's attempts to show that abstract tilings, used earlier to show computational universality, can actually be implemented in the lab. He has used the XOR rule in a two-dimensional
been created. Naturally, other devices were required to detect these tiny structures at the nanoscales in which they exist, namely atomic-force microscopy.
F) Other Applications: Is There a "Killer Application"?
Molecular computing has generally aimed, so far, at solving the same ordinary algorithmic problems that are commonly posed for conventional VLSI-based computers, albeit by an entirely different type of operational process. None of them has exhibited the kind of practical success that would be considered satisfactory to answer Lipton's impromptu call for a "killer app" at the second DNA-based workshop in Princeton. Such an application would suit well the nature of biomolecules, beat current and perhaps even future solid-state electronics, and would establish beyond the shadow of a doubt the power of the new computational paradigm. Landweber et al. have proposed that DNA sequencing, DNA fingerprinting, and/or DNA population screening are good candidates. They would require a new approach in the way we conceive and do molecular computation now.
Specifically, thus far a practitioner is assumed to know the composition (a digital string) of the molecules that initially encode information and their subsequent transformations in the tube. This methodology requires going back and forth between the digital and analog DNA worlds, by sequencing (when compositions are unknown), which is an expensive step, and the converse step, synthesis of DNA. The proposed approach bypasses digitizing by operating directly on unknown segments x of DNA, using known molecules, in order to compute a predetermined function f(x) that specifies the computational task. The fact that the first five years of work in the field have not, however, produced one such killer application would make some people think that perhaps fundamental scientific and/or technological difficulties have to be overcome before one effectively appears on the scene. These proposals can thus be regarded as challenges, rather than established results, and will be discussed in the following section.
GRAND CHALLENGES FOR MOLECULAR COMPUTING
A) Reliability, Efficiency, and Scalability
Reliability, efficiency, and scalability are perhaps the three most burning issues for molecular computing. The reliability of a protocol, i.e., a DNA computation, is the degree of confidence with which a lab experiment provides a true answer to the given problem. The efficiency of the protocol refers to the intended and effective use of the molecules that intervene in it. The scalability of a lab experiment is the effective reproducibility of the experiment with longer molecules that can encode larger problem instances while still obtaining equally reliable results under comparable efficiency. These three are distinct but clearly interrelated problems. Biologists have not really faced these problems in their work because in that field the definition of success is different than in computer science. (When a biologist claims that she has cloned an organism, for example, the contention is that one experiment was successful, regardless of how many were previously not or whether only one clone was actually produced.) Research on these problems in molecular computing has just begun. Most work has concentrated on reliability, and we proceed to sketch it, in the guise of a more basic and important problem: the encoding problem. This is a good example in which molecular computing will probably have a feedback effect on the notions of efficiency and scalability in biology.
B) The Encoding Problem
Once the encoding molecules for the input of a problem have been chosen, a molecular computer scientist is at the mercy of the chemistry, even though she may still have some control over the protocols that she may perform with them in the laboratory execution. If the encodings are prone to errors, the experiment can be repeated any number of times and always provide the same (erroneous) results, as evidenced in the literature. This fact lessens the effectiveness of the standard method of increasing the reliability of a probabilistic computation with a nonzero probability of errors by iteration. A different analysis of the problem was initiated by Baum, where it is assumed that undesirable errors will occur only if repetitions or complementary substrands of a certain minimum sticking length k appeared in the encoding. The problem is that the uncertain nature of hybridizations may plague the separators that are used to prevent the problem, so a more thorough approach appears to be necessary. A mismatched hybridization is a bound pair of oligonucleotides that contains at least one mismatched pair. In addition to frame-shift errors, in which the n-mers are shifted relative to each other, mismatches leading to false positives include hairpin mismatches, bulges, and partial hybridizations. The encoding problem for DNA computing thus consists of mapping the instances of an algorithmic problem in a systematic manner onto specific molecules so that: a) the chemical protocols avoid all these sources of error and b) the resulting products contain, with a high degree of reliability, enough molecules encoding the answers to the problem's instances to enable a successful extraction.
An optimal encoding would maximize the likelihood of desired hybridizations while minimizing the occurrence of undesirable hybridizations, and furthermore, would lead to equilibrium reaction conditions that are favorable for retrieving the solution of the problem in the extraction phase. Clearly, the encoding of a problem for a molecular solution has to be decided beforehand, by means presumably different from DNA computation.
Thus, in its full generality, we have the following algorithmic problem.
The function reflects a desirable quality criterion for the protocol. Solving the encoding problem requires identifying appropriate criteria that capture the relevant chemistry and, moreover, giving algorithms to produce good encodings that will satisfy constraints a) and b). The most natural and fitting criteria can be found in the thermodynamics that governs the hybridization and ligation processes. Ultimately, it comes down to the Gibbs free energy that nucleotides release during hybridization in passing to a lower energy state of a bound pair. The thermodynamics of hybridizations is fairly well known (see Wetmur for a survey of relevant facts, as well as SantaLucia et al.). The basic quantity is the melting temperature of a given double strand, which is defined as the temperature at which half of a homogeneous population of such double strands will have denatured into single strands. The controlling parameters of a melting temperature are strand composition, strand concentration, and various other solvent properties, such as the pH of the solution. Despite some fundamental work, this approach based on melting temperatures has not really produced a systematic and productive way to produce good encodings. Such encodings can actually be obtained through evolutionary searches, either in vitro or in silico, that utilize fitness functions based on one or some of these factors, or through the use of heuristics for special-purpose encodings. Finding appropriate general metrics in oligonucleotide space and practical solutions to the corresponding restriction of DNA ENCODING is an important problem for DNA-based computing. In general, even for a single good choice of quality criterion, the encoding problem as stated is very likely to be NP-complete, i.e., as difficult as the problem it is supposed to help solve, and so it would not admit general solutions. Relaxations of the problem need to be considered.
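As a concrete, if crude, illustration of a melting-temperature criterion, the classical Wallace rule of thumb estimates Tm for short oligos as 2 C per A/T base plus 4 C per G/C base. This is only a first-order screen, not the nearest-neighbor thermodynamic models (e.g., SantaLucia's) that serious encoding design would rely on.

```python
def wallace_tm(oligo):
    """Wallace rule-of-thumb melting temperature for short oligos:
    2 C per A or T, 4 C per G or C. A coarse first-order screen only;
    nearest-neighbor models are needed for real encoding design."""
    at = sum(oligo.count(b) for b in "AT")
    gc = sum(oligo.count(b) for b in "GC")
    return 2 * at + 4 * gc

# Screen candidate codewords for comparable melting behavior.
candidates = ["ACGTACGTACGT", "AATTAATTAATT", "GGCCGGCCGGCC"]
tms = {c: wallace_tm(c) for c in candidates}
```

Codewords with widely differing Tm values would denature at different temperatures during thermocycling, one simple reason a uniform composition constraint is often imposed on encodings.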
C) Error-Preventing Codes
It is conceivable that a more principled computational approach can produce solutions of the encoding problem that capture physico-chemical conditions well enough to be validated by lab experiments. Perhaps the best example is the combinatorial approach proposed by the molecular computing group in Memphis. The crux of the approach is to regard an experiment for a molecular computation as the transmission of a message from the protocol to the experimentalist through a noisy channel, namely the tube(s) in which the reactions take place. The theory of communication introduced by Shannon has found effective ways to handle this problem by introducing redundancy to protect against noise. The solutions are the so-called error-correcting codes for data transmission that information theorists have spent the last 50 years designing. The mathematical framework is the metric space of Boolean hypercubes with the standard binary Hamming metric. In the case of information encodings in biomolecules, one can easily generalize the Hamming distance to the four-letter alphabet {A, C, G, T} using Watson-Crick complementarity. This generalized Hamming metric gives some quantification of the hybridization likelihood of the molecules in the reaction. This possibility has been explored in several papers. The problem is that oligos at a large Hamming distance can still hybridize perfectly at the overlap after a shift, as in the case of two such strands lined up off-register. The physico-chemical reality of the tube makes it clear that the Hamming distance is not an adequate measure of hybridization likelihood except in very special circumstances. Nonetheless, frame shifts appear to be accountable for, at the expense of technical complications in the Hamming metric, by a generalization, the so-called h-metric, introduced by Garzon et al. This metric may capture enough of the reality of reaction conditions and the complexity of test-tube hybridizations to frame and solve the encoding problem appropriately. The h-measure is defined as the minimum of all Hamming distances obtained by successively shifting and lining up the WC-complement of y against x; the h-metric is defined for so-called poligos, namely equivalence classes of n-mers at an h-measure of zero from each other. (The h-measure is not, strictly speaking, a metric.)
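A simplified version of the h-measure can be computed directly. This sketch is an interpretation, not Garzon et al.'s exact definition (details such as how overhanging, uncovered positions are scored vary); here uncovered positions of x count as mismatches, so a value of 0 means y can bind x perfectly at some alignment.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def wc(s):
    """Watson-Crick (reverse) complement."""
    return "".join(COMPLEMENT[b] for b in reversed(s))

def h_measure(x, y):
    """Minimum, over all relative shifts, of the Hamming distance
    between x and the shifted WC-complement of y; uncovered positions
    of x count as mismatches (a simplifying assumption)."""
    yc = wc(y)
    n = len(x)
    best = n
    for shift in range(-(len(yc) - 1), n):
        lo, hi = max(0, shift), min(n, shift + len(yc))
        if hi <= lo:
            continue
        mism = sum(x[i] != yc[i - shift] for i in range(lo, hi))
        best = min(best, mism + (n - (hi - lo)))
    return best
```

Unlike the plain Hamming distance, this quantity stays small for strands that bind perfectly only after a frame shift, which is exactly the failure mode discussed above.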
D) Building and Programming Molecular Computers
For several reasons, the greatest engineering and technological challenge posed by molecular computing is perhaps the construction of
a molecular computer. In a molecular computer, one would expect to find the basic features that are evident in a conventional electronic computer in an integrated system, namely information storage, programmability, and information processing. Such features are obviously desirable, but whether they are actually realizable is not very clear. Early papers have suggested abstract architectures (a notable example is the sticker architecture of Roweis et al.), but as acknowledged by many authors, critical engineering challenges remain unresolved about the issues of reliability discussed earlier. It is now clear that such issues present the most important difficulties. The best effort to date is being conducted by the surface-computing research group at the University of Wisconsin-Madison.
The instruction set consists of three primitive operations: mark, unmark, and destroy. Successful implementation of these operations would permit, in principle, building a general-purpose molecular computer. Given the difficulties with implementing traditional algorithms in DNA and their potential for evolutionary-style computation, DNA computers apparently follow Michael Conrad's trade-off principle: "a computing system cannot at the same time have high programmability, high computational efficiency, and high evolutionary adaptability." He describes programmability as the ability to communicate programs to the computing system exactly, with a finite alphabet, in a finite number of steps. The efficiency of a computing system is defined as the ratio of the number of interactions in the system that are used for computing to the total number of interactions possible in the system, and evolutionary adaptability is defined as the ability of the system to change in response to uncertainty. It is clear that biomolecules offer, by the nature of their function, a good answer to adaptability. If Conrad's principle holds here, there is good evidence that molecular programming will be a great challenge.
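The mark/unmark/destroy instruction set can be rendered as a toy model on bit strings, with the tube as a list of candidate solutions. This is only an abstraction of the surface-chemistry operations (marking a strand there is a chemical tagging step, not a bit assignment), and the strands below are hypothetical.

```python
def mark(tube, bit):
    """Set the given bit on every strand in the tube (standing in for
    annealing a tag at that position)."""
    return [s[:bit] + "1" + s[bit + 1:] for s in tube]

def unmark(tube, bit):
    """Clear the given bit on every strand in the tube."""
    return [s[:bit] + "0" + s[bit + 1:] for s in tube]

def destroy(tube, bit, value):
    """Remove all strands whose bit has the given value."""
    return [s for s in tube if s[bit] != value]

# A tube holding all 2-bit candidate solutions plus one status bit.
tube = ["000", "010", "100", "110"]
tube = destroy(tube, 0, "0")   # keep only strands with first bit 1
tube = mark(tube, 2)           # flag the survivors
```

Chaining such filtering steps over a combinatorial library of strands is what would make the scheme a general-purpose (if brute-force) computer.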
E) Implementing Evolutionary Computation
Evolutionary computation is based on analogies of biological processes, implemented in electronics, to arrive at computer programs that sometimes outperform software designed by standard methodologies. The most common analogy used is natural selection, or survival of the fittest.
The various methodologies include genetic algorithms, genetic programming, evolution strategies, evolutionary programming, and immune systems. These algorithms use a generate-and-evaluate strategy: a population of possible solutions is maintained (usually generated at random); individuals are then selected from the population based upon their fitness, i.e., how well they satisfy an external constraint; the population is then updated by replacing less-fit individuals with combinations of hopefully fitter individuals through some variation operations such as crossover and mutation. The basic evolution program (EP) algorithm is shown in Fig. 4. Through successive generations, the fitness of individuals is improved, and better solutions are found that may converge to a good-enough solution. The key ingredients in an evolutionary algorithm are selection pressure (provided by the fitness function) and variation pressure (provided by the genetic operations). Variation guarantees a fairly thorough opportunity for each solution to access the population of solutions and thereby a chance to be evaluated; selection guarantees that evaluation does produce better and better solutions.
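The generate-and-evaluate loop just described can be sketched in a few lines. This is a minimal evolution program under stated assumptions (bit-string chromosomes, tournament selection, one-point crossover, per-bit mutation), not the specific EP of the figure.

```python
import random

def evolve(fitness, length=12, pop_size=20, generations=60,
           mut_rate=0.05, seed=1):
    """Minimal generate-and-evaluate loop: tournament selection,
    one-point crossover, per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p, q = tournament(), tournament()     # selection pressure
            cut = rng.randrange(1, length)        # crossover point
            child = p[:cut] + q[cut:]
            child = [bit ^ (rng.random() < mut_rate)  # mutation
                     for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax: fitness is simply the number of ones in the chromosome.
best = evolve(sum)
```

Selection (the tournament) and variation (crossover plus mutation) are exactly the two pressures named in the text; removing either stalls the search.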
F) Autonomy and Self-Assembly
It is folk knowledge now that human intervention is a bottleneck in molecular computing, i.e., it will be necessary to automate molecular protocols as much as possible. Fully automated protocols are usually referred to as "single-pot" protocols, after Winfree. These concerns have been addressed in one way or another in several works, particularly Winfree's self-assembling reactions for tilings, fault tolerance in error-preventing codes, and self-control of nondeterminism, molecule formation, and reaction efficiency. Jonoska and Karl show how many computations can be simplified by constructing appropriate graphs in DNA molecules. Hagiya has further reiterated the importance of self-controlled and autonomous protocols that would eliminate human intervention and so reveal more about the true power of molecular computing. Garzon et al. provide a self-assembly protocol for a family of graphs, the Cayley graphs of so-called automatic groups, that exploits the symmetry of the graphs and good encodings to make self-assembly possible by the type of thermocycling effective in whiplash PCR computations. Given the increasing importance of reliability for molecular programming, self-assembly and self-regulation are important tools to achieve a solution to the autonomy problem of molecular computers.
Applications
While the development of DNA computational methods may have many directly applicable applications, the biggest contribution of research in this area may be much more fundamental and will likely fuel many indirect benefits. In many papers, it is stressed that high levels of collaboration between academic disciplines will be essential to effect progress in DNA computing. Such collaboration may very well lead to the development of a DNA computer with practical advantages over a conventional computer, but it has an even greater likelihood of contributing to an increased understanding of DNA and other biological mechanisms. The need for additional precision could effect progress in biomolecular techniques by placing demands on biochemists and their tools that might not otherwise be considered.
A particular area within the natural and applied sciences that may benefit from advances in DNA computation is combinatorial chemistry. Combinatorial chemistry involves the construction of enzymes, sequences of RNA, and other molecules, particularly for use in biomolecular engineering or medicine. Adleman describes this process as being similar to "classic" models of DNA computation, as combinatorial chemistry involves generating large sets of random RNA sequences and searching for molecules with the desired properties. Advances in either area could easily benefit the other field, or even pave a way to combining the two fields, producing both products and related computational results in parallel.
Several papers also extend the use of biomolecular computing into applications in the emerging science of nanotechnology, specifically nanofabrication, making use of both the small-scale computational abilities of DNA and the manufacturing abilities of RNA. Since both fields are still very embryonic, the practical or even experimental implementation of this use is still highly speculative but promising.
Applying the techniques of DNA