
Biomolecular Computing

Molecular computing is a discipline that aims at harnessing individual molecules at nanoscales for computational purposes. The best-studied molecules for this purpose to date have been DNA and bacteriorhodopsin. Biomolecular computing allows one to realistically entertain, for the first time in history, the possibility of exploiting the massive parallelism at nanoscales inherent in natural phenomena to solve computational problems. The implementation of evolutionary algorithms in biomolecules would bring full circle the biological analogy and present an attractive alternative to meet large demands for computational power.

INTRODUCTION

The notion of harnessing individual molecules at nanoscales for computational purposes is an idea that can be traced back at least to the time when electronic computers were being constructed in the 1940s. Electrons are, in fact, orders of magnitude smaller than molecules, but vast numbers of them are required just to communicate a carriage return to a conventional processor. The idea of improving the efficiency of hardware utilization using biomolecules is attractive for several reasons. First, the hardware is inherently parallel, and parallelism is a good way to handle computational bottlenecks. Second, biomolecules occur abundantly in nature, for example, inside all known living cells with (eukaryotes) and without (prokaryotes) nuclei, and they constitute the basic substratum of life. Consequently, they have developed a structure that enables them to solve a number of difficulties for parallel computing, such as massive communication over noisy media and load-balancing problems, by mechanisms that we may not even be aware of. Furthermore, short biomolecules can now be synthesized at low cost.

THE ORIGINS OF MOLECULAR COMPUTING

Lately, advances in computer science have been characterized by the computational implementation of well-established biological paradigms. Notable advances are artificial neural nets, inspired by the brain and its obvious connection to natural intelligence, and evolutionary computation, inspired by the Darwinian paradigm of natural selection. Early ideas of molecular computing attempted to emulate conventional electronic implementations in other media, e.g., implementing Boolean gates in a variety of ways. A fundamental breakthrough characteristic of a new era was made by Adleman's 1994 paper, where he reports an experiment performed with molecules of fundamental importance for life, DNA (deoxyribonucleic acid) molecules, to solve a computational problem known to be difficult for ordinary computers, namely the Hamiltonian path problem (HPP). This problem is typical of an elite set of problems in the well-known complexity class NP that exemplify the computational difficulty of the search procedures that plague a number of very important applications in combinatorial optimization, operations research, and numerical computation. Adleman's experiment ushered in a new computational paradigm in molecular computing for several reasons. First, it showed that it is indeed possible to orchestrate individual molecules to perform computational tasks. Second, it showed the enormous potential of DNA molecules for solving problems beyond the reach of conventional computers that have been or may be developed in the future based on solid-state electronics. Shortly after, in 1995, the first conference on DNA-based computing was organized at Princeton University, and several events have been held annually since.

A) Adleman's Landmark Experiment

In this section, we present the essential technical details of Adleman's experiment. The HPP is defined precisely as follows.

Instance: a directed graph and two vertices, source and destination;

Question: yes/no, is there a path that follows arcs in the graph, connects the source to the destination vertex, and passes through every other vertex exactly once?
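Part of what makes HPP a good benchmark is that a proposed path is cheap to check even though one is hard to find. A minimal sketch of such a check, in ordinary software and on a hypothetical toy graph (the instance and names below are illustrative only, not Adleman's), might look as follows.

    def is_hamiltonian_path(arcs, vertices, path, source, destination):
        """Check a candidate answer to HPP: the path must start at the source,
        end at the destination, follow arcs, and visit every vertex exactly once."""
        if not path or path[0] != source or path[-1] != destination:
            return False
        if sorted(path) != sorted(vertices):          # every vertex exactly once
            return False
        return all((u, v) in arcs for u, v in zip(path, path[1:]))

    # Hypothetical 4-vertex instance (illustrative only).
    vertices = [0, 1, 2, 3]
    arcs = {(0, 1), (1, 2), (2, 3), (0, 2), (3, 1)}
    print(is_hamiltonian_path(arcs, vertices, [0, 1, 2, 3], 0, 3))   # True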

As mentioned before, this problem is NP-complete, i.e., it is representative of many of the difficulties that afflict conventional computers in solving very important problems in combinatorial optimization and operations research. Each complete problem in NP contains all problems in the class NP as special cases after some rewording and is characterized by the fact that its solutions are easily verifiable, but extremely difficult to find in a reasonable amount of search time. The best-known general techniques to apply to these problems amount essentially to an exhaustive search through all possible solutions, looking for satisfaction of the constraints required by the problem. It is therefore an ideal candidate for a brand new computational approach using molecules.

Adleman's brilliant insight was to carefully arrange a set of DNA molecules so that the chemistry that they naturally follow would perform the brunt of the computational process. The key operations in this chemistry are sticking operations that allow the basic nucleotides of nucleic acids to form larger structures through the processes of ligation and hybridization (more below in Section III-A). The first DNA-based molecular computation is summarized in Fig. 1. Specifically, Adleman assigned (well-chosen) unique single-stranded molecules to represent the vertices, used Watson-Crick complements of the corresponding halves to represent edges joining two vertices, and synthesized a picomol of each of the


SOME SUCCESS STORIES

In this section, we give a brief description of some of the problems for which molecular protocols have been or are being implemented successfully. Each story that follows either illustrates a basic technique in molecular computing or has successfully marked definite progress in the lab. Before proceeding further, however, we need to give a more precise description of the molecular biological background, as well as a characterization of the basic methodology employed in molecular computing.

    A) "arallel O#erlap Assembl$

Perhaps the foremost advantage of computing with molecules is the ready parallelism in which molecular operations take place. The best way to exploit this parallelism is to perform the same operation simultaneously on many molecules. Adleman's basic technique can be characterized as a generate-and-filter technique, i.e., generate all possible paths and filter out those that are not Hamiltonian. In this approach, one must be sure to generate all the possible solutions to the problem, akin to making sure that the data structure for a chromosome captures all possible solutions in an evolutionary algorithm. Many protocols in molecular computing exploit this method, for example Boolean formula evaluation (see the next section for some references).

It is therefore important to be able to generate all potential solutions to a problem. A procedure called parallel overlap assembly has been used in molecular biology for gene reconstruction and DNA shuffling. It has been successfully used by Ouyang et al. in a lab experiment to solve an instance of another NP-complete graph problem, MAX-CLIQUE. The procedure consists of iterations of thermal cycles that anneal given shorter DNA segments in random ways to produce larger molecules representing potential solutions. Related procedures have been used to improve solutions to HPP by Arita et al.
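To make the generate-and-filter idea concrete, the following sketch emulates it in silico: it blindly generates every ordering of the vertices (a crude stand-in for the random ligation of path molecules in the tube) and then filters out those that are not Hamiltonian paths. The graph and names are hypothetical, and in the tube both phases happen massively in parallel rather than one candidate at a time.

    from itertools import permutations

    def generate_and_filter(arcs, vertices, source, destination):
        """Brute-force analogue of the generate-and-filter technique: generate
        all candidate orderings of the vertices, keep those that follow the arcs."""
        survivors = []
        for path in permutations(vertices):                  # "generate" phase
            if path[0] != source or path[-1] != destination:
                continue                                     # wrong endpoints
            if all((u, v) in arcs for u, v in zip(path, path[1:])):
                survivors.append(path)                       # "filter" phase
        return survivors

    # Hypothetical instance: a Hamiltonian path exists iff survivors is nonempty.
    arcs = {(0, 1), (1, 2), (2, 3), (0, 2), (3, 1)}
    print(generate_and_filter(arcs, [0, 1, 2, 3], 0, 3))     # [(0, 1, 2, 3)]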

B) Boolean-Circuit Evaluation

Another important approach in molecular computing is the implementation of Boolean circuits, because that would allow importing into the world of molecules the vast progress that has been made on information processing in electronic computers. A successful implementation of Boolean circuits would lead to the construction of ordinary computers in biomolecules, particularly of parallel computers. Lipton presented an early proposal for Boolean circuit evaluation as a solution to SAT (Boolean formula satisfiability) and thereby to problems in the class NP. Ogihara and Ray have suggested protocols for implementing Boolean circuits that run in an amount of time that is proportional to the size of the circuit. Amos et al. improved the implementation to have a run time that is proportional to the depth of the circuit. In the protocol suggested by the latter, for example, input bits are represented by well-chosen l-mers x and y that are present in the tube if and only if the corresponding Boolean variables have value one. The gates are represented by l-mers that contain segments that are complementary to the input molecules and the output of the gate. (Without loss of generality, one can assume all gates are simply NAND gates, since this operator is logically complete.) A typical evaluation of a NAND is made by placing in a tube the values of the inputs equal to one and allowing the formation of a double strand that represents the evaluation, as illustrated in the figure.
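A rough software analogue of the depth-proportional evaluation can be sketched as follows, under stated assumptions: every gate is a NAND, the circuit is supplied as a list of levels, and all gates in a level are evaluated together in one pass, mirroring one tube step per unit of circuit depth. The data structures and names are illustrative and are not the encodings of Ogihara and Ray or of Amos et al.

    def nand(a, b):
        """NAND is logically complete: any Boolean gate can be built from it."""
        return 0 if (a and b) else 1

    def evaluate_levels(inputs, levels):
        """Evaluate a NAND circuit level by level; 'levels' is a list of layers,
        each layer a dict mapping an output wire to its two input wires.
        One pass per layer mimics one molecular step per unit of circuit depth."""
        wires = dict(inputs)                      # wire name -> 0/1
        for layer in levels:                      # depth-many sequential steps
            new = {out: nand(wires[a], wires[b]) for out, (a, b) in layer.items()}
            wires.update(new)                     # all gates in a layer "in parallel"
        return wires

    # Hypothetical circuit computing x AND y as NAND(NAND(x, y), NAND(x, y)).
    inputs = {"x": 1, "y": 1}
    levels = [{"n1": ("x", "y")}, {"out": ("n1", "n1")}]
    print(evaluate_levels(inputs, levels)["out"])   # 1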


D) Finite-State and Turing Machine Implementation

There were other simultaneous attempts to implement state machines, particularly finite-state machines (FSMs). Garzon et al. suggested an implementation of nondeterministic finite-state machines that are self-controlled and fault tolerant. The figure shows an FSM where various moves are possible from a particular state 0 on the same input zero. Nondeterministic computation is at the core of the difficulties that Adleman's original experiment was designed to overcome. On the other hand, nondeterminism is supposed to be well understood in the context of finite-state machines, because a so-called subset construction produces a deterministic equivalent of a given nondeterministic FSM. It is conceivable that greater insight about the virtues of molecular computing may be gained by looking for ways to implement nondeterminism as a native mode of computation in a fault-tolerant and efficient way.

The implementation requires a dynamic molecule to represent the changing states of the FSM that is capable of detecting its inputs in its environment. It can be a double-stranded molecule containing a segment encoding the current state and another segment encoding the last symbol read that led to the current state. Other molecules representing inputs are added with appropriate overhangs which, upon hybridization, create restriction sites that allow cleaving with appropriate enzymes to detach the old state and create a new molecule that reflects the new state. Nondeterministic transitions occur because of the various possibilities in the hybridization process. If run uncontrolled, the protocol will soon produce too many copies of the finite control in the same state and thereby thwart the efficiency of the computation. The key to the success of the subset construction in determinizing an FSM is that whenever two nondeterministic copies of the machine find themselves in the same state, one can safely discard one of them, since their runs will be identical thereafter. It is desirable to have a protocol that renders the implementation efficient in the tube, i.e., one that will self-regulate to produce approximately equal concentrations of the molecules representing the various states.
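For reference, the subset construction mentioned above is easy to state in code. The sketch below is a standard determinization of a nondeterministic FSM (without epsilon moves); it models the textbook construction, not the molecular protocol, and the example machine is hypothetical.

    from itertools import chain

    def subset_construction(alphabet, delta, start):
        """Determinize an NFA: each deterministic state is a frozenset of NFA
        states; 'delta' maps (state, symbol) -> set of possible next states.
        (A subset containing an accepting NFA state would be accepting.)"""
        start_set = frozenset([start])
        dfa_delta, seen, todo = {}, {start_set}, [start_set]
        while todo:
            current = todo.pop()
            for symbol in alphabet:
                nxt = frozenset(chain.from_iterable(
                    delta.get((q, symbol), set()) for q in current))
                dfa_delta[(current, symbol)] = nxt
                if nxt not in seen:          # copies in the same state merge:
                    seen.add(nxt)            # one subset per reachable state set
                    todo.append(nxt)
        return seen, dfa_delta

    # Hypothetical NFA with two possible moves from state 0 on input '0'.
    nfa_delta = {(0, '0'): {0, 1}, (1, '1'): {2}}
    dfa_states, _ = subset_construction(['0', '1'], nfa_delta, 0)
    print(sorted(map(sorted, dfa_states)))   # [[], [0], [0, 1], [2]]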

E) Cellular Automata Runs

A new direction originated in Winfree's attempts to show that abstract tilings, used earlier to show computational universality, can actually be implemented in the lab. He has used the XOR-rule in a two-dimensional

been created. Naturally, other devices were required to detect these tiny structures at the nanoscales in which they exist, namely atomic-force microscopy.
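For orientation, the XOR rule realized by such tile assemblies can be simulated in a few lines of ordinary code: each new cell is the exclusive-or of its two neighbors in the previous row, which is what a layer of self-assembled XOR tiles computes. The sketch below is purely an in-silico illustration; the row width, seeding, and boundary handling are arbitrary choices here, not Winfree's experimental setup.

    def xor_rule_rows(first_row, steps):
        """Iterate the XOR rule: each cell becomes the XOR of its left and right
        neighbors in the previous row (cells beyond the boundary count as 0)."""
        rows = [list(first_row)]
        for _ in range(steps):
            padded = [0] + rows[-1] + [0]
            rows.append([padded[i - 1] ^ padded[i + 1]
                         for i in range(1, len(padded) - 1)])
        return rows

    # A single seeded cell grows into the familiar Sierpinski-triangle pattern.
    for row in xor_rule_rows([0, 0, 0, 1, 0, 0, 0], 4):
        print("".join("#" if c else "." for c in row))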

F) Other Applications: Is There a "Killer Application"?

Molecular computing has generally aimed, so far, at solving the same ordinary algorithmic problems that are commonly posed for conventional VLSI-based computers, albeit by an entirely different type of operational process. None of them has exhibited the kind of practical success that would be considered satisfactory to answer Lipton's impromptu call for a "killer app" at the second DNA-based workshop in Princeton. Such an application would suit well the nature of biomolecules, beat current and perhaps even future solid-state electronics, and would establish beyond the shadow of a doubt the power of the new computational paradigm. Landweber et al. have proposed that DNA sequencing, DNA fingerprinting, and/or DNA population screening are good candidates. They would require a new approach in the way we conceive and do molecular computation now.

Specifically, thus far a practitioner is assumed to know the composition (a digital string) of the molecules that initially encode information and of their subsequent transformations in the tube. This methodology requires going back and forth between the digital and analog DNA worlds, by sequencing (when compositions are unknown), which is an expensive step, and by the converse step, synthesis of DNA. This approach bypasses digitizing by operating directly on unknown segments x of DNA, using known molecules, in order to compute a predetermined function f(x) that specifies the computational task. The fact that the first five years of work in the field have not, however, produced one such killer application would make some people think that perhaps fundamental scientific and/or technological difficulties have to be overcome before one effectively appears on the scene. These proposals can thus be regarded as challenges, rather than established results, and will be discussed in the following section.

GRAND CHALLENGES FOR MOLECULAR COMPUTING

A) Reliability, Efficiency, and Scalability

Reliability, efficiency, and scalability are perhaps the three most burning issues for molecular computing. The reliability of a protocol, i.e., a DNA computation, is the degree of confidence with which a lab experiment provides a true answer to the given problem. The efficiency of the protocol refers to the intended and effective use of the molecules that intervene in it. The scalability of a lab experiment is the effective reproducibility of the experiment with longer molecules that can encode larger problem instances, while still obtaining equally reliable results under comparable efficiency. These three are distinct but clearly interrelated problems.

Biologists have not really faced these problems in their work because, in that field, the definition of success is different from the one in computer science. (When a biologist claims that she has cloned an organism, for example, the contention is that one experiment was successful, regardless of how many were previously not, or whether only one clone was actually produced.) Research on these problems in molecular computing has just begun. Most work has concentrated on reliability, and we proceed to sketch it in the guise of a more basic and important problem: the encoding problem. This is a good example in which molecular computing will probably have a feedback effect on the notions of efficiency and scalability in biology.

B) The Encoding Problem

Once the encoding molecules for the input of a problem have been chosen, a molecular computer scientist is at the mercy of the chemistry, even though she may still have some control over the protocols that she may perform with them in the laboratory execution. If the encodings are prone to errors, the experiment can be repeated any number of times and will always provide the same (erroneous) results, as has been evidenced in the literature. This fact lessens the effectiveness of the standard method of increasing the reliability of a probabilistic computation with a nonzero probability of error by iteration. A different analysis of the problem was initiated by Baum, where it is assumed that undesirable errors will occur only if repetitions, or complementary substrands of a certain minimum sticking length k, appeared in the encoding. The problem is that the uncertain nature of hybridizations may plague the separators that are used to prevent the problem, so a more thorough approach appears to be necessary. A mismatched hybridization is a bound pair of oligonucleotides that contains at least one mismatched pair. In addition to frame-shift errors, in which the n-mers are shifted relative to each other, mismatches leading to false positives include hairpin mismatches, bulges, and partial hybridizations. The encoding problem for DNA computing thus consists of mapping the instances of an algorithmic problem in a systematic manner onto specific molecules so that: a) the chemical protocols avoid all these sources of error, and b) the resulting products contain, with a high degree of reliability, enough molecules encoding the answers to the problem's instances to enable a successful extraction.

An optimal encoding would maximize the likelihood of desired hybridizations while minimizing the occurrence of undesirable hybridizations and, furthermore, would lead to equilibrium reaction conditions that are favorable for retrieving the solution of the problem in the extraction phase. Clearly, the encoding of a problem for a molecular solution has to be decided beforehand, by means presumably different from DNA computation. Thus, in its full generality, we have the following algorithmic problem.


The function reflects a desirable quality criterion for the protocol and can be given by a suitable mapping. Solving the encoding problem requires identifying appropriate criteria that capture the relevant chemistry and, moreover, giving algorithms to produce good encodings that will satisfy constraints a) and b). The most natural and fitting criteria can be found in the thermodynamics that governs the hybridization and ligation processes. Ultimately, it comes down to the Gibbs free energy that nucleotides release during hybridization in passing to the lower energy state of a bound pair. The thermodynamics of hybridization is fairly well known (see Wetmur for a survey of relevant facts, as well as SantaLucia et al.). The basic quantity is the melting temperature of a given double strand, which is defined as the temperature at which half of a homogeneous population of such double strands will have denatured into single strands. The controlling parameters of a melting temperature are strand composition, strand concentration, and various other solvent properties, such as the pH of the solution. Despite some fundamental work, this approach based on melting temperatures has not really produced a systematic and productive way to generate good encodings. Such encodings can actually be obtained through evolutionary searches, either in vitro or in silico, that utilize fitness functions based on one or some of these factors, or through the use of heuristics for special-purpose encodings. Finding appropriate general metrics in oligonucleotide space, and practical solutions to the corresponding restriction of DNA ENCODING, is an important problem for DNA-based computing. In general, even for a single good choice of quality criterion, the encoding problem as stated is very likely to be NP-complete, i.e., as difficult as the problem it is supposed to help solve, and so it would not admit general solutions. Relaxations of the problem need to be considered.
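As one illustration of the kind of heuristic or evolutionary search just mentioned, the sketch below greedily assembles a set of n-mer code words, accepting a random candidate only if it stays sufficiently far, in Hamming distance, from the words already accepted and from their Watson-Crick complements. The word length, threshold, and acceptance test are toy assumptions; a realistic encoding search would score candidates with thermodynamic criteria such as melting temperature or free energy.

    import random

    COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def wc_complement(s):
        """Watson-Crick complement, read in the reverse orientation."""
        return "".join(COMPLEMENT[b] for b in reversed(s))

    def hamming(s, t):
        return sum(a != b for a, b in zip(s, t))

    def greedy_code(n_words, length, min_dist, seed=0):
        """Greedily collect words that keep a minimum Hamming distance to all
        accepted words and to their WC complements (a crude anti-crosshybridization
        filter, standing in for a thermodynamic fitness function)."""
        rng = random.Random(seed)
        words = []
        while len(words) < n_words:
            cand = "".join(rng.choice("ACGT") for _ in range(length))
            ok = all(hamming(cand, w) >= min_dist and
                     hamming(cand, wc_complement(w)) >= min_dist
                     for w in words)
            if ok and hamming(cand, wc_complement(cand)) >= min_dist:
                words.append(cand)
        return words

    print(greedy_code(n_words=5, length=12, min_dist=4))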

    C3 Error%"re#entin! Codes

It is conceivable that a more principled computational approach can produce solutions of the encoding problem that capture physico-chemical conditions well enough to be validated by lab experiments. Perhaps the best example is the combinatorial approach proposed by the molecular computing group in Memphis. The crux of the approach is to regard an experiment for a molecular computation as the transmission of a message from the protocol to the experimentalist through a noisy channel, namely the tube(s) in which the reactions take place. The theory of communication introduced by Shannon has found effective ways to handle this problem by introducing redundancy to protect against noise. The solutions are the so-called error-correcting codes for data transmission that information theorists have spent the last 50 years designing. The mathematical framework is the metric space of Boolean hypercubes with the standard binary Hamming metric. In the case of information encodings in biomolecules, one can easily generalize the Hamming distance to the four-letter alphabet A, C, G, T using Watson-Crick complementarity. This generalized Hamming metric gives some quantification of the hybridization likelihood of the molecules in the reaction. This possibility has been explored in several papers. The problem is that oligos at a large Hamming distance can still hybridize perfectly at the overlap after a shift. The physico-chemical reality of the tube makes it clear that the Hamming distance is not an adequate measure of hybridization likelihood except in very special circumstances. Nonetheless, frame shifts appear to be accountable for, at the expense of technical complications in the Hamming metric, by a generalization, the so-called h-metric, introduced by Garzon et al. This metric may capture enough of the reality of reaction conditions and of the complexity of test-tube hybridizations to frame and solve the encoding problem appropriately. The h-measure is defined as the minimum of all Hamming distances obtained by successively shifting and lining up the WC-complement of y against x; the h-metric is defined for so-called poligos, namely equivalence classes of n-mers at an h-measure of zero from each other. (The h-measure is not, strictly speaking, a metric.)
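Read literally, the definition above can be coded in a few lines: slide the WC-complement of y along x and take the smallest mismatch count over all alignments. One detail the text leaves open is how unpaired overhangs are counted; the sketch below counts them as mismatches, an assumption made here for concreteness rather than the precise formulation of Garzon et al.

    COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def wc_complement(s):
        """Watson-Crick complement, read in the reverse orientation."""
        return "".join(COMPLEMENT[b] for b in reversed(s))

    def h_measure(x, y):
        """Minimum over all shifts of the mismatch count between x and the
        WC-complement of y; unpaired overhang positions count as mismatches
        (one concrete convention among several possible ones)."""
        yc = wc_complement(y)
        n, m = len(x), len(yc)
        best = n + m
        for s in range(-(m - 1), n):            # offset of yc's start along x
            lo, hi = min(0, s), max(n, s + m)   # span covered by either strand
            mismatches = 0
            for p in range(lo, hi):
                a = x[p] if 0 <= p < n else None
                b = yc[p - s] if 0 <= p - s < m else None
                if a is None or b is None or a != b:
                    mismatches += 1
            best = min(best, mismatches)
        return best

    # These two 8-mers look unrelated under the plain, unshifted Hamming
    # distance, yet a one-base shift pairs them almost perfectly.
    print(h_measure("ATATATAT", "TATATATA"))    # 2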

D) Building and Programming Molecular Computers

For several reasons, the greatest engineering and technological challenge posed by molecular computing is perhaps the construction of a molecular computer. In a molecular computer, one would expect to find, in an integrated system, the basic features that are evident in a conventional electronic computer, namely information storage, programmability, and information processing. Such features are obviously desirable, but whether they are actually realizable is not very clear. Early papers have suggested abstract architectures (a notable example is the sticker architecture of Roweis et al.), but, as acknowledged by many authors, critical engineering challenges remain unresolved around the issues of reliability discussed earlier. It is now clear that such issues present the most important difficulties. The best effort to date is being conducted by the surface-computing research group at the University of Wisconsin-Madison.


The instruction set consists of three primitive operations: mark, unmark, and destroy. Successful implementation of these operations would permit, in principle, building a general-purpose molecular computer.
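To convey what such an instruction set can do, the sketch below simulates mark, unmark, and destroy over a pool of surface-bound candidate strands and uses them to filter the satisfying assignments of a small Boolean formula clause by clause, in the style reported for surface-based satisfiability experiments. The data representation, clause format, and example formula are illustrative assumptions, not the group's actual surface chemistry.

    from itertools import product

    def solve_sat_on_surface(num_vars, clauses):
        """Filter assignments with the three primitives of a surface protocol:
        mark the strands satisfying the current clause, destroy the unmarked
        ones, then unmark everything before the next clause."""
        # Each "strand" encodes one assignment of the variables (True/False).
        surface = {bits: False for bits in product([False, True], repeat=num_vars)}

        for clause in clauses:                    # clause: list of (index, wanted)
            # MARK: flag every strand whose assignment satisfies the clause.
            for bits in surface:
                surface[bits] = any(bits[i] == wanted for i, wanted in clause)
            # DESTROY: remove all unmarked strands from the surface.
            surface = {b: m for b, m in surface.items() if m}
            # UNMARK: clear the marks for the next round.
            surface = {b: False for b in surface}

        return list(surface)                      # surviving assignments

    # (x1 or x2) and (not x1 or x2)  ->  any assignment with x2 = True survives.
    clauses = [[(0, True), (1, True)], [(0, False), (1, True)]]
    print(solve_sat_on_surface(2, clauses))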

Given the difficulties with implementing traditional algorithms in DNA, and their potential for evolutionary-style computation, DNA computers apparently follow Michael Conrad's trade-off principle: "a computing system cannot at the same time have high programmability, high computational efficiency, and high evolutionary adaptability." He describes programmability as the ability to communicate programs to the computing system exactly, with a finite alphabet, in a finite number of steps. The efficiency of a computing system is defined as the ratio of the number of interactions in the system that are used for computing to the total number of interactions possible in the system, and the evolutionary adaptability is defined as the ability of the system to change in response to uncertainty. It is clear that biomolecules offer, by the nature of their function, a good answer to adaptability. If Conrad's principle holds here, there is good evidence that molecular programming will be a great challenge.

E) Implementing Evolutionary Computation

Evolutionary computation is based on analogies of biological processes, implemented in electronics, to arrive at computer programs that sometimes outperform software designed by standard methodologies. The most common analogy used is natural selection, or survival of the fittest.

The various methodologies include genetic algorithms, genetic programming, evolution strategies, evolutionary programming, and immune systems. These algorithms use a generate-and-evaluate strategy: a population of possible solutions is maintained (usually generated at random); individuals are then selected from the population based upon their fitness, i.e., how well they satisfy an external constraint; the population is then updated by replacing less-fit individuals with combinations of (hopefully) fitter individuals obtained through variation operations such as crossover and mutation. The basic evolution program (EP) algorithm is shown in Fig. 4. Through successive generations, the fitness of individuals is improved, and better solutions are found that may converge to a good-enough solution. The key ingredients in an evolutionary algorithm are selection pressure (provided by the fitness function) and variation pressure (provided by the genetic operations). Variation guarantees a fairly thorough opportunity for each solution to enter the population of solutions and thereby a chance to be evaluated; selection guarantees that evaluation does produce better and better solutions.
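The generate-and-evaluate loop just described fits in a few lines of conventional code. The sketch below is a minimal generational algorithm with tournament selection, one-point crossover, and bit-flip mutation, maximizing a toy fitness function; the operators, parameters, and fitness are illustrative choices, not the EP of Fig. 4.

    import random

    def evolve(fitness, length=20, pop_size=30, generations=50, p_mut=0.05, seed=1):
        """Minimal evolutionary loop: selection pressure comes from tournament
        selection on the fitness; variation pressure from crossover and mutation."""
        rng = random.Random(seed)
        pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        for _ in range(generations):
            new_pop = []
            while len(new_pop) < pop_size:
                p1, p2 = tournament(), tournament()
                cut = rng.randrange(1, length)                       # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [b ^ (rng.random() < p_mut) for b in child]  # bit-flip mutation
                new_pop.append(child)
            pop = new_pop
        return max(pop, key=fitness)

    # Toy fitness: count of ones ("OneMax"); the loop should approach all ones.
    best = evolve(fitness=sum)
    print(best, sum(best))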

F) Autonomy and Self-Assembly

It is folk knowledge by now that human intervention is a bottleneck in molecular computing, i.e., it will be necessary to automate molecular protocols as much as possible. These are usually referred to as "single-pot" protocols, after Winfree. These concerns have been addressed in one way or another in several works, particularly Winfree's self-assembling reactions for tilings, fault tolerance in error-preventing codes, and self-control of nondeterminism, molecule formation, and reaction efficiency. Jonoska and Karl show how many computations can be simplified by constructing appropriate graphs in DNA molecules. Hagiya has further iterated the importance of self-controlled and autonomous protocols that would eliminate human intervention and so reveal more about the true power of molecular computing. Garzon et al. provide a self-assembly protocol for a family of graphs, the Cayley graphs of so-called automatic groups, that exploits the symmetry of the graphs and good encodings to make self-assembly possible by the type of thermocycling effective in whiplash PCR computations. Given the increasing importance of reliability for molecular programming, self-assembly and self-regulation are important tools for achieving a solution to the autonomy problem of molecular computers.

Applications

While the development of DNA computational methods may have many direct applications, the biggest contribution of research in this area may be much more fundamental and will likely fuel many indirect benefits. In many papers, it is stressed that high levels of collaboration between academic disciplines will be essential to effect progress in DNA computing. Such collaboration may very well lead to the development of a DNA computer with practical advantages over a conventional computer, but it has an even greater likelihood of contributing to an increased understanding of DNA and other biological mechanisms. The need for additional precision could drive progress in biomolecular techniques by placing demands on biochemists and their tools that might not otherwise be considered.


A particular area within the natural and applied sciences that may benefit from advances in DNA computation is combinatorial chemistry. Combinatorial chemistry involves the construction of enzymes, sequences of RNA, and other molecules, particularly for use in biomolecular engineering or medicine. Adleman describes this process as being similar to "classic" models of DNA computation, as combinatorial chemistry involves generating large sets of random RNA sequences and searching them for molecules with the desired properties. Advances in either area could easily benefit the other field, or even pave the way to combining the two fields, producing both products and related computational results in parallel.

Several papers also extend the use of biomolecular computing into applications in the emerging science of nanotechnology, specifically nano-fabrication, making use of both the small-scale computational abilities of DNA and the manufacturing abilities of RNA. Since both fields are still very embryonic, the practical or even experimental implementation of this use is still highly speculative but promising.

Applying the techniques of DNA