the rafiki map

Upload: vamakeshi

Post on 03-Apr-2018


  • 7/28/2019 The Rafiki Map

    1/72

    Introduction to Rafiki's quest to break the genetic code.

    Quotes to set the rebel tone

    Although I am fully convinced of the truth of the views given in this volume, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed during a long course of years, from a point of view directly opposite mine. (B)ut I look with confidence to the future, to young and rising naturalists, who will be able to view both sides of the question with impartiality.

    Charles Darwin

    Origin of Species

    how an individual invents a new way of giving order to data now all assembled must here remain inscrutable and may be permanently so ... Almost always the men who achieve these fundamental inventions of new paradigm have either been very young or very new to the field whose paradigm they change ... (they) are particularly likely to see that those rules no longer define a playable game and to conceive another set that can replace them.

    Thomas Kuhn

    The Structure of Scientific Revolutions

    The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'

    Isaac Asimov

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.

    George Bernard Shaw

    If we were still listening to scientists we'd still be on the ground.

    The Wright Brothers

    One might imagine from these quotes that Rafiki intends to champion new ideas, and indeed this is the case.

    To be frank, our mission started simply enough: we wanted to sell toys. Mea culpa. The idea that a geometric puzzle could be useful in the study of genetic translation was pure marketing - just this side of a hoax. Then the Rafiki map was discovered and the attitude quickly changed. Rafiki became intensely determined, obsessed if you will, with the task of learning the fundamentals, and then the gestalt, of the genetic code.

    Science is today without a viable paradigm to explain the origin, meaning and general function of genetic information. Our view of genetic information and translation has become little more than faith-based philosophical doctrine. The perplexities and counter-instances to the cherished dogma are multiplying like Fibonacci's famous rabbits. Therefore, this is an area of science desperately in need of a paradigm shift. Come hell or high water, Rafiki intends to see that shift take place.

    Rafiki has come full circle. Instead of using the genetic code to sell toys, we intend to use toys to sell a new paradigm of the genetic code.

    Hundreds of pages of text and twice as many illustrations have been generated in a determined effort to explore and communicate these eccentric, heretical ideas. I have no illusions about my abilities as a writer, scientist or persuader of scientific paradigms. In fact, I know that I am marginal in all of these areas. My language is a tragically imprecise hodge-podge of non-scientific colloquialisms. Much of my thinking is under-developed and confused, and some of the writing, especially the earliest writing, veers dangerously toward gibberish, but there's got to be a pony in here somewhere. Besides, all of it is just good, clean fun, and making it available might provide someone somewhere the insightful spark they need to pick up the ball and run.

    The bottom line: the dogma doesn't make any sense to me, and I've yet to find any evidence to confirm the cherished and ritualistically protected dogma. What started as a lark is picking up momentum and credibility, and it is on its way toward full-scale scientific jihad. The Rafiki notions, although blatantly heretical, make good sense from most angles, but these ideas consistently draw immediate disdain. Granted, a big part of this reaction is due to the eccentricities of the messenger as well as the outrageousness of the message, but the evidence just isn't there to reject these wacky ideas out-of-hand. In fact, existing evidence appears to support the heresy. So the beatings will continue until morale improves, and the crusade will march forward until the empiric death knell is sounded. Science is being put on notice: the dogma is being put back into play. Definitive empiric evidence is now finally being requested.

    This is a quest, a search rather than a destination. It's a search worth making, and it's time that somebody formally recognize the need to make it. The genetic code is an unsolved puzzle and a puzzle worth solving. It's fascinating and fun to boot!

    I am an inventor. I am finding out that inventors and scientists are two different kinds of animal. They typically are at odds - understandably - but they each depend on the other. Inventors piggyback on the good work of scientists, and scientists advance through the fruits of invention. There is a dichotomy to the standard mindsets, however. To paraphrase a popular phrase, inventors are from Mars and scientists are from Venus. Scientists spend entire careers adroitly learning the nuances of dogma, and then they struggle in the good fight to cleverly apply the existing dogma to outstanding problems - primarily in an attempt to bolster the dogma. In short, they are believers. (This view of workaday scientists is more deftly espoused by Thomas Kuhn.) Inventors, on the other hand, never met a dogma they didn't hate. What happens when a cherished scientific dogma fails?

    The dogma behind the genetic code has failed.

    In the words of Thomas Kuhn, the theory is in crisis, or should be, and we must look forward to the next scientific revolution. The empiric evidence is greatly at odds with the paradigm, and this humble ;-) inventor intends to contribute his two cents to the new paradigm. Here are the broad strokes. The devil, as always, is in the details.

    http://www.codefun.com/Index_books.htm#philosophy

    Genetic information has more than one degree of freedom in translation from nucleotides to proteins. The function that translates genetic information and converts primary sequences into proteins is not linear or one-dimensional. Codons are inherently ambiguous with anticodons, and this is informative during translation. In other words, there is more to making a protein than just sequencing amino acids. In short, synonymous codons aren't entirely synonymous.

    Any complete view of genetic translation must include the information added to the process by tRNA.

    Just as molecules and molecular information assume an ideal form, so too do the rules of molecular translation. Information has structure, and the translation of information has structure as well. The ideal form of genetic translation is a dodecahedron.

    Life must constantly find new and diverse sets of protein morphologies on the landscape of all possible protein morphologies. The genetic code serves as a search engine in that task. The search cannot become stuck in isolated regions of the protein landscape, so it is optimized to perform a robust and efficient search for new protein forms. Genomes and the genetic code are both full of symmetries. These symmetries are co-adapted, like hand-in-glove, to execute an optimized search of protein morphologies.

    Symmetry lays the foundation for all of life. It is the starting point for all molecular processes, and organic molecules are no different. Symmetry lays the foundation, but it is symmetry breaking that actually defines organic information.

    These web pages are evolving along with the ideas. But originally they were mostly cut and pasted together from the thinking, investigating and writing that has been done since opening Pandora's box on the genetic code. I intend to continue editing and adding to these pages through time, but I especially look forward to posting the continuing contributions of others. The original Rafiki goal was to publish a general interest book to help circulate the eccentric ideas about genetic information and the processes of life. On that score I have so far failed. Writing in general is really time-consuming, hard work, and this project continues to expand and evolve, so web publishing is the best option at this time. Two books were actually written and illustrated, but they are woefully unedited, and no serious steps toward printing were taken. Rather than rotting in the digital ether of a single computer, they have been made available for completeness, entertainment and as a historical documentation of this fascinating journey - a peek inside the creative process, so to speak. Why not? The web makes anything possible.

    The first book carries the cryptic working title of 'Rafiki - At the Edge.' The second one was given the working title 'Organic Computers and Genetic Information - the Rafiki Code.' The first one was done in color, and is quite colorful, but the second was done more soberly and cost-consciously in anticipation of self-publication. It is tastefully done in black and white, and formatted for printing in 6" x 9". Perhaps I will return to the book project at a later time, but for now a 'real book' is still a part of the twisted Rafiki dream.

    At some point after writing the first book, I started writing shorter pieces in an effort to get some ideas published in an academic journal. Some of these pieces will surely be posted, but in general the idea of scientific publication is a rat hole for my time. Again, I failed. As I admitted earlier, I am no scientist, and there seemed to be no place for Rafiki in big-league science today. Rafiki is the science of tomorrow, I suppose. I think it was Hunter Thompson who said, 'when the going gets weird, the weird turn pro.'

    Some spit and vinegar (another typical internet rant of the heretic): I am frustrated by the institutional arrogance that I routinely encounter, the lack of curiosity or sense of adventure, and mostly by my inability to pick a good fight. Clearly, these folks are too busy chasing grants and making a living to spend any time on their own ideas, let alone the ideas of cyberspace heretics. Their disdain is comprehensible; however, the close-mindedness is stunning. 'Not my area' is the scientific euphemism for 'piss-off, piker'. I was actually told by one of the big boys at the get-go, "you're wrong, and so what if you're right?" Can't really argue with that, but it does remain quite vivid in my memory nonetheless.

    Here is my rebel yell:

    Proteins are not functionally or logically equivalent to sequences of amino acids, and the path to understanding this must pass through a dodecahedron!

    The rally cry: Symmetry!

    Much of the earlier hard work as a writer was cut apart, recombined and augmented in these web pages. I have made the originals available free of charge in the books section of this site, but they represent dated material. They are useful, colorful and entertaining, but they are not being updated as Rafiki learns and grows.

    If you have wandered into the Rafiki circus as an unwitting novice, attracted by the glitz and pretty lights, please do not be frightened away just because you might not have a rudimentary familiarity with cell biology, biopolymers, DNA and proteins. There are many good web sites for these basics. Here are some links.

    Great intro to proteins
    Folding@Home
    North Harris College
    Genetic Science Learning Center
    Freeland Lab
    DNA Sciences
    Hypermedia Glossary Of Genetic Terms
    The Dictionary of Cell Biology
    The Human Genome Project
    Unraveling the Mystery of Protein Folding

    The basics are not that hard, and it doesn't take too long to get up to speed - not as long as you might think. A contemporary textbook in 9th grade biology is all you should need to get started and join our search for a better understanding of genetic information and translation, the gestalt of life's processes. In fact, the less indoctrination the better. Smart folks with computer, math, art, cryptology or just puzzle-solver backgrounds will thrive in this area.

    Whatever you do, don't drink the Kool-Aid. Look at the situation with fresh eyes, and make your own critical assessment. Don't accept the 'everybody knows' argument from authority without being shown the goods.

    The thrust in these pages is slightly more abstract than any biology text treatment of the subject. That is the nature of the beast, and the preferred realm of this author. I do not intend at this time to create primer pages for the basics, but you never know. Please share with us some of the web pages that you find particularly good or useful in these areas. We are interested in views and studies that either confirm or contradict the ideas put forth here, and we are just getting started.

    http://www.faseb.org/opar/protfold/protein.html
    http://www.ornl.gov/TechResources/Human_Genome/home.html
    http://www.mblab.gla.ac.uk/dictionary/
    http://hal.weihenstephan.de/genglos/asp/genreq.asp?list=1
    http://www.genaissance.com/dnasciences/sectionHome/sectionHome_DNAbasics.asp
    http://www.evolvingcode.net/web_introduction.php
    http://gslc.genetics.utah.edu/units/basics
    http://science.nhmccd.edu/biol/bio1int.htm#biochem
    http://folding.stanford.edu/science.html
    http://wine1.sb.fsu.edu/BCH4053/Lecture08/Lecture08.htm
    http://www.codefun.com/Index_Books_Rafiki.htm

    This should be controversial

    The idea that primary sequence alone determines tertiary structure in protein folding should be a controversial idea.

    "a beautiful example of how an entirely acceptable conclusion can be reached that is entirely wrong because of the paucity of knowledge at that particular time. I spent the following 15 years or so completely disproving the conclusions reached in this communication."

    Christian B. Anfinsen Comment made in 1989 regarding earlier work on the structure of RNase

    There is no doubt that Christian Anfinsen was a great contributor to the body of scientific knowledge. His main contribution was in the field of protein folding, and within that field one particular conclusion sticks out: primary sequence determines tertiary structure. For this conclusion, reached in 1954, Anfinsen shared a Nobel Prize in 1972. The origin of the idea, 'the thermodynamic hypothesis of protein folding,' can be glimpsed at the end of the famous 1954 paper:

    This hypothesis should still be controversial. Where is the evidence to support the radical conclusions that were subsequently made?

    Anfinsen did not provide the evidence to support all of his conclusions. It is unquestioned - even by him - that he did not have the ability to determine the shape of even a single protein. He certainly had no way to confirm a theory regarding the shapes of all proteins. This is in fact what he was looking for, and empiric confirmation is still lacking. It will be difficult to obtain, because his conclusions are fundamentally flawed.

    The confusion (and I got caught by this initially as well) is due to the fact that Anfinsen actually proposed two radical ideas simultaneously. They are almost always taken to be the same idea, but they do not necessarily go together. I agree that the first idea was all but proven - but the other idea is vastly more radical, and wasn't proven by his experiments. Here's the breakdown of the two ideas:

    1. Due to thermodynamic molecular forces, polypeptides automatically assume unique, stable conformational ensembles. This might also be termed the auto-assembly hypothesis of protein synthesis.

    2. For every sequence of amino acids there is a unique, defining conformational ensemble to which it must auto-assemble.

    Anfinsen did a fabulous job of validating point #1, but didn't even scratch the surface on point #2. Conversely, it should be simple to disprove the second idea, and in fact it might already be disproved. This is primarily because proving the idea requires proof of a negative: specifically that polypeptides in physiologic conditions cannot consistently fold more than one way. Accepting that they can and do fold consistently in more than one way requires a mere handful of examples where this is known to happen. (For a nice overview of protein folding go here: Unraveling the Mystery of Protein Folding)

    The easiest proof of this 'multi-target' view of folding is a prion. Prions are proteins involved in bizarre infectious diseases, such as mad cow disease, where normal proteins are forced to assume shapes different from their 'native state'. Regardless of the mechanism, a prion is an example where the same sequence defines at least two different proteins, and in all probability many different proteins. This argument extends to other diseases as well, diseases generally described as amyloidosis. It is believed to be the process behind Alzheimer's and several forms of cancer to boot.

    When this is pointed out to Kool-Aid drinkers the protests, excuses and apologies fly. For some reason, prions don't count, but the stark reality is that it is thermodynamically possible to make two distinct proteins from the same sequence of amino acids in physiologic conditions. They say, "there are exceptions to every rule." OK, show me a case where the rule actually holds. It should be simple to take a protein, pepper its nucleotides with copious (not just a few) 'silent mutations' and then thoroughly demonstrate that the protein remains completely unchanged. If this study exists, I have yet to find it, and in fact we can find the converse.
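
    The 'pepper it with silent mutations' experiment can at least be set up on paper. What follows is a minimal Python sketch, not a method taken from these pages: it builds the standard genetic code table and then swaps every codon that has a synonym for a different codon of the same amino acid. The names CODON_TABLE, translate, synonyms and recode_silently, and the short example sequence, are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code in the usual compact layout (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon ('*' marks a stop codon)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonyms(codon):
    """All other codons that encode the same amino acid as `codon`."""
    amino = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == amino and c != codon)


def recode_silently(dna):
    """Replace every codon that has a synonym with its first synonym."""
    usable = len(dna) - len(dna) % 3
    recoded = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        alternatives = synonyms(codon)
        recoded.append(alternatives[0] if alternatives else codon)
    return "".join(recoded)


original = "ATGGCTCTGTGGAAATAA"  # hypothetical toy gene
recoded = recode_silently(original)
print(original, translate(original))
print(recoded, translate(recoded))
```

    By construction the two DNA strings differ wherever a synonym exists, yet they translate to the same residue sequence. Whether the resulting proteins fold identically is exactly the empirical question raised here, and no amount of code can answer it.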

    The single-target model itself flies in the face of common sense, and actually seems rather absurd, so shouldn't we require at least some indisputable empiric demonstrations of this cherished model, rather than handfuls of 'everyone knows' anecdotes?

    Today, with a vastly larger amount of more sophisticated evidence, the accepted hypothesis fails. The whole issue must be placed back in the context of information theory. What Anfinsen essentially proposed was that the only information that must be extracted from nucleotide sequences in translation is residue sequence. The rest is autopilot. This theory is compelling not because it is consistent with the data, but because, in the words of Anfinsen, it is a 'considerable simplification.' In fact, it is an over-simplification. It effectively collapses the information content of a protein to residue configuration alone. Preposterous.

    Dogma recognizes two protein states: 1. random coil, 2. native state. This makes a tautology of 'the protein folding problem'. If the investigation of folding begins with the stipulation of a single uniform, high-energy state - random coil - and it is assumed that the result of folding will inevitably find a single, stable low-energy shape, what is there left to decide?

    http://www.faseb.org/opar/protfold/protein.html
    http://profiles.nlm.nih.gov/KK/B/B/J/T/_/kkbbjt.pdf

    Sequence and structure are not equivalent!

    The sequence of amino acids in a polypeptide string is a major component of the information in a protein structure, but it is clearly not the only component. The issue now should be to identify the other components and elucidate the information mechanisms that deliver them to the final structure. This must begin by questioning the unproven assumptions behind the two states of protein folding.

    The three basic issues are:

    1. How many target structural ensembles are thermodynamically available for an amino acid sequence when it folds?

    2. How many distinct conformations emerge post translation - during or just prior to definitive folding?

    3. What is the correlation, or what are the folding pathways between the two sets, the sets of possible initial and final conformations of amino acid sequences?

    If there is only a single possible state to either the initial or final conformations of every sequence, then we can happily go about our business as usual. However, if there are in fact multiple initial conformations coming out of translation, multiple final conformations, and a correlation between the two, then investigations of protein folding must take a completely different tack.

    The implications of accepting the dogmatic sequence=single structure viewpoint are paramount to our view of the genetic code. If it is correct that primary sequence and only primary sequence determines a single tertiary structure, then the model can be rehabilitated, but if it is false then today's one-dimensional model of the genetic code is beyond repair.

    If the paradigm were actually correct we should expect to see certain things, irrefutable evidence to support it, but there is nothing to presently justify our blind faith in it. Anfinsen did not justify a belief in the linear paradigm of the genetic code. There are boundless accounts of investigators' experience that suggest the linear paradigm is secure, but conspicuously there is no disciplined proof. There are no well-designed studies to confirm the single-target hypothesis, whereas there absolutely should be a famous study easily pointed to, reassuring us that the axiom sits on a rock-solid foundation. Anfinsen did not provide it - could not provide it - so where is the subsequent definitive study to fill this important void?

    Common sense and the empiric evidence point in the other direction. There is in fact more than sequence information determining the native conformation of proteins, and ultimately the physiologic behavior of entire protein populations.

    The logical view is that the genetic code is more subtle, more powerful, and more complex than the beloved, over-simplified paradigm has led us to believe. There is a tremendous amount of work to be done before we can claim that this code has in fact been cracked!

    Good questions with links to a few empiric answers:

    Besides folded conformations, what are some of the accepted ways that protein populations are known to change with 'silent mutations'?
    http://nar.oupjournals.org/cgi/content/abstract/26/20/4778
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11049749&dopt=Abstract

    Is there any proof that synonymous substitutions are associated with structural differences?
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9680473&dopt=Abstract
    http://nar.oupjournals.org/cgi/content/full/27/1/268
    Quote from the paper: "These results support the view that structure-related synonymous codon bias is a general phenomenon found in all major taxonomic groups of organisms."
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=8897597&dopt=Abstract

    Does a 'silent mutation' actually change protein folding?
    Silent mutations affect in vivo protein folding in Escherichia coli.

    Does a synonymous substitution actually have an evolutionary impact?
    http://www.american.edu/cas/bio/faculty_media/carlini/Carlini&Stephan2003.pdf
    http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=140546

    More important than the overwhelming evidence against the idea of sequence-only folding is the lack of evidence for it. This theory, that is so cherished, should be proven before we continue to cherish it. Why do we think that translation and subsequent folding should be perceived as a sequence-only process? The genetic code has the structure in place to go well beyond sequence during translation. Why do we think it couldn't or doesn't?

    There is more involved in making a protein than translating a sequence of amino acids, and there is more genetic information than residue identity stored in the nucleotide sequence. This information must exist in some form and somehow get translated to the native conformation. How?

    Where is the intellectual curiosity?
    Where is the skepticism of dogma that frankly seems absurd?
    Where is the controversy and debate?
    Why do people get so mad when these questions are raised?

    If the proof to whether or not sequence really is the only determinant of structure is so certain, then it should be readily available. 'Everybody knows' is not an adequate defense of this position, because it is too easily rebutted by 'liar liar pants on fire'. More likely, science has fallen asleep at the switch, and we all do get a bit surly upon abrupt awakening.

    My ears are open - send in the proof.

    http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=140546
    http://www.american.edu/cas/bio/faculty_media/carlini/Carlini&Stephan2003.pdf
    http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WBK-45NSFW1-2X&_coverDate=04%2F26%2F2002&_alid=110452133&_rdoc=1&_fmt=&_orig=search&_qd=1&_cdi=6713&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=411d74d09a6b57df8e7fc021cae8802e
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=8897597&dopt=Abstract
    http://nar.oupjournals.org/cgi/content/full/27/1/268
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9680473&dopt=Abstract
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11049749&dopt=Abstract
    http://nar.oupjournals.org/cgi/content/abstract/26/20/4778

    What can Rafiki contribute

    The most significant contribution from Rafiki is in recognizing the deficiencies in the current model, and then going on to doggedly make a stink about it. Rafiki is nothing if not irritatingly persistent, but the emperor is naked on this one, and Rafiki is child-like enough to point it out. We've all been told that the genetic code has been cracked, yet many important questions are unanswered - in fact some really important questions have remarkably never been asked. Consequently, the genetic code remains unbroken, and our gestalt of the fundamental organization of life's clever molecular sets is found wanting. In essence we have only partially solved the puzzle of how proteins are made from information in DNA, but it is clearly a puzzle worth solving.

    Rafiki has contributed four basic ideas:

    1. The genetic code is not one-dimensional. There are other dimensions of information, besides primary sequence, that are a part of the genetic code. Another way to put this is that the thermodynamic hypothesis of protein folding is false. Thermodynamics is important to folding, but it is not the only factor involved.

    2. There are many ways to map the correlations between nucleic acids and amino acids, but the best way to map them is on the surface of a sphere.

    3. The genetic code is optimized not just for making proteins, but for finding new proteins as well. The process and mechanisms of new morphology creation were given short shrift in past considerations of the genetic code. The symmetry of codon assignments combines with genomic symmetry to accelerate the search for new protein morphologies.

    4. Our language, metaphors and conceptual tools for studying genetic information are outdated and woefully inadequate. In addition to pointing this out, Rafiki has proposed a few modifications.

    I'm frequently asked what I'm going to do next. Well, you're looking at it.

    What do you want me to do, build a biochemistry lab? All I can do is keep shouting, because I'm not going to become a biochemist. I've actually got a life. In the meantime Rafiki has provided some excellent tools to advance the cause.

    Code World is a nifty contribution toward understanding the genetic code. Code World and the genetic code are not literally one and the same, but their informative structures are. In this regard it is a fabulous tool to help us think about the problem of genetic translation at its very core. In other words, it shows us how two shapes can communicate information. In this case, it shows how information stored in a dodecahedron can be communicated to a tetrahedron. Please think about these concepts in the simplest possible terms, because that is what it is ultimately going to take to eventually get the job done: What is information? What is language and communication? How could the molecules of life organize and execute a language and achieve such sublime communication of information? This is the appropriate starting point in understanding a molecular code for genetic translation. This is an organizing principle for molecular languages, and it is missing from the dogma today.

    Information must have a structure. Molecules must have access to this structure in some form when they execute methods related to that information. On what structures could these methods be implemented? Molecular information is related to spatial relationships, because this is fundamentally how molecular behavior is determined. The Rafiki Map takes the process one step further and shows us how the actual code is perfectly arranged within a dodecahedron. It effectively demonstrates the context of each molecular set in the code. It provides the most compressed, symmetric and objective view of this vital data. It perfectly demonstrates nucleotide triplets within the overall framework of the code, and it emphasizes the importance of unordered triplets in the form and function of the code. The Rafiki map highlights how the symmetry and coordination of these triplets can work in concert with a genome loaded with the symmetry of sequence transformations to efficiently generate novel protein morphologies. Because of the dodecahedral arrangement, the information involved in the genetic code can finally take shape, and the spatial relationships help define the information.

    The Rafiki map is simply a superior way to view the data.

    In all likelihood, Rafiki will not break the genetic code alone (somebody really smart probably will). But by raising long overdue questions, and by providing useful tools - Code World and the Rafiki map - we are contributing to scientific advancement. Perhaps this website will become an icon and a forum for that advancement, or perhaps that forum will develop elsewhere. If you are like-minded, curious or even just amused, please link with us. The more the merrier.

    The primary theme here is symmetry, and within that theme the dodecahedron takes center stage. The three basic areas where the dodecahedron applies to life and the genetic code are:

    Translation. What is the fundamental nature of genetic information; how much of it gets translated, and by what molecular mechanism? If codons are truly synonymous and mutations are truly silent, why do they empirically make an impact on all areas of translation? The Rafiki doctrine of 'symmetry first', when applied to actual genetic translation, is the most persistent and heretical position taken here, but it is also the most important one to resolve.

    Teleology (The use of ultimate purpose or design as a means of explaining natural phenomena.) What is the origin, history and degree of adaptation in the genetic code, and how has symmetry played its role?

    Geometric and numeric. Can there be an optimum form to the structure of a code such as the genetic code, and would the numbers involved actually have an impact on that code's optimized behavior? In this case, the basic numbers are 3, 4, 20 and 64, which suggests the introduction of a dodecahedron. Is this just numerology, or is the code actually a combinatorial optimization of molecular sets as suggested by the numbers?

    All three of these areas will require efforts for years to come from biochemists, computer scientists, physicists, chemists, cryptographers, philosophers, mathematicians; in general - puzzle solvers. Let's have at it!
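
    The numbers 3, 4, 20 and 64 named under 'Geometric and numeric' can be checked by direct counting. The sketch below is an illustration in Python with hypothetical names (NUCLEOTIDES, ordered, by_composition): 4 nucleotides taken 3 at a time give 64 ordered triplets, and ignoring order collapses them into 20 unordered triplets, which is also the number of vertices of a dodecahedron. Whether the Rafiki map places one unordered triplet at each vertex in this way is an assumption made here, not something stated on this page.

```python
from collections import Counter, defaultdict
from itertools import product

NUCLEOTIDES = "ACGU"  # the 4 RNA bases

# Every ordered triplet (codon): 4 ** 3 = 64 of them.
ordered = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]

# Group codons by composition, i.e. treat each triplet as unordered.
by_composition = defaultdict(list)
for codon in ordered:
    by_composition["".join(sorted(codon))].append(codon)

# How many unordered triplets cover 1, 3 or 6 ordered codons each.
class_sizes = Counter(len(codons) for codons in by_composition.values())

print("ordered triplets:  ", len(ordered))
print("unordered triplets:", len(by_composition))
for size in sorted(class_sizes):
    print(f"  {class_sizes[size]} unordered triplets cover {size} codon(s) each")
```

    The count splits into 4 triplets with one repeated base (AAA and so on), 12 with two bases alike, and 4 with all three bases different: 4 x 1 + 12 x 3 + 4 x 6 = 64. The code only verifies the arithmetic; it says nothing about whether the arrangement is biologically meaningful.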

    http://www.codefun.com/Genetic_see.htm
    http://www.codefun.com/Genetic_max.htm
    http://www.codefun.com/CodeW.htm

    What is Information?

    While working at Bell Labs in 1949, Claude Shannon formally marked the launch of information theory with his book titled The Mathematical Theory of Communication. It came at a time when biochemistry was blazing trails into genetics, so it was natural that information theory played a huge role in forming the discourse around the genetic code.

    Information is a conceptual entity of the universe that is gaining status with scientists in many areas. Physicists and computer scientists are understandably enamored with the idea that information is a physical reality. Information as a universal entity is a mostly mathematical construct, but information in our universe really does seem to have a degree of independence from its temporal physical embodiment. In this way it appears to possess features like mass or energy. Just as potential energy is easily translated into kinetic energy, information can be translated from one form to another. We practice this principle when we read and write, or speak to each other. In this case information is encoded for travel in vehicles that we call languages, and languages are at the heart of information translation of all sorts. Languages are codes of communication.

    In a general sense, information travels in any system that has a finite number of discrete choices, and it is quantified by a metric known as a bit. This is a bit confusing (pun intended) because the word bit is also used to describe a binary digit. Binary digits are merely symbols; they are symbolic vessels that carry different amounts of information. The actual amount of information carried by one binary digit will depend on a number of things. Consider a thermometer. It is an instrument or vessel used to measure temperature, and binary digits are like the markings on the thermometer. The value announced by the markings on a thermometer stands for a quantity of heat, but there is another measure to be applied to the information provided by the instrument. The value of knowing the measurement and the precision of the measurement is different from the markings of a scale. Heat is heat, but we can have various amounts of information about it. For instance, just knowing whether something is hot or cold provides one bit of information. Knowing the heat in a system with four equally likely temperatures provides two bits of information.
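    The thermometer arithmetic can be sketched in a few lines. This is only an illustration, and it assumes every choice is equally likely; the function name bits_for_choices is made up for this example and is not part of any Rafiki tool.

```python
import math


def bits_for_choices(n_choices: int) -> float:
    """Bits gained by learning which one of n equally likely choices occurred."""
    if n_choices < 1:
        raise ValueError("need at least one choice")
    return math.log2(n_choices)


hot_or_cold = bits_for_choices(2)        # two possible readings
four_temperatures = bits_for_choices(4)  # four possible readings

print(hot_or_cold, four_temperatures)    # prints: 1.0 2.0
```

    The number of markings grows much faster than the number of bits: doubling the choices adds only one bit.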

    It is true that a random binary digit contains one bit of information, but it is not true that a binary digit and one bit of information are synonymous. Anyone familiar with digital compression will recognize this difference. For instance, a digital image stored in one million bits of computer memory might easily yield to compression, and the information required to visually describe the exact same image in two different systems might be compressed into, say, one thousand bits of data. This means that each bit in the original image contains only about 1/1000 of a bit of visual information. Of course, realizing this compression windfall requires knowledge of an encoding schema. It also requires that we find patterns in the data. Most significantly, it requires awareness of the system making and displaying the various potential forms of information contained by the original bits.
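    To make the compression point concrete, here is a rough sketch using Python's standard zlib module. The "image" is a hypothetical stand-in (one gray value repeated), chosen because it is about as patterned as data can be; real images compress far less dramatically. The compressed size is only an upper bound on the information actually present.

```python
import zlib

# Hypothetical stand-in for a very patterned image: one gray level repeated.
# 125,000 bytes is exactly one million binary digits.
raw = bytes([200]) * 125_000
packed = zlib.compress(raw, 9)

raw_bits = len(raw) * 8
packed_bits = len(packed) * 8

# Average information carried per original binary digit (an upper bound).
bits_per_digit = packed_bits / raw_bits

# Lossless: the original digits are fully recoverable from the packed form.
restored = zlib.decompress(packed)

print(raw_bits, packed_bits, bits_per_digit)
```

    The decoder must know the zlib format to rebuild the image, which is the "knowledge of an encoding schema" mentioned above.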

    There is a big catch. There might be additional dimensions of information beyond just visual data contained in a file storing a digital image. This then requires creative exercises in defining the quantities, forms and dimensions in various information systems. For instance, a secret text message, "attack at dawn!", might be cleverly encrypted in the hypothetical digital image just described. In this case, there are two dimensions of information contained in the image, and one of them is in danger of being missed. The actual information content of each bit might decrease significantly by overlooking informative dimensions and using a careless compression scheme. Our ignorance of the encrypted text message within the image data very well might cause us to unknowingly throw out valuable information when we compress the original file.
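    One way to picture the hidden second dimension is the sketch below. It assumes a particular hiding scheme, least-significant-bit embedding, which is my choice for illustration and not something the text specifies. The pixel data, the function names and the "careless" compression step are all hypothetical.

```python
def embed(pixels: bytes, message: bytes) -> bytes:
    """Hide the message in the lowest bit of successive pixel bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits, eight pixels per byte."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for pixel in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (pixel & 1)
        out.append(value)
    return bytes(out)


def careless_compress(pixels: bytes) -> bytes:
    """Lossy step that discards the lowest bit of each pixel as visual noise."""
    return bytes(p & 0xFE for p in pixels)


message = b"attack at dawn!"
image = bytes(range(256)) * 2            # hypothetical 512-pixel grayscale ramp
stego = embed(image, message)

recovered = extract(stego, len(message))
after_compression = extract(careless_compress(stego), len(message))
largest_change = max(abs(a - b) for a, b in zip(image, stego))

print(recovered, after_compression, largest_change)
```

    The image looks the same before and after (no pixel moves by more than one gray level), yet the compression step that treats the lowest bit as noise erases the message entirely.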

    There are several useful techniques we can use to identify the content and form that a quantity of information might take. It is bittersweet that the information identified by many of these techniques has been given the name entropy (H), because, of course, entropy has another meaning in physics as well. The two concepts are very similar statistical concepts, and perhaps at a profound level they represent the same concept. My intuition is that they do, but there is a real danger of confusing the entropy of information with the entropy of thermodynamics. Nonetheless, entropy is the name we shall use here, and it will be a useful concept in our examination of genetic information systems. To limit the potential for confusion, I will use the term entropy here only in the context of information entropy.

    In broad strokes, information entropy is a measure of the value of knowledge. Specifically, it is the value of knowing the precise choice made by a system given a discrete number of choices. For instance, the entropy of knowing the outcome of an honest coin toss is one bit, because an honest coin toss is the epitome of one random bit of information. The coin might land heads or tails with equal probability. Knowledge of the actual outcome is worth one bit of information.

    6 bits of information.

    However, the uncertainty of an honest coin toss is at a maximum. Conversely, a two-headed coin lands heads every time, so the uncertainty and therefore the entropy of any number of these absolutely rigged coin tosses is reduced to zero. We know the results without even tossing the coin, so the value of knowing them is nil.

    Zero bits of information from a 2-headed coin.

    Similarly, if the coin is rigged somehow with a probability of 75% heads, 25% tails, then the entropy of knowing outcomes from this coin is calculated to be 0.811 bits per toss. This curious value is derived from the following formula provided us by Shannon, where P(x) stands for the probability that x will occur.

    H = -\sum_{x} P(x) \log_2 P(x)

    For the rigged coin, H = -(0.75 \log_2 0.75 + 0.25 \log_2 0.25), which is approximately 0.811.
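    The three coins discussed so far can be checked with a short sketch of Shannon's formula, H = -sum of P(x) log2 P(x). The function name entropy_bits is hypothetical, and the code uses the equivalent form P(x) log2(1/P(x)) so that impossible outcomes can simply be skipped.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits: the sum of P(x) * log2(1 / P(x)) over outcomes."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with P(x) = 0 never occur and contribute nothing.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


honest = entropy_bits([0.5, 0.5])        # honest coin: 1 bit per toss
rigged = entropy_bits([0.75, 0.25])      # about 0.811 bits per toss
two_headed = entropy_bits([1.0, 0.0])    # no uncertainty: 0 bits

print(honest, round(rigged, 3), two_headed)   # prints: 1.0 0.811 0.0
```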

    Therefore, entropy is embedded in the concept of uncertainty. As previously described, it is no accident that this formula resembles the formula for thermodynamic entropy. Information entropy is the sum of uncertainties of a finite state system existing in any potential state of the system. As uncertainty changes, or the number of potential states changes, so too changes the entropy of the system. The challenge of measuring this in any system lies in our ability to identify the number of potential states and their probabilities. This is generally how we shall approach our efforts to quantify genetic information, and in order to do this we will rely on some combinatoric properties of discrete mathematics. The following definitions are quoted from the website of Wolfram Research.

    Discrete Mathematics - The branch of mathematics dealing with objects which can assume only certain discrete values. Discrete objects can be characterized by integers, whereas continuous objects require real numbers. The study of how discrete objects combine with one another and the probabilities of various outcomes is known as combinatorics.

    Combinatorics - The branch of mathematics studying the enumeration, combination, and permutation of sets of elements and the mathematical relations which characterize these properties.

    Just like the four-temperature thermometer, or two coin tosses, there are two bits of information per nucleotide in a perfectly random sequence of nucleotides, which can be thought of as a genetic information channel or signal.

    If we follow the information through translation a little further we find a curious thing: the information content appears to go up and then down.

    The metric is in codon units, or 'triplet equivalents' (TE). The surprise for most people is that information content goes up through the tRNA phase of translation. This is because of wobble, ironically, since it introduces new choices at the third nucleotide position. There are 160 anti-codons, whereas there are only 64 codons. Of course, there are typically only 20 amino acids, so the information content appears to fall, but does it really?
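    The rise and fall can be put into rough numbers. The sketch below assumes that one triplet equivalent is the information in one of 64 equally likely codons (6 bits), so that a stage with N equally likely states carries log2(N) / 6 TE. That definition, and the function name, are my assumptions for illustration; only the counts 64, 160 and 20 come from the text.

```python
import math

BITS_PER_CODON = math.log2(64)   # 6 bits: three nucleotides at 2 bits each


def triplet_equivalents(n_states: int) -> float:
    """Information of n equally likely states, in codon units (assumed definition)."""
    return math.log2(n_states) / BITS_PER_CODON


# Counts taken from the text; equal probabilities are an assumption.
stages = {"codons": 64, "anti-codons": 160, "amino acids": 20}

for name, count in stages.items():
    print(f"{name}: {triplet_equivalents(count):.2f} TE")
```

    Under these assumptions the anti-codon stage sits above 1 TE and the amino acid stage below it, which is the up-then-down shape described. Real usage frequencies are not uniform, so actual values would be lower at every stage.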

    The actual information content at each of these stages depends on the actual number of each molecular type present during translation, and the probability of each being used. We have yet to get a good handle on tRNA in this regard. In fact, there might be hundreds of tRNA molecules with slight variations in a given cell. How might these tRNA variations affect downstream information?

    The key to genetic information at the amino acid stage is not only what is being used but also how it's being used. Sure, leucine is always leucine, but are there different ways to use it in translation? What about a tRNA that puts leucine into a peptide chain rapidly vs. slowly? This distinction, as with the thermometer example above, delivers one bit of information to translation. More importantly, the genetic code is now working in a second dimension. This is what is known as an additional degree of freedom. In sciencespeak we are talking about 'translation kinetics.'

    Variation in translation kinetics is an absolutely proven reality: 'codon usage' impacts translation outcomes, and the consequences of this are not trivial. This is the mechanism that theoretically drives conformation changes in protein folding. The important thing to note is that translation has been experimentally proven to operate with more than one degree of freedom. In other words, knowing the amino acid sequence is now not enough to allow us to determine the outcome of protein folding! We must have additional dimensions of information; therefore, it cannot be a one-dimensional code.

    http://mathworld.wolfram.com/

    Beyond kinetics, what other degrees of freedom might there be? The next most obvious candidate is fidelity. Not every tRNA will be as reliable as the others. As we have just seen, when probabilities change, information changes. So, if two tRNAs deliver leucine at translation, but one does it more reliably than another, then the two deliver different amounts of information.

    The most interesting prospect for how one leucine residue might differ from another during translation is related to spatial orientation. Since tRNAs vary in size by up to 40%, it is not unreasonable to expect that they behave differently in a spatial sense at the point where a peptide bond is actually made. If the spatial differences in tRNA impact the nature of the bond that is made, then spatial information is being delivered during translation. The simplest but most significant case would be in making a distinction between the formation of a cis-peptide bond versus a trans-peptide bond. Beyond that it is not obvious how many choices might be made with respect to bond angle rotations. This absolutely is a plausible hypothesis relating to the mechanisms of genetic translation, and there is no empiric evidence that it doesn't work this way. In fact, common sense predicts that it does, and the observed results of translation support common sense. What is the most plausible structure for efficiently handling this spatial information?

    I am told that the technology is not yet to a level where peptide bonds can be measured at the point of translation, but that time is coming. When it is finally proven that all bonds are not created equal, there will be a renewed interest in the genetic code. The downstream effects are real, and their study is the next great frontier.

    You heard it here first.

    What is the genetic code?

    This is the $64,000 question: what is the genetic code? The thrust of this rambling page is that we don't really know the answer to this exceptionally important question. The genetic code clearly must involve the process of making protein based on information in DNA, but precisely how is this done? Here are some equally important and related questions:

    What is life?

    What is DNA?

    What is a protein?

    Life, as described by Erwin Schrödinger, is an aperiodic crystal. DNA is a crystal, and so is protein. Note the pattern. The cleanest view then to the nature of the genetic code is that it is a complex algorithm through which one crystal is formed according to specific properties of another, sovereign crystal. This then requires a language where two crystal types must somehow communicate information. Tough job, don't you think? It's hard to believe there's a way to consistently do it!

    Science works by simplifying complexity, but in the case of the genetic code we have oversimplified. The chest-beating certainty of the scientific community has obfuscated the fact that we still do not understand nature's nifty crystal-forming algorithm affectionately referred to as the genetic code. We have mastered the well-worn correlation data between sequential components of each crystal type, but this is not the same as the language that communicates genetic information into a fully formed crystal. Science has dogmatically insisted that they are the same thing, but where's the proof? I doubt that we'll ever see it.

    News flash: The earth rotates around the sun. Note to astronomers: lose the epicycles.

    All life on this planet is based on a genetic code. It is a system that somehow defines the construction of living things by directing the processes of molecular synthesis and replication. In 1953 James Watson and Francis Crick described a double helix as the structure of a huge molecule called DNA, which was known to reside in the cell nucleus and store the secrets of the genetic code. Excitement grew, and by 1960 leaders in science were predicting that nature would be laid bare within a year, creating justifiable fears. If man actually controlled the genetic code, what would happen to life on earth? Salvador Dalí seemed to anticipate man's dominion over nature and its relationship to a higher truth, as shown in his painting The Temptation of Saint Anthony.

    The predictions and accompanying fears proved unfounded, however, since the code wasn't completely broken for another ten years. Entirely synthetic life has yet to be created, and today, despite tremendous strides in genetic engineering, there is a general disaffection with the code. It appears that the code alone was not enough to allow man dominion over nature. The full glory of protein synthesis remains a mystery, so we have now moved past the code and on to proteins themselves. According to conventional thinking, the genetic code is so simple and buttoned down that its logical foundation appears remarkably trivial. Instead, today's glamour boy is the protein, the idol of proteomics. It is the study of proteins and their many eccentric habits of folding that dominates the search. Proteins are so devilishly complex that "breaking the protein" makes breaking the code look like child's play. Fortunately, we have a tremendous amount of technology to help with the task as compared to 1960, and some of the greatest scientific minds are focused on a solution.

    Surprise! The genetic code is child's play.

    Enter the child.

    A funny thing happened on our way to dominion: somebody (everybody) forgot to break the other half of the code. A central premise of these pages is that the genetic code is far different from our conventional view of it. The genetic code somehow makes proteins, not just sequences of amino acids. (No, they are not the same thing.) I attempt through these writings to illustrate this obvious fact, and the implications of having missed it. I also intend to swing a machete in the general direction of any sacred cow that ambles into view.

    That's how children are: childish.

    Starting with a modified version of the standard Watson-Crick table we see the set of codons and amino acids that are a part of the genetic code. From the conventional perspective, the genetic code starts and ends with pairing codons and amino acids.

    This table, in its various forms, has today come to represent the entire logic of the genetic code. I arranged this table on a somewhat eccentric scheme. The important thing to note, however, is that we can arrange this table any-ol'-way we like. There is no correct way to arrange and display this data according to our conventional view of the genetic code, and many different ways are in use. Since we can't say for sure where nature got the data to begin with, and we believe there is no absolute meaning in its arrangement, we are free to view the organization of this data as arbitrary. This is strongly related to the premise that the genetic code is one-dimensional, which means that it contains only one degree of freedom with respect to the information it handles. The one dimension is the pairing of codons and amino acids. These two concepts are self-supporting to the point of forming a tautology. If assignments are arbitrary then the code is one-dimensional, and if the code is one-dimensional then assignments are arbitrary. Regardless, the paradigm of a one-dimensional code leaves no room for any absolute foundational logic. Adding, subtracting or shuffling assignments will change the output of the code, but leaves the foundational logic of the code unchanged.
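    The conventional, one-dimensional view just described amounts to a lookup table, which can be sketched as follows. The 64-letter string is the standard assignment written in the common T, C, A, G ordering, with DNA letters (T in place of U) and * for stop. The function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard assignments for TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_table(base_order: str = BASES) -> dict:
    """Lay the same codon-to-amino-acid pairing out in a different base order."""
    return {"".join(c): STANDARD["".join(c)] for c in product(base_order, repeat=3)}


table = codon_table()
rearranged = codon_table("ACGT")

# Same pairing, different layout: the conventional view sees no difference.
same_pairing = table == rearranged
same_layout = list(table) == list(rearranged)

# Redundancy: how many codons point at each amino acid (or stop).
degeneracy = Counter(table.values())

print(same_pairing, same_layout, degeneracy["L"], degeneracy["M"])
# prints: True False 6 1
```

    Nothing in this data structure distinguishes one leucine codon from another, or one arrangement from another. That is exactly the limitation being argued against here.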

    Acceptance of this view is not merited by empiric data, however, and it is extraordinarily detrimental to our study and use of the genetic code. The accepted doctrine has prevented the asking of important and fascinating questions, many of which shall be addressed in these pages. I find the one-dimensional view of things now absurd and untenable, especially in light of discoveries of the past five years. Warning bells should be going off all over the place, but they have not. A view of a one-dimensional code is virtually impossible to rehabilitate in even the broadest of terms. There simply is more than one degree of freedom in translation of genetic information, and all of the information must be embodied in our model of the genetic code.

    Some might quibble with the precise language of my description, but the conventional approach is yet unchallenged, and I therefore intend to aggressively challenge it. From a Rafiki perspective the nature of the data in the above table is the furthest thing from arbitrary, and there is indeed a best way to arrange and view it. Genetic translation is an objective, molecular process. Genetic information must therefore be founded on objective molecular structures. There are at least two dimensions of information in the genetic code, and surely many more. Open to debate are the natures, forms and actual mechanisms of translation for these additional dimensions of information.

    Like the periodic table of chemical elements, there is a sublime logic to the assignment of amino acids to codons. Without this insight we are blind, and the genetic code goes from a periodic table of elements to a "table of periodic elements" as viewed by Michael Teague.

    Table of Periodic Elements Michael Teague

    The conventional view of assignment data becomes particularly dysfunctional when we return to the premise of having a genetic code in the first place. We intuitively know that cryptic information is contained in one set of molecules and communicated to another set of molecules. We know this because we can witness the process and its results (protein synthesis). The key questions are, what information is in there, and how does it get communicated? If one accepts conventional wisdom, the answers are "not much" and "with a single, simple set of linear correlations." These answers are incorrect, and the insistence that we cherish them as we have for so long has led to a truly comical view of the genetic code. More comical is the defense of it, as history will record. All indoctrinees are in the trance of a more than forty-year post-hypnotic suggestion, causing obvious anomalies of the paradigm to go unnoticed. This is most unfortunate, so it is our job to correct it. We will start with some basic questions.

    What is the origin of the language, or how did Life get started?

    With so many alpha-amino acids to choose from, and room for 64 in the code, why does the standard set only contain 20?

    What is the logic behind the arrangement of nucleotides, codons and amino acids?

    Since the mirrors of alpha-amino acids (L and D) are equally stable and exist in equal proportions within the abiotic areas of the universe, why are all of the standard amino acids in the L form?

    In such a beautifully rapid, accurate and efficient information system, why is there such an ugly redundancy?

    With few exceptions, the above system appears to be used in all species and presumably back through time. Given the ravages of evolution - changing properties of organisms rapidly and constantly - one might expect some branching into competing dialects of the genetic language. At least the redundancy of the language should be subject to widespread change, since it has no absolute meaning. How could this exact system exhibit such dogged durability across time and throughout species?

    We now know the shape of DNA, a double helix, and we know the functional significance of this shape, but this is only genetic storage. After all, it is a complex 3D information system, and shape imparts structure, function and meaning. What is the fundamental shape and meaning of the genetic code when it performs its magical role during protein synthesis?

    Answers

    Picking up a good biochemistry text today, one might imagine that either there are generally accepted, plausible answers to these questions, or the questions are too unimportant to merit any attention or real answers. To wit:

    Why only 20? The fact that all living organisms use the same standard amino acids in protein synthesis is evidence that all species on Earth are descended from a common ancestor.

    Why all L-amino acids? Like modern organisms, the last common ancestor (LCA) must have used L-amino acids and not D-amino acids.

    And this is from an otherwise excellent, up-to-date, advanced-level college biochemistry textbook!

    That's it? That's the best we can do? At least say, "we don't know and we don't care." We mustn't pretend to know, or imply it's unimportant that we don't know. These aren't answers; they are fables. They are known as "just so" stories. They equate to, "they are because they are, and they must need to be because they are." However, not knowing these answers is a very unsettling concept for a lot of very intelligent people, creating more than a small component of denial.

    There is another anomaly, a gaping hole so to speak, in the same texts. Leaf through them and what do you see? Information, lots of it and presented beautifully. They illustrate an abundance of knowledge representing some of the greatest achievements of human thought and investigation. The trend is toward shape, fit, three dimensions. There is a tip-of-the-cap to the idea that the meaning of the covered subjects lies in their shapes and their space-occupying attributes. Some even provide 3D glasses, and most offer links to animated web sites to facilitate the spatial effects. The double helix is celebrated and dissected. Proteins are unfolded, folded and fit together. Electron-prowling domains are drawn and speculated upon. Yet at the point where the rubber hits the road, where the genetic code performs its magic, the descriptions revert to 1950s flatness and they are presented in living black and white.

    Toto, I have a feeling that we're not in Kansas anymore.

    Dorothy

    The Wizard of Oz

    How can it be that the double helix has this wonderful relationship between form and function, yet the nucleic acid complexes have no form-function relationship during protein synthesis? Certainly the form-function of our genetic information storage is important, but what about the genetic processor? It is likely that it has a distinct shape as well, and the shape of the processor is somehow logically related to genetic information storage and retrieval.

    The foundation of the central dogma is that the information is co-linear. In other words, there is a line of information in DNA that is communicated, somehow, to a line of results in proteins. This is taken to mean that the code is one-dimensional. There is believed to be only one dimension of information passed from line to line - the one dimension being the identity of amino acids or links in the protein chain. Due to faith in co-linearity, and due to the nascent digital information industry in the 1950s, the code itself came to be seen as linear. As I've already stated, this is a big mistake. There is nothing really linear about the code, unless you believe that a swarm of bees is in some way linear. Nature ignores lines; it's all about shapes.

    The existence and translation of information in matter is not mystical. It is a nitty-gritty process of quantizing and selecting possibilities from a defined set of possibilities. The mysticism lies in the process by which the universe methodically bootstraps information in an evermore-complex cascade of emergence. The correct term, I believe, is sequential. I will grant that the genetic code is to an extent co-sequential, but I will not concede that it is co-linear, because these illusions of linear paradigms are clouding our eyes and our brains. The linear indoctrination process is intense; I know, because I've been through it. However, we can loosen the reins on our senses and find some sense in the madness. With the help of some recent discoveries, some clear, rational thinking, and some bodacious art, we can see the order in the chaos.

    http://cwx.prenhall.com/horton/

    M.C. Escher Order and Chaos

    In the middle of a table of periodic elements sits logic. As with any pattern there is an organizing force, something that drives the formation of complexity and order. The laws of nature are in play, and the patterns are there for us to see, if only we have the light and courage to look.

    Furthermore

    Although it is commonly accepted that The Genetic Code is presently known, and therefore the fundamental rules for some type of universal translation are also known, this is false. In fact, nothing could be further from the truth. Indeed, we know of a somewhat predictive correlation between levels of translation - between a few limited sets of small organic molecules in selected organisms. We do not even know everything there is to know about this level, despite hyperbole to the contrary. Moreover, we have so far failed to identify the relationships between, and the multi-dimensional meanings in, vastly larger, more informative molecular sets. We are far from able to recognize all of the genetic information stored in the double helix - undeniably. We are also unable to interpret or explain the detailed patterns of genetic information spread across past, current and future life in the universe. Much of this confusion is caused by a failure to define terms. Although key terms were initially defined in biology, our knowledge has now outstripped our definitions.

    There are many usages of the terms "genetic" and "code" and there are many casually accepted ways to combine the two. Although there is no official definition of anything that can reasonably be called the genetic code, this label over time has been applied to a variety of general and specific concepts, causing a huge element of confusion. Terms absolutely must be defined, especially central terms, and they no longer are. Most of the key terms today are bastardized, squishy, amoeboid and evolving. The most widely accepted usage of the genetic code relates to translation of nucleotides into amino acids; however, even an accurate definition regarding this function is lacking from the dictionaries, textbooks and literature. Understandably, the minds of investigators are out of sync on this issue, and many other issues will suffer as a consequence. I mostly get my terms from Webster, but science has a way of mutilating common parlance. As theories and ideas evolve, vestigial terms, like the shells of hermit crabs, become inhabited by the animus of new meaning.

    The term "genetic" is used here to mean: Of, relating to, or influenced by the origin or development of something. The term "code" is used here to mean: A systematically arranged and comprehensive collection of rules. The Genetic Code could therefore reasonably apply to anything that fairly describes: A systematically arranged and comprehensive collection of rules relating to the origin and development of living organisms. I have let it somewhat out of its classical cage. For the term to be of any use, other than nostalgia, the genetic code must somehow operate on genetic information to animate living things.

    We have been told explicitly for decades that the genetic code has been broken, and our expectations have quite understandably soared. Yet the term "genetic code" remains poorly defined by science, much less broken. Theoretically, we should at least be able to make proteins de novo with the genetic code that we supposedly have broken, but yet we cannot. The unreasonably high expectations caused by the hyperbole of what we actually know have apparently not been met in our quest to learn nature's expanded secrets behind her codes of organic translation.

    Our thinking about the genetic code has, over the decades, become more detailed but progressively more confused. Models of the code are woefully inadequate, and the languages used to describe them are extraordinarily self-contradictory. I am now convinced that science actually has no working paradigm for a consistent exploration of genetic information. I am equally convinced that most people have failed to realize this. Heck, why should they? It was announced long ago that the genetic code had been captured, and this mis-truth has been repeated so many times that people truly believe that the genetic code is leading a peaceful existence in captivity, inside tiny spreadsheets, stored in millions of textbooks around the world. It is embarrassing, but the most honest answer to the question of what the genetic code really is, is that we really don't know. Anything short of this would be drinking the Kool-Aid.

    We might call something "The Genetic Code" but it does not make it real. Our label and perception of it are necessarily artifacts of the human mind. They are working tools made of the material of the mind, for use by the material of the mind. Most people tend to forget this, and they confuse the issue further when they associate DNA with the genetic code. From here it is easy to confuse progress in studying the former as an understanding of the latter. DNA is a component of the code, but it is not the code. Sequencing genomes and understanding codes are decidedly different activities. One is data collection, and the other is data interpretation.

    A broader, more robust, multi-disciplinary approach is called for. Biologists must team with physicists, mathematicians, computer jocks and philosophers to advance the cause. The emperor today is quite naked, so I propose, throughout these pages, bits and pieces of cloth, a more general paradigm, a model of the genetic code that I call the Rafiki code. I have no illusions of breaking the code myself, and in fact I do not pretend to actually propose a new comprehensive genetic code here. This, I think, is impossible. I am proposing something far more useful, but far more dangerous. I am proposing that we shift our paradigms of genetic information and the codes used to translate it. A paradigm shift in science, like religion, is usually an ugly, protracted, name-calling endeavor. Therefore, short of that, I hope to accomplish several things:

    Examine the terms currently in use and ask if they are well defined, appropriate and useful.

    Explore a general framework, or paradigm, for genetic information that can tie together seemingly disparate observations from separate fields.

    Introduce new concepts, models and investigative tools that can serve as platforms for speculation and further model development.

    Mostly, I am issuing an open invitation to all interested parties: Join the fun. There is a gigantic, important and fascinating puzzle of nature yet to be solved. It will likely take a collective imagination to solve it, and you never know who might be the one to add a new piece.

    Discovery of the double helix coincided with the introduction of information theory and the dawn of the information age. With them came an explosion of digital computing technology. Broad parallels between organic and digital computers are striking, but the two systems of computation quickly diverge on their details. Computer metaphors of genetic information are useful, but care must be taken in using them. It is helpful to remember that human native languages inform our thinking, but more often they are apt to misinform our ideas of nature's complex phenomena.

    Life is a complex adaptive system rivaled by no artificial system of digital computation. Although it is valid to draw parallels between the two, it is also essential to meticulously highlight the differences. Most genetic metaphors suffer on this score from a sloppy application of native languages. For instance, all digital systems might be called one-dimensional in the sense that they process information in symbolic binary strings. Digital computations require that information be reduced to these symbolic strings of binary digits. Storage and processing can be carried out in this single format because the digital computer is a linear finite state device. It is dependent on the precise state of the device at each step in the process, and every possible state of the device can be reduced to and completely represented by a string of zeros and ones. I see digital computers in this way as sequential, one-dimensional, dynamic mappings of logical relationships. Furthermore, computations are done in digital systems at discrete points by things known as data processors.

    Strings and data processors of sorts exist in organic computers, but they have a vast number of informative dimensions, not just one. The genetic code has famously come to also be labeled as one-dimensional, but this is a sloppy use of native language that misinforms our thinking. Genetic information is never reduced to a single format or processed in a single dimension. Nothing about genetic information comes close to one-dimensional. The all-too-familiar linear model of the genetic code was proposed many decades ago, but its validity remains to be demonstrated, and many of its original premises have already been disproved.

    The term linear with respect to the genetic code could be interpreted in several different ways. The nucleotides in DNA tend to line up, but in this sense it is more appropriate to use the word sequential. DNA and all other molecular forms of information should not be considered as well-behaved sequences of points that form lines, but rather as mischievous sequences of shapes that form other shapes. Alternately, a mathematical relationship is sometimes loosely called linear when every input produces one and only one output. Here again, this description cannot be applied to the genetic code. It is not clear to me what people mean when they describe the genetic code as linear and one-dimensional; nonetheless, this is how it is often described.

    Our unfortunate insistence on calling the genetic code one-dimensional owes a good deal to the fact that codon and amino acid correlations were discovered early in investigations of genetic information. Each nucleotide triplet, or codon, is mostly linked in protein translation to one and only one amino acid in a peptide chain. It was erroneously claimed that this was the only sort of information that a codon could carry in translation. Subsequent investigations have proven this to be completely false, and the idea of a one-dimensional codon is now untenable, yet the linear philosophical doctrine marches on with nary a question to its validity.

    The most detrimental effect of using the terms linear and one-dimensional in reference to the genetic code is that an additional term - simple - frequently gets tacked on. I invite anyone to demonstrate or explain the simplicity of genetic information and genetic translation, at any level. Attempts to do so would suggest a failure to recognize that tiny portions of computer programs do not make sense of the entire code. The portion of the genetic code with which we are vaguely familiar is but a tiny subset of the logic behind translation of genetic information. It cannot be removed from the entire code and retain its meaning. Any expectation of simplicity in this, nature's zenith of complexity, will have a low probability of being met in reality.

    Unfortunately, there are few if any competing theories with which to compare the linear model. It is a sparse, comfortable, yet highly imperfect model, but for all practical purposes it is the only model presently considered. Here I attempt to define features that must be addressed by any computational model of genetic information, and with them we can question, examine, support or invalidate general theories.

    The most obvious difference between a digital computer and an organic computer is that the former can be called one-dimensional, and the latter clearly cannot. The digital system deals exclusively with binary strings, but the genetic system deals with, among other things, molecular strings. Molecular strings are sequential polymers - macromolecules - that represent some of the many dimensions of genetic information. Each can be seen as a type of animated-floppy-disk of genetic data, and the information contained in the string exists in many dimensions as well. The beauty of molecular strings is that they provide logical, molecular, spatial sequences that can be leveraged in the time dimension. Organic computation always has a vital time element. With molecular strings, information can undergo a stepwise process of translation from one string to another. Each string is a discrete form of information requiring its own finite set of rules for translation.

    Humans can easily convert a molecular string into a binary string, but a molecule cannot, nor is it clear why it ever would. Moreover, genetic systems expand information beyond strings, into dimensions that cannot be as easily quantified. Stepwise genetic translations visit many states in many molecular forms, and none of them can be removed from time or space, or from the total system. At the very least, no single form of genetic information or level of translation can be separated from the dimensions of space and time in which it exists, whereas a binary string easily can.
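
    As a minimal sketch of the first point, converting a nucleotide sequence into a binary string takes only a few lines of Python. The two-bit assignment below is an arbitrary assumption made for illustration (any one-to-one assignment would serve), and the function names are hypothetical. Note what the binary string keeps (sequence identity) and what it drops (shape, concentration, timing, and every other dimension discussed here).

```python
# Arbitrary two-bit assignment, assumed purely for illustration.
NUCLEOTIDE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_NUCLEOTIDE = {bits: base for base, bits in NUCLEOTIDE_TO_BITS.items()}


def to_binary(sequence: str) -> str:
    """Encode a DNA sequence as a string of zeros and ones."""
    return "".join(NUCLEOTIDE_TO_BITS[base] for base in sequence.upper())


def from_binary(bits: str) -> str:
    """Decode a binary string produced by to_binary back into a DNA sequence."""
    if len(bits) % 2 != 0:
        raise ValueError("binary string must contain an even number of digits")
    return "".join(BITS_TO_NUCLEOTIDE[bits[i:i + 2]] for i in range(0, len(bits), 2))


encoded = to_binary("GATTACA")
decoded = from_binary(encoded)
print(encoded)
print(decoded)
```

    The round trip recovers the letters exactly, which is the sense in which a binary string can be lifted out of space and time while a molecule cannot.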

    Some less intuitive molecular dimensions of genetic information are contained in quantities, populations, interactions, co-dynamics, proportions and concentrations of various organic and inorganic molecules. With genetic information, time is of the essence, and space is of the essence as well. We can distinguish these sorts of epi-string information from the more obvious string information, but we must take care not to consider it epigenetic information. Organic data that is not strictly carried in a molecular string can nonetheless still be considered genetic. We will return to this issue shortly.

    A particular molecule, hemoglobin for instance, might exist in one of many potential states. Even in its standard form it is an animated molecule that depends on shape and mobility to perform its functions. But as any doctor can tell you, there is vital information in the number of possible states, the probability of each state, and the actual state of the entire molecular population at any given time. Additionally, and more complex, there is entropy in the temporal fluctuations of these molecular states. The ontogeny of hemoglobin populations within an organism is contained in genetic information, and the rules of changing populations within changing environments through time are somehow there as well.

    Unfortunately, because of the staggering breadth and complexity of these epi-string dimensions of genetic information, we are precluded from examining the details of these forms and dimensions here. However, we are able to identify some numerical fundamentals of genetic systems, and the dimensions of information contained within many forms of molecular strings are presently not that far from our grasp. We begin with a thumbnail overview of a cell, and an outline of the master loop in a genetic translation program.

    Master Loop

    These are necessarily over-simplified diagrams to help us build a structure for tracking information as a genetic program processes it. Science must simplify to enlighten. A more accurate map of the genetic code would resemble a map of the World Wide Web, with a riot of interconnecting reversible arrows and overlapping relationships. In mathematics, networks of this kind are the subject of graph theory.

    Each level in this master loop has an algorithm, or set of rules that dictate which translations are done to which portions of the data. Each new dimension of data is passed to the molecular forms at the next level. If we consider that the program is given a certain amount of information, INITIAL(Info), then we can see that the information is processed in many forms before a value is returned by the function SURVIVE(Info). If the value returned by this function is TRUE then the loop continues. If the value is FALSE then the loop terminates. We know that every living thing on the planet has a record of perfect, unbroken strings of ancestral TRUE values returned by the function SURVIVE(Info). Therefore, the data can be grouped on this criterion into a common set. But beyond that it is difficult to imagine what the exact nature, origin, or history of the data might be.
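
    The control flow just described can be sketched in a few lines of Python. This is only a toy under stated assumptions: the names run_master_loop, levels, survive and max_cycles are hypothetical, the "information" is a plain integer with no biological meaning, and the cycle cap exists only so the toy always halts. It shows the shape of the loop, nothing more: each level translates the data and hands it on, and the loop continues only while SURVIVE returns TRUE.

```python
from typing import Callable, Sequence


def run_master_loop(
    initial_info: int,
    levels: Sequence[Callable[[int], int]],
    survive: Callable[[int], bool],
    max_cycles: int = 100,
) -> int:
    """Return how many full cycles complete before survive() is False or the cap is hit."""
    info = initial_info              # plays the role of INITIAL(Info)
    completed = 0
    while completed < max_cycles:
        for translate in levels:     # each level applies its own rules in turn
            info = translate(info)
        if not survive(info):        # plays the role of SURVIVE(Info)
            break                    # FALSE terminates the loop
        completed += 1               # TRUE lets the loop continue
    return completed


# Toy levels and toy survival test, assumed for illustration only.
toy_levels = [lambda n: n + 3, lambda n: n * 2]
toy_survive = lambda n: n < 100

cycles = run_master_loop(1, toy_levels, toy_survive)
print(cycles)
```

    Even in this toy, the return value depends on every level and on the starting data together, so no single level can be read off in isolation from the result.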

    This is where we run into a sticky problem with our definitions of two words: genetic and epigenetic. Epigenesis is a process of successive differentiation, a form of growth or development like adding layers to an onion. Aristotle first described it, and it remains an important concept today. The master loop of translation is well defined by the term epigenesis, and this process is truly at the heart of genetic translation. However, this observation is insufficient to merit global replacement of the term genetic with epigenetic. We would otherwise be forced to discard the term genetic altogether in this discussion. Note that the first division of a zygote begins a progressive differentiation in an organism. An interpretation this literal would mean that all further translation, the second cellular division and all subsequent translation, including mRNA transcription, must be considered epigenetic. At the very least we would need an arbitrary distinction, or judgment call, drawing the line on the meaning of successive differentiation. This is a false choice, and making it has negative consequences. These semantics do not provide ample reason to abandon the definition of genetic, in my opinion.

    The prefix epi means next to, apart from, outside of, or surrounding. I will not confuse it with the concept of multiple iterations, or recursions, because these progressive translations are required in decompressing any genetic message. From this standpoint, I will use the term epigenetic to stand for anything that is apart from or outside of something genetic. The irony is not lost on me that although the hallmark of organic computation is epigenesis, the term epigenetic will stand for that which is outside genetic. Otherwise, we will need one rather impotent genetic code, and thousands of tiny epigenetic codes. The word epigenetic here is just innately ambiguous. The following diagram will help us further simplify and visualize the metaphor.

    In a broad sense, DNA is data, the genetic code is a processor, and the output is a value returned by the function, SURVIVE(Info). Each piece of data can contribute to and be evaluated by its impact on the return value of SURVIVE, but this is the highest level of processing within an organism, and there are countless billions of computations and translations for virtually all of the parts of genetic information. Because there are vastly more translation paths than possible return values, the genetic code cannot even be close to linear at the highest interpretive level. A changing and unknowable environment obligates us to acknowledge this reality of translation. In other words, despite our desire to believe otherwise, there is no direct cause and effect or one-to-one mapping at higher levels when shuffling any isolated part of genetic information at lower levels. Specific changes in information can change survival and many levels of information in between, but they cannot do it when removed from the context of the whole. Genetic information and its translation are context dependent.

    In the most general possible sense, we could expand the metaphor back through all time and across all living things. However, this is not the common interpretation of a genetic code metaphor. Science started its investigation with a more extreme reduction and therefore an oversimplified view of the process. Now a restricted slice is taken from the program to stand for genetic translation. Typically, several levels are compressed into one metaphorical level, and they comprise a single dimension of information for a single act of translation that we commonly call The Genetic Code.

    This paradigm has the undeniable advantage of simplicity. All of the rules of translation from DNA to protein can be considered in one small spreadsheet - a simple, linear, one-dimensional genetic look-up table.

    Genetic Look-up Table
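
    To make the look-up-table paradigm concrete, here is a minimal Python sketch of such a table and the translation it supports. Only a handful of entries from the standard codon table are included, so this is a deliberately partial excerpt, and the names CODON_TABLE and translate are hypothetical. It is offered as an illustration of the one-dimensional shortcut, not as a model of what a cell does.

```python
# Partial excerpt of the standard codon table (mRNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "AAA": "Lys", "AAG": "Lys",
    "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read codons three letters at a time and look each one up, halting at a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


peptide = translate("AUGUUUCUGAAAUAA")
print(peptide)
```

    Everything the table can say about a codon is a single amino acid name; anything else a codon might carry has no place to live in this data structure.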

    It might turn out that all of the intermediate translation functions cancel out in a single dimension to leave us with this nifty approximation (in selected organisms) but it is unlikely. In fact, I argue that it is already disproved from many angles. Besides, the low algorithmic complexity of this one-dimensional shortcut ignores entire levels and the dimensions of information they carry. It will therefore necessarily compress all of the data from DNA into amino acid sequences, and it was presumably already compressed to a fair degree before finding its way into DNA. Compression of this sort is always intellectually dangerous, because baby and bathwater can sometimes resemble each other. Some microorganisms are known to use the same nucleotide sequence in making different proteins - talk about compression! There is no obvious need for a genetic translation program in the real world to compress information much further through these levels - unless nature somehow provides no alternatives.

    Philosophically, it seems that a genetic program will benefit from preservation, or even expansion of information in one form or another during these translation steps. Furthermore, just by comparing the shape of DNA to the shapes of proteins there appears to be a need for rules of translation pertaining to spatial information, and sadly no clues are given by this spreadsheet. The exact rules must be located somewhere in the compressed levels of the shortcut algorithm, but prolonged searches have yet to find them. They never will. In fact, the first guess made by scientists was that we don't need any spatial rules at all, that all the information required during folding is somehow contained in this table. The task of finding them here is a fool's errand; it was wishful thinking in the first place.

    The spreadsheet could turn out to be a valid simplification in restricted situations for some dimensions of information, but it seems recklessly narrow as the basis for a universal proclamation as The Genetic Code. Regardless, it is clearly inadequate as a basis for more robust understanding of genetic translation. Conversely, it might turn out that this facile metaphorical compression is actually harming our understanding of genetic translation. Remarkably, high-level investigators have shown little interest in even asking this question. The dogma is tenaciously thick and definitely hardened around this one.

    Granted, the familiar, one-dimensional metaphor is widely cherished, but the terms commonly used to describe and apply it are imprecise at best, and negligently incorrect at worst. At the very least we should change its label from The Genetic Code to a compressive shortcut map of codon and amino acid correlations in selected organisms. It isn't such a glorious title, but the article The in its current title is completely misleading. It indicates that the one and only genetic code is completely contained in the spreadsheet, but we have discovered many non-canonical spreadsheets, and we should perhaps either number them (TGC1, TGC2 ... TGCx) or at least show courtesy to schoolchildren and change The to A Genetic Code. More importantly, we already know that valuable genetic information is missing from these spreadsheets, and it is missing in the exact dimension that they are meant to stand for, as well as many others. Consider the following.

    This diagram is a shortcut symbolic representation of translation steps for two nucleotide sequences, A and B, into two distinctly different folded proteins. Both translations pass through a level of polypeptide known as the primary sequence of amino acids. The spreadsheet provides a fairly reliable sequence cipher, and so the one-dimensional paradigm takes the bold step of saying that primary sequence is synonymous with primary structure. How do we know this? We take the next bold step and say that since primary structure determines tertiary structure, which is the ultimate shape of a folded protein, all we need is the primary sequence. Proteins are all about shape, so these are fantastic steps.

    If nucleotide sequences A and B are identical, then one should expect that both folded proteins will be identical. However, this is rarely true in nature. More interesting, if a nucleotide sequence change is made in a single organism from A to B in which one codon, say for leucine, is substituted with another codon also for leucine, then the primary sequence will not change. These are called synonymous codons, and this sequence mutation is called a silent mutation. It is considered silent because the change presumably cannot be heard within the language. Several studies have shown that this premise is absolutely false. Identical primary sequences after silent mutations can consistently lead to different folded proteins. These folded proteins have different enzymatic properties, and enzymes are high-level components of organic computers - they process information at levels well beyond string formation. The explanation for this must be found in the spreadsheet if it is to serve as the code for turning DNA into protein.
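
    The sense in which the spreadsheet cannot "hear" a silent mutation can be shown with a small Python sketch. The table is a partial excerpt of the standard codon table, and the sequences and function names are hypothetical, chosen only for illustration. Sequence B swaps one leucine codon (CUG) for a synonymous one (CUA); the look-up returns the same primary sequence for both, even though the nucleotide strings differ.

```python
# Partial excerpt of the standard codon table; the six leucine codons are synonymous.
SMALL_TABLE = {codon: "Leu" for codon in ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")}
SMALL_TABLE.update({"AUG": "Met", "CCG": "Pro", "AAA": "Lys"})


def primary_sequence(mrna: str) -> list[str]:
    """Map each three-letter codon to its amino acid using the table alone."""
    return [SMALL_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


def changed_codons(mrna_a: str, mrna_b: str) -> list[tuple[int, str, str]]:
    """List (codon index, codon in A, codon in B) wherever two equal-length sequences differ."""
    differences = []
    for i in range(0, min(len(mrna_a), len(mrna_b)) - 2, 3):
        codon_a, codon_b = mrna_a[i:i + 3], mrna_b[i:i + 3]
        if codon_a != codon_b:
            differences.append((i // 3, codon_a, codon_b))
    return differences


sequence_a = "AUGCUGAAA"   # Met-Leu-Lys
sequence_b = "AUGCUAAAA"   # same peptide, synonymous leucine codon
sequence_c = "AUGCCGAAA"   # leucine replaced by proline

same_primary = primary_sequence(sequence_a) == primary_sequence(sequence_b)
silent_changes = changed_codons(sequence_a, sequence_b)
print(same_primary, silent_changes)
```

    The table reports no difference between A and B, so any downstream difference of the kind described above has to come from information the table does not represent.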

    Apart from folded proteins, silent mutations have demonstrated profound influence on translation of genetic information in many ways, including rate of translation, translation fidelity, signal peptide function, and the rules for splicing and amplification. Therefore, regardless of the mechanisms, the information contained in a single codon goes well beyond the primary sequence of amino acids. Silent mutations have been shown to significantly affect the competitive dynamics in natural selection, and they have even been implicated in human diseases, such as Hirschsprung disease and medullary thyroid carcinoma. In other words, a silent mutation can certainly change the value returned at the highest level of translation, SURVIVE(Info). If this is silence, I'd hate to hear a really loud noise.

    None of these findings is anticipated by, or explainable within, a truly one-dimensional model. The theory of a genetic code with a single dimension of information translated across many levels of organic molecules should be in severe crisis because of these discoveries, yet we choose to cling tightly to it. Why?

    More to the point, it was an overly optimistic, grandiose rush to judgment when the title The Genetic Code was first applied to the codon map. We might choose to call it such, but it is a gross misnomer nonetheless. By now we should clearly recognize that no human is familiar with any such code of translation on virtually any level. Perhaps man has glimpsed a few faulty lines of code, but despite great scientific strides, the logic behind the entire genetic program remains unarguably hidden from our view. Genetic information is far more complex than our present ability to study it. We certainly should not claim to have cracked the genetic code, and we will surely benefit in the future by speaking in terms of multi-dimensional translation, across entire programs, and throughout time. These are not trivial semantics, because our historically loose choice of terms, and our inability to even partially define them, has had a dramatic, counterproductive impact on our thinking. We have a map of codon information in one dimension, and we should consider it as such. Nothing more. Even at that, it is flawed.

    I have asked dozens of working scientists, scoured numerous references, and I have yet to be given an appropriate, useful definition of the genetic code. Definitions, by definition, are whatever we say they are. It's like the price of gold: it's not the inherent value of the metal, it's whatever we're willing to accept as a value. The value of a definition should track with its value in describing something in a useful way. On that score, I will share, in my opinion, the best definition that I have found. This definition comes from Horton's Principles of Biochemistry, Third Edition.

    genetic code. The correspondence between a particular three nucleotide codon and the amino acid it specifies. The standard genetic code of 64 codons is used by almost all organisms. The genetic code is used to translate the sequence of nucleotides in mRNA into protein.

    This is a most artful state-of-the-art definition. It has one of the required appropriate qualifiers, and it avoids the classic pitfalls of including words like universal, linear, one-dimensional and simple, that are thrown around with abandon elsewhere. But what does it really tell us? It tells us that the genetic code is a map with only one dimension of information, that dimension being the correlation between codons and amino acids. It says that there are always sixty-four codons, which is probably wrong, and it should stop there, because from there it becomes even more misleading.

    The final sentence should be replaced with another qualifier: The genetic code is part of an unknown algorithm that living things somehow use to make proteins. The standing definition here strongly implies that the genetic code and nucleotide sequence information - and nothing else - are all that is required in making proteins. This is the stated rule, and of course there are exceptions to every rule, but in this case the exception is the rule. We have yet to find even a single case where the rule applies.

    The definition is invalid because Da Vinci was right. Organic molecules are composites of shapes, not lines of identities. Knowing the primary sequence of amino acids has proven to be entirely different from knowing the shape of a protein. Information in addition to primary sequence must be brought to bear during translation. We presently do not have these extra dimensions of information, and we certainly do not have the rules of their application during translation.

    Predicting the primary sequence of a protein is relatively easy, but predicting the shape of a folded protein requires tremendous effort. We must compile gigantic piles of data, which must then be thoroughly and creatively massaged, and even still we have less than sterling results. More curious, the best data for massage is nucleotide sequence data, not amino acid sequence data. This implies that the mysteriously missing information might also be found somewhere in nucleotide sequences, more so than amino acids.

    Did we really expect it to be that simple? Consider the problem on its face. DNA, for all intents and purposes, is stored in a single changeless shape, whereas proteins are defined by diverse and dynamic shapes. Whatever the process, the translation of information from DNA to protein involves a shift from monotony to diversity. Molecular identities expand only four to twenty, but bond identities expand sixteen to millions. The magic of translation and the expansion of organic information therefore lie in the bonds between the molecules of molecular strings. This should be the focus of any investigation, definition, or map of the genetic