computational efficiency of intermolecular gene assembly

Tseren-Onolt Ishdorj | Remco Loos | Ion Petre

Computational Efficiency of Intermolecular Gene Assembly

TUCS Technical Report No 826, June 2007

Tseren-Onolt Ishdorj
Abo Akademi University, Department of Information Technologies
Lemminkaisenkatu 14 A, 20520 Turku, [email protected]

Remco Loos
Research Group on Mathematical Linguistics
Rovira i Virgili University
Placa Imperial Tarraco 1, 43005 Tarragona, [email protected]

Ion Petre
Abo Akademi University, Department of Information Technologies
Turku Centre for Computer Science
Lemminkaisenkatu 14 A, 20520 Turku, [email protected] of Finland

Abstract

In this paper, we investigate the computational efficiency of gene rearrangement operations found in ciliates, a type of unicellular organism. We show how the so-called guided recombination systems, which model this gene rearrangement, can be used as problem solvers. Specifically, we prove that these systems can uniformly solve SAT in time O(n · m) for a Boolean formula of m clauses over n variables.

Keywords: Computational efficiency, intermolecular gene assembly in ciliates

TUCS Laboratory
Computational Biomodelling Laboratory

1 Introduction

Ciliates are unicellular organisms [1] which have attracted attention from computer scientists because of the complex nature of the gene rearrangement of some species. Specifically, the DNA in their micronucleus, used for conjugation only, is transformed into shorter molecules used for transcription. This process is called gene assembly and is in some sense reminiscent of the use of “linked lists” in software engineering, see [1].

Two main computational models have been proposed to model gene assembly in ciliates. The so-called intermolecular model, introduced by Landweber and Kari [10, 8], allows for operations involving two molecules. The intramolecular model, proposed by Ehrenfeucht, Prescott and Rozenberg [2, 14], contains only operations acting on a single molecule. The computational power of the intermolecular model has been well studied; specifically, in [8] it is shown that in some formulation the model is as powerful as a Turing machine. It was recently proved that the intramolecular model is also computationally universal [7].

Continuing the investigation of gene assembly from the perspective of computability theory, it was recently proved that the intramolecular model is computationally efficient: SAT may be solved in this model in linear time, see [6]. We refer to [11] for a related result on splicing systems. In this paper we address the same question for the intermolecular model. We show how the guided recombination model of [8] can be regarded as a problem solving device. The model we consider involves the maximal parallel application of contextual recombination rules, as defined in this paper. We present an algorithm to show that in this model, SAT can be solved in time O(n · m) by a guided recombination system, with n denoting the number of variables and m the number of clauses.

2 Preliminaries and Notation

We assume the reader to be familiar with the basic elements of formal languages, Turing computability [15], DNA computing [13], and computational complexity [12]. We present here only some of the necessary notions and notation.

An alphabet is a finite set of symbols (letters), and a word (string) over an alphabet Σ is a finite sequence of letters from Σ; the empty word is denoted by λ. The set of all words over an alphabet Σ is denoted by Σ∗. The set of all non-empty words over Σ is denoted by Σ+, i.e., Σ+ = Σ∗ \ {λ}. The length |x| of a word x is the number of symbols that x contains. The empty word has length 0.

We also define circular words over Σ by declaring two words u, v to be equivalent if and only if u = xy and v = yx, for some words x, y. We also call u, v conjugates. Then the circular word •w is the equivalence class of w with respect to this relation, for all w ∈ Σ∗. The set of all circular words over Σ is denoted by Σ•.
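The conjugacy relation can be made concrete with a short Python sketch. It assumes words are ordinary strings of one-character symbols; the function names `are_conjugates` and `circular_word` are hypothetical, and choosing the lexicographically least rotation as the representative of •w is merely one convenient convention, not something prescribed above.

```python
def are_conjugates(u: str, v: str) -> bool:
    """Return True iff u = xy and v = yx for some words x, y."""
    # v is a rotation of u exactly when it has the same length
    # and occurs as a factor of uu.
    return len(u) == len(v) and v in u + u


def circular_word(w: str) -> str:
    """Pick a canonical representative of the class •w (assumed convention:
    the lexicographically least rotation)."""
    if not w:
        return w
    return min(w[i:] + w[:i] for i in range(len(w)))
```

With this convention two words are conjugates exactly when `circular_word` returns the same representative for both.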

A splicing scheme [5] is a pair R = (Σ, ∼), where Σ is an alphabet and ∼, the pairing relation of the scheme, satisfies ∼ ⊆ Σ∗Σ+Σ∗ × Σ∗Σ+Σ∗. Assume we have two strings x, y and a binary relation between two triples of words (α, p, β) ∼ (α′, p, β′), such that x = x′αpβx′′ and y = y′α′pβ′y′′; then the strings obtained by recombination in the above context are z_1 = x′αpβ′y′′ and z_2 = y′α′pβx′′.

Given a pair (α, p, β) ∼ (α′, p, β′) and two strings x and y as above, x = x′αpβx′′ and y = y′α′pβ′y′′, we consider just the string z_1 = x′αpβ′y′′ as the result of the recombination (we call it one-output recombination), because the string z_2 = y′α′pβx′′ is the result of the one-output recombination with respect to the symmetric pair (α′, p, β′) ∼ (α, p, β).
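As an illustration, one-output recombination can be sketched in Python as follows. The sketch assumes one-character symbols so that words are plain strings; the names `occurrences` and `one_output_recombination` are hypothetical. It returns every z_1 obtainable from x and y, one per choice of matching sites.

```python
def occurrences(s: str, t: str) -> list:
    """All start positions at which t occurs in s."""
    return [i for i in range(len(s) - len(t) + 1) if s.startswith(t, i)]


def one_output_recombination(x: str, y: str, left, right) -> set:
    """left = (alpha, p, beta) and right = (alpha2, p, beta2) with left ~ right.
    Returns all strings z1 = x' alpha p beta2 y''."""
    (a, p, b), (a2, p2, b2) = left, right
    if p != p2:
        raise ValueError("both triples must share the repeat p")
    results = set()
    for i in occurrences(x, a + p + b):
        for j in occurrences(y, a2 + p + b2):
            prefix = x[: i + len(a) + len(p)]   # x' alpha p
            suffix = y[j + len(a2) + len(p):]   # beta2 y''
            results.add(prefix + suffix)
    return results
```

Calling the function with the arguments swapped, that is with the symmetric pair, yields z_2.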

A Boolean expression is an expression composed of variables, parentheses and the operators of negation (written x̄), ∧ and ∨. The variables can take values 0 (false) and 1 (true). An expression is satisfiable if there is some assignment of variables such that the expression is true. The satisfiability problem, commonly denoted as SAT, is to determine, given a Boolean expression, whether it is satisfiable. SAT is a well-known NP-complete problem. A Boolean expression is said to be in conjunctive normal form (CNF) if it is of the form E_1 ∧ E_2 ∧ ··· ∧ E_k, where each E_i, called a clause, is of the form α_{i1} ∨ α_{i2} ∨ ··· ∨ α_{ir_i}, where each α_{ij} is a literal, that is, either x or x̄, for some variable x.

3 Guided recombination systems

A splicing scheme is a pair P = (Σ, ∼), where Σ is an alphabet and ∼, the pairing relation of the scheme, satisfies ∼ ⊆ Σ∗Σ+Σ∗ × Σ∗Σ+Σ∗. In the splicing scheme P = (Σ, ∼), pairs (α, p, β) ∼ (α′, p, β′) define the contexts necessary for a recombination between the repeats p. The contextual intramolecular recombination was defined in [8] as

(del_p) {upwpv} =⇒_{del_p} {upv, •wp},

where u = u′α, w = βw′ = w′′α′, v = β′v′, and (α, p, β) ∼ (α′, p, β′).

This constrains intramolecular recombination within upwpv to occur only if the restrictions of the splicing scheme concerning p are fulfilled, i.e., the first occurrence of p is preceded by α and followed by β and its second occurrence is preceded by α′ and followed by β′.
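A minimal Python sketch of del_p under these context restrictions is given below. It assumes one-character symbols (strings as strands), and the name `del_p` and the pair-of-pairs encoding of the contexts are illustrative choices only. The function enumerates every admissible pair of occurrences of p and returns the linear product upv together with one linear reading wp of the circular product •wp.

```python
def del_p(strand: str, p: str, ctx1, ctx2) -> list:
    """ctx1 = (alpha, beta), ctx2 = (alpha2, beta2) with
    (alpha, p, beta) ~ (alpha2, p, beta2).
    Returns a list of (u p v, w p) pairs; w p is one reading of the circle."""
    (alpha, beta), (alpha2, beta2) = ctx1, ctx2
    k = len(p)
    starts = [i for i in range(len(strand) - k + 1) if strand.startswith(p, i)]
    results = []
    for i in starts:
        for j in starts:
            if j < i + k:
                continue  # the second occurrence must lie after the first
            u, w, v = strand[:i], strand[i + k:j], strand[j + k:]
            # u = u'alpha, w = beta w' = w''alpha2, v = beta2 v'
            if (u.endswith(alpha) and w.startswith(beta)
                    and w.endswith(alpha2) and v.startswith(beta2)):
                results.append((u + p + v, w + p))
    return results
```

For instance, with a, b, c standing for x_1, x_2, x_3, the call `del_p("abpcpa", "p", ("ab", "c"), ("c", "a"))` excises the circle read as `cp` and leaves `abpa`.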

Also, if (α, p, β) ∼ (α′, p, β′), then the contextual intermolecular recombination was defined in [8] as

(ins_p) {upv, •wp} =⇒_{ins_p} {upwpv},

where u = u′α, v = βv′, w = w′α′ = β′w′′, and (α, p, β) ∼ (α′, p, β′).

Intermolecular recombination between the linear strand upv and the circular strand •wp may take place only if the occurrence of p in the linear strand is flanked by α and β and its occurrence in the circular strand is flanked by α′ and β′.
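The insertion operation can be sketched in the same style. The code below again assumes one-character symbols and a non-empty repeat p; the name `ins_p` is hypothetical, and the circular strand may be passed in any rotation, since the function tries every rotation that reads wp.

```python
def ins_p(linear: str, circular: str, p: str, ctx1, ctx2) -> list:
    """ctx1 = (alpha, beta) constrains the linear strand u p v,
    ctx2 = (alpha2, beta2) constrains the circular strand •w p.
    Returns all strings u p w p v."""
    (alpha, beta), (alpha2, beta2) = ctx1, ctx2
    k = len(p)
    rotations = {circular[r:] + circular[:r] for r in range(len(circular))}
    results = []
    for rot in sorted(rotations):
        if not rot.endswith(p):
            continue
        w = rot[:-k]
        # w = w'alpha2 = beta2 w''
        if not (w.endswith(alpha2) and w.startswith(beta2)):
            continue
        for i in range(len(linear) - k + 1):
            if not linear.startswith(p, i):
                continue
            u, v = linear[:i], linear[i + k:]
            # u = u'alpha, v = beta v'
            if u.endswith(alpha) and v.startswith(beta):
                results.append(u + p + w + p + v)
    return results
```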

Definition 1 For a splicing scheme P = (Σ, ∼), we define the set of all contextual gene rearrangement operations under the guidance of the splicing scheme P as follows:

P = {ins_p, del_p | (α, p, β) ∼ (α′, p, β′) for some α, α′, β, β′ ∈ Σ∗}.

We now define a guided recombination system that captures the series of dispersed homologous recombination events that take place during scrambled gene rearrangement in ciliates.

Definition 2 A guided recombination system is a triple R = (Σ, ∼, t) where (Σ, ∼) is a splicing scheme, and t ∈ Σ+ is a linear string called the axiom.

A guided recombination system R defines a derivation relation that produces a new multiset from a given multiset of linear and circular strands, as follows. Starting from a “collection” (multiset) of strings with a certain number of available copies of each string, the next multiset is derived from the first one by an intra- or intermolecular recombination between existing strings. The strands participating in the recombination are “consumed” (their multiplicity decreases by 1) whereas the products of the recombination are added to the multiset (their multiplicity increases by 1).

For two multisets S and S′ in Σ∗ ∪ Σ•, we say that S derives S′, and we write S =⇒_R S′, iff one of the following two cases holds:

(1) there exist x ∈ S and y, •z ∈ S′ such that

– {x} =⇒_{del} {y, •z} according to an intramolecular recombination step in R,

– S′(x) = S(x) − 1, S′(y) = S(y) + 1, S′(•z) = S(•z) + 1, and S′(u) = S(u) for all u ∉ {x, y, •z};

(2) there exist x′, •y′ ∈ S and z′ ∈ S′ such that

– {x′, •y′} =⇒_{ins} {z′} according to an intermolecular recombination step in R,

– S′(x′) = S(x′) − 1, S′(•y′) = S(•y′) − 1, S′(z′) = S(z′) + 1, and S′(u) = S(u) for all u ∉ {x′, •y′, z′}.
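The multiplicity bookkeeping of the two cases can be sketched with Python's `collections.Counter`. This is only an illustration of the update rule: the helper names `derive_del` and `derive_ins` are hypothetical, circular strands are marked by a leading "•" as an assumed encoding, and the check that the step is actually licensed by the splicing scheme is left out.

```python
from collections import Counter


def derive_del(S: Counter, x: str, y: str, z: str) -> Counter:
    """Case (1): consume one copy of the linear strand x,
    add the linear strand y and the circular strand z."""
    if S[x] < 1:
        raise ValueError("no copy of x available")
    T = Counter(S)
    T[x] -= 1
    T[y] += 1
    T[z] += 1
    return +T  # unary plus drops strands whose multiplicity fell to 0


def derive_ins(S: Counter, x: str, y: str, z: str) -> Counter:
    """Case (2): consume the linear strand x and the circular strand y,
    add the linear strand z."""
    if S[x] < 1 or S[y] < 1:
        raise ValueError("a participating strand is not available")
    T = Counter(S)
    T[x] -= 1
    T[y] -= 1
    T[z] += 1
    return +T
```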

Those strands which, by repeated recombinations with initial and intermediate strands, eventually produce the axiom form the language of the guided recombination system. Formally,

L_a^k(R) = {w ∈ Σ∗ | {(w, k)} =⇒∗_R S and t ∈ S}

((w, k) indicates the fact that the multiplicity of w equals k).

Guided recombination systems are proved in [8] to be equivalent to Turing machines:

Theorem 1 ([8]) Let L be a language over T∗ accepted by a Turing machine TM = (S, Σ ∪ {#}, P). Then there exist an alphabet Σ′, a sequence π ∈ Σ′∗, depending on L, and a recombination system R such that a word w over T∗ is in L iff #^6 s_0 w #^6 π belongs to L_a^k(R) for some k ≥ 1.

In line with this result, we define acceptance for guided recombination systems as follows.

Definition 3 We say a guided recombination system R = (Σ, ∼, t) accepts a string w iff there exists a k ≥ 1 such that w ∈ L_a^k(R).

In other words, a guided recombination system accepts a string w if it generates the axiom when starting with a sufficient number of copies of w.

We now consider parallelism for the guided recombination model. Intuitively, a number of operations can be applied in parallel to a string if the applicability of each operation is independent of the applicability of the other operations.

In this paper we use a notion of parallelism following [6], which is the maximally parallel application of a rule to a string.

First, we define the working places of an operation π ∈ P on a given string where π is applicable.

Definition 4 Let w be a string. The working places of an operation π ∈ P with respect to a multiset S for w is a set of substrings of w, written as Wp(π(w)) and defined by

Wp(del_p(w)) = {upw′pv ∈ Sub(w) | {upw′pv} =⇒_{del_p} {upv, •w′p}},

Wp(ins_p(w)) = {upv ∈ Sub(w) | {upv, •w′p} =⇒_{ins_p} {upw′pv} for some •w′p ∈ S}.

Definition 5 Let w be a string. The smallest working places of an operation π ∈ P for w is a subset of Wp(π(w)), written as Wps(π(w)) and defined by

Wps(π(w)) = {w_1 ∈ Wp(π(w)) | for all w′_1 ∈ Sub(w_1) with w′_1 ≠ w_1, w′_1 ∉ Wp(π(w))}.

Definition 6 Let Σ be a finite alphabet and P the set of rules defined above. Let π ∈ P and u ∈ Σ∗. We say that v ∈ Σ∗ is obtained from u by applying π in a maximally parallel way, denoted u =⇒^{max}_π v, if

u = α_1u_1α_2u_2 ... α_ku_kα_{k+1}, and v = α_1v_1α_2v_2 ... α_kv_kα_{k+1},

where u_i ∈ Wps(π(u)), v_i ∈ Σ∗ for all 1 ≤ i ≤ k, and also α_i ∉ Wp(π(u)) for all 1 ≤ i ≤ k + 1.

Example 1 Let del_p be the contextual deletion operation applied in the context (x_1x_2, p, x_3) ∼ (x_3, p, x_1), and consider the string u = x_1x_2px_3px_1x_2px_3px_1. The unique correct result obtained by maximally parallel application of del_p to u is:

x_1x_2px_3px_1x_2px_3px_1 =⇒^{max}_{del_p} x_1x_2px_1x_2px_1.
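Example 1 can be reproduced with the following Python sketch. It is a greedy left-to-right approximation of Definitions 4 to 6 rather than a full implementation: strands are lists of tokens, p is a single token, the excised segments may not overlap while the flanking contexts may be shared between neighbouring deletions (as they are in the example), and the name `max_parallel_del` is hypothetical.

```python
def max_parallel_del(tokens, p, ctx1, ctx2):
    """ctx1 = (alpha, beta), ctx2 = (alpha2, beta2), each a list of tokens.
    Returns (resulting linear strand, list of excised circles read as w p)."""
    (alpha, beta), (alpha2, beta2) = ctx1, ctx2
    n = len(tokens)
    need = max(len(beta), len(alpha2))  # w must contain both beta and alpha2

    def has(seq, at):
        return at >= 0 and tokens[at:at + len(seq)] == seq

    def first_ok(i):   # p preceded by alpha and followed by beta
        return tokens[i] == p and has(alpha, i - len(alpha)) and has(beta, i + 1)

    def second_ok(j):  # p preceded by alpha2 and followed by beta2
        return tokens[j] == p and has(alpha2, j - len(alpha2)) and has(beta2, j + 1)

    out, circles, i = [], [], 0
    while i < n:
        j = None
        if first_ok(i):
            j = next((t for t in range(i + 1 + need, n) if second_ok(t)), None)
        if j is None:
            out.append(tokens[i])
            i += 1
            continue
        # Latest admissible first occurrence before j gives the smallest place.
        s = max(t for t in range(i, j - need) if first_ok(t))
        out.extend(tokens[i:s + 1])          # keep everything up to the first p
        circles.append(tokens[s + 1:j + 1])  # excised circular strand w p
        i = j + 1
    return out, circles
```

Applied to the string of Example 1, the sketch performs two deletions in one pass, each excising a circle read as x3 p.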

Finally, if in a guided recombination system R = (Σ, ∼, t), for some multiplicity k, {(w, k)} =⇒^n_R S with t ∈ S, we say that R accepts the string w in time n.

4 Efficiency of guided recombination systems

In this section, we use guided recombination systems as decision problem solvers. A correspondence between decision problems and languages can be established via an encoding function which transforms an instance of a given decision problem into a word, see, e.g., [3].

Definition 7 We say that a decision problem X is solved in time O(t(n)) by guided recombination systems if there exists a family A of guided recombination systems such that the following conditions are satisfied:

1. The encoding function of any instance x of X having size n can be computed by a deterministic Turing machine in time O(t(n)).

2. For each instance x of size n of the problem one can effectively construct, in time O(t(n)), an intermolecular guided recombination system G(x) ∈ A which accepts, again in time O(t(n)), the word encoding the instance x if and only if the solution to the given instance of the problem is YES.

Moreover, we say that a solution is uniform if all instances of the same size are solved by the same guided recombination system.

Theorem 2 SAT can be solved uniformly and deterministically by a guided recombination system in time O(n · m), where n denotes the number of variables and m the number of clauses.

Proof. Let us consider a propositional formula φ of m clauses over n variables in conjunctive normal form. Thus φ = C_1 ∧ ··· ∧ C_m, such that each clause C_j, 1 ≤ j ≤ m, is of the form C_j = 〈y_{j,1} ∨ ··· ∨ y_{j,k_j}〉, k_j ≥ 1, where y_{j,k} ∈ {x_i, x̄_i | 1 ≤ i ≤ n}, 1 ≤ k ≤ k_j.

We encode each clause C_j as a string bounded by $_j in the following form:

c_j = $_j†C_j†$_j.

The instance φ is encoded as follows:

φ = c_1 ... c_m$_{m+1}††x_1††x̄_1†† ... ††x_n††x̄_n††$_{m+1}$_{m+2}.

It is easily seen that the size of the encoding is linear in n and m. The string appended to the formula contains both values for all variables in φ.
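The encoding can be illustrated by a short Python sketch. Several conventions here are assumptions made for the example and not part of the construction: clauses are given as lists of non-zero integers (i for x_i, −i for its negation), the word is a list of tokens, markers are spelled "$1", "$2", and so on, and a negated variable is spelled with a leading "¬". The function names are hypothetical.

```python
def literal(l: int) -> str:
    """Token for a literal: x_i for i > 0, its negation for i < 0."""
    return f"x{l}" if l > 0 else f"¬x{-l}"


def encode(clauses, n):
    """Token list for c_1 ... c_m followed by the block of variable values."""
    m = len(clauses)
    tokens = []
    for j, clause in enumerate(clauses, start=1):
        body = []
        for k, lit in enumerate(clause):
            if k:
                body.append("∨")
            body.append(literal(lit))
        # c_j = $_j † 〈 ... 〉 † $_j
        tokens += [f"${j}", "†", "〈", *body, "〉", "†", f"${j}"]
    tokens.append(f"${m + 1}")
    for i in range(1, n + 1):
        tokens += ["†", "†", f"x{i}", "†", "†", f"¬x{i}"]
    tokens += ["†", "†", f"${m + 1}", f"${m + 2}"]
    return tokens
```

In this sketch the appended block contains 4n + 2 occurrences of †, so that excising the 2n variable values (each together with one †) leaves the 2n + 2 consecutive † used later in the proof.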

We design a guided recombination system which solves the encoded instance of SAT in the following steps.

1. Excise the variable values.

2. Insert a valued variable after each clause.

3. Check if the inserted variable satisfies the clause.

4. Check if the inserted variables are consistent.

5. Generate the axiom if and only if both checks are successful.
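The logic behind steps 2 to 5 can be mimicked by a brute-force Python sketch. It does not simulate recombination; it only enumerates, sequentially, the choices that the system explores through multiplicity, and it is therefore exponential in m. The integer encoding of literals (i for x_i, −i for its negation) and the name `accepts_by_guessing` are assumptions made for the illustration.

```python
from itertools import product


def accepts_by_guessing(clauses, n):
    """True iff some choice of one inserted literal per clause both
    verifies every clause and is consistent."""
    literals = [s * i for i in range(1, n + 1) for s in (1, -1)]
    for inserted in product(literals, repeat=len(clauses)):       # step 2
        verified = all(z in c for z, c in zip(inserted, clauses))  # step 3
        consistent = all(-z not in inserted for z in inserted)     # step 4
        if verified and consistent:                                # step 5
            return True
    return False
```

A verifying and consistent choice fixes the value of the chosen variables and thereby satisfies every clause, which is why the two checks together characterise satisfiability.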

Specifically, given a Boolean formula φ with m clauses over n variables, we construct a guided recombination system

G = (Σ, ∼, $),

with

Σ = {$_i | 1 ≤ i ≤ m + 2} ∪ {x_i, x̄_i | 1 ≤ i ≤ n} ∪ {∨, 〈, 〉, †},

$ = $_1$_2 ... $_m$_{m+1}$_{m+2}.

The relation ∼ is defined as follows, where x ∈ {x_i, x̄_i | 1 ≤ i ≤ n}, b ∈ {∨, 〈}, e ∈ {∨, 〉} and 1 ≤ j ≤ m. Also, x̄ = x̄_i if x = x_i, and x̄ = x_i if x = x̄_i.

(†, †, x) ∼ (x, †, †), (1)

(〉, †, $_j$_{j+1}) ∼ (x, †, x), (2)

(b, x, e) ∼ (〉†, x, †), (3)

(bx, †, λ) ∼ (bx̄†$_j$_{j+1}, †, 〈), (4)

(†$_m, $_{m+1}, λ) ∼ (λ, $_{m+1}, $_{m+2}), (5)

(λ, $_j, †) ∼ (bx†, $_j, $_{j+1}$_{j+2}). (6)
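To make the size of the relation tangible, the following Python sketch enumerates the context pairs (1) to (6) as tuples of tokens. Several choices are assumptions made for the illustration: the token spelling ("$1", "¬x1"), the function names, and in particular the reading that the two occurrences of b in (4) range independently over {∨, 〈}, which the definition above does not settle. Under these assumptions the sketch produces 10n + 14nm + 1 pairs.

```python
from itertools import product


def neg(x: str) -> str:
    """Complementary literal token."""
    return x[1:] if x.startswith("¬") else "¬" + x


def build_relation(n: int, m: int) -> list:
    """List of pairs ((alpha, p, beta), (alpha2, p, beta2)); contexts are tuples."""
    lits = [t for i in range(1, n + 1) for t in (f"x{i}", f"¬x{i}")]
    B, E = ("∨", "〈"), ("∨", "〉")
    d = lambda j: f"${j}"
    rel = []
    for x in lits:
        rel.append(((("†",), "†", (x,)), ((x,), "†", ("†",))))                 # (1)
        for b, e in product(B, E):
            rel.append((((b,), x, (e,)), (("〉", "†"), x, ("†",))))               # (3)
        for j in range(1, m + 1):
            rel.append(((("〉",), "†", (d(j), d(j + 1))), ((x,), "†", (x,))))     # (2)
            for b in B:
                rel.append((((), d(j), ("†",)),
                            ((b, x, "†"), d(j), (d(j + 1), d(j + 2)))))          # (6)
            for b, b2 in product(B, B):  # assumed: the two b's vary independently
                rel.append((((b, x), "†", ()),
                            ((b2, neg(x), "†", d(j), d(j + 1)), "†", ("〈",))))   # (4)
    rel.append(((("†", d(m)), d(m + 1), ()), ((), d(m + 1), (d(m + 2),))))       # (5)
    return rel
```

Whichever reading of (4) is taken, the number of pairs grows as O(n · m).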

The size of this system is O(n · m) and it is not hard to see that it can be constructed by a deterministic Turing machine in time O(n · m).

We will show that G decides SAT for a given input φ, that is, that G accepts the encoding of φ if and only if φ is satisfiable.

For the if-part, consider the input string

$_1†C_1†$_1 ... $_m†C_m†$_m$_{m+1}††x_1††x̄_1†† ... ††x_n††x̄_n††$_{m+1}$_{m+2}.

To this word we can apply the operation del_† using the contexts of (1). In fact, we apply 2n del_†-operations in parallel, giving

$_1†C_1†$_1 ... $_m†C_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}

as well as the circular strings •x† for all x ∈ {x_i, x̄_i | 1 ≤ i ≤ n}.

In the next step, these circular strings can be inserted after each encoding of a clause of φ using contexts (2). Again, this is done in parallel for all clauses, so with m ins_†-operations we obtain a string of the form

$_1†C_1†z_1†$_1 ... $_m†C_m†z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}

with each z_j, 1 ≤ j ≤ m, in {x_i, x̄_i | 1 ≤ i ≤ n}. We interpret these inserted variables as an assignment, where the variable inserted after each clause verifies this clause. It is important to note that the same variable can be inserted more than once, up to m times, into the same string, since we also have at our disposal circular strings excised from other copies of the input string. Recall that by Definition 3 we can assume that the input word is present in the multiplicity needed to generate all possible assignments of a verifying variable to a clause. If m > 2n, extra multiplicity is needed to provide enough variables to insert.

If the formula is satisfiable, there is at least one inserted assignment in which all inserted variables effectively verify the clause preceding them. In this case, we can apply the contexts of (3) to perform m del_x-operations. This gives m strings of the form •∨ ··· ∨ y_{j,k_j}〉†z_j and a string

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}.

Recall that each clause C_j, 1 ≤ j ≤ m, is of the form C_j = 〈y_{j,1} ∨ ··· ∨ y_{j,k_j}〉, k_j ≥ 1, where y_{j,k} ∈ {x_i, x̄_i | 1 ≤ i ≤ n}, 1 ≤ k ≤ k_j.

Again, if the formula is satisfiable, there is at least one inserted assignment which verifies all clauses and is consistent, in the sense that if a variable x_i is inserted in some place, no variable x̄_i is inserted after another clause. This means no context of (4) can apply to the string, since this requires the simultaneous presence of x_i and x̄_i in the string.

Now we can apply del_{$_{m+1}} using context (5). In this and the following steps, no parallel applications are possible, so the derivation goes on sequentially. First, del_{$_{m+1}} yields •†^{2n+2}$_{m+1} and

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}$_{m+2}.

From here, we successively apply del_{$_m} to del_{$_1} using the contexts of (6). For instance, del_{$_m} gives •†〈y_{m,1} ∨ ··· ∨ z_m†$_m and

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_{m−1}†〈y_{m−1,1} ∨ ··· ∨ z_{m−1}†$_{m−1}$_m$_{m+1}$_{m+2},

after which del_{$_{m−1}} can be applied. This process goes on until del_{$_1} yields the axiom

$_1$_2 ... $_{m+1}$_{m+2}.

This means G accepts φ.

For the only if-part, assume that φ is not satisfiable. In this case, the first two steps go on exactly as described above, giving

$_1†C_1†z_1†$_1 ... $_m†C_m†z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}.

However, φ is not satisfiable. This means that for all inserted assignments at least one of the following holds:

1. Not all clauses are verified by the variable inserted after them.

2. The inserted assignment is inconsistent.

For case 1, assume that only clause C_l is not verified by z_l. Then we cannot apply the m parallel del_x-operations as before. In fact, we can only apply m − 1 operations, giving

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_l†〈 ... 〉†z_l†$_l ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}.

Alternatively, if m > 2n, z_l may not have been inserted. Also then del_{z_l} cannot be applied and we get

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_l†〈 ... 〉†$_l ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}.

In both of these cases, if z_{l+1} happens to verify clause l, we could apply del_{z_{l+1}} differently, resulting in •∨ ... 〉(†z_l)†$_l$_{l+1}† ... 〉†z_{l+1} and

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_l†〈 ... z_{l+1}†$_{l+1} ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}.

A similar situation can arise if z_l happens to verify clause l − 1. Then, instead of del_{z_{l−1}} we could also apply del_{z_l}, resulting in •∨ ... 〉†z_{l−1}†$_{l−1}$_l† ... 〉†z_l and

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_{l−1}†〈 ... z_l†$_l ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}.

By our maximality requirement, these are the only possibilities. We suppose that no del_† using contexts (4) can be applied (if it can, this is treated under case 2). Now, del_{$_{m+1}} using context (5) is applied as before, and so is del_{$_j} by (6), until arriving at del_{$_l} (or del_{$_{l+1}}). None of the strings obtained above satisfies the context of (6) at that point, so the derivation of the axiom cannot continue. No other operations can take place, so G does not accept φ by this assignment. If more than one variable does not satisfy its clause, the situation is the same, except that we can get more substrings of the form $_l†〈 ... 〉(†z_l)†$_l or $_k†〈 ... z_l†$_l, k < l.

For case 2, assume all variables satisfy their clauses. In this case, m del_x-operations apply as before, giving

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}†^{2n+2}$_{m+1}$_{m+2}.

Now suppose that z_p and z_q are inconsistent, for p < q. Then, in the same step as del_{$_{m+1}}, we also apply del_† by context (4). This gives •$_p ··· ∨ z_q†$_q$_{q+1}† and

$_1†〈y_{1,1} ∨ ··· ∨ z_1†$_1 ··· ∨ z_p†〈 ... 〉z_{q+1}†$_{q+1} ... $_m†〈y_{m,1} ∨ ··· ∨ z_m†$_m$_{m+1}$_{m+2}.

Note that if case 1 holds for any z_l unequal to z_p and z_q, the same del_† will still take place. As in case 1, the created string does not satisfy the context of del_{$_q}, so the axiom cannot be derived.

Since for all possible assignments at least one case holds, the axiom is not generated, so G does not accept φ.

Finally, since each context can induce both a del- and an ins-operation, we should say a few words about the operations not mentioned before.

• ins_† by context (1) is not possible since we never have any circular string with two consecutive symbols †.

• del_† by (2) cannot happen because we never have a substring x†x in a circular string.

• ins_x by (3) is impossible because none of the circular strings we obtain has the substring 〉†x†.

• ins_† by (4) cannot take place since none of the circular strings contains the substring ∨x$_j$_{j+1}†〈 or 〈x$_j$_{j+1}†〈.

• ins_{$_{m+1}} by (5) is not possible because no circular string contains $_{m+1}$_{m+2} (the circular strings generated by del_x (3) and del_† (4) only contain $_1 to $_m).

• ins_{$_j} by (6) needs a circular string containing $_j$_{j+1}$_{j+2}, which is never created.

Our algorithm has a linear running time. Excision of the variables takes at most m steps, since we need to excise from at most m copies of the input string. If the formula is satisfiable, we obtain the axiom after m + 3 additional steps, giving a total running time of 2m + 3 steps. Finally, the system G we constructed solves all instances of SAT of m clauses over n variables, thus making our solution uniform. This concludes the proof. □
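The step count can be tallied explicitly. The breakdown of the m + 3 additional steps used below (one parallel insertion step, one parallel verification step, one application of (5), and m sequential applications of (6)) is our reading of the derivation described in the proof; the function name `derivation_steps` is hypothetical.

```python
def derivation_steps(m: int):
    """Return (excision steps, additional steps, total) for m clauses."""
    excision = m        # at most m steps to excise the variable values
    insertion = 1       # parallel ins by contexts (2)
    verification = 1    # parallel del by contexts (3)
    tail_removal = 1    # del by context (5)
    marker_removal = m  # sequential del by contexts (6), from $_m down to $_1
    additional = insertion + verification + tail_removal + marker_removal
    return excision, additional, excision + additional
```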

5 Final Remarks

In the present paper we propose a computing model which is based on the ciliate intramolecular gene assembly model developed, for instance, in [1]. The model is mathematically elegant and biologically well-motivated because only three types of operations (two of them are formalizations of the gene assembly process in ciliates) and a single string are involved. Context-sensitivity for string operations is already well known in formal language theory, see [?, 9]. Moreover, parallelism is a characteristic feature of bio-inspired computing models, starting with DNA computing, and this is also the case with our model. From a computer science point of view, the model (eAIR system) is both as powerful as Turing machines and efficient in solving intractable problems in feasible time. Investigating the other computability characteristics of eAIR systems could be worthwhile.

6 Conclusion

In this paper we considered a model of gene rearrangement in ciliates. We showed that this model can be used as an efficient problem solving device by presenting an algorithm for solving SAT in time O(n · m). One especially interesting feature of our algorithm is that we show that using small local contexts one can perform correctness and consistency checks over arbitrarily large distances. We believe that the study of the gene rearrangement process in ciliates and its formal modelling is not only interesting from a biological point of view, but can also be beneficial for the study of computation.

Acknowledgments. The work of T.-O.I. is supported by the Center for International Mobility (CIMO) Finland, grant TM-06-4036, and by the Academy of Finland, project 203667. The work of R.L. was supported by Research Grant BES-2004-6316 of the Spanish Ministry of Education and Science. The work of I.P. is supported by the Academy of Finland, project 108421.

References

[1] Ehrenfeucht, A., Harju, T., Petre, I., Prescott, D. M., and Rozenberg, G., Computation in Living Cells: Gene Assembly in Ciliates, Springer, 2003.

[2] Ehrenfeucht, A., Prescott, D. M., and Rozenberg, G., Computational aspects of gene (un)scrambling in ciliates. In: L. F. Landweber, E. Winfree (eds.) Evolution as Computation, Springer, Berlin, Heidelberg, New York, 216–256, 2001.

[3] Garey, M., Johnson, D., Computers and Intractability. A Guide to the Theory of NP-completeness, Freeman, San Francisco, CA, 1979.

[4] Harju, T., Petre, I., and Rozenberg, G., Two models for gene assembly in ciliates, Theory is forever, LNCS 3113:89–101, 2004.

[5] Head, T., Formal Language Theory and DNA: an analysis of the generative capacity of specific recombinant behaviors. Bull. Math. Biology 49:737–759, 1987.

[6] Ishdorj, T.-O., and Petre, I., Computing through gene assembly, accepted in Int. Conf. UC’07, 13–17 August 2007, Kingston, Canada.

[7] Ishdorj, T.-O., Petre, I., and Rogojin, V., Computational power of intramolecular gene assembly, International Journal of Foundations of Computer Science, to appear, 2007.

[8] Kari, L., and Landweber, L. F., Computational power of gene rearrangement. In: E. Winfree and D. K. Gifford (eds.) Proceedings of DNA Based Computers V, American Mathematical Society, 207–216, 1999.

[9] Kari, L., and Thierrin, G., Contextual insertion/deletions and computability. Information and Computation 131:47–61, 1996.

[10] Landweber, L. F., and Kari, L., The evolution of cellular computing: Nature’s solution to a computational problem. In: Proceedings of the 4th DIMACS Meeting on DNA-Based Computers, Philadelphia, PA, 3–15, 1998.

[11] Loos, R., Martín-Vide, C., and Mitrana, V., Solving SAT and HPP with accepting splicing systems, PPSN IX, LNCS 4193:771–777, 2006.

[12] Papadimitriou, Ch. P., Computational Complexity. Addison-Wesley, Reading, MA, 1994.

[13] Paun, Gh., Rozenberg, G., Salomaa, A., DNA Computing - New computing paradigms, Springer-Verlag, Berlin, 1998.

[14] Prescott, D. M., Ehrenfeucht, A., and Rozenberg, G., Molecular operations for DNA processing in hypotrichous ciliates. Europ. J. Protistology 37:241–260, 2001.

[15] Salomaa, A., Formal Languages, Academic Press, New York, 1973.


Lemminkaisenkatu 14 A, 20520 Turku, Finland | www.tucs.fi

University of Turku
• Department of Information Technology
• Department of Mathematics

Abo Akademi University
• Department of Computer Science
• Institute for Advanced Management Systems Research

Turku School of Economics and Business Administration
• Institute of Information Systems Sciences

ISBN 978-952-121-915-3
ISSN 1239-1891
