Multistage isothermic sequencing by hybridization

Computational Biology and Chemistry 29 (2005) 69–77

Algorithm note

Jacek Błażewicz a,b, Piotr Formanowicz a,b,*

a Institute of Computing Science, Poznań University of Technology, Piotrowo 3A, 60-965 Poznań, Poland
b Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland

* Corresponding author. Tel.: +48 61 8528503x276; fax: +48 61 8771525. E-mail address: [email protected] (P. Formanowicz).

Received 28 October 2004; received in revised form 1 December 2004; accepted 2 December 2004

1476-9271/$ – see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2004.12.001

Abstract

Despite the impressive progress in many areas of biological sciences, reading DNA sequences still remains one of the most important problems. One of the methods of DNA sequencing is sequencing by hybridization, which is composed of two stages: the biochemical one and the computational one. Although this method is quite modern, it suffers from sensitivity to errors appearing in the biochemical stage. This is a motivation for developing new variants of the method which should be more resistant to errors of these types. Two such non-standard approaches are multistage and isothermic sequencing by hybridization. Each of the methods is less sensitive to some types of the errors appearing in the biochemical stage, in comparison to the standard version of the method, but on the other hand each is more sensitive to the remaining types of errors. However, a combination of the two approaches reduces the sensitivity of the components.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Sequencing by hybridization; Isothermic libraries; Multistage approach; Algorithms

1. Introduction

Reading DNA sequences still remains one of the most important problems in molecular and computational biology. Although the human genome has been almost fully sequenced and the genome sequences of some other higher organisms are already known, there is a need for an efficient and inexpensive method for DNA sequencing.

The two main methods for determining the sequence of DNA are shotgun sequencing and sequencing by hybridization (SBH) (Drmanac et al., 1989; Khrapko et al., 1989; Pevzner, 2000; Southern et al., 1992).

In the standard version of the latter, an $l$-tuple composition of a target DNA sequence is determined. The procedure employs one of the fundamental properties of single stranded DNAs, i.e. their ability to join to complementary strands (Watson and Crick, 1953). As has been known for over 50 years, a single stranded DNA molecule is a sequence of elementary building blocks called nucleotides, denoted for short by A, C, G, and T. The Watson–Crick complementarity rule states that nucleotide A is complementary to T and C is complementary to G. So, if two single stranded DNAs contain substrings complementary to each other, then the DNAs may hybridize, creating a double stranded molecule.

The idea of SBH is based on the observation that when an oligonucleotide library, i.e. a collection of short single stranded DNA molecules, is put into a solution containing a number of copies of a single stranded target DNA, the target molecules will hybridize to those elements of the library which are complementary to some substring of the examined DNA. Nowadays, the oligonucleotide library is usually made as a DNA chip (Fodor et al., 1991; Pease et al., 1994; Pevzner, 2000; Southern, 1988). Such a chip is a kind of matrix divided into a number of cells, each of them containing elements of the library of one type. In each of the cells there is a number of such identical short DNA strands.

In the standard version of the SBH method the library is composed of all oligonucleotides of a given length $l$. In this case the number of element types equals $4^l$, and this is also the number of cells on the chip. When such a chip is put into the solution of the single stranded DNA, the molecules will join to those chip cells which contain complementary oligonucleotides. If the examined molecules were radioactively or fluorescently labeled then observing the image of
the chip one can get the information about the $l$-tuple composition of the target DNA.

That is the end of the first, biochemical stage of the method. In the ideal case, i.e. when no errors occur, this stage provides full information about the $l$-tuple composition of the target DNA. In other words, one gets a set of all types of substrings of length $l$ occurring in the target sequence. The set is called the spectrum of a given sequence and its cardinality equals $n - l + 1$, where $n$ is the length of the target sequence (in the ideal case the set is an ideal spectrum). Unfortunately, determining the spectrum is not equivalent to determining the whole sequence of the examined molecule. In order to discover this sequence one has to determine the order in which the spectrum elements appear in the examined DNA. Such a permutation is determined in the second, computational stage of the SBH method.
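As a small illustration of the spectrum described above, the following Python sketch collects the $l$-tuples of a sequence. The function name `spectrum` and both example sequences are assumptions made for this illustration only; they do not come from the paper.

```python
def spectrum(sequence, l):
    """Return the set of all l-tuples (substrings of length l) of `sequence`."""
    return {sequence[i:i + l] for i in range(len(sequence) - l + 1)}


# Hypothetical target sequence of length n = 8 in which no 3-tuple repeats.
target = "ACGTTGCA"
l = 3
ideal = spectrum(target, l)
print(sorted(ideal))
print(len(ideal), len(target) - l + 1)  # both counts agree in this case

# Hypothetical sequence with repeated 3-tuples: the set loses the repeats,
# so its cardinality falls below n - l + 1.
repetitive = "ACGACGAC"
print(len(spectrum(repetitive, l)), len(repetitive) - l + 1)
```

Because the result is a set, the second example shows why a spectrum by itself cannot tell how many times a given $l$-tuple occurs.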

As has been mentioned, the biochemical stage of SBH described above is in fact its ideal case, where no errors occur. In practice two types of errors usually appear, i.e. negative errors and positive errors (Błażewicz et al., 1999).

The negative errors may be of two kinds. They may result from imperfectness of the hybridization experiment, i.e. it may happen that in the target sequence there is a substring complementary to some element of the oligonucleotide library but the sequence does not hybridize to it. In this case the information about this $l$-tuple is missed, i.e. the cardinality of the obtained spectrum is less than $n - l + 1$. Another source of negative errors is substring repetitions occurring in the target sequence. More precisely, if in the target DNA molecule there is a repetition of a substring of length at least $l$, it will not be detected in the hybridization experiment, because due to the current technology constraints it is possible in this experiment to discover only the presence of a given $l$-tuple in the examined DNA, but not the number of its occurrences. It means that the spectrum is a set but not a multiset. In this case the spectrum cardinality is also less than $n - l + 1$. The spectrum contains information about all types of $l$-tuples being parts of the examined DNA sequence but not about all such $l$-tuples.

The positive errors are also results of imperfectness of the hybridization experiment. In this case the target DNA hybridizes to some element of the oligonucleotide library which is not 100% complementary to it. The spectrum then contains information about some $l$-tuples which do not occur in the examined sequence, and its cardinality is greater than $n - l + 1$.

In practice, in all hybridization experiments some of the errors described above occur, which considerably limits the usefulness of the SBH approach. Hence, in order to overcome this drawback of SBH, some variants of the method are developed which may lead to a reduction of the error rate. Here two main approaches may be distinguished. In one of them an attempt is made to minimize the number of errors appearing in the biochemical stage of the method. In the other one, the influence of these errors on the quality of the solution obtained in the computational phase is minimized.

In the paper a sequencing method being a combination of two variants of the SBH approach, more resistant to errors than the standard version of the method, is presented. These variants are multistage SBH (Kruglyak, 1998) (see also Margaritis and Skiena, 1995; Frieze and Halldorsson, 2002) and isothermic SBH (Błażewicz et al., 2004a), which are briefly described in Section 2 (see also web page http://bio.cs.put.poznan.pl, where the implementation of the latter approach is presented). The combination of the methods proposed in this paper is more resistant to errors than its components, which follows from the fact that the multistage as well as the isothermic approach is more resistant (in comparison to the standard version of SBH) only to some types of errors which may appear in the biochemical stage, but is more sensitive to the errors of the remaining types (Kruglyak, 1998; Formanowicz, 2004). The combination of the approaches, presented and analyzed in detail in Section 3, reduces this drawback. The paper ends with conclusions in Section 4.

2. Multistage and isothermic SBH

The main idea of the multistage variant of the SBH method (Kruglyak, 1998) is to execute a series of hybridization experiments with oligonucleotide libraries containing molecules of increasing lengths. This approach should lead to a reduction of the number of negative errors following from $l$-tuple repetitions. Obviously, when the value of $l$ increases, the probability of finding a repetition of an $l$-tuple in a random sequence decreases. Based on this simple observation one may conclude that the longer the oligonucleotides in the library used are, the better the quality of the solution obtained. So, the best strategy would be to use libraries containing oligonucleotides of maximal possible length. In fact such a strategy is applied, but the current technology constraints limit the oligonucleotide length to more or less 10 nucleotides.

In the multistage SBH, in the first hybridization experiment a classical DNA chip containing a full library of oligonucleotides of a given length is used. Obviously, this experiment provides information about all types of $l$-tuples appearing in the target sequence. The key observation on which the method is based is that all $2l$-tuples being parts of the examined sequence have to be concatenations of some $l$-tuples also being parts of the sequence, i.e. those detected in the first hybridization experiment. From this follows that if the goal of the second hybridization experiment is to detect all types of $2l$-tuples composing the target sequence, a library composed of concatenations of all $l$-tuples detected in the first experiment should be used instead of a full library of all $2l$-tuples. The reduction of the library size is considerable, since the full library is composed of $4^{2l}$ types of oligonucleotides while the library obtained by concatenation contains at most $(n - l + 1)^2$ types of oligonucleotides.

In order to further reduce the number of probes, i.e. types of oligonucleotides in the library, used in a given iteration of the method it is possible to apply another oligonucleotide library. Let us observe that the library composed of concatenations of $l$-tuples detected in a previous stage of the method will contain many probes whose reverse complements are not present in the target sequence. It is easy to see that every substring of length $l'$, $l < l' < 2l$, present in the target sequence has to contain two substrings of length $l$ which overlap in $2l - l'$ positions. From this follows that instead of using a library composed of all concatenations of $l$-tuples detected in a previous iteration, it is possible to use a library containing probes which may be seen as pairs of partially overlapped $l$-tuples detected in the previous iteration of the method. This approach considerably reduces the number of probes, since many pairs of $l$-tuples will not overlap in the required number of positions (Kruglyak, 1998). Obviously, in this case the length of probes used in consecutive iterations is not doubled.
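The two ways of building the second-stage library described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `concatenation_library` and `overlap_library` and the target sequence are hypothetical, the first-stage spectrum is taken to be error free, and probes are written as substrings of the target rather than as their reverse complements.

```python
from itertools import product


def spectrum(sequence, l):
    """All l-tuples of `sequence` (an error-free first-stage result)."""
    return {sequence[i:i + l] for i in range(len(sequence) - l + 1)}


def concatenation_library(tuples):
    """Every ordered pair of detected l-tuples, concatenated into a 2l-tuple."""
    return {a + b for a, b in product(tuples, repeat=2)}


def overlap_library(tuples, l_next):
    """Probes of length l < l_next < 2l built from pairs of l-tuples that
    overlap in k = 2l - l_next positions."""
    l = len(next(iter(tuples)))
    k = 2 * l - l_next
    return {a + b[k:] for a, b in product(tuples, repeat=2) if a[-k:] == b[:k]}


target = "ACGTTGCA"  # hypothetical target sequence
l = 3
stage_one = spectrum(target, l)

concatenated = concatenation_library(stage_one)
overlapped = overlap_library(stage_one, 5)

print(len(concatenated), "probes instead of", 4 ** (2 * l))
print(len(overlapped), "probes of length 5 instead of", 4 ** 5)
# Every 6-tuple and 5-tuple of the target is present in the reduced libraries.
print(spectrum(target, 6) <= concatenated, spectrum(target, 5) <= overlapped)
```

The overlap variant keeps only pairs whose ends agree, which is why it tends to be much smaller than the library of all concatenations.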

It is important to note that substring repetitions in the target sequence do not affect the quality of the solutions provided by the method. If some repetitive $l$-tuples are not detected, this fact will have no influence on the composition of the library of $2l$-tuples. In some sense repetitions are positive phenomena, since the library of $2l$-tuples will be composed of less than $(n - l + 1)^2$ types of oligonucleotides. On the other hand, since in each iteration the length of the oligonucleotides in the library is doubled (or multiplied by some factor less than 2 if the modified library described above is used), every repetitive substring will be detected when the length of the oligonucleotides used in the hybridization experiment is greater than the length of the repeat.

Unfortunately, the negative errors following from imperfectness of the hybridization experiment may be a serious problem for the method. Let us observe that if some $l$-tuple is missed in the spectrum, then in the next iteration of the method the oligonucleotide library will not contain any oligonucleotides in which the missed $l$-tuple is the left or right half. This error is propagated through the next iterations.

The positive errors are less troublesome since they do not make the detection of some substrings of the examined sequence impossible. However, a large number of positive errors may lead to a situation where the library in the next iteration is composed of much more than $(n - l + 1)^2$ types of oligonucleotides, which increases the cost of the method.

From this follows that the multistage SBH approach eliminates the influence of substring repetitions on the quality of the obtained solutions, but on the other hand the method is quite sensitive to the hybridization errors, especially the negative ones. Hence, a method for reducing this error rate would be required to make the multistage approach useful in practical applications. One such method is isothermic SBH, where instead of full oligonucleotide libraries isothermic libraries are used.

Roughly speaking, an isothermic library is composed of all oligonucleotides which create duplexes of a given melting temperature. This temperature is equivalent to the energy of the duplex bonds. The calculation of an exact value of the temperature is a complicated thermodynamic problem, but some simplified models also exist. In one of them each A–T pair adds 2 °C to the melting temperature of the DNA complex while each C–G pair adds 4 °C (Wallace et al., 1981). So, in this model the nucleotide composition (but not the sequence of the nucleotides) of a given oligonucleotide determines its melting temperature.

More formally, the isothermic oligonucleotide library is a special case of the isorelational oligonucleotide library, defined as follows (Błażewicz et al., 2004a):

Definition 1. An isorelational oligonucleotide library is a library $L$ consisting of all oligonucleotides satisfying relation $w_A x_A + w_C x_C + w_G x_G + w_T x_T = C_L$, where $w_A$, $w_C$, $w_G$, $w_T$ are increments of nucleotides A, C, G and T, respectively, $x_A$, $x_C$, $x_G$, $x_T$ denote numbers of these nucleotides in the oligonucleotide, and $C_L$ is a constant parameter for the library.

Based on the above definition an isothermic oligonucleotide library can be defined as follows (Błażewicz et al., 2004a):

Definition 2. An isothermic oligonucleotide library $L_\tau$ of temperature $\tau$ is a library of all oligonucleotides satisfying relations $w_A x_A + w_C x_C + w_G x_G + w_T x_T = \tau$, $w_A = w_T$, $w_C = w_G$ and $2 w_A = w_C$.

Without loss of generality it may be assumed that $w_A = w_T = 2$ and $w_C = w_G = 4$. This corresponds to the amount of energy, or equivalently temperature, which nucleotides bring into the stability of oligonucleotide duplexes. The most important property of an isothermic library is its ability to form duplexes in a narrower range of experimental conditions (i.e. temperature, salt concentration) than in the case of classical libraries. It means that in given conditions all duplexes created by elements of an isothermic library have approximately the same stability, which leads to a reduction of the hybridization error rate.

Another important feature of the isothermic libraries is the fact that one such library is not sufficient for DNA sequencing, but two such libraries with melting temperatures differing by 2 °C can be used for sequencing any DNA molecule (Błażewicz et al., 2004a). Moreover, any DNA molecule can be covered by elements of two such libraries in such a way that starting points of two consecutive elements of the libraries in the coverage are shifted by one position. In this case we say that the libraries have the left shift by one position property. Similarly, it can be said that the ideal spectrum obtained using such libraries has this property with respect to a sequence from which it is obtained. Let us observe that any full oligonucleotide library used in classical SBH, containing all sequences of a given length $l$, has the left shift by one position property. Similarly, any ideal spectrum obtained using such a library also has this property with respect to a sequence from which it is obtained.

The idea of isothermic SBH has been implemented and the program is available at http://bio.cs.put.poznan.pl.
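The simplified temperature model and the need for two libraries can be illustrated with a short Python sketch. The names `INCREMENT`, `temperature` and `isothermic_substrings`, as well as the example sequence and the choice $\tau = 8$, are assumptions made for this illustration; the sketch reads probes as substrings of the target and is not the program referred to above.

```python
# Increments of the simplified model: w_A = w_T = 2, w_C = w_G = 4.
INCREMENT = {"A": 2, "T": 2, "C": 4, "G": 4}


def temperature(oligo):
    """Melting temperature of an oligonucleotide in the simplified model."""
    return sum(INCREMENT[base] for base in oligo)


def isothermic_substrings(sequence, start, temperatures):
    """Substrings of `sequence` beginning at index `start` (0-based) whose
    temperature belongs to the set `temperatures`."""
    found, total = [], 0
    for end in range(start, len(sequence)):
        total += INCREMENT[sequence[end]]
        if total in temperatures:
            found.append(sequence[start:end + 1])
        if total > max(temperatures):
            break
    return found


sequence = "AACCCACCACA"  # illustrative sequence
tau = 8
single = [isothermic_substrings(sequence, i, {tau}) for i in range(len(sequence))]
paired = [isothermic_substrings(sequence, i, {tau, tau + 2}) for i in range(len(sequence))]

print(temperature("AAC"))
for position, (one, two) in enumerate(zip(single, paired), start=1):
    print(position, one, two)
```

In this toy run the single library of temperature 8 has no element starting at positions 2, 5, 6 and 8, because the running temperature jumps from 6 to 10 there. With temperatures 8 and 10 together, every position from 1 to 9 is the start of some element, and ACA, which starts at position 9, ends the sequence.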

3. The combined method

In this section the possibility of combining multistage and isothermic SBH will be analyzed. The main problem which arises is the melting temperature of the oligonucleotide libraries. In isothermic SBH two libraries are used whose temperatures differ by 2 °C. In order for the combined method to be efficient, the libraries used should also have temperatures falling into a very narrow range. On the other hand, in the multistage approach the libraries are constructed on the basis of the hybridization experiment. Whether these two fundamental properties of the methods contradict each other is the question which will be answered in this section.

To better understand the considered problem, let us show some properties of isothermic libraries and of the spectra obtained using them, together with the relationships between them. Let us begin with some definitions.

Definition 3. Let $s_i(p)$ denote the character in position $p$ of sequence $s_i$, and let $|s_i|$ denote the length of sequence $s_i$.

Definition 4. Let $s_i \circ s_j$ denote the concatenation of sequences $s_i$ and $s_j$, i.e. $s_i \circ s_j = s_i(1) s_i(2) \cdots s_i(|s_i|) s_j(1) s_j(2) \cdots s_j(|s_j|)$.

Definition 5. Let $\theta(s_i)$ denote the temperature of sequence $s_i$.

Definition 6. Let $S(Q, \tau, \tau + 2)$ denote the spectrum of temperatures $\tau$ and $\tau + 2$ for sequence $Q$, i.e. $S(Q, \tau, \tau + 2)$ contains all substrings of $Q$ which have temperature $\tau$ or $\tau + 2$.

Let us observe that, according to the remark made at the end of the previous section, for any sequence $Q$ and any temperature $\tau$, if spectrum $S(Q, \tau, \tau + 2)$ is an ideal spectrum then it has the left shift by one position property with respect to sequence $Q$.

Definition 7. Let $SC(Q, \tau, \tau + 2)$ denote the set of sequences obtained by concatenation of all elements of Cartesian product $S(Q, \tau, \tau + 2) \times S(Q, \tau, \tau + 2)$.

Definition 8. Let $SC^{(\tau_3)}(Q, \tau_1, \tau_2)$ denote the subset of $SC(Q, \tau_1, \tau_2)$ containing sequences of temperature $\tau_3$ only, i.e. $SC^{(\tau_3)}(Q, \tau_1, \tau_2) = \{s_i : s_i \in SC(Q, \tau_1, \tau_2) \wedge \theta(s_i) = \tau_3\}$.

Claim 1. $\forall_{s_i \in SC(Q, \tau, \tau + 2)}\ \theta(s_i) \in \{2\tau, 2\tau + 2, 2\tau + 4\}$.

Proof. From the definition it follows that $\forall_{s_i \in S(Q, \tau, \tau + 2)}\ \theta(s_i) \in \{\tau, \tau + 2\}$. So, concatenation of two elements of $S(Q, \tau, \tau + 2)$ produces sequences of three temperatures:

(1) $2\tau$, if both concatenated sequences have temperature $\tau$;
(2) $2\tau + 2$, if one of the sequences has temperature $\tau$, while the other one has temperature $\tau + 2$;
(3) $2\tau + 4$, if both of the sequences have temperature $\tau + 2$. □

Theorem 1. Every oligonucleotide of temperature $2\tau + 4$ is a superstring of two oligonucleotides of temperature $2\tau$ and/or $2\tau + 2$.

Proof. From the definition we have $SC^{(2\tau+4)}(Q, \tau, \tau + 2) = \{s_i \circ s_j : s_i \in S(Q, \tau, \tau + 2) \wedge s_j \in S(Q, \tau, \tau + 2) \wedge \theta(s_i) = \tau + 2 \wedge \theta(s_j) = \tau + 2\}$. Moreover, $\forall_{s_k \in SC^{(2\tau+4)}(Q, \tau, \tau + 2)}\ \theta(s_k - s_k(1)) \in \{2\tau, 2\tau + 2\}$ and $\forall_{s_k \in SC^{(2\tau+4)}(Q, \tau, \tau + 2)}\ \theta(s_k - s_k(|s_k|)) \in \{2\tau, 2\tau + 2\}$. This follows from the fact that a single nucleotide can have temperature 2 or 4 °C, and it completes the proof. □

Theorem 2. If set $SC(Q, \tau, \tau + 2)$ is used as an oligonucleotide library in a hybridization experiment then the obtained spectrum has the left shift by one position property.

Proof. Let us observe that each element of spectrum $S(Q, \tau, \tau + 2)$ is a prefix of $|S(Q, \tau, \tau + 2)|$ elements of set $SC(Q, \tau, \tau + 2)$. For a given sequence $s_i \in SC(Q, \tau, \tau + 2)$ let us denote such a prefix by $s_i^p$. From the definition of set $SC(Q, \tau, \tau + 2)$ it follows that the remaining part of $s_i$ is also some element of spectrum $S(Q, \tau, \tau + 2)$ (or it may even be $s_i^p$). Let us denote such a suffix by $s_i^s$. So, we have $s_i = s_i^p \circ s_i^s$, $s_i \in SC(Q, \tau, \tau + 2)$ and $s_i^p, s_i^s \in S(Q, \tau, \tau + 2)$.

From the left shift by one position property for spectrum $S(Q, \tau, \tau + 2)$ it follows that sequence $Q$ can be covered by prefixes of sequences from set $SC(Q, \tau, \tau + 2)$, since they compose set $S(Q, \tau, \tau + 2)$. It remains to show that immediately after each such prefix $s_i^p$ in sequence $Q$ there starts a suffix $s_i^s$ such that $s_i^p \circ s_i^s = s_i$.

Indeed, it must be the case since $SC(Q, \tau, \tau + 2)$ contains all elements being concatenations of all pairs of sequences which are elements of set $S(Q, \tau, \tau + 2) \times S(Q, \tau, \tau + 2)$. From this follows that in set $SC(Q, \tau, \tau + 2)$ sequence $s_i^p$ appears as a prefix in $|S(Q, \tau, \tau + 2)|$ sequences, in whose suffixes there appear all elements of $S(Q, \tau, \tau + 2)$. Hence, if in $Q$ immediately after $s_i^p$ there starts sequence $s_x$ of melting temperature equal to $\tau$ or $\tau + 2$, then $s_i^p \circ s_x \in SC(Q, \tau, \tau + 2)$. Such a sequence has to appear after $s_i^p$, which follows from the left shift by one position property for $S(Q, \tau, \tau + 2)$ (c.f. Błażewicz et al. (2004a)). □

From Theorem 2 it follows that if in a multistage approach two isothermic libraries were used in the first step, then in the second step an oligonucleotide library obtained by concatenation of all elements of the spectrum obtained in the first step could be used. In this case in both steps the left shift by one position property would be preserved. However, according to Claim 1, the elements of the library used in the second step would have three temperatures, which would lead to an increase of the range of temperatures of oligonucleotides used in further iterations of the method. So, it would be interesting to check if set $SC(Q, \tau, \tau + 2)$ without oligonucleotides of temperature $2\tau + 4$ or $2\tau + 2$ or $2\tau$ could play the role of an oligonucleotide library in the hybridization experiment. The next lemmas answer this question.

Lemma 1. If set $SC(Q, \tau, \tau + 2) \setminus SC^{(2\tau+4)}(Q, \tau, \tau + 2)$ is used as an oligonucleotide library in a hybridization experiment then the obtained spectrum does not have the left shift by one position property.

Proof. Let us consider sequence $Q_1$ = AACCCACCACA and let $\tau = 8$. Moreover, let $s_1$ = CCACC, $s_2$ = CACCA, and $s_3$ = CACCAC ($s_1$ starts in position 4 in $Q_1$, and $s_2$ and $s_3$ start in position 5 in $Q_1$). The starting point of sequences $s_2$ and $s_3$ is shifted by one position relative to the starting point of sequence $s_1$, which is the only element of set $SC(Q_1, \tau, \tau + 2) \setminus SC^{(2\tau+4)}(Q_1, \tau, \tau + 2)$ starting in position 4 in $Q_1$. Sequences $s_2$ and $s_3$ are the only sequences with temperature in the set $\{2\tau, 2\tau + 2, 2\tau + 4\}$ having this property (i.e. being shifted by one position relative to $s_1$). However, $\theta(s_3) = 2\tau + 4$, hence it does not belong to $SC(Q_1, \tau, \tau + 2) \setminus SC^{(2\tau+4)}(Q_1, \tau, \tau + 2)$. On the other hand, although $\theta(s_2) = 2\tau$, this sequence also does not belong to this set, since it is not a concatenation of any pair of sequences from set $S(Q_1, \tau, \tau + 2)$ (so it is also not an element of set $SC(Q_1, \tau, \tau + 2)$).

This example shows that the spectrum obtained in a hybridization experiment where oligonucleotide library $SC(Q, \tau, \tau + 2) \setminus SC^{(2\tau+4)}(Q, \tau, \tau + 2)$ is used does not have the left shift by one position property. □
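The counterexample used in this proof is small enough to be checked by brute force. The Python sketch below rebuilds $S(Q_1, \tau, \tau + 2)$ and $SC(Q_1, \tau, \tau + 2)$ under the simplified temperature model; the helper names (`theta`, `isothermic_spectrum`, `starts_at`) are assumptions introduced for this illustration, and positions in the code are 0-based, so index 3 corresponds to position 4 in the text.

```python
from itertools import product

INCREMENT = {"A": 2, "T": 2, "C": 4, "G": 4}


def theta(seq):
    """Temperature of a sequence in the simplified model."""
    return sum(INCREMENT[base] for base in seq)


def isothermic_spectrum(q, temperatures):
    """All substrings of q whose temperature is in `temperatures`."""
    return {q[i:j] for i in range(len(q)) for j in range(i + 1, len(q) + 1)
            if theta(q[i:j]) in temperatures}


def starts_at(q, library, index):
    """Library elements that occur in q starting at the given 0-based index."""
    return sorted(x for x in library if q.startswith(x, index))


q1, tau = "AACCCACCACA", 8
s = isothermic_spectrum(q1, {tau, tau + 2})
sc = {a + b for a, b in product(s, repeat=2)}
reduced = {x for x in sc if theta(x) != 2 * tau + 4}

# Claim 1: concatenations have one of three temperatures.
print({theta(x) for x in sc} <= {2 * tau, 2 * tau + 2, 2 * tau + 4})
# s1 is the only reduced-library element starting at position 4 ...
print(starts_at(q1, reduced, 3))
# ... and no element of the reduced library starts at position 5.
print(starts_at(q1, reduced, 4))
print("CACCA" in sc, theta("CACCAC"))
```

The empty result for position 5 is the gap in the cover that the proof points to: CACCA is not a concatenation of two spectrum elements, and CACCAC has been removed together with the other sequences of temperature $2\tau + 4$.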

Lemma 2. If set $SC(Q, \tau, \tau + 2) \setminus SC^{(2\tau+2)}(Q, \tau, \tau + 2)$ is used as an oligonucleotide library in a hybridization experiment then the obtained spectrum does not have the left shift by one position property.

Proof. Let us consider sequence $Q_2$ = AACCCCCCCCAAA and let $\tau = 8$. Let us observe that in position 4 of sequence $Q_2$ there is the starting point of subsequences of temperatures 4, 8, 10, 14, 18, 22, 26 and so on. If in the spectrum there is no sequence of temperature $2\tau + 2 = 18$ then this sequence cannot be covered by elements of such a spectrum with left shift by one position, which completes the proof. □

Lemma 3. If set $SC(Q, \tau, \tau + 2) \setminus SC^{(2\tau)}(Q, \tau, \tau + 2)$ is used as an oligonucleotide library in a hybridization experiment then the obtained spectrum does not have the left shift by one position property.

Proof. Let us consider sequence $Q_3$ = CCCCGGGGCCCC and let $\tau = 8$. In positions 1, 2, ..., 8 of sequence $Q_3$ there are starting points of subsequences of temperatures 16 and 20, and there is no position in this sequence where a subsequence of temperature 18 starts. Let us also observe that any subsequence of $Q_3$ of temperature 20 is not a concatenation of sequences of temperature $\tau = 8$ and/or $\tau + 2 = 10$, so none of these sequences is an element of $SC(Q, \tau, \tau + 2)$. From this follows that sequence $Q_3$ cannot be covered by elements of $SC(Q, \tau, \tau + 2) \setminus SC^{(2\tau)}(Q, \tau, \tau + 2)$ with left shift by one position (moreover, this sequence cannot be covered by elements of this set with any shift), which completes the proof. □

Theorem 3. Set $SC(Q, \tau, \tau + 2)$ without elements having one of the three temperatures $2\tau$, $2\tau + 2$ or $2\tau + 4$ does not have the left shift by one position property.

Proof. The proof immediately follows from Lemmas 1–3. □

Let us observe that Theorem 1 may suggest that set $SC(Q, \tau, \tau + 2) \setminus SC^{(2\tau+4)}(Q, \tau, \tau + 2)$ could be used as an oligonucleotide library in the multistage approach to the sequencing process. Indeed, each sequence of temperature $2\tau + 4$ is a superstring of two oligonucleotides of temperatures taken from set $\{2\tau, 2\tau + 2\}$. (It follows also from the theorem about the left shift by one position property (c.f. Błażewicz et al. (2004a)).) However, this result does not justify using set $SC(Q, \tau, \tau + 2) \setminus SC^{(2\tau+4)}(Q, \tau, \tau + 2)$ as an oligonucleotide library in the combined method, since, as shown in the proof of Lemma 1, not all of these oligonucleotides can be constructed by concatenation of two oligonucleotides of temperatures equal to $\tau$ and/or $\tau + 2$. This negative result suggests that in the case of the isothermic variant of multistage SBH the libraries should be constructed in a slightly different way than in the method described in (Kruglyak, 1998) in order to keep the temperatures of the libraries in a narrow range. So, following in this direction let us consider the definitions given below.

Definition 9. Let $SV_k(Q, \tau, \tau + 2)$ be a set of sequences obtained from Cartesian product $S(Q, \tau, \tau + 2) \times S(Q, \tau, \tau + 2)$ in such a way that on the basis of each pair of sequences $(s_i, s_j) \in S(Q, \tau, \tau + 2) \times S(Q, \tau, \tau + 2)$ there is created sequence $s_i \circ (s_j(k + 1) s_j(k + 2) \cdots s_j(|s_j|)) \in SV_k(Q, \tau, \tau + 2)$ if $s_i(|s_i| - p + 1) = s_j(k - p + 1)$ for $p = 1, 2, \ldots, k$ and $k > 0$. If $k = 0$ then $s_i \circ s_j \in SV_k(Q, \tau, \tau + 2)$.

Definition 10. Let $SX = SV_1^{(2\tau-2)}(Q, \tau, \tau + 2) \cup SV_1^{(2\tau)}(Q, \tau, \tau + 2) \cup SV_0^{(2\tau)}(Q, \tau, \tau + 2)$, where $SV_k^{(\tau)}(Q, \tau, \tau + 2)$ denotes the subset of $SV_k(Q, \tau, \tau + 2)$ containing only sequences of melting temperature equal to $\tau$.

The next theorem shows that an oligonucleotide library constructed in this way may be very useful in the isothermic variant of multistage SBH.

Theorem 4. Set $SX$ has the left shift by one position property with respect to sequence $Q$.

Proof. In order to prove this theorem it suffices to show that set $SX$ contains all elements of the spectrum for sequence $Q$ obtained using some two isothermic libraries whose temperatures differ by 2 °C. So, it will be shown that $SX$ contains all elements of spectrum $S(Q, 2\tau - 2, 2\tau)$.
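The containment to be shown can be checked numerically on a small example before following the argument. The Python sketch below builds $SX$ from Definitions 9 and 10 and compares it with $S(Q, 2\tau - 2, 2\tau)$; the helper names `sv` and `sx`, the test sequence and $\tau = 8$ are assumptions chosen for this illustration, and a check on one sequence is of course not a proof.

```python
from itertools import product

INCREMENT = {"A": 2, "T": 2, "C": 4, "G": 4}


def theta(seq):
    """Temperature of a sequence in the simplified model."""
    return sum(INCREMENT[base] for base in seq)


def isothermic_spectrum(q, temperatures):
    """All substrings of q whose temperature is in `temperatures`."""
    return {q[i:j] for i in range(len(q)) for j in range(i + 1, len(q) + 1)
            if theta(q[i:j]) in temperatures}


def sv(spectrum, k):
    """SV_k: join s_i and s_j when the last k characters of s_i equal the
    first k characters of s_j; for k = 0 this is plain concatenation."""
    return {a + b[k:] for a, b in product(spectrum, repeat=2)
            if k == 0 or a[-k:] == b[:k]}


def sx(q, tau):
    """SX: SV_1 restricted to temperatures 2*tau - 2 and 2*tau, together
    with SV_0 restricted to temperature 2*tau."""
    s = isothermic_spectrum(q, {tau, tau + 2})
    overlapping = {x for x in sv(s, 1) if theta(x) in (2 * tau - 2, 2 * tau)}
    concatenated = {x for x in sv(s, 0) if theta(x) == 2 * tau}
    return overlapping | concatenated


q, tau = "AACCCACCACA", 8
library = sx(q, tau)
target = isothermic_spectrum(q, {2 * tau - 2, 2 * tau})

print(sorted(target))
print(target <= library)                # every element of S(Q, 14, 16) is in SX
print({theta(x) for x in library})      # only two temperatures remain
```

Unlike the full concatenation library, the set built this way spans only the two temperatures $2\tau - 2$ and $2\tau$, which is the narrow range the combined method is looking for.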

Let us consider any DNA sequenceQ and let us setointer in the first position of this sequence. Let us m

he pointer through the right end of the sequence. Eachhen the pointer points some position inQ it determines somequence which starts in the first position ofQ and ends inhe position indicated by the pointer. Obviously, this sequave some melting temperature. Initially the temperatu

he sequence determined in this way equals to 2 or◦C.oving the pointer through the right end of sequenceQ the

emperature increases by 2 or 4◦C in each position. So,

Page 6: Multistage isothermic sequencing by hybridization

74 J. Błazewicz, P. Formanowicz / Computational Biology and Chemistry 29 (2005) 69–77

some position of sequenceQ the temperature of the sequencewill be equal toτ or τ + 2 ◦C.

Let us consider the case where the temperature of the se-quence determined by the pointer is equal toτ. We can denotethis sequence bysi and position|si| + 1 in sequenceQmay beseen as a first position of another subsequencesj. Repeatingthe above reasoning for sequencesj, i.e. moving the pointerthrough the right end ofQ again a situation will be reachedwhere the pointer determines sequence (sj) of temperatureτor τ + 2.

If the temperature is equal toτ, then sequencesi ◦ sj hastemperature 2τ.

It may also be the case that the pointer cannot determinea sequence of temperatureτ, but in this case it must deter-mine sequence of temperatureτ + 2. In this case we may putback the pointer by one position such that it will determine se-quencesj of temperatureτ − 2 (because if the pointer was notable to point on sequence of temperatureτ the last nucleotidehas to be C or G, so moving the pointer before this nucleotidesequence of temperatureτ − 2 will be determined).

Moreover, we can set the starting point of subsequencesjin position|si| of sequenceQ. If the last nucleotide ofsi isA or T, then the temperature of sequencesj defined in thisway equals toτ. On the other hand, if the last nucleotide ofsi is C or G, then the temperature ofsj is equal toτ + 2.It

finess let tp fi itional ofs si-tt

nce(t ls 2

sitioni eensc itionot tso ofss re inS veri

se-q Asip

chosen arbitrarily. In other words, for any sequenceQ thereexists setSX having the left shift by one position propertywith respect toQ.) �

So,Theorem 4indicates that setSX, which by definition isobtained on the base of an isothermic spectrum of the targetsequence can be used as an oligonucleotide library in the mul-tistage isothermic SBH. If setSX is used as an oligonucleotidelibrary in the hybridization experiment then as a result spec-trum S(Q, 2τ − 2, 2τ) will be obtained. Let us observe thatS(Q, 2τ − 2, 2τ) ⊆ SX and obviouslyS(Q, 2τ − 2, 2τ) ⊂(L2τ−2 ∪ L2τ) and|SX| < |L2τ−2 ∪ L2τ |, which justifies theuse ofSX in the method.

Based onTheorem 4the followingmultistage isothermicSBH(MISBH) approach can be proposed.

Algorithm 1.

1. Choose some temperatureτ determining the temperaturesof the two isothermic libraries used in the first hybridiza-tion experiment (these libraries will be the complete li-braries of oligonucleotides having temperaturesτ andτ + 2 like in the standard isothermic SBH approach).

2. Perform hybridization experiment using libraries definedin step 1. As a result spectrumS(Q, τ, τ + 2) is obtained.

3. Create setSX on the base of spectrumS(Q, τ, τ + 2).4. Perform hybridization experiment using setSX as an

5

6 rome-

on-s ob-t ins al NAs whichm s de-t n beu encesi thes1 thet s notc caseo gives

T er-ac nts ofie

P nces -q n

n both cases sequence ((si(1)si(2) · · · si(|si| − 1)) ◦ sj) hasemperature equal to 2τ − 2.

Now, let us consider the case where the pointer deequencesi of temperatureτ + 2 (because it is impossibo distinguish sequence of temperatureτ starting in the firsosition ofQ). Obviously, in this case the last nucleotide osi

s C or G. It is possible to put back the pointer by one posnd define in this way sequence of temperatureτ − 2. The

ast nucleotide ofsi may be treaten as the first nucleotideequencesj. Again, moving the pointer to the right a poion will be reached where the pointer defines sequencesj ofemperatureτ or τ + 2.

In the first case the temperature of seque(si(1)si(2) · · · si(|si| − 1)) ◦ sj) is equal to 2τ − 2. Inhe second case the temperature of this sequence equaτ.

Since the same argument can be applied to any pon sequenceQ (here we have chosen position 1), it has bhown that each element of spectrumS(Q, 2τ − 2, 2τ) can beonstructed by concatenation or overlapping in one posf two sequences taken from spectrumS(Q, τ, τ + 2). From

his follows that for sequenceQ setSX contains all elemenf spectrumS(Q, 2τ − 2, 2τ). Moreover, since elementspectrumS(Q, 2τ − 2, 2τ) can cover sequenceQ with lefthift by one position and all elements of this spectrum aX, then this set contains all sequences necessary to coQ

n the same way.Since sequenceQ has been chosen in arbitrary way,

uences from setSX may be used to cover an arbitrary DNequence with left shift by one position. (Obviously, setSX

s always defined with respect to given sequenceQ and theroperty concerns only this particular sequence, butQcan be

oligonucleotide library. As a result spectrumS(Q, 2τ −2, 2τ) is obtained.

. If the temperature should be further increased, setτ :=2τ − 2 and go to step 3, otherwise go to step 6.

. Use an algorithm for isothermic SBH (e.g. the one f(Błazewicz et al., 2004b)) to reconstruct the target squence on the base ofS(Q, 2τ − 2, 2τ).
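Steps 1 to 5 of Algorithm 1 can be simulated in silico. The sketch below is only an illustration under strong assumptions: hybridization is error-free, the target sequence is known in advance (so the "experiment" is a substring test), and A/T contribute 2 °C while C/G contribute 4 °C. All function names and the target sequence are hypothetical. Step 6, the reconstruction itself, is not shown.

```python
NUCLEOTIDE_TEMPERATURE = {"A": 2, "T": 2, "C": 4, "G": 4}  # assumed rule


def temperature(seq):
    return sum(NUCLEOTIDE_TEMPERATURE[n] for n in seq)


def complete_spectrum(target, tau):
    """Steps 1-2, simulated: every subsequence of temperature τ or τ + 2."""
    n = len(target)
    return {
        target[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if temperature(target[i:j]) in (tau, tau + 2)
    }


def build_sx(spectrum, tau):
    """Step 3: concatenations of temperature 2τ and one-position
    overlaps of temperature 2τ − 2 or 2τ."""
    sx = set()
    for si in spectrum:
        for sj in spectrum:
            if temperature(si + sj) == 2 * tau:
                sx.add(si + sj)
            if si[-1] == sj[0]:
                merged = si + sj[1:]
                if temperature(merged) in (2 * tau - 2, 2 * tau):
                    sx.add(merged)
    return sx


def hybridize(library, target):
    """Step 4, simulated without errors: probes that occur in the target."""
    return {probe for probe in library if probe in target}


def misbh(target, tau, stages):
    """Run the given number of library-design and hybridization stages."""
    spectrum = complete_spectrum(target, tau)
    for _ in range(stages):
        library = build_sx(spectrum, tau)
        spectrum = hybridize(library, target)
        tau = 2 * tau - 2  # step 5
    return spectrum, tau


spectrum, final_tau = misbh("ACGTTGCAAGCT", tau=4, stages=2)
print(final_tau)  # prints 10
```

After two stages starting from τ = 4 the temperature is 2(2·4 − 2) − 2 = 10, so the returned set plays the role of S(Q, 10, 12).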

As previously mentioned, the oligonucleotide library constructed by concatenation of all elements of the spectrum obtained in a given stage of the multistage approach contains a lot of sequences which are not parts of the examined DNA sequence. Instead, a library composed of sequences which may be constructed by partial overlap of subsequences detected in a previous stage of the multistage method can be used, which considerably reduces the number of sequences in the library. This observation can be directly applied to the standard version of the multistage SBH approach (Kruglyak, 1998). However, in the isothermic variant of the method the temperatures of the probes play a crucial role, hence it is not clear if overlapping in more than one position, as in the case of set SX, can be applied. The next theorem and its proof give some hints which can help to answer this question.

Theorem 5. Each element of isothermic libraries of temperatures 2τ − k and 2τ − k + 2 for k = 0, 2, 4, . . . , τ can be constructed by overlapping two sequences being elements of isothermic libraries of temperatures τ and τ + 2, i.e. such an element is a superstring of these two elements.

Proof. Let us consider an arbitrarily chosen DNA sequence s_p and let us distinguish within sequence s_p another sequence s_i such that |s_i| ≤ |s_p| starting in the first position of s_p. In addition, let us also distinguish in s_p sequence s′_j = s_i(x)s_i(x + 1) · · · s_i(|s_i|) for some x ∈ {1, 2, . . . , |s_i|} and sequence s″_j = s_p(|s_i| + 1)s_p(|s_i| + 2) · · · s_p(|s_p|).

Let us assume that sequence s_p has temperature 2τ − k for some k ∈ {0, 2, 4, . . . , τ}.

Let us also assume that sequence s_i has temperature τ. In this case sequence s″_j has temperature 2τ − k − τ = τ − k and x can be chosen in such a way that sequence s′_j will have temperature k or k + 2. Hence, sequence s_j = s′_j ◦ s″_j has temperature k + τ − k = τ or k + 2 + τ − k = τ + 2.

Now, let us assume that sequence s_i has temperature τ + 2. In this case s″_j has temperature 2τ − k − τ − 2 = τ − k − 2 and x can be chosen in such a way that s′_j will have temperature k + 2 or k + 4. Hence, sequence s_j = s′_j ◦ s″_j has temperature k + 2 + τ − k − 2 = τ or k + 4 + τ − k − 2 = τ + 2.

Now, let us assume that sequence s_p has temperature 2τ − k + 2, for some k ∈ {0, 2, 4, . . . , τ}.

Let us also assume, as before, that sequence s_i has temperature τ. In this case sequence s″_j has temperature 2τ − k + 2 − τ = τ − k + 2 and x can be chosen in such a way that sequence s′_j will have temperature k − 2 or k. Hence, sequence s_j = s′_j ◦ s″_j has temperature k − 2 + τ − k + 2 = τ or k + τ − k + 2 = τ + 2.

Now, let us assume that sequence s_i has temperature τ + 2. In this case s″_j has temperature 2τ − k + 2 − τ − 2 = τ − k and x can be chosen in such a way that s′_j will have temperature k or k + 2. Hence, sequence s_j = s′_j ◦ s″_j has temperature k + τ − k = τ or k + 2 + τ − k = τ + 2.

From this follows that in all cases sequence s_p is a superstring of two sequences of temperatures taken from set {τ, τ + 2}. Since s_p is an arbitrary DNA sequence of temperature 2τ − k or 2τ − k + 2 for k ∈ {0, 2, 4, . . . , τ}, it is true that any such DNA sequence is a superstring of two elements of isothermic libraries of temperatures τ and τ + 2. □

Let us observe that k determines the number of positions in which the two sequences composing an element of the constructed library overlap.

Based on the proof of Theorem 5 the following method of constructing oligonucleotide libraries for the multistage isothermic SBH may be proposed. Let us assume that after the i-th iteration of the MISBH method spectrum S(Q, τ, τ + 2) has been obtained. The value of k must be chosen, which determines the overlap of the sequences composing elements of the new library.

According to the proof of Theorem 5, k determines the temperature of the overlapping parts of the two sequences composing the new probe. The designing of the new library of temperatures 2τ − k and 2τ − k + 2 can be divided into four stages:

1. On the basis of every s_i ∈ S(Q, τ, τ + 2) for which θ(s_i) = τ, an attempt is made to construct a sequence s_p for which θ(s_p) = 2τ − k. In order to do this, suffixes s′_i and s″_i of s_i such that θ(s′_i) = k + 2 and θ(s″_i) = k are sought (let us observe that one of these suffixes may not exist in s_i). Next, if the suffixes exist it is checked whether there is sequence s_j ∈ S(Q, τ, τ + 2) such that:

(a) there is prefix s′_j of s_j such that s′_j = s′_i. If yes, then the temperature of sequence s_p = (s_i − s′_i) ◦ s′_i ◦ (s_j − s′_j) = s_i ◦ (s_j − s′_j) is equal to τ + τ − k − 2 = 2τ − k − 2 if θ(s_j) = τ and the temperature equals τ + τ + 2 − k − 2 = 2τ − k if θ(s_j) = τ + 2. Obviously, only a sequence of temperature 2τ − k or 2τ − k + 2 may be an element of the constructed library, so only sequences s_j of temperature τ + 2 should be taken into account when suffix s′_i is considered;

(b) there is prefix s″_j of s_j such that s″_j = s″_i. If yes, then the temperature of sequence s_p = s_i ◦ (s_j − s″_j) is equal to τ + τ − k = 2τ − k if θ(s_j) = τ and the temperature equals τ + τ + 2 − k = 2τ − k + 2 if θ(s_j) = τ + 2. In both cases the obtained sequence has a proper temperature of the designed library (however, in the second case the temperature is different from the one assumed in the current step of the designing procedure).

2. On the basis of every s_i ∈ S(Q, τ, τ + 2) for which θ(s_i) = τ + 2, an attempt is made to construct a sequence s_p for which θ(s_p) = 2τ − k. Similarly to the previous step, suffixes s′_i and s″_i of s_i such that θ(s′_i) = k + 4 and θ(s″_i) = k + 2 are sought. Next, if the suffixes exist it is checked whether there is sequence s_j ∈ S(Q, τ, τ + 2) such that:

(a) there is prefix s′_j of s_j such that s′_j = s′_i. If yes, then the temperature of sequence s_p = s_i ◦ (s_j − s′_j) is equal to τ + 2 + τ − k − 4 = 2τ − k − 2 if θ(s_j) = τ and the temperature equals τ + 2 + τ + 2 − k − 4 = 2τ − k if θ(s_j) = τ + 2. From these two sequences only the one of temperature 2τ − k can be an element of the constructed library, so only sequences s_j of temperature τ + 2 should be taken into account when suffix s′_i is considered;

(b) there is prefix s″_j of s_j such that s″_j = s″_i. If yes, then the temperature of sequence s_p = s_i ◦ (s_j − s″_j) is equal to τ + 2 + τ − k − 2 = 2τ − k if θ(s_j) = τ and the temperature equals τ + 2 + τ + 2 − k − 2 = 2τ − k + 2 if θ(s_j) = τ + 2. In both cases the obtained sequence has a proper temperature of the designed library (but in the second case it is a temperature different from the one assumed in the current step of the designing procedure).

3. On the basis of every s_i ∈ S(Q, τ, τ + 2) for which θ(s_i) = τ, an attempt is made to construct a sequence s_p for which θ(s_p) = 2τ − k + 2. Suffixes s′_i and s″_i of s_i such that θ(s′_i) = k and θ(s″_i) = k − 2 are sought. If the suffixes exist it is checked whether there is sequence s_j ∈ S(Q, τ, τ + 2) such that:

(a) there is prefix s′_j of s_j such that s′_j = s′_i. If yes, then the temperature of sequence s_p = s_i ◦ (s_j − s′_j) is equal to τ + τ − k = 2τ − k if θ(s_j) = τ and the temperature equals τ + τ + 2 − k = 2τ − k + 2 if θ(s_j) = τ + 2. In both cases the obtained sequence has a proper temperature of the designed library (but in the first case the temperature is different from the one assumed in the current step of the designing procedure);

(b) there is prefix s″_j of s_j such that s″_j = s″_i. If yes, then the temperature of sequence s_p = s_i ◦ (s_j − s″_j) is equal to τ + τ − k + 2 = 2τ − k + 2 if θ(s_j) = τ and the temperature equals τ + τ + 2 − k + 2 = 2τ − k + 4 if θ(s_j) = τ + 2. From these two sequences only the one of temperature 2τ − k + 2 can be an element of the constructed library, so only sequences s_j of temperature τ should be taken into account when suffix s″_i is considered.

4. On the basis of every s_i ∈ S(Q, τ, τ + 2) for which θ(s_i) = τ + 2, an attempt is made to construct a sequence s_p for which θ(s_p) = 2τ − k + 2. Suffixes s′_i and s″_i of s_i such that θ(s′_i) = k + 2 and θ(s″_i) = k are sought. If the suffixes exist it is checked whether there is sequence s_j ∈ S(Q, τ, τ + 2) such that:

(a) there is prefix s′_j of s_j such that s′_j = s′_i. If yes, then the temperature of sequence s_p = s_i ◦ (s_j − s′_j) is equal to τ + 2 + τ − k − 2 = 2τ − k if θ(s_j) = τ and the temperature equals τ + 2 + τ + 2 − k − 2 = 2τ − k + 2 if θ(s_j) = τ + 2. In both cases the obtained sequence has a proper temperature of the designed library (but in the first case it is a temperature different from the one assumed in the current step of the designing procedure);

(b) there is prefix s″_j of s_j such that s″_j = s″_i. If yes, then the temperature of sequence s_p = s_i ◦ (s_j − s″_j) is equal to τ + 2 + τ − k = 2τ − k + 2 if θ(s_j) = τ and the temperature equals τ + 2 + τ + 2 − k = 2τ − k + 4 if θ(s_j) = τ + 2. From these two sequences only the one of temperature 2τ − k + 2 can be an element of the constructed library, so only sequences s_j of temperature τ should be taken into account when suffix s″_i is considered.

Every sequence s_p constructed in the above way and having temperature 2τ − k or 2τ − k + 2 becomes an element of the new oligonucleotide library.

The above procedure can be summarized in the following way.

Algorithm 2.

1. For every sequence s_i ∈ S(Q, τ, τ + 2) for which θ(s_i) = τ determine, if possible, suffixes s′_i, s″_i and s‴_i such that θ(s′_i) = k − 2, θ(s″_i) = k and θ(s‴_i) = k + 2.

1.1. For every sequence s_j ∈ S(Q, τ, τ + 2) for which θ(s_j) = τ check whether there are prefixes of this sequence s′_j = s′_i and s″_j = s″_i. If such prefixes exist, create sequences s_i ◦ (s_j − s′_j) and s_i ◦ (s_j − s″_j) and add them to the created library.

1.2. For every sequence s_j ∈ S(Q, τ, τ + 2) for which θ(s_j) = τ + 2 check whether there are prefixes of this sequence s″_j = s″_i and s‴_j = s‴_i. If these prefixes exist, create sequences s_i ◦ (s_j − s″_j) and s_i ◦ (s_j − s‴_j) and add them to the created library.

2. For every sequence s_i ∈ S(Q, τ, τ + 2) for which θ(s_i) = τ + 2 determine, if possible, suffixes s′_i, s″_i and s‴_i such that θ(s′_i) = k, θ(s″_i) = k + 2 and θ(s‴_i) = k + 4.

2.1. For every sequence s_j ∈ S(Q, τ, τ + 2) for which θ(s_j) = τ check whether there are prefixes of this sequence s′_j = s′_i and s″_j = s″_i. If such prefixes exist, create sequences s_i ◦ (s_j − s′_j) and s_i ◦ (s_j − s″_j) and add them to the created library.

2.2. For every sequence s_j ∈ S(Q, τ, τ + 2) for which θ(s_j) = τ + 2 check whether there are prefixes of this sequence s″_j = s″_i and s‴_j = s‴_i. If these prefixes exist, create sequences s_i ◦ (s_j − s″_j) and s_i ◦ (s_j − s‴_j) and add them to the created library.

Based on Theorem 5 and Algorithm 2 the following multistage isothermic SBH approach can be proposed. In this variant of the approach the overlap of the sequence pairs composing the elements of the oligonucleotide library can be chosen.

Algorithm 3.

1. Choose some temperature τ determining the temperatures of the two isothermic libraries used in the first hybridization experiment (these libraries will be the complete libraries of oligonucleotides having temperatures τ and τ + 2 like in the standard isothermic SBH approach).

2. Perform hybridization experiment using libraries defined in step 1. As a result spectrum S(Q, τ, τ + 2) is obtained.

3. Choose the value of k and create the oligonucleotide library according to Algorithm 2.

4. Perform hybridization experiment using the new library defined in step 3. As a result spectrum S(Q, 2τ − k, 2τ − k + 2) is obtained.

5. If the temperature should be further increased, set τ := 2τ − k and go to step 3, otherwise go to step 6.

6. Use an algorithm for isothermic SBH to reconstruct the target sequence on the basis of S(Q, 2τ − k, 2τ − k + 2).
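The library construction of Algorithm 2 and the loop of Algorithm 3 can be sketched together. This is a compact illustration rather than the authors' implementation. Instead of enumerating the suffix cases one by one, it computes for each pair (s_i, s_j) the overlap temperature θ(s_i) + θ(s_j) − θ(s_p) required for each target temperature, which yields the same overlap temperatures as steps 1.1 to 2.2. It also assumes error-free hybridization, the 2 °C (A/T) and 4 °C (C/G) rule, and that an overlap of temperature 0 means plain concatenation. All names are hypothetical.

```python
NUCLEOTIDE_TEMPERATURE = {"A": 2, "T": 2, "C": 4, "G": 4}  # assumed rule


def temperature(seq):
    return sum(NUCLEOTIDE_TEMPERATURE[n] for n in seq)


def complete_spectrum(target, tau):
    """Error-free spectrum S(Q, τ, τ + 2) of a known target sequence."""
    n = len(target)
    return {
        target[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if temperature(target[i:j]) in (tau, tau + 2)
    }


def suffix_of_temperature(seq, t):
    """Suffix of seq whose temperature is exactly t, or None if none exists."""
    if t == 0:
        return ""  # assumption: temperature 0 means plain concatenation
    total = 0
    for start in range(len(seq) - 1, -1, -1):
        total += NUCLEOTIDE_TEMPERATURE[seq[start]]
        if total == t:
            return seq[start:]
        if total > t:
            break
    return None


def build_library(spectrum, tau, k):
    """Overlap pairs so that the result has temperature 2τ − k or 2τ − k + 2."""
    library = set()
    for si in spectrum:
        for sj in spectrum:
            for wanted in (2 * tau - k, 2 * tau - k + 2):
                overlap = temperature(si) + temperature(sj) - wanted
                suffix = suffix_of_temperature(si, overlap)
                if suffix is not None and sj.startswith(suffix):
                    library.add(si + sj[len(suffix):])
    return library


def misbh_with_overlap(target, tau, k, stages):
    """Steps 1-5 of Algorithm 3 with a simulated hybridization experiment."""
    spectrum = complete_spectrum(target, tau)
    for _ in range(stages):
        library = build_library(spectrum, tau, k)
        spectrum = {probe for probe in library if probe in target}
        tau = 2 * tau - k  # step 5
    return spectrum, tau


spectrum, final_tau = misbh_with_overlap("ACGTTGCAAGCT", tau=6, k=2, stages=2)
print(final_tau)  # prints 18
```

With τ = 6 and k = 2 the temperature grows as 6 → 10 → 18, so the returned set corresponds to S(Q, 18, 20). Larger k gives longer overlaps and therefore a slower temperature increase per stage.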

As one can notice, an important part of Algorithms 1 and 3 is a method for isothermic SBH which is used as a subprocedure in the last step of these algorithms. One of the methods which can be applied here is the tabu search algorithm described in Błażewicz et al. (2004b). The implementation of this algorithm is available at the Computational Biology Server (http://bio.cs.put.poznan.pl). The program requires as an input a data set being the spectrum obtained as a result of a hybridization experiment and returns the reconstructed target DNA sequence. Other important parts of the two algorithms are the methods of constructing oligonucleotide libraries, one of them being Algorithm 2. Such methods are intended to be included in the Computational Biology Server in the future.


4. Conclusions

The procedure proposed in this paper combines two non-standard approaches to sequencing by hybridization, i.e. the multistage and the isothermic SBH. The first of them reduces the number of errors following from l-tuple repetitions in the target sequence. On the other hand, the method is quite sensitive to hybridization errors, especially to the negative ones. The basic property of the second method is the possible reduction of hybridization errors, but this approach is sensitive to repetitions of short subsequences in the examined DNA sequence. So, the two methods are in some sense complementary to each other. The combination of the methods is less sensitive to errors than each of the two approaches.

It should be noted that the number of stages in the combined method must be adequate to the temperature of the libraries used, since the temperature cannot be too high due to the conditions in which the hybridization experiment can be performed. This constraint is analogous to the limitation of the number of stages in multistage SBH, where the length of the probes also cannot be too big (Kruglyak, 1998). However, even a few-stage variant of the combined method may be much less error-prone than the standard variant of SBH.

Acknowledgement

The research has been partially supported by KBN grant 3T11F00227.

References

Błażewicz, J., Formanowicz, P., Kasprzak, M., Markiewicz, W.T., Węglarz, J., 1999. DNA sequencing with positive and negative errors. J. Comp. Biol. 6, 113–123.

Błażewicz, J., Formanowicz, P., Kasprzak, M., Markiewicz, W.T., 2004a. Sequencing by hybridization with isothermic oligonucleotide libraries. Disc. Appl. Math. 145, 40–51.

Błażewicz, J., Formanowicz, P., Kasprzak, M., Markiewicz, W.T., Świercz, A., 2004b. Tabu search algorithm for DNA sequencing by hybridization with isothermic libraries. Comp. Biol. Chem. 28, 11–19.

Drmanac, R., Labat, I., Brukner, I., Crkvenjakov, R., 1989. Sequencing of megabase plus DNA by hybridization: theory and the method. Genomics 4, 114–128.

Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A., Solas, D., 1991. Light-directed, spatially addressable parallel chemical synthesis. Science 251, 767–773.

Formanowicz, P., 2004. Resolving power of isothermic DNA sequencing chips. Bull. Pol. Ac.: Tech., in press.

Frieze, A.M., Halldorsson, B.V., 2002. Optimal sequencing by hybridization in rounds. J. Comp. Biol. 9, 355–369.

Khrapko, K.R., Lysov, Y.P., Khorlin, A.A., Shik, V.V., Florentiev, V.L., Mirzabekov, A.D., 1989. An oligonucleotide approach to DNA sequencing. FEBS Lett. 256, 118–122.

Kruglyak, S., 1998. Multistage sequencing by hybridization. J. Comp. Biol. 5, 165–171.

Margaritis, D., Skiena, S., 1995. Reconstructing strings from substrings in rounds. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, IEEE Computer Science Press, 613–620.

Pease, A.C., Solas, D., Sullivan, E., Cronin, M., Holmes, C., Fodor, S., 1994. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. U.S.A. 91, 5022–5026.

Pevzner, P.A., 2000. Computational molecular biology. An algorithmic approach. The MIT Press, Cambridge, Massachusetts.

Southern, E.M., 1988. United Kingdom patent application GB8 810400.

Southern, E.M., Maskos, U., Elder, J.K., 1992. Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: evaluation using experimental models. Genomics 13, 1008–1017.

Wallace, R.B., Johnson, M.J., Hirose, T., Miyake, T., Kawashima, E.H., Itakura, K., 1981. The use of synthetic oligonucleotides as hybridization probes. II. Hybridization of oligonucleotides of mixed sequence to rabbit beta-globin DNA. Nucleic Acids Res. 9, 879–894.

Watson, J.D., Crick, F.H.C., 1953. Genetic implications of the structure of deoxyribonucleic acid. Nature 171, 964–967.