DNA quadruplex folding formalism – A tutorial on quadruplex topologies


Methods 64 (2013) 28–35

Contents lists available at SciVerse ScienceDirect

Methods

journal homepage: www.elsevier.com/locate/ymeth


1046-2023/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ymeth.2013.06.004

⇑ Corresponding author: M. Webba da Silva.

Andreas Ioannis Karsisiotis, Christopher O’Kane, Mateus Webba da Silva ⇑

School of Pharmacy & Pharmaceutical Sciences, Biomedical Sciences Research Institute, University of Ulster, Cromore Road, BT52 1SA, Coleraine, UK

Article info

Article history: Available online 19 June 2013

Keywords: Quadruplex; Formalism; Quadruplex structure; Glycosidic bond angle; Tetrad; NMR structure

Abstract

Quadruplexes of DNA adopt a large variety of topologies that are dependent on their environment. We have been developing a formalism for quadruplex folding based on the relationship between a base and its sugar, as defined by the glycosidic bond angle. By reducing the quadruplex stem to a description based on two finite states of the range of angles the glycosidic bond angle may adopt, it becomes possible to describe the relationships between loop type and the groove widths of a quadruplex stem. In its current form, this formalism has allowed the prediction of some unimolecular quadruplex topologies. Its rules, whilst developed for unimolecular quadruplexes of three loops, are of general utility in understanding the interdependency of structural characteristics of multimolecular folds, as well as of unimolecular quadruplexes of more than three loops. Here we describe the current understanding of the interdependent structural features that define the quadruplex fold, and provide a tutorial for the use and application of this formalism.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

The primary sequence of DNA encodes a variety of topologies. In its normal topological form, DNA adopts what is known as the B-DNA form (Fig. 1). In B-DNA, intrastrand stacking of bases within the right-handed helix is the dominant force contributing to the stability of the stem. It results in a C2′-endo sugar pucker and approximately 10.5 base pairs per full helical turn [1,2]. In double-stranded DNA, the sequence of Watson–Crick base pairings A:T and G:C defines the topological characteristics of the structure. In A-DNA, tracts of sequential adenines adopt a characteristic π-stacking that widens the diameter of the stem of the structure, forces the sugar puckers to adopt a C3′-endo conformation, and shortens the helical pitch while increasing the number of base pairs per full turn to approximately 11. Interspersion of guanosines and cytosines in DNA bearing (GpC)3 at high ionic strength gives way to a Z-DNA form [3]. In this topology, each unit of intrastrand-stacked (GpC) is interrupted by interstrand stacking of its cytosines in a left-handed helix. This arrangement is possible because here the Watson–Crick G:C base pair has all guanosines in a syn conformation, whilst the cytosines maintain the normal anti glycosidic bond angle (see Fig. 1). The resultant grooves are larger than the minor grooves of both B- and A-DNA, with guanosines forced into C2′-exo and cytosines into C2′-endo conformations. Charge plays a role in stabilizing all of these architectures through the formation of cation spines where the phosphate backbones are closest.

In acidic solutions (pH < 5.7), cytosines can become positively charged by gaining an extra imino proton, and may then recognize the Hoogsteen edge of a Watson–Crick base-paired guanosine, forming a triad [4]. Triads are pseudo-planar architectures composed of three hydrogen-bonded bases that give rise to triple-stranded DNA triplexes. Some triplexes represent an extension of the B-DNA form, with intrastrand base stacking and a minor groove defined by the Watson–Crick base pairing of their bases [4]. Triplexes may also form from base mismatches that define the interactions between strands at neutral pH [5], and thus have groove dimensions that all differ from those of B-DNA.

In four-stranded DNA, known as quadruplexes, the stacking of tetrads defines the topology. Tetrads are pseudo-planar architectures composed of four hydrogen-bonded nucleobases. DNA quadruplexes may adopt a variety of architectures, dictated not only by the sequence of the oligonucleotides but also by environmental conditions such as the nature and concentration of cations [6], solvent viscosity [7], and molecular crowding/hydration [8,9]. Their structural variations are fundamental to understanding their biological significance as well as their utility in the production of biotechnologies and materials. We have been developing a formalism [10] that defines a set of structural descriptors as a standard for the structural interpretation [11] and programming of the self-assembly [12] of these architectures. The geometric formalism of quadruplex folding describes the interdependency of quadruplex structural parameters on the basis of the two states that the glycosidic bond angle (GBA) can adopt. Here we describe the current understanding of this reductionist approach and give an account of what has been established experimentally thus far.

Fig. 1. Variety of structures adopted by DNA. From left to right, PDB IDs: 1ANA, 1BNA, 2DCG, 135D, and 1XAV. Side and top-down views (toroid) are included. In the bottom row, examples of stacking steps from each structure, with the 5′ to 3′ directionality illustrated by arrows.

2. A descriptor for quadruplex folding

Unimolecular DNA quadruplexes can theoretically adopt 26 folding topologies [10]. Several authors have used a variety of names to describe some of these topologies. For the most part, the naming became a necessity owing to the structural versatility of human telomeric sequences [13,14]. However, it is not practical to produce names for every one of the 26 potential topologies. Most importantly, to truly understand the nature and interdependency of the structural parameters of quadruplexes, it is necessary to define a basis for description: a frame of reference for quadruplexes (Fig. 2) [10]. In this context, the schematics depicting the GBA conformation of the bases play a significant role. The starting point is the progression of the strand closest to the 5′-end, which is always drawn with its sugar pucker ellipsoid vertical. The 5′-end of the quadruplex stem is always positioned at the lower right corner of the square that represents the tetrad pseudo-plane. This point of reference ensures that all other parameters may vary relative to it; i.e., the other sugar pucker ellipsoids may adopt horizontal or vertical orientations. As will be described later, this is a crucial feature that has experimental implications.

Fig. 2. (A) Frame of reference for describing unimolecular G-quadruplex topologies [10]. The origin, the 5′-end, sits on a strand of the quadruplex stem progressing towards the viewer, with its polarity indicated on the sugar pucker. Loops can progress anticlockwise, as denoted by the red arrow, or clockwise, as indicated by the green arrow. The resulting grooves are numbered anticlockwise from first to fourth. (B) The syn and anti glycosidic bond angle dispositions of guanosine.

3. Groove widths in quadruplexes

Assuming the frame of reference described above, the possible GBA combinations for a tetrad can be derived. The 16 possible GBA combinations define 8 possible groove width combinations (Fig. 3). A quadruplex has four grooves, each of which takes one of three possible widths: narrow (n), medium (m), or wide (w). In the schematics, pairs of guanosines in a tetrad define its groove widths. For ease of description, the grooves are numbered sequentially from the 5′ point of reference in an anticlockwise manner as the 'first groove', 'second groove', 'third groove', and 'fourth groove'. For example, the groove width combination (wmnm) describes grooves that appear in sequence as first 'wide', second 'medium', third 'narrow', and fourth 'medium' (see e.g. Fig. 2).

A quadruplex is defined by the stacking of at least two tetrads with the same groove width combination, whether or not they share the same GBA combination. Thus, there are exactly eight possible groove width combinations for a quadruplex stem (Fig. 3). This is a salient experimental feature: it limits the tetrads that can stack in a stable stem to the one or two GBA variants of a single groove width combination. Thus far, seven of these combinations have been observed experimentally; the one not yet observed is (wmmn), tetrad combination VII in Fig. 3. How do groove widths relate to topology? Their variation is interdependent with the types of loops in the topology. We will thus next address loops and their impact in defining topology.

Fig. 3. Schematic of all possible GBA combinations for (G:G:G:G) tetrads (I–VIII). The medium, wide, and narrow groove width combinations dictated by the GBA disposition of the guanines within each tetrad are indicated. GBA disposition is indicated as syn (S) and anti (A). Topologies corresponding to each GBA combination are illustrated, with those experimentally determined in yellow and those yet to be determined in cyan.
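The counting argument of this section can be checked with a short Python sketch. It rests on a width rule that we infer from the text rather than quote from it: reading a tetrad anticlockwise from the 5′ corner, a guanosine that shares the GBA of the 5′ guanosine followed by one that does not frames a wide groove, the reverse order frames a narrow groove, and equal states frame a medium groove. This agrees with the rule, stated in Section 10, that a first loop can bridge a wide groove only when progressing anticlockwise and a narrow groove only when progressing clockwise. The function names `groove` and `groove_widths` are hypothetical.

```python
from itertools import product

# Sketch under stated assumptions (not the authors' code): a tetrad is a
# tuple of four GBA states listed anticlockwise from the 5' corner (index 0).
# Groove k (k = 1..4) lies between corner k-1 and corner k, wrapping to 0.


def groove(a_rel: int, b_rel: int) -> str:
    """Width of the groove between two neighbouring corners, read anticlockwise.

    Each argument is 0 if that base shares the GBA of the 5' base, 1 if not.
    Assumed rule: equal -> medium; 0 then 1 -> wide; 1 then 0 -> narrow.
    """
    if a_rel == b_rel:
        return "m"
    return "w" if a_rel == 0 else "n"


def groove_widths(tetrad: tuple) -> str:
    """Groove width combination (first to fourth groove) of one tetrad."""
    rel = [0 if gba == tetrad[0] else 1 for gba in tetrad]
    return "".join(groove(rel[i], rel[(i + 1) % 4]) for i in range(4))


# Group all 16 GBA combinations by the groove width combination they produce.
combos = {}
for tetrad in product(("anti", "syn"), repeat=4):
    combos.setdefault(groove_widths(tetrad), []).append(tetrad)

for widths, tetrads in sorted(combos.items()):
    print(widths, tetrads)
```

Under this assumption the 16 tetrads fall into 8 pairs, each pair related by swapping every syn for anti, and no combination begins with a narrow groove or ends with a wide one, as stated in Section 8.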

4. Loops in quadruplexes

A loop is any region linking the tetrads of the same groove width combination that define the quadruplex stem. This definition thus excludes all other base-pairing alignments, including tetrads that do not have the same groove width combination. Loops mostly consist of bases that do not engage in pairing within the topology. In general, the longer the loops, the less stable the topology [15]. However, loop length and type also define the topology [12,15]. There are three loop types: propeller (p), diagonal (d), and lateral (l). A diagonal loop links two bases of the same tetrad that are not base-paired. A propeller loop links bases of different tetrads in the same groove. Lateral loops link adjacent bases over a narrow groove or a wide groove. Thus, there is an interdependence between a groove and the loop that bridges it. In fact, lateral and diagonal loops have a further interdependence: they bridge guanosines of different GBA.

Loops may progress in a clockwise or anticlockwise manner from their point of origin (Figs. 2 and 3). This is important to consider, since the expectation thus far is that all of these topologies will be right-handed. Owing to the preferred dihedral angles of the backbone, this has implications for the disposition and length of the loops. Notwithstanding this, we currently assume all 26 theoretical looping combinations to be experimentally feasible (Fig. 3). Each topology is thus defined by a looping combination and its groove width combination. Because of the GBA interdependency of the loops, it is sufficient to describe the looping combination in order to refer to a particular topology. When defining loop progression, the first loop departs from the 5′-strand in a clockwise (+) or anticlockwise (−) manner. For example, an all-parallel quadruplex topology (1a in Fig. 3) has an anticlockwise-progressing first loop (−p). Its second and third loops are propeller loops that also progress anticlockwise (−p−p). The notation for loop progression in this topology is thus (−p−p−p). The thrombin binding aptamer (6b in Fig. 3) has loop progression (+l+l+l), with all three of its lateral loops progressing clockwise. A diagonal loop does not need an explicit clockwise or anticlockwise prefix. Thus, the topology described as 13a in Fig. 3 would be (−ld+p).

5. How to derive the groove width combination from the loop sequence?

Let us take as an example the topology described by loop combination (+ld−p) and derive the respective groove width combination. First, 'place' the descriptor on the top tetrad of the quadruplex stem. This places the 5′ guanosine of the descriptor on the top guanosine of the first strand. The representation denotes that this guanosine is in the first (the 5′) strand. Let us now assume that this base is anti (Fig. 4A). The first loop is a clockwise lateral loop (+l); thus the base at the end of the loop is syn (Fig. 4B). Next we have a diagonal loop, and again a change of GBA, from syn to anti (Fig. 4C). Finally, we have a propeller loop. For this loop the groove has to be medium, so no change of GBA is necessary (Fig. 4D). Fig. 4D thus gives the groove width combination for the loop combination (+ld−p). Assuming the 5′ base of the descriptor to be syn (Fig. 4E) would give the same groove width combination with the alternative disposition of the GBA (Fig. 4F). Now we can derive groove width combinations from loop combinations. But what is their sequence in a quadruplex stem?

Fig. 4. Step-by-step instructions for deducing the groove width combination of tetrads for the loop combination (+ld−p). The resultant medium (m), wide (w), and narrow (n) grooves are highlighted in (D) and (F). Dark and light blue highlighting indicates guanosines of syn and anti GBA disposition, respectively. Yellow highlighting indicates guanosines for which the GBA disposition is not yet deduced.
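The derivation in this section is mechanical enough to sketch in Python. The helper names, the corner numbering (0 to 3, anticlockwise from the 5′ corner) and the width rule are our assumptions: the rule is the one inferred from the rules of Section 3 (a base with the same GBA as the 5′ base followed anticlockwise by one with a different GBA frames a wide groove, the reverse frames a narrow groove, equal states a medium groove). As in the text, lateral and diagonal loops switch the GBA and propeller loops keep it.

```python
import re

# Hypothetical helpers (not from the paper). Corners 0..3 are numbered
# anticlockwise from the 5' corner; "-" (anticlockwise) moves to corner + 1,
# "+" (clockwise) to corner - 1, and a diagonal loop moves to corner + 2.
STEP = {"-": 1, "+": -1}
FLIP = {"anti": "syn", "syn": "anti"}


def parse_loops(combo: str) -> list:
    """Split e.g. '(+ld-p)' into [('+', 'l'), ('', 'd'), ('-', 'p')]."""
    text = combo.replace("\u2212", "-").replace(" ", "").strip("()")
    tokens = re.findall(r"([+-]?)([pld])", text)
    if "".join(sign + loop for sign, loop in tokens) != text:
        raise ValueError(f"unrecognised loop notation: {combo!r}")
    for sign, loop in tokens:
        if (loop == "d") != (sign == ""):
            raise ValueError(f"only diagonal loops omit the +/- prefix: {combo!r}")
    return tokens


def groove_widths_from_loops(combo: str, first_gba: str = "anti") -> tuple:
    """Walk the loops around one tetrad and return (widths, GBA per corner)."""
    corner, gba = 0, first_gba
    placed = {0: gba}
    for sign, loop in parse_loops(combo):
        corner = (corner + (2 if loop == "d" else STEP[sign])) % 4
        if loop in ("l", "d"):
            gba = FLIP[gba]  # lateral and diagonal loops change the GBA
        if corner in placed:
            raise ValueError(f"{combo!r} revisits corner {corner}")
        placed[corner] = gba
    if len(placed) != 4:
        raise ValueError(f"{combo!r} does not visit all four corners")
    tetrad = tuple(placed[c] for c in range(4))
    rel = [0 if g == tetrad[0] else 1 for g in tetrad]
    widths = ""
    for i in range(4):
        a, b = rel[i], rel[(i + 1) % 4]
        widths += "m" if a == b else ("w" if a == 0 else "n")
    return widths, tetrad


print(groove_widths_from_loops("+ld-p"))  # ('mmwn', ('anti', 'anti', 'anti', 'syn'))
```

Starting from a syn 5′ base (`first_gba="syn"`) gives the same widths with every GBA swapped, as in Fig. 4E and F, and the walk for (−l−l−l) gives (wnwn), the combination discussed for that topology in Section 8.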

6. The sequence of GBA within the quadruplex stem

Quadruplexes have previously been classified as anti-parallel or parallel stranded. This classification distinguishes one single topology from all others. Anti-parallel quadruplexes can be further subdivided into those that have at least one propeller loop and those that do not. These divisions are practical owing to the nature of the light absorption properties of the relevant quadruplex stems [11]. Quadruplexes have thus been sorted into three groups. Group I comprises the all-parallel quadruplexes, in which the GBA along the strands of the quadruplex stem may progress without change: e.g., Ganti-Ganti-Ganti. In Group II, the anti-parallel quadruplex has a propeller loop. In this case there is some variation of the GBA, albeit without alternation. The sequence within a strand may thus be Gsyn-Ganti-Ganti, Gsyn-Gsyn-Ganti, Ganti-Gsyn-Gsyn, or Ganti-Ganti-Gsyn. In Group III, the anti-parallel quadruplexes do not have a propeller loop, and the GBA alternates along the stem: Gsyn-Ganti-Gsyn and Ganti-Gsyn-Ganti. However, knowing these possible sequences does not tell us where to start: in Group III the 5′ base of the quadruplex stem can be either Gsyn or Ganti, and the first strand of Group II quadruplexes can be either Gsyn-Ganti-Ganti or Gsyn-Gsyn-Ganti. We have earlier observed three striking features [12]: (i) for all quadruplexes observed experimentally thus far, the base at the top/end tetrad of the 5′ strand has always been anti; (ii) the anti-parallel structures determined thus far that contain a propeller loop (Group II) have Gsyn-Ganti-Ganti in the first strand; and (iii) the anti-parallel structures determined thus far that contain a propeller loop (Group II) start with a syn residue at the 5′-end of the quadruplex stem. All of these observations need further investigation.

The points above deserve additional commentary. Concurrent with the first point (i), there have thus far been no occurrences of lateral loops that start with a syn guanosine and bridge a wide or a narrow groove, whilst of the nine occurrences of diagonal loops, only one has started with a syn guanosine [16]. For the second point (ii), the fact that Gsyn-Gsyn-Ganti has not been observed in the first strand does not preclude its feasibility. Its absence may be due to a greater number of Ganti being more stable in the topology. Indeed, all the structures determined thus far have a majority of Ganti in the stem. For three-stacked quadruplexes, at least eight topologies can have an equal number of Ganti and Gsyn. For these, other factors should influence the feasibility of the GBA sequence of the strands, and thus a Gsyn-Gsyn-Ganti first strand is possible. Regarding the third point (iii), it is not only Group II quadruplexes that have a Gsyn at the 5′ end of the quadruplex stem, but all anti-parallel structures, with one exception. Furthermore, a recent free energy analysis of the sequence-structure relationship of quadruplexes has suggested that the syn-anti step is much more stable than the anti-syn step [17]. It is thus reasonable to assume that all anti-parallel quadruplexes may indeed start with a syn conformation.

The topology that represents an exception is a Group III three-stacked system of loop combination (−ld+l): PDB id 143D (Fig. 5) [16]. Here, although the 5′ base of the quadruplex stem adopts an anti GBA, it may in principle be Gsyn. Indeed, according to Cang et al. [18], the most stable sequence for the first strand of this Group III quadruplex should be Gsyn-Ganti-Gsyn. This expectation arises because the free OH on C5′ of the 5′-end residue of a DNA oligonucleotide prefers to form a hydrogen bond to N3 of purines. The three-stacked experimental topology has an adenosine at its 5′-end, preceding the quadruplex stem, which appears to adopt a syn GBA judging from the intensity of its H8-H1′ NOE in the sequence-specific assignment. Thus the first strand may indeed be Asyn-Ganti-Gsyn-Ganti. This indicates that the preference does not necessarily refer to the quadruplex stem. To prove this hypothesis, it would be important to demonstrate the same fold with the alternative sequence of GBA throughout the stem of the quadruplex. Such a result would also demonstrate that Gsyn may precede lateral loops bridging narrow or wide grooves.

Fig. 5. Structurally determined unimolecular quadruplex topologies of three loops. In the PDB structures, the quadruplex backbone is colored cyan and thickened for emphasis; syn guanine, anti guanine, adenine, cytosine, thymine, and inosine residues are colored dark green, red, yellow, green, blue, and black, respectively. In the topology schematics, anti guanines are shaded grey, syn guanines are shaded black, and the 5′-end is indicated by a blue sphere.

Another recent free energy analysis of the sequence-structure relationship of quadruplexes has indicated that the quadruplex stem sequence Gsyn-Ganti-Gsyn-Ganti is as stable as Gsyn-Gsyn-Ganti-Ganti [18]. This would suggest that four-stacked anti-parallel quadruplexes with a propeller loop (Group II) could adopt a Gsyn-Gsyn-Ganti-Ganti sequence in the 5′ strand of the stem. This is still to be proven.

Clockwise-progressing propeller loops are also rare: there is a single case out of over 40 occurrences [19]. Such a loop is expected to be much longer than its anticlockwise counterpart and could thus be outcompeted by other topologies [18]. The case mentioned occurs because the loop proceeds from a stable diagonal loop in a two-stacked system.
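The grouping of this section, together with the two empirical filters drawn from observations (i) to (iii), can be written as a small Python sketch for three-stacked stems. `group`, `candidate_first_strands` and the switch counts are our own encoding of the text (no GBA change along the strand for Group I, a single change for Group II, full alternation for Group III), not a published implementation.

```python
from itertools import product

# Hypothetical sketch of the Section 6 bookkeeping for three-stacked stems.
# Loop combinations use the paper's notation, e.g. "+ld-p"; only the loop
# letters (p, l, d) matter for the grouping.
SWITCHES = {"I": 0, "II": 1, "III": 2}  # GBA changes along a three-base strand


def group(combo: str) -> str:
    """Group I: all propeller loops; II: some propeller; III: no propeller."""
    types = [c for c in combo if c in "pld"]
    if all(t == "p" for t in types):
        return "I"
    return "II" if "p" in types else "III"


def candidate_first_strands(combo: str, top_anti: bool = True,
                            start_syn: bool = True) -> list:
    """First-strand GBA sequences (5'->3') allowed for a three-stacked stem.

    top_anti:  keep only strands whose top (3'-most) base is anti, point (i).
    start_syn: for anti-parallel groups, keep only strands starting with syn.
    """
    g = group(combo)
    allowed = []
    for seq in product(("syn", "anti"), repeat=3):
        switches = sum(a != b for a, b in zip(seq, seq[1:]))
        if switches != SWITCHES[g]:
            continue
        if top_anti and seq[-1] != "anti":
            continue
        if start_syn and g != "I" and seq[0] != "syn":
            continue
        allowed.append(seq)
    return allowed


for combo in ("-p-p-p", "+ld-p", "-ld+l"):
    print(combo, group(combo), candidate_first_strands(combo))
```

For Group II the filters leave Gsyn-Gsyn-Ganti and Gsyn-Ganti-Ganti, the two candidates named above. For a Group III topology such as (−ld+l) they leave nothing, because an alternating three-base strand begins and ends with the same GBA; dropping either filter recovers Gsyn-Ganti-Gsyn or Ganti-Gsyn-Ganti. This is the conflict taken up in Section 7.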

Fig. 6. Depiction of the deduction of the sequence of GBA for the loop combination (−ld+p).

7. How to derive the sequence of tetrad combinations in a quadruplex stem?

We have made some strides in evaluating possibilities for three-stacked systems, as described in the previous section [11,12]. However, we have yet to firmly establish the relative importance of two specific empirical approaches. In the first, one assumes that the base at the top/end tetrad of the 5′ strand should be anti. Even though all structures currently determined have this feature, it is not a sufficient reason to use this approach exclusively, as we have done previously for Group II quadruplexes. In the second approach, we assume that the stem of all anti-parallel quadruplexes starts with a syn conformation. In this case the structure determined for (−ld+l), PDB id 143D, would be an exception (as discussed above). For the anti-parallel quadruplex structures experimentally determined thus far (20 structures or topologies), both approaches are valid, and we have used them previously [12].

Assuming both approaches are valid, let us give an example of how the derivation of the GBA sequence within the quadruplex stem works. Let us take the three-stacked topology (+ld−p) (Fig. 6A). Firstly, it is an anti-parallel quadruplex with a propeller loop; thus it belongs to Group II, with Gsyn-Ganti-Ganti, Gsyn-Gsyn-Ganti, Ganti-Gsyn-Gsyn, or Ganti-Ganti-Gsyn. By assuming that the base at the top/end tetrad of the 5′ strand should be anti and, concomitantly, that Gsyn-Ganti steps are favored, the first strand would be Gsyn-Ganti-Ganti (Fig. 6B). The following lateral loop (+l) would thus result in a change in GBA for the first two residues of a three-stacked system. This also agrees with the narrow groove width: across a narrow groove, the paired bases are of different GBA. Thus, the second strand is Gsyn-Gsyn-Ganti (Fig. 6C). Next, we have a diagonal loop with its change of GBA. It results in the formation of a medium groove with the first strand, along which the paired bases are of the same GBA. Thus, the sequence of the third strand is Gsyn-Ganti-Ganti (Fig. 6D). The last loop is a propeller loop (−p), and in order to establish a medium groove, the sequence of the fourth strand has to be Gsyn-Ganti-Ganti (Fig. 6D).

It is thus fairly simple to establish the sequence of groove width combinations within a quadruplex stem. What happens if we start with a diagonal loop? This would imply that we do not have grooves to compare. However, the sequence of GBA is encoded in the first strand. Let us take, for example, topology (d−pd). For a three-stacked quadruplex, the first strand would be Gsyn-Ganti-Ganti or Gsyn-Gsyn-Ganti, as discussed in the previous section. Considering the first case, the base after the diagonal loop changes GBA to Gsyn. This means that the following bases have to be Gsyn-Ganti to conform to the primary rule of quadruplex folding: the stacked tetrads of a quadruplex share the same groove width combination. Indeed, if the top base of the 5′ strand is anti and the diagonally opposed base is syn, then the second base of the second strand also has to be syn for the tetrads to have the same combination of GBA. Correspondingly, the third base of the second strand has to be anti.

For anti-parallel quadruplexes, this empirical procedure results in a quadruplex with more Gsyn than Ganti for the three-stacked (+l+p+p) and (−l−p−p) topologies [12]. This would be a unique case and thus deserves immediate verification. The assumption is that, as for most DNA topologies, the GBA of the bases should be anti. Indeed, if that were the case, the tetrad combination encoded on the 5′ strand should be not Gsyn-Ganti-Ganti but Gsyn-Gsyn-Ganti. The latter would result in a majority of Ganti bases in the quadruplex stem.
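The strand-by-strand derivation of this section can be condensed into a Python sketch. Beyond what the text states explicitly, it assumes that lateral and diagonal loops both reverse the strand direction and switch the GBA relative to the 5′ corner, that propeller loops preserve both, and that every tetrad repeats the relative GBA pattern of the 5′ corner (the primary rule of stacking tetrads with the same groove width combination). `parse_loops`, `stem_gba` and `count` are hypothetical names.

```python
import re

# Hypothetical sketch (not the authors' code). Corners 0..3 run anticlockwise
# from the 5' corner; tetrads 0..n-1 run from the 5' end of the first strand
# (bottom) to its 3' end (top tetrad, where the first loop departs).
FLIP = {"anti": "syn", "syn": "anti"}


def parse_loops(combo: str) -> list:
    """Split e.g. '(+ld-p)' into [('+', 'l'), ('', 'd'), ('-', 'p')]."""
    text = combo.replace("\u2212", "-").replace(" ", "").strip("()")
    tokens = re.findall(r"([+-]?)([pld])", text)
    if "".join(sign + loop for sign, loop in tokens) != text:
        raise ValueError(f"unrecognised loop notation: {combo!r}")
    return tokens


def stem_gba(combo: str, first_strand: list) -> list:
    """GBA sequence (5'->3') of every strand, given the first strand (5'->3')."""
    corner, flipped, up = 0, False, True
    visited = {0}
    strands = [list(first_strand)]
    for sign, loop in parse_loops(combo):
        corner = (corner + (2 if loop == "d" else (1 if sign == "-" else -1))) % 4
        if corner in visited:
            raise ValueError(f"{combo!r} revisits corner {corner}")
        visited.add(corner)
        if loop in ("l", "d"):
            flipped = not flipped  # toggle: this corner's GBA differs from the 5' corner's
            up = not up            # lateral/diagonal loops reverse the strand direction
        # Primary rule: every tetrad repeats the same relative GBA pattern.
        by_tetrad = [FLIP[g] if flipped else g for g in first_strand]
        strands.append(by_tetrad if up else by_tetrad[::-1])
    return strands


def count(strands: list, state: str) -> int:
    """Number of bases with the given GBA across all strands."""
    return sum(strand.count(state) for strand in strands)


for strand in stem_gba("+ld-p", ["syn", "anti", "anti"]):
    print("-".join(strand))
```

For (+ld−p) with a Gsyn-Ganti-Ganti first strand, the loop prints syn-anti-anti, syn-syn-anti, syn-anti-anti, and syn-anti-anti, matching the derivation above. For (+l+p+p) and (−l−p−p), the same first strand gives seven Gsyn out of twelve bases, whereas a Gsyn-Gsyn-Ganti first strand gives seven Ganti.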

For anti-parallel quadruplexes of Group III, the interleaving of GBA makes the choices simpler. However, here the two approaches conflict, because an alternating three-stacked strand begins and ends with the same GBA: (i) assuming that the base at the top/end tetrad of the 5′ strand should be anti implies a Ganti-Gsyn-Ganti first strand, whereas (ii) assuming that the anti-parallel quadruplex starts with a syn 5′ residue implies Gsyn-Ganti-Gsyn. As stated before, both assumptions hold for the overwhelming majority of the structures observed thus far. The exception, a three-stacked (−ld+l) topology [16], starts with a Ganti. However, the same topology has also been observed for a two-stacked [20,21] and a four-stacked system [22], in which, with an even number of tetrads, a strand starting with Gsyn ends with Ganti. Proof that the rule is still valid should be derived by engineering the three-stacked topology to start with a Gsyn.

8. Implications of the descriptor on topology

From the geometric quadruplex descriptor, it is not possible for the first groove to be narrow or for the fourth groove to be wide. Indeed, the topology described by loop progression (−l−l−l) had been proposed in 2007 to be defined by the groove width combination (wnwn) [10], before its experimental confirmation [23].

9. Prevalence of loop type combinations

Loop type combinations define their corresponding topologies, with two variants possible for each of the 13 loop type combinations, based on the anticlockwise or clockwise progression of the first non-diagonal loop. Out of the 26 possible topologies, only 9 have been observed experimentally (Figs. 3 and 5). The Group I all-parallel topology (−p−p−p) has several examples: (a) 1KF1 [24], (b) 1XAV [25], (c) 2LEE [26], (d) 2LBY [27], (e) 2KYP [28], (f) 2LPW [29], (g) 2KZE [30], and (h) 2LXQ [31], with further examples determined in crowded conditions (2LD8 [32]) and in a dimeric form stacked 5′–5′ (2LE6 [33] and 2LXV [31]). It is noteworthy that all these examples are three-stacked quadruplexes. There are currently no two-stacked or four-stacked examples of Group I quadruplexes.

The second most prevalent overall is the Group II anti-parallel topology (−l−l−p), characterized by the anticlockwise progression of lateral, lateral, and propeller loops, with five examples: (a) 2JSL [34], (b) 2JPZ [35], (c) 2KZD [30], (d) 186D [36], and (e) 2F8U [37]. There are four examples of the Group II (−p−l−l) topology: (a) 2GKU [38], (b) 2HY9 [39], and (c) 2JSM [34]. Two more examples of Group II unimolecular G-quadruplexes exist: (a) topology (−pd+l), first reported as a topology in 2009 [12], with a new structure determined in 2012 (2LOD [40]), and (b) topology (d+pd) (1I34 [19]). Apart from 1I34 [19], which is two-stacked, all other Group II structures determined are three-stacked.

Group III unimolecular quadruplexes are represented by the topologies (−l−l−l), (+l+l+l), (−ld+l), and (+ld−l). A greater number of stacking possibilities have been experimentally determined in this group: (a) (−l−l−l) as a two-stacked topology [23], (b) (+l+l+l) as the two-stacked 2KM3 [41] and 148D [42], (c) (−ld+l) as the four-stacked 201D [22], the three-stacked 143D [16], and the two-stacked 2KF8 [20] and 2KKA [21], and (d) (+ld−l) as the two-stacked 2KOW [43].

10. Applying the formalism to other architectures

We have been discussing the folding of unimolecular architectures with sequences of four G-segments interrupted by three loops. How does the formalism apply to G-segments interrupted by more than three loops? For these systems, the primary rule of quadruplex folding still applies: the stacking of tetrads of the same groove width combination. For example, PDB id 2KPR [44] is defined by loop progression (−l−l+p+p) in a three-stacked stem (Fig. 7). Its groove width combination is readily derived as (wnwm). However, another feature would have to be added to the descriptor to specify where the strand is interrupted. There are currently very few examples of unimolecular quadruplexes with more than three loops, and eventually a set of rules will have to be derived to address their description. Another example is PDB id 2O3M [45], with topology described by (−p−p+p+p) and groove width combination (mmmm) (Fig. 7). One may expect more examples of propeller loops interrupting a quadruplex stem, since there is no interdependence between the GBA of the guanosines and the propeller loops that bridge them.

This feature is illustrated by our next case: in PDB id 2L88 [46], an all-parallel quadruplex has an all-syn tetrad at its 5′ end (Fig. 7). It abides by the rule that only tetrads with the same groove width combination may stack to form stable quadruplexes. It further demonstrates that all-parallel quadruplexes may also start with a 5′ Gsyn, as in all observed cases of antiparallel quadruplexes with the exception of PDB id 143D. This issue will deserve particular attention in the future, since there is a clear bias: with the exception of 2L88, all experimentally verified Group I structures have residues preceding the 5′ end of the quadruplex stem.
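The stacking rule invoked for 2L88 can be written as a simple consistency check. If each tetrad's four grooves are read in the same order around the stem, a stack is admissible only if every tetrad shows the identical combination. The sketch below assumes the groove widths are already known and written as four-letter strings (w, n, m). `can_stack` is a hypothetical helper, not a structure-analysis tool, and the three-tetrad example is illustrative rather than a model of 2L88.

```python
VALID_WIDTHS = set("wnm")  # wide, narrow, medium


def can_stack(tetrads: list[str]) -> bool:
    """True if every tetrad shows the same groove width combination.

    Each tetrad is a four-letter string such as "wmnm", with the grooves
    listed in the same order around the stem for every tetrad.
    """
    for combo in tetrads:
        if len(combo) != 4 or not set(combo) <= VALID_WIDTHS:
            raise ValueError(f"bad groove combination: {combo!r}")
    # Position-by-position comparison: a rotated combination does not count
    # as the same, because stacked tetrads share the same four grooves.
    return len(set(tetrads)) <= 1


# Hypothetical all-parallel stem: an all-syn 5' tetrad stacked on two more
# tetrads; in an all-parallel stem every groove is medium.
print(can_stack(["mmmm", "mmmm", "mmmm"]))
# Same widths, but rotated relative to each other, so they cannot stack.
print(can_stack(["wmnm", "mnmw"]))
```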

Fig. 7. Structures of interrupted and other unusual unimolecular G-quadruplexes. The quadruplex backbone is colored cyan and thickened for emphasis; Gsyn, Ganti, adenine, cytosine, thymine and inosine residues are colored dark green, red, yellow, green, blue and black, respectively. In the structure schematic, anti guanines are shaded grey, syn guanines are shaded black and the 5′ end is indicated by a blue sphere.

Fig. 8. Schematics for G-wire folds. (A1 and A2) Folding models proposed by Marsh and co-workers for d(G4T2G4) and d(G4T2G3). (A3) An alternative model for folding of d(G4T2G4) and d(G4T2G3). (B) Model for folding of long Gn molecular wires. (C) Model for folding of frayed wires.

Multimeric quadruplexes also abide by the rules of the geometric formalism with respect to the formation of grooves and the change in GBA for diagonal and lateral loops. However, what is the descriptor in this case? The crucial feature is the choice of a single point of reference from which the other strands may be referenced. For example, for bistranded quadruplexes, the choice of reference strand must abide by the current descriptor in terms of two features: the loop departing from the chosen 5′ strand can only bridge a wide groove if it progresses anticlockwise, and a narrow groove if it progresses clockwise. This means that the choice of first strand cannot result in a narrow first groove or a wide fourth groove.
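The constraint on the reference strand can be applied mechanically once the groove widths around a stem are known. The sketch below assumes that the four grooves are listed in a fixed order around the stem, with groove i lying between strands i and i + 1. Choosing strand s as the reference then re-reads the list starting from groove s, and candidates giving a narrow first groove or a wide fourth groove are rejected. The indexing convention and the function name are hypothetical.

```python
def admissible_reference_strands(grooves: str) -> list[int]:
    """Indices of strands that may serve as the 5' reference strand.

    `grooves` lists the four groove widths (w, n or m) around the stem,
    groove i lying between strand i and strand i + 1 (indices modulo 4).
    Choosing strand s re-reads the grooves starting from groove s; the rule
    quoted in the text forbids a narrow first groove or a wide fourth one.
    """
    if len(grooves) != 4 or not set(grooves) <= set("wnm"):
        raise ValueError(f"bad groove combination: {grooves!r}")
    admissible = []
    for s in range(4):
        rotated = grooves[s:] + grooves[:s]
        if rotated[0] != "n" and rotated[3] != "w":
            admissible.append(s)
    return admissible


print(admissible_reference_strands("wmnm"))
print(admissible_reference_strands("wnwn"))
```

For "wmnm" only strands 0 and 3 qualify. For an alternating "wnwn" stem, every other strand qualifies.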

Quadruplexes can multimerize to form one-dimensional structures known as G-wires. G-wires are supramolecular structures of DNA consisting of stacked guanine tetrads of G-quadruplexes. The first reported G-wires were produced by Marsh and co-workers with the sequence d(G4T2G4) [47]. Their proposed structural model involved the formation of tetrads from the first G-segments of one pair of oligonucleotides and the second G-segments of another pair (Fig. 8A1). Initial analysis of G-wire formation suggested that the absence of one of the guanines in the quadruplex stem caused termination of G-wire growth (Fig. 8A2). However, various alternative models involving propeller loops can be described. For example, models can be proposed whereby two, three, or all sequential guanosines of the G-segments of each oligonucleotide are connected by a propeller loop forming a medium groove (Fig. 8A3). In this case, two strands fold to form a discrete parallel quadruplex with all strands in parallel orientation. When all guanosines of the G-segments are part of the quadruplex stem, multimerization results from stacking of bimolecular folds as in Fig. 8A3. If one or two of the 5′-end and 3′-end guanosines are not part of the quadruplex stem, the same folding mechanism is still feasible. Alternatively, they may join into antiparallel-stranded tetrads, in which case the quadruplex stem would not be continuous throughout. This model also fits with the termination of G-wire growth by the presence of the d(G4T2G3) sequence (Fig. 8A4).
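In the propeller-loop model of Fig. 8A3, two strands each contribute two G-segments, and each segment occupies one column of a single parallel stem. Each tetrad needs one guanine from each of the four columns, so the shortest segment bounds the number of complete tetrads per bimolecular fold. The sketch below shows that arithmetic. It assumes the segments are aligned and takes the sequences from the text. The helper names are hypothetical, and it makes no claim about which guanines actually pair.

```python
import re


def g_segments(sequence: str) -> list[int]:
    """Lengths of the runs of G in a sequence such as "GGGGTTGGGG"."""
    return [len(run) for run in re.findall(r"G+", sequence)]


def max_complete_tetrads(sequence: str, strands: int = 2) -> int:
    """Upper bound on full tetrads in the bimolecular parallel model.

    Assumes `strands` copies of the sequence each contribute their
    G-segments as separate columns of one parallel stem, so every tetrad
    needs one G from each of the four columns.
    """
    columns = g_segments(sequence) * strands
    if len(columns) != 4:
        raise ValueError("model expects exactly four G-columns")
    return min(columns)


for name, seq in [("d(G4T2G4)", "GGGGTTGGGG"), ("d(G4T2G3)", "GGGGTTGGG")]:
    per_dimer = max_complete_tetrads(seq)
    # Stacking k identical bimolecular folds gives at most k * per_dimer tetrads.
    print(name, per_dimer, "tetrads per fold;", 10 * per_dimer, "for 10 folds")
```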

Kotlyar and co-workers synthesized long monomolecular G-wires [48] formed through the intramolecular folding of a single DNA strand. It was initially assumed that the strand would form three loops, resulting in a long antiparallel quadruplex of Group III. However, the G-wires formed were shorter than expected for this model. The reported expected ratio of poly(G) strand length to G-wire length was 4:1, while the observed ratio was approximately 5:1. Although the precise structure of the folds in the wires has not been determined, propeller loops may explain the difference between expected and observed lengths. With a G-segment length of 3 bases and a loop length of 1 base, the wire would gain 3 tetrad separations in length every 15 bases (4 G-segments and 3 loops) (Fig. 8B). This is 5 times the number of bases needed for 3 base separations of a single linear DNA strand. This model achieves a good fit with the proposed intramolecular folding of the G-wire.
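The length argument above reduces to a short calculation. One folding unit of four G-segments of g bases joined by three loops of l bases consumes 4g + 3l bases and adds g tetrads to the wire. Assuming, as the text does, that a tetrad separation is comparable to a base separation along a linear strand, (4g + 3l)/g is the strand-to-wire length ratio. The function name is hypothetical. Setting l = 0 gives 4, presumably the loop-negligible limit behind the expected 4:1 ratio.

```python
def bases_per_tetrad(g_len: int, loop_len: int, segments: int = 4) -> float:
    """Bases of strand consumed per tetrad of wire length.

    One folding unit has `segments` G-segments of `g_len` bases joined by
    `segments - 1` loops of `loop_len` bases, and adds `g_len` tetrads.
    """
    if g_len <= 0 or loop_len < 0:
        raise ValueError("G-segments must be non-empty; loops may be zero")
    unit_bases = segments * g_len + (segments - 1) * loop_len
    return unit_bases / g_len


print(bases_per_tetrad(3, 1))  # 15 bases per 3 tetrads
print(bases_per_tetrad(3, 0))  # loops ignored
```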

Frayed wires are another higher-order quadruplex architecture. The initial investigation used the sequence d(A15G15) [49], for which gel analysis identified polymerization in steps of monomer addition. Enzymatic digestion analysis of the structures revealed that the G-segments were protected, indicating that they had formed a quadruplex structure. Based on the reported results, it has been suggested that oligonucleotide monomers initially assemble to form a four-stranded quadruplex with one longer strand (Fig. 8C). This structure would leave a triplex overhang to which an additional monomer could attach, thus forming additional tetrads and another triplex overhang for further growth. This hypothesis has yet to be verified.

Quadruplexes can also form within the context of other architectural forms. Sugiyama and co-workers [50] demonstrated the recognition of DNA through the antiparallel self-assembly of strands into a Group III structure. This recognition took place within an engineered DNA origami structure.

11. Perspectives

A theory that accounts for observations is necessary to consolidate understanding. It is the premise for prediction and a means to question current understanding, as well as to develop it further. The geometric formalism of quadruplex folding uses the relationship between the guanine base and its sugar, as defined by the glycosidic bond angle (GBA), to derive the folding topology of quadruplexes. It exploits structural interdependencies from which the topological characteristics of a quadruplex can be derived. It is thus a useful starting point for the prediction, as well as the engineering, of quadruplex folds from DNA sequence. Since publication of its first version in 2007, one structural fold was purposefully engineered [12] and later experimentally verified [40], and another fold was experimentally verified [23]. Beyond these, various structural characteristics have since been verified, and we have derived a better understanding of the sequence of GBA within a quadruplex stem. However, many questions still need to be answered, and others need to be posed, before a version of this formalism can be generally used to engineer a desired topology from a single DNA strand. We are still far from understanding the full magnitude of the problem of predicting topology from DNA sequence. A good start, however, would be the careful parameterization of the environment that dictates folding conditions.

Acknowledgments

The authors acknowledge financial support from the BBSRC [BB/H005692], COST Action MP0802 for providing a forum for discussions on the issue, and EAST NMR FP7 (Contract 261863) for generous access to facilities.

References

[1] R.E. Dickerson, H.R. Drew, B.N. Conner, R.M. Wing, A.V. Fratini, M.L. Kopka, Science 216 (1982) 475–485.

[2] H.R. Drew, S. Samson, R.E. Dickerson, Proceedings of the National Academy of Sciences of the United States of America-Biological Sciences 79 (1982) 4040–4044.

[3] D.J. Patel, L.L. Canuel, F.M. Pohl, Proceedings of the National Academy of Sciences of the United States of America 76 (1979) 2508–2511.

[4] I. Radhakrishnan, D.J. Patel, Structure 2 (1994) 17–32.

[5] I. Radhakrishnan, D.J. Patel, Structure 1 (1993) 135–152.

[6] A.E. Engelhart, J. Plavec, O. Persil, N.V. Hud, Chapter 4 Metal ion interactions with G-quadruplex structures, in: S. Neidle, N.V. Hud (Eds.), Nucleic Acid-Metal Ion Interactions, The Royal Society of Chemistry, 2009, pp. 118–153.

[7] F.M. Lannan, I. Mamajanov, N.V. Hud, Journal of the American Chemical Society 134 (2012) 15324–15330.

[8] D. Miyoshi, H. Karimata, N. Sugimoto, Journal of the American Chemical Society 128 (2006) 7957–7963.

[9] D. Miyoshi, N. Sugimoto, Biochimie 90 (2008) 1040–1051.

[10] M. Webba da Silva, Chemistry European Journal 13 (2007) 9738–9745.

[11] A.I. Karsisiotis, N.M.a. Hessari, E. Novellino, G.P. Spada, A. Randazzo, M.W. da Silva, Angewandte Chemie-International Edition 50 (2011) 10645–10648.

[12] M. Webba da Silva, M. Trajkovski, Y. Sannohe, N. Ma’ani Hessari, H. Sugiyama, J. Plavec, Angewandte Chemie-International Edition in English 48 (2009) 9167–9170.

[13] J.X. Dai, M. Carver, D.Z. Yang, Biochimie 90 (2008) 1172–1183.

[14] A.T. Phan, The FEBS Journal 277 (2010) 1107–1117.

[15] A. Guédin, J. Gros, P. Alberti, J.L. Mergny, Nucleic Acids Research 38 (2010) 7858–7868.

[16] Y. Wang, D.J. Patel, Structure 1 (1993) 263–282.

[17] X. Cang, J. Sponer, T.E. Cheatham 3rd, Nucleic Acids Research 39 (2011) 4499–4512.

[18] X. Cang, J. Sponer, T.E. Cheatham 3rd, Journal of the American Chemical Society 133 (2011) 14270–14279.

[19] V. Kuryavyi, A. Majumdar, A. Shallop, N. Chernichenko, E. Skripkin, R. Jones, D.J. Patel, Journal of Molecular Biology 310 (2001) 181–194.

[20] K.W. Lim, S. Amrane, S. Bouaziz, W.X. Xu, Y.G. Mu, D.J. Patel, K.N. Luu, A.T. Phan, Journal of the American Chemical Society 131 (2009) 4301–4309.

[21] Z. Zhang, J. Dai, E. Veliath, R.A. Jones, D. Yang, Nucleic Acids Research 38 (2010) 1009–1021.

[22] Y. Wang, D.J. Patel, Journal of Molecular Biology 251 (1995) 76–94.

[23] S. Amrane, R.W. Ang, Z.M. Tan, C. Li, J.K. Lim, J.M. Lim, K.W. Lim, A.T. Phan, Nucleic Acids Research 37 (2009) 931–938.

[24] G.N. Parkinson, M.P. Lee, S. Neidle, Nature 417 (2002) 876–880.

[25] A. Ambrus, D. Chen, J.X. Dai, R.A. Jones, D.Z. Yang, Biochemistry 44 (2005) 2048–2058.

[26] M. Trajkovski, M. Webba da Silva, J. Plavec, Journal of the American Chemical Society 134 (2012) 4132–4141.

[27] R.I. Mathad, E. Hatzakis, J. Dai, D. Yang, Nucleic Acids Research 39 (2011) 9023–9033.

[28] V. Kuryavyi, A.T. Phan, D.J. Patel, Nucleic Acids Research 38 (2010) 6757–6773.

[29] S. Amrane, M. Adrian, B. Heddi, A. Serero, A. Nicolas, J.L. Mergny, A.T. Phan, Journal of the American Chemical Society 134 (2012) 5807–5816.

[30] K.W. Lim, L. Lacroix, D.J. Yue, J.K. Lim, J.M. Lim, A.T. Phan, Journal of the American Chemical Society 132 (2010) 12331–12342.

[31] V. Kuryavyi, L.A. Cahoon, H.S. Seifert, D.J. Patel, Structure 20 (2012) 2090–2102.

[32] B. Heddi, A.T. Phan, Journal of the American Chemical Society 133 (2011) 9824–9833.

[33] N.Q. Do, K.W. Lim, M.H. Teo, B. Heddi, A.T. Phan, Nucleic Acids Research 39 (2011) 9448–9457.

[34] A.T. Phan, V. Kuryavyi, K.N. Luu, D.J. Patel, Nucleic Acids Research 35 (2007) 6517–6525.

[35] J.X. Dai, M. Carver, C. Punchihewa, R.A. Jones, D.Z. Yang, Nucleic Acids Research 35 (2007) 4927–4940.

[36] Y. Wang, D.J. Patel, Structure 2 (1994) 1141–1156.

[37] J.X. Dai, D. Chen, R.A. Jones, L.H. Hurley, D.Z. Yang, Nucleic Acids Research 34 (2006) 5133–5144.

[38] K.N. Luu, A.T. Phan, V. Kuryavyi, L. Lacroix, D.J. Patel, Journal of the American Chemical Society 128 (2006) 9963–9970.

[39] J.X. Dai, C. Punchihewa, A. Ambrus, D. Chen, R.A. Jones, D.Z. Yang, Nucleic Acids Research 35 (2007) 2440–2450.

[40] M. Marusic, P. Sket, L. Bauer, V. Viglasky, J. Plavec, Nucleic Acids Research 40 (2012) 6946–6956.

[41] K.W. Lim, P. Alberti, A. Guédin, L. Lacroix, J.F. Riou, N.J. Royle, J.L. Mergny, A.T. Phan, Nucleic Acids Research 37 (2009) 6239–6248.

[42] P. Schultze, R.F. Macaya, J. Feigon, Journal of Molecular Biology 235 (1994) 1532–1547.

[43] L.Y. Hu, K.W. Lim, S. Bouaziz, A.T. Phan, Journal of the American Chemical Society 131 (2009) 16824–16831.

[44] V. Kuryavyi, D.J. Patel, Structure 18 (2010) 73–82.

[45] A.T. Phan, V. Kuryavyi, S. Burge, S. Neidle, D.J. Patel, Journal of the American Chemical Society 129 (2007) 4386.

[46] X. Tong, W. Lan, X. Zhang, H. Wu, M. Liu, C. Cao, Nucleic Acids Research 39 (2011) 6753–6763.

[47] T.C. Marsh, E. Henderson, Biochemistry 33 (1994) 10718–10724.

[48] A.B. Kotlyar, N. Borovok, T. Molotsky, H. Cohen, E. Shapir, D. Porath, Advanced Materials 17 (2005) 1901.

[49] E. Protozanova, R.B. Macgregor, Biochemistry 35 (1996) 16638–16645.

[50] Y. Sannohe, M. Endo, Y. Katsuda, K. Hidaka, H. Sugiyama, Journal of the American Chemical Society 132 (2010) 16311–16313.