Cell, Volume 135

The MatP/matS Site-Specific System Organizes the Terminus Region of the E. coli Chromosome into a Macrodomain

Romain Mercier, Marie-Agnès Petit, Sophie Schbath, Stéphane Robin, Meriem El Karoui, Frédéric Boccard, and Olivier Espéli

Supplemental Experimental Procedures

Detection of over-represented sub-motifs of parS in the origin-proximal domain of the B. subtilis genome

The parS motif is recognized by Spo0J, a chromosome partitioning and sporulation protein (Lewis and Errington, 1997; Lin and Grossman, 1998). It is 16 nt long and partially degenerate (Lin and Grossman, 1998). This pseudo-palindrome is active in both orientations and is present 10 times in the Bacillus subtilis genome, with eight of the occurrences concentrated around the replication origin (Breier and Grossman, 2007). We asked whether such a motif could be detected by statistical analysis, that is, whether some of its sub-words would appear among the most over-represented words of the same length in the origin-proximal domain of the B. subtilis chromosome. This domain was defined as the 821-kb segment spanning the eight parS motifs and including the replication origin (coordinates 3800000 to 4214814 and 1 to 400000, according to the GenBank file numbering). Seven ribosomal RNA operons, each about 5 kb long and with nearly identical sequences, are present in this region. Because they generate the most highly exceptional words (up to 2000 such words for 11-mers), all but one of them (rrnO) were masked.

Determining the model relevant to test for exceptionality

To evaluate the exceptionality of the parS motif and its sub-motifs (hereafter called words), we used the R’MES software (http://genome.jouy.inra.fr/ssb/rmes/), which is designed to find short words with unexpected frequencies in genomes. Expected frequencies are calculated using Markov chain models. In a Markov chain of order m (model Mm), the composition of the sequence in 1-mers to (m+1)-mers is taken into account.
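How a Markov model of order m turns observed composition into an expected word count can be illustrated with a short sketch. The Python below computes the classical maximum-likelihood estimate under model Mm: the count of the word's first (m+1)-mer is chained with ratios of (m+1)-mer to m-mer counts. It is an illustration under stated assumptions, not R’MES code: it counts a single strand, ignores masking, uses the sequence length as denominator for M0, and the function names are hypothetical. R’MES may treat sequence ends and word families differently.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand of seq (k >= 1)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word: str, seq: str, m: int) -> float:
    """Maximum-likelihood expected count of `word` in `seq` under model Mm.

    E[N(w)] = N(w[0:m+1]) * prod_{i=1}^{k-m-1} N(w[i:i+m+1]) / N(w[i:i+m])
    For m = 0 the denominator is the sequence length (letter frequencies).
    Valid orders run from 0 to len(word) - 2 (M0 to M9 for 11-mers).
    """
    k = len(word)
    if not 0 <= m <= k - 2:
        raise ValueError("order m must satisfy 0 <= m <= len(word) - 2")
    top = kmer_counts(seq, m + 1)                    # (m+1)-mer counts
    bottom = kmer_counts(seq, m) if m > 0 else None  # m-mer counts (overlaps)
    expected = float(top[word[:m + 1]])
    for i in range(1, k - m):
        den = bottom[word[i:i + m]] if bottom is not None else len(seq)
        if den == 0:
            return 0.0
        expected *= top[word[i:i + m + 1]] / den
    return expected


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy sequence, not genomic data
    for order in range(3):
        print(order, expected_count("ACGT", toy, order), kmer_counts(toy, 4)["ACGT"])
```

As the order approaches len(word) - 2, the expected count is built from ever longer observed sub-words and converges on the observed count, which is the intuition behind the observation that high-order models leave few exceptional words.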
A p-value assessing the significance of the difference between expected and observed word frequencies is calculated using either a Gaussian approximation (for frequent words) or a compound Poisson approximation (for rare words) of the word count distribution. For convenience, p-values are converted into scores that correspond to quantiles of the standard Gaussian distribution (for instance, a score of 3.09 (resp. 4.75) corresponds to a p-value of 10^-3 (resp. 10^-6)). The exceptionality of a word depends on the model used to calculate expected word counts, i.e., here, on the order of the Markov model: the higher the order m, the fewer exceptional words. We therefore determined which model was relevant for this study. Because of memory limitations, 16-mers could not be analysed, so we used the maximum possible word length, 11 bp. The scores of all parS sub-words of length 11 (each pooled with its reverse-complementary sequence) were computed in all possible models, M0 to M9. Given the low counts of 11-mers in an 800-kb DNA sequence, the compound Poisson approximation was preferred to evaluate the exceptionality of these word families (Roquain and Schbath, 2007). Figure S8 shows that, for the six 11-nt sub-words of parS,
the scores reach a plateau for models M3 to M6 and drop sharply for models of order 7 and higher. This decrease is a general trend that is also observed for other words (not shown). It indicates that these higher-order models are probably too rich: they capture almost all of the compositional bias of the genome, so that in the end no word obtains a significantly high score. Models of order lower than 7 should therefore be used. The comparatively lower scores obtained with models M0 to M2 are probably due to these models being too poor to capture the compositional bias properly, given the size of the domain analysed (800 kb). We conclude that models M3 to M6 are the most appropriate for detecting exceptional words in this domain.

Determining the minimum word size

We then determined the minimum word size that allows parS sub-words to be detected in the “top list” of the most exceptionally frequent words. Exceptionality scores were computed for all words of a given length, and words were sorted so that rank 1 is given to the word with the highest exceptionality score. Table S7 gives the ranks of the parS sub-words of length 9, 10 and 11 for models M3 to M6, with the compound Poisson approximation. A minimum size of 10 nt is needed to place at least one parS sub-motif in the top 100 most over-represented words. Interestingly, for models M3 to M5 the most exceptionally frequent 11-mer (among all 2,097,152 possible 11-mer families, each 11-mer being pooled with its reverse complement, i.e. 4^11/2) is one of the parS sub-motifs. 9-mers, 10-mers and 11-mers containing one degenerate position were also analysed, and the parS sub-words did not obtain higher ranks than with exact words (not shown). We conclude that, with Markov models M3 to M5, the analysis of non-degenerate 11-bp words identifies a parS sub-word as the most exceptionally frequent 11-mer in the 800-kb B. subtilis domain around the origin. This prompted us to predict Ter MD signature motifs by analysing the exceptionality of 11-bp words under a Markov model of order 5.
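The conversion between p-values and scores, the ranking step, and the size of the 11-mer list can be sketched in Python as follows. The sketch assumes that a score is the one-sided standard Gaussian quantile, i.e. the value s such that P(Z >= s) = p, which matches the 3.09 / 10^-3 and 4.75 / 10^-6 pairs quoted above. The helper names and the tie-breaking rule (ties keep input order) are hypothetical choices, not taken from R’MES.

```python
from math import erfc, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard Gaussian, mean 0 and sd 1


def pvalue_to_score(p: float) -> float:
    """Score s such that P(Z >= s) = p (one-sided; large s = over-represented).

    Uses -inv_cdf(p) rather than inv_cdf(1 - p) to keep precision for tiny p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -_Z.inv_cdf(p)


def score_to_pvalue(score: float) -> float:
    """Inverse of pvalue_to_score: upper-tail probability P(Z >= score)."""
    return 0.5 * erfc(score / sqrt(2.0))


def rank_by_score(scores: dict) -> dict:
    """Rank words so that rank 1 is the highest score (ties keep input order)."""
    ordered = sorted(scores, key=lambda w: scores[w], reverse=True)
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def n_word_families(k: int) -> int:
    """Number of k-mer families when each word is pooled with its reverse complement.

    For odd k no word is its own reverse complement, so there are 4**k / 2 families.
    """
    if k % 2 == 0:
        raise ValueError("the 4**k / 2 shortcut only holds for odd k")
    return 4 ** k // 2


if __name__ == "__main__":
    print(round(pvalue_to_score(1e-3), 2), round(pvalue_to_score(1e-6), 2))
    print(n_word_families(11))
```

Working from the p-value side with `inv_cdf(p)` and from the score side with `erfc` avoids subtracting nearly equal numbers, which matters for the very small p-values behind top-ranked words.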
Identification of the matP gene

We constructed a pSC101 derivative to serve as bait for identifying proteins that bind matS. The system uses a low-copy-number plasmid carrying an artificial rpsL operon that encodes a streptomycin-sensitive S12 ribosomal protein. When this plasmid is introduced into an E. coli strain carrying the rpsL20 allele, which confers streptomycin resistance, the host becomes streptomycin sensitive; the level of resistance is inversely correlated with the expression level of the wt allele carried by the plasmid (Kim et al., 2005). A protein that binds matS may therefore repress the wt rpsL gene and increase the level of streptomycin resistance. Plasmid pRM1 contained a matS site between the pANT promoter and rpsL (Figure 2A), and the level of streptomycin resistance was increased two-fold. A genomic library (Levine and Marians, 1998) was transformed into MC1061 carrying pRM1, and transformants that displayed a 5-fold increase in streptomycin resistance were selected. The transformed cells were grown in 1 ml of LB medium for 2 h at 37°C and plated on LB agar plates containing Cm, Kan, and streptomycin (100 µg/ml). Plates were incubated at 30°C for 72 h. Plasmid DNA was extracted from the transformants and sequenced.

Purification of MatP

MatP was produced from E. coli strain BL21 (λDE3) carrying plasmid pLysS, with matP under the control of the T7 promoter in pET28a. Cells were resuspended in a buffer containing 20 mM Tris (pH 7.5), 300 mM NaCl, 5 mM MgCl2, 1 mM DTT and 5% glycerol. They were then lysed and ultracentrifuged at 45,000 g. The protein extract was loaded onto a heparin column (GE Healthcare) and eluted with a linear gradient of NaCl (300 mM to 1 M) in a buffer containing 20 mM Tris (pH 7.5), 5 mM MgCl2, 1 mM DTT and 5% glycerol. Fractions containing MatP were pooled and dialyzed into 20 mM phosphate buffer (pH 7.5), 200 mM
NaCl, 10 mM MgCl2, 1 mM DTT. After dialysis, MatP was loaded onto an SP Sepharose column (GE Healthcare) and eluted with a linear gradient of NaCl (200 mM to 1 M) in 20 mM phosphate buffer (pH 7.5), 10 mM MgCl2, 1 mM DTT. Fractions containing MatP were pooled, dialyzed into 20 mM Tris (pH 7.5), 250 mM NaCl, 1 mM DTT, 5 mM MgCl2, 20% glycerol, and stored at -80°C. Protein concentration was determined using the Bio-Rad protein assay kit. MatP was ~90% pure as judged by SDS-polyacrylamide gel electrophoresis and Coomassie blue staining.

DNA binding assays

A 41-bp fragment containing the matS19 site was used as the probe in gel mobility shift assays. The oligonucleotide 5’-AAGTACGGTAAAAGGTGACAGTGTCACTTTCATTGTTGGTA-3’ was radiolabeled using polynucleotide kinase and γ-32P-ATP. The labeled oligonucleotide was annealed with its complement in 70 mM KCl. Non-specific competitor DNA corresponds to a shuffled sequence of the matS19-carrying oligonucleotide. Binding reactions (20 µl) were performed in 20 mM Tris (pH 7.5), 250 mM NaCl, 1 mM DTT, 5 mM MgCl2, 5 mM CaCl2, 5% glycerol, 250 µg/ml BSA. They contained 12.5 fmol of radiolabeled DNA and 12.5 pmol of 41-bp double-stranded competitor DNA. Reactions were incubated for 20 min at 28°C and loaded onto an 8% polyacrylamide (29:1) gel in 0.5X TBE. Gels were run at 4°C at 120 V, dried, and exposed to a phosphorimager screen (Molecular Dynamics). Bands were quantified using ImageQuant. Competition experiments were performed with 40-bp double-stranded oligonucleotides corresponding to matS or to mutated matS sites.

Supplemental References

Breier, A.M., and Grossman, A.D. (2007). Whole-genome analysis of the chromosome partitioning and sporulation protein Spo0J (ParB) reveals spreading and origin-distal sites on the Bacillus subtilis chromosome. Mol. Microbiol. 64, 703-718.

Kim, M. S., Bae, S. H., Yun, S. H., Lee, H. J., Ji, S. C., Lee, J. H., Srivastava, P., Lee, S. H., Chae, H., Lee, Y., et al. (2005). Cnu, a novel oriC-binding protein of Escherichia coli. J. Bacteriol. 187, 6998-7008.

Levine, C., and Marians, K. J. (1998). Identification of dnaX as a high-copy suppressor of the conditional lethal and partition phenotypes of the parE10 allele. J. Bacteriol. 180, 1232-1240.

Lewis, P.J., and Errington, J. (1997). Direct evidence for active segregation of oriC regions of the Bacillus subtilis chromosome and co-localization with the Spo0J partitioning protein. Mol. Microbiol. 25, 945-954.

Lin, D.C., and Grossman, A.D. (1998). Identification and characterization of a bacterial chromosome partitioning site. Cell 92, 675-685.

Roquain, E., and Schbath, S. (2007). Improved compound Poisson approximation for the number of occurrences of multiple words in a stationary Markov chain. Adv. Appl. Probab. 39, 1-13.

Figure S1. Competition assay of the matS-MatP complex. (A) Electrophoretic mobility shift assay of a radiolabeled double-stranded oligonucleotide containing matS, performed with 50 nM MatP and increasing concentrations of competitor DNA (matS site, left panel; pseudo-matS L1 site, right panel). Competitor DNA concentrations were: 0 nM (lane 2), 0.5 nM (lanes 3 and 9), 1 nM (lanes 4 and 10), 10 nM (lanes 5 and 11), 100 nM (lanes 6 and 12), 1 µM (lanes 7 and 13), 10 µM (lanes 8 and 14). Lane 1 is a control containing only the matS double-stranded oligonucleotide. (B) Quantification of competition assays performed in the presence of unlabeled double-stranded oligonucleotides containing matS (black line), the pseudo-matS L1 site present in the Left macrodomain (red line), or the two changes present in the pseudo-matS L1 and pseudo-matS L2 sites (green line).
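The molar quantities in these binding reactions are easy to misread, so a quick calculation may help. This Python sketch converts the amounts given under DNA binding assays (12.5 fmol probe, 12.5 pmol competitor, 20 µl) into concentrations and fold excess. It also assumes, without confirmation from the legend, that the Figure S1 competition reactions used the same probe amount and volume; the helper and constant names are hypothetical.

```python
# Amounts from the DNA binding assay protocol. Applying the same volume and probe
# amount to the Figure S1 competition series is an assumption.
REACTION_VOLUME_L = 20e-6   # 20 µl binding reaction
PROBE_MOL = 12.5e-15        # 12.5 fmol radiolabeled matS19 probe
COMPETITOR_MOL = 12.5e-12   # 12.5 pmol non-specific competitor
MATS19_PROBE = "AAGTACGGTAAAAGGTGACAGTGTCACTTTCATTGTTGGTA"


def nanomolar(amount_mol: float, volume_l: float = REACTION_VOLUME_L) -> float:
    """Convert an amount (mol) in a given volume (l) to a concentration in nM."""
    return amount_mol / volume_l * 1e9


probe_nM = nanomolar(PROBE_MOL)            # about 0.625 nM
competitor_nM = nanomolar(COMPETITOR_MOL)  # about 625 nM
fold_excess = competitor_nM / probe_nM     # about 1000-fold

# Figure S1 competitor concentrations (nM) expressed as fold excess over probe.
FIG_S1_COMPETITOR_NM = [0, 0.5, 1, 10, 100, 1_000, 10_000]
fig_s1_excess = [c / probe_nM for c in FIG_S1_COMPETITOR_NM]

if __name__ == "__main__":
    print(f"probe oligonucleotide length: {len(MATS19_PROBE)} nt")
    print(f"probe {probe_nM:.3f} nM, competitor {competitor_nM:.0f} nM, "
          f"{fold_excess:.0f}-fold excess")
    print("Figure S1 fold excess:", [round(x) for x in fig_s1_excess])
```

Under that assumption, the highest competitor concentration in Figure S1 (10 µM) corresponds to a 16,000-fold molar excess over the labeled probe.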

Figure S2. Control of replication initiation is not affected in a matP strain. (A) Flow cytometry analysis of DNA content versus cell size of wt and matP cells grown to stationary phase in minimal medium. (B) Flow cytometry analysis of DNA content versus cell size of wt and matP strains grown until O.D.600 = 0.2 in rich medium. (C) As in (B) for growth in minimal medium. (D) Rifampicin-cephalexin run-out experiment of the wt and matP strains grown to O.D.600 = 0.2 in minimal medium (left panel) or rich medium (right panel).

Figure S3. An Ssb-YFP fusion allows measurement of the C period during time-lapse experiments. An ssb-yfp fusion (Reyes-Lamothe et al., 2008) was transduced from AB1157 into the chromosome of the MG1655 (A) and matP::cmR (B) strains. Ssb-YFP foci were observed under the conditions used for the time-lapse experiments. (C) The number of foci was monitored in both strains. (D) The C period is deduced from the number of foci according to the generation time (τ). The timing of replication initiation (R.i) was determined by flow cytometry (Figure S2).

Figure S4. MatP segregation in MG1655 and AB1157 strains. (A) Representative cells of the MG1655 matP-gfp strain in exponential growth phase (O.D.600 = 0.2) in minimal medium A with casamino acids and glucose. MatP-GFP is not evenly distributed but appears as foci. Scale bar is 2 μm. (B) Localization versus cell length of MatP-GFP foci in MG1655 in minimal medium A with casamino acids and glucose. The positions of the foci relative to the nearest pole were plotted versus cell length (grey, cells with one focus; red and green, cells with two foci). (C) Localization versus cell length of MatP-GFP foci in AB1157 in minimal medium A with casamino acids and glucose. (D) Localization versus cell length of MatP-GFP foci in MG1655 in M9 minimal medium with glycerol. (E) Localization versus cell length of MatP-GFP foci in AB1157 in M9 minimal medium with glycerol.

Figure S5. MatP colocalizes with the Ter macrodomain. (A) Colocalization of MatP-mCherry (red) with two parS tags located 100 kb apart in the Ter macrodomain, marked with ParBT1-YFP (blue) and ParBP1-CFP (green). (B) Colocalization of MatP-mCherry with two parS tags located 400 kb apart in the Ter macrodomain, marked with ParBT1-YFP (blue) and ParBP1-CFP (green). (C) Colocalization of MatP-mCherry with two parS tags located 1270 kb apart, one in the Right macrodomain marked with ParBT1-YFP (blue) and one in the Left macrodomain marked with ParBP1-CFP (green). Scale bar is 2 μm.

Figure S6. MatP inactivation induces a loss of cohesion of a lacO marker in the Ter macrodomain. (A) MG1655 lacO:ter1 marker labeled with LacI-CFP (green). (B) MG1655 matP lacO:ter1 marker labeled with LacI-CFP (green). The percentages of cells with one and two foci are indicated; 200 cells were counted. Cells were grown until O.D.600 = 0.2 in minimal medium supplemented with casamino acids (0.1%), glucose (0.4%), arabinose (0.1%), IPTG (1 mM) and anhydrotetracycline (100 ng/ml).

Figure S7. Localization of the parS sites on the E. coli MG1655 chromosome map. The coordinates of the macrodomains’ borders are indicated in kb from oriC and in minutes from the thr operon, in the clockwise direction.

Figure S8. Determination of the relevant Markov model for the analysis of large regions of bacterial chromosomes. Scores of the parS sub-words as a function of the Markov chain model.

Table S1. Identification of 11-mers with biased localization in the Ter macrodomain

Word | Occurrences in Ter MD | Occurrences outside Ter MD | Expected occurrences in Ter MD | Expected occurrences outside Ter MD | R’MES score in Ter MD | R’MES score outside Ter MD | Contrast score (p-value of a binomial test) | R’MES rank | Contrast rank
gacactgtcac* | 7 | 0 | 0.206 | 0.428 | 5.8401 | 0.3901 | 0.00038234 | 1 | 7
tgacactgtca* | 7 | 2 | 0.278 | 0.529 | 5.4928 | 1.2863 | 0.01014399 | 2 | 40
gacagtgtcac* | 6 | 0 | 0.203 | 0.433 | 5.2374 | 0.3806 | 0.00105739 | 3 | 18
gacgttgtcac* | 7 | 3 | 0.347 | 1.304 | 5.2224 | 1.0634 | 0.00118453 | 4 | 19
gacaacgtcac* | 7 | 3 | 0.366 | 1.494 | 5.1538 | 0.8787 | 0.00077925 | 5 | 13
gacccgaacga | 5 | 1 | 0.119 | 0.473 | 5.0918 | 0.3135 | 0.00163929 | 6 | 23
atagggtagat | 4 | 1 | 0.056 | 0.263 | 4.9413 | 0.734 | 0.00408165 | 7 | 30
tagttacaaca | 5 | 1 | 0.163 | 0.54 | 4.7889 | 0.2089 | 0.0032439 | 8 | 29
ataaacggccc | 6 | 3 | 0.311 | 1.683 | 4.7611 | 0.7114 | 0.00078735 | 9 | 14
tgacaacgtca* | 7 | 5 | 0.511 | 1.786 | 4.7238 | 1.8075 | 0.00730525 | 10 | 34

Among the most exceptional and contrasted words, six overlap and share a common decanucleotide, GACRNYGTCA. We noticed that an extra 5’ GT was frequently found upstream of occurrences of words 1, 3, 4 and 5 of this family, and that an extra 3’ C was found downstream of occurrences of words 2 and 6 (ranks 2 and 10 in the table). This allowed the identification of the 13-mer consensus sequence presented in Figure 1B.

* matS family
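One reading of the binomial test in Table S1 is that, conditional on the total number of occurrences of a word, each occurrence falls in the Ter MD with probability equal to its expected share there, E_in / (E_in + E_out), and the p-value is the upper tail P(X >= observed in Ter MD). This is an assumption, not a description given in the source, but with the rounded expected values from the table it reproduces, for example, 0.00038234 for gacactgtcac and 0.01014399 for tgacactgtca. The function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def ter_enrichment_pvalue(obs_in: int, obs_out: int,
                          exp_in: float, exp_out: float) -> float:
    """One-sided binomial test for over-representation inside the Ter MD.

    Assumption: given the n = obs_in + obs_out occurrences of a word, each one
    falls in the Ter MD with probability exp_in / (exp_in + exp_out).
    """
    n = obs_in + obs_out
    p_in = exp_in / (exp_in + exp_out)
    return binomial_upper_tail(obs_in, n, p_in)


if __name__ == "__main__":
    # Rows from Table S1: word, observed in/out, expected in/out.
    rows = [("gacactgtcac", 7, 0, 0.206, 0.428),
            ("tgacactgtca", 7, 2, 0.278, 0.529)]
    for word, o_in, o_out, e_in, e_out in rows:
        print(word, f"{ter_enrichment_pvalue(o_in, o_out, e_in, e_out):.8f}")
```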

Table S2. matS positions on the E. coli chromosome (columns: name, position from thrL in bp, gene or intergenic region containing matS, pattern sequence)

matS1 1135761 flgI aaaacgtcgctgcggtaatggtgacagcgtcacttcctccgtttggacgtcag

matS2 1155522 holB tgacctggctttcacgcgaagtgacaatgtcacaggatgcattacttgccgca

matS3 1261216 ychB-prs gatgcaacgtctggaacaaggtgacgttgtcaccgaaactcagcttgcccggc

matS4 1320765 trpE tgccggaaagtgcctggattgtgacagtgtcacctaaagctgtaatgcgcagc

matS5 1340371 pyrF gcgacctggtcgatcttggcatgacactgtcacctgcagattatgcagaacgt

matS6 1399845 abgB ttgttgttgttattttaaaggtgacggtgtcacgtttttcgggatagggcagt

matS7 1449555 tynA-maoC atgtacacatcatgcataatgtgacaacgtcacaaaacttagtgaaataaaag

matS8 1458009 paaH aatgaaagtagaaaagaaaagtgacggtgtcacggaaattgacgatgttttat

matS9 1498133 ydcK aaactaatgagtcatgaatggtgacgctgtcacttatatatgcaccctggctg

matS10 1503539 ydcO gtttaaagacaacatcagttgtgacaacgtcaccttgcgcgatgacgatcacg

matS11 1512978 ydcV gtagtgatattcttgatgccgtgacactgtcacttaaagtggcggcgctggcg

matS12 1534831 narW cttcgctgtttacttgttttgtgacactgtcacttgaaagggagcttcccgcc

matS13 1561306 yddT-yddU gctgaaatcacagtatttaagtgacagtgtcacgttaaatgaaaacccgcgag

matS14 1598496 ydeW cgttatgtattttgatattcgtgacaacgtcaccttttgcatcaaaaaagtag

matS15 1617990 marA-marB tacaacagctagttgaaaacgtgacaacgtcactgaggcaatcatgaaaccac

matS16 1628348 ydfI caacggcttgtgcgtaggaagtgacaacgtcacgcataacatgaccgttttct

matS17 1663199 ynfI-ynfJ gcgcaatccggcaataataggttacagtgtcacgtttttttatctcttaaagc

matS18 1702026 ydgJ cgcatccagctctcgcgcttgtgacagtgtcacggtaaagggtttatcaacga

matS19 1743175 ydhQ tgatgttaccaacaatgaaagtgacactgtcaccttttaccgtactgccgtct

matS20 1763237 ydiH-ydiI atatgcgtcacacttttctggtgacaacgtcacaaaatggcggtcgtcaatcg

matS21 1836850 ynjD taactttacggtggataaaggtgacattgtcacgttaatggggccgtctggct

matS22 1849223 ansA atctcggtaaaccggtcattgtgacagggtcacaaatcccgctggctgagtta

matS23 1914888 yebS cgttggtgattttaacgacggtgacgttgtcacatcttaatgtcgaagaactg

pseudo matS L1 2352367 glpB gcaaaaacacggcctgcgctgtgccattgtcactcgtggtcaaagcgcactgc

pseudo matS L2 2522148 xapA agccgccagcgtttgcgcatgtgacaatttcacatcgcttaaaccttccgcca
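The common decanucleotide GACRNYGTCA described with Table S1 uses IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base). A minimal Python sketch for scanning both strands of a sequence for such a pattern is shown below, applied to the matS19, matS22 and pseudo-matS L1 sequences of this table. It is illustrative only, with hypothetical helper names, and is not the procedure used to build the table. A scan with this exact decanucleotide does not recover every listed site: matS22, for instance, carries GACAGGGTCA, with a G at the Y position.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Sequences copied from Table S2.
MATS19 = "tgatgttaccaacaatgaaagtgacactgtcaccttttaccgtactgccgtct"
MATS22 = "atctcggtaaaccggtcattgtgacagggtcacaaatcccgctggctgagtta"
PSEUDO_MATS_L1 = "gcaaaaacacggcctgcgctgtgccattgtcactcgtggtcaaagcgcactgc"


def revcomp(seq: str) -> str:
    """Reverse complement; works for plain bases and IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def iupac_regex(pattern: str) -> re.Pattern:
    """Compile a degenerate pattern; the lookahead lets overlapping hits be reported."""
    body = "".join(IUPAC[c] for c in pattern.upper())
    return re.compile(f"(?={body})")


def find_motif(seq: str, pattern: str) -> list:
    """Return sorted (start, strand) hits on both strands, in forward-strand coordinates."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in iupac_regex(pattern).finditer(seq)]
    hits += [(m.start(), "-") for m in iupac_regex(revcomp(pattern)).finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    for name, s in [("matS19", MATS19), ("matS22", MATS22),
                    ("pseudo-matS L1", PSEUDO_MATS_L1)]:
        print(name, find_motif(s, "GACRNYGTCA"))
```

Because the matS19 site is nearly palindromic, it is matched once on each strand, one position apart.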

Table S3. Strains used in this study

Strain | Relevant genotype/description | Reference/source
DY330 | W3110 ΔlacU169 gal490 λcI857 Δ(cro-bioA) | Yu et al., 2000
BL21(DE3)(pLysS) | F− ompT hsdSB (rB− mB−) gal dcm (λDE3) pLysS | Novagen
DH5α | F− endA1 hsdR17 supE44 thi-1 recA1 gyrA relA1 ΔlacU169 φ80lacZΔM15 | Lab collection
MC1061 | hsdR2 hsdM+ hsdS+ araD139 Δ(ara-leu)7697 Δ(lac)X74 galE15 galK16 rpsL (Strr) mcrA mcrB1 | Lab collection
MG1655 | F- lambda- ilvG- rfb-50 rph-1 | Lab collection
FBG150 | MG1655 ΔlacIZ ΔattB::aadA | Valens et al., 2004
RM1 | MG1655 ΔmatP::frt-cat-frt | this work
RM2 | MG1655 ΔmatP | this work
RM3 | MG1655 matP-gfp::frt-kn-frt | this work
RM4 | MG1655 matP-mcherry::frt-kn-frt | this work
RM5 | MG1655 osmB::parS(P1)-frt-cat-frt (Ter2) feaR::parS(pMT1) (Ter4) | this work
RM6 | MG1655 yniA::parS(P1)-frt-cat-frt (Ter7) feaR::parS(pMT1) (Ter4) | this work
RM7 | RM2 osmB::parS(P1)-frt-cat-frt (Ter2) feaR::parS(pMT1) (Ter4) | this work
RM8 | RM2 yniA::parS(P1)-frt-cat-frt (Ter7) feaR::parS(pMT1) (Ter4) | this work
RM9 | MG1655 gusC::parS(pMT1) (Ter6) yedL::parS(P1)-frt-cat-frt (Left3) yecT::matS24 yedS::matS25::frt-Kn-frt | this work
RM10 | RM4 osmB::parS(P1)-frt-cat-frt (Ter2) feaR::parS(pMT1) (Ter4) | this work
RM11 | RM4 yniA::parS(P1)-frt-cat-frt (Ter7) feaR::parS(pMT1) (Ter4) | this work
RM12 | RM4 ybfD::parS(P1)-frt-cat-frt (Right2) yfgE::parS(pMT1) (Left1) | this work
RM13 | MG1655 crL::parSP1-frt-cat-frt (NSR2) | Espeli et al., 2008
RM14 | MG1655 ybbL::parSP1-frt-cat-frt (NSR5) | Espeli et al., 2008
RM15 | MG1655 pheP::parSP1-frt-cat-frt (Right1) | Espeli et al., 2008
RM16 | MG1655 ybfD::parSP1-frt-cat-frt (Right2) | Espeli et al., 2008
RM17 | MG1655 ycdN::parSP1-frt-cat-frt (Right5) | Espeli et al., 2008
RM18 | MG1655 osmB::parSP1-frt-cat-frt (Ter2) | Espeli et al., 2008
RM19 | MG1655 ydaA::parSP1-frt-cat-frt (Ter3) | Espeli et al., 2008
RM20 | MG1655 feaR::parSP1-frt-cat-frt (Ter4) | Espeli et al., 2008
RM21 | MG1655 Ter::parSP1-Kn (Ter5) | Li et al., 2003
RM22 | MG1655 gusC::parSP1-frt-cat-frt (Ter6) | Espeli et al., 2008
RM23 | MG1655 yniA::parSP1-frt-cat-frt (Ter7) | this work
RM24 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) | Espeli et al., 2008
RM25 | MG1655 yedL::parSP1-frt-cat-frt (Left3) | Espeli et al., 2008
RM26 | MG1655 hisI::parSP1-frt-cat-frt (Left2) | Espeli et al., 2008
RM27 | MG1655 yfgE::parSP1-frt-cat-frt (Left1) | Espeli et al., 2008
RM28 | MG1655 ygeB::parSP1-frt-cat-frt (NSL3) | Espeli et al., 2008
RM29 | MG1655 oriC::parSP1-Kn (oriC) | Li et al., 2003
RM30 | MG1655 hupA::parSP1-frt-cat-frt (Ori2) | Espeli et al., 2008
RM31 | MG1655 aidB::parSP1-frt-cat-frt (Ori3) | Espeli et al., 2008
RM32 | RM2 crL::parSP1-frt-cat-frt (NSR2) | this work
RM33 | RM2 ybbL::parSP1-frt-cat-frt (NSR5) | this work
RM34 | RM2 pheP::parSP1-frt-cat-frt (Right1) | this work
RM35 | RM2 ybfD::parSP1-frt-cat-frt (Right2) | this work
RM36 | RM2 ycdN::parSP1-frt-cat-frt (Right5) | this work
RM37 | RM2 osmB::parSP1-frt-cat-frt (Ter2) | this work
RM38 | RM2 ydaA::parSP1-frt-cat-frt (Ter3) | this work
RM39 | RM2 feaR::parSP1-frt-cat-frt (Ter4) | this work
RM40 | RM2 Ter::parSP1-Kn (Ter5) | this work
RM41 | RM2 gusC::parSP1-frt-cat-frt (Ter6) | this work
RM42 | RM2 yniA::parSP1-frt-cat-frt (Ter7) | this work
RM43 | RM2 yoaC::parSP1-frt-cat-frt (Ter8) | this work
RM44 | RM2 yedL::parSP1-frt-cat-frt (Left3) | this work
RM45 | RM2 hisI::parSP1-frt-cat-frt (Left2) | this work
RM46 | RM2 yfgE::parSP1-frt-cat-frt (Left1) | this work
RM47 | RM25 ygeB::parSP1-frt-cat-frt (NSL3) | this work
RM48 | RM2 oriC::parSP1-frt-cat-frt (oriC) | this work
RM49 | RM2 hupA::parSP1-frt-cat-frt (Ori2) | this work
RM50 | RM2 aidB::parSP1-frt-cat-frt (Ori3) | this work
RM51 | MG1655 gusC::parSP1-frt-cat-frt (Ter6) ΔmatS23 | this work
RM52 | MG1655 yniA::parSP1-frt-cat-frt (Ter7) ΔmatS23 | this work
RM53 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) ΔmatS23 | this work
RM54 | MG1655 yedL::parSP1-frt-cat-frt (Left3) ΔmatS23 | this work
RM55 | MG1655 gusC::parSP1-frt-cat-frt (Ter6) ΔmatS23 ΔmatS22::frt-Kn-frt | this work
RM56 | MG1655 yniA::parSP1-frt-cat-frt (Ter7) ΔmatS23 ΔmatS22::frt-Kn-frt | this work
RM57 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) ΔmatS23 ΔmatS22::frt-Kn-frt | this work
RM58 | MG1655 yedL::parSP1-frt-cat-frt (Left3) ΔmatS23 ΔmatS22::frt-Kn-frt | this work
RM59 | MG1655 gusC::parSP1-frt-cat-frt (Ter6) ΔmatS23 ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM60 | MG1655 yniA::parSP1-frt-cat-frt (Ter7) ΔmatS23 ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM61 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) ΔmatS23 ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM62 | MG1655 yedL::parSP1-frt-cat-frt (Left3) ΔmatS23 ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM63 | MG1655 gusC::parSP1-frt-cat-frt (Ter6) ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM64 | MG1655 yniA::parSP1-frt-cat-frt (Ter7) ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM65 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM66 | MG1655 yedL::parSP1-frt-cat-frt (Left3) ΔmatS22 ΔmatS21::frt-Kn-frt | this work
RM67 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) yecT::matS24::frt-Kn-frt | this work
RM68 | MG1655 yedL::parSP1-frt-cat-frt (Left3) yecT::matS24::frt-Kn-frt | this work
RM69 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) yedS::matS25::frt-Kn-frt | this work
RM70 | MG1655 yedL::parSP1-frt-cat-frt (Left3) yedS::matS25::frt-Kn-frt | this work
RM71 | MG1655 yoaC::parSP1-frt-cat-frt (Ter8) yecT::matS24 yedS::matS25::frt-Kn-frt | this work
RM72 | MG1655 yedL::parSP1-frt-cat-frt (Left3) yecT::matS24 yedS::matS25::frt-Kn-frt | this work
RM73 | RM4 [lacO240-Cm] at position ter1 (1567 kb) | this work
RM74 | MG1655 [lacO240-Cm] at position ter1 (1567 kb) | this work
RM75 | RM2 [lacO240-Cm] at position ter1 (1567 kb) | this work
AB1157 | thr-1 leu-6 thi-1 lacY1 galK2 ara-14 xyl-5 mtl-1 proA2 his-4 argE3 str-31 tsx-33 supE44 rec+ | Lab collection
RM76 | AB1157 matP-gfp::frt-kn-frt | this work
RM77 | MG1655 ssb-yfp::frt-cat-frt | this work
RM78 | RM2 ssb-yfp::frt-cat-frt | this work

Table S4. Plasmids used in this study

Plasmid | Description | Reference/source
pTSC29 | pSC101 ts replicon | Lab collection
pRM1 | pTSC29 Δplac (XmaI-XmaI) | this work
pRM2 | Pant-matS-rpsL cloned into pRM1 | this work
pRM3 | Pant-shuffle-rpsL cloned into pRM1 | this work
pRM4 | matP cloned into pET28a | this work
pALA2905 | GFP-Δ30ParBP1 expression vector | Li et al., 2003
pFH2973 | CFP-Δ30ParBP1/YFP-Δ23ParBT1 expression vector | Nielsen et al., 2006b
pCP20 | FLP expression vector, pSC101ts replicon | Datsenko and Wanner, 2000
pTSA29-CXI | pTSA29 carrying cI857-PR-(xis-int) | Valens et al., 2004
pKD3 | template plasmid frt-cat-frt | Datsenko and Wanner, 2000
pKD4 | template plasmid frt-Kn-frt | Datsenko and Wanner, 2000
pGBKD3-parSP1 | pGB2 carrying frt-cat-frt from pKD3 upstream from parS (P1) | Espéli et al., 2008
pGBKD3-parSpmT1 | pGB2 carrying frt-cat-frt from pKD3 upstream from parS (pMT1) | this work
pGBMKn-GFP | egfp cloned into pGBM2 carrying frt-Kn-frt from pKD4 | Espéli et al., 2003
pGBMKn-mcherry | mcherry cloned into pGBM2 carrying frt-Kn-frt from pKD4 | this work
pLau53 | lacI-cfp cloned into pBAD24 | Lau et al., 2003

Table S5. Positions of the parS tags used in this study

Name | parS position (bp from thr)
oriC | 3909402
Ori-1 | 3928826
Ori-2 | 4197685
Ori-3 | 4413507
Ori-4 | 9883
NSR-1 | 71279
NSR-2 | 258144
NSR-3 | 316878
NSR-4 | 401608
NSR-5 | 515143
Right-1 | 602547
Right-2 | 738100
Right-3 | 806549
Right-4 | 1056444
Right-5 | 1080438
Ter-1 | 1308375
Ter-2 | 1341067
Ter-3 | 1395706
Ter-4 | 1444252
Ter-5 | 1568683
Ter-6 | 1689438
Ter-7 | 1806680
Ter-8 | 1892515
Left-1 | 2616013
Left-2 | 2095272
Left-3 | 2009123
NSL-1 | 3739211
NSL-2 | 3079683
NSL-3 | 3010469

Table S6. Strains used in the genetic interactions analysis

Strain | Relevant genotype | Reference/source
FBG146 | MG1655 ΔlacIZ ΔattBλ::aadA-attRλ-‘lacZ-cat | Valens et al., 2004
RM79 | MG1655 ΔlacIZ ΔattBλ::aadA-attRλ-‘lacZ-cat ΔmatP | This work

attR17, clockwise at 806549 (17.36 min)

Name | attL coordinate (bp) | Genetic position of attL (min) | Gene | Distance attL-attR (kb)
LR159 | 410762 | 8.85 | araJ | -395
LR19 | 682256 | 14.68 | ybeW | -124
LR14 | 9141197 | 19.75 | ybjD-ybjX | 107
LR128 | 1002346 | 21.60 | ycbU | 195
LR64 | 1208908 | 26.06 | pinE | 402
LR161 | 1226937 | 26.45 | ycgL-ycgM | 420
LR17 | 1342702 | 28.92 | yciT-yciR | 536
LR160 | 1461878 | 31.49 | paaX | 655
LR1 | 1782830 | 38.43 | pps | 976
LR18 | 1914670 | 41.25 | ycbR-yebS | 1108
LR67 | 2481144 | 53.46 | emrK | 1674
LR126 | 2574066 | 55.47 | eutS-maeB | 1767
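Table S6 gives attL positions both in bp and as genetic positions in minutes, together with the attL-attR distance. Assuming the 4,639,675-bp MG1655 reference sequence (GenBank U00096.2; the source does not name the assembly), minutes appear to follow as position / genome length × 100, agreeing with the tabulated values to within a few hundredths of a minute, and the distances match the table when (attL - attR) / 1000 is truncated toward zero (e.g. 1674 rather than 1675 for LR67). The helper names below are hypothetical.

```python
MG1655_LENGTH_BP = 4_639_675  # assumption: GenBank U00096.2 genome length
ATTR17_BP = 806_549           # attR17 position from Table S6


def to_minutes(position_bp: int, genome_bp: int = MG1655_LENGTH_BP) -> float:
    """Genetic position in minutes (0-100) from a clockwise coordinate in bp."""
    return position_bp / genome_bp * 100.0


def distance_kb(att_l_bp: int, att_r_bp: int = ATTR17_BP) -> int:
    """attL - attR in kb, truncated toward zero as the table values appear to be."""
    return int((att_l_bp - att_r_bp) / 1000)


if __name__ == "__main__":
    for name, att_l in [("LR159", 410_762), ("LR64", 1_208_908), ("LR67", 2_481_144)]:
        print(name, f"{to_minutes(att_l):.2f} min", distance_kb(att_l), "kb")
```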

Table S7. Ranks of parS sub-words of length 9, 10 and 11 for Markov chain models M3 to M6

Sub-word length | Sub-word sequence | Count | Rank M3 | Rank M4 | Rank M5 | Rank M6
9 | cgtgtaaca | 6 | 13085 | 8938 | 3494 | 6095
9 | acgtgtaac | 6 | 2600 | 2469 | 1017 | 831
9 | cacgtgtaa | 8 | 906 | 1293 | 864 | 478
9 | tacacgtga | 6 | 6539 | 11360 | 7735 | 2999
9 | acacgtgaa | 7 | 14026 | 23345 | 22913 | 29713
9 | cacgtgaaa | 12 | 4286 | 5616 | 2446 | 1061
9 | acgtgaaac | 8 | 24833 | 21093 | 6660 | 2994
9 | cgtgaaaca | 11 | 53099 | 35064 | 22894 | 7625
10 | acgtgtaaca | 5 | 586 | 376 | 152 | 147
10 | cacgtgtaac | 6 | 21 | 16 | 7 | 7
10 | tcacgtgtaa | 4 | 3951 | 5880 | 3187 | 1733
10 | tacacgtgaa | 3 | 21716 | 35912 | 28331 | 30887
10 | acacgtgaaa | 4 | 22945 | 32147 | 28459 | 34162
10 | cacgtgaaac | 6 | 857 | 665 | 227 | 42
10 | acgtgaaaca | 5 | 21290 | 18641 | 7117 | 5392
11 | tgttacacgtg | 5 | 1 | 1 | 1 | 2
11 | gttacacgtga | 3 | 564 | 664 | 330 | 254
11 | ttacacgtgaa | 3 | 3453 | 5280 | 3057 | 3714
11 | tacacgtgaaa | 3 | 5264 | 7819 | 5164 | 5147
11 | acacgtgaaac | 3 | 3413 | 3528 | 2453 | 1285
11 | cacgtgaaaca | 5 | 176 | 150 | 70 | 25
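The parS sub-words listed in Table S7 are all windows of a single 16-nt sequence, TGTTACACGTGAAACA, taken on either strand. This sequence is reconstructed here from the overlapping 11-mers of the table, which is an assumption, since parS itself is partially degenerate. The Python sketch below enumerates the sub-words of length k, reduces each to a strand-independent representative (the lexicographically smaller of the word and its reverse complement), and reproduces the counts used in the text: 16 - k + 1 windows, i.e. six 11-mers. Helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PARS = "TGTTACACGTGAAACA"  # assumption: reconstructed from the Table S7 11-mers


def revcomp(word: str) -> str:
    """Reverse complement of a plain A/C/G/T word."""
    return word.upper().translate(COMPLEMENT)[::-1]


def canonical(word: str) -> str:
    """Strand-independent representative: the smaller of word and its reverse complement."""
    w = word.upper()
    return min(w, revcomp(w))


def subwords(motif: str, k: int) -> list:
    """All windows of length k along motif (len(motif) - k + 1 of them)."""
    return [motif[i:i + k] for i in range(len(motif) - k + 1)]


def subword_families(motif: str, k: int) -> set:
    """Canonical forms of all length-k windows, so either strand matches."""
    return {canonical(w) for w in subwords(motif, k)}


if __name__ == "__main__":
    for k in (9, 10, 11):
        print(k, len(subwords(PARS, k)), sorted(subword_families(PARS, k)))
```

Comparing canonical forms explains why Table S7 mixes orientations: cgtgtaaca, for example, is the reverse complement of the first 9-nt window, tgttacacg.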