
WWW.NATURE.COM/NATURE

SUPPLEMENTARY INFORMATION
doi:10.1038/nature13761

SUPPLEMENTARY METHODS

Ribosome display-based protein barcoding

This section describes the procedure to generate PRMC complexes from barcoded DNA templates:

1. Prepare linear barcoded DNA templates by PCR as follows:

PCR components                                                                  Volume (µL)
Template (e.g., plasmids, refer to Supplementary Table 5 for DNA sequences)     x (~5-10 ng)
Upstream barcoded primer (“Template barcoded (F)”, 100 µM)                      0.5
Downstream universal primer (“Template (R)”, 100 µM)                            0.5
10× PCR buffer (supplied with Taq)                                              5
MgCl2 (50 mM)                                                                   1.5
DMSO                                                                            2.5
Betaine (5 M)                                                                   10
dNTPs (10 mM each)                                                              1
Platinum Taq DNA polymerase (5 U/µL)                                            0.4
dH2O                                                                            to 50

30 cycles of thermal cycling with an annealing temperature of 54°C. Note: 1) All oligos used in this study were purchased from Integrated DNA Technologies and the sequences can be found in Supplementary Table 5. 2) DMSO and betaine were found to enhance the yield.

2. Purify PCR products with a QIAquick PCR purification kit (Qiagen) and measure their concentrations by NanoDrop. Mix barcoded DNA templates for subsequent assays and analyses.

3. In vitro transcribe mRNA templates by using a HiScribe T7 kit (NEB) as follows:

In vitro transcription components                      Volume (µL)
Mixed linear DNA templates                             x (~6-10 µg)
10× transcription buffer (supplied with the kit)       20
20× ribonucleotide mix (supplied with the kit)         10
20× HMW mix (supplied with the kit)                    10
T7 polymerase (500 U/µL)                               10
dH2O                                                   to 200

The IVT reaction is incubated at 42°C for 2 h. Note: A long incubation time can increase mRNA hydrolysis.

4. Remove DNA templates from mRNAs by using a TURBO DNA-free kit (Cat# AM1907, Ambion), and purify transcribed mRNAs with an RNeasy Mini kit (Qiagen). Purified mRNAs can be stored at -80°C for later use.

5. Prepare mRNA-cDNA hybrids via reverse transcription as follows:


Reverse transcription components (Mix 1)      Volume (µL)
mRNA templates                                x (~0.2 µM)
RT primer (“RT primer”, 10 µM)                20
dNTPs (10 mM each)                            10
dH2O                                          to 110

Incubate Mix 1 at 65°C for 5 min and then place it on ice for at least 1 min.

Reverse transcription components (Mix 2)      Volume (µL)
10× RT buffer                                 20
MgCl2 (25 mM)                                 40
DTT (0.1 M)                                   10
RNaseOUT (40 U/µL)                            10
SuperScript III RT (200 U/µL)                 10

Incubate the mixture of Mix 1 and 2 at 50°C for ~30 min. The RT reaction can be scaled up (e.g., ≥1 mL) using multiple tubes. Note: Avoid a prolonged incubation of mRNAs in the presence of Mg2+ to alleviate mRNA hydrolysis.

6. Precipitate mRNA–cDNA hybrids in the reaction mixture by using isopropanol. For example, add 60 µL ammonium acetate (5 M) and 12 µL EDTA (0.5 M) to a 0.5-mL reaction mixture and then mix it with 0.6 mL isopropanol. After incubation at -20°C for 30 min, collect the precipitates by centrifugation (14,000 g, 4°C) for 15 min and wash them with 70% ethanol (DEPC treated). Quantify mRNA–cDNA hybrids by measuring their cDNAs via real-time PCR. Note: mRNAs lacking barcoding DNA can lead to formation of non-barcoded proteins, which can be separated from PRMC complexes via streptavidin pull-down (see step 10).
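The example volumes in step 6 are given for a 0.5-mL reaction. A minimal Python sketch for other reaction volumes follows, assuming the additions scale linearly with reaction volume (an assumption; the source only gives the 0.5-mL case). The function and variable names are illustrative.

```python
# Reference additions from step 6, stated per 500 µL (0.5 mL) of RT reaction.
REFERENCE_REACTION_UL = 500.0
REFERENCE_ADDITIONS_UL = {
    "ammonium acetate (5 M)": 60.0,
    "EDTA (0.5 M)": 12.0,
    "isopropanol": 600.0,
}


def scale_precipitation(reaction_ul):
    """Return additions (µL) for a given reaction volume, assuming linear scaling."""
    if reaction_ul <= 0:
        raise ValueError("reaction volume must be positive")
    factor = reaction_ul / REFERENCE_REACTION_UL
    return {name: volume * factor for name, volume in REFERENCE_ADDITIONS_UL.items()}


# Example: a 1-mL reaction pooled from several tubes.
volumes_for_1ml = scale_precipitation(1000.0)
for name, volume in volumes_for_1ml.items():
    print(f"{name}: {volume:.0f} µL")
```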

7. In vitro translate and display proteins on PRMC complexes by using a PURExpress Δ Ribosome kit (NEB) as follows:

In vitro translation components                   Volume (µL)
Solution A                                        40
Factor mix                                        12
Ribosomes (13.3 µM)                               2.2 (~0.3 µM)
RNase inhibitor (40 U/µL, Cat# M0314, NEB)        2
mRNA–cDNA hybrids                                 x (~0.4 µM)
dH2O                                              to 100

Incubate the reaction at 37°C for 30 min. Note: mRNA–cDNA templates were added at a higher molar concentration than that of ribosomes to decrease polysome formation.
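The final concentrations in the translation table follow from C1·V1 = C2·V2. The sketch below checks the ribosome entry and computes the volume x of hybrids. The 2.0 µM hybrid stock is a hypothetical value chosen for illustration, not one from the protocol.

```python
def final_concentration_um(stock_um, added_ul, total_ul):
    """Final concentration (µM) after adding added_ul of stock to a total_ul reaction."""
    return stock_um * added_ul / total_ul


def stock_volume_ul(stock_um, final_um, total_ul):
    """Stock volume (µL) needed to reach final_um in total_ul, from C1*V1 = C2*V2."""
    if stock_um <= 0 or final_um > stock_um:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_um * total_ul / stock_um


# Ribosomes: 2.2 µL of a 13.3 µM stock in a 100 µL reaction.
ribosome_final_um = final_concentration_um(13.3, 2.2, 100.0)

# Hybrids: hypothetical 2.0 µM stock, target ~0.4 µM in 100 µL.
hybrid_volume_ul = stock_volume_ul(2.0, 0.4, 100.0)

print(f"ribosomes: {ribosome_final_um:.2f} µM, hybrids: {hybrid_volume_ul:.1f} µL")
```

With these numbers the ribosome concentration comes out near 0.29 µM, consistent with the "~0.3 µM" entry, and stays below the ~0.4 µM hybrid concentration as the note requires.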

8. Quench the reaction by addition of 100 µL ice-cold buffer HKM (50 mM HEPES, pH 7.0, 250 mM KOAc, 25 mM Mg(OAc)2, 0.25 U/mL RNasin (Promega), 0.5 mg/mL chloramphenicol, 5 mM 2-mercaptoethanol and 0.1% (v/v) Tween 20). Centrifuge (14,000 g, 4°C) the tube for 10 min to remove insoluble components. Note: PRMC complexes should be kept on ice or in cold room to improve their stability.

9. Purify PRMC complexes containing full-length proteins of interest by using Flag-tag affinity purification. Incubate a 200-µL reaction mixture with 40 µL anti-Flag M2 magnetic beads (Sigma-Aldrich), which are blocked with the buffer HKM supplemented with 100 µg/mL yeast tRNA and 10 mg/mL BSA, with gentle mixing for ~2-4 h in cold room. Elute bound PRMC complexes with the buffer HKM in the presence of 0.1 mg/mL Flag peptide.

10. To remove ribosome complexes lacking barcoding DNAs, as well as the Flag peptide which might interfere with following assays, further purify PRMC complexes with streptavidin-coated magnetic beads (Dynabeads M-270 Streptavidin, Life Technologies). For example, incubate 200 µL eluent (from step 9) with 100 µL streptavidin magnetic beads, which were pretreated with 0.1 M NaOH (refer to the manual of the beads) and blocked with the buffer HKM in the presence of 100 µg/mL yeast tRNA and 10 mg/mL BSA, for 1 h in cold room with gentle mixing. Elute bound PRMC complexes with 20 µL buffer HKM containing 5 mM biotin.

11. Quantify PRMC complexes by measuring their cDNAs via real-time PCR. Note: Estimated yields of PRMC complexes varied from 2.5 to 10.6% of the molar amounts of added mRNA–cDNA hybrids based on a test of individually displayed proteins of different sizes.

HaloTag-based protein barcoding

Enzymatic tags (e.g., HaloTag, SNAP-tag and CLIP-tag) can be applied to the covalent coupling of various proteins to a barcoding DNA (ref. 30). Compared with chemical conjugation methods, they can improve the binding of proteins to an enzyme ligand-modified DNA and catalyze the bond formation. This section describes how to prepare conjugates of HaloTagged proteins and barcoding dsDNAs (Extended Data Fig. 2).

1. Prepare a HaloTag ligand-modified primer. Incubate 100 µL conjugation reaction containing an amino modified oligo (100 µM), a succinimidyl ester (O4) Halo-ligand (10 mM, freshly prepared in DMSO, Promega) and 50 µL formamide in 50 mM Na2HPO4, pH 8.0, 150 mM NaCl at room temperature for 1 h. Purify the ligand-modified oligo by reverse-phase HPLC using a Zorbax Eclipse XDB-C18 column (5 µm, 9.4×250 mm, Agilent Technologies) and an elution gradient of 5-70% CH3CN/H2O (0.1 M triethylammonium acetate). Lyophilize the modified oligo for further use. Note: Formamide denaturation of the oligo was found to improve the conjugation efficiency.

2. Prepare barcoded templates via the first PCR:

PCR components                                                                             Volume (µL)
Universal backbone DNA template (“Universal template of barcoding DNA 1 or 2”, 1 nM)       1
Upstream barcoded primer (“Barcoding DNA-1 or 2 (F)”, 12.5 µM)                             1
Universal downstream primer (“Barcoding DNA (R)”, 12.5 µM)                                 1
Platinum PCR SuperMix (Life Technologies)                                                  22.5

25 cycles of thermal cycling with an annealing temperature of 58°C. Barcoded primers were prepared in 96-well plates.

3. Prepare barcoding dsDNAs with desthiobiotin, acrydite and Halo-ligand modifications via the secondary PCR:

PCR components                                                                             Volume (µL)
Barcoded template (~0.1-1 nM)                                                              1
Universal upstream modification primer (“Barcoding DNA modification (F)”, 25 µM)           2
Universal downstream modification primer (“Barcoding DNA modification (R)”, 25 µM)         2
Platinum PCR SuperMix                                                                      45

30 cycles of thermal cycling with an annealing temperature of 60°C.

Purify PCR products with AMPure XP beads (Beckman Coulter) and quantify them with NanoDrop.

4. To generate protein–DNA conjugates, incubate ~0.5-2 µM barcoding dsDNAs and ~2-5 µM HaloTagged proteins in a conjugation buffer (50 mM HEPES, pH 7.5, 150 mM NaCl, 2 mM EDTA and 5% glycerol) with gentle shaking at room temperature for 2-4 h. Note: The yields of protein–DNA conjugates were estimated to be above 15% based on a test of proteins of various sizes.

5. To remove free barcoding dsDNAs, purify the conjugates, as well as free proteins, by using the anti-Flag M2 or His-tag (Dynabeads) magnetic beads, and elute them with 50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 1 mM EDTA, 5% glycerol and 0.1% (v/v) Tween 20, in the presence of 0.1 mg/mL Flag peptide or 250 mM imidazole.

6. To remove free proteins, purify the conjugates by using the M-270 streptavidin-coated magnetic beads, and elute them with assay buffers in the presence of 5 mM biotin.

7. Quantify protein–DNA conjugates by real-time PCR. The conjugates can be stored at -80°C for future use.

Array deposition

This section outlines the protocol to immobilize SM barcoded proteins on the surface of a microscopic

glass slide.

1. Clean glass slides and coverslips (e.g., 24 × 60 mm rectangular, No. 2) by sonication in 5% Contrad 70, 1 M NaOH, 0.1 N HCl and Milli-Q H2O, and air dry them in an AirClean PCR hood.

2. Treat the slide surface with Bind-Silane (GE Healthcare). A detailed protocol can be found at

http://arep.med.harvard.edu/polony/polony_protocols/bind_silane.htm.

3. Prepare a gel-casting solution as follows:

Gel-casting solution components                                          Volume (µL)
2× deposition buffer (40 mM HEPES, pH 7.0, 100 mM KOAc, 12 mM Mg(OAc)2, 0.5 U/mL RNasin (Promega) and 0.2% Tween 20)     45
40% acrylamide/bis-acrylamide (19:1, molecular grade, Ambion)            15
Bridge amplification primer (F) (“Bridge amplification (F)”, 1 mM)       25
Bridge amplification primer (R) (“Bridge amplification (R)”, 1 mM)       25
dH2O                                                                     to 90

Note: Because oxygen trapped in solution or on glass surface can inhibit acrylamide polymerization,

the reagents are degassed with argon and put into an anaerobic chamber (Coy Lab). The reagent mixing

and gel polymerization process are handled in the chamber.

4. Prior to the immobilization, dilute samples with the deposition buffer to a protein concentration

ranging from 0.1 to 1 nM. Note: The protein concentration can be adjusted to optimize polony

densities.

5. Prepare a gel-casting mix by adding 10 µL diluted protein sample to 90 µL gel-casting solution.

6. Add 1 µL 10% (v/v) TEMED and 1 µL 5% (w/v) ammonium persulfate to the gel-casting mix, and apply ≥20 µL of the gel-casting mix to the Bind-Silane-treated slide surface. To form a gel layer of less than 5-µm thickness, place a coverslip on top of the liquid and press it tightly against the slide so that the liquid spreads evenly over the surface. Note: A degassed gel-casting mix polymerizes faster than usual, so complete this process quickly or otherwise reduce the amounts of TEMED and ammonium persulfate.

7. Allow the gel to polymerize in the chamber for ~4 h.

8. Gently remove the coverslip under the Milli-Q H2O with the help of a steel blade. Wash the slide

with Milli-Q H2O in a Coplin jar, dry it by a quick spin and place it face up in a PCR hood.

Polony amplification, linearization and blocking

This procedure converts barcoding DNAs into linearized and 3'-OH blocked polonies prior to sequencing. The process is partly similar to the cluster generation applied to Illumina platforms (ref. 10). To facilitate changing reagents and buffers during polony amplification, a protein-loaded slide was assembled into a FC 81 transmission flow cell with a 1.85-mm-thick polycarbonate flow channel (BioSurface Technologies). The flow cell temperature was controlled by a VWR modular heating block.

1. Prepare the following buffers:

Buffer (volume): Components
RNA digesting buffer (100 mL): 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2 and 0.1% (v/v) Triton X-100
Amplification buffer (1,000 mL): 20 mM Tris-HCl, pH 8.8, 10 mM ammonium sulfate, 2 mM magnesium sulfate, 0.1% (v/v) Triton X-100, 1.3% (v/v) DMSO, 2 M betaine
Linearization buffer (100 mL): 20 mM Tris-HCl, pH 8.8, 10 mM KCl, 10 mM ammonium sulfate, 2 mM magnesium sulfate and 0.1% (v/v) Triton X-100
Blocking buffer (100 mL): 20 mM Tris-acetate, pH 7.9, 50 mM KOAc, 10 mM Mg(OAc)2 and 0.25 mM CoCl2
Wash buffer W1 (500 mL): 1×SSC and 70% formamide
Wash buffer W2 (200 mL): 0.3×SSC and 0.1% (v/v) Tween 20

Note: Milli-Q H2O and molecular biology grade reagents are used to avoid nuclease contamination.

2. Clean flow cell components including the polycarbonate flow channel and a coverslip by sonication in 5% Contrad 70 and Milli-Q H2O, and air dry them in an AirClean PCR hood.

3. For samples containing PRMC complexes, digest mRNAs by adding the RNA digesting buffer in the presence of 10 U/mL RNase H (NEB) into the flow cell and incubating it at 37°C for 20 min. Wash the flow cell with the wash buffer W2 (3×3 mL).

4. Increase the flow cell temperature to 60°C, and maintain it for the polony amplification process (steps 5-8).

5. Wash the flow cell with deionized formamide (3×3 mL, Ambion).

6. Wash the flow cell with the amplification buffer (3×3 mL).

7. Add the amplification buffer in the presence of 200 µM dNTPs and 80 U/mL Bst polymerase (NEB) into the flow cell and incubate it for 5 min.

8. Repeat steps 5-7 for an additional 31 cycles.

9. Decrease the flow cell temperature to 37°C.

10. Wash the flow cell with the wash buffer W2 (3×3 mL) and the linearization buffer (3×3 mL).

11. To linearize polonies, add the linearization buffer in the presence of 10 U/mL USER enzyme (NEB) and incubate the flow cell at 37°C for 1 h.

12. Wash off the excised strands with the wash buffers W1 (3×3 mL) and W2 (3×3 mL).

13. Wash the flow cell with the blocking buffer (3×3 mL).

14. To block 3'-OH ends of polonies and primers, add the blocking buffer in the presence of 10 µM ddNTPs and 250 U/mL terminal transferase (NEB) and incubate the flow cell at 37°C for 10 min. To drive the reaction to completion, refill the flow cell with the fresh reagents and repeat this step twice. Note: The 3'-OH blocking can prevent nonspecific ligation of labeled oligos to polonies and gel-anchored primers during sequencing.

15. Wash the flow cell with the wash buffer W2 (3×3 mL).

Polony sequencing-by-ligation and colocalization analysis

Polonies generated by our approach are compatible with both sequencing-by-synthesis and sequencing-by-ligation chemistries. Programmable synthetic barcodes can expand choices of sequencing strategies. In this work, we modified a sequencing-by-ligation method reported by our lab (ref. 11) (http://www.polonator.org/protocols/). As detailed protocols of the sequencing method can be found in our previous reports (refs. 11, 51) (http://arep.med.harvard.edu/Polonator/), this section only focuses on differences of the current protocol.

1. To facilitate the deconvolution of sequencing signals from colocalized protein and probe polonies, two rounds of sequencing with different anchor primers (“Sequencing 1” and “Sequencing 2”, Supplementary Table 5) were successively conducted for protein and probe libraries.

2. Because polony sequencing was performed with a three-channel fluorescence imaging setup, a three-color sequencing method was designed to decode synthetic barcodes only composed of A, T and C. Thus, for each query position (e.g., position 1 to 5), an anchor primer is ligated with three fluorescently labeled degenerate nonamer pools. As previously described (refs. 11, 51), each sequencing-by-ligation cycle comprises four steps:

(i) Hybridize an anchor primer (10 µM) to polonies in a hybridization buffer (5×SSC and 0.1% (v/v) Tween 20) at 60°C for 10 min and then decrease the temperature to 40°C.

(ii) Ligate polony-bound anchor primers with nonamers (2 µM each pool) in a ligation buffer (50 mM Tris-HCl, pH 7.6, 10 mM MgCl2, 1 mM ATP and 5 mM DTT) in the presence of 30 U/µL T4 DNA ligase (Enzymatics) at room temperature for 20 min, and then increase the temperature to 35°C and maintain it for 40 min.

(iii) Scan the polony slide by using a fluorescence microscope to determine ligated nonamers.

(iv) Strip off polony-bound primers by washing with the buffer W1 at 60°C and then with the buffer W2.

Note: To save the ligase and oligos used for each cycle, the hybridization and ligation steps were performed in a gasket chamber (~0.5 mL) assembled with a polony slide and a microarray gasket slide (Cat# G2534-60008, Agilent Technologies). The stripping was performed in a Coplin jar.
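The three-color scheme can be illustrated with a small base-calling sketch: one intensity triple per query position, with the brightest channel giving the base. The channel-to-base order, the `min_ratio` ambiguity cutoff and the intensity values below are all hypothetical; the source does not state how fluorophores map to bases or how calls were thresholded.

```python
# Hypothetical mapping of imaging channel index to base (the source does not specify it).
CHANNEL_TO_BASE = ("A", "T", "C")


def call_base(intensities, min_ratio=1.5):
    """Call the base of the brightest channel, or 'N' when the call is ambiguous.

    min_ratio is an assumed cutoff: the brightest channel must exceed the
    second-brightest by this factor.
    """
    if len(intensities) != len(CHANNEL_TO_BASE):
        raise ValueError("expected one intensity per channel")
    ranked = sorted(range(len(intensities)), key=lambda c: intensities[c], reverse=True)
    best, second = ranked[0], ranked[1]
    if intensities[best] <= 0 or intensities[best] < min_ratio * intensities[second]:
        return "N"
    return CHANNEL_TO_BASE[best]


def decode_barcode(cycles):
    """Decode one polony from a list of per-position intensity triples."""
    return "".join(call_base(triple) for triple in cycles)


# Five query positions for one polony (made-up intensities).
example_barcode = decode_barcode(
    [(900, 100, 80), (50, 700, 60), (40, 30, 650), (300, 280, 20), (10, 20, 500)]
)
print(example_barcode)
```

In this example the fourth position has two channels of similar brightness and is reported as `N` rather than forced to a base.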

3. For polony colocalization analysis, reference images constructed for protein and probe polony sequencing are aligned with the aid of a cross-library reference. Thus, protein and probe polonies were hybridized with anchor primers labelled with different fluorophores (“Sequencing 1-Cy3” and “Sequencing 2-Cy5”, Supplementary Table 5), and their superimposed images served as the reference.

4. Polony colocalization analysis was performed at each image position. MATLAB scripts analyze all combinations of protein polony and probe polony positions to identify and count the protein polonies within a threshold distance (e.g., 0.7 µm) from probe polonies.
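The counting in step 4 can be sketched as follows. This is an illustrative Python version, not the authors' MATLAB scripts; it assumes polony centroids are already available as (x, y) pairs in µm, and the coordinates below are made up.

```python
import math


def colocalization_ratio(protein_xy, probe_xy, threshold):
    """Count protein polonies with at least one probe polony closer than threshold.

    Returns (number of colocalized protein polonies, fraction of all protein polonies).
    All pairs are compared, so the cost is O(len(protein_xy) * len(probe_xy)).
    """
    if not protein_xy:
        return 0, 0.0
    hits = 0
    for px, py in protein_xy:
        if any(math.hypot(px - qx, py - qy) < threshold for qx, qy in probe_xy):
            hits += 1
    return hits, hits / len(protein_xy)


# Made-up centroids (µm) for one image position.
proteins = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0), (20.0, 20.0)]
probes = [(0.3, 0.4), (5.0, 6.0), (10.0, 10.6)]
hits, ratio = colocalization_ratio(proteins, probes, threshold=0.7)
print(hits, ratio)
```

Each protein polony is counted at most once even if several probe polonies fall within the threshold.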


SUPPLEMENTARY NOTES

1. Colocalization statistics

To compare degrees of colocalization between different protein and probe pairs in an experiment, we measured colocalization ratios defined as the percentages of protein polonies colocalized with corresponding probe polonies, and performed Student's t-tests for the measurements at multiple imaging positions. The contribution from random colocalization can be estimated by calculating the mean value of pair cross-correlation function (PCCF) over the distance interval of zero to the colocalization threshold. In addition, the PCCF statistic (ref. 39) can be applied to characterize colocalization patterns of two polony species that were overlapped or partially overlapped. Below is how the PCCF values were calculated.

Let i and j be two types of objects for colocalization analysis and A be a sampled array area. A cross-correlation Ripley K-function $\hat{K}(r)$ can be estimated (ref. 52) as

$$\hat{K}_{i,j}(r) = \frac{1}{A\hat{\lambda}_i\hat{\lambda}_j} \sum_k \sum_l \omega(i_k, j_l)\, I(d_{i_k,j_l} < r)$$

where $d_{i_k,j_l}$ is the distance between the centroids of the k'th location of type i objects and the l'th location of type j objects, and $I(d_{i_k,j_l} < r)$ is the indicator function with the value 1 if $d_{i_k,j_l} < r$ is true and 0 otherwise. The density of type i objects $\hat{\lambda}_i$ can be estimated as

$$\hat{\lambda}_i = \frac{N_i}{A}$$

where $N_i$ is the total number of i objects. The weight function $\omega(i_k, j_l)$ provides an edge correction but was here ignored ($\omega(i_k, j_l) \approx 1$). The function $\hat{K}_{i,j}(r)$ can be interpreted as the ratio of the number of i and j objects localized within radius r of each other, over the number that would be expected by chance. Following ref. 39, we also computed a PCCF that considered colocalization within a radial interval $[r, r + \Delta r)$ via

$$\frac{1}{A\hat{\lambda}_i\hat{\lambda}_j(2\pi r\Delta r + \pi\Delta r^2)} \sum_i \sum_j I(r \le d_{i_k,j_l} < r + \Delta r)$$

where $\sum_i \sum_j I(r \le d_{i_k,j_l} < r + \Delta r)$ and $A\hat{\lambda}_i\hat{\lambda}_j(2\pi r\Delta r + \pi\Delta r^2)$ are, respectively, an actual count of colocalized objects i and an average number of objects i that are colocalized with objects j by chance.

The PCCF mean values were calculated over the interval of 0 to the colocalization threshold ($r = 0$ and $\Delta r$ = the colocalization threshold). In computing a PCCF value for an experiment in which Q images were analyzed, colocalization events were aggregated over all images and divided by Q times the expected number of random colocalization per image. By definition, randomly colocalized objects should have PCCF values of 1. However, to assess whether PCCFs derived in actual experiments were statistically significantly different from 1, following ref. 39 we estimated 95% confidence intervals of the PCCFs of randomly colocalized objects using Monte-Carlo simulations. Specifically, each simulation assumed Q images, and within each image, $N_i$ and $N_j$ polony and probe objects, respectively, where Q was the number of images analyzed in the experiment whose PCCF was being evaluated, and $N_i$ and $N_j$ were the mean numbers of polony and probe objects observed in the actual experiment. Coordinates for the protein and probe polonies were randomly picked using uniform locations. All dimensions were scaled to actual image dimensions in pixels. For each simulation, a PCCF was computed in the same manner as in the actual experiment by aggregating colocalization events over Q random images. Finally, means and confidence intervals for these random PCCFs were obtained from 1,000 simulations.
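The PCCF and its Monte-Carlo reference distribution can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: the confidence interval is taken from empirical 2.5% and 97.5% quantiles (the source does not say how the interval was computed), no edge correction is applied (ω ≈ 1, as in the text), and the image size, object counts and number of simulations in the example are arbitrary.

```python
import math
import random


def count_pairs(points_i, points_j, r, dr):
    """Count (i, j) pairs whose centroid distance d satisfies r <= d < r + dr."""
    return sum(
        1
        for xi, yi in points_i
        for xj, yj in points_j
        if r <= math.hypot(xi - xj, yi - yj) < r + dr
    )


def pccf(total_pairs, n_images, n_i, n_j, area, r, dr):
    """Aggregated pair count divided by Q times the chance expectation per image."""
    ring_area = 2 * math.pi * r * dr + math.pi * dr ** 2
    expected_per_image = area * (n_i / area) * (n_j / area) * ring_area
    return total_pairs / (n_images * expected_per_image)


def simulate_random_pccf(n_images, n_i, n_j, width, height, r, dr, n_sim=1000, seed=0):
    """Mean and empirical 95% interval of the PCCF for uniformly random objects."""
    rng = random.Random(seed)
    area = width * height
    values = []
    for _ in range(n_sim):
        total = 0
        for _ in range(n_images):
            pts_i = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_i)]
            pts_j = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_j)]
            total += count_pairs(pts_i, pts_j, r, dr)
        values.append(pccf(total, n_images, n_i, n_j, area, r, dr))
    values.sort()
    mean = sum(values) / len(values)
    low = values[int(0.025 * n_sim)]
    high = values[min(n_sim - 1, int(0.975 * n_sim))]
    return mean, low, high


# Small illustrative run: Q = 2 images of 100 x 100 pixels, 50 objects of each type.
mean, low, high = simulate_random_pccf(
    n_images=2, n_i=50, n_j=50, width=100, height=100, r=0.0, dr=2.0, n_sim=200, seed=1
)
print(f"mean={mean:.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

Because edge effects are ignored, the simulated mean for random objects sits slightly below 1; an experimental PCCF would be compared against the simulated interval rather than against 1 exactly.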

2. Initial mathematical model of SM-based protein library vs. probe library binding assay

This note describes a mathematical model whose aim is to assist understanding of the sensitivity and

specificity of detection of protein–probe interactions in complex mixtures. The following items are

assumed:

1) $n$ species of barcoded proteins $P_1, P_2, \dots, P_n$ are allowed to interact with $m$ species of barcoded probes $R_1, R_2, \dots, R_m$ in a one-pot assay. It is assumed that each protein is present in the same concentration and that the total protein concentration is $P_\#$. Similarly, it is assumed that the total concentration of probes is $R_\#$ and the concentration of each $R_j$ is $R_\#/m$. It is assumed that probe concentrations are titratable and that $R_\#/m \gg P_\#/n$. For simplicity, we will assume here that $m = n$ and that for each protein $P_i$ one probe $R_i$ (denoted with the same index) has been chosen or designed to specifically target the protein.

2) Due to folding and other issues relating to the efficiency of ribosome display, only a fraction $\alpha$ of each protein is in an active form that is capable of binding specifically to their targeting probes. The active and inactive forms of the protein $P_i$ will be denoted $P_i^+$ and $P_i^-$, with total concentrations $\alpha P_\#/n$ and $(1-\alpha)P_\#/n$, respectively. For similar reasons, only a fraction $\tau$ of probes are active and can specifically bind to their targeted proteins, and their active and inactive forms will similarly be denoted $R_j^+$ and $R_j^-$, with concentrations $\tau R_\#/n$ and $(1-\tau)R_\#/n$. These fractions are assumed to be stable throughout the assay, and active and inactive forms of the proteins and probes are assumed not to be able to interconvert. The fractions $\alpha$ and $\tau$ will be assumed to apply to all proteins and all probes, respectively.

3) For $i = 1, 2, \dots, n$, the active forms of protein $P_i$ and its specifically targeting probe $R_i$ will interact according to the reaction

$$\text{(S1)}\quad P_i^+ + R_i^+ \overset{K_D}{\longleftrightarrow} (P_i^+R_i^+)_S$$

where $(P_i^+R_i^+)_S$ denotes the complex formed from the specific interaction, and $K_D$ the dissociation constant of this complex, and where $K_D$ applies equally to each such protein–probe pair. All forms of protein $P_i$ will also interact non-specifically with all forms of all probes, including with specific probe $R_i$. This leads to four reactions between the active or inactive protein $P_i$ and each of the n probes $R_j$ ($j = 1, 2, \dots, n$), all of which are assumed to be characterized by the same non-specific dissociation constant $U$:

$$\text{(U1)}\quad P_i^+ + R_j^+ \overset{U}{\longleftrightarrow} (P_i^+R_j^+)_U \quad (j = 1, \dots, n)$$

$$\text{(U2)}\quad P_i^+ + R_j^- \overset{U}{\longleftrightarrow} (P_i^+R_j^-)_U \quad (j = 1, \dots, n)$$

$$\text{(U3)}\quad P_i^- + R_j^+ \overset{U}{\longleftrightarrow} (P_i^-R_j^+)_U \quad (j = 1, \dots, n)$$

$$\text{(U4)}\quad P_i^- + R_j^- \overset{U}{\longleftrightarrow} (P_i^-R_j^-)_U \quad (j = 1, \dots, n)$$

It will also be assumed that (i) non-specific interactions between probes and proteins are always binary, and we can therefore neglect the possibility of ternary or higher complexes, and (ii) probes only non-specifically interact with proteins, and proteins only with probes, and thus that probes and probes, and proteins and proteins, will not interact.

4) After these reactions reach equilibrium, protein–probe complexes of all of these sorts are irreversibly captured by chemical crosslinking, and free probes are removed from the solution, leaving a residual concentration $R_0$. It is assumed that both free and complexed protein and probe molecules are then deposited on the surface of the array in proportion to their solution concentrations, and then immobilized on the array. Of these, it is assumed that only a fraction $\beta$ of protein and a fraction $\gamma$ of probe molecules bear barcoding DNAs that can be successfully amplified into polonies and detected on the array, and that amplifiability of protein and probe DNAs is independent of whether the proteins and probes are free or in complex.

5) The following simplifications will be made regarding computation of PCCF statistics (see above): Instead of computing PCCFs by counting all pairs of $P_i$ and $R_i$ polonies within a specified distance threshold, PCCFs will be calculated from the numbers of $P_i$ polonies that are found colocalized with $R_i$ polonies in either of the following ways: (i) specifically and non-specifically bound $P_i \cdot R_i$ complexes in which both components form polonies (as per assumption 4) will be counted as intrinsically colocalized polonies; (ii) $P_i$ and $R_i$ polonies that are formed on the array by other means may be found to be randomly colocalized. A central value for random colocalization will be computed as the number of non-$P_i \cdot R_i$-derived $P_i$ polonies that are expected to be found by chance within the distance threshold from non-$P_i \cdot R_i$-derived $R_i$ polonies, given the numbers of these polonies obtained from 4 above. The sum of (i) and (ii) will be used to compute a central PCCF for $P_i$ and $R_i$ on the array, and variation from this central value will be estimated by random simulations described below. This calculation of PCCF differs from the formal definition given above and in ref. 39 by being non-symmetrical in $P_i$ and $R_i$. Also, in counting $P_i$ polonies that are near $R_i$ polonies instead of counting all pairs of neighboring $P_i$ and $R_i$ polonies, it ignores the extra pairs that would be taken into account in the PCCF as formally defined should a $P_i$ polony be found near multiple $R_i$ polonies, and is thus conservative regarding colocalization counts compared to its formal definition.

The equilibria of the five reactions in assumption 3, and the assumption 1 that $R_\#/n \gg P_\#$, yield $2n + 1$ equations involving the concentration $[P_i^+]$ of free $P_i^+$ and $2n$ equations involving the concentration $[P_i^-]$ of free $P_i^-$:

$$\text{(S1′)}\quad [P_i^+]\,\frac{\tau R_\#}{nK_D} = [(P_i^+R_i^+)_S]$$

$$\text{(U1′)}\quad [P_i^+]\,\frac{\tau R_\#}{nU} = [(P_i^+R_j^+)_U] \quad (j = 1, \dots, n)$$

$$\text{(U2′)}\quad [P_i^+]\,\frac{(1-\tau)R_\#}{nU} = [(P_i^+R_j^-)_U] \quad (j = 1, \dots, n)$$

and

$$\text{(U3′)}\quad [P_i^-]\,\frac{\tau R_\#}{nU} = [(P_i^-R_j^+)_U] \quad (j = 1, \dots, n)$$

$$\text{(U4′)}\quad [P_i^-]\,\frac{(1-\tau)R_\#}{nU} = [(P_i^-R_j^-)_U] \quad (j = 1, \dots, n)$$

Note that here there is a single (S1′) equation involving the one specifically targeting probe $R_i$, but n instances each of (U1′)-(U4′), one for each $R_j$ for $j = 1, \dots, n$.

From assumption 2 and the equations (S1′), (U1′) and (U2′), we get

$$[P_i^+] + [(P_i^+R_i^+)_S] + \sum_{j=1}^{n}[(P_i^+R_j^+)_U] + \sum_{j=1}^{n}[(P_i^+R_j^-)_U] = \frac{\alpha P_\#}{n}$$

or

$$[P_i^+]\left(1 + \frac{\tau R_\#}{nK_D} + \sum_{j=1}^{n}\frac{\tau R_\#}{nU} + \sum_{j=1}^{n}\frac{(1-\tau)R_\#}{nU}\right) = [P_i^+]\left(1 + \frac{\tau R_\#}{nK_D} + \frac{R_\#}{U}\right) = \frac{\alpha P_\#}{n}$$

which leads in turn to

$$[P_i^+] = \frac{\alpha P_\#}{n + R_\#\left(\frac{\tau}{K_D} + \frac{n}{U}\right)} = \frac{\alpha P_\#}{n + R_\#/\tilde{K}_D} = \frac{\alpha P_\#\tilde{K}_D}{n\tilde{K}_D + R_\#}$$

where $\tilde{K}_D$ can be interpreted as an adjusted specific dissociation constant

$$\tilde{K}_D = \frac{1}{\frac{\tau}{K_D} + \frac{n}{U}}$$

Similarly, from assumption 2 and equations (U3′) and (U4′) we get

$$[P_i^-] = \frac{(1-\alpha)P_\#}{n + nR_\#/U} = \frac{(1-\alpha)P_\#U}{n(U + R_\#)}$$
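Two limiting cases give a quick sanity check on these closed forms. This is a worked check added for illustration; it uses only the symbols defined above, with the adjusted constant written as $\tilde{K}_D = 1/(\tau/K_D + n/U)$. When non-specific binding is negligible ($U \to \infty$), the adjusted constant reduces to the specific constant rescaled by the active probe fraction, and when no probe is present ($R_\# = 0$) all protein remains free:

```latex
\lim_{U\to\infty}\tilde{K}_D = \frac{K_D}{\tau},
\qquad
\lim_{U\to\infty}[P_i^+] = \frac{\alpha P_\# K_D}{nK_D + \tau R_\#},
\qquad
[P_i^+]\Big|_{R_\#=0} = \frac{\alpha P_\#}{n},
\qquad
[P_i^-]\Big|_{R_\#=0} = \frac{(1-\alpha)P_\#}{n}
```

The last two values sum to $P_\#/n$, the total concentration of each protein stated in assumption 1.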

Using equations (S1β€²) and (U1β€²)-(U4β€²), the total concentration [(𝑃𝑃𝑖𝑖 βˆ™ 𝑅𝑅𝑖𝑖)] of (𝑃𝑃𝑖𝑖 βˆ™ 𝑅𝑅𝑖𝑖) complexes

between the protein 𝑃𝑃𝑖𝑖 and its specifically targeting probe 𝑅𝑅𝑖𝑖 in any of their active and inactive forms

is

[(𝑃𝑃𝑖𝑖 βˆ™ 𝑅𝑅𝑖𝑖)] = 𝛼𝛼𝑃𝑃#�̃�𝐾𝐷𝐷𝑛𝑛�̃�𝐾𝐷𝐷 + 𝑅𝑅#

(πœπœπ‘…π‘…#𝑛𝑛𝐾𝐾𝐷𝐷

+ 𝑅𝑅#𝑛𝑛𝑛𝑛) +

(1 βˆ’ 𝛼𝛼)𝑃𝑃#𝑛𝑛𝑛𝑛(𝑛𝑛 + 𝑅𝑅#) (𝑅𝑅#

𝑛𝑛𝑛𝑛)

= 𝑃𝑃#𝑅𝑅#𝑛𝑛 (

𝛼𝛼 (1 βˆ’ (𝑛𝑛 βˆ’ 1)�̃�𝐾𝐷𝐷𝑛𝑛 )

𝑛𝑛�̃�𝐾𝐷𝐷 + 𝑅𝑅#+

(1 βˆ’ 𝛼𝛼)𝑛𝑛(𝑛𝑛 + 𝑅𝑅#))

Total free protein concentration can also be computed as

$$[P_i^+] + [P_i^-] = \frac{\alpha P_\# \tilde{K}_D}{n\tilde{K}_D + R_\#} + \frac{(1-\alpha)P_\# U}{n(U + R_\#)} = P_\#\left(\frac{\alpha\tilde{K}_D}{n\tilde{K}_D + R_\#} + \frac{(1-\alpha)U}{n(U + R_\#)}\right)$$

We also have a total concentration $[(P_i \cdot R_{j\neq i})]$ of $(P_i \cdot R_j)$ complexes between $P_i$ and the probes $R_j$ ($j \neq i$) that are not targeted to $P_i$, in any of their active and inactive forms. This simplifies to

$$[(P_i \cdot R_{j\neq i})] = \left([P_i^+] + [P_i^-]\right)\frac{(n-1)R_\#}{nU}$$

Finally, we must also consider that probe $R_i$ will be in non-specific complexes with proteins $P_{j\neq i}$ other than its specific target. By our assumptions above, since all proteins $P_j$ ($j \neq i$) behave identically to $P_i$ with respect to their targeting and non-targeting probes, we have $[P_j^+] + [P_j^-] = [P_i^+] + [P_i^-]$ for all $j \neq i$, and therefore

$$[(P_{j\neq i} \cdot R_i)] = \left([P_i^+] + [P_i^-]\right)\frac{(n-1)R_\#}{nU}$$

Arraying, polony formation, and colocalization statistics

It is now assumed that the mixture is arrayed for SM assaying, and that polonies are formed on the array. Following assumption 4, the fractions of polonies relevant to the evaluation of $P_i$ and $R_i$ colocalization can be computed as follows:

$f_{(PR)} = \dfrac{\beta\gamma\,[(P_i \cdot R_i)]}{C}$
Fraction of $(P_i \cdot R_i)$ complexes between $P_i$ and its specifically targeting probe $R_i$ that are detectable on the array as intrinsically colocalized polonies.

$f_{(PX)} = \dfrac{\beta\gamma\,[(P_i \cdot R_{j\neq i})]}{C}$
Fraction of $(P_i \cdot R_j)$ complexes between $P_i$ and other probes $R_j$ ($j \neq i$) that are detectable on the array as polonies of $P_i$ that are intrinsically colocalized with those of other probes.

$f_P = \dfrac{\beta(1-\gamma)\left([(P_i \cdot R_i)] + [(P_i \cdot R_{j\neq i})]\right) + \beta\left([P_i^+] + [P_i^-]\right)}{C}$
Fraction of $P_i$ polonies that do not appear intrinsically colocalized with probe polonies.

$f_{(XR)} = \dfrac{\beta\gamma\,[(P_{j\neq i} \cdot R_i)]}{C}$
Fraction of $(P_j \cdot R_i)$ complexes between probe $R_i$ and other proteins $P_j$ ($j \neq i$) that are detectable on the array as polonies of $R_i$ that are intrinsically colocalized with the other proteins.

$f_R = \dfrac{(1-\beta)\gamma\left([(P_i \cdot R_i)] + [(P_{j\neq i} \cdot R_i)]\right) + \gamma R_0/n}{C}$
Fraction of $R_i$ polonies that do not appear intrinsically colocalized with protein polonies.

where

$$C = \left(1 - (1-\beta)(1-\gamma)\right)[(P_i \cdot R_i)] + \beta[(P_i \cdot R_{j\neq i})] + \gamma[(P_{j\neq i} \cdot R_i)] + \beta\left([P_i^+] + [P_i^-]\right) + \frac{\gamma R_0}{n}$$

Note that, as per assumption 5, $f_{(PR)}$ determines the number of intrinsically colocalized $P_i$ and $R_i$ polonies found on the array. The other fractions will be used in calculating the number of randomly colocalized polonies below. First we will compute the numbers of polonies of the various sorts, and then we will calculate random colocalization.

Let it now be assumed that $N_i$ polonies are detected for the protein $P_i$. These $N_i$ polonies may be apportioned as

$n_{(PR)} = \dfrac{N_i f_{(PR)}}{f_{(PR)} + f_{(PX)} + f_P}$
Polonies of $P_i$ intrinsically colocalized with polonies of $R_i$.

$n_{(PX)} = \dfrac{N_i f_{(PX)}}{f_{(PR)} + f_{(PX)} + f_P}$
Polonies of $P_i$ intrinsically colocalized with polonies of other probes $R_j$ ($j \neq i$).

$n_P = \dfrac{N_i f_P}{f_{(PR)} + f_{(PX)} + f_P}$
Polonies of $P_i$ that are not intrinsically colocalized with probe polonies.

It follows from the fractions derived above that the following numbers of polonies are detected for the probe $R_i$ in addition to those counted with the $N_i$ $P_i$ protein polonies above (the only $R_i$ polonies counted among the $N_i$ polonies above are the $n_{(PR)}$ instances of $R_i$ polonies colocalized with $P_i$ polonies).

$n_{(XR)} = \dfrac{N_i f_{(XR)}}{f_{(PR)} + f_{(PX)} + f_P}$
Polonies of $R_i$ intrinsically colocalized with polonies of other proteins $P_j$ ($j \neq i$).

$n_R = \dfrac{N_i f_R}{f_{(PR)} + f_{(PX)} + f_P}$
Polonies of $R_i$ that are not intrinsically colocalized with protein polonies.
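The apportionment above amounts to rescaling all five fractions by the same factor, chosen so that the three protein-side classes add up to the observed count for the protein. A minimal sketch, with a hypothetical function name and made-up fractions:

```python
import math

def apportion_polonies(n_i, fractions):
    """Scale the fractions so the P_i classes (PR, PX, P) sum to N_i.

    The probe-only classes (XR, R) are scaled by the same factor, so
    they are counted in addition to the N_i protein polonies.
    """
    protein_side = fractions["PR"] + fractions["PX"] + fractions["P"]
    scale = n_i / protein_side
    return {key: scale * value for key, value in fractions.items()}

# Hypothetical fractions (they sum to one) and a hypothetical N_i.
example_fractions = {"PR": 0.30, "PX": 0.06, "P": 0.24, "XR": 0.05, "R": 0.35}
counts = apportion_polonies(1000, example_fractions)
protein_total = counts["PR"] + counts["PX"] + counts["P"]
print(counts)
print(math.isclose(protein_total, 1000))
```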

In preparing to compute random colocalization and the final PCCF statistic, a question arises in the context of our highly multiplexed SM assay as to whether $P_i$ polonies from uncomplexed $P_i^+$ and $P_i^-$ objects and $P_i$ polonies formed from $P_i \cdot R_{j\neq i}$ complexes should be treated equivalently regarding whether they can be randomly colocalized (and similarly for $R_i$ polonies). It could be the case that $P_i$ polonies formed within complexes cannot be colocalized with $R_i$ polonies to the degree that $P_i$ polonies formed from uncomplexed $P_i$ objects can, owing to steric constraints or other factors. In non-multiplexed assays, such as those considered in ref. 39, this question never arises because the non-targeting partners in $P_i \cdot R_{j\neq i}$ and $P_{j\neq i} \cdot R_i$ complexes would never be surveyed for detection, and the resulting $P_i$ and $R_i$ polonies would all be considered isolated objects that could appear near each other by chance in the same way. A broader issue concerns the fact that the PCCF is specifically a Pair Cross-Correlation Function (ref. 39), and the question arises whether for multiplexed assays it might be better to develop and employ a higher-order multivariate statistic that compares actual vs. expected random colocalization for many kinds of objects at once, somewhat as multivariate ANOVAs analyze variances of many variables and interactions at once. However, in this initial model, we will in fact treat polonies derived from free probe and protein molecules and those derived from complexes equivalently in terms of their potential for random colocalization, within the constraints indicated in assumption 5. Notably, even when only pairwise colocalization is considered, as in the application of PCCF in ref. 39, where objects are labeled antibodies, the prima facie distinction between objects colocalized by virtue of targeting physical interactions and isolated objects that appear as random background is an idealization, since the apparently isolated objects are likely interacting non-specifically with many other kinds of unsurveyed molecules and complexes in the cell matrix, and PCCF remains a useful statistic even though these interactions are ignored.

Random colocalization

As noted in assumption 5 and discussed in the comment above, random colocalization will be considered between $P_i$ and $R_i$ polonies that do not arise from intrinsic colocalization from $P_i \cdot R_i$ complexes. We now know the number of such polonies to be $n_{(PX)} + n_P$ for $P_i$, and $n_{(XR)} + n_R$ for $R_i$. Given imaged array area $A$ and polony radius $r$, we can estimate the density of these $R_i$ polonies that could appear anywhere on the array by chance as

$$\rho_R = \frac{n_R + n_{(XR)}}{A}$$

and the probability of a probe $R_i$ polony appearing in the vicinity of a $P_i$ protein polony by chance would then be

$$\pi(2r)^2\rho_R$$

Thus, the expected number of the $n_{(PX)} + n_P$ $P_i$ polonies that will have an $R_i$ polony localized nearby by chance will be

$$n_{(PR)}^{rand} = \pi(2r)^2\rho_R\left(n_{(PX)} + n_P\right)$$

Thus, the total number of $P_i$ polonies colocalized with $R_i$ polonies will be

$$n_{(PR)}^{tot} = n_{(PR)} + n_{(PR)}^{rand}$$
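To give a feel for the magnitudes involved, the sketch below evaluates the random-colocalization terms for one protein and its probe. The polony counts are hypothetical; the slide area and radius are the values quoted later in this section, converted to micrometres. The function name is illustrative only.

```python
import math

def random_colocalization(n_p, n_pr, n_px, n_xr, n_r, area, radius):
    """Return (rho_R, chance probability, n_rand, n_tot) as defined above."""
    rho_r = (n_r + n_xr) / area                    # density of eligible R_i polonies
    p_near = math.pi * (2 * radius) ** 2 * rho_r   # chance of one landing nearby
    n_rand = p_near * (n_px + n_p)                 # expected chance colocalizations
    n_tot = n_pr + n_rand                          # intrinsic plus random
    return rho_r, p_near, n_rand, n_tot

AREA_UM2 = 75e3 * 25e3   # 75 mm x 25 mm slide, in square micrometres
RADIUS_UM = 0.7          # colocalization threshold distance, in micrometres

# Hypothetical polony counts for a single protein/probe pair.
rho_r, p_near, n_rand, n_tot = random_colocalization(
    n_p=400, n_pr=500, n_px=100, n_xr=83, n_r=583,
    area=AREA_UM2, radius=RADIUS_UM)
print(f"rho_R = {rho_r:.3e} per um^2, chance probability = {p_near:.3e}")
print(f"n_rand = {n_rand:.3e}, n_tot = {n_tot:.3f}")
```

With counts of this size spread over a whole slide, the random term is tiny compared with the intrinsic one, which is why the PCCF can become very large for specific pairs.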

PCCF statistic

To complete the PCCF statistic as specified in assumption 5, we must divide $n_{(PR)}^{tot}$ by the expected number of $P_i$ polonies colocalized with $R_i$ polonies assuming that all of these individual polonies (including the ones in $P_i \cdot R_i$ complexes) could be colocalized by chance. Similar to the logic above, the total density of $R_i$ objects will now be

$$\rho_R^{tot} = \frac{n_R + n_{(PR)} + n_{(XR)}}{A}$$

and the expected number of the $N_i$ $P_i$ protein polonies that have a probe $R_i$ polony appearing in their vicinity will then be

$$\pi(2r)^2\rho_R^{tot} N_i$$

and, therefore

$$PCCF = PCCF(P_\#, R_\#, R_0, n, \alpha, \tau, K_D, U, \beta, \gamma, N_i, A, r) = \frac{n_{(PR)}^{tot}}{\pi(2r)^2\rho_R^{tot} N_i}$$
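A useful sanity check on this ratio is that it equals one when there is no intrinsic colocalization, since numerator and denominator then describe the same chance process. The sketch below illustrates this with hypothetical polony counts; the function name is an assumption, and the protein polony total is taken as the sum of the three protein-side classes, as in the apportionment above.

```python
import math

def pccf(n_p, n_pr, n_px, n_xr, n_r, area, radius):
    """PCCF from polony counts, following the expressions above."""
    disc = math.pi * (2 * radius) ** 2          # area within the colocalization distance
    n_i = n_pr + n_px + n_p                     # all P_i polonies
    n_rand = disc * (n_r + n_xr) / area * (n_px + n_p)
    n_tot = n_pr + n_rand                       # observed colocalizations
    rho_tot = (n_r + n_pr + n_xr) / area        # density of all R_i polonies
    return n_tot / (disc * rho_tot * n_i)       # observed / expected by chance

AREA_UM2 = 75e3 * 25e3   # slide area in square micrometres
RADIUS_UM = 0.7

# Hypothetical counts: with and without intrinsically colocalized polonies.
with_binding = pccf(400, 500, 100, 83, 583, AREA_UM2, RADIUS_UM)
without_binding = pccf(400, 0, 100, 83, 583, AREA_UM2, RADIUS_UM)
print(f"PCCF with intrinsic colocalization:    {with_binding:.3e}")
print(f"PCCF without intrinsic colocalization: {without_binding:.6f}")
```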

Random simulations

To estimate the degree of variation to which the PCCF statistic may be subject under a given set of parameters, we compute a distribution of PCCF values using the formula above, assuming that the six terms $n_P$, $n_{(PR)}$, $n_{(PX)}$, $n_{(XR)}$, $n_R$, and $n_{(PR)}^{rand}$ are all randomly drawn from Poisson distributions whose means are the values computed above within the model. Because these simulations do not take into account variation in actual samples or assay conditions, and because Poisson error may itself underrepresent the variability inherent in the underlying system vs. the model, these estimates must be considered lower bounds for the variance that will be encountered in actual assays.
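The simulation procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the means are hypothetical, the protein polony total is recomputed from the drawn counts, a draw with a zero denominator is scored as 0, and, to stay within the standard library, large-mean Poisson draws are replaced by a rounded normal approximation.

```python
import math
import random
import statistics

def draw_poisson(mean, rng):
    """Poisson draw: Knuth's method for small means; for large means a
    rounded normal approximation (an assumption of this sketch)."""
    if mean <= 0:
        return 0
    if mean < 50:
        limit, k, prod = math.exp(-mean), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))

def pccf_from_counts(n_p, n_pr, n_px, n_xr, n_r, n_rand, area, radius):
    """PCCF = (n_PR + n_rand) / (pi (2r)^2 * rho_tot * N_i)."""
    n_i = n_pr + n_px + n_p
    expected = math.pi * (2 * radius) ** 2 * (n_r + n_pr + n_xr) / area * n_i
    if expected == 0:
        return 0.0  # assumption: score degenerate draws as 0
    return (n_pr + n_rand) / expected

def simulate(means, area, radius, draws, seed):
    """Return the 1st and 99th percentiles of simulated PCCF values."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        sample = {key: draw_poisson(mean, rng) for key, mean in means.items()}
        values.append(pccf_from_counts(area=area, radius=radius, **sample))
    cuts = statistics.quantiles(values, n=100)  # 99 cut points
    return cuts[0], cuts[-1]

AREA_UM2, RADIUS_UM = 75e3 * 25e3, 0.7
# Hypothetical model means for the six Poisson-distributed terms.
MEANS = {"n_p": 400, "n_pr": 500, "n_px": 100, "n_xr": 83, "n_r": 583,
         "n_rand": 0.001}

central = pccf_from_counts(area=AREA_UM2, radius=RADIUS_UM, **MEANS)
low, high = simulate(MEANS, AREA_UM2, RADIUS_UM, draws=2000, seed=1)
print(f"central = {central:.3e}, 1st pct = {low:.3e}, 99th pct = {high:.3e}")
```

A production version would more likely use an exact Poisson sampler from a numerical library; the normal approximation is used here only to keep the example dependency-free.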

Detection of specific vs. non-specific binding as a function of $K_D$ and $n$

As an application of the model, we compare the PCCF values computed for a mixture of $n$ proteins and targeting probes that specifically interact with dissociation constant $K_D$, where $n$ is allowed to vary over a large range, with the PCCF for mixtures of the same numbers of proteins and probes in which all the proteins and probes interact only non-specifically with dissociation constant $U$. In particular, we assume an array in which $5 \times 10^8$ protein polonies can be detected, and that these are divided equally among the $n$ proteins, where $n$ is allowed to range between 500 and 500,000 (so that the number of detected polonies per protein species $N_i$ correspondingly varies between 1,000,000 and 1,000). We consider three specific dissociation constants $K_D$, and compute non-specific PCCFs by letting $K_D \to \infty$. All parameters other than $K_D$, $n$, and $N_i$ are assigned the following fixed values consistent with literature and experimental data.

$P_\# = 20\ \mathrm{pmol}/100\ \mu\mathrm{L}$, $R_\# = 200\ \mathrm{pmol}/100\ \mu\mathrm{L}$, $R_0 = 100\ \mathrm{nM}$: approximate values which can be used in the assay

$A = 75 \times 25\ \mathrm{mm}^2$: standard microscope slide area

$r = 0.7\ \mu\mathrm{m}$: colocalization threshold distance used in our experiments

$\alpha = 0.8$, $\tau = 0.8$: approximate values based on our test of a few proteins

$U = 10\ \mu\mathrm{M}$: assumed non-specific protein–probe complex dissociation constant

$\beta = 0.75$, $\gamma = 0.75$: approximate values based on this study and our previous measurement [7]
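The fixed values above mix units (pmol per volume, nM, µM, mm, µm), so it can help to convert them to a common scale before use. The short sketch below does this and reproduces the stated range of polonies per protein species; the helper name and the two intermediate values of n are illustrative choices.

```python
import math

def pmol_per_ul_to_nM(pmol, microliters):
    """1 pmol/uL equals 1 uM, so multiply by 1000 to express it in nM."""
    return pmol / microliters * 1000.0

p_total_nM = pmol_per_ul_to_nM(20, 100)     # total protein concentration P#
r_total_nM = pmol_per_ul_to_nM(200, 100)    # total probe concentration R#
slide_area_um2 = (75 * 1000) * (25 * 1000)  # 75 mm x 25 mm in square micrometres

TOTAL_POLONIES = 500_000_000                # 5 x 10^8 detectable protein polonies
polonies_per_species = {n: TOTAL_POLONIES // n
                        for n in (500, 5_000, 50_000, 500_000)}

print(f"P# = {p_total_nM:.0f} nM, R# = {r_total_nM:.0f} nM")
print(f"slide area = {slide_area_um2:.3e} um^2")
print(polonies_per_species)
```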

Results are summarized in the figure below. In this figure, error bars span the range of the 1st and 99th percentiles of randomly simulated PCCF distributions as described above, with the following exceptions: (i) For large values of $n$, the 99th percentile of the non-specific PCCF distribution was no more than the central value computed by the model, so that the upper error bar could be at or below the central value. In these cases the maximum value observed in the PCCF distribution was used to set the upper error bar instead of the 99th percentile, and the upper end of the error bar was marked with an asterisk (*). This situation arises because the number of $P_i$ and $R_i$ polonies becomes very small, so that simulations result in no or very few colocalized polonies except for a small number of outliers. (ii) Because PCCFs are presented below via their $\log_{10}$ values, PCCF values of 0 cannot be portrayed directly. However, in some cases the 1st percentiles of PCCF values were 0, and this is indicated by the use of a downward-pointing arrowhead on the lower error bars. Note that markers and error bars are slightly jittered in order to allow overlapping error bars to be seen clearly. For each set of $K_D$, $n$, and $N_i$ values, 10,000 random simulations were performed.

A conclusion that may be drawn from these simulations is that order-of-magnitude differences between specific $K_D$s can be clearly distinguished from each other and from non-specific binding in mixtures of up to ~63,000 distinct protein and probe species under the conditions assumed in the model. Note, however, that while the lack of overlap between error bars that indicate 1st and 99th percentiles implies that the PCCF distributions for these different $K_D$s overlap with P < 0.0001, these probabilities are not corrected for multiple hypotheses.


SUPPLEMENTARY DISCUSSION

Comparison of protein interaction profiling technologies based on nucleic acid barcoding and high-throughput sequencing

A large set of techniques has been developed to study protein–protein interactions, and their features and applications have been well covered by numerous reviews [53-57]. These techniques are built on a variety of protein detection methods (e.g., mass spectrometry, immunostaining, spectrophotometry, etc.) and some of them are also applicable to other types of interactions, such as protein–nucleic acid and protein–small molecule interactions. Many techniques can be categorized broadly as “protein barcoding” technologies, including in vivo and in vitro approaches using DNA sequences, such as protein coding sequences (CDSs) or non-CDSs, to identify proteins of interest. Of note, some techniques have been successfully adapted for use with massively parallel sequencing technologies to improve throughput and cost-effectiveness [16,27,40,42-44,58-60]. Here, we compare them with SMI-Seq, with a focus on differences in the protein barcoding and decoding methods employed (Extended Data Table 3). Their applications demonstrated for compound screening [16,30,45-50,61] are also included in the table.

Protein barcoding methods. A variety of protein barcoding methods can be grouped into two general categories. The first category includes those that couple proteins and DNAs in natural or synthetic compartments. Yeast two-hybrid (Y2H) [62-64] and protein-fragment complementation assay (PCA) [65-68] are well-established in vivo techniques in which proteins and barcoding DNAs (CDSs) are paired in cellular compartments. Protein–protein interactions are detected in intracellular environments with the help of a transcriptional or spectroscopic reporter. They are relatively easy to implement and have successfully been applied to screen Gateway-compatible ORFeome libraries for interactome mapping [2,31,69,70]. Other prominent examples belonging to this category are cell- or virus-based protein display, in which proteins of interest are presented on the surface of cells or viral particles and can directly be subjected to binding assays. Cell-based displays can occur in nature, e.g., immunoglobulin expression on the B lymphocyte surface, or can be engineered in various expression systems (e.g., phage display [71,72], yeast display [73,74], bacterial display [75,76] and mammalian cell display [77,78]). Similarly, coupling of proteins to their DNA templates can be achieved in non-biological compartments, such as water-in-oil emulsions, via in vitro transcription and translation (e.g., bead surface display [79,80]). Although all these techniques are of great utility for screening protein–protein interactions, it is difficult to use them to obtain quantitative measures of protein binding, partly because each compartment contains different numbers of protein molecules and their effective concentrations are thus variables that are difficult to control and measure.


The other category of protein barcoding is to molecularly attach DNAs (or RNAs) to proteins. Molecular junctions can be obtained simply by non-covalent binding, e.g., formation of biotinylated protein–streptavidin–biotinylated DNA complexes, or by covalent chemical crosslinking (refer to the Pierce crosslinking reagents technical handbook [81]) or enzymatic conjugation (e.g., sortase [82], SNAP tag [30], etc.). In principle, these methods are applicable to almost all proteins and complexes that can be functionally produced in available expression systems; however, because proteins and DNAs need to be individually coupled, the cost scales almost linearly with library size. In contrast, cell-free protein display techniques, such as ribosome display [6], mRNA display [83] and DNA display [84], enable one-pot barcoding of a whole library (up to 10^15 proteins), and the time and effort required for each assay are independent of library size. Nevertheless, the choice of proteins which can be synthesized in a functional form by in vitro display systems can be limited by the lack of factors that assist protein synthesis, folding, modification and assembly. mRNA display was found to work efficiently only for small proteins (≤ 300 amino acids) [85]. In addition to the above methods, proteins can be indirectly barcoded by binding to barcoded antibodies or nucleic acid aptamers (e.g., proximity ligation assay (PLA) [13,14] and proximity extension assay (PEA) [15]). The use of capture reagents allows direct analyses of proteins from biological samples and has very versatile applications. These techniques have been used to measure protein abundance [13,44,86-88] and to detect protein–protein and protein–DNA interactions [14,89-91] and post-translational modifications [92,93], as well as to screen compounds [50]. However, a limitation of these techniques is that they require capture reagents of both high affinity and specificity that can be difficult to produce, and this can be especially constraining in the context of multiplexed binding assays with large libraries. In general, compared with compartmentation, DNA-attached proteins can be precisely quantitated by measuring the abundance of their DNA barcodes, thus providing a basis for the quantification of protein interactions.

Quantification of protein interactions by high-throughput sequencing. High-throughput protein interaction screening involves detection and quantitation of barcoding DNAs of interacting proteins. DNA barcodes can be quantified by real-time PCR, microarray hybridization or next-generation sequencing (NGS). However, for large-scale measurements, NGS technologies hold distinct throughput and cost advantages and are quickly coming into wide use (Extended Data Table 3). For example, NGS has been applied to Y2H to quantitate the enrichment level of each positive interactor (QIS-Seq) [40]. In a library vs. library screening, genes of each interacting pair need to be individually joined together by PCR prior to sequencing (Stitch-Seq) [27], thus imposing a limit on the throughput. NGS has widely been used for in-depth profiling of complex antibody repertoires by simultaneously analyzing immunoglobulin genes from millions of B cells (Ig-seq, recently reviewed by Georgiou et al. [41]). Likewise, it has also been coupled with phage display for autoantigen discovery (PhIP-Seq) [42] and in vitro antibody selection [60], mRNA display for screening proteins generated from random cDNA fragments (IVV-HiTSeq) [43] and ribosome display for the interaction profiling of the full-length human ORFeome (PLATO) [16,61]. Moreover, NGS was applied to PLA for simultaneous quantitation of 35 proteins in blood plasma (ProteinSeq) [44], and PEA for a one-pot binding assay with three barcoded proteins and 262 barcoded small molecules (IDUP) [30].

While NGS techniques can provide true digital quantification of protein molecules through their DNA barcodes, the ability to precisely quantitate protein interactions can be affected when it is necessary to separate interacting from non-interacting proteins and sequence the interacting protein barcodes alone. This is because the sequencing data do not contain the quantities of the protein molecules that did not interact with the baits or probes, which are required to calculate binding affinities. Separations of this sort include growth selection in medium (Y2H), flow cytometry sorting (cell-based protein display) and affinity enrichment (cell-free protein display). In principle, this problem can be alleviated by pre-controlling protein concentrations or measuring them in an additional assay. This is not possible for all detection methods (such as Y2H, see above), but even when possible, these methods can introduce biases and extra sources of variance compared to in situ sequencing of a whole mixture, in which the abundances of both free and interacting proteins can be measured in the same assay.

In situ SM quantification. In addition to sharing advantages of other techniques, such as highly efficient barcoded library construction conferred by ribosome display, SMI-Seq presents a fundamental new advantage in its use of in situ SM sequencing to simultaneously identify and count both bound and unbound proteins in solution. In situ counting of numerous different SM proteins in solution can lead to ultimate sensitivity and accuracy [5,94] and has been a major goal for modern analytical techniques because it can dramatically increase assay throughput and multiplexity. This is demonstrated by our ability to conduct a 200×55 library-by-library screen, much larger than the 5×5 demonstration provided by its most similar non-SM method [14], and our mathematical modeling suggests that, theoretically, interactions of tens of thousands of proteins with tens of thousands of probe proteins could be quantitatively measured in a one-pot assay based on half a billion polony reads, a throughput within the capability of current NGS platforms. Even though the assays were performed in a library vs. library format in this work, this technique holds the promise of direct molecular counting of all-by-all pairwise or even higher-order interactions in a complex mixture. SMI-Seq, as well as recent in situ sequencing techniques [95-98], represents a further extension of how imaging-based sequencing technology can glean new and valuable information by analyzing the spatial patterning as well as the sequence content and numbers of arrayed DNAs.


Supplementary Figure 1 | Expression vectors used in this study. pRD-NHA-SecM was used to

generate ribosome display-barcoded proteins; pEco-CSBPHis, pEco-CHaloFlagHis and pEco-NHalo-

CHis were applied to E. coli in vivo and in vitro protein expression; pBac-NFlagHA was applied to

Baculovirus expression of GPCRs; pIRES-CHaloFlagHis and pIRES-CHaloFlagHis-Gateway were

used to express HaloTagged proteins in the human IVT system. T7 pro., T7 promoter sequence; T7

term., T7 terminator sequence; polyhedrin pro., polyhedrin promoter sequence; RBS, ribosomal

binding site; IRES, internal ribosome entry site. DNA sequences can be found in Supplementary Table

5.


SUPPLEMENTARY REFERENCES

2 Dreze, M. et al. High-quality binary interactome mapping. Methods Enzymol. 470, 281-315 (2010).

5 Weiss, S. Fluorescence spectroscopy of single biomolecules. Science 283, 1676-1683 (1999).

6 Hanes, J. & Pluckthun, A. In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl. Acad. Sci. U.S.A. 94, 4937-4942 (1997).

7 Mitra, R. D. & Church, G. M. In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res. 27, e34 (1999).

10 Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59 (2008).

11 Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728-1732 (2005).

13 Fredriksson, S. et al. Protein detection using proximity-dependent DNA ligation assays. Nat. Biotechnol. 20, 473-477 (2002).

14 Hammond, M., Nong, R. Y., Ericsson, O., Pardali, K. & Landegren, U. Profiling cellular protein complexes by proximity ligation with dual tag microarray readout. PLoS One 7 (2012).

15 Lundberg, M., Eriksson, A., Tran, B., Assarsson, E. & Fredriksson, S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res. 39, e102 (2011).

16 Zhu, J. et al. Protein interaction discovery using parallel analysis of translated ORFs (PLATO). Nat. Biotechnol. 31, 331-334 (2013).

27 Yu, H. et al. Next-generation sequencing to generate interactome datasets. Nat. Methods 8, 478-480 (2011).

30 McGregor, L. M., Jain, T. & Liu, D. R. Identification of ligand-target pairs from combined libraries of small molecules and unpurified protein targets in cell lysates. J. Am. Chem. Soc. 136, 3264-3270 (2014).

31 Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nature Methods 8, 659-661 (2011).

39 Philimonenko, A. A., Janacek, J. & Hozak, P. Statistical evaluation of colocalization patterns in immunogold labeling experiments. J. Struct. Biol. 132, 201-210 (2000).

40 Lewis, J. D. et al. Quantitative interactor screening with next-generation sequencing (QIS-Seq) identifies Arabidopsis thaliana MLO2 as a target of the Pseudomonas syringae type III effector HopZ2. BMC Genomics 13 (2012).

41 Georgiou, G. et al. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat. Biotechnol. 32, 158-168 (2014).

42 Larman, H. B. et al. Autoantigen discovery with a synthetic human peptidome. Nat. Biotechnol. 29, 535-541 (2011).

43 Fujimori, S. et al. Next-generation sequencing coupled with a cell-free display technology for high-throughput production of reliable interactome data. Sci. Rep. 2, 691-691 (2012).

44 Darmanis, S. et al. ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS One 6 (2011).

45 Young, K. et al. Identification of a calcium channel modulator using a high throughput yeast two-hybrid screen. Nat. Biotechnol. 16, 946-950 (1998).

46 Nishihara, T. et al. Estrogenic activities of 517 chemicals by yeast two-hybrid assay. J. Health Sci. 46, 282-298 (2000).

47 Chidley, C., Haruki, H., Pedersen, M. G., Muller, E. & Johnsson, K. A yeast-based screen reveals that sulfasalazine inhibits tetrahydrobiopterin biosynthesis. Nat. Chem. Biol. 7, 375-383 (2011).


48 Wrighton, N. C. et al. Small peptides as potent mimetics of the protein hormone erythropoietin. Science 273, 458-463 (1996).

49 Lowman, H. B. Bacteriophage display and discovery of peptide leads for drug development. Annu. Rev. Biophys. Biomol. Struct. 26, 401-424 (1997).

50 Leuchowius, K. J. et al. High content screening for inhibitors of protein interactions and post-translational modifications in primary cells by proximity ligation. Mol. Cell. Proteomics. 9, 178-183 (2010).

51 Porreca, G. J., Shendure, J. & Church, G. M. Polony DNA sequencing. Curr. Protoc. Mol. Biol. 7.8 (2006).

52 Hanisch, K. H. & Stoyan, D. Formulas for second-order analysis of marked point processes. Math. Operationsforsch. Statist., Ser. Statistics 14, 559 (1979).

53 Phizicky, E. M. & Fields, S. Protein-protein interactions: methods for detection and analysis. Microbiol. Rev. 59, 94-123 (1995).

54 Zhu, H. & Snyder, M. Protein chip technology. Curr. Opin. Chem. Biol. 7, 55-63 (2003).

55 Piehler, J. New methodologies for measuring protein interactions in vivo and in vitro. Curr. Opin. Struct. Biol. 15, 4-14 (2005).

56 Boozer, C., Kim, G., Cong, S., Guan, H. & Londergan, T. Looking towards label-free biomolecular interaction analysis in a high-throughput format: a review of new surface plasmon resonance technologies. Curr. Opin. Biotechnol. 17, 400-405 (2006).

57 Berggard, T., Linse, S. & James, P. Methods for the detection and analysis of protein-protein interactions. Proteomics 7, 2833-2842 (2007).

58 Dias-Neto, E. et al. Next-generation phage display: integrating and comparing available molecular tools to enable cost-effective high-throughput analysis. PLoS One 4 (2009).

59 Mendez-Rios, J. & Uetz, P. Global approaches to study protein-protein interactions among viruses and hosts. Future Microbiol. 5, 289-301 (2010).

60 Ravn, U. et al. By-passing in vitro screening-next generation sequencing technologies applied to antibody display and in silico candidate selection. Nucleic Acids Res. 38 (2010).

61 Larman, H. B., Liang, A. C., Elledge, S. J. & Zhu, J. Discovery of protein interactions using parallel analysis of translated ORFs (PLATO). Nat. Protoc. 9, 90-103 (2014).

62 Fields, S. & Song, O. K. A novel genetic system to detect protein-protein interactions. Nature 340, 245-246 (1989).

63 Chien, C. T., Bartel, P. L., Sternglanz, R. & Fields, S. The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. U. S. A. 88, 9578-9582 (1991).

64 Rual, J. F. et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178 (2005).

65 Rossi, F., Charlton, C. A. & Blau, H. M. Monitoring protein-protein interactions in intact eukaryotic cells by beta-galactosidase complementation. Proc. Natl. Acad. Sci. U. S. A. 94, 8405-8410 (1997).

66 Stagljar, I., Korostensky, C., Johnsson, N. & te Heesen, S. A genetic system based on split-ubiquitin for the analysis of interactions between membrane proteins in vivo. Proc. Natl. Acad. Sci. U. S. A. 95, 5187-5192 (1998).

67 Remy, I. & Michnick, S. W. Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays. Proc. Natl. Acad. Sci. U. S. A. 96, 5394-5399 (1999).

68 Pelletier, J. N., Arndt, K. M., Pluckthun, A. & Michnick, S. W. An in vivo library-versus-library selection of optimized protein-protein interactions. Nat. Biotechnol. 17, 683-690 (1999).

69 Rual, J. F., Hill, D. E. & Vidal, M. ORFeome projects: gateway between genomics and omics. Curr. Opin. Chem. Biol. 8, 20-25 (2004).


70 Yashiroda, Y., Matsuyama, A. & Yoshida, M. New insights into chemical biology from ORFeome libraries. Curr. Opin. Chem. Biol. 12, 55-59 (2008).

71 Clackson, T., Hoogenboom, H. R., Griffiths, A. D. & Winter, G. Making antibody fragments using phage display libraries. Nature 352, 624-628 (1991).

72 Bratkovic, T. Progress in phage display: evolution of the technique and its applications. Cell. Mol. Life Sci. 67, 749-767 (2010).

73 Kieke, M. C., Cho, B. K., Boder, E. T., Kranz, D. M. & Wittrup, K. D. Isolation of anti-T cell receptor scFv mutants by yeast surface display. Protein Eng. 10, 1303-1310 (1997).

74 Gai, A. S. & Wittrup, D. K. Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17, 467-473 (2007).

75 Stahl, S. & Uhlen, M. Bacterial surface display: Trends and progress. Trends Biotechnol. 15, 185-192 (1997).

76 Samuelson, P., Gunneriusson, E., Nygren, P. A. & Stahl, S. Display of proteins on bacteria. J. Biotechnol. 96, 129-154 (2002).

77 Beerli, R. R. et al. Isolation of human monoclonal antibodies by mammalian cell display. Proc. Natl. Acad. Sci. U. S. A. 105, 14336-14341 (2008).

78 Zhou, C., Jacobsen, F. W., Cai, L., Chen, Q. & Shen, W. D. Development of a novel mammalian cell surface antibody display platform. MAbs 2, 508-518 (2010).

79 Diamante, L., Gatti-Lafranconi, P., Schaerli, Y. & Hollfelder, F. In vitro affinity screening of protein and peptide binders by megavalent bead surface display. Protein Eng. Des. Sel. 26, 713-724 (2013).

80 Huang, L. C. et al. Linking genotype to phenotype on beads: high throughput selection of peptides with biological function. Sci. Rep. 3 (2013).

81 http://www.piercenet.com/page/crosslinking-reagents-technical-handbook-1601673.

82 Pritz, S. et al. Synthesis of biologically active peptide nucleic acid-peptide conjugates by sortase-mediated ligation. J. Org. Chem. 72, 3909-3912 (2007).

83 Wilson, D. S., Keefe, A. D. & Szostak, J. W. The use of mRNA display to select high-affinity protein-binding peptides. Proc. Natl. Acad. Sci. U.S.A. 98, 3750-3755 (2001).

84 Yonezawa, M., Doi, N., Kawahashi, Y., Higashinakagawa, T. & Yanagawa, H. DNA display for in vitro selection of diverse peptide libraries. Nucleic Acids Res. 31 (2003).

85 Cotten, S. W., Zou, J. W., Valencia, C. A. & Liu, R. H. Selection of proteins with desired properties from natural proteome libraries using mRNA display. Nat. Protoc. 6, 1163-1182 (2011).

86 Schallmeiner, E. et al. Sensitive protein detection via triple-binder proximity ligation assays. Nat. Methods 4, 135-137 (2007).

87 Darmanis, S. et al. Sensitive plasma protein analysis by microparticle-based proximity ligation assays. Mol. Cell. Proteomics 9, 327-335 (2010).

88 Ke, R., Nong, R. Y., Fredriksson, S., Landegren, U. & Nilsson, M. Improving precision of proximity ligation assay by amplified single molecule detection. PLoS One 8 (2013).

89 Soderberg, O. et al. Direct observation of individual endogenous protein complexes in situ by proximity ligation. Nat. Methods 3, 995-1000 (2006).

90 Gustafsdottir, S. M. et al. In vitro analysis of DNA-protein interactions by proximity ligation. Proc. Natl. Acad. Sci. U. S. A. 104, 3067-3072 (2007).

91 Soderberg, O. et al. Characterizing proteins and their interactions in cells and tissues using the in situ proximity ligation assay. Methods 45, 227-232 (2008).

92 Jarvius, M. et al. In situ detection of phosphorylated platelet-derived growth factor receptor beta using a generalized proximity ligation method. Mol. Cell. Proteomics 6, 1500-1509 (2007).

93 Leuchowius, K. J., Weibrecht, I., Landegren, U., Gedda, L. & Soderberg, O. Flow cytometric in situ proximity ligation analyses of protein interactions and post-translational modification of the epidermal growth factor receptor family. Cytometry A. 75A, 833-839 (2009).

94 Walt, D. R. Optical methods for single molecule detection and analysis. Anal. Chem. 85, 1258-1263 (2013).

95 Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857-860 (2013).

96 Lee, J. H. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360-1363 (2014).

97 Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol. 29, 659-664 (2011).

98 Buenrostro, J. D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562-568 (2014).