SUPPLEMENTARY INFORMATION
doi:10.1038/nature13761
www.nature.com/nature
SUPPLEMENTARY METHODS
Ribosome display-based protein barcoding
This section describes the procedure to generate PRMC complexes from barcoded DNA templates:
1. Prepare linear barcoded DNA templates by PCR as follows:
PCR components | Volume (µL)
Template (e.g., plasmids; refer to Supplementary Table 5 for DNA sequences) | x (~5-10 ng)
Upstream barcoded primer ("Template barcoded (F)", 100 µM) | 0.5
Downstream universal primer ("Template (R)", 100 µM) | 0.5
10× PCR buffer (supplied with Taq) | 5
MgCl2 (50 mM) | 1.5
DMSO | 2.5
Betaine (5 M) | 10
dNTPs (10 mM each) | 1
Platinum Taq DNA polymerase (5 U/µL) | 0.4
dH2O | to 50
Run 30 cycles of thermal cycling with an annealing temperature of 54°C. Note: 1) All oligos used in this study were purchased from Integrated DNA Technologies and the sequences can be found in Supplementary Table 5. 2) DMSO and betaine were found to enhance the yield.
2. Purify PCR products with a QIAquick PCR purification kit (Qiagen) and measure their
concentrations by NanoDrop. Mix barcoded DNA templates for subsequent assays and analyses.
3. In vitro transcribe mRNA templates by using a HiScribe T7 kit (NEB) as follows:
In vitro transcription components | Volume (µL)
Mixed linear DNA templates | x (~6-10 µg)
10× transcription buffer (supplied with the kit) | 20
20× ribonucleotide mix (supplied with the kit) | 10
20× HMW mix (supplied with the kit) | 10
T7 polymerase (500 U/µL) | 10
dH2O | to 200
The IVT reaction is incubated at 42°C for 2 h. Note: A long incubation time can increase mRNA hydrolysis.
4. Remove DNA templates from mRNAs by using a TURBO DNA-free kit (Cat# AM1907, Ambion), and purify transcribed mRNAs with an RNeasy Mini kit (Qiagen). Purified mRNAs can be stored at -80°C for later use.
5. Prepare mRNA-cDNA hybrids via reverse transcription as follows:
Reverse transcription components (Mix 1) | Volume (µL)
mRNA templates | x (~0.2 µM)
RT primer ("RT primer", 10 µM) | 20
dNTPs (10 mM each) | 10
dH2O | to 110
Incubate Mix 1 at 65°C for 5 min and then place it on ice for at least 1 min.
Reverse transcription components (Mix 2) | Volume (µL)
10× RT buffer | 20
MgCl2 (25 mM) | 40
DTT (0.1 M) | 10
RNaseOUT (40 U/µL) | 10
SuperScript III RT (200 U/µL) | 10
Incubate the mixture of Mix 1 and Mix 2 at 50°C for ~30 min. The RT reaction can be scaled up (e.g., ≥1 mL) using multiple tubes. Note: Avoid a prolonged incubation of mRNAs in the presence of Mg2+ to limit mRNA hydrolysis.
6. Precipitate mRNA-cDNA hybrids from the reaction mixture with isopropanol. For example, add 60 µL ammonium acetate (5 M) and 12 µL EDTA (0.5 M) to a 0.5-mL reaction mixture and then mix it with 0.6 mL isopropanol. After incubation at -20°C for 30 min, collect the precipitates by centrifugation (14,000 g, 4°C) for 15 min and wash them with 70% ethanol (DEPC treated). Quantify mRNA-cDNA hybrids by measuring their cDNAs via real-time PCR. Note: mRNAs lacking barcoding DNA can lead to formation of non-barcoded proteins, which can be separated from PRMC complexes via streptavidin pull-down (see step 10).
7. In vitro translate and display proteins on PRMC complexes by using a PURExpress Δ Ribosome kit (NEB) as follows:
In vitro translation components | Volume (µL)
Solution A | 40
Factor mix | 12
Ribosomes (13.3 µM) | 2.2 (~0.3 µM)
RNase inhibitor (40 U/µL, Cat# M0314, NEB) | 2
mRNA-cDNA hybrids | x (~0.4 µM)
dH2O | to 100
Incubate the reaction at 37°C for 30 min. Note: mRNA-cDNA templates were added at a higher molar concentration than that of ribosomes to decrease polysome formation.
8. Quench the reaction by adding 100 µL ice-cold buffer HKM (50 mM HEPES, pH 7.0, 250 mM KOAc, 25 mM Mg(OAc)2, 0.25 U/mL RNasin (Promega), 0.5 mg/mL chloramphenicol, 5 mM 2-mercaptoethanol and 0.1% (v/v) Tween 20). Centrifuge the tube (14,000 g, 4°C) for 10 min to remove insoluble components. Note: PRMC complexes should be kept on ice or in a cold room to improve their stability.
9. Purify PRMC complexes containing full-length proteins of interest by using Flag-tag affinity purification. Incubate a 200-µL reaction mixture with 40 µL anti-Flag M2 magnetic beads (Sigma-Aldrich), which are blocked with the buffer HKM supplemented with 100 µg/mL yeast tRNA and 10 mg/mL BSA, with gentle mixing for ~2-4 h in a cold room. Elute bound PRMC complexes with the buffer HKM in the presence of 0.1 mg/mL Flag peptide.
10. To remove ribosome complexes lacking barcoding DNAs, as well as the Flag peptide, which might interfere with the following assays, further purify PRMC complexes with streptavidin-coated magnetic beads (Dynabeads M-270 Streptavidin, Life Technologies). For example, incubate 200 µL eluent (from step 9) with 100 µL streptavidin magnetic beads, which were pretreated with 0.1 M NaOH (refer to the manual of the beads) and blocked with the buffer HKM in the presence of 100 µg/mL yeast tRNA and 10 mg/mL BSA, for 1 h in a cold room with gentle mixing. Elute bound PRMC complexes with 20 µL buffer HKM containing 5 mM biotin.
11. Quantify PRMC complexes by measuring their cDNAs via real-time PCR. Note: Estimated yields of PRMC complexes varied from 2.5 to 10.6% of the molar amounts of added mRNA-cDNA hybrids based on a test of individually displayed proteins of different sizes.
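The stock concentrations and volumes in the component tables above can be cross-checked with the usual dilution relation C1·V1 = C2·V2. The sketch below is only an illustration: the function name is hypothetical, and the two worked values use the primer row of step 1 and the ribosome row of step 7.

```python
def final_concentration(stock_conc, stock_vol_ul, total_vol_ul):
    """Concentration after dilution (C1*V1 = C2*V2); result is in the units of stock_conc."""
    if total_vol_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_vol_ul / total_vol_ul

# Step 1: 0.5 uL of a 100 uM primer in a 50 uL PCR
primer_um = final_concentration(100.0, 0.5, 50.0)

# Step 7: 2.2 uL of 13.3 uM ribosomes in a 100 uL translation reaction
ribosome_um = final_concentration(13.3, 2.2, 100.0)

# Molar excess of mRNA-cDNA hybrids (~0.4 uM) over ribosomes, as noted in step 7
template_to_ribosome = 0.4 / ribosome_um
```

The ribosome value comes out near the ~0.3 µM quoted in the table, and the template-to-ribosome ratio is above 1, consistent with the note on limiting polysome formation.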
HaloTag-based protein barcoding
Enzymatic tags (e.g., HaloTag, SNAP-tag and CLIP-tag) can be applied to the covalent coupling of various proteins to a barcoding DNA (ref. 30). Compared with chemical conjugation methods, they can improve the binding of proteins to an enzyme ligand-modified DNA and catalyze the bond formation. This section describes how to prepare conjugates of HaloTagged proteins and barcoding dsDNAs (Extended Data Fig. 2).
1. Prepare a HaloTag ligand-modified primer. Incubate a 100 µL conjugation reaction containing an amino-modified oligo (100 µM), a succinimidyl ester (O4) Halo-ligand (10 mM, freshly prepared in DMSO, Promega) and 50 µL formamide in 50 mM Na2HPO4, pH 8.0, 150 mM NaCl at room temperature for 1 h. Purify the ligand-modified oligo by reverse-phase HPLC using a Zorbax Eclipse XDB-C18 column (5 µm, 9.4×250 mm, Agilent Technologies) and an elution gradient of 5-70% CH3CN/H2O (0.1 M triethylammonium acetate). Lyophilize the modified oligo for further use. Note: Formamide denaturation of the oligo was found to improve the conjugation efficiency.
2. Prepare barcoded templates via the first PCR:
PCR components | Volume (µL)
Universal backbone DNA template ("Universal template of barcoding DNA 1 or 2", 1 nM) | 1
Upstream barcoded primer ("Barcoding DNA-1 or 2 (F)", 12.5 µM) | 1
Universal downstream primer ("Barcoding DNA (R)", 12.5 µM) | 1
Platinum PCR SuperMix (Life Technologies) | 22.5
Run 25 cycles of thermal cycling with an annealing temperature of 58°C. Barcoded primers were prepared in 96-well plates.
3. Prepare barcoding dsDNAs with desthiobiotin, acrydite and Halo-ligand modifications via the
secondary PCR:
PCR components | Volume (µL)
Barcoded template (~0.1-1 nM) | 1
Universal upstream modification primer ("Barcoding DNA modification (F)", 25 µM) | 2
Universal downstream modification primer ("Barcoding DNA modification (R)", 25 µM) | 2
Platinum PCR SuperMix | 45
Run 30 cycles of thermal cycling with an annealing temperature of 60°C.
Purify PCR products with AMPure XP beads (Beckman Coulter) and quantify them with NanoDrop.
4. To generate protein-DNA conjugates, incubate ~0.5-2 µM barcoding dsDNAs and ~2-5 µM HaloTagged proteins in a conjugation buffer (50 mM HEPES, pH 7.5, 150 mM NaCl, 2 mM EDTA and 5% glycerol) with gentle shaking at room temperature for 2-4 h. Note: The yields of protein-DNA conjugates were estimated to be above 15% based on a test of proteins of various sizes.
5. To remove free barcoding dsDNAs, purify the conjugates, as well as free proteins, by using the anti-Flag M2 or His-tag (Dynabeads) magnetic beads, and elute them with 50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 1 mM EDTA, 5% glycerol and 0.1% (v/v) Tween 20, in the presence of 0.1 mg/mL Flag peptide or 250 mM imidazole.
6. To remove free proteins, purify the conjugates by using the M-270 streptavidin-coated magnetic beads, and elute them with assay buffers in the presence of 5 mM biotin.
7. Quantify protein-DNA conjugates by real-time PCR. The conjugates can be stored at -80°C for future use.
Array deposition
This section outlines the protocol to immobilize SM barcoded proteins on the surface of a microscopic
glass slide.
1. Clean glass slides and coverslips (e.g., 24 × 60 mm rectangular, No. 2) by sonication in 5% Contrad 70, 1 M NaOH, 0.1 N HCl and Milli-Q H2O, and air dry them in an AirClean PCR hood.
2. Treat the slide surface with Bind-Silane (GE Healthcare). A detailed protocol can be found at
http://arep.med.harvard.edu/polony/polony_protocols/bind_silane.htm.
3. Prepare a gel-casting solution as follows:
Gel-casting solution components | Volume (µL)
2× deposition buffer (40 mM HEPES, pH 7.0, 100 mM KOAc, 12 mM Mg(OAc)2, 0.5 U/mL RNasin (Promega) and 0.2% Tween 20) | 45
40% acrylamide/bis-acrylamide (19:1, molecular grade, Ambion) | 15
Bridge amplification primer (F) ("Bridge amplification (F)", 1 mM) | 25
Bridge amplification primer (R) ("Bridge amplification (R)", 1 mM) | 25
dH2O | to 90
Note: Because oxygen trapped in solution or on glass surface can inhibit acrylamide polymerization,
the reagents are degassed with argon and put into an anaerobic chamber (Coy Lab). The reagent mixing
and gel polymerization process are handled in the chamber.
4. Prior to the immobilization, dilute samples with the deposition buffer to a protein concentration
ranging from 0.1 to 1 nM. Note: The protein concentration can be adjusted to optimize polony
densities.
5. Prepare a gel-casting mix by adding 10 µL diluted protein sample to 90 µL gel-casting solution.
6. Add 1 µL 10% (v/v) TEMED and 1 µL 5% (w/v) ammonium persulfate to the gel-casting mix, and apply ≥20 µL of the gel-casting mix to the Bind-Silane-treated slide surface. To form a gel layer of less than 5-µm thickness, place a coverslip on top of the liquid and press it tightly against the slide so that the liquid spreads evenly over its surface. Note: A degassed gel-casting mix polymerizes faster than usual, so complete this process quickly or reduce the amounts of TEMED and ammonium persulfate.
7. Allow the gel to polymerize in the chamber for ~4 h.
8. Gently remove the coverslip under the Milli-Q H2O with the help of a steel blade. Wash the slide
with Milli-Q H2O in a Coplin jar, dry it by a quick spin and place it face up in a PCR hood.
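Steps 4-6 tie the diluted protein concentration to the resulting polony density. A rough upper-bound estimate can be sketched as concentration × Avogadro's number × gel thickness. This is an illustration, not part of the original protocol: it assumes every barcoded molecule within the gel volume is immobilized and forms a polony, it ignores the 2 µL of TEMED and ammonium persulfate, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def final_concentration_nm(sample_nm, sample_ul=10.0, casting_ul=90.0):
    """Protein concentration (nM) after adding the diluted sample to the gel-casting solution."""
    return sample_nm * sample_ul / (sample_ul + casting_ul)

def molecules_per_um2(conc_nm, gel_thickness_um):
    """Upper-bound areal density, assuming all molecules in the gel layer are captured."""
    litres_per_um3 = 1e-15  # 1 cubic micrometre = 1e-15 L
    return conc_nm * 1e-9 * AVOGADRO * gel_thickness_um * litres_per_um3

# A 0.1 nM sample (lower end of step 4) cast into a 5-um gel layer
conc = final_concentration_nm(0.1)
density = molecules_per_um2(conc, gel_thickness_um=5.0)
```

Because the gel is described as thinner than 5 µm and not every molecule yields a detectable polony, real densities would be expected to fall below this estimate.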
Polony amplification, linearization and blocking
This procedure converts barcoding DNAs into linearized and 3'-OH blocked polonies prior to sequencing. The process is partly similar to the cluster generation applied to Illumina platforms (ref. 10). To facilitate changing reagents and buffers during polony amplification, a protein-loaded slide was assembled into a FC 81 transmission flow cell with a 1.85-mm-thick polycarbonate flow channel (BioSurface Technologies). The flow cell temperature was controlled by a VWR modular heating block.
1. Prepare the following buffers:
Buffer | Components | Volume
RNA digesting buffer | 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2 and 0.1% (v/v) Triton X-100 | 100 mL
Amplification buffer | 20 mM Tris-HCl, pH 8.8, 10 mM ammonium sulfate, 2 mM magnesium sulfate, 0.1% (v/v) Triton X-100, 1.3% (v/v) DMSO and 2 M betaine | 1,000 mL
Linearization buffer | 20 mM Tris-HCl, pH 8.8, 10 mM KCl, 10 mM ammonium sulfate, 2 mM magnesium sulfate and 0.1% (v/v) Triton X-100 | 100 mL
Blocking buffer | 20 mM Tris-acetate, pH 7.9, 50 mM KOAc, 10 mM Mg(OAc)2 and 0.25 mM CoCl2 | 100 mL
Wash buffer W1 | 1× SSC and 70% formamide | 500 mL
Wash buffer W2 | 0.3× SSC and 0.1% (v/v) Tween 20 | 200 mL
Note: Milli-Q H2O and molecular biology grade reagents are used to avoid nuclease contamination.
2. Clean flow cell components, including the polycarbonate flow channel and a coverslip, by sonication in 5% Contrad 70 and Milli-Q H2O, and air dry them in an AirClean PCR hood.
3. For samples containing PRMC complexes, digest mRNAs by adding the RNA digesting buffer in the presence of 10 U/mL RNase H (NEB) into the flow cell and incubating it at 37°C for 20 min. Wash the flow cell with the wash buffer W2 (3×3 mL).
4. Increase the flow cell temperature to 60°C, and maintain it for the polony amplification process (steps 5-8).
5. Wash the flow cell with deionized formamide (3×3 mL, Ambion).
6. Wash the flow cell with the amplification buffer (3×3 mL).
7. Add the amplification buffer in the presence of 200 µM dNTPs and 80 U/mL Bst polymerase (NEB) into the flow cell and incubate it for 5 min.
8. Repeat steps 5-7 for an additional 31 cycles.
9. Decrease the flow cell temperature to 37°C.
10. Wash the flow cell with the wash buffer W2 (3×3 mL) and the linearization buffer (3×3 mL).
11. To linearize polonies, add the linearization buffer in the presence of 10 U/mL USER enzyme (NEB) and incubate the flow cell at 37°C for 1 h.
12. Wash off the excised strands with the wash buffers W1 (3×3 mL) and W2 (3×3 mL).
13. Wash the flow cell with the blocking buffer (3×3 mL).
14. To block 3'-OH ends of polonies and primers, add the blocking buffer in the presence of 10 µM ddNTPs and 250 U/mL terminal transferase (NEB) and incubate the flow cell at 37°C for 10 min. To drive the reaction to completion, refill the flow cell with fresh reagents and repeat this step twice. Note: The 3'-OH blocking can prevent nonspecific ligation of labeled oligos to polonies and gel-anchored primers during sequencing.
15. Wash the flow cell with the wash buffer W2 (3×3 mL).
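Steps 5-8 amount to 1 + 31 = 32 isothermal amplification cycles at 60°C, each made of a formamide wash, a buffer wash and a 5-min extension. The sketch below simply enumerates that schedule, for example to drive a fluidics script or a checklist. The function and field names are hypothetical and not taken from the original work.

```python
def amplification_schedule(additional_cycles=31):
    """List (cycle number, step name, detail) for the polony amplification at 60 C."""
    cycle = [
        ("formamide wash", "3x3 mL deionized formamide"),
        ("buffer wash", "3x3 mL amplification buffer"),
        ("extension", "amplification buffer + 200 uM dNTPs + 80 U/mL Bst polymerase, 5 min"),
    ]
    total_cycles = 1 + additional_cycles  # first pass of steps 5-7 plus the repeats of step 8
    return [(n, name, detail)
            for n in range(1, total_cycles + 1)
            for name, detail in cycle]

schedule = amplification_schedule()
total_extension_min = 5 * sum(1 for _, name, _ in schedule if name == "extension")
```

With the default of 31 additional cycles, the extension steps alone add up to 160 min, excluding the time spent on washes.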
Polony sequencing-by-ligation and colocalization analysis
Polonies generated by our approach are compatible with both sequencing-by-synthesis and
sequencing-by-ligation chemistries. Programmable synthetic barcodes can expand choices of
sequencing strategies. In this work, we modified a sequencing-by-ligation method reported by our lab (ref. 11; http://www.polonator.org/protocols/). As detailed protocols of the sequencing method can be found in our previous reports (refs. 11, 51; http://arep.med.harvard.edu/Polonator/), this section focuses only on the differences in the current protocol.
1. To facilitate the deconvolution of sequencing signals from colocalized protein and probe polonies, two rounds of sequencing with different anchor primers ("Sequencing 1" and "Sequencing 2", Supplementary Table 5) were successively conducted for protein and probe libraries.
2. Because polony sequencing was performed with a three-channel fluorescence imaging setup, a three-color sequencing method was designed to decode synthetic barcodes composed only of A, T and C. Thus, for each query position (e.g., position 1 to 5), an anchor primer is ligated with three fluorescently labeled degenerate nonamer pools. As previously described (refs. 11, 51), each sequencing-by-ligation cycle comprises four steps:
(i) Hybridize an anchor primer (10 µM) to polonies in a hybridization buffer (5× SSC and 0.1% (v/v) Tween 20) at 60°C for 10 min and then decrease the temperature to 40°C.
(ii) Ligate polony-bound anchor primers with nonamers (2 µM each pool) in a ligation buffer (50 mM Tris-HCl, pH 7.6, 10 mM MgCl2, 1 mM ATP and 5 mM DTT) in the presence of 30 U/µL T4 DNA ligase (Enzymatics) at room temperature for 20 min, and then increase the temperature to 35°C and maintain it for 40 min.
(iii) Scan the polony slide by using a fluorescence microscope to determine ligated nonamers.
(iv) Strip off polony-bound primers by washing with the buffer W1 at 60°C and then with the buffer W2.
Note: To save the ligase and oligos used for each cycle, the hybridization and ligation steps were performed in a gasket chamber (~0.5 mL) assembled with a polony slide and a microarray gasket slide (Cat# G2534-60008, Agilent Technologies). The stripping was performed in a Coplin jar.
3. For polony colocalization analysis, reference images constructed for protein and probe polony sequencing are aligned with the aid of a cross-library reference. To this end, protein and probe polonies were hybridized with anchor primers labelled with different fluorophores ("Sequencing 1-Cy3" and "Sequencing 2-Cy5", Supplementary Table 5), and their superimposed images served as the reference.
4. Polony colocalization analysis was performed at each image position. MATLAB scripts analyze all combinations of protein polony and probe polony positions to identify and count the protein polonies within a threshold distance (e.g., 0.7 µm) from probe polonies.
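The counting described in step 4 can be sketched as a brute-force comparison of all protein and probe polony positions. The original analysis used MATLAB scripts that are not reproduced here; the Python below is only an illustrative equivalent, with hypothetical function names and made-up coordinates (in µm).

```python
import math

def count_colocalized(protein_xy, probe_xy, threshold):
    """Count protein polonies with at least one probe polony closer than the threshold."""
    count = 0
    for px, py in protein_xy:
        if any(math.hypot(px - qx, py - qy) < threshold for qx, qy in probe_xy):
            count += 1
    return count

def colocalization_ratio(protein_xy, probe_xy, threshold):
    """Percentage of protein polonies colocalized with probe polonies."""
    if not protein_xy:
        return 0.0
    return 100.0 * count_colocalized(protein_xy, probe_xy, threshold) / len(protein_xy)

# Toy example: only the first protein polony lies within 0.7 um of a probe polony
proteins = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
probes = [(0.3, 0.4), (9.0, 9.0)]
n_coloc = count_colocalized(proteins, probes, threshold=0.7)
ratio = colocalization_ratio(proteins, probes, threshold=0.7)
```

The all-pairs loop scales with the product of the two polony counts. For dense images a spatial index would be faster, but that is a design choice beyond what the protocol specifies.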
SUPPLEMENTARY NOTES
1. Colocalization statistics
To compare degrees of colocalization between different protein and probe pairs in an experiment,
we measured colocalization ratios defined as the percentages of protein polonies colocalized with
corresponding probe polonies, and performed Student's t-tests for the measurements at multiple imaging positions. The contribution from random colocalization can be estimated by calculating the mean value of the pair cross-correlation function (PCCF) over the distance interval of zero to the colocalization threshold. In addition, the PCCF statistic (ref. 39) can be applied to characterize colocalization patterns of two polony species that were overlapped or partially overlapped. Below is how the PCCF values were calculated.
Let i and j be two types of objects for colocalization analysis and A be a sampled array area. A cross-correlation Ripley K-function $\hat{K}(r)$ can be estimated (ref. 52) as
$$\hat{K}_{i,j}(r) = \frac{1}{A\,\hat{\lambda}_i \hat{\lambda}_j} \sum_k \sum_l \omega(i_k, j_l)\, I(d_{i_k,j_l} < r)$$
where $d_{i_k,j_l}$ is the distance between the centroids of the k'th location of type i objects and the l'th location of type j objects, and $I(d_{i_k,j_l} < r)$ is the indicator function with the value 1 if $d_{i_k,j_l} < r$ is true and 0 otherwise. The density of type i objects $\hat{\lambda}_i$ can be estimated as
$$\hat{\lambda}_i = \frac{N_i}{A}$$
where $N_i$ is the total number of i objects. The weight function $\omega(i_k, j_l)$ provides an edge correction but was ignored here ($\omega(i_k, j_l) \equiv 1$). The function $\hat{K}_{i,j}(r)$ can be interpreted as the ratio of the number of i and j objects localized within radius r of each other, over the number that would be expected by chance. Following ref. 39, we also computed a PCCF that considered colocalization within a radial interval $[r, r + \Delta r)$ via
$$\mathrm{PCCF}(r, \Delta r) = \frac{1}{A\,\hat{\lambda}_i \hat{\lambda}_j\,(2\pi r \Delta r + \pi \Delta r^2)} \sum_k \sum_l I(r \le d_{i_k,j_l} < r + \Delta r)$$
where $\sum_k \sum_l I(r \le d_{i_k,j_l} < r + \Delta r)$ and $A\,\hat{\lambda}_i \hat{\lambda}_j\,(2\pi r \Delta r + \pi \Delta r^2)$ are, respectively, an actual count of colocalized objects i and an average number of objects i that are colocalized with objects j by chance.
The PCCF mean values were calculated over the interval of 0 to the colocalization threshold ($r = 0$ and $\Delta r$ = the colocalization threshold). In computing a PCCF value for an experiment in which Q images were analyzed, colocalization events were aggregated over all images and divided by Q times the expected number of random colocalizations per image. By definition, randomly colocalized objects should have PCCF values of 1. However, to assess whether PCCFs derived in actual experiments were statistically significantly different from 1, following ref. 39 we estimated 95% confidence intervals of the PCCFs of randomly colocalized objects using Monte-Carlo simulations. Specifically, each simulation assumed Q images, and within each image, Ni and Nj polony and probe objects, respectively, where Q was the number of images analyzed in the experiment whose PCCF was being evaluated, and Ni and Nj were the mean numbers of polony and probe objects observed in the actual experiment. Coordinates for the protein and probe polonies were picked at random from uniformly distributed locations. All dimensions were scaled to actual image dimensions in pixels. For each simulation, a PCCF was computed in the same manner as in the actual experiment by aggregating colocalization events over Q random images. Finally, means and confidence intervals for these random PCCFs were obtained from 1,000 simulations.
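The PCCF calculation and the Monte-Carlo null distribution described above can be sketched as follows. This is an illustration under stated assumptions rather than the original analysis code: the function names are hypothetical, the edge-correction weight is set to 1 as in the text, and the 95% interval is taken from empirical percentiles, which is one common choice that the text does not specify.

```python
import math
import random

def pair_count(pts_i, pts_j, r, dr):
    """Number of (i, j) pairs whose distance d satisfies r <= d < r + dr."""
    return sum(1 for xi, yi in pts_i for xj, yj in pts_j
               if r <= math.hypot(xi - xj, yi - yj) < r + dr)

def expected_random_pairs(n_i, n_j, area, r, dr):
    """A * lambda_i * lambda_j * (2*pi*r*dr + pi*dr^2), with lambda = N / A."""
    return area * (n_i / area) * (n_j / area) * (2 * math.pi * r * dr + math.pi * dr ** 2)

def pccf(images, area, r, dr):
    """images: list of (points_i, points_j) per image; events are aggregated over Q images."""
    q = len(images)
    observed = sum(pair_count(pi, pj, r, dr) for pi, pj in images)
    mean_i = sum(len(pi) for pi, _ in images) / q
    mean_j = sum(len(pj) for _, pj in images) / q
    return observed / (q * expected_random_pairs(mean_i, mean_j, area, r, dr))

def random_pccf_interval(q, n_i, n_j, width, height, threshold, n_sim=1000, seed=0):
    """Mean and percentile-based 95% interval of PCCFs for uniformly random objects."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sim):
        images = []
        for _ in range(q):
            pts_i = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_i)]
            pts_j = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_j)]
            images.append((pts_i, pts_j))
        values.append(pccf(images, width * height, 0.0, threshold))
    values.sort()
    mean = sum(values) / n_sim
    return mean, values[int(0.025 * n_sim)], values[int(0.975 * n_sim) - 1]

# One image of area 100 with a single pair 0.5 apart, interval [0, 1)
demo = pccf([([(1.0, 1.0)], [(1.0, 1.5)])], area=100.0, r=0.0, dr=1.0)
```

Without edge correction, objects near the image border have part of their search disc outside the image, so the simulated mean tends to sit slightly below 1. Comparing an experimental PCCF against the simulated interval, rather than against 1 itself, accounts for this.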
2. Initial mathematical model of SM-based protein library vs. probe library binding assay
This note describes a mathematical model whose aim is to assist understanding of the sensitivity and specificity of detection of protein-probe interactions in complex mixtures. The following items are assumed:
1) $m$ species of barcoded proteins $P_1, P_2, \ldots, P_m$ are allowed to interact with $n$ species of barcoded probes $R_1, R_2, \ldots, R_n$ in a one-pot assay. It is assumed that each protein is present in the same concentration and that the total protein concentration is $P^{\#}$. Similarly, it is assumed that the total concentration of probes is $R^{\#}$ and the concentration of each $R_j$ is $R^{\#}/n$. It is assumed that probe concentrations are titratable and that $R^{\#}/n \gg P^{\#}/m$. For simplicity, we will assume here that $m = n$ and that for each protein $P_i$ one probe $R_i$ (denoted with the same index) has been chosen or designed to specifically target the protein.
2) Due to folding and other issues relating to the efficiency of ribosome display, only a fraction $\alpha$ of each protein is in an active form that is capable of binding specifically to its targeting probe. The active and inactive forms of the protein $P_i$ will be denoted $P_i^+$ and $P_i^-$, with total concentrations $\alpha P^{\#}/n$ and $(1-\alpha)P^{\#}/n$, respectively. For similar reasons, only a fraction $\varepsilon$ of probes are active and can specifically bind to their targeted proteins, and their active and inactive forms will similarly be denoted $R_i^+$ and $R_i^-$, with concentrations $\varepsilon R^{\#}/n$ and $(1-\varepsilon)R^{\#}/n$. These fractions are assumed to be stable throughout the assay, and active and inactive forms of the proteins and probes are assumed not to be able to interconvert. The fractions $\alpha$ and $\varepsilon$ will be assumed to apply to all proteins and all probes, respectively.
3) For $i = 1, 2, \ldots, n$, the active forms of protein $P_i$ and its specifically targeting probe $R_i$ will interact according to the reaction
$$\text{(S1)}\quad P_i^+ + R_i^+ \overset{K_D}{\rightleftharpoons} (P_i^+ R_i^+)_S$$
where $(P_i^+ R_i^+)_S$ denotes the complex formed from the specific interaction, and $K_D$ the dissociation constant of this complex, and where $K_D$ applies equally to each such protein-probe pair. All forms of protein $P_i$ will also interact non-specifically with all forms of all probes, including with the specific probe $R_i$. This leads to four reactions between the active or inactive protein $P_i$ and each of the n probes $R_j$ ($j = 1, 2, \ldots, n$), all of which are assumed to be characterized by the same non-specific dissociation constant U:
$$\text{(U1)}\quad P_i^+ + R_j^+ \overset{U}{\rightleftharpoons} (P_i^+ R_j^+)_U \quad (j = 1, \ldots, n)$$
$$\text{(U2)}\quad P_i^+ + R_j^- \overset{U}{\rightleftharpoons} (P_i^+ R_j^-)_U \quad (j = 1, \ldots, n)$$
$$\text{(U3)}\quad P_i^- + R_j^+ \overset{U}{\rightleftharpoons} (P_i^- R_j^+)_U \quad (j = 1, \ldots, n)$$
$$\text{(U4)}\quad P_i^- + R_j^- \overset{U}{\rightleftharpoons} (P_i^- R_j^-)_U \quad (j = 1, \ldots, n)$$
It will also be assumed that (i) non-specific interactions between probes and proteins are always
binary, and we can therefore neglect the possibility of ternary or higher complexes, and (ii) probes
only non-specifically interact with proteins, and proteins only with probes, and thus that probes
and probes, and proteins and proteins, will not interact.
4) After these reactions reach equilibrium, protein-probe complexes of all of these sorts are irreversibly captured by chemical crosslinking, and free probes are removed from the solution, leaving a residual concentration $R^0$. It is assumed that both free and complexed protein and probe molecules are then deposited on the surface of the array in proportion to their solution concentrations, and then immobilized on the array. Of these, it is assumed that only a fraction $\beta$ of protein and a fraction $\gamma$ of probe molecules bear barcoding DNAs that can be successfully amplified into polonies and detected on the array, and that amplifiability of protein and probe DNAs is independent of whether the proteins and probes are free or in complex.
5) The following simplifications will be made regarding computation of PCCF statistics (see above): Instead of computing PCCFs by counting all pairs of $P_i$ and $R_i$ polonies within a specified distance threshold, PCCFs will be calculated from the numbers of $P_i$ polonies that are found colocalized with $R_i$ polonies in either of the following ways: (i) specifically and non-specifically bound $P_i \cdot R_i$ complexes in which both components form polonies (as per assumption 4) will be counted as intrinsically colocalized polonies; (ii) $P_i$ and $R_i$ polonies that are formed on the array by other means may be found to be randomly colocalized. A central value for random colocalization will be computed as the number of non-$P_i \cdot R_i$-derived $P_i$ polonies that are expected to be found by chance within the distance threshold from non-$P_i \cdot R_i$-derived $R_i$ polonies, given the numbers of these polonies obtained from 4 above. The sum of (i) and (ii) will be used to compute a central PCCF for $P_i$ and $R_i$ on the array, and variation from this central value will be estimated by random simulations described below. This calculation of PCCF differs from the formal definition given above and in ref. 39 by being non-symmetrical in $P_i$ and $R_i$. Also, in counting $P_i$ polonies that are near $R_i$ polonies instead of counting all pairs of neighboring $P_i$ and $R_i$ polonies, it ignores the extra pairs that would be taken into account in the PCCF as formally defined should a $P_i$ polony be found near multiple $R_i$ polonies, and is thus conservative regarding colocalization counts compared to its formal definition.
The equilibria of the five reactions in assumption 3, and the condition in assumption 1 that $R^{\#}/n \gg P^{\#}$, yield $2n + 1$ equations involving the concentration $[P_i^+]$ of free $P_i^+$ and $2n$ equations involving the concentration $[P_i^-]$ of free $P_i^-$:
$$\text{(S1')}\quad [P_i^+]\,\frac{\varepsilon R^{\#}}{n K_D} = [(P_i^+ R_i^+)_S]$$
$$\text{(U1')}\quad [P_i^+]\,\frac{\varepsilon R^{\#}}{n U} = [(P_i^+ R_j^+)_U] \quad (j = 1, \ldots, n)$$
$$\text{(U2')}\quad [P_i^+]\,\frac{(1-\varepsilon) R^{\#}}{n U} = [(P_i^+ R_j^-)_U] \quad (j = 1, \ldots, n)$$
and
$$\text{(U3')}\quad [P_i^-]\,\frac{\varepsilon R^{\#}}{n U} = [(P_i^- R_j^+)_U] \quad (j = 1, \ldots, n)$$
$$\text{(U4')}\quad [P_i^-]\,\frac{(1-\varepsilon) R^{\#}}{n U} = [(P_i^- R_j^-)_U] \quad (j = 1, \ldots, n)$$
Note that here there is a single (S1') equation involving the one specifically targeting probe $R_i$, but n instances each of (U1')-(U4'), one for each $R_j$ for $j = 1, \ldots, n$.
From assumption 2 and the equations (S1'), (U1') and (U2'), we get
$$[P_i^+] + [(P_i^+ R_i^+)_S] + \sum_{j=1}^{n} [(P_i^+ R_j^+)_U] + \sum_{j=1}^{n} [(P_i^+ R_j^-)_U] = \frac{\alpha P^{\#}}{n}$$
or
$$[P_i^+]\left(1 + \frac{\varepsilon R^{\#}}{n K_D} + \sum_{j=1}^{n} \frac{\varepsilon R^{\#}}{n U} + \sum_{j=1}^{n} \frac{(1-\varepsilon) R^{\#}}{n U}\right) = [P_i^+]\left(1 + \frac{\varepsilon R^{\#}}{n K_D} + \frac{R^{\#}}{U}\right) = \frac{\alpha P^{\#}}{n}$$
which leads in turn to
$$[P_i^+] = \frac{\alpha P^{\#}}{n + R^{\#}\left(\frac{\varepsilon}{K_D} + \frac{n}{U}\right)} = \frac{\alpha P^{\#}}{n + R^{\#}/\tilde{K}_D} = \frac{\alpha P^{\#} \tilde{K}_D}{n \tilde{K}_D + R^{\#}}$$
where $\tilde{K}_D$ can be interpreted as an adjusted specific dissociation constant
$$\tilde{K}_D = \frac{1}{\frac{\varepsilon}{K_D} + \frac{n}{U}}$$
Similarly, from assumption 2 and equations (U3') and (U4') we get
$$[P_i^-] = \frac{(1-\alpha) P^{\#}}{n + \frac{n R^{\#}}{U}} = \frac{(1-\alpha) P^{\#} U}{n (U + R^{\#})}$$
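The two closed forms for the free protein concentrations can be checked numerically against the mass balances they were derived from. The sketch below is illustrative only: the function names and parameter values are hypothetical, concentrations are in arbitrary but consistent units, and `eps` stands for the active fraction of probes.

```python
import math

def adjusted_kd(k_d, u, n, eps):
    """Adjusted specific dissociation constant: 1 / (eps/K_D + n/U)."""
    return 1.0 / (eps / k_d + n / u)

def free_active(p_total, r_total, n, alpha, eps, k_d, u):
    """Free active protein [P_i^+]."""
    k_adj = adjusted_kd(k_d, u, n, eps)
    return alpha * p_total * k_adj / (n * k_adj + r_total)

def free_inactive(p_total, r_total, n, alpha, u):
    """Free inactive protein [P_i^-]."""
    return (1 - alpha) * p_total * u / (n * (u + r_total))

# Hypothetical parameter set
params = dict(p_total=1.0, r_total=1000.0, n=10, alpha=0.5, eps=0.8, k_d=1.0, u=1e4)
pa = free_active(**params)
pi_ = free_inactive(params["p_total"], params["r_total"], params["n"],
                    params["alpha"], params["u"])

# Mass balances: free protein plus all of its complexes equals the total of that form
active_total = pa * (1 + params["eps"] * params["r_total"] / (params["n"] * params["k_d"])
                     + params["r_total"] / params["u"])
inactive_total = pi_ * (1 + params["r_total"] / params["u"])
```

Here `active_total` should reproduce alpha·P#/n and `inactive_total` should reproduce (1 - alpha)·P#/n, which is a quick way to catch a transcription error in either formula.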
Using equations (S1') and (U1')-(U4'), the total concentration $[(P_i \cdot R_i)]$ of $(P_i \cdot R_i)$ complexes between the protein $P_i$ and its specifically targeting probe $R_i$ in any of their active and inactive forms is
$$[(P_i \cdot R_i)] = \frac{\alpha P^{\#} \tilde{K}_D}{n \tilde{K}_D + R^{\#}} \left(\frac{\varepsilon R^{\#}}{n K_D} + \frac{R^{\#}}{n U}\right) + \frac{(1-\alpha) P^{\#} U}{n (U + R^{\#})} \left(\frac{R^{\#}}{n U}\right)$$
$$= \frac{P^{\#} R^{\#}}{n} \left(\frac{\alpha \left(1 - \frac{(n-1) \tilde{K}_D}{U}\right)}{n \tilde{K}_D + R^{\#}} + \frac{1-\alpha}{n (U + R^{\#})}\right)$$
Total free protein concentration can also be computed as
$$[P_i^+] + [P_i^-] = \frac{\alpha P^{\#} \tilde{K}_D}{n \tilde{K}_D + R^{\#}} + \frac{(1-\alpha) P^{\#} U}{n (U + R^{\#})} = P^{\#} \left(\frac{\alpha \tilde{K}_D}{n \tilde{K}_D + R^{\#}} + \frac{(1-\alpha) U}{n (U + R^{\#})}\right)$$
We also have a total concentration $[(P_i \cdot R_{j \ne i})]$ of $(P_i \cdot R_j)$ complexes between $P_i$ and $R_j$ probes ($j \ne i$) that are not targeted to $P_i$, in any of their active and inactive forms. This is simplified as
$$[(P_i \cdot R_{j \ne i})] = ([P_i^+] + [P_i^-])\,\frac{(n-1) R^{\#}}{n U}$$
Finally, we must also consider that probe $R_i$ will be in non-specific complexes with proteins $P_{j \ne i}$ other than its specific target. By our assumptions above, since all proteins $P_j$ ($j \ne i$) behave identically to $P_i$ with respect to their targeting and non-targeting probes, we have $[P_j^+] + [P_j^-] = [P_i^+] + [P_i^-]$ for all $j \ne i$, and therefore
$$[(P_{j \ne i} \cdot R_i)] = ([P_i^+] + [P_i^-])\,\frac{(n-1) R^{\#}}{n U}$$
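The complex concentrations derived in this note can be evaluated together, which also allows two consistency checks: the term-by-term and simplified expressions for the specific-pair complexes should agree, and the free protein plus all of its complexes should add up to P#/n. The sketch below is illustrative, with a hypothetical function name and parameter values; `eps` again denotes the active probe fraction.

```python
import math

def model_concentrations(p_total, r_total, n, alpha, eps, k_d, u):
    """Equilibrium concentrations for one protein species P_i under the model assumptions."""
    k_adj = 1.0 / (eps / k_d + n / u)                                  # adjusted K_D
    free_active = alpha * p_total * k_adj / (n * k_adj + r_total)     # [P_i^+]
    free_inactive = (1 - alpha) * p_total * u / (n * (u + r_total))   # [P_i^-]
    free = free_active + free_inactive
    # [(P_i . R_i)]: specific plus non-specific complexes with the targeting probe
    own = (free_active * (eps * r_total / (n * k_d) + r_total / (n * u))
           + free_inactive * r_total / (n * u))
    # The same quantity in its simplified closed form
    own_closed = (p_total * r_total / n) * (
        alpha * (1 - (n - 1) * k_adj / u) / (n * k_adj + r_total)
        + (1 - alpha) / (n * (u + r_total)))
    # [(P_i . R_j)] for j != i; by symmetry this also equals [(P_j . R_i)] for j != i
    other = free * (n - 1) * r_total / (n * u)
    return {"free": free, "own": own, "own_closed": own_closed, "other": other}

conc = model_concentrations(p_total=1.0, r_total=1000.0, n=10,
                            alpha=0.5, eps=0.8, k_d=1.0, u=1e4)
```

With these hypothetical values most of the protein ends up in complexes with its own probe, as expected when R#/n is far above K_D and U is large.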
Arraying, polony formation, and colocalization statistics
It is now assumed that the mixture is arrayed for SM assaying, and that polonies are formed on the array. Following assumption 4, the fractions of polonies relevant to evaluation of $P_i$ and $R_i$ colocalization can be computed as follows:
$$f_{(ii)} = \frac{\beta\gamma\,[(P_i \cdot R_i)]}{C}$$
Fraction of $(P_i \cdot R_i)$ complexes between $P_i$ and its specifically targeting probe $R_i$ that are detectable on the array as intrinsically colocalized polonies.
$$f_{(ij)} = \frac{\beta\gamma\,[(P_i \cdot R_{j \ne i})]}{C}$$
Fraction of $(P_i \cdot R_j)$ complexes between $P_i$ and other probes $R_j$ ($j \ne i$) that are detectable on the array as polonies of $P_i$ that are intrinsically colocalized with those of other probes.
$$f_{P_i} = \frac{\beta(1-\gamma)\left([(P_i \cdot R_i)] + [(P_i \cdot R_{j \ne i})]\right) + \beta\left([P_i^+] + [P_i^-]\right)}{C}$$
Fraction of $P_i$ polonies that do not appear intrinsically colocalized with probe polonies.
$$f_{(ji)} = \frac{\beta\gamma\,[(P_{j \ne i} \cdot R_i)]}{C}$$
Fraction of $(P_j \cdot R_i)$ complexes between probe $R_i$ and other proteins $P_j$ ($j \ne i$) that are detectable on the array as polonies of $R_i$ that are intrinsically colocalized with the other proteins.
$$f_{R_i} = \frac{(1-\beta)\gamma\left([(P_i \cdot R_i)] + [(P_{j \ne i} \cdot R_i)]\right) + \gamma R^0 / n}{C}$$
Fraction of $R_i$ polonies that do not appear intrinsically colocalized with protein polonies.
where
$$C = \left(1 - (1-\beta)(1-\gamma)\right)[(P_i \cdot R_i)] + \beta\,[(P_i \cdot R_{j \ne i})] + \gamma\,[(P_{j \ne i} \cdot R_i)] + \beta\left([P_i^+] + [P_i^-]\right) + \gamma\,\frac{R^0}{n}$$
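The normalizing constant C is chosen so that the five polony fractions for a protein and its targeting probe add up to 1. The sketch below evaluates the fractions for hypothetical inputs and makes that property easy to verify. The dictionary keys are illustrative labels for, in order, the intrinsically colocalized target pair, the protein colocalized with other probes, the non-colocalized protein, the probe colocalized with other proteins, and the non-colocalized probe.

```python
def polony_fractions(own, other_probe, other_protein, free_protein,
                     r_residual, n, beta, gamma):
    """Polony fractions from complex and free concentrations.

    own:           [(P_i . R_i)]
    other_probe:   [(P_i . R_j)], j != i
    other_protein: [(P_j . R_i)], j != i
    free_protein:  [P_i^+] + [P_i^-]
    r_residual:    residual free probe concentration R^0
    """
    c = ((1 - (1 - beta) * (1 - gamma)) * own + beta * other_probe
         + gamma * other_protein + beta * free_protein + gamma * r_residual / n)
    return {
        "f_ii": beta * gamma * own / c,
        "f_ij": beta * gamma * other_probe / c,
        "f_P": (beta * (1 - gamma) * (own + other_probe) + beta * free_protein) / c,
        "f_ji": beta * gamma * other_protein / c,
        "f_R": ((1 - beta) * gamma * (own + other_protein) + gamma * r_residual / n) / c,
    }

# Hypothetical concentrations (arbitrary consistent units) and amplifiable fractions
fractions = polony_fractions(own=0.09, other_probe=0.008, other_protein=0.008,
                             free_protein=0.002, r_residual=5.0, n=10,
                             beta=0.3, gamma=0.4)
total = sum(fractions.values())
```

The sum equals 1 for any valid inputs because the numerators add up to C term by term, using 1 - (1 - beta)(1 - gamma) = beta + gamma - beta·gamma.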
Note that as per assumption 5, $f_{(ii)}$ determines the number of intrinsically colocalized $P_i$ and $R_i$ polonies found on the array. The other fractions will be used in calculation of the number of randomly colocalized polonies below. First we will compute the numbers of polonies of the various sorts, and then we will calculate random colocalization.
Let it now be assumed that $N_i$ polonies are detected for the protein $P_i$. These $N_i$ polonies may be apportioned as
$$N_{(ii)} = \frac{N_i\, f_{(ii)}}{f_{(ii)} + f_{(ij)} + f_{P_i}}$$
Polonies of $P_i$ intrinsically colocalized with polonies of $R_i$.
$$N_{(ij)} = \frac{N_i\, f_{(ij)}}{f_{(ii)} + f_{(ij)} + f_{P_i}}$$
Polonies of $P_i$ intrinsically colocalized with polonies of other probes $R_j$ ($j \ne i$).
$$N_{P_i} = \frac{N_i\, f_{P_i}}{f_{(ii)} + f_{(ij)} + f_{P_i}}$$
Polonies of $P_i$ that are not intrinsically colocalized with probe polonies.
It follows from the frequencies derived above that the following numbers of polonies are detected for the probe $R_i$ that are not counted with the $n_i$ $P_i$ protein polonies above (the only $R_i$ polonies considered with the $P_i$ polonies above are the $N_c(P_i)$ instances of $R_i$ polonies intrinsically colocalized with $P_i$ polonies).

$$N_o(R_i) = \frac{n_i\, o(R_i)}{c(P_i) + o(P_i) + u(P_i)}$$
Polonies of $R_i$ intrinsically colocalized with polonies of other proteins $P_j$ ($j \neq i$).

$$N_u(R_i) = \frac{n_i\, u(R_i)}{c(P_i) + o(P_i) + u(P_i)}$$
Polonies of $R_i$ that are not intrinsically colocalized with protein polonies.
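The apportioning step amounts to rescaling the fractions so that the three protein classes add up to the detected protein polony count. A minimal Python sketch follows; the function name, dictionary keys, example fractions and example count are hypothetical and only illustrate the arithmetic.

```python
def apportion_polonies(n_i, fractions):
    """Scale the five fractions so that the three protein classes sum to n_i."""
    protein_sum = (fractions["p_with_own_probe"]
                   + fractions["p_with_other_probe"]
                   + fractions["p_alone"])
    scale = n_i / protein_sum
    # The same scale factor is applied to the two probe-only classes.
    return {name: scale * value for name, value in fractions.items()}


# Hypothetical fractions (they sum to 1) and a hypothetical polony count.
example_fractions = {
    "p_with_own_probe": 0.20,
    "p_with_other_probe": 0.05,
    "p_alone": 0.25,
    "r_with_other_protein": 0.10,
    "r_alone": 0.40,
}
counts = apportion_polonies(1000, example_fractions)
```

The probe classes are not normalized separately: their counts follow from the protein count through the shared scale factor, as in the expressions above.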
In preparing to compute random colocalization and the final PCCF statistic, a question arises in the context of our highly multiplexed SM assay as to whether $P_i$ polonies from both uncomplexed $P_i^{+}$ and $P_i^{-}$ objects vs. $P_i$ polonies formed from $P_i \cdot R_{j \neq i}$ complexes should be treated equivalently regarding whether they can be randomly colocalized (and similarly for $R_i$ polonies). It could be the case that $P_i$ polonies formed within complexes cannot be colocalized with $R_i$ polonies to the degree that $P_i$ polonies formed from uncomplexed $P_i$ objects can, due to steric constraints or other factors. In non-multiplexed assays, such as those considered in ref. 39, this question never arises because the non-targeting partners in $P_i \cdot R_{j \neq i}$ and $P_{j \neq i} \cdot R_i$ complexes would never be surveyed for detection, and the resulting $P_i$ and $R_i$ polonies would all be considered isolated objects that could appear near each other by chance in the same way. A broader issue concerns the fact that the PCCF is specifically a Pair Cross-Correlation Function (ref. 39), and the question arises whether for multiplexed assays it might be better to develop and
employ a higher-order multi-variate statistic that compares actual vs. expected random colocalization for
many kinds of objects at once, somewhat like multi-variate ANOVAs analyze variances of many
variables and interactions at once. However, in this initial model, we will in fact treat polonies derived
from free probe and protein molecules vs. complexes equivalently in terms of their potential for random
colocalization within the constraints indicated in assumption 5. Notably, even when only considering
pairwise colocalization, such as the application of PCCF in ref. 39, where objects are labeled antibodies, the
prima facie distinction between objects colocalized by virtue of targeting physical interactions and
isolated objects that appear as random background is an idealization, since the apparently isolated objects
are likely interacting non-specifically with many other kinds of unsurveyed molecules and complexes in
the cell matrix, and PCCF remains a useful statistic even though these interactions are ignored.
Random colocalization
As noted in assumption 5 and discussed in the comment above, random colocalization will be considered between $P_i$ and $R_i$ polonies that do not arise from intrinsic colocalization from $P_i \cdot R_i$ complexes. We now know the number of such polonies to be $N_o(P_i) + N_u(P_i)$ for $P_i$ (those intrinsically colocalized with other probes plus those not intrinsically colocalized), and $N_o(R_i) + N_u(R_i)$ for $R_i$. Given imaged array area $A$ and polony radius $r$, we can estimate the density of these $R_i$ polonies that could appear anywhere on the array by chance as

$$\rho_{R_i} = \frac{N_u(R_i) + N_o(R_i)}{A}$$

and the probability of a probe $R_i$ polony appearing in the vicinity of a $P_i$ protein polony by chance would then be

$$\pi (2r)^2 \rho_{R_i}$$

Thus, the expected number of the $N_o(P_i) + N_u(P_i)$ $P_i$ polonies that will have an $R_i$ polony localized nearby by chance will be

$$N_c(P_i)_{\mathrm{rand}} = \pi (2r)^2 \rho_{R_i} \left(N_o(P_i) + N_u(P_i)\right)$$

Thus, the total number of $P_i$ polonies colocalized with $R_i$ complexes will be

$$N_c(P_i)_{\mathrm{tot}} = N_c(P_i) + N_c(P_i)_{\mathrm{rand}}$$
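The chance-colocalization estimate is a density multiplied by a search area and by a count. The Python sketch below illustrates it under stated assumptions: the function name and the two example counts are hypothetical, while the slide area (75 × 25 mm) and the 0.7 µm distance are the values listed later in this section.

```python
import math


def expected_random_colocalized(p_count, r_count, area, radius):
    """Expected number of protein polonies with a probe polony nearby by chance.

    p_count: protein polonies not intrinsically colocalized with the probe
    r_count: probe polonies not intrinsically colocalized with the protein
    area:    imaged array area
    radius:  polony radius, in the length unit whose square is used for area
    """
    density = r_count / area                       # probe polonies per unit area
    p_nearby = math.pi * (2 * radius) ** 2 * density
    return p_nearby * p_count


SLIDE_AREA_UM2 = 75e3 * 25e3   # 75 mm x 25 mm expressed in square micrometres
# Hypothetical counts: 1,000 protein polonies and 1,000,000 probe polonies.
n_random = expected_random_colocalized(1000, 1_000_000, SLIDE_AREA_UM2, 0.7)
```

With these illustrative inputs `n_random` is about 3.3, so only a handful of the 1,000 protein polonies would be expected to sit next to a probe polony by chance.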
PCCF statistic
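To complete the PCCF statistic as specified in assumption 5, we must divide the total number of colocalized $P_i$ polonies, $N_c(P_i)_{\mathrm{tot}}$, by the expected number of $P_i$ polonies colocalized with $R_i$ polonies assuming that all of these individual polonies (including the ones in $P_i \cdot R_i$ complexes) could be colocalized by chance. Similar to the logic above, the total density of $R_i$ objects will now be

$$\rho_{R_i}^{\mathrm{tot}} = \frac{N_u(R_i) + N_o(R_i) + N_c(P_i)}{A}$$

where the three terms count the $R_i$ polonies that are not intrinsically colocalized, those intrinsically colocalized with other proteins, and those intrinsically colocalized with $P_i$, and the probability of a probe $R_i$ polony appearing in the vicinity of a $P_i$ protein polony will then be

$$\pi (2r)^2 \rho_{R_i}^{\mathrm{tot}}$$

so that, with $n_i$ detected $P_i$ polonies, the expected number colocalized by chance is $\pi (2r)^2 \rho_{R_i}^{\mathrm{tot}}\, n_i$ and, therefore, as a function of the model parameters ($P^{\#}$, $R^{\#}$, $R_0$, $N$, $\alpha$, $K_D$, $\beta$, $\gamma$, $n_i$, $A$ and $r$, together with the remaining parameters of the model),

$$\mathrm{PCCF} = \frac{N_c(P_i)_{\mathrm{tot}}}{\pi (2r)^2 \rho_{R_i}^{\mathrm{tot}}\, n_i}$$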
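As a sanity check on the ratio, the statistic should equal 1 when there is no intrinsic colocalization and rise above 1 when specific complexes are present. The Python sketch below illustrates this; the function name and all counts are hypothetical, and only the slide area and the 0.7 µm distance come from the parameter list in this section.

```python
import math


def pccf(n_intrinsic, n_random, r_total, n_i, area, radius):
    """Observed colocalized protein polonies divided by the chance expectation."""
    observed = n_intrinsic + n_random
    expected = math.pi * (2 * radius) ** 2 * (r_total / area) * n_i
    return observed / expected


AREA_UM2 = 75e3 * 25e3   # 75 mm x 25 mm slide in square micrometres
R = 0.7                  # colocalization distance in micrometres

# Hypothetical case 1: no intrinsic colocalization at all.
chance_only = math.pi * (2 * R) ** 2 * (1_000_000 / AREA_UM2) * 1000
baseline = pccf(0, chance_only, 1_000_000, 1000, AREA_UM2, R)

# Hypothetical case 2: 50 of the 1,000 protein polonies are intrinsically
# colocalized, leaving 950 protein and 999,950 probe polonies for chance pairing.
chance_rest = math.pi * (2 * R) ** 2 * (999_950 / AREA_UM2) * 950
enriched = pccf(50, chance_rest, 1_000_000, 1000, AREA_UM2, R)
```

In this illustration `baseline` is 1 and `enriched` is well above 10, because even a small number of intrinsic pairs dwarfs the few pairs expected by chance on a sparsely covered array.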
Random simulations
To estimate the degree of variation to which the PCCF statistic may be subject under a given set of parameters, we compute a distribution of PCCF values using the formula above, assuming that the six terms $N_u(P_i)$, $N_c(P_i)$, $N_o(P_i)$, $N_o(R_i)$, $N_u(R_i)$, and $N_c(P_i)_{\mathrm{rand}}$ are all randomly drawn from Poisson distributions whose means are the values computed above within the model. Because these simulations do not take into account variation in actual samples or assay conditions, and because Poisson error may itself underrepresent the variability inherent in the underlying system vs. the model, these estimates must be considered lower bounds for the variance that will be encountered in actual assays.
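The simulation procedure can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the authors' implementation: the function names, dictionary keys and example means are hypothetical, large-mean Poisson draws are replaced by a rounded normal approximation, the protein count and probe density are recomputed from the drawn terms, and a draw with no polonies is assigned a PCCF of 0 by convention.

```python
import math
import random


def poisson(rng, mean):
    """Draw an (approximately) Poisson-distributed integer with the given mean."""
    if mean <= 0:
        return 0
    if mean < 30:
        # Knuth's multiplication method: exact, but slow for large means.
        limit, k, prod = math.exp(-mean), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    # Normal approximation for large means (an assumption of this sketch).
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))


def simulate_pccf(means, n_draws, area, radius, seed=0):
    """Return n_draws PCCF values with the six count terms drawn independently."""
    rng = random.Random(seed)
    window = math.pi * (2 * radius) ** 2
    values = []
    for _ in range(n_draws):
        d = {name: poisson(rng, mean) for name, mean in means.items()}
        n_protein = d["p_own"] + d["p_other"] + d["p_alone"]
        n_probe = d["p_own"] + d["r_other"] + d["r_alone"]
        expected = window * (n_probe / area) * n_protein
        observed = d["p_own"] + d["p_random"]
        values.append(observed / expected if expected > 0 else 0.0)
    return values


# Hypothetical model means for the six terms.
example_means = {
    "p_own": 50, "p_other": 20, "p_alone": 930,
    "r_other": 5000, "r_alone": 994_950, "p_random": 3.1,
}
values = simulate_pccf(example_means, 2000, 75e3 * 25e3, 0.7, seed=1)
```

Fixing the seed makes a run reproducible; the spread of `values` then gives the kind of lower-bound variability estimate described above.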
Detection of specific vs. non-specific binding as a function of $K_D$ and $N$
As an application of the model, we compare the PCCF values computed for a mixture of $N$ proteins and targeting probes that specifically interact with dissociation constant $K_D$, where $N$ is allowed to vary over a large range, with the PCCF for mixtures of the same numbers of proteins and probes, in which all the proteins and probes interact only non-specifically, with the non-specific dissociation constant. In particular, we assume an array in which $5 \times 10^{8}$ protein polonies can be detected, and that these are divided equally among the $N$ proteins, where $N$ is allowed to range between 500 and 500,000 (so that the number of detected polonies per protein species $n_i$ correspondingly varies between 1,000,000 and 1,000). We consider three specific dissociation constants $K_D$, and compute non-specific PCCFs by letting $K_D \to \infty$. All parameters other than $K_D$, $N$, and $n_i$ are assigned the following fixed values consistent with literature and experimental data.
ππ# = 20 ππππππππ /100 ππππ
Approximate values which can be used in the assay π π # = 200 ππππππππ /100 ππππ
π π 0 = 100 ππππ
$A = 75 \times 25\ \mathrm{mm}^2$ Standard microscope slide area
$r = 0.7\ \mu\mathrm{m}$ Colocalization threshold distance used in our experiments
$\alpha = 0.8$ Approximate values based on our test of a few proteins
ππ = 0.8
ππ = 10 ππππ Assumed non-specific proteinβprobe complex dissociation
constant
$\beta = 0.75$ Approximate values based on this study and our previous measurement (ref. 7)
$\gamma = 0.75$
Results are summarized in the figure below. In this figure, error bars span the range of the 1st and 99th percentiles of randomly simulated PCCF distributions as described above, with the following exceptions: (i) For large values of $N$, the 99th percentile of the non-specific PCCF distribution was no more than the central value computed by the model so that the upper error bar could be at or below the central value. In these cases the maximum value observed in the PCCF distribution was used to set the upper error bar instead of the 99th percentile, and the upper end of the error bar was marked with an asterisk (*). This situation arises because the number of $P_i$ and $R_i$ polonies becomes very small so that simulations result in no or very few colocalized polonies except for a small number of outliers. (ii) Because PCCFs are presented below via their $\log_{10}$ values, PCCF values of 0 cannot be portrayed directly. However, in some cases the 1st percentiles of PCCF values were 0, and this is indicated by the use of a downward pointing arrowhead on the lower error bars. Note that markers and error bars are slightly jittered in order to allow overlapping error bars to be seen clearly. For each set of $K_D$, $N$, and $n_i$ values, 10,000 random simulations were performed.
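The error-bar rules described above can be sketched as a small Python helper. The function name, the returned flags and the simple rank-based percentile convention are assumptions made for illustration; the text does not state which percentile definition was used.

```python
def error_bar(values, central):
    """Return (low, high, used_max, low_is_zero) for one simulated distribution.

    low and high are the 1st and 99th percentiles, picked by rank from the
    sorted values. If the 99th percentile does not exceed the central model
    value, the maximum is used instead and used_max is True (asterisk case).
    low_is_zero flags a lower bound of 0, which cannot be drawn on a log10
    axis (arrowhead case).
    """
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[(n - 1) * 1 // 100]
    high = ordered[(n - 1) * 99 // 100]
    used_max = high <= central
    if used_max:
        high = ordered[-1]
    return low, high, used_max, low == 0


# A well-populated distribution: ordinary percentiles are used.
regular = error_bar(list(range(101)), central=50)
# A sparse distribution that is almost all zeros, with a single outlier.
sparse = error_bar([0] * 99 + [5], central=1.0)
```

Here `regular` is `(1, 99, False, False)` and `sparse` is `(0, 5, True, True)`, matching the ordinary case and the combined asterisk and arrowhead case, respectively.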
A conclusion that may be drawn from these simulations is that order-of-magnitude differences between specific $K_D$s can be clearly distinguished from each other and from non-specific binding in mixtures of up to ~63,000 distinct protein and probe species under the conditions assumed in the model. Note, however, that while the lack of overlap between error bars that indicate 1st and 99th percentiles implies that the PCCF distributions for these different $K_D$s overlap with P < 0.0001, these probabilities are not corrected for multiple hypotheses.
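The figure of P < 0.0001 is not derived in the text. One reading consistent with it, stated here as an assumption, takes one simulated PCCF value $X$ from the distribution with the lower values and one value $Y$ from the distribution with the higher values, treats them as independent, and asks how likely it is that both fall in the tails that face each other. The symbols $q^{X}_{0.99}$ and $q^{Y}_{0.01}$ are hypothetical names for the 99th percentile of $X$ and the 1st percentile of $Y$.

```latex
\[
\Pr\!\left(X > q^{X}_{0.99} \ \text{and}\ Y < q^{Y}_{0.01}\right)
  = \Pr\!\left(X > q^{X}_{0.99}\right)\,\Pr\!\left(Y < q^{Y}_{0.01}\right)
  \le 0.01 \times 0.01 = 10^{-4}
\]
```

When many combinations of $K_D$ and $N$ are compared at once, a Bonferroni-style correction would multiply such a bound by the number of comparisons, which is why the uncorrected value should be read with care.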
SUPPLEMENTARY DISCUSSION
Comparison of protein interaction profiling technologies based on nucleic acid barcoding and
high-throughput sequencing
A large set of techniques has been developed to study protein–protein interactions, and their features
and applications have been well covered by numerous reviews53-57. These techniques are built on a
variety of protein detection methods (e.g., mass spectrometry, immunostaining, spectrophotometry,
etc.) and some of them are also applicable to other types of interactions, such as protein–nucleic acid
and protein–small molecule interactions. Many techniques can be categorized broadly as “protein
barcoding” technologies, including in vivo and in vitro approaches using DNA sequences, such as
protein coding sequences (CDS) or non-CDSs, to identify proteins of interest. Of note, some
techniques have been successfully adapted for use with massively parallel sequencing technologies to
improve throughput and cost-effectiveness16,27,40,42-44,58-60. Here, we compare them with SMI-Seq with
focus on differences in the protein barcoding and decoding methods employed (Extended Data Table
3). Their applications demonstrated for compound screening16,30,45-50,61 are also included in the table.
Protein barcoding methods. A variety of protein barcoding methods can be grouped into two general
categories. The first category includes those that couple proteins and DNAs in natural or synthetic
compartments. Yeast two-hybrid (Y2H)62-64 and protein-fragment complementation assay (PCA)65-68
are well-established in vivo techniques in which proteins and barcoding DNAs (CDSs) are paired in
cellular compartments. Protein–protein interactions are detected in intracellular environments with the
help of a transcriptional or spectroscopic reporter. They are relatively easy to implement and have
successfully been applied to screen Gateway-compatible ORFeome libraries for interactome
mapping2,31,69,70. Other prominent examples belonging to this category are cell or virus-based protein
display where proteins of interest are presented on the surface of cells or viral particles and can directly
be subjected to binding assays. Cell-based displays can happen in nature, e.g., immunoglobulin
expression on B lymphocyte surface, or can be engineered in various expression systems (e.g., phage
display71,72, yeast display73,74, bacterial display75,76 and mammalian cell display77,78). Similarly,
coupling of proteins to their DNA templates can be achieved in non-biological compartments, such as
water-in-oil emulsions, via in vitro transcription and translation (e.g., bead surface display79,80).
Although all these techniques are of great utility for screening protein–protein interactions, it is
difficult to use them to obtain quantitative measures of protein binding, partly because each
compartment contains different numbers of protein molecules and their effective concentrations are
thus variables that are difficult to control and measure.
The other category of protein barcoding is to molecularly attach DNAs (or RNAs) to proteins.
Molecular junctions can be obtained simply by non-covalent binding, e.g., formation of biotinylated
protein–streptavidin–biotinylated DNA complexes, or by covalent chemical crosslinking (refer to
Pierce crosslinking reagents technical handbook81) or enzymatic conjugation (e.g. sortase82, SNAP
tag30, etc.). In principle, these methods are applicable to almost all proteins and complexes that can be
functionally produced in available expression systems; however, because proteins and DNAs need to
be individually coupled, the cost scales almost linearly with library size. In contrast, cell-free protein
display techniques, such as ribosome display6, mRNA display83 and DNA display84, enable one-pot
barcoding of a whole library (up to $10^{15}$ proteins) and the time and effort required for each assay are
independent of library size. Nevertheless, the choice of proteins which can be synthesized in a
functional form by in vitro display systems can be limited by the lack of factors that assist protein
synthesis, folding, modification and assembly. mRNA display was found to only work efficiently for
small proteins (≤ 300 amino acids)85. In addition to the above methods, proteins can be indirectly barcoded
by binding to barcoded antibodies or nucleic acid aptamers (e.g., proximity ligation assay (PLA)13,14
and proximity extension assay (PEA)15). The use of capture reagents allows direct analyses of proteins
from biological samples and has very versatile applications. These techniques have been used to
measure protein abundance13,44,86-88 and to detect protein–protein and protein–DNA interactions14,89-91
and post-translational modifications92,93, as well as to screen compounds50. However, a limitation of
these techniques is that they require capture reagents of both high affinity and specificity that can be
difficult to produce, and this can be especially constraining in the context of multiplexed binding assays
with large libraries. In general, compared with compartmentation, DNA-attached proteins can be
precisely quantitated by measuring the abundance of their DNA barcodes, thus providing a basis for
the quantification of protein interactions.
Quantification of protein interactions by high-throughput sequencing. High-throughput protein
interaction screening involves detection and quantitation of barcoding DNAs of interacting proteins.
DNA barcodes can be quantified by real-time PCR, microarray hybridization or next-generation
sequencing (NGS). However, for large-scale measurements, NGS technologies hold distinct
throughput and cost advantages and are quickly coming into wide use (Extended Data Table 3). For
example, NGS has been applied to Y2H to quantitate the enrichment level of each positive interactor
(QIS-Seq)40. In a library vs. library screening, genes of each interacting pair need to be individually
joined together by PCR prior to sequencing (Stitch-Seq)27, thus imposing a limit on the throughput.
NGS has widely been used for in-depth profiling of complex antibody repertoires by simultaneously
analyzing immunoglobulin genes from millions of B cells (Ig-seq, recently reviewed by Georgiou et
al.41). Likewise, it has also been coupled with phage display for autoantigen discovery (PhIP-Seq)42
and in vitro antibody selection60, mRNA display for screening proteins generated from random cDNA
fragments (IVV-HiTSeq)43 and ribosome display for the interaction profiling of full-length human
ORFeome (PLATO)16,61. Moreover, NGS was applied to PLA for simultaneous quantitation of 35
proteins in blood plasma (ProteinSeq)44, and PEA for a one-pot binding assay with three barcoded
proteins and 262 barcoded small molecules (IDUP)30.
While NGS techniques can provide true digital quantification of protein molecules through their
DNA barcodes, the ability to precisely quantitate protein interactions can be affected when it is
necessary to separate interacting from non-interacting proteins and sequence the interacting protein
barcodes alone. This is because the sequencing data do not contain the quantities of the protein
molecules that did not interact with the baits or probes, which are required to calculate binding affinities.
Separations of this sort include growth selection in medium (Y2H), flow cytometry sorting (cell-based
protein display) and affinity enrichment (cell-free protein display). In principle, this problem can be
alleviated by pre-controlling protein concentrations or measuring them in an additional assay. This is
not possible for all detection methods (such as Y2H, see above), but even when possible, these methods
can introduce biases and extra sources of variance compared to in situ sequencing of a whole mixture,
in which the abundances of both free and interacting proteins can be measured in the same assay.
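To illustrate why counts of non-interacting molecules matter, recall that for simple 1:1 binding at equilibrium the dissociation constant is $K_D = [P][R]/[PR]$, so the free concentrations enter directly. A minimal Python sketch under that assumption follows; the function, its arguments and the example counts are hypothetical, detection efficiencies are ignored, and this is not the procedure used in the paper.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def dissociation_constant(n_bound, n_free_protein, n_free_probe, volume_litres):
    """Return K_D in mol/L from molecule counts in a given reaction volume."""
    per_molecule = 1.0 / (AVOGADRO * volume_litres)  # mol/L added by one molecule
    bound = n_bound * per_molecule
    free_protein = n_free_protein * per_molecule
    free_probe = n_free_probe * per_molecule
    return free_protein * free_probe / bound


# Hypothetical counts in a 100 microlitre reaction: 1 nM complex, 1 nM free
# protein and 10 nM free probe.
one_nanomolar = 1e-9 * AVOGADRO * 100e-6
k_d = dissociation_constant(one_nanomolar, one_nanomolar, 10 * one_nanomolar, 100e-6)
```

With these inputs `k_d` is 10 nM (1e-8 mol/L). If only the bound molecules had been counted, the two free terms would be unknown and $K_D$ could not be obtained from this expression.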
In situ SM quantification. In addition to sharing advantages of other techniques, such as highly
efficient barcoded library construction conferred by ribosome display, SMI-Seq presents a fundamental
new advantage in its use of in situ SM sequencing to simultaneously identify and count both bound
and unbound proteins in solution. In situ counting of numerous different SM proteins in solution can
lead to ultimate sensitivity and accuracy5,94 and has been a major goal for modern analytical techniques
because it can dramatically increase assay throughput and multiplexity. This is demonstrated by our
ability to conduct a 200×55 library-by-library screen, much larger than the 5×5 demonstration provided
by its most similar non-SM method14, and our mathematical modeling suggests that theoretically,
interactions of tens of thousands of proteins with tens of thousands of probe proteins could be
quantitatively measured in a one-pot assay based on half a billion polony reads, a throughput within the
capability of current NGS platforms. Even though the assays were performed in a library vs. library
format in this work, this technique holds the promise of direct molecular counting of all-by-all pairwise
or even higher-order interactions in a complex mixture. SMI-Seq, as well as recent in situ sequencing
techniques95-98, represents a further extension of how imaging-based sequencing technology can glean
new and valuable information by analyzing the spatial patterning as well as the sequence content and
numbers of arrayed DNAs.
Supplementary Figure 1 | Expression vectors used in this study. pRD-NHA-SecM was used to
generate ribosome display-barcoded proteins; pEco-CSBPHis, pEco-CHaloFlagHis and pEco-NHalo-
CHis were applied to E. coli in vivo and in vitro protein expression; pBac-NFlagHA was applied to
Baculovirus expression of GPCRs; pIRES-CHaloFlagHis and pIRES-CHaloFlagHis-Gateway were
used to express HaloTagged proteins in the human IVT system. T7 pro., T7 promoter sequence; T7
term., T7 terminator sequence; polyhedrin pro., polyhedrin promoter sequence; RBS, ribosomal
binding site; IRES, internal ribosome entry site. DNA sequences can be found in Supplementary Table
5.
SUPPLEMENTARY REFERENCES
2 Dreze, M. et al. High-quality binary interactome mapping. Methods Enzymol. 470, 281-315 (2010).
5 Weiss, S. Fluorescence spectroscopy of single biomolecules. Science 283, 1676-1683 (1999).
6 Hanes, J. & Pluckthun, A. In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl. Acad. Sci. U. S. A. 94, 4937-4942 (1997).
7 Mitra, R. D. & Church, G. M. In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res. 27, e34 (1999).
10 Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59 (2008).
11 Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728-1732 (2005).
13 Fredriksson, S. et al. Protein detection using proximity-dependent DNA ligation assays. Nat. Biotechnol. 20, 473-477 (2002).
14 Hammond, M., Nong, R. Y., Ericsson, O., Pardali, K. & Landegren, U. Profiling cellular protein complexes by proximity ligation with dual tag microarray readout. PLoS One 7 (2012).
15 Lundberg, M., Eriksson, A., Tran, B., Assarsson, E. & Fredriksson, S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res. 39, e102 (2011).
16 Zhu, J. et al. Protein interaction discovery using parallel analysis of translated ORFs (PLATO). Nat. Biotechnol. 31, 331-334 (2013).
27 Yu, H. et al. Next-generation sequencing to generate interactome datasets. Nat. Methods 8, 478-480 (2011).
30 McGregor, L. M., Jain, T. & Liu, D. R. Identification of ligand-target pairs from combined libraries of small molecules and unpurified protein targets in cell lysates. J. Am. Chem. Soc. 136, 3264-3270 (2014).
31 Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nature Methods 8, 659-661 (2011).
39 Philimonenko, A. A., Janacek, J. & Hozak, P. Statistical evaluation of colocalization patterns in immunogold labeling experiments. J. Struct. Biol. 132, 201-210 (2000).
40 Lewis, J. D. et al. Quantitative interactor screening with next-generation sequencing (QIS-Seq) identifies Arabidopsis thaliana MLO2 as a target of the Pseudomonas syringae type III effector HopZ2. BMC Genomics 13 (2012).
41 Georgiou, G. et al. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat. Biotechnol. 32, 158-168 (2014).
42 Larman, H. B. et al. Autoantigen discovery with a synthetic human peptidome. Nat. Biotechnol. 29, 535-541 (2011).
43 Fujimori, S. et al. Next-generation sequencing coupled with a cell-free display technology for high-throughput production of reliable interactome data. Sci. Rep. 2, 691-691 (2012).
44 Darmanis, S. et al. ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS One 6 (2011).
45 Young, K. et al. Identification of a calcium channel modulator using a high throughput yeast two-hybrid screen. Nat. Biotechnol. 16, 946-950 (1998).
46 Nishihara, T. et al. Estrogenic activities of 517 chemicals by yeast two-hybrid assay. J. Health Sci. 46, 282-298 (2000).
47 Chidley, C., Haruki, H., Pedersen, M. G., Muller, E. & Johnsson, K. A yeast-based screen reveals that sulfasalazine inhibits tetrahydrobiopterin biosynthesis. Nat. Chem. Biol. 7, 375-383 (2011).
48 Wrighton, N. C. et al. Small peptides as potent mimetics of the protein hormone erythropoietin. Science 273, 458-463 (1996).
49 Lowman, H. B. Bacteriophage display and discovery of peptide leads for drug development. Annu. Rev. Biophys. Biomol. Struct. 26, 401-424 (1997).
50 Leuchowius, K. J. et al. High content screening for inhibitors of protein interactions and post-translational modifications in primary cells by proximity ligation. Mol. Cell. Proteomics. 9, 178-183 (2010).
51 Porreca, G. J., Shendure, J. & Church, G. M. Polony DNA sequencing. Curr. Protoc. Mol. Biol. 7.8 (2006).
52 Hanisch, K. H. & Stoyan, D. Formulas for second-order analysis of marked point processes. Math. Operationsforsch. Statist., Ser. Statistics 14, 559 (1979).
53 Phizicky, E. M. & Fields, S. Protein-protein interactions: methods for detection and analysis. Microbiol. Rev. 59, 94-123 (1995).
54 Zhu, H. & Snyder, M. Protein chip technology. Curr. Opin. Chem. Biol. 7, 55-63 (2003).
55 Piehler, J. New methodologies for measuring protein interactions in vivo and in vitro. Curr. Opin. Struct. Biol. 15, 4-14 (2005).
56 Boozer, C., Kim, G., Cong, S., Guan, H. & Londergan, T. Looking towards label-free biomolecular interaction analysis in a high-throughput format: a review of new surface plasmon resonance technologies. Curr. Opin. Biotechnol. 17, 400-405 (2006).
57 Berggard, T., Linse, S. & James, P. Methods for the detection and analysis of protein-protein interactions. Proteomics 7, 2833-2842 (2007).
58 Dias-Neto, E. et al. Next-generation phage display: integrating and comparing available molecular tools to enable cost-effective high-throughput analysis. PLoS One 4 (2009).
59 Mendez-Rios, J. & Uetz, P. Global approaches to study protein-protein interactions among viruses and hosts. Future Microbiol. 5, 289-301 (2010).
60 Ravn, U. et al. By-passing in vitro screening-next generation sequencing technologies applied to antibody display and in silico candidate selection. Nucleic Acids Res. 38 (2010).
61 Larman, H. B., Liang, A. C., Elledge, S. J. & Zhu, J. Discovery of protein interactions using parallel analysis of translated ORFs (PLATO). Nat. Protoc. 9, 90-103 (2014).
62 Fields, S. & Song, O. K. A novel genetic system to detect protein-protein interactions. Nature 340, 245-246 (1989).
63 Chien, C. T., Bartel, P. L., Sternglanz, R. & Fields, S. The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. U. S. A. 88, 9578-9582 (1991).
64 Rual, J. F. et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178 (2005).
65 Rossi, F., Charlton, C. A. & Blau, H. M. Monitoring protein-protein interactions in intact eukaryotic cells by beta-galactosidase complementation. Proc. Natl. Acad. Sci. U. S. A. 94, 8405-8410 (1997).
66 Stagljar, I., Korostensky, C., Johnsson, N. & te Heesen, S. A genetic system based on split-ubiquitin for the analysis of interactions between membrane proteins in vivo. Proc. Natl. Acad. Sci. U. S. A. 95, 5187-5192 (1998).
67 Remy, I. & Michnick, S. W. Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays. Proc. Natl. Acad. Sci. U. S. A. 96, 5394-5399 (1999).
68 Pelletier, J. N., Arndt, K. M., Pluckthun, A. & Michnick, S. W. An in vivo library-versus-library selection of optimized protein-protein interactions. Nat. Biotechnol. 17, 683-690 (1999).
69 Rual, J. F., Hill, D. E. & Vidal, M. ORFeome projects: gateway between genomics and omics. Curr. Opin. Chem. Biol. 8, 20-25 (2004).
70 Yashiroda, Y., Matsuyama, A. & Yoshida, M. New insights into chemical biology from ORFeome libraries. Curr. Opin. Chem. Biol. 12, 55-59 (2008).
71 Clackson, T., Hoogenboom, H. R., Griffiths, A. D. & Winter, G. Making antibody fragments using phage display libraries. Nature 352, 624-628 (1991).
72 Bratkovic, T. Progress in phage display: evolution of the technique and its applications. Cell. Mol. Life Sci. 67, 749-767 (2010).
73 Kieke, M. C., Cho, B. K., Boder, E. T., Kranz, D. M. & Wittrup, K. D. Isolation of anti-T cell receptor scFv mutants by yeast surface display. Protein Eng. 10, 1303-1310 (1997).
74 Gai, A. S. & Wittrup, D. K. Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17, 467-473 (2007).
75 Stahl, S. & Uhlen, M. Bacterial surface display: Trends and progress. Trends Biotechnol. 15, 185-192 (1997).
76 Samuelson, P., Gunneriusson, E., Nygren, P. A. & Stahl, S. Display of proteins on bacteria. J. Biotechnol. 96, 129-154 (2002).
77 Beerli, R. R. et al. Isolation of human monoclonal antibodies by mammalian cell display. Proc. Natl. Acad. Sci. U. S. A. 105, 14336-14341 (2008).
78 Zhou, C., Jacobsen, F. W., Cai, L., Chen, Q. & Shen, W. D. Development of a novel mammalian cell surface antibody display platform. MAbs 2, 508-518 (2010).
79 Diamante, L., Gatti-Lafranconi, P., Schaerli, Y. & Hollfelder, F. In vitro affinity screening of protein and peptide binders by megavalent bead surface display. Protein Eng. Des. Sel. 26, 713-724 (2013).
80 Huang, L. C. et al. Linking genotype to phenotype on beads: high throughput selection of peptides with biological function. Sci. Rep. 3 (2013).
81 http://www.piercenet.com/page/crosslinking-reagents-technical-handbook-1601673.
82 Pritz, S. et al. Synthesis of biologically active peptide nucleic acid-peptide conjugates by sortase-mediated ligation. J. Org. Chem. 72, 3909-3912 (2007).
83 Wilson, D. S., Keefe, A. D. & Szostak, J. W. The use of mRNA display to select high-affinity protein-binding peptides. Proc. Natl. Acad. Sci. U. S. A. 98, 3750-3755 (2001).
84 Yonezawa, M., Doi, N., Kawahashi, Y., Higashinakagawa, T. & Yanagawa, H. DNA display for in vitro selection of diverse peptide libraries. Nucleic Acids Res. 31 (2003).
85 Cotten, S. W., Zou, J. W., Valencia, C. A. & Liu, R. H. Selection of proteins with desired properties from natural proteome libraries using mRNA display. Nat. Protoc. 6, 1163-1182 (2011).
86 Schallmeiner, E. et al. Sensitive protein detection via triple-binder proximity ligation assays. Nat. Methods 4, 135-137 (2007).
87 Darmanis, S. et al. Sensitive plasma protein analysis by microparticle-based proximity ligation assays. Mol. Cell. Proteomics 9, 327-335 (2010).
88 Ke, R., Nong, R. Y., Fredriksson, S., Landegren, U. & Nilsson, M. Improving precision of proximity ligation assay by amplified single molecule detection. PLoS One 8 (2013).
89 Soderberg, O. et al. Direct observation of individual endogenous protein complexes in situ by proximity ligation. Nat. Methods 3, 995-1000 (2006).
90 Gustafsdottir, S. M. et al. In vitro analysis of DNA-protein interactions by proximity ligation. Proc. Natl. Acad. Sci. U. S. A. 104, 3067-3072 (2007).
91 Soderberg, O. et al. Characterizing proteins and their interactions in cells and tissues using the in situ proximity ligation assay. Methods 45, 227-232 (2008).
92 Jarvius, M. et al. In situ detection of phosphorylated platelet-derived growth factor receptor beta using a generalized proximity ligation method. Mol. Cell. Proteomics 6, 1500-1509 (2007).
93 Leuchowius, K. J., Weibrecht, I., Landegren, U., Gedda, L. & Soderberg, O. Flow cytometric in situ proximity ligation analyses of protein interactions and post-translational modification of the epidermal growth factor receptor family. Cytometry A. 75A, 833-839 (2009).
94 Walt, D. R. Optical methods for single molecule detection and analysis. Anal. Chem. 85, 1258-1263 (2013).
95 Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857-860 (2013).
96 Lee, J. H. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360-1363 (2014).
97 Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol. 29, 659-664 (2011).
98 Buenrostro, J. D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562-568 (2014).