
nature|methods

Identification of cross-linked peptides from complex samples

Bing Yang, Yan-Jie Wu, Ming Zhu, Sheng-Bo Fan, Jinzhong Lin, Kun Zhang, Shuang Li, Hao Chi, Yu-Xin Li, Hai-Feng Chen, Shu-Kun Luo, Yue-He Ding, Le-Heng Wang, Zhiqi Hao, Li-Yun Xiu, She Chen, Keqiong Ye, Si-Min He & Meng-Qiu Dong

Supplementary File Title

Supplementary Figure 1 Cross-linking products and BS3 the cross-linker.

Supplementary Figure 2 The spectrum of a pair of cross-linked peptides is much more complex than that of a single linear peptide.

Supplementary Figure 3 Optimization of cross-linking conditions.

Supplementary Figure 4 HPLC separation of cross-linking isoforms.

Supplementary Figure 5 Estimation of non-specific cross-linking.

Supplementary Figure 6 Effectiveness of the spectral quality score (SQS).

Supplementary Figure 7 Unmatched peaks in 1 m/z bins collected from 1,030 HCD spectra.

Supplementary Figure 8 Open search mode and FDR estimation.

Supplementary Figure 9 Monte Carlo simulation of the probability distribution of T.

Supplementary Figure 10 Illustration of Pre_score calculation.

Supplementary Figure 11 Illustration of fragment ions found only in cross-link spectra.

Supplementary Figure 12 Usage of different fragment-ion types in KSDP scoring.

Supplementary Figure 13 Optimization of ion type usage improves the separation between target and random matches.

Supplementary Figure 14 Consideration of cross-link specific ion types improves identification.

Supplementary Figure 15 FDR estimation.

Supplementary Figure 16 A cross-link spectrum annotated by pLabel.

Supplementary Figure 17 Cα-Cα distance distribution of any lysine pairs, cross-linkable lysine pairs and observed cross-linked lysine pairs in GST.

Supplementary Figure 18 CXMS analysis of E. coli and C. elegans lysates.

Nature Methods: doi:10.1038/nmeth.2099

Supplementary Table 1 Cross-link search space of E. coli, C. elegans, and human proteome.

Supplementary Table 2 Sequences of 38 synthetic peptides.

Supplementary Table 3 Datasets used for training and testing pLink.

Supplementary Table 4 Features and their weights in SQS calculation.

Supplementary Table 5 Significantly matched fragments after spectrum pre-processing.

Supplementary Table 6 Usage of fragment ions in pLink fine scoring.

Supplementary Table 7 CXMS analysis of GST.

Supplementary Table 8 CXMS analysis of the CNGP (Cbf5-Nop10-Gar1-Nhp2) complex.

Supplementary Table 9 CXMS result of the six-subunit, 550 kDa UTP-B complex.

Supplementary Table 10 CXMS analysis of C. elegans FIB-1::GFP IP.

Supplementary Table 11 Inter-linked peptides identified from E. coli lysates.

Supplementary Table 12 Inter-linked peptides identified from a C. elegans lysate.

Supplementary Table 13 Cross-linking analysis of the CNGP complex using DSS.

Supplementary Table 14 Cross-linking analysis of the CNGP complex using EDC.

Supplementary Table 15 Cross-linking analysis of the CNGP complex using AMAS.

Supplementary Table 16 Cross-linking analysis of the CNGP complex using Sulfo-GMBS.

Supplementary Table 17 xQuest search of the GST [d0/d4]-BS3 cross-linking data.

Supplementary Table 18 Comparison of the GST [d0/d4]-BS3 cross-links identified by xQuest and pLink.

Supplementary Table 19 xQuest search of the GST [d0]-BS3 cross-linking data.

Supplementary Table 20 Comparison of the GST [d0]-BS3 cross-links identified by xQuest and pLink.

Supplementary Note The pLink algorithm and supplementary discussion.


Supplementary Figure 1. Cross-linking products and BS3 the cross-linker. (A) Digestion of chemically cross-linked proteins generates mono-, loop-, and inter-linked peptides besides regular peptides unmodified by the linker. (B) Chemical structures of [d0]- and [d4]-BS3 and (C) the expected pattern of cross-linked peptides in full MS scans.

[Figure panels A-C. (A) Peptide types: regular, mono-linked (Type 0), loop-linked (Type 1), inter-linked (Type 2). (B) Light [d0]-BS3 and heavy [d4]-BS3, the latter carrying four deuterium atoms (D). (C) Light and heavy peaks separated by 4.025 Da; expected L:H intensity ratio 1:1.]

Supplementary Figure 1. Cross-linking products and BS3 the cross-linker. (D-G) Representative full MS scans showing that [d0]- and [d4]-BS3 modified peptides co-elute, with the heavy cross-link slightly ahead of the light one.

[Figure panels D-G: FTMS full MS scans of sample UTP-b-XL-2500ng-HCD (panel D: scan #2195, RT 34.64, NL 2.14E6, full ms 300.00-2000.00). X-axis: m/z, 406.8 to 409.4; y-axis: relative abundance, 0 to 100. Retention times: 34.64 min (D), 34.70 min (E), 34.82 min (F), 34.93 min (G). Isotope peaks between m/z 406.979 and 409.227 are labeled L (light) or H (heavy).]

Supplementary Figure 2. The spectrum of a pair of cross-linked peptides is much more complex than that of a single linear peptide.

Supplementary Figure 3. Optimization of cross-linking conditions. (A) 10 µM GST was cross-linked with the indicated amount of BS3 in 50 mM HEPES, pH 7.5, 100 mM KCl at room temperature for 1 hour. (B) 10 µM GST was cross-linked with 0.5 mM BS3 (50x) in the same buffer at room temperature for the indicated amount of time.

[Gel images. Lanes in (A): M, 0, 10x, 20x, 50x, 100x, 200x, 400x BS3. Lanes in (B): M, 0, 5 min, 15 min, 30 min, 1 h, 2 h, 4 h. Labeled bands: GST and cross-linked GST dimer.]

Supplementary Figure 4. HPLC separation of cross-linking isoforms. (A) Different isoforms of cross-links, along with mono- and loop-links (not shown), are generated from a pair of peptides and can be separated by reverse phase HPLC. (B) A pair of cross-link isoforms from BS3-treated E. coli lysate.

Supplementary Figure 5. Estimation of non-specific cross-linking. Ovalbumin, BSA, and a hetero-dimeric protein complex F15E11.13/F15E11.14 were mixed at the indicated concentrations and processed for CXMS analysis. Non-specific cross-links between ovalbumin and BSA, BSA and F15E11.13/14, or ovalbumin and F15E11.13/14 were observed 0.3 times per sample per experiment. In contrast, cross-links within a protein or complex were observed 168.2 times per sample per experiment. BSA cross-links were more readily detected than others. Plotted are the averages of two sets of experiments.

Protein (µg/µl)

ovalbumin 1.0 4.0 9.5 15.0 18.0

BSA 18.0 15.0 9.5 4.0 1.0

F15E11.13/14 1.0 1.0 1.0 1.0 1.0

[Bar chart. Y-axis: # cross-link spectra, 0 to 250. Series: Oval-Oval, BSA-BSA, F15E11.13-F15E11.14, Non-Specific-Total.]

Supplementary Figure 6. Effectiveness of the Spectral Quality Score (SQS). With a 4.2 SQS cutoff (grey vertical lines), 93% of the non-cross-link spectra are removed while 99.5% of the cross-link spectra are retained.

Supplementary Figure 7. Unmatched peaks in 1 m/z bins collected from 1030 HCD spectra. Some low m/z noise peaks, especially 108, 153, and 200, occur in nearly every spectrum.

[Plot. X-axis: peaks in HCD spectra (m/z); y-axis: relative intensity. Labeled peaks: m/z 108, 153, and 200.]

Supplementary Figure 8. Open search mode and FDR estimation. (A) The open search mode for large databases. (B) A –log(pre_score) cutoff of 3.0 is effective in removing 89% of the non-cross-link spectra.

[Panel A, flowchart:
1. Open database search: PreScore against any peptide with mass < precursor, treating Δmass as a modification on K.
2. Is the peptide mass (without modification) ≥ or ≤ ½ precursor? Peptides with mass ≥ ½ precursor are α peptides; peptides with mass ≤ ½ precursor are β peptides.
3. Pair up the top 500 α and β peptides: α + β + linker = precursor.
4. Fine scoring against the candidate pairs.]
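The pairing step in panel A (α + β + linker = precursor) can be illustrated with a minimal sketch. It is not the pLink implementation: the function names, the `(name, neutral_mass)` input format, the 10 ppm tolerance, and the BS3 linker mass of 138.06808 Da are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right

# Assumed monoisotopic mass added by a BS3 inter-link (Da); illustrative value.
BS3_LINKER_MASS = 138.06808


def split_alpha_beta(candidates, precursor_mass):
    """Split (name, neutral_mass) candidates at half the precursor mass."""
    half = precursor_mass / 2.0
    alpha = [c for c in candidates if c[1] >= half]
    beta = [c for c in candidates if c[1] <= half]
    return alpha, beta


def pair_candidates(alpha, beta, precursor_mass,
                    linker_mass=BS3_LINKER_MASS, tol_ppm=10.0):
    """Return (alpha_name, beta_name) pairs whose masses plus the linker
    match the precursor mass within tol_ppm (hypothetical helper)."""
    tol = precursor_mass * tol_ppm * 1e-6
    beta_sorted = sorted(beta, key=lambda p: p[1])
    beta_masses = [m for _, m in beta_sorted]
    pairs = []
    for a_name, a_mass in alpha:
        # Mass the beta peptide must have to complete the pair.
        target = precursor_mass - linker_mass - a_mass
        lo = bisect_left(beta_masses, target - tol)
        hi = bisect_right(beta_masses, target + tol)
        pairs.extend((a_name, beta_sorted[i][0]) for i in range(lo, hi))
    return pairs


# Toy data, not taken from the paper.
candidates = [("pepA", 1500.0), ("pepB", 1000.0), ("pepC", 900.0)]
precursor = 1500.0 + 1000.0 + BS3_LINKER_MASS
alpha, beta = split_alpha_beta(candidates, precursor)
pairs = pair_candidates(alpha, beta, precursor)
```

Sorting the β candidates once and using binary search keeps the pairing cost close to linear in the number of α candidates, which matters when the lists hold the top 500 candidates for every spectrum.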

Supplementary Figure 9. Monte Carlo simulation of the probability distribution of T. Statistic T is the normalized length of the longest sequence tag from a spectrum. For a candidate peptide of 5, 8, 10, 15, 18, 20, or 25 aa, the probability of T (by chance) > t (observed) is shown.

[Plot. X-axis: normalized tag length, 0 to 1; y-axis: probability, 0 to 1; one curve per peptide length: 5, 8, 10, 15, 18, 20, 25 aa.]

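Supplementary Figure 10. Illustration of Pre_score calculation. See supplemental methods for explanation.

[Worked example from the figure. Candidate peptide: ACDEKGK, with n = 6 theoretical fragments (A, AC, ACD, ACDE, ACDEK, ACDEKG). The spectrum has l = 17 peaks, of which x = 4 are matched, with intensity rank ratios 2/17, 3/17, 4/17, and 10/17. The normalized tag length is t = 2/7 = 0.28.]

$$N = MH/w = \mathrm{int}(1538.637002/0.5) = 1538$$

$$P(X = x) = \frac{C_n^x \, C_{N-n}^{l-x}}{C_N^l} = \frac{C_6^4 \, C_{1532}^{13}}{C_{1538}^{17}} = 1.5 \times 10^{-7}, \qquad \mathrm{Hyper}(X \ge x) = 1.5 \times 10^{-7}$$

$$r = (2/17 + 3/17 + 4/17 + 10/17)/4 = 0.2794; \quad \mu = 0.5; \quad \sigma = 1/\sqrt{12x} = 1/\sqrt{12 \times 4} = 0.1443$$

$$P(r) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(r-\mu)^2}{2\sigma^2}} = 0.8593; \quad \mathrm{Norm}(R \le r) = 0.0632$$

$$\mathrm{Taglen}(T > 0.28) = 0.37$$

$$p\mathrm{value}(x, r, t) = \mathrm{Hyper}(X \ge x) \times \mathrm{Norm}(R \le r) \times \mathrm{Taglen}(T > t) = 3.3 \times 10^{-9}$$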
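The numbers in this illustration can be checked with a short script. This is a sketch under stated assumptions, not the pLink code: it assumes that Hyper is the upper tail of a hypergeometric distribution (n theoretical fragments, l peaks, N mass bins), that Norm is the lower tail of a normal distribution with mean 0.5 and standard deviation 1/sqrt(12x), and it takes the Taglen value 0.37 directly from the figure rather than simulating it. The function names are hypothetical.

```python
from math import comb, erf, sqrt


def hyper_tail(x, n, l, N):
    """P(X >= x): at least x of n theoretical fragments fall on one of the
    l peak-occupied bins out of N bins (assumed reading of the figure)."""
    total = comb(N, l)
    hits = sum(comb(n, k) * comb(N - n, l - k) for k in range(x, min(n, l) + 1))
    return hits / total


def norm_cdf(r, mu, sigma):
    """Lower-tail probability of a normal distribution."""
    return 0.5 * (1.0 + erf((r - mu) / (sigma * sqrt(2.0))))


# Values read from the figure.
N, l, n, x = 1538, 17, 6, 4
ranks = [2, 3, 4, 10]                 # intensity ranks of the matched peaks

r_mean = sum(k / l for k in ranks) / x
sigma = 1.0 / sqrt(12 * x)
hyper = hyper_tail(x, n, l, N)
norm = norm_cdf(r_mean, 0.5, sigma)
taglen = 0.37                         # taken from the figure, not computed here
p_value = hyper * norm * taglen

print(f"r = {r_mean:.4f}, sigma = {sigma:.4f}")
print(f"Hyper = {hyper:.2e}, Norm = {norm:.4f}, p-value = {p_value:.1e}")
```

Under these assumptions the hypergeometric and normal terms reproduce the figure's 1.5 × 10^-7 and 0.0632 after rounding, and the product is on the order of 10^-9.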

Supplementary Figure 11. Illustration of fragment ions found only in cross-link spectra. (A-B) Examples of yb-type ions; (C) example of a K-linked ion; (D) fragments resulting from cleavage of the linker.

Supplementary Figure 12. Usage of different fragment ion types in KSDP scoring. See Supplemental Methods for explanation.

Supplementary Figure 13. Optimization of ion type usage improves the separation between target and random matches. A total of 1030 cross-link spectra are searched against a positive database that contains the target sequences (solid lines) and a negative database that does not (dashed lines), using the initial (green) or optimized (red) ion type setting.

[Plot. X-axis: –ln(E-value); y-axis: # spectra, cumulative.
Old ion types: b1+, b2+, y1+, y2+.
New ion types: b1+, b2+, y1+, y2+, a1+, a2+, yb1+, ya1+, KLα(KLβ), Lα/Lβ1+, Lα/Lβ2+.]

Supplementary Figure 14. Consideration of cross-link specific ion types improves identification. A total of 1030 cross-link spectra are searched against a positive database that contains the target sequences (solid lines) and a negative database that does not (dashed lines), using only the basic ion types (green) or the basic plus cross-link specific ion types (red).

[Plot. X-axis: Refined_Score; y-axis: # spectra, cumulative.
Basic ion types: b1+, b2+, y1+, y2+, a1+, a2+.
All ion types: b1+, b2+, y1+, y2+, a1+, a2+, yb1+, ya1+, KLα(KLβ), Lα/Lβ1+, Lα/Lβ2+.]

Supplementary Figure 15. FDR estimation. (A) FDR estimation based on a modified reverse database strategy. T, U, and F, for true, union, and false, are possible outcomes of in silico cross-linking of forward (F) and reversed (R) protein/peptide sequences. By random match, spectra fall into T, U and F at a 1:2:1 ratio. (B) When the CXMS data of the yeast UTP-B complex are searched against three databases (human, archaea, and random) that have no UTP-B subunit sequences in them, the percentages of spectra that match to T, U, and F are about 25%, 50%, and 25%, respectively. When searched against a target database containing UTP-B subunit sequences, the percentage of spectra that match to T increases, while those to U and F decrease.

[Panel A: in silico cross-linking of forward (F) and reversed (R) sequences gives the pair types F-F, R-R, F-R, and R-F, which are grouped into T, U, and F. Panel B: bar chart of % spectra (0 to 50) matching T, U, and F for the human, archaea, random, and target databases.]

Supplementary Figure 15. FDR estimation. (C) Reliable estimation of FDR for cross-link identification. Only when the FDR exceeds 45%, which is much greater than the conventional cutoff values, does the estimated FDR begin to deviate significantly from the real FDR.

[Panel C plot, titled "Estimated and Real FDR on all charges". X-axis: log10(E-value), -30 to 5; y-axis: FDR, 0 to 0.5; curves: estimated FDR and real FDR. Estimated FDR = (U - F)/T.]
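The estimator in panel C follows from the 1:2:1 ratio stated in panel A: if random matches contribute r, 2r, and r spectra to T, U, and F, then U - F estimates the r random matches hidden among the T hits. A minimal sketch of that arithmetic follows; the function name, the label encoding, and the clamping at zero are assumptions of this example rather than details given in the source.

```python
def estimated_fdr(labels):
    """Estimate FDR = (U - F) / T from match labels passing a score cutoff.

    labels: iterable of 'T', 'U', or 'F' (hypothetical encoding of the
    three outcomes of the modified reverse-database search).
    """
    counts = {"T": 0, "U": 0, "F": 0}
    for label in labels:
        counts[label] += 1
    if counts["T"] == 0:
        return 0.0
    # Clamp at zero: with few matches U - F can be negative by chance.
    return max(0.0, (counts["U"] - counts["F"]) / counts["T"])


# Toy example: 95 T, 4 U, and 1 F matches above some cutoff.
good = estimated_fdr(["T"] * 95 + ["U"] * 4 + ["F"] * 1)

# Purely random matching at the 1:2:1 ratio gives an estimate of 1.
random_only = estimated_fdr(["T"] * 25 + ["U"] * 50 + ["F"] * 25)
```

In practice the cutoff would be swept over the E-value range and the function evaluated on the matches passing each cutoff, giving a curve like the one plotted in panel C.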

Supplementary Figure 16. A cross-link spectrum annotated by pLabel. The inset shows double cleavage products.

Supplementary Figure 17. Cα-Cα distance distribution of any lysine pairs, cross-linkable lysine pairs, and observed cross-linked lysine pairs in GST.

[Histogram for GST. X-axis: Cα-Cα distance (Å), 5 to 65; y-axis: number of lysine pairs, 0 to 70. Series: any K-K pair, cross-linkable pair, observed cross-link.]

Supplementary Figure 18. CXMS analysis of E. coli and C. elegans lysates. (A) 394 inter-linked peptide pairs were identified from E. coli lysates. (B) 75.5% of the E. coli cross-links are compatible with the structures of corresponding proteins/complexes deposited in the PDB database (179/237). (C) Y2H verified interactions represented by five E. coli inter-links. (D) 39 inter-linked peptide pairs were identified from a C. elegans lysate.

[Panel A (E. coli): intra-molecular, 270; inter-molecular, 124. Panel B (E. coli): compatible, 179; incompatible, 58; structure unavailable, 157. Panel D (C. elegans): inter-molecular, 10; intra-molecular, 29. Panel C: Y2H plates on –LW and –LWH media, with samples numbered as follows:
1. positive control
2. negative control
3. AD-AAA97042.1 + BD-NP_416801.2 (#91)
4. AD-AAC73200.1 + BD-AAC75219.1 (#98)
5. AD-NP_416518.2 + BD-AAC73708.1 (#115)
6. AD-YP_026243.1 + BD-AAA58136.1 (#71)
7. AD-YP_025307.1 + BD-AAA58136.1 (#69)
8. AD-AAC74522.1 + BD-AAA58136.1 (#70)]

Yang et al. 6/7/12 2:08 PM 1

Supplementary Tables for “Identification of Cross-linked Peptides from Complex Samples” by Yang et al.

Supplementary Table 1: Cross-link search space of E. coli, C. elegans, and human proteome

| Database | Proteins | Regular search space (# candidate peptides) | Cross-link search space (# candidate peptide pairs) |
|---|---|---|---|
| E. coli | 6126 | 3.07 × 10^5 | 4.72 × 10^10 |
| C. elegans | 24652 | 2.14 × 10^6 | 2.31 × 10^12 |
| Human | 87069 | 3.67 × 10^6 | 6.74 × 10^12 |

Note: For n candidate linear peptides, the number of possible cross-linked peptide pairs in a cross-link search is 0.5*n*(n+1).
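As a quick check of the formula in the note, the pair count can be computed directly. The function name is hypothetical, and because the table lists rounded peptide counts, the result only approximates the tabulated search space.

```python
def crosslink_search_space(n):
    """Number of unordered peptide pairs, including a peptide paired
    with itself: 0.5 * n * (n + 1)."""
    return n * (n + 1) // 2


# Rounded E. coli peptide count from the table (3.07 x 10^5).
ecoli_pairs = crosslink_search_space(307_000)
print(f"{ecoli_pairs:.2e}")
```

The quadratic growth is the point of the table: a roughly tenfold larger peptide list (E. coli to human) yields a search space more than a hundred times larger.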

Supplementary Table 2: Sequences of 38 synthetic peptides

| Peptide | Sequence | Length (aa) | Mass of [M+H]+ |
|---|---|---|---|
| AR-9 | AILVNFKAR | 9 | 1031.6360 |
| AR-9-1 | AQFKTVSTR | 9 | 1037.5738 |
| DK-10 | DGMIKLWDLK | 10 | 1218.6551 |
| DR-7 | DMKLWQR | 7 | 976.5033 |
| DR-19 | DPTSPAPLKTHTIELQGQR | 19 | 2089.1036 |
| DK-7 | DQEAQKK | 7 | 846.4315 |
| DR-14 | DWNTNAKTHTIAQR | 14 | 1655.8248 |
| ER-28 | EGSQSKDYSSLLATLINFSPAAVDLEIR | 28 | 3024.5524 |
| ER-13 | EKQFLNALVMAFR | 13 | 1566.8461 |
| FK-12 | FILTTSKDLSAK | 12 | 1323.7518 |
| FR-9 | FVKQQWNLR | 9 | 1218.6742 |
| GK-5 | GKNSK | 5 | 533.3042 |
| GK-21 | GLWDVSFCQYDKLLATSSGDK | 21 | 2390.1332 |
| IR-9 | IHVLKNIHR | 9 | 1129.6952 |
| KR-14 | KCLHTLQEHTSAVR | 14 | 1679.8646 |
| KR-11 | KDAQSQEMSQR | 11 | 1307.6008 |
| KK-20 | KLGEAPIKPQGNAVLIAVNK | 20 | 2060.2226 |
| KR-7 | KMRPEVR | 7 | 915.5193 |
| KK-10 | KNIAAIDLNK | 10 | 1099.6469 |
| KR-7 | KVEEDVR | 7 | 874.4628 |
| LK-11 | LDAEEQMNKFK | 11 | 1352.6514 |
| LK-12 | LDIDQLQLSVKK | 12 | 1399.8155 |
| LR-7 | LFNVLKR | 7 | 889.5618 |
| LK-12 | LLATSSGDKTVK | 12 | 1219.6892 |
| LK-17 | LNEEYLINKVYEAIPIK | 17 | 2049.1266 |
| LR-20 | LSEIPGMVKIVDAIIPYTQR | 20 | 2243.2467 |
| NR-10 | NEELKLQINR | 10 | 1256.6957 |
| NK-8 | NKFLPVLK | 8 | 958.6084 |
| NP-9 | NSKIFSPFR | 9 | 1095.5945 |
| SR-14 | SDFKFSNLLGTVYR | 14 | 1646.8536 |
| SK-22 | SHKDSITGFWCQGEDWLISTSK | 22 | 2582.1980 |
| SK-26 | SLNSFEPFDEIVWFIDALTQGLKSNK | 26 | 2998.5196 |
| TR-8 | TPDVNKDR | 8 | 944.4796 |
| VK-9 | VGGSTIKSK | 9 | 876.5149 |
| VR-11 | VKCVTFHPATR | 11 | 1315.6939 |
| VR-6 | VKTELR | 6 | 745.4566 |
| VR-7 | VWDLVKR | 7 | 915.5410 |
| YK-14 | YFAYISKLDSASVK | 14 | 1591.8366 |

Note: All peptides are ≥ 98% pure, with a free N-terminal NH2 and a C-terminal COOH. Cysteine residues were carbamidomethylated.
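The [M+H]+ values in the table can be reproduced from standard monoisotopic residue masses. The sketch below is illustrative: the function name is hypothetical, the residue masses are standard values rounded to five decimals, and cysteine is treated as carbamidomethylated (+57.02146 Da) as the note states.

```python
# Standard monoisotopic residue masses (Da), rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # H2O for the free N- and C-termini
PROTON = 1.00728            # charge carrier for [M+H]+
CARBAMIDOMETHYL = 57.02146  # fixed modification on cysteine


def mh_plus(sequence):
    """Monoisotopic [M+H]+ of a linear peptide with carbamidomethyl-Cys."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence)
    mass += CARBAMIDOMETHYL * sequence.count("C")
    return mass + WATER + PROTON


for name, seq in [("AR-9", "AILVNFKAR"), ("GK-5", "GKNSK"), ("VR-11", "VKCVTFHPATR")]:
    print(name, seq, f"{mh_plus(seq):.4f}")
```

The three peptides shown agree with the tabulated masses to within about a millidalton; the remaining difference comes from rounding the residue masses.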



Supplementary Table 3: Datasets used for training and testing pLink

| Dataset | Source of data | #Spectra |
|---|---|---|
| A | Non-redundant inter-link spectra from 741 peptide pairs (Note: the highest scoring spectrum was kept from the ones identifying the same cross-link) | 2077 |
| Sub_A1 | A subset of A, containing only light [d0]-BS3 cross-links | 1030 |
| Sub_A2 | A subset of A, containing only heavy [d4]-BS3 cross-links | 1047 |
| B | All inter-link spectra from 741 peptide pairs, including redundant ones | 13267 |
| Sub_B1 | A subset of B, containing only light [d0]-BS3 cross-links | 7706 |
| Sub_B2 | A subset of B, containing only heavy [d4]-BS3 cross-links | 5561 |
| C | Spectra that failed to identify inter-links from the 741 peptide pair experiments | 153368 |
| D | HCD spectra of regular peptides identified from C. elegans lysates | 21116 |

Supplementary Table 4: Features and their weights in SQS calculation

| No. | Feature | Weight |
|---|---|---|
| 1 | Number of peaks | 0.306 |
| 2 | Number of peaks with known charge states | 0.256 |
| 3 | Fraction of known-charge-state peaks in all peaks | 0.007 |
| 4 | Average peak intensity | 0.029 |
| 5 | Standard deviation of peak intensities | 0.043 |
| 6 | Smallest m/z range containing 95% of the total peak intensity | 0.008 |
| 7 | Smallest m/z range containing 50% of the total peak intensity | 0.024 |
| 8 | Total ion current per m/z | 0.045 |
| 9 | Standard deviation of the consecutive m/z gaps between all peaks | -0.023 |
| 10 | Average number of neighbor peaks within a 2-Da interval around any peak | -0.005 |
| 11 | Length of the longest tag | 0.257 |
| 12 | #Tags (number of peak pairs that differ by an amino acid residue mass) | 0.205 |
| 13 | #Tags per peak (#Tags / #peaks) | 0.097 |
| 14 | Fractional peak intensity of tags (summed geometric means of peak intensities of every peak pair in feature 12 divided by the summed geometric means of all peak pairs) | 0.0004 |

All features except 2 and 3 are calculated as described before by Nesvizhskii (ref. 2).



Supplementary Table 5. Significantly matched fragments after spectrum pre-processing

| Ion type | Match significance | Match count ratio | Match gain ratio | Average intensity | Ave. mass deviation (m/z) | Cont. ratio | Ave. # of residues |
|---|---|---|---|---|---|---|---|
| y1+ | 0.034 | 0.218 | 0.671 | 0.232 | -0.00062 | 0.617 | 4.583 |
| y2+ | 0.004 | 0.093 | 0.314 | 0.143 | -0.00061 | 0.270 | 9.270 |
| b1+ | 0.001 | 0.047 | 0.161 | 0.119 | -0.00013 | 0.143 | 4.330 |
| yb1+ | 7.627 × 10^-4 | 0.109 | 0.096 | 0.073 | -0.00042 | n/a | 3.226 |
| b2+ | 5.865 × 10^-4 | 0.046 | 0.164 | 0.077 | -0.00054 | 0.133 | 8.591 |
| ya1+ | 2.797 × 10^-4 | 0.060 | 0.053 | 0.089 | -0.00038 | n/a | 1.772 |
| a1+ | 2.279 × 10^-4 | 0.020 | 0.065 | 0.179 | -0.00024 | 0.051 | 1.737 |
| y3+ | 8.825 × 10^-5 | 0.019 | 0.069 | 0.069 | -0.00046 | 0.058 | 7.008 |
| αL/βL | 7.403 × 10^-5 | 0.021 | 0.092 | 0.037 | -0.00049 | n/a | 4.988 |
| b3+ | 5.976 × 10^-5 | 0.025 | 0.086 | 0.028 | -0.00021 | 0.071 | 7.881 |
| [y-H2O]1+ | 4.256 × 10^-5 | 0.009 | 0.031 | 0.152 | -0.00007 | 0.032 | 0.574 |
| a2+ | 3.921 × 10^-5 | 0.014 | 0.056 | 0.049 | -0.00036 | 0.048 | 4.903 |
| KLα/KLβ | 3.772 × 10^-5 | 0.007 | 0.140 | 0.036 | -0.00029 | n/a | 2.562 |
| [M+5H]5+ | 3.001 × 10^-5 | 0.003 | 0.192 | 0.053 | -0.00006 | n/a | 5.842 |
| a3+ | 1.867 × 10^-5 | 0.010 | 0.042 | 0.043 | -0.00032 | 0.038 | 5.321 |
| b4+ | 1.656 × 10^-5 | 0.014 | 0.050 | 0.023 | -0.00043 | 0.040 | 8.576 |
| [y-NH3-H2O]5+ | 1.620 × 10^-5 | 0.005 | 0.015 | 0.200 | 0.00046 | 0.015 | 1.400 |
| [M+4H]4+ | 1.207 × 10^-5 | 0.002 | 0.110 | 0.055 | 0.00011 | n/a | 2.710 |
| a4+ | 1.081 × 10^-5 | 0.008 | 0.050 | 0.027 | -0.00018 | 0.050 | 3.300 |
| [M+5H-NH3-H2O]5+ | 8.058 × 10^-6 | 0.002 | 0.175 | 0.022 | -0.00062 | n/a | 4.350 |

Note: Ion types colored in red are only found in spectra of cross-linked peptides. The αL/βL and KLα/KLβ categories contain all charge states found in the spectra. KLα/KLβ ions include the neutral loss species KLα-17 and KLβ-17, too.



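Supplementary Table 6: Usage of fragment ions in pLink fine scoring

| Setting | Ion type | Sub-type (S/X/B) | Weighted for ion continuity (Y/N) |
|---|---|---|---|
| Initial | b1+ | B | Y |
| Initial | b2+ | B | Y |
| Initial | y1+ | B | Y |
| Initial | y2+ | B | Y |
| Optimized | b1+ | S | Y |
| Optimized | b2+ | X | N |
| Optimized | b3+ | X | N |
| Optimized | y1+ | B | Y |
| Optimized | y2+ | B | Y |
| Optimized | y3+ | X | N |
| Optimized | a1+ | S | N |
| Optimized | a2+ | B | N |
| Optimized | yb1+ | B | N |
| Optimized | ya1+ | B | N |
| Optimized | KLα(KLβ) | X | N |
| Optimized | αL(βL) | B | N |

S for simple ions: of a fragment ion type in question, only the cross-link-free sub-type is considered;

X for Xlink ions: of a fragment ion type in question, only the cross-link-containing sub-type is considered;

B for both: all ions of the indicated type are considered, containing a linker or not.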



Supplementary Table 7. CXMS analysis of GST

| Cross-linking sites | Cross-linked peptides | Cα-Cα (Å) | #total spec | #spec exp-1 | #spec exp-2 | #spec exp-3 | Best E-value | Spec qual. manual evaluation | 1:1 d0:d4 pair? | Agree with struc.? |
|---|---|---|---|---|---|---|---|---|---|---|
| GST26-0 | LLLEYLEEKYEEHLYER(9)-MSPILGYWK(1) | ~7.24 | 30 | 13 | 12 | 5 | 6.89E-21 | high | Yes | Yes |
| GST26-1 | LLLEYLEEKYEEHLYER(9)-SPILGYWK(1) | 7.24 | 24 | 13 | 10 | 1 | 2.59E-18 | high | Yes | Yes |
| GST124-112 | VDFLSKLPEMLK(6)-IAYSKDFETLK(5) | 20.39/*22.02 | 12 | 7 | 1 | 4 | 6.68E-05 | mid | Yes | Yes |
| GST180-193 | KRIEAIPQIDK(1)-YLKSSK(3) | 14.24 | 13 | 3 | 0 | 10 | 8.83E-12 | high | Yes | Yes |
| GST217-10 | YIAWPLQGWQATFGGGDHPPKSDLVPR(21)-IKGLVQPTR(2) | ~15.85 | 16 | 6 | 2 | 8 | 1.99E-09 | high | Yes | Yes |
| GST26-39 | LLLEYLEEKYEEHLYER(9)-DEGDKWR(5) | 24.57 | 19 | 2 | 5 | 12 | 2.38E-04 | low | Yes | No |
| GST63-0 | KFELGLEFPNLPYYIDGDVKLTQSMAIIR(20)-MSPILGYWK(1) | ~12.10 | 3 | 1 | 2 | 0 | 7.54E-05 | low | Yes | Yes |
| GST217-112 | YIAWPLQGWQATFGGGDHPPKSDLVPR(21)-IAYSKDFETLK(5) | ~16.46 | 5 | 0 | 0 | 5 | 2.63E-07 | mid | Yes | Yes |

Note: Experiments #1 and #2 were carried out on an LTQ-Orbitrap-ETD and experiment #3 on an LTQ-Orbitrap Velos. Cross-link identifications were filtered by requiring 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 3 spectral observations. Those that were detected in 2 out of 3 experiments and have > 3 spectral copies are shown in Fig. 2a. The starting amino acid Met0 and the amino acids C-terminal to Pro216 are not visible in the X-ray structure, so a distance involving Met0 or Lys217 is measured using the nearest residue, Ser1 or Pro216, instead. All cross-links except GST26-39 have an intra-subunit Cα-Cα distance of less than 24 Å and are therefore structurally sound. The cross-link GST124-112 can be either intra-subunit or inter-subunit (labeled with *).



Supplementary Table 8. CXMS analysis of the CNGP (Cbf5-Nop10-Gar1-Nhp2) complex

| No. | Inter-linked lysine pairs | #total spec (#pair) | #spec exp-1 | #spec exp-2 | Best E-value | Cα-Cα distance (Å) | Manual eval. of spec qual. | Compatible with structure? |
|---|---|---|---|---|---|---|---|---|
| 1 | Cbf5 161-Gar1 115 | 15 (2) | 0 | 15 | 1.77E-10 | 18.85 | high | Yes |
| 2 | Cbf5 180-Cbf5 134 | 21 (2) | 4 | 17 | 1.59E-24 | 10.99 | high | Yes |
| 3 | Cbf5 180-Nop10 18 | 9 (1) | 3 | 6 | 6.98E-17 | 12.98 | high | Yes |
| 4 | Cbf5 180-Nop10 19 | 17 (1) | 10 | 7 | 2.20E-14 | 15.46 | high | Yes |
| 5 | Cbf5 267-Cbf5 31 | 9 (1) | 0 | 9 | 3.68E-13 | 11.38 | high | Yes |
| 6 | Gar1 77-Gar1 115 | 23 (2) | 11 | 12 | 9.73E-20 | 11.01 | high | Yes |
| 7 | Nop10 1-Nop10 19 | 31 (2) | 15 | 16 | 1.48E-19 | 11.48 | high | Yes |
| 8 | Nop10 40-Nhp2 69 | 5 (1) | 1 | 4 | 1.82E-03 | 17.18 | mid | Yes |
| 9 | Nop10 40-Nhp2 65 | 14 (1) | 7 | 7 | 1.82E-03 | 13.96 | high | Yes |
| 10 | Nop10 40-Nhp2 61 | 11 (1) | 6 | 5 | 1.07E-09 | 12.30 | high | Yes |
| 11 | Gar1 115-Gar1 104 | 9 (1) | 7 | 2 | 1.61E-16 | 24.75 | high | No |
| 12 | Nop10 40-Nop10 19 | 6 (1) | 1 | 5 | 7.09E-08 | 30.57 | high | No |
| 13 | Cbf5 87-Cbf5 114 | 3 (1) | 2 | 1 | 1.83E-03 | 14.71 | low | Yes |
| 14 | Nhp2 65-Nhp2 69 | 3 (1) | 3 | 0 | 6.59E-03 | 5.79 | high | Yes |
| 15 | Nop10 40-Cbf5 114 | 3 (1) | 3 | 0 | 2.89E-08 | 34.61 | high | No |

Note: The results came from two experiments; a low-flow 50-µm ID column was used in experiment #1 and a high-flow 100-µm ID column was used in experiment #2. Inter-link identifications were filtered by requiring 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 3 spectral copies. Only those with 4 or more spectra are illustrated in Fig. 2b. Some inter-links were identified by two peptide pairs due to missed cleavage of trypsin, e.g., Cbf5 161-Gar1 115 was supported by SLENLTGALFQRPPLISAVKR(20)-EGDKFYIAADKLLPIER(11) and SLENLTGALFQRPPLISAVKR(20)-FYIAADKLLPIER(7).



Supplementary Table 9. CXMS result of the six-subunit, 550 kDa UTP-B complex

| No. | Protein 1-Protein 2 | #total spec (#pairs) | #spec exp_1 | #spec exp_2 | #spec exp_3 | Best E-value | Manual eval. of spec qual. | Domain of protein 1 | Domain of protein 2 |
|---|---|---|---|---|---|---|---|---|---|
| | UTP13-UTP13 | | | | | | | | |
| 1 | UTP13(86)-UTP13(94) | 7 | 5 | 0 | 2 | 7.77E-05 | high | U13-WD1 | U13-WD1 |
| 2 | UTP13(91)-UTP13(51) | 5 | 5 | 0 | 0 | 1.61E-09 | high | U13-WD1 | U13-WD1 |
| 3 | UTP13(181)-UTP13(179) | 19 | 12 | 0 | 7 | 3.97E-05 | mid | U13-WD1 | U13-WD1 |
| 4 | UTP13(228)-UTP13(181) | 4 (2) | 0 | 3 | 1 | 5.75E-10 | high | U13-WD1 | U13-WD1 |
| 5 | UTP13(533)-UTP13(555) | 17 | 10 | 0 | 7 | 7.95E-08 | high | U13-WD2 | U13-WD2 |
| 6 | UTP13(741)-UTP13(780) | 18 | 7 | 4 | 7 | 7.43E-18 | high | U13-CTD | U13-CTD |
| 7 | UTP13(751)-UTP13(699) | 129 (2) | 77 | 12 | 40 | 5.53E-17 | high | U13-CTD | U13-CTD |
| | UTP12-UTP12 | | | | | | | | |
| 8 | UTP12(163)-UTP12(187) | 24 | 15 | 3 | 6 | 1.61E-20 | high | U12-WD1 | U12-WD1 |
| 9 | UTP12(279)-UTP12(337) | 10 | 7 | 3 | 0 | 4.11E-06 | high | U12-WD1 | U12-WD1 |
| 10 | UTP12(318)-UTP12(230) | 34 | 20 | 8 | 6 | 6.56E-17 | high | U12-WD1 | U12-WD1 |
| 11 | UTP12(318)-UTP12(237) | 46 | 22 | 6 | 18 | 8.94E-18 | high | U12-WD1 | U12-WD1 |
| 12 | UTP12(337)-UTP12(253) | 16 | 11 | 0 | 5 | 1.26E-05 | high | U12-WD1 | U12-WD1 |
| 13 | UTP12(381)-UTP12(279) | 4 | 4 | 0 | 0 | 7.19E-06 | high | U12-WD2 | U12-WD1 |
| 14 | UTP12(404)-UTP12(420) | 15 | 11 | 2 | 2 | 1.66E-09 | high | U12-WD2 | U12-WD2 |
| 15 | UTP12(486)-UTP12(503) | 26 | 8 | 8 | 10 | 6.23E-17 | high | U12-WD2 | U12-WD2 |
| 16 | UTP12(780)-UTP12(774) | 19 | 8 | 3 | 8 | 2.31E-08 | high | U12-CTD | U12-CTD |
| 17 | UTP12(877)-UTP12(884) | 8 | 4 | 4 | 0 | 1.59E-06 | high | U12-CTD | U12-CTD |
| | UTP21-UTP21 | | | | | | | | |
| 18 | UTP21(9)-UTP21(9) | 5 | 3 | 2 | 0 | 4.27E-05 | high | n/a | n/a |
| 19 | UTP21(9)-UTP21(19) | 19 (2) | 16 | 3 | 0 | 2.36E-05 | high | U21-WD2 | U21-WD2 |
| 20 | UTP21(22)-UTP21(661) | 3 | 2 | 0 | 1 | 5.29E-11 | high | U21-WD2 | U21-CTD |
| 21 | UTP21(148)-UTP21(102) | 31 | 23 | 0 | 8 | 2.07E-05 | high | U21-WD1 | U21-WD1 |
| 22 | UTP21(288)-UTP21(408) | 45 | 30 | 5 | 10 | 4.23E-14 | high | U21-WD1 | U21-WD2 |
| 23 | UTP21(435)-UTP21(408) | 6 | 6 | 0 | 0 | 2.18E-07 | high | U21-WD2 | U21-WD2 |
| 24 | UTP21(539)-UTP21(502) | 6 | 0 | 1 | 5 | 1.38E-03 | mid | U21-WD2 | U21-WD2 |
| 25 | UTP21(661)-UTP21(9) | 3 | 0 | 1 | 2 | 1.20E-08 | high | U21-CTD | U21-WD2 |
| 26 | UTP21(661)-UTP21(19) | 30 | 22 | 4 | 4 | 1.27E-05 | high | U21-CTD | U21-WD2 |
| 27 | UTP21(819)-UTP21(806) | 12 | 2 | 3 | 7 | 3.24E-11 | high | U21-CTD | U21-CTD |
| 28 | UTP21(828)-UTP21(873) | 18 | 10 | 3 | 5 | 5.24E-23 | high | U21-CTD | U21-CTD |
| 29 | UTP21(873)-UTP21(917) | 7 (2) | 0 | 7 | 0 | 6.18E-08 | high | U21-CTD | U21-CTD |
| 30 | UTP21(873)-UTP21(918) | 26 (2) | 21 | 5 | 0 | 2.39E-07 | high | U21-CTD | U21-CTD |
| 31 | UTP21(873)-UTP21(828) | 7 | 6 | 0 | 1 | 1.35E-08 | high | U21-CTD | U21-CTD |
| 32 | UTP21(804)-UTP21(794) | 3 | 0 | 3 | 0 | 5.15E-07 | high | U21-CTD | U21-CTD |
| | UTP1-UTP1 | | | | | | | | |
| 33 | UTP1(27)-UTP1(85) | 4 | 4 | 0 | 0 | 4.34E-03 | high | U1-WD1 | U1-WD1 |
| 34 | UTP1(46)-UTP1(6) | 11 | 5 | 3 | 3 | 5.71E-17 | high | U1-WD1 | U1-WD2 |
| 35 | UTP1(56)-UTP1(674) | 4 | 1 | 2 | 1 | 4.87E-06 | mid | U1-WD1 | U1-WD2 |
| 36 | UTP1(96)-UTP1(129) | 14 | 7 | 0 | 7 | 2.59E-12 | high | U1-WD1 | U1-WD1 |
| 37 | UTP1(166)-UTP1(264) | 29 | 16 | 4 | 9 | 2.33E-12 | high | U1-WD1 | U1-WD1 |
| 38 | UTP1(180)-UTP1(96) | 17 | 4 | 0 | 13 | 7.27E-15 | high | U1-WD1 | U1-WD1 |
| 39 | UTP1(180)-UTP1(129) | 6 | 4 | 2 | 0 | 9.12E-08 | high | U1-WD1 | U1-WD1 |
| 40 | UTP1(211)-UTP1(264) | 16 | 7 | 2 | 7 | 2.95E-14 | high | U1-WD1 | U1-WD1 |
| 41 | UTP1(264)-UTP1(674) | 9 | 0 | 0 | 9 | 3.26E-08 | high | U1-WD1 | U1-WD2 |
| 42 | UTP1(536)-UTP1(557) | 4 | 0 | 1 | 3 | 2.55E-14 | high | U1-WD2 | U1-WD2 |
| 43 | UTP1(536)-UTP1(572) | 9 | 8 | 1 | 0 | 1.78E-09 | high | U1-WD2 | U1-WD2 |
| 44 | UTP1(572)-UTP1(557) | 22 | 11 | 2 | 9 | 2.52E-16 | high | U1-WD2 | U1-WD2 |
| 45 | UTP1(572)-UTP1(674) | 55 | 31 | 6 | 18 | 4.65E-10 | mid | U1-WD2 | U1-WD2 |
| 46 | UTP1(753)-UTP1(733) | 64 | 26 | 9 | 29 | 8.07E-32 | high | U1-CTD | U1-CTD |
| 47 | UTP1(46)-UTP1(27) | 3 | 0 | 0 | 3 | 1.64E-07 | low | U1-WD1 | U1-WD1 |
| 48 | UTP1(46)-UTP1(85) | 3 | 0 | 0 | 3 | 9.56E-03 | low | U1-WD1 | U1-WD1 |
| | UTP18-UTP18 | | | | | | | | |
| 49 | UTP18(154)-UTP18(134) | 7 | 7 | 0 | 0 | 8.89E-13 | high | U18-NTD | U18-NTD |
| 50 | UTP18(154)-UTP18(170) | 13 | 7 | 0 | 6 | 4.17E-12 | high | U18-NTD | U18-NTD |
| 51 | UTP18(585)-UTP18(538) | 31 | 14 | 6 | 11 | 3.36E-08 | high | U18-WD | U18-WD |
| | UTP6-UTP6 | | | | | | | | |
| 52 | UTP6(389)-UTP6(439) | 21 | 12 | 2 | 7 | 4.12E-12 | high | U6-CTD | U6-CTD |
| 53 | UTP6(397)-UTP6(389) | 9 | 5 | 0 | 4 | 6.78E-18 | high | U6-CTD | U6-CTD |
| 54 | UTP6(321)-UTP6(361) | 3 | 0 | 3 | 0 | 1.60E-09 | high | U6-CTD | U6-CTD |
| | UTP13-UTP12 | | | | | | | | |
| 55 | UTP12(381)-UTP13(555) | 14 (2) | 8 | 1 | 5 | 2.29E-10 | high | U12-WD2 | U13-WD2 |
| 56 | UTP12(855)-UTP13(751) | 16 (2) | 0 | 7 | 9 | 1.30E-14 | high | U12-CTD | U13-CTD |
| 57 | UTP13(533)-UTP12(381) | 11 | 6 | 0 | 5 | 2.99E-08 | high | U13-WD2 | U12-WD2 |
| 58 | UTP13(546)-UTP12(515) | 22 | 9 | 6 | 7 | 1.41E-11 | high | U13-WD2 | U12-WD2 |
| 59 | UTP13(699)-UTP12(843) | 18 | 11 | 7 | 0 | 2.39E-05 | high | U13-CTD | U12-CTD |
| 60 | UTP13(780)-UTP12(909) | 44 | 29 | 6 | 9 | 5.51E-11 | high | U13-CTD | U12-CTD |
| 61 | UTP13(815)-UTP12(866) | 4 | 0 | 1 | 3 | 1.42E-12 | high | U13-CTD | U12-CTD |
| 62 | UTP13(815)-UTP12(884) | 9 | 0 | 6 | 3 | 2.34E-06 | high | U13-CTD | U12-CTD |
| 63 | UTP13(741)-UTP12(111) | 17 (2) | 4 | 10 | 3 | 5.22E-11 | high | U13-CTD | U12-WD1 |
| 64 | UTP13(815)-UTP12(877) | 3 | 0 | 3 | 0 | 2.90E-08 | high | U13-CTD | U12-CTD |
| | UTP12-UTP21 | | | | | | | | |
| 65 | UTP21(890)-UTP12(906) | 4 | 0 | 0 | 4 | 5.87E-07 | high | U21-CTD | U12-CTD |
| | UTP21-UTP1 | | | | | | | | |
| 66 | UTP1(6)-UTP21(661) | 11 | 6 | 3 | 2 | 2.48E-12 | high | U1-WD2 | U21-CTD |
| 67 | UTP1(27)-UTP21(730) | 9 | 4 | 3 | 2 | 1.09E-27 | high | U1-WD2 | U21-CTD |
| 68 | UTP1(96)-UTP21(382) | 49 | 23 | 11 | 15 | 6.73E-24 | high | U1-WD1 | U21-WD2 |
| 69 | UTP1(129)-UTP21(9) | 18 | 7 | 3 | 8 | 7.37E-13 | high | U1-WD1 | U21-WD2 |
| 70 | UTP1(129)-UTP21(102) | 23 | 11 | 4 | 8 | 2.30E-06 | high | U1-WD1 | U21-WD1 |
| 71 | UTP21(382)-UTP1(129) | 25 | 6 | 4 | 15 | 9.83E-19 | high | U21-WD2 | U1-WD1 |
| 72 | UTP21(661)-UTP1(85) | 7 | 1 | 0 | 6 | 3.39E-16 | high | U21-CTD | U1-WD1 |
| | UTP1-UTP18 | | | | | | | | |
| 73 | UTP1(572)-UTP18(418) | 6 | 4 | 0 | 2 | 1.09E-03 | mid | U1-WD2 | U18-WD |
| | UTP21-UTP18 | | | | | | | | |
| 74 | UTP21(245)-UTP18(341) | 5 | 3 | 2 | 0 | 2.12E-06 | high | U21-WD1 | U18-WD |
| 75 | UTP21(288)-UTP18(341) | 10 | 10 | 0 | 0 | 2.16E-09 | high | U21-WD1 | U18-WD |
| 76 | UTP21(408)-UTP18(538) | 3 | 3 | 0 | 0 | 3.24E-07 | high | U21-WD2 | U18-WD |
| | UTP1-UTP12 | | | | | | | | |
| 77 | UTP1(800)-UTP12(884) | 24 | 24 | 0 | 0 | 5.67E-08 | high | U1-CTD | U12-CTD |
| | UTP1-UTP6 | | | | | | | | |
| 78 | UTP6(65)-UTP1(572) | 4 | 4 | 0 | 0 | 1.94E-05 | low | U6-NTD | U1-WD2 |
| | UTP6-UTP13 | | | | | | | | |
| 79 | UTP13(751)-UTP6(72) | 5 | 0 | 0 | 5 | 4.09E-03 | low | U13-CTD | U6-NTD |

Note: Cross-link identifications were filtered by requiring 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 3 spectral observations. Highlighted in grey are eight cross-links that are observed in only one experiment and are associated with either low-quality spectra or only 3 spectral copies; they are not shown in Fig. 2c. WD1: WD domain 1; WD2: WD domain 2; NTD: N-terminal domain; CTD: C-terminal domain. Domains of each protein were predicted. Utp1, Utp12, Utp13 and Utp21 contain tandem WD domains that were modeled according to the AIP1 structure (PDB code: 1PI6), where the first strand of WD repeat 1 is paired with the last strand of WD repeat 14 in WD domain 2. Utp6 is composed of NTD (residues 1-206) and CTD (207-440). Utp18 is composed of NTD (1-224) and WD domain (225-594). Utp1 is composed of WD1 (18-346), WD2 (1-17, 347-707) and CTD (708-923). Utp21 is composed of WD1 (33-352), WD2 (1-32, 353-656) and CTD (657-939). Utp12 is composed of WD1 (20-361), WD2 (1-19, 362-687) and CTD (688-943). Utp13 is composed of WD1 (19-339), WD2 (1-18, 340-648) and CTD (649-817).



Supplementary Table 10. CXMS analysis of C. elegans FIB-1::GFP IP

| No. | Protein1-Protein2 | Peptide1-Peptide2 | #total spec | #spec exp_1 | #spec exp_2 | Cα-Cα distance (Å) | Best E-value | Manual eval. of spec qual. |
|---|---|---|---|---|---|---|---|---|
| 1 | ce_Fib1(115)-ce_Fib1(133) | GGKTVVVEPHR(3)-GKEDALATK(2) | 10 | 5 | 5 | ~8.7 | 5.30E-12 | high |
| 2 | ce_Snu13(21)-ce_Snu13(118) | AFPLADTNLSQKLMDLVQQAMNYK(12)-SQIQKIKEDVEK(5) | 13 | 5 | 8 | 9.3 | 4.80E-10 | high |
| 3 | ce_Nop56(161)-ce_Fib1(236) | VKFDVHR(2)-DLLGVAKK(7) | 7 | 4 | 3 | 17.1 | 5.34E-08 | high |
| | | SKVKFDVHR(4)-DLLGVAKK(7) | 3 | 0 | 3 | 17.1 | 1.61E-11 | high |
| 4 | ce_Nop58(172)-ce_Nop56(183) | IDTMIVQAVSLLDDLDKELNNYVMR(17)-VDNMVIQSIALLDQLDKDINLFGMR(17) | 3 | 3 | 0 | 11.4 | 4.34E-09 | high |

Note: Cross-link identifications were filtered by requiring 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 3 spectral observations. Highlighted in yellow are cross-links between Nop58 and Nop56, and between Nop56 and FIB-1. The Cα-Cα distances were measured on the equivalent residues in an archaeal C/D RNP structure (PDB code 3PLA), assuming that Nop56 and Nop58 form a heterodimer. Residue 115 of FIB-1 is not present in the archaeal structure and is approximated by the equivalent archaeal residue of FIB-1 residue 117. The gene names for C. elegans FIB-1, Nop56, Nop58, and Snu13 are T01C3.7, K07C5.4, W01B11.3/nol-5, and M28.5.



Supplementary Table 11. Inter-linked peptides identified from E. coli lysates. Filtering criteria: mass accuracy 10 ppm, FDR < 5%, E-value < 0.01. Cross-links that may be interpreted as either intra-molecular or inter-molecular are taken as intra-molecular. In column “Note”, Yes-TAP denotes interactions verified by affinity purification/mass spec analysis (experimental datasets in reference 23), while Yes-Y2H or NO-Y2H indicates a positive or negative Y2H test result (this paper).

Columns: No.; ID_protein1-protein2; Name_protein1-protein2; Sequence_pep1-pep2; #Spec exp_1; #Spec exp_2; Best E-value; Struct. in PDB?; Cα-Cα < 24 Å?; Note

Inter-molecular cross-links

1 gb|AAC73894.2|(278)-‐ref| NP_417064.1|(2)/ ref| NP_418081.1|(2)/ gb| AAB40481.1 |(108)

23S rRNA mA1618 methyltransferase, SAM-dependent |(278)-back-translocating Elongation Factor EF4, GTPase(2)/ lipopolysaccharide core biosynthesis protein|(2)/ ORF_o346 predicted oxidoreductase|(108)

KEMAQGQK(1)-‐KNIR(1) 1 4.88E-‐05 NO

2 gb|AAC73820.1|(476)-‐gb|AAC74882.1|(274)

2-‐oxoglutarate decarboxylase, thiamin-‐requiring |(476)-‐aminodeoxychorismate

synthase, subunit I |(274)

HGHNEADEPSATQPLMYQKIK(19)-‐PIKGTLPR(3)

1 2.59E-‐05 NO

3 gb|AAC73820.1|(476)-‐gb|AAC74659.1|(748)

2-‐oxoglutarate decarboxylase, thiamin-‐requiring |(476)-‐probable selenate

reductase, periplasmic |(748)

HGHNEADEPSATQPLMYQKIK(19)-‐LPAKVTPR(4)

1 5.11E-‐07 NO

4 gb|AAC73997.1|(1)-‐gb|AAC73280.1|(11)

30S ribosomal subunit protein S1 |(1)-‐30S ribosomal subunit protein S2 |(11)

MTESFAQLFEESLK(1)-‐DMLKAGVHFGHQTR(4)

2 2.68E-‐05 NO

5 gb|AAC73997.1|(158)-‐gb|AAA97098.1|(9)


VIKLDQK(3)-‐KFCR(1) 1 4.50E-‐08 NO

6 gb|AAC73997.1|(162)-‐gb|AAA97098.1|(9)


LDQKR(4)-‐KFCR(1) 1 5.88E-‐06 NO

7 gb|AAA58139.1|(108)-‐gb|AAC74323.1|(18)

30S ribosomal subunit protein S12 |(108)-‐fused acetaldehyde-‐CoA

dehydrogenase/iron-‐dependent alcohol dehydrogenase/pyruvate-‐formate lyase

deactivase |(18)

GALDCSGVKDR(9)-‐KAQR(1)

1 3.87E-‐08 NO

8 gb|AAA58139.1|(108)-‐gb|AAC75516.1|(440)

30S ribosomal subunit protein S12 |(108)-‐fused malic enzyme predicted oxidoreductase/predicted

phosphotransacetylase |(440)

GALDCSGVKDR(9)-‐KAPKR(1)

2 7.22E-‐05 NO

9 gb|AAA58139.1|(44)-‐gb|AAC73134.1|(19)


KPNSALR(1)-‐KHNASR(1) 7 5.21E-‐08 YES NO

10 gb|AAA58139.1|(44)-‐gb|AAA57987.1|(85)

30S ribosomal subunit protein S12 |(44)-‐50S ribosomal subunit protein L21 |(85)

KPNSALR(1)-‐KQQGHR(1) 3 2.51E-‐07 NO

11 gb|AAC75658.1|(13)-‐gb|AAA69236.1|(24)

30S ribosomal subunit protein S16 |(13)-‐CG Site no. 33104: hypA, hydrogenase nickel incorporation protein ORF_o116

|(24)

KRPFYQVVVADSR(1)-‐HGAKR(4)

3 9.73E-‐08 NO

12 gb|AAA97098.1|(30)-‐gb|AAA89145.1|(5)


DIATLKNYITESGK(6)-‐PVIKVR(4)

1 7.37E-‐07 YES YES 16.9 Å

13 gb|AAA97098.1|(30)-‐gb|AAA97096.1|(104)


DIATLKNYITESGK(6)-‐HAVTEASPMVKAK(11)

1 3.09E-‐04 YES NO

14 gb|AAA58113.1|(21)-‐ref|NP_416245.2|(2)

30S ribosomal subunit protein S19 |(21)-‐cell division modulator |(2)

VEKAVESGDK(3)-‐KKPLR(1) 1 3.24E-‐04 YES YES 10.2 Å

15 gb|AAA58113.1|(21)-‐gb|AAC74441.1|(214)

30S ribosomal subunit protein S19 |(21)-‐Rac prophage; conserved protein |(214)

KVEKAVESGDK(4)-‐KPIR(1) 3 8.57E-‐09 NO

16 gb|AAC73134.1|(16)-‐gb|AAC75676.1|(214)/gb

|AAA79797.1|(213)

30S ribosomal subunit protein S20 |(16)-‐CP4-‐57 prophage; predicted protein

|(214)/ORF_f538|(213)

AIQSEKAR(6)-‐KGEHSR(1) 1 1.18E-‐04 NO

17 gb|AAA89145.1|(25)-‐gb|AAA97098.1|(9)


SCEKAGVLAEVR(4)-‐KFCR(1)

3 1.03E-‐08 YES NO

18 gb|AAA89145.1|(54)-‐gb|AAA57986.1|(4)/ref|N

P_415615.1|(182)

30S ribosomal subunit protein S21 |(54)-‐50S ribosomal subunit protein L27 |(4)/

predicted aminodeoxychorismate lyase|(182)

ASAVKR(5)-‐AHKK(3) 2 1.65E-‐05 NO

19 gb|AAA58111.1|(108)-‐gb|AAA58112.1|(16)

30S ribosomal subunit protein S3 |(108)-‐50S ribosomal subunit protein L22 |(16)

KPELDAK(1)-‐SSAQKVR(5) 3 1.76E-‐08 NO

20 gb|AAA97096.1|(93)-‐gb|AAA97098.1|(50)


TKHAVTEASPMVK(2)-‐AKYQR(2)

2 8.87E-‐07 YES YES 11.7 Å

21 gb|AAA58138.1|(11)-‐gb|AAA58032.1|(100)


KILPDPK(1)-‐KAGFVTR(1) 7 30 4.36E-‐15 YES YES 17.2 Å

22 gb|AAA58138.1|(131)-‐ref|YP_025294.2|(197)

30S ribosomal subunit protein S7 |(131)-‐acetolactate synthase III, large subunit

|(197)

LANELSDAAENKGTAVKK(12)-‐GQIKR(4)

1 5.25E-‐07 NO

23 gb|AAA58138.1|(131)-gb|AAA69125.1|(122)

30S ribosomal subunit protein S7 |(131)-ORF_f239; was ORF_f191 and ORF_f194 before splice |(122)

LANELSDAAENKGTAVK(12)-LTKLDAQLK(3)

1 6.41E-06 NO

24 gb|AAA58138.1|(131)-‐gb|AAC74772.1|(667)

30S ribosomal subunit protein S7 |(131)-‐phosphoenolpyruvate synthase |(667)

LANELSDAAENKGTAVKK(12)-‐QGLKR(4)

1 1.88E-‐06 NO

25 gb|AAA58138.1|(131)-‐gb|AAC73932.1|(11)

30S ribosomal subunit protein S7 |(131)-‐predicted transporter |(11)

LANELSDAAENKGTAVKK(12)-‐NALKR(4)

9 1.33E-‐10 NO

26 gb|AAA58138.1|(131)-‐gb|AAC74957.1|(56)

30S ribosomal subunit protein S7 |(131)-‐purine-‐binding chemotaxis protein |(56)

LANELSDAAENKGTAVKK(12)-‐IANTPAFIKGVTNLR(9)

2 2.05E-‐10 NO

27 gb|AAA58138.1|(131)-‐gb|AAB18602.1|(181)

30S ribosomal subunit protein S7 |(131)-‐rfaY; lipopolysaccharide core biosynthesis

protein |(181)

LANELSDAAENKGTAVKK(12)-‐IIDLSGKR(7)

1 3.89E-‐06 NO

28 gb|AAA58138.1|(136)-‐gb|AAC74957.1|(56)

30S ribosomal subunit protein S7 |(136)-‐purine-‐binding chemotaxis protein |(56)

LANELSDAAENKGTAVKK(17)-‐IANTPAFIKGVTNLR(9)

1 4.79E-‐06 NO

29 gb|AAA58103.1|(41)-‐gb|AAC75616.1|(1)/ gb|AAA79825.1|(1)

30S ribosomal subunit protein S8 |(41)-‐holo-‐[acyl-‐carrier-‐protein] synthase 1

|(1)/ dpj|(1)

VAIANVLKEEGFIEDFK(8)-‐MAILGLGTDIVEIAR(1)

1 1.72E-‐05 NO

30 gb|AAA58103.1|(50)-‐gb|AAC73418.1|(226)

30S ribosomal subunit protein S8 |(50)-‐c-‐di-‐GMP-‐specific phosphodiesterase |(226)

VAIANVLKEEGFIEDFKVEGDTK(17)-‐LGNDKIK(5)

5 1.50E-‐05 NO

31 gb|AAA58098.1|(29)-‐gb|AAA57987.1|(85)

50S ribosomal subunit protein L15 |(29)-‐50S ribosomal subunit protein L21 |(85)

GIGSGLGKTGGR(8)-‐KQQGHR(1)

2 1.94E-‐09 YES YES 13.1 Å

32 gb|AAA58114.1|(183)-‐gb|AAA58118.1|(1)

50S ribosomal subunit protein L2 |(183)-‐30S ribosomal subunit protein S10 |(1)

KVEADCR(1)-‐MQNQR(1) 9 6.83E-‐08 NO

33 gb|AAA58114.1|(207)-‐gb|AAC73134.1|(19)


VLGKAGAAR(4)-‐KHNASR(1) 1 8.95E-‐09 NO

34 gb|AAA58114.1|(207)-‐gb|AAA97170.1|(6)

50S ribosomal subunit protein L2 |(207)-‐ORF_o111 |(6)

VLGKAGAAR(4)-‐NKWLR(2) 1 1.22E-‐05 NO

35 gb|AAA58114.1|(71)-‐gb|AAC75376.1|(84)

50S ribosomal subunit protein L2 |(71)-‐acetyl-‐CoA carboxylase, beta

(carboxyltransferase) subunit |(84)

NKDGIPAVVER(2)-‐DVLKFR(4)

1 9.01E-‐06 NO

36 gb|AAA57987.1|(85)-‐gb|AAA58118.1|(1)


KQQGHR(1)-‐MQNQR(1) 2 3.99E-‐07 NO

37 gb|AAA58112.1|(16)-‐gb|AAC73134.1|(19)


SSAQKVR(5)-‐KHNASR(1) 1 1.58E-‐08 NO

38 gb|AAA58112.1|(16)-‐gb|AAA57987.1|(85)


SSAQKVR(5)-‐KQQGHR(1) 2 1.60E-‐08 YES NO

39 gb|AAA58112.1|(16)-‐gb|AAA96986.1|(277)

50S ribosomal subunit protein L22 |(16)-‐ORF_f510; fused D-‐allose transporter

subunits of ABC superfamily: ATP-‐binding components|(277)

SSAQKVR(5)-‐KKVR(1) 1 3.62E-‐05 YES NO

40 gb|AAA58112.1|(16)-‐gb|AAA96986.1|(278)

50S ribosomal subunit protein L22 |(16)-‐ORF_f510; fused D-‐allose transporter

subunits of ABC superfamily: ATP-‐binding components|(278)

SSAQKVR(5)-‐KKVR(2) 1 3.26E-‐07 YES NO

41 gb|AAA57986.1|(19)-‐gb|AAC74308.1|(459)

50S ribosomal subunit protein L27 |(19)-‐nitrate reductase 1, alpha subunit |(459)

DSEAKR(5)-‐LPVKR(4) 4 2.50E-‐12 NO

42 gb|AAC76661.1|(10)-‐gb|AAC76752.1|(231)

50S ribosomal subunit protein L28 |(10)-‐L-‐glutamine:D-‐fructose-‐6-‐phosphate

aminotransferase |(231) VCQVTGKR(7)-‐TGAEVKR(6) 1 1.02E-‐08 NO

43 gb|AAC76661.1|(26)-‐gb|AAA58114.1|(207)


SHALNATKR(8)-‐VLGKAGAAR(4)

1 4.99E-‐09 YES NO

44 gb|AAA58109.1|(9)-‐gb|AAB18601.1|(132)

50S ribosomal subunit protein L29 |(9)-‐rfaZ; lipopolysaccharide core biosynthesis

protein |(132)

ELREKSVEELNTELLNLLR(5)-‐IKFNILR(2)

1 5.69E-‐05 NO

45 gb|AAA58117.1|(7)-‐gb|AAC75655.1|(2)


MIGLVGKK(7)-‐SNIIK(1) 2 3.82E-‐06 YES YES 13.6 Å

46 gb|AAA58096.1|(32)-‐gb|AAC77492.1|(298)

50S ribosomal subunit protein L36 |(32)-‐threonine deaminase |(298)

VICSAEPKHK(8)-‐KYIALHNIR(1)

1 9.72E-‐05 NO

47 gb|AAC43084.1|(82)-‐gb|AAC74011.1|(308)

50S ribosomal subunit protein L7/L12 |(82)-‐murein L,D-‐transpeptidase |(308)

GATGLGLKEAKDLVESAPAALK(8)-‐SKPAPAVR(2)

1 2.81E-‐03 NO

48 gb|AAC43084.1|(82)-‐gb|AAA58134.1|(13)

50S ribosomal subunit protein L7/L12 |(82)-‐ORF_f64; bacterioferritin-‐associated

ferredoxin|(13)

GATGLGLKEAKDLVESAPAALK(8)-‐KIRQAVR(1)

2 6.50E-‐04 NO

49 gb|AAC43084.1|(85)-‐gb|AAA58134.1|(13)

50S ribosomal subunit protein L7/L12 |(85)-‐ORF_f64; bacterioferritin-‐associated

ferredoxin|(13)

GATGLGLKEAKDLVESAPAALK(11)-‐KIRQAVR(1)

1 1.73E-‐03 NO

50 gb|AAA97099.1|(42)-‐gb|AAC76661.1|(44)


KNIEFFEAR(1)-‐FWVESEKR(7)

6 1.51E-‐12 YES YES 21.2 Å

51 gb|AAA97099.1|(89)-‐gb|AAA58114.1|(183)


AGDEGKLFGSIGTR(6)-‐KVEADCR(1)

2 1.66E-‐10 YES YES 22.7 Å

52 gb|AAC73576.1|(141)-‐ref|NP_415483.2|(1)

adenylate kinase |(141)-‐methylglyoxal synthase |(1)

FNPPKVEGKDDVTGEELTTR(5)-‐MELTTR(1)

4 2.82E-‐06 NO NO-‐Y2H

53 gb|AAC73576.1|(145)-‐gb|AAA57978.1|(2)

adenylate kinase |(145)-‐dihydropteroate synthase |(2)

VEGKDDVTGEELTTR(4)-‐LRGFFLSIHTR(1)

1 3.00E-‐05 NO

54 gb|AAC73576.1|(145)-‐ref|NP_415483.2|(1)

adenylate kinase |(145)-‐methylglyoxal synthase |(1)

FNPPKVEGKDDVTGEELTTR(9)-‐MELTTR(1)

1 1.91E-‐06 NO



55 gb|AAC73341.1|(389)-‐gb|AAC73134.1|(19)

aminoacyl-‐histidine dipeptidase (peptidase D) |(389)-‐30S ribosomal

subunit protein S20 |(19) LAGAKTEAK(5)-‐KHNASR(1) 1 3.66E-‐05 NO

56 gb|AAC74376.1|(369)-‐gb|AAA58106.1|(97)

antimicrobial peptide transport ABC transporter periplasmic binding protein |(369)-‐50S ribosomal subunit protein L24

|(97)

SREQLKSLGLENLTLK(6)-‐FFKSNSETIK(3)

1 2.26E-‐03 NO

57 gb|AAA97142.1|(179)-‐gb|AAA97141.1|(137)

aspartate carbamoyltransferase catalytic subunit |(179)-aspartate carbamoyltransferase regulatory subunit |(137)

TVHSLTQALAKFDGNR(11)-‐ANDIALKCK(7)

7 5.06E-‐20 YES YES 20.6 Å

58 gb|AAC75632.1|(62)-‐gb|AAC73989.1|(591)

autonomous glycyl radical cofactor |(62)-‐pyruvate formate lyase I |(591)

EVPVEVKPEVR(7)-‐IQKLHTYR(3)

3 5.56E-‐14 NO Yes-‐TAP

59 gb|AAC75632.1|(92)-‐gb|AAC75449.1|(81)

autonomous glycyl radical cofactor |(92)-‐conserved protein |(81)

HPEKYPQLTIR(4)-‐KAYERGYR(1)

1 1.35E-‐04 NO

60 gb|AAC73144.1|(366)-‐ref|NP_415790.1|(186)

carbamoyl-‐phosphate synthase large subunit|(366)-‐DNA topoisomerase I,

omega subunit|(186)

FNFEKFAGANDR(5)-‐KIAR(1)

*KIAR can be mapped to 3 other proteins with no evidence of binding to

AAC73144.1

1 6.36E-‐03 NO Yes-‐TAP

61 gb|AAC73144.1|(504)-‐ ref|NP_415790.1|(186)

carbamoyl-‐phosphate synthase large subunit |(504)-‐DNA topoisomerase I,

omega subunit|(186)

LAKLAGVR(3)-‐KIAR(1)

*same as above 3 5.32E-‐05 NO

Yes-‐TAP

62 gb|AAC73144.1|(940)-‐ref|NP_417006.2|(41)/ gb|AAC76893.1|(122)

carbamoyl-‐phosphate synthase large subunit |(940)-‐GTPase; multicopy

suppressor of ftsJ |(41)/ sensory histidine kinase in two-‐component regulatory

system with CpxR |(122)

AQLGSNSTMKK(10)-‐KYGR(1)

1 5.90E-‐07 NO

63 gb|AAA58092.1|(1)-‐gb|AAC73297.1|(437)

CG Site no. 234; RNA polymerase alpha subunit |(1)-‐lysine decarboxylase 2,

constitutive |(437)

MQGSVTEFLKPR(1)-‐KEVQR(1)

1 9.95E-‐04 NO

64 gb|AAB18435.1|(3)-‐gb|AAC75642.1|(2)

CG Site no. 551; Leu/Ile/Val-‐binding protein |(3)-‐conserved protein, UPF0124

family |(2)

LKNNITTHVITRR(2)-‐SKLIVPQWPQPK(1)

1 4.61E-‐06 NO

65 gb|AAA58136.1|(10)-‐gb|AAC75629.1|(411)

CG Site No. 61; translation elongation factor EF-‐Tu|(10)-‐ATP-‐dependent RNA

helicase |(411)

TKPHVNVGTIGHVDHGK(2)-‐EKEK(2)

2 1.21E-‐04 NO Yes-‐TAP

66 gb|AAA58136.1|(10)-‐gb|AAC73699.1|(2)

CG Site No. 61; translation elongation factor EF-‐Tu|(10)-‐carbon starvation

protein |(2)

TKPHVNVGTIGHVDHGK(2)-‐NKSGK(1)

1 9.92E-‐03 NO

67 gb|AAA58136.1|(57)-‐gb|AAB18579.1|(10)

CG Site No. 61; translation elongation factor EF-Tu|(57)-alternate gene name yibL |(10)

AFDQIDNAPEEKAR(12)-‐NEIKRLSDR(4)

2 4.14E-‐05 NO

68 gb|AAA58136.1|(57)-‐gb|AAB18493.1|(4) / gb|AAC76542.1|(4)

CG Site No. 61; translation elongation factor EF-‐Tu|(57)-‐GAD alpha protein |(4)/

glutamate decarboxylase A, PLP-‐dependent|(4)

AFDQIDNAPEEKAR(12)-‐DQKLLTDFR(3)

1 2.83E-‐05 NO

69 gb|AAA58136.1|(57)-‐ref|YP_025307.1|(1)

CG Site No. 61; translation elongation factor EF-‐Tu| (57)-‐multidrug efflux system

transporter |(1)

AFDQIDNAPEEKAR(12)-‐MQKYISEAR(1)

2 3.27E-‐06 NO Yes-‐Y2H

70 gb|AAA58136.1|(57)-‐gb|AAC74522.1|(3)

CG Site No. 61; translation elongation factor EF-‐Tu| (57)-‐polyhydroxybutyrate

(PHB) synthase, ABC transporter periplasmic binding protein homolog |(3)

AFDQIDNAPEEKAR(12)-‐MSKTFAR(3)

2 4.30E-‐06 NO Yes-‐Y2H

71 gb|AAA58136.1|(57)-‐ref|YP_026243.1|(62)

CG Site No. 61; translation elongation factor EF-Tu|(57)-predicted von Willebrand factor containing protein |(62)

AFDQIDNAPEEKAR(12)-‐SRLKDAR(4)

2 6.84E-‐07 NO Yes-‐Y2H

72 gb|AAA58137.1|(370)/-‐gb|AAC74021.1|(207)

CG Site No. 732; alternate name far; elongation factor EF-‐G|(370)-‐

alkanesulfonate monooxygenase, FMNH(2)-‐dependent |(207)

IVQMHANKR(8)-‐EKIEQVR(2)

1 1.78E-‐05 NO

73 gb|AAA58137.1|(423)-‐gb|AAA58099.1|(6)

CG Site No. 732; alternate name far; elongation factor EF-‐G|(423)-‐50S ribosomal subunit protein L30 |(6)

MEFPEPVISIAVEPKTKADQEK(15)-‐TIKITQTRSAIGR(3)

4 3.37E-‐09 YES NO

74 gb|AAA58137.1|(423)-‐gb|AAC73558.1|(25)

CG Site No. 732; alternate name far; elongation factor EF-‐G|(423)-‐conserved

protein, DUF1428 family |(25)

MEFPEPVISIAVEPKTK(15)-‐EMAAKAAPLFKEFGALR(5)

1 2.06E-‐06 NO

75 gb|AAA58165.1|(2)-‐ref|NP_416491.2|(2)

CG Site no. 893; siroheme synthase |(2)-predicted multidrug exporter, MATE family |(2)

DHLPIFCQLR(1)-‐WFHFLQLR(1)

1 9.48E-‐05 NO

76 gb|AAC75673.1|(141)-‐gb|AAC73702.1|(118)

CP4-‐57 prophage; predicted protein |(141)-‐conserved protein |(118)

LKELLTTNPKAPVR(2)-‐HEIGKGSSSLKLR(11)

1 1.05E-‐05 NO

77 gb|AAC75071.2|(328)-‐gb|AAC75496.1|(17)

D-alanyl-D-alanine carboxypeptidase (penicillin-binding protein 6b) |(328)-CPZ-55 prophage; predicted protein |(17)

AEIPHIKAKYTLDGK(7)-INTNKSPR(5)

1 1.79E-04 NO

78 gb|AAC74146.1|(9)-‐

ref|YP_026161.1|(120)

dihydro-‐orotase |(9)-‐RNA chaperone, probable regulator of ProP translation

|(120)

TAPSQVLKIRR(8)-‐AEQQAKK(6)

2 6.16E-‐07 NO

79 gb|AAC43085.1|(236)-‐gb|AAC75621.1|(30)

DNA-‐directed RNA polymerase, beta-‐subunit |(236)-‐leader peptidase (signal

peptidase I) |(30)

DNKLQMELVPER(3)-‐FFFAPKR(6)

1 4.36E-‐04 NO

80 gb|AAC43086.1|(74)-‐gb|AAC75459.1|(412)

DNA-‐directed RNA polymerase, beta'-‐subunit |(74)-‐xanthosine transporter

|(412) DYECLCGKYK(8)-‐IKHR(2) 3 5.81E-‐10 NO

81 gb|AAC76962.1|(781)-‐gb|AAC76961.1|(163)

DNA-‐directed RNA polymerase, beta prime subunit|(781)-‐RNA polymerase,

beta subunit|(163)

KGLADTALK(1)-‐GKTHSSGK(2)

1 1.61E-‐07 YES NO Yes-‐TAP

82 gb|AAC75562.1|(356)-‐gb|AAA58099.1|(19)

exonuclease VII, large subunit |(356)-‐50S ribosomal subunit protein L30 |(19)

LNQQNPQPKIHRAQTR(9)-‐LPKHK(3)

1 2.94E-‐03 NO

83 gb|AAC74462.1|(70)-‐gb|AAA58098.1|(129)

fermentative D-‐lactate dehydrogenase, NAD-‐dependent |(70)-‐50S ribosomal

subunit protein L15 |(129) HGVKYIALR(4)-‐VTKGAR(3) 2 2.76E-‐05 NO

84 gb|AAC73552.1|(380)-‐gb|AAA69082.1|(1)

fused predicted multidrug transporter subunits of ABC superfamily: ATP-‐binding

components |(380)-‐ORF_f76 |(1)

NFVALVGHTGSGKSTLASLLMGYYPLTEGEIR(13)-‐

MHFAQR(1) 1 5.22E-‐03 NO

85 gb|AAC75710.1|(459)-‐gb|AAC76725.1|(388)

gamma-‐aminobutyrate transporter |(459)-‐chromosomal replication initiator protein DnaA, DNA-‐binding transcriptional

dual regulator |(388)

LVLWQKTPVHNTR(6)-‐TVAEYYKIK(7)

1 5.54E-‐05 NO

86 gb|AAC74666.1|(2)-‐

ref|NP_418673.4|(142)/ gb|AAA97148.1|(145)

Global DNA-‐binding transcriptional repressor; autorepressor; required for anaerobic growth on glucosamine |(2)-‐biofilm modulator regulated by toxins

|(142)/ ORF_o153b|(145)

VAENQPGHIDQIKQTNAGAVYR(1)-‐KAVVK(1)

1 2.02E-‐04 NO

87 gb|AAB18608.1|(57)-‐gb|AAA58136.1|(57)

glucosyltransferase I |(57)-‐CG Site No. 61; translation elongation factor Tu;

translation elongation factor Tu |(57)

AFELIQVPVKSHTNHGR(10)-‐AFDQIDNAPEEKAR(12)

1 2.94E-‐06 NO

88 gb|AAC74849.1|(184)-‐gb|AAC75139.1|(2)

glyceraldehyde-‐3-‐phosphate dehydrogenase A |(184)-‐sensory histidine

kinase in two-‐component regulatory system with BaeR |(2)

VINDNFGIIEGLMTTVHATTATQKTVDGPSHK(24)-‐

KFWR(1) 9 2.49E-‐07 NO

89 gb|AAC74849.1|(61)-‐gb|AAA58005.1|(91)/ gb|AAC76235.1|(91)

glyceraldehyde-‐3-‐phosphate dehydrogenase A |(61)-‐ORF_o95 |(91)/ ribosome hibernation promoting factor HPF; stabilizes 70S dimers (100S) |(91)

FDGTVEVKDGHLIVNGKK(8)-‐HKDKLK(4)

1 1.86E-‐04 NO

90

gb|AAB03058.1|(488)-‐ref|NP_416294.4|(293)/ref|YP_026224.1|(1003)/gb|AAB18570.1|(1003)/ gb|AAC73794.1|(1003)/ gb|AAC76675.1|(200)

glycerol kinase |(488)-‐conserved protein |(293)/ RshB|(1003)/ rhsA|(1003)/ RshC

|(1003)/ tRNA mG18-2'-O-methyltransferase, SAM-dependent|(200)

YAGWKK(5)-‐VAKR(3) 1 7.51E-‐03 NO

91 gb|AAA97042.1|(117)-‐ref|NP_416801.2|(256)

GroEL protein |(117)-‐predicted inner membrane protein |(256)

AVAAGMNPMDLKR(12)-‐KNPLLSR(1)

2 2.30E-‐10 NO NO-‐Y2H

92 gb|AAA97042.1|(277)-‐gb|AAC74356.1|(659)

GroEL protein |(277)-‐DNA topoisomerase I, omega subunit |(659)

VAAVKAPGFGDR(5)-‐AKRR(2)

1 1.21E-‐05 NO

93 gb|AAA97042.1|(364)-‐gb|AAC75438.1|(1)

GroEL protein |(364)-‐valine-‐pyruvate aminotransferase 3 |(1)

QQIEEATSDYDREKLQER(14)-‐MADTRPER(1)

1 3.54E-‐03 NO

94 gb|AAA97042.1|(51)-‐gb|AAC73887.1|(8)

GroEL protein |(51)-‐predicted family 3 glycosyltransferase |(8)

SFGAPTITKDGVSVAR(9)-‐IIKEIGR(3)

1 5.15E-‐05 NO

95 gb|AAC75369.1|(234)-‐gb|AAA97125.1|(322)

histidine/lysine/arginine/ornithine transporter subunit |(234)-‐ORF_o417a;

ATP-‐binding component of ABC transporter superfamily |(322)

EALNKAFAEMR(5)-‐GKPQNLR(2)

1 24 1.30E-‐12 NO

96 gb|AAA97304.1|(53)/ gb|AAC73117.1|(190)-‐gb|AAC73227.1|(106)

hypothetical protein 126 of GenBank Accession Number D10483 (ECO110K)

|(53)/ Peroxide resistance protein, lowers intracellular iron|(190)-‐lipoamide

dehydrogenase, E3 component is part of three enzyme complexes |(106)

KLNAEIIKPVFLDEK(8)-‐VINQLTGGLAGMAKGR(14)

1 1.05E-‐03 NO

97 gb|AAC74001.1|(189)-‐gb|AAC74635.1|(1)

lipid A 4'kinase |(189)-‐Qin prophage; small toxic polypeptide |(1)

AGRLKSVDAVIVNGGVPR(5)-‐MKQQK(1)

1 6.28E-‐03 NO

98 gb|AAC73200.1|(83)-‐gb|AAC75219.1|(8)

Lipid II flippase; integral membrane protein involved in stabilizing FtsZ ring

during cell division |(83)-‐inner membrane protein, UPF0324 family |(8)

LTNDPFFFAKR(10)-‐TNITLQKQHR(7)

2 7.82E-‐06 NO Yes-‐Y2H

99 gb|AAA58215.1|(1)-‐

ref|NP_415376.4|(120)

maltodextrin phosphorylase |(1)-‐putrescine transporter subunit: ATP-‐

binding component of ABC superfamily |(120)

MSQPIFNDKQFQEALSR(1)-‐QDKLPKAEIASR(3)

1 2.77E-‐05 NO



100 gb|AAA58215.1|(254)-‐gb|AAC74429.1|(25)

maltodextrin phosphorylase |(254)-‐Rac prophage; predicted protein |(25)

AEQQGINAEKLTKVLYPNDNHTAGK(13)-‐SKLTK(2)

1 3.27E-‐05 NO

101 gb|AAC75211.1|(81)-‐gb|AAC75438.1|(2)

methyl-‐galactoside transporter subunit |(81)-‐valine-‐pyruvate aminotransferase 3

|(2)

QNDQIDVLLAKGVK(11)-‐ADTR(1)

1 4.50E-‐04 YES NO

102 gb|AAC74310.1|(173)-‐gb|AAC43066.1|(188)

molybdenum-‐cofactor-‐assembly chaperone subunit (delta subunit) of

nitrate reductase 1 |(173)-‐argininosuccinate lyase |(188)

LANTAIDSDKVAEK(10)-‐LQDALKR(6)

1 1.69E-‐06 NO

103 gb|AAC74925.1|(323)-‐gb|AAC73700.1|(361)

myristoyl-‐acyl carrier protein (ACP)-‐dependent acyltransferase |(323)-‐predicted oxidoreductase |(361)

KDLYPIK(7)-‐KVESFKA(6) 1 4.77E-‐05 NO

104 gb|AAA58162.1|(90)-‐gb|AAC75024.1|(138)

NADH-‐nitrate oxidoreductase apoprotein |(90)/-‐conserved inner membrane protein

|(138)

AITINRQEKVIHSSAGR(9)-‐LAAQDPLKFEK(8)

2 2.50E-‐05 NO

105 gb|AAC43111.1|(66)-‐gb|AAA96987.1|(183)

ORF_f728; ankyrin repeat protein |(66)-‐ORF_f311; D-‐allose transporter

subunit|(183)

HIFSNKDFVIK(6)-‐NGATEAFKK(8)

1 9.01E-‐03 NO

106 gb|AAC43111.1|(199)-‐gb|AAA69243.1|(791)

ORF_f728; ankyrin repeat protein |(199)-‐DNA mismatch repair protein |(791)

EALHDSLKR(8)-‐QKLR(1) 1 3.95E-‐05 NO

107

gb|AAC43111.1|(199)-‐gb|AAA69113.1|(2)/ gb|AAC76245.1|(376)/

ref|NP_415636.1|(16)/ref|NP_417354.1|(763)

ORF_f728; ankyrin repeat protein |(199)-‐ORF_o252 |(2)/ glutamate synthase, 4Fe-‐

4S protein, small subunit|(376)/ lipoprotein-‐releasing system

transmembrane protein|(16)/ predicted oxidoreductase, Fe-‐S subunit|(763)

EALHDSLKR(8)-‐GRRR(1) 1 5.18E-‐10 NO

108

gb|AAC43111.1|(199)-‐gb|AAA58124.1|(155)/

ref|NP_417786.1|(155)/gb|AAB18034.1|(191)/gb|

AAC73410.1|(191)

ORF_f728; ankyrin repeat protein |(199)-‐ORF_o398 |(155)/ general secretory pathway component, cryptic|(155)/ hypothetical protein|(191)/ predicted

electron transport protein with ferredoxin-like domain|(191)

EALHDSLKR(8)-‐QKIR(2) 1 1.00E-‐03 NO

109 gb|AAC74697.1|(9)-‐gb|AAA58115.1|(1)

oriC-‐binding complex H-‐NS/Cnu; binds 26 bp cnb site; also forms a complex with StpA |(9)-‐50S ribosomal subunit protein

L23 |(1)

TVQDYLLKFR(8)-‐MIREER(1) 1 9.21E-‐06 NO

110 gb|AAA69093.1|(14)-‐gb|AAC74589.1|(2)

phosphoglycerate kinase |(14)-‐autoinducer 2-‐binding protein |(2)

MTDLDLAGKR(9)-‐TLHRFKK(1)

3 8.97E-‐07 NO NO-‐Y2H

111 gb|AAA69093.1|(14)-‐gb|AAC73248.1|(94)

phosphoglycerate kinase |(14)-‐predicted fimbrial-‐like adhesin protein |(94)

MTDLDLAGKR(9)-‐KAQIKLTK(1)

1 6.97E-‐05 NO

112 ref|NP_416391.4|(96)-‐gb|AAA58068.1|(1)/ gb|AAC76296.1|(1)

predicted protein |(96)-‐ORF_f220 |(1)/ DNA-‐binding transcriptional regulator|(1)

KSQRAWLDFR(1)-‐MAKRTK(1)

1 1.45E-‐03 NO

113 gb|AAC74788.1|(77)-‐gb|AAC73245.1|(16)

protein chain initiation factor IF-‐3 |(77)-‐3-‐methyl-‐2-‐oxobutanoate

hydroxymethyltransferase |(16) FLYEKSK(5)-‐QEKK(3) 1 5.50E-‐05 NO

114 gb|AAA97280.1|(57)-‐ gb|AAC76634.1|(2)

purine-‐nucleoside phosphorylase |(57)-‐glutaredoxin 3 |(2)

GRKISVMGHGMGIPSCSIYTK(3)-‐ANVEIYTK(1)

2 4.64E-‐04 NO

115 ref|NP_416518.2|(2)-‐gb|AAC73708.1|(85)

putrescine importer, low affinity |(2)-‐universal stress protein UP12 |(85)

SHNVTPNTSR(1)-‐IKQHVR(2)

2 1.27E-‐13 NO Yes-‐Y2H

116 gb|AAC74647.1|(35)-‐ref|NP_415638.2|(2)/ gb|AAC75629.1|(412)

Qin prophage; cell division inhibition protein |(35)-‐deacetylase of acs and cheY,

regulates chemotaxis |(2)/ ATP-‐dependent RNA helicase|(412)

RKQER(2)-‐EKPR(1) 3 1.20E-‐05 NO

117 gb|AAA97208.1|(90)-‐gb|AAC73235.1|(685)

recombinase involved in phase variation |(90)-‐glucose dehydrogenase |(685)

EVQALKNWLSIR(6)-‐TNEVVWKK(7)

1 1.13E-‐06 NO

118 gb|AAC73823.1|(241)-‐gb|AAC73822.1|(1)

succinyl-‐CoA synthetase, NAD(P)-‐binding, alpha subunit |(241)-‐succinyl-‐CoA synthetase, beta subunit |(1)

EHVTKPVVGYIAGVTAPKGK(18)-‐MNLHEYQAK(1)

2 5.12E-‐14 YES YES 15.4 Å

119 gb|AAC74198.1|(340)-‐gb|AAA58137.1|(370)

transcription-‐repair coupling factor |(340)-‐CG Site No. 732; alternate name

far; elongation factor EF-‐G|(370)

VQLKTEHLPTK(4)-‐IVQMHANKR(8)

1 2.00E-‐06 NO

120 ref|YP_026188.1|(591)-‐gb|AAC74745.1|(48)/ gb|AAB47951.1|(48)

transketolase 1, thiamin-‐binding |(591)-‐predicted protein |(48)/ hypothetical

protein |(48)

VVSMPSTDAFDKQDAAYR(12)-‐TALANKRIQR(6)

1 1.46E-‐06 NO

121 gb|AAB18523.1|(1)-‐gb|AAC73164.1|(394)

unnamed protein product |(1)-‐peptidyl-‐prolyl cis-‐trans isomerase (PPIase) |(394)

MLSPVCPGFVCMR(1)-‐TDAAQKDR(6)

4 2.62E-‐06 NO

122 gb|AAC76834.1|(40)-‐gb|AAA97161.1|(2)

uridine phosphorylase |(40)-‐ORF_f332 |(2); DNA-‐binding transcriptional repressor, 5-‐gluconate-‐binding(2)

IAALMDKPVK(7)-‐RNHR(1) 2 3.99E-‐06 NO

123 ref|NP_416570.2|(151)/ gb|AAC73670.1|(46)/gb|AAC73678.1|(264)-gb|AAC73565.1|(3)

uridine/cytidine kinase |(151)/ bacteriophage N4 receptor, inner membrane subunit |(46)/mechanosensitive channel protein, miniconductance |(264)-multidrug efflux system |(3)

RIKR(3)-NKNR(2) 1 2.90E-06 NO

124 gb|AAA57977.1|(392)-‐gb|AAB18536.1|(674)

yhbF; phosphoglucosamine mutase |(392)-‐glycine-‐tRNA synthetase, beta

subunit |(674)

YTAGSGDPLEHESVKAVTAEVEAALGNR(15)-‐LTMLEKLR(6)

1 7.55E-‐05 NO

Intra-‐molecular Cross-‐links

125 gb|AAC75568.1|(1)-‐gb|AAC75568.1|(11)

1-‐hydroxy-‐2-‐methyl-‐2-‐(E)-‐butenyl 4-‐diphosphate synthase |(1)-‐1-‐hydroxy-‐2-‐methyl-‐2-‐(E)-‐butenyl 4-‐diphosphate

synthase |(11)

MHNQAPIQR(1)-‐KSTR(1) 1 3 7.17E-‐06 NO

126 gb|AAC73277.1|(254)-‐gb|AAC73277.1|(263)

2,3,4,5-‐tetrahydropyridine-‐2-‐carboxylate N-‐succinyltransferase |(254)-‐2,3,4,5-‐tetrahydropyridine-‐2-‐carboxylate N-‐

succinyltransferase |(263)

YSLYCAVIVKK(10)-‐GKVGINELLR(2)

3 2.23E-‐22 NO

127 gb|AAC73277.1|(263)-‐gb|AAC73277.1|(259)

2,3,4,5-‐tetrahydropyridine-‐2-‐carboxylate N-‐succinyltransferase |(263)-‐2,3,4,5-‐tetrahydropyridine-‐2-‐carboxylate N-‐

succinyltransferase |(259)

GKVGINELLR(2)-‐VDAKTR(4)

6 4 3.23E-‐09 NO

128 gb|AAC73820.1|(54)-‐gb|AAC73820.1|(71)

2-‐oxoglutarate decarboxylase, thiamin-‐requiring |(54)-‐2-‐oxoglutarate

decarboxylase, thiamin-‐requiring |(71)

STFQQLPGTGVKPDQFHSQTR(12)-‐LAKDASR(3)

3 7.49E-‐05 YES NO

129 gb|AAC73997.1|(260)-‐gb|AAC73997.1|(247)


VSLGLKQLGEDPWVAIAK(6)-‐VLKFDR(3)

2 1.49E-‐05 NO

130 gb|AAC73997.1|(279)-‐gb|AAC73997.1|(347)


YPEGTKLTGR(6)-‐ISLGLKQCK(6)

3 3.05E-‐12 YES NO

131 gb|AAA58118.1|(59)-‐gb|AAA58118.1|(11)


FTVLISPHVNKDAR(11)-‐LKAFDHR(2)

6 1.96E-‐22 YES NO

132 gb|AAA58118.1|(82)-‐gb|AAA58118.1|(1)


LVDIVEPTEKTVDALMR(10)-‐MQNQR(1)

3 7.54E-‐07 YES NO

133 gb|AAA58118.1|(82)-‐gb|AAA58118.1|(30)


LVDIVEPTEKTVDALMR(10)-‐LIDQATAEIVETAKR(14)

1 6.67E-‐10 YES YES 10.5 Å

134 gb|AAA58139.1|(108)-‐gb|AAA58139.1|(120)


GALDCSGVKDR(9)-‐YGVKRPK(4)

1 2.95E-‐11 YES YES 9.6 Å

135 gb|AAA58139.1|(108)-‐gb|AAA58139.1|(51)


GALDCSGVKDR(9)-‐KVCR(1) 4 8.06E-‐07 YES YES 15.9 Å

136 gb|AAA58139.1|(108)-‐ref|NP_417045.4|(269)

30S ribosomal subunit protein S12 |(108)-‐predicted DNA-‐binding transcriptional

regulator |(269)

GALDCSGVKDR(9)-‐KQAR(1)

13 4.69E-‐07 YES YES 6.1 Å

137 gb|AAA58139.1|(51)-‐ref|NP_417045.4|(269)

30S ribosomal subunit protein S12 |(51)-‐predicted DNA-‐binding transcriptional

regulator |(269) KVCR(1)-‐KQAR(1) 3 3.95E-‐06 YES YES 21.6 Å

138 gb|AAA58093.1|(103)-‐ref|NP_416896.4|(571)

30S ribosomal subunit protein S13 |(103)-‐predicted diguanylate cyclase |(571)

TKTNAR(2)-‐KGPR(1) 1 3 3.69E-‐06 YES YES 12.2 Å

139 gb|AAA58093.1|(110)-‐gb|AAA58093.1|(103)


TRKGPR(3)-‐TKTNAR(2) 5 2.05E-‐06 YES YES 12.2 Å

140 gb|AAA58093.1|(78)-‐gb|AAA58093.1|(103)


EISMSIKR(7)-‐TKTNAR(2) 5 2.83E-‐07 YES YES 22.5 Å

141 gb|AAA58093.1|(78)-‐ref|NP_416896.4|(571)

30S ribosomal subunit protein S13 |(78)-‐predicted diguanylate cyclase |(571)

EISMSIKR(7)-‐KGPR(1) 1 1.13E-‐07 YES YES 19.0 Å

142 gb|AAA97098.1|(50)-‐gb|AAA97098.1|(9)


AKYQR(2)-‐KFCR(1) 3 2.55E-‐11 YES NO

143 gb|AAA58113.1|(29)-‐gb|AAA58113.1|(18)


AVESGDKKPLR(8)-‐KVEK(1) 1 2.58E-‐04 YES YES 14.4 Å

144 gb|AAC73280.1|(11)-‐gb|AAC73280.1|(2)


DMLKAGVHFGHQTR(4)-‐ATVSMR(1)

6 1.14E-‐12 YES NO

145 gb|AAC73280.1|(115)-‐gb|AAC73280.1|(66)


LKDLETQSQDGTFDK(2)-‐KGKILFVGTK(3)

1 2.15E-‐03 YES YES 19.8 Å

146 gb|AAC73280.1|(115)-‐ gb|AAC73280.1|(112)

30S ribosomal subunit protein S2 |(115)-‐30S ribosomal subunit protein S2|(112)

LKDLETQSQDGTFDK(2)-‐QSIKR(4)

1 8.91E-‐05 YES YES 5.4 Å

147 gb|AAC73134.1|(16)-‐gb|AAC73134.1|(19)


AIQSEKAR(6)-‐KHNASR(1) 2 20 3.04E-‐08 YES YES 5.0 Å

148 gb|AAC73134.1|(16)-‐gb|AAC73134.1|(19)


RAIQSEKAR(7)-‐KHNASR(1) 5 4.17E-‐09 YES YES 5.0 Å

149 gb|AAC73134.1|(16)-‐gb|AAC73134.1|(19)


AIQSEKAR(6)-‐KHNASRR(1) 6 1.89E-‐07 YES YES 5.0 Å

150 gb|AAC73134.1|(49)-‐gb|AAC73134.1|(34)


AAAQKAFNEMQPIVDR(5)-‐KVYAAIEAGDK(1)

2 7.28E-‐13 YES YES 9.8 Å

151 gb|AAC73134.1|(5)-‐gb|AAC73134.1|(19)


ANIKSAK(4)-‐KHNASR(1) 2 5.44E-‐08 YES YES 23.0 Å

152 gb|AAC73134.1|(64)-‐gb|AAC73134.1|(71)


QAAKGLIHK(4)-‐NKAAR(2) 3 8.11E-‐05 YES YES 8.6 Å

153 gb|AAA58111.1|(108)-‐gb|AAA58111.1|(147)


KPELDAK(1)-‐LGAKGIK(4) 7 5.45E-‐11 YES YES 15.9 Å

154 gb|AAA58111.1|(49)-‐gb|AAA58111.1|(79)


ELAKASVSR(4)-‐PGIVIGKK(7)

3 1.04E-‐08 YES YES 17.3 Å



155 gb|AAA58094.1|(156)-‐ref|NP_418672.4|(5)

30S ribosomal subunit protein S4 |(156)-‐predicted transcriptional regulator |(5)

VKAALELAEQR(2)-‐KQSR(1) 3 4 7.88E-‐06 YES YES 9.3 Å

156 gb|AAA58094.1|(167)-‐gb|AAA58094.1|(183)


EKPTWLEVDAGK(2)-‐MEGTFKR(6)

1 5 1.41E-‐05 YES YES 11.5 Å

157 gb|AAA58094.1|(83)-‐gb|AAA58094.1|(185)


LKGNTGENLLALLEGR(2)-‐KPER(1)

3 27 1.21E-‐07 YES YES 14.3 Å

158 gb|AAA58094.1|(83)-‐gb|AAA58094.1|(77)


LKGNTGENLLALLEGR(2)-‐NYYKEAAR(4)

2 1.66E-‐06 YES YES 11.7 Å

159 gb|AAA58138.1|(137)-‐gb|AAA58138.1|(110)


KREDVHR(1)-‐KRGDK(1) 1 6.75E-‐06 YES YES 16.6 Å

160 gb|AAA58103.1|(41)-‐gb|AAA58103.1|(56)


VAIANVLKEEGFIEDFK(8)-‐VEGDTKPELELTLK(6)

9 1.44E-‐07 YES YES 23.5 Å

161 gb|AAA58032.1|(100)-‐gb|AAA58032.1|(13)


KAGFVTR(1)-‐RKSSAAR(2) 1 1.43E-‐08 YES YES 18.5 Å

162 gb|AAA58032.1|(2)-‐gb|AAA58032.1|(100)


AENQYYGTGR(1)-‐KAGFVTR(1)

12 2.28E-‐12 YES NO

163 gb|AAA58032.1|(2)-‐gb|AAA58032.1|(13)


AENQYYGTGR(1)-‐RKSSAAR(2)

1 4.33E-‐05 YES NO

164 gb|AAA58032.1|(2)-‐gb|AAA58032.1|(13)


AENQYYGTGRR(1)-‐KSSAAR(1)

13 2.16E-‐12 YES NO

165 gb|AAA58032.1|(2)-‐gb|AAA58032.1|(22)


AENQYYGTGR(1)-‐VFIKPGNGK(4)

5 10 7.98E-‐12 YES NO

166 gb|AAC43082.1|(167)-‐gb|AAC43082.1|(54)


YRNDKNGIIHTTIGK(5)-‐KSDQNVR(1)

3 9.04E-‐21 YES YES 7.3 Å

167 gb|AAC43082.1|(167)-‐gb|AAC43082.1|(54)


NDKNGIIHTTIGK(3)-‐KSDQNVR(1)

3 7.37E-‐17 YES YES 7.3 Å

168 gb|AAC43082.1|(205)-‐gb|AAC43082.1|(54)


AKPTQAKGVYIK(7)-‐KSDQNVR(1)

8 3.12E-‐19 YES YES 17.2 Å

169 gb|AAC43082.1|(205)-‐gb|AAC43082.1|(54)


PTQAKGVYIK(5)-‐KSDQNVR(1)

8 9.67E-‐20 YES YES 17.2 Å

170 gb|AAC43082.1|(54)-‐gb|AAC43082.1|(210)


KSDQNVR(1)-‐GVYIKK(5) 1 2.32E-‐04 YES YES 20.0 Å

171 gb|AAC43083.1|(37)-‐gb|AAC43083.1|(105)


GVTVDKMTELR(6)-‐ANAKFEVK(4)

2 5.59E-‐09 NO

172 gb|AAA58091.1|(78)-‐gb|AAA58091.1|(121)


TRDNEIVAKLFNELGPR(9)-‐SEKAEAAAE(3)

1 3.91E-‐06 YES YES 8.8 Å

173 gb|AAC75655.1|(111)-‐gb|AAC75655.1|(106)


IKERLN(2)-‐TGKAAR(3) 5 1.23E-‐06 YES YES 16.8 Å

174 gb|AAC75655.1|(63)-‐gb|AAC75655.1|(106)


KISNGEGVER(1)-‐TGKAAR(3)

2 6 1.06E-‐06 YES YES 13.5 Å

175 gb|AAC75655.1|(87)-‐gb|AAC75655.1|(106)


VFQTHSPVVDSISVKR(15)-‐TGKAAR(3)

3 2.44E-‐07 YES YES 17.0 Å

176 gb|AAC75655.1|(87)-‐gb|AAC75655.1|(63)


VFQTHSPVVDSISVKR(15)-‐KISNGEGVER(1)

4 25 2.77E-‐26 YES YES 18.1 Å

177 gb|AAA58114.1|(125)-‐gb|AAA58114.1|(108)


AGDQIQSGVDAAIKPGNTLPMR(14)-‐YILAPKGLK(6)

3 1.31E-‐06 YES YES 10.7 Å

178 gb|AAA58114.1|(183)-‐gb|AAA58114.1|(207)


KVEADCR(1)-‐VLGKAGAAR(4)

16 1.17E-‐13 YES NO

179 gb|AAA58114.1|(59)-‐gb|AAA58114.1|(183)


HIGGGHKQAYR(7)-‐KVEADCR(1)

8 7.32E-‐09 YES NO

180 gb|AAA58114.1|(71)-‐gb|AAA58114.1|(68)


NKDGIPAVVER(2)-‐IVDFKR(5)

12 5.83E-‐07 YES YES 9.5 Å

181 gb|AAA58112.1|(70)-‐gb|AAA58112.1|(1)


VLESAIANAEHNDGADIDDLKVTK(21)-‐METIAK(1)

3 2.75E-‐10 YES YES 1.4 Å

182 gb|AAA57986.1|(19)-‐gb|AAA57986.1|(24)


DSEAKR(5)-‐LGVKR(4) 7 1.34E-‐09 YES YES 9.9 Å

183 gb|AAC76661.1|(10)-‐gb|AAC76661.1|(26)


VCQVTGKRPVTGNNR(7)-‐SHALNATKR(8)

1 7.38E-‐07 YES YES 17.3 Å

184 gb|AAC76661.1|(10)-‐gb|AAC76661.1|(54)


VCQVTGKRPVTGNNR(7)-‐VSAKGMR(4)

3 2.92E-‐06 YES YES 10.3 Å

185 gb|AAC76661.1|(26)-‐gb|AAC76661.1|(10)


SHALNATKR(8)-‐VCQVTGKR(7)

6 19 2.76E-‐17 YES YES 17.3 Å

186 gb|AAC76661.1|(26)-‐gb|AAC76661.1|(54)


SHALNATKR(8)-‐VSAKGMR(4)

13 3.05E-‐07 YES YES 24.0 Å

187 gb|AAA58117.1|(38)-‐gb|AAA58117.1|(1)


VTQVKDLANDGYR(5)-‐MIGLVGK(1)

5 2.52E-‐10 YES YES 14.2 Å

188 gb|AAC76660.1|(50)-‐gb|AAC76660.1|(33)


QHVIYKEAK(6)-‐TKPEKLELK(5)

5 7.73E-‐11 YES YES 1.1 Å

189 gb|AAA58116.1|(123)-‐gb|AAA58116.1|(137)


LIVVEKFSVEAPK(6)-‐LLAQKLK(5)

1 3.73E-‐04 YES YES 15.6 Å

190 gb|AAA58116.1|(166)-‐gb|AAA58116.1|(63)


NLHKVDVR(4)-‐QKGTGR(2) 1 5 1.50E-‐08 YES YES 17.4 Å

191 gb|AAA58116.1|(74)-‐gb|AAA58116.1|(63)


SGSIKSPIWR(5)-‐QKGTGR(2)

8 1.64E-‐05 YES YES 20.4 Å

192 gb|AAA58105.1|(47)-‐gb|AAA58105.1|(69)


ITLNMGVGEAIADKK(14)-‐PLITKAR(5)

4 5.72E-‐10 YES YES 12.7 Å

193 gb|AAC43084.1|(109)-gb|AAC43084.1|(101)

50S ribosomal subunit protein L7/L12 |(109)-50S ribosomal subunit protein L7/L12 |(101)

KALEEAGAEVEVK(1)-EGVSKDDAEALK(5)

1 8.98E-05 YES NO


Yang et al. 6/7/12 2:08 PM 17


#Spec exp_2

Best E-Value

Struct. in PDB?

Cα-Cα <24 Å?

Note

194 gb|AAC43084.1|(85)-‐gb|AAC43084.1|(71)

50S ribosomal subunit protein L7/L12 |(85)-‐50S ribosomal subunit protein

L7/L12 |(71)

EAKDLVESAPAALK(3)-‐VAVIKAVR(5)

6 9.19E-‐09 YES NO

195 gb|AAA97099.1|(22)-‐gb|AAA97099.1|(1)


VANLGSLGDQVNVKAGYAR(14)-‐MQVILLDK(1)

2 1.99E-‐15 YES YES 6.0 Å

196 ref|NP_414903.4|(2)-‐ref|NP_414903.4|(2)

5-‐aminolevulinate dehydratase (porphobilinogen synthase) |(2)-‐5-‐

aminolevulinate dehydratase (porphobilinogen synthase) |(2)

TDLIQRPR(1)-‐TDLIQRPR(1) 3 3 2.23E-‐11 YES ? 0.0 Å

197 ref|NP_414903.4|(217)-‐ref|NP_414903.4|(213)

5-‐aminolevulinate dehydratase (porphobilinogen synthase) |(217)-‐5-‐

aminolevulinate dehydratase (porphobilinogen synthase) |(213)

KSYQMNPMNR(1)-‐EAAGSALKGDR(8)

1 1.96E-‐12 YES YES 12.8 Å

198 ref|NP_418346.2|(48)-‐ref|NP_418346.2|(207)

6-‐N-‐hydroxylaminopurine resistance protein |(48)-‐6-‐N-‐hydroxylaminopurine

resistance protein |(207) KVHGGPDR(1)-‐TMQKR(4) 3 2.25E-‐03 YES YES 22.0 Å

199 gb|AAC73296.1|(53)-‐gb|AAC73296.1|(46)

acetyl-‐CoA carboxylase, carboxytransferase, alpha subunit |(53)-‐

acetyl-‐CoA carboxylase, carboxytransferase, alpha subunit |(46)

KIFADLGAWQIAQLAR(1)-‐EKSVELTR(2)

2 7.59E-‐07 YES YES 11.0 Å

200 gb|AAC74358.1|(10)-‐gb|AAC74358.1|(2)

aconitate hydratase 1 |(10)-‐aconitate hydratase 1 |(2)

EASKDTLQAK(4)-‐SSTLR(1) 2 1.06E-‐06 NO

201 gb|AAC74178.1|(10)-‐gb|AAC74178.1|(1)

acyl carrier protein (ACP) |(10)-‐acyl carrier protein (ACP) |(1)

KIIGEQLGVK(1)-‐MSTIEER(1)

7 8 1.34E-‐15 YES NO

202 gb|AAC74178.1|(10)-‐gb|AAC74178.1|(2)

acyl carrier protein (ACP) |(10)-‐acyl carrier protein (ACP) |(2)

KIIGEQLGVK(1)-‐STIEER(1) 4 24 3.10E-‐11 YES YES 9.4 Å

203 gb|AAC73576.1|(50)-‐gb|AAC73576.1|(40)

adenylate kinase |(50)-‐adenylate kinase |(40)

QAKDIMDAGK(3)-‐AAVKSGSELGK(4)

1 5.47E-‐05 YES YES 8.9 Å

204 gb|AAC74215.1|(19)-‐gb|AAC74215.1|(83)

adenylosuccinate lyase |(19)-‐adenylosuccinate lyase |(83)

YGDKVSALR(4)-‐IKTIER(2) 4 6 2.86E-‐12 YES YES 15.9 Å

205 gb|AAC73706.1|(27)-‐gb|AAC73706.1|(7)

alkyl hydroperoxide reductase, C22 subunit |(27)-‐alkyl hydroperoxide

reductase, C22 subunit |(7)

NGEFIEITEKDTEGR(10)-‐SLINTKIKPFK(6)

1 1.71E-‐05 NO

206 gb|AAA97297.1|(195)-‐gb|AAA97297.1|(188)

alternate gene names arcA, fexA, msp, seg, sfrA; CG Site No. 831; alternate gene names arcA, fexA, msp, seg, sfrA; DNA-‐binding response regulator in two-‐

component regulatory system with ArcB or CpxA |(195)-‐alternate gene names arcA, fexA, msp, seg, sfrA; CG Site No. 831; alternate gene names arcA, fexA, msp, seg, sfrA; DNA-‐binding response regulator in two-‐component regulatory

system with ArcB or CpxA |(188)

ELKPHDR(3)-‐KMTGR(1) 2 7.48E-‐06 NO

207 gb|AAC73341.1|(315)-‐gb|AAC73341.1|(315)

aminoacyl-‐histidine dipeptidase (peptidase D) |(315)-‐aminoacyl-‐histidine

dipeptidase (peptidase D) |(315) AALIAKSR(6)-‐AALIAKSR(6) 1 5.58E-‐05 NO

208 gb|AAC73341.1|(389)-‐gb|AAC73341.1|(315)


dipeptidase (peptidase D) |(315)

LAGAKTEAK(5)-‐AALIAKSR(6)

2 1.28E-‐06 NO

209 gb|AAC73341.1|(59)-‐gb|AAC73341.1|(43)


dipeptidase (peptidase D) |(43)

KPATAGMENR(1)-‐EKGFHVER(2)

2 4.41E-‐10 NO

210 gb|AAA97157.1|(80)-‐gb|AAA97157.1|(147)

aminopeptidase A/1 |(80)-‐aminopeptidase A/1 |(147)

ILLIGCGKER(8)-‐TNKSEPR(3) 4 8.51E-‐08 YES YES 18.2 Å

211 gb|AAC74018.1|(843)-‐gb|AAC74018.1|(839)

aminopeptidase N |(843)-‐aminopeptidase N |(839)

QEKMR(3)-‐YDAKR(4) 2 6.67E-‐08 YES YES 6.7 Å

212 gb|AAC74016.1|(286)-‐gb|AAC74016.1|(294)

asparaginyl tRNA synthetase |(286)-‐asparaginyl tRNA synthetase |(294)

ADDMKFFAER(5)-‐VDKDAVSR(3)

28 1.55E-‐10 NO

213 gb|AAC74016.1|(351)-‐gb|AAC74016.1|(294)

asparaginyl tRNA synthetase |(351)-‐asparaginyl tRNA synthetase |(294)

YLAEEHFKAPVVVK(8)-‐VDKDAVSR(3)

1 1.08E-‐10 NO

214 gb|AAC74014.1|(134)-‐gb|AAC74014.1|(121)

aspartate aminotransferase, PLP-‐dependent |(134)-‐aspartate

aminotransferase, PLP-‐dependent |(121)

RVWVSNPSWPNHKSVFNSAGLEVR(13)-‐

VAADFLAKNTSVKR(13) 3 8.25E-‐08 YES YES 18.1 Å

215 gb|AAC74014.1|(276)-‐gb|AAC74014.1|(93)

aspartate aminotransferase, PLP-‐dependent |(276)-‐aspartate

aminotransferase, PLP-‐dependent |(93)

AFSQMKAAIR(6)-‐GSALINDKR(8)

4 4.83E-‐12 YES YES 10.8 Å

216 gb|AAA97142.1|(30)-‐gb|AAA97142.1|(2)


carbomoyltransferase catalytic subunit |(2)

DDLNLVLATAAKLK(12)-‐ANPLYQK(1)

8 8.92E-‐23 YES YES 17.6 Å

217 gb|AAA97142.1|(41)-‐gb|AAA97142.1|(41)


carbomoyltransferase catalytic subunit |(41)

ANPQPELLKHK(9)-‐ANPQPELLKHK(9)

2 5.14E-‐08 YES ?



218 gb|AAC75632.1|(26)-‐gb|AAC75632.1|(1)

autonomous glycyl radical cofactor |(26)-‐autonomous glycyl radical cofactor |(1)

AANDDLLNSFWLLDSEKGEAR(17)-‐MITGIQITK(1)

7 4.32E-‐20 NO

219 gb|AAC75632.1|(62)-‐gb|AAC75632.1|(1)

autonomous glycyl radical cofactor |(62)-‐autonomous glycyl radical cofactor |(1)

EVPVEVKPEVR(7)-‐MITGIQITK(1)

4 2.35E-‐13 NO

220 gb|AAC73229.1|(1)-‐gb|AAC73229.1|(7)

bifunctional aconitate hydratase 2/2-‐methylisocitrate dehydratase |(1)-‐

bifunctional aconitate hydratase 2/2-‐methylisocitrate dehydratase |(7)

MLEEYR(1)-‐KHVAER(1) 8 50 1.74E-‐15 YES YES 10.2 Å

221 gb|AAC73229.1|(396)-‐gb|AAC73229.1|(373)

bifunctional aconitate hydratase 2/2-‐methylisocitrate dehydratase |(396)-‐bifunctional aconitate hydratase 2/2-‐methylisocitrate dehydratase |(373)

ACGVKGIRPGAYCEPK(5)-‐QAKDVAESDR(3)

4 2.92E-‐11 YES YES 19.3 Å

222 gb|AAC73229.1|(759)-‐gb|AAC73229.1|(722)

bifunctional aconitate hydratase 2/2-‐methylisocitrate dehydratase |(759)-‐bifunctional aconitate hydratase 2/2-‐methylisocitrate dehydratase |(722)

MDAAQLTEEGYYSVFGKSGAR(17)-‐AAGKLLDAHK(4)

2 7.04E-‐13 YES YES 12.7 Å

223 gb|AAC73144.1|(412)-‐gb|AAC73144.1|(504)

carbamoyl-‐phosphate synthase large subunit |(412)-‐carbamoyl-‐phosphate

synthase large subunit |(504)

GLEVGATGFDPKVSLDDPEALTK(12)-‐LAKLAGVR(3)

3 12 2.66E-‐07 YES YES 13.9 Å

224 gb|AAC73144.1|(649)-‐gb|AAC73144.1|(366)

carbamoyl-‐phosphate synthase large subunit |(649)-‐carbamoyl-‐phosphate

synthase large subunit |(366)

GVIVQYGGQTPLKLAR(13)-‐FNFEKFAGANDR(5)

7 2.30E-‐14 YES YES 20.6 Å

225 gb|AAA58092.1|(1)-‐gb|AAA58092.1|(145)

CG Site no. 234; RNA polymerase alpha subunit |(1)-‐CG Site no. 234; RNA polymerase alpha subunit |(145)

MQGSVTEFLKPR(1)-‐IKVQR(2)

4 3 7.21E-‐07 YES YES 23.9 Å

226 gb|AAA58092.1|(304)-‐gb|AAA58092.1|(297)


SLTEIKDVLASR(6)-‐TPNLGKK(6)

3 3.12E-‐12 YES NO

227 gb|AAA58092.1|(95)-‐gb|AAA58092.1|(145)


VQGKDEVILTLNK(4)-‐IKVQR(2)

1 4.77E-‐05 YES YES 12.1 Å

228 gb|AAA58136.1|(177)-‐gb|AAA58136.1|(57)

CG Site No. 61; translation elongation factor Tu |(177)-‐CG Site No. 61;


GSALKALEGDAEWEAK(5)-‐AFDQIDNAPEEKAR(12)

1 2.65E-‐16 YES NO

229 gb|AAA58136.1|(209)-‐gb|AAA58136.1|(264)



AIDKPFLLPIEDVFSISGR(4)-‐KLLDEGR(1)

14 1.20E-‐15 YES NO

230 gb|AAA58136.1|(209)-‐gb|AAA58136.1|(57)



AIDKPFLLPIEDVFSISGR(4)-‐AFDQIDNAPEEKAR(12)

3 9.03E-‐15 YES NO

231 gb|AAA58136.1|(253)-‐gb|AAA58136.1|(295)



ETQKSTCTGVEMFR(4)-‐GQVLAKPGTIKPHTK(6)

4 1.73E-‐13 YES YES 9.8 Å

232 gb|AAA58136.1|(253)-‐gb|AAA58136.1|(300)



ETQKSTCTGVEMFR(4)-‐GQVLAKPGTIKPHTK(11)

5 3.62E-‐14 YES YES 12.8 Å

233 gb|AAA58136.1|(253)-‐gb|AAA58136.1|(300)



ETQKSTCTGVEMFR(4)-‐PGTIKPHTK(5)

12 8.30E-‐14 YES YES 12.8 Å

234 gb|AAA58136.1|(264)-‐gb|AAA58136.1|(5)


translation elongation factor Tu |(5) KLLDEGR(1)-‐EKFER(2) 1 4.56E-‐05 YES NO

235 gb|AAA58136.1|(391)-‐gb|AAA58136.1|(300)



TVGAGVVAKVLG(9)-‐PGTIKPHTK(5)

2 2.13E-‐08 YES NO

236 gb|AAA58136.1|(57)-‐gb|AAA58136.1|(264)

CG Site No. 61; translation elongation factor Tu |(57)-‐CG Site No. 61; translation

elongation factor Tu |(264)

AFDQIDNAPEEKAR(12)-‐KLLDEGR(1)

4 60 1.15E-‐19 YES NO

237 gb|AAA58137.1|(23)-‐gb|AAA58137.1|(143)

CG Site No. 732; alternate name far; elongation factor EF-‐G|(23)-‐CG Site No.

732; alternate name far; elongation factor EF-‐G|(143)

NIGISAHIDAGKTTTTER(12)-‐IAFVNKMDR(6)

3 1.02E-‐10 YES YES 11.7 Å

238 gb|AAA58137.1|(370)-‐gb|AAA58137.1|(375)

CG Site No. 732; alternate name far; elongation factor EF-‐G |(370)-‐CG Site No. 732; alternate name far; elongation factor

EF-‐G|(375)

IVQMHANKR(8)-‐EEIKEVR(4)

2 8 2.57E-‐11 YES YES 17.2 Å

239 gb|AAA58137.1|(389)-‐gb|AAA58137.1|(440)

CG Site No. 732; alternate name far; elongation factor EF-‐G|(389)-‐CG Site No. 732; alternate name far; elongation factor

EF-‐G|(440)

AGDIAAAIGLKDVTTGDTLCDPDAPIILER(11)-‐LAKEDPSFR(3)

3 8.65E-‐06 YES YES 8.6 Å

240 gb|AAA58137.1|(440)-‐gb|AAA58137.1|(134)


EF-‐G|(134)

LAKEDPSFR(3)-‐YKVPR(2) 6 1.83E-‐08 YES NO

241 gb|AAA58137.1|(686)-‐gb|AAA58137.1|(643)


EF-‐G|(643)

ASYTMEFLKYDEAPSNVAQAVIEAR(9)-‐

GMLKGQESEVTGVK(4) 1 5.60E-‐07 YES YES 15.3 Å



242 gb|AAC73125.1|(166)-‐gb|AAC73125.1|(155)

chaperone Hsp70, co-‐chaperone with DnaJ |(166)-‐chaperone Hsp70, co-‐

chaperone with DnaJ |(155)

IAGLEVKR(7)-‐QATKDAGR(4)

11 30 7.82E-‐16 YES YES 9.3 Å

243 gb|AAC73125.1|(304)-‐gb|AAC73125.1|(246)



AKLESLVEDLVNR(2)-‐KDQGIDLR(1)

4 3.33E-‐17 YES YES 16.2 Å

244 gb|AAC73125.1|(304)-‐gb|AAC73125.1|(299)



AKLESLVEDLVNR(2)-‐HMNIKVTR(5)

8 69 4.42E-‐15 YES YES 9.7 Å

245 gb|AAC73125.1|(55)-‐gb|AAC73125.1|(263)



TTPSIIAYTQDGETLVGQPAKR(21)-‐LKEAAEK(2)

2 17 1.76E-‐09 YES YES 14.2 Å

246 ref|NP_417434.4|(87)/gb|AAA69126.1|(97)-ref|NP_417434.4|(93)/gb|AAA69126.1|(103)

conserved protein, DUF469 family |(87)/ ORF_f118(97)-conserved protein, DUF469 family |(93)/ ORF_f118(103)

KWLEER(1)-KLDEVR(1)

1 47 2.83E-10 NO

247 ref|NP_416797.2|(55)-‐ref|NP_416797.2|(1)

conserved protein, UPF0304 family |(55)-‐conserved protein, UPF0304 family |(1)

EFGELKEETCR(6)-‐MEMTNAQR(1)

22 3.38E-‐15 YES NO

248 ref|NP_416797.2|(55)-‐ref|NP_416797.2|(1)


ELDREFGELKEETCR(10)-‐MEMTNAQR(1)

1 2.23E-‐14 YES NO

249

gb|AAC77485.1|(30)/ gb|AAA67568.1|(30)-‐gb|AAC77485.1|(80)/ gb|AAA67568.1|(80)

conserved protein, UPF0438 family |(30)/ o137|(30)-‐conserved protein, UPF0438

family |(80)/ o137|(80)

HGDFTIKEAQLLER(7)-‐VWSKYMTR(4)

2 9.49E-‐09 NO

250 gb|AAC76668.1|(61)-‐gb|AAC76668.1|(61)


GKVECTLR(2)-‐GKVECTLR(2) 2 7 5.15E-‐08 NO

251 gb|AAC73821.1|(133)-‐gb|AAC73821.1|(148)

dihydrolipoyltranssuccinase |(133)-‐dihydrolipoyltranssuccinase |(148)

LLAEHNLDASAIKGTGVGGR(13)-‐EDVEKHLAK(5)

6 1.16E-‐09 YES YES 10.7 Å

252 gb|AAC73821.1|(133)-‐gb|AAC73821.1|(94)


LLAEHNLDASAIKGTGVGGR(13)-‐SEEKASTPAQR(4)

2 4.02E-‐16 NO

253 gb|AAC73821.1|(156)-‐gb|AAC73821.1|(148)


APAKESAPAAAAPAAQPALAAR(4)-‐EDVEKHLAK(5)

1 5.75E-‐05 NO

254 gb|AAC73821.1|(94)-‐gb|AAC73821.1|(85)


SEEKASTPAQR(4)-‐EGNSAGKETSAK(7)

3 7.30E-‐07 NO

255 gb|AAC73777.1|(148)-‐gb|AAC73777.1|(117)

DNA-‐binding transcriptional dual regulator of siderophore biosynthesis and

transport |(148)-‐DNA-‐binding transcriptional dual regulator of

siderophore biosynthesis and transport |(117)

EDEHAHEGK(9)-‐EIAAKHGIR(5)

1 1.50E-‐09 NO

256 gb|AAC73975.1|(129)-‐gb|AAC73975.1|(162)

DNA-‐binding transcriptional dual regulator, leucine-‐binding |(129)-‐DNA-‐binding transcriptional dual regulator,

leucine-‐binding |(162)

KLLGETLLR(1)-‐LVIKTR(4) 2 4.23E-‐08 YES YES 20.6 Å

257 gb|AAC43085.1|(1065)-‐gb|AAC43085.1|(1073)

DNA-‐directed RNA polymerase, beta-‐subunit |(1065)-‐DNA-‐directed RNA polymerase, beta-‐subunit |(1073)

IQPGDKMAGR(6)-‐HGNKGVISK(4)

6 5 3.19E-‐09 YES NO

258 gb|AAC43086.1|(1132)-‐gb|AAC43086.1|(781)

DNA-‐directed RNA polymerase, beta'-‐subunit |(1132)-‐DNA-‐directed RNA polymerase, beta'-‐subunit |(781)

IPQESGGTKDITGGLPR(9)-‐KGLADTALK(1)

3 1.82E-‐12 YES NO

259 gb|AAC43085.1|(1158)-‐gb|AAC43085.1|(1)

DNA-‐directed RNA polymerase, beta-‐subunit |(1158)-‐DNA-‐directed RNA polymerase, beta-‐subunit |(1)

QKVDLSTFSDEEVMR(2)-‐MVYSYTEK(1)

16 3.49E-‐13 YES NO

260 gb|AAC43086.1|(1192)-‐gb|AAC43086.1|(1072)


LVITPVDGSDPYEEMIPKWR(18)-‐TAGGKDLRPALK(5)

1 8.39E-‐10 YES NO

261 gb|AAC43086.1|(87)-‐gb|AAC43086.1|(50)


GVICEKCGVEVTQTK(6)-‐TFKPER(3)

2 3.61E-‐08 YES YES 18.1 Å

262 gb|AAC43086.1|(953)-‐gb|AAC43086.1|(992)


AAAESSIQVKNK(10)-‐TKESYK(2)

1 4.76E-‐09 YES YES 8.9 Å

263 gb|AAC43086.1|(992)-‐gb|AAC43086.1|(955)


TKESYK(2)-‐NKGSIK(2) 2 1.88E-‐07 YES YES 9.9 Å

264 gb|AAC76774.1|(101)-‐gb|AAC76774.1|(119)

D-‐ribose transporter subunit |(101)-‐D-‐ribose transporter subunit |(119)

ILLINPTDSDAVGNAVKMANQANIPVITLDR(17)-‐

QATKGEVVSHIASDNVLGGK(4)

2 3.48E-‐08 YES YES 7.9 Å

265 gb|AAC76774.1|(275)-‐gb|AAC76774.1|(281)


GVETADKVLK(7)-‐GEKVQAK(3)

4 4 1.53E-‐11 YES YES 7.0 Å

266 gb|AAC76774.1|(291)-‐gb|AAC76774.1|(143)


YPVDLKLVVK(6)-‐IAGDYIAKK(8)

6 1.39E-‐24 YES YES 16.4 Å

267 gb|AAC74220.1|(378)-‐gb|AAC74220.1|(58)

e14 prophage; isocitrate dehydrogenase, specific for NADP+ |(378)-e14 prophage; isocitrate dehydrogenase, specific for NADP+ |(58)

HMGWTEAADLIVKGMEGAINAK(13)-AYKGER(3)

1 4.16E-07 YES YES 10.2 Å

268 gb|AAC74220.1|(62)-‐gb|AAC74220.1|(12)

e14 prophage; isocitrate dehydrogenase, specific for NADP+ |(62)-‐e14 prophage; isocitrate dehydrogenase, specific for

NADP+ |(12)

KISWMEIYTGEK(1)-‐VVVPAQGKK(8)

7 7.84E-‐17 YES YES 13.6 Å

269 gb|AAA69289.1|(195)-‐gb|AAA69289.1|(56)

enolase |(195)-‐enolase |(56) MGSEVFHHLAKVLK(11)-‐

DGDKSR(4) 2 3.47E-‐04 YES YES 18.4 Å

270 gb|AAC74370.1|(205)-‐gb|AAC74370.1|(201)

enoyl-‐[acyl-‐carrier-‐protein] reductase, NADH-‐dependent |(205)-‐enoyl-‐[acyl-‐carrier-‐protein] reductase, NADH-‐

dependent |(201)

KMLAHCEAVTPIR(1)-‐TLAASGIKDFR(8)

6 8.61E-‐13 YES YES 9.0 Å

271 gb|AAC74370.1|(205)-‐gb|AAC74370.1|(201)

enoyl-‐[acyl-‐carrier-‐protein] reductase, NADH-‐dependent |(205)-‐enoyl-‐[acyl-‐carrier-‐protein] reductase, NADH-‐

dependent |(201)

KMLAHCEAVTPIRR(1)-‐TLAASGIKDFR(8)

10 8.50E-‐16 YES YES 9.0 Å

272 gb|AAC76757.1|(384)-‐gb|AAC76757.1|(388)

F1 sector of membrane-‐bound ATP synthase, alpha subunit |(384)-‐F1 sector of membrane-‐bound ATP synthase, alpha

subunit |(388)

VGGAAQTKIMK(8)-‐KLSGGIR(1)

5 3.30E-‐13 NO

273 gb|AAC73899.1|(101)-‐gb|AAC73899.1|(27)

Fe-‐binding and storage protein |(101)-‐Fe-‐binding and storage protein |(27)

AVQLGGVALGTTQVINSKTPLK(18)-‐KATVELLNR(1)

2 1.23E-‐11 YES YES 18.3 Å

274 gb|AAC73899.1|(140)-‐gb|AAC73899.1|(27)

Fe-‐binding and storage protein |(140)-‐Fe-‐binding and storage protein |(27)

AIGEAKDDDTADILTAASR(6)-‐KATVELLNR(1)

4 4.07E-‐16 YES YES 11.5 Å

275 gb|AAB18612.1|(255)-‐gb|AAB18612.1|(240)

formamidopyrimidine-‐DNA glycosylase |(255)-‐formamidopyrimidine-‐DNA

glycosylase |(240)

VCGTPIVATKHAQR(10)-‐KGEPCR(1)

1 6 3.77E-‐07 YES YES 11.9 Å

276 gb|AAC74683.1|(430)-‐gb|AAC74683.1|(426)

fumarate hydratase (fumarase C),aerobic Class II |(430)-‐fumarate hydratase (fumarase C),aerobic Class II |(426)

AHKEGLTLK(3)-‐AAEIAKK(6) 3 1.17E-‐13 YES YES 6.1 Å

277 gb|AAC74683.1|(69)-‐gb|AAC74683.1|(127)

fumarate hydratase (fumarase C),aerobic Class II |(69)-‐fumarate hydratase (fumarase C),aerobic Class II |(127)

VNEDLGLLSEEKASAIR(12)-‐KVHPNDDVNK(1)

3 7.00E-‐19 YES YES 8.6 Å

278 gb|AAC74683.1|(8)-‐gb|AAC74683.1|(426)

fumarate hydratase (fumarase C),aerobic Class II |(8)-‐fumarate hydratase (fumarase

C),aerobic Class II |(426)

SEKDSMGAIDVPADK(3)-‐AAEIAKK(6)

4 1.11E-‐08 YES NO

279 gb|AAC77114.1|(550)/gb|AAA97053.1|(550)-gb|AAC77114.1|(527)/gb|AAA97053.1|(527)

formamidopyrimidine/5-formyluracil/ 5-hydroxymethyluracil DNA glycosylase|(550)/fumarate reductase, flavoprotein subunit |(550)-formamidopyrimidine/5-formyluracil/ 5-hydroxymethyluracil DNA glycosylase|(527)/fumarate reductase, flavoprotein subunit |(527)

DDVNFLKHTLAFR(7)-‐KESR(1)

1 1.50E-‐04 YES YES 12.1 Å

280 gb|AAC74319.1|(57)-‐gb|AAC74319.1|(120)

global DNA-‐binding transcriptional dual regulator H-‐NS |(57)-‐global DNA-‐binding transcriptional dual regulator H-‐NS |(120)

KLQQYR(1)-‐TPAVIKK(6) 4 1.94E-‐09 NO

281 gb|AAC74319.1|(57)-‐gb|AAC74319.1|(57)


KLQQYR(1)-‐KLQQYR(1) 2 2.48E-‐07 NO

282 gb|AAC74319.1|(96)-‐gb|AAC74319.1|(136)


AQRPAKYSYVDENGETK(6)-‐SLDDFLIKQ(8)

4 8.13E-‐18 NO

283 gb|AAC74319.1|(96)-‐gb|AAC74319.1|(57)


AQRPAKYSYVDENGETK(6)-‐KLQQYR(1)

7 6.84E-‐15 NO

284 gb|AAC74319.1|(96)-‐gb|AAC74319.1|(57)


PAKYSYVDENGETK(3)-‐KLQQYR(1)

1 5.38E-‐07 NO

285 gb|AAB03004.1|(2)-‐gb|AAB03004.1|(231)

glutamine synthetase |(2)-‐glutamine synthetase |(231)

SAEHVLTMLNEHEVK(1)-‐FNTMTKK(6)

6 1.11E-‐07 NO

286 gb|AAC73774.1|(312)-‐gb|AAC73774.1|(2)

glutamyl-‐tRNA synthetase |(312)-‐glutamyl-‐tRNA synthetase |(2)

EFCKR(4)-‐SEAEAR(1) 2 2.69E-‐06 YES NO

287 gb|AAC74726.1|(101)-‐gb|AAC74726.1|(108)

glutaredoxin-‐4 |(101)-‐glutaredoxin-‐4 |(108)

GELQQLIKETAAK(8)-‐YKSEEPDAE(2)

4 26 4.34E-‐10 YES NO

288 gb|AAB18476.1|(329)-‐gb|AAB18476.1|(430)

glutathione oxidoreductase |(329)-‐glutathione oxidoreductase |(430)

LFNNKPDEHLDYSNIPTVVFSHPPIGTVGLTEPQAR(5)-‐

KDFDNTVAIHPTAAEEFVTMR(1)

2 1.96E-‐10 YES YES 15.5 Å

289 gb|AAA69114.1|(174)-‐gb|AAA69114.1|(205)

glutathione synthetase |(174)-‐glutathione synthetase |(205)

VKEGDPNLGVIAETLTEHGTR(2)-‐

YCMAQNYLPAIKDGDKR(12)

7 7.66E-‐22 NO

290 gb|AAC74849.1|(124)-‐gb|AAC74849.1|(192)

glyceraldehyde-‐3-‐phosphate dehydrogenase A |(124)-‐glyceraldehyde-‐3-‐phosphate dehydrogenase A |(192)

VVMTGPSKDNTPMFVK(8)-‐TVDGPSHKDWR(8)

1 4.74E-‐07 YES YES 18.9 Å

291 gb|AAC74849.1|(213)-gb|AAC74849.1|(192)

glyceraldehyde-3-phosphate dehydrogenase A |(213)-glyceraldehyde-3-phosphate dehydrogenase A |(192)

GASQNIIPSSTGAAKAVGK(15)-TVDGPSHKDWR(8)

2 1.94E-07 YES YES 19.8 Å

292 gb|AAC74849.1|(225)-‐gb|AAC74849.1|(249)


VLPELNGKLTGMAFR(8)-‐LEKAATYEQIK(3)

2 6.37E-‐06 YES YES 15.9 Å

293 gb|AAC74849.1|(331)-‐gb|AAC74849.1|(331)


VLDLIAHISK(10)-‐VLDLIAHISK(10)

1 5.65E-‐15 YES ? 0.0 Å

294 gb|AAB18536.1|(592)-‐gb|AAB18536.1|(584)

glycine-‐tRNA synthetase, beta subunit |(592)-‐glycine-‐tRNA synthetase, beta

subunit |(584)

VSNILAKSDEVLSDR(7)-‐TLDAAAALAAANKR(13)

1 1.17E-‐20 NO

295 gb|AAA97042.1|(117)-‐gb|AAA97042.1|(34)

GroEL protein |(117)-‐GroEL protein |(34) AVAAGMNPMDLKR(12)-‐

VTLGPKGR(6) 2 5 1.90E-‐12 YES YES 16.4 Å

296 gb|AAA97042.1|(122)-‐gb|AAA97042.1|(117)

GroEL protein |(122)-‐GroEL protein |(117) GIDKAVTAAVEELK(4)-‐AVAAGMNPMDLKR(12)

1 2.00E-‐13 YES YES 8.7 Å

297 gb|AAA97042.1|(122)-‐gb|AAA97042.1|(34)

GroEL protein |(122)-‐GroEL protein |(34) GIDKAVTAAVEELK(4)-‐

VTLGPKGR(6) 2 4.95E-‐05 YES YES 17.2 Å

298 gb|AAA97042.1|(272)-‐gb|AAA97042.1|(226)

GroEL protein |(272)-‐GroEL protein |(226) GIVKVAAVK(4)-‐KISNIR(1) 2 3 9.57E-‐09 YES YES 22.2 Å

299 gb|AAA97042.1|(364)-‐gb|AAA97042.1|(371)

GroEL protein |(364)-GroEL protein |(371)

QQIEEATSDYDREKLQER(14)-VAKLAGGVAVIK(3)

1 1.73E-04 YES YES 10.6 Å

300 gb|AAA97042.1|(371)-‐gb|AAA97042.1|(364)

GroEL protein |(371)-‐GroEL protein |(364) VAKLAGGVAVIK(3)-‐

EKLQER(2) 4 3.71E-‐06 YES YES 10.6 Å

301 gb|AAA97042.1|(51)-‐gb|AAA97042.1|(117)

GroEL protein |(51)-‐GroEL protein |(117) SFGAPTITKDGVSVAR(9)-‐AVAAGMNPMDLKR(12)

1 7.39E-‐10 YES YES 17.9 Å

302 gb|AAA97042.1|(51)-‐gb|AAA97042.1|(34)

GroEL protein |(51)-‐GroEL protein |(34) SFGAPTITKDGVSVAR(9)-‐

VTLGPKGR(6) 2 4 2.24E-‐07 YES YES 13.2 Å

303 gb|AAA97042.1|(7)-‐gb|AAA97042.1|(15)

GroEL protein |(7)-‐GroEL protein |(15) DVKFGNDAR(3)-‐VKMLR(2) 4 11 1.23E-‐06 YES YES 14.0 Å

304 gb|AAA97042.1|(7)-‐gb|AAA97042.1|(15)

GroEL protein |(7)-‐GroEL protein |(15) AAKDVKFGNDAR(6)-‐

VKMLR(2) 2 2.21E-‐05 YES YES 14.0 Å

305 gb|AAA97041.1|(1)-‐gb|AAA97041.1|(13)

GroES protein |(1)-‐GroES protein |(13) MNIRPLHDR(1)-‐VIVKR(4) 1 6.54E-‐05 YES NO

306 gb|AAA97041.1|(1)-‐gb|AAA97041.1|(15)

GroES protein |(1)-‐GroES protein |(15) MNIRPLHDR(1)-‐RKEVETK(2)

3 4.96E-‐10 YES NO

307 gb|AAC43098.1|(3)-‐gb|AAC43098.1|(18)

histonelike DNA-‐binding protein HU-‐alpha (NS2) (HU-‐2) |(3)-‐histonelike DNA-‐binding

protein HU-‐alpha (NS2) (HU-‐2) |(18)

MNKTQLIDVIAEK(3)-‐AELSKTQAK(5)

4 6.18E-‐10 YES YES 11.9 Å

308 gb|AAC74782.1|(66)-‐gb|AAC74782.1|(57)

integration host factor (IHF), DNA-‐binding protein, alpha subunit |(66)-‐integration host factor (IHF), DNA-‐binding protein,

alpha subunit |(57)

NPKTGEDIPITAR(3)-‐DKNQRPGR(2)

1 1.37E-‐05 YES YES 23.8 Å

309 gb|AAC73743.1|(675)-‐gb|AAC73743.1|(738)

leucyl-‐tRNA synthetase |(675)-‐leucyl-‐tRNA synthetase |(738)

VWKLVYEHTAK(3)-‐LAKAPTDGEQDR(3)

5 1.48E-‐20 NO

310 gb|AAC76752.1|(51)-‐gb|AAC76752.1|(246)

L-‐glutamine:D-‐fructose-‐6-‐phosphate aminotransferase |(51)-‐L-‐glutamine:D-‐fructose-‐6-‐phosphate aminotransferase

|(246)

LGKVQMLAQAAEEHPLHGGTGIAHTR(3)-‐

QDIESNLQYDAGDKGIYR(14)

22 5.06E-‐15 YES NO

311 gb|AAC73227.1|(339)-‐gb|AAC73227.1|(299)

lipoamide dehydrogenase, E3 component is part of three enzyme complexes |(339)-‐lipoamide dehydrogenase, E3 component is part of three enzyme complexes |(299)

KHYFDPK(1)-‐VDKQLR(3) 1 1.90E-‐04 NO

312 gb|AAC73227.1|(370)-‐gb|AAC73227.1|(299)

lipoamide dehydrogenase, E3 component is part of three enzyme complexes |(370)-‐lipoamide dehydrogenase, E3 component is part of three enzyme complexes |(299)

EKGISYETATFPWAASGR(2)-‐VDKQLR(3)

2 4.32E-‐06 NO

313 gb|AAA97029.1|(156)-‐gb|AAA97029.1|(2)

lysyl-‐tRNA synthetase |(156)-‐lysyl-‐tRNA synthetase |(2)

ALRPLPDKFHGLQDQEVR(8)-‐SEQETR(1)

3 1.86E-‐05 YES NO

314 gb|AAA58038.1|(217)-‐gb|AAA58038.1|(82)

malate dehydrogenase |(217)-‐malate dehydrogenase |(82)

IQNAGTEVVEAKAGGGSATLSMGQAAAR(12)-‐

KPGMDR(1) 5 7.72E-‐11 YES YES 14.8 Å

315 gb|AAA58038.1|(99)-‐gb|AAA58038.1|(134)

malate dehydrogenase |(99)-‐malate dehydrogenase |(134)

SDLFNVNAGIVKNLVQQVAK(12)-‐KAGVYDK(1)

1 9.27E-‐07 YES YES 11.4 Å

316 gb|AAA69143.1|(312)-‐gb|AAA69143.1|(379)

malate synthase |(312)-‐malate synthase |(379)

KLNDDR(1)-‐VQKNSR(3) 4 4.73E-‐09 YES YES 23.6 Å

317 gb|AAA58215.1|(54)-‐gb|AAA58215.1|(2)

maltodextrin phosphorylase |(54)-‐maltodextrin phosphorylase |(2)

AQPFAKPVANQR(6)-‐SQPIFNDK(1)

6 5.93E-‐15 YES YES 9.0 Å

318 gb|AAB03063.1|(109)-‐gb|AAB03063.1|(72)

matches PS00017: ATP_GTP_A; similar to Pasteurella haemolytica hypoth. protein ORF1; heat shock induced |(109)-‐matches

PS00017: ATP_GTP_A; similar to Pasteurella haemolytica hypoth. protein

ORF1; heat shock induced |(72)

DLTDAAVKMVR(8)-‐LAKLANAPFIK(3)

2 1.68E-‐18 YES YES 15.0 Å

319 gb|AAC75175.1|(466)-‐gb|AAC75175.1|(403)

methionyl-‐tRNA synthetase |(466)-‐methionyl-‐tRNA synthetase |(403)

YVDEQAPWVVAKQEGR(12)-‐NAGFINKR(7)

1 1 2.32E-‐08 YES YES 13.7 Å

320 gb|AAC74920.1|(212)-‐gb|AAC74920.1|(202)

multifunctional 2-keto-3-deoxygluconate 6-phosphate aldolase and 2-keto-4-hydroxyglutarate aldolase and oxaloacetate decarboxylase |(212)-multifunctional 2-keto-3-deoxygluconate 6-phosphate aldolase and 2-keto-4-hydroxyglutarate aldolase and oxaloacetate decarboxylase |(202)

EAVEGAKL(7)-ITKLAR(3)

1 9.69E-06 YES YES 18.2 Å

321 gb|AAB03056.1|(226)-‐gb|AAB03056.1|(233)

ORF_f248 |(226)-‐ORF_f248 |(233) DTQQLLKETR(7)-‐QMTKHLR(4)

1 8 9.98E-‐14 YES YES 10.7 Å

322 gb|AAA69076.1|(102)-‐gb|AAA69076.1|(2)

ORF_f441; third start codon |(102)-‐ORF_f441; third start codon |(2)

LGQDAAPEKLGVDR(9)-‐SEISR(1)

13 7.23E-‐09 YES YES 18.7 Å

323 gb|AAA57918.1|(356)-‐gb|AAC73989.1|(293)

ORF_f746 |(356)-‐pyruvate formate lyase I |(293)

TLVTKNSFR(5)-‐DLKAGK(3) 1 3.98E-‐05 YES YES 13.2 Å

324 gb|AAC73539.1|(1)-‐gb|AAC73539.1|(81)

peptidyl-‐prolyl cis/trans isomerase (trigger factor) |(1)-‐peptidyl-‐prolyl

cis/trans isomerase (trigger factor) |(81)

MQVSVETTQGLGR(1)-‐NFIDAIIKEK(8)

2 2.27E-‐12 YES NO

325 gb|AAC73539.1|(279)-‐gb|AAC73539.1|(272)


cis/trans isomerase (trigger factor) |(272) ELKSAIR(3)-‐KNMER(1) 1 25 8.30E-‐08 YES YES 10.4 Å

326 gb|AAC73539.1|(361)-‐gb|AAC73539.1|(392)


cis/trans isomerase (trigger factor) |(392)

TNELKADEER(5)-‐NKELMDNMR(2)

2 9.59E-‐15 YES YES 17.0 Å

327 gb|AAC75299.1|(140)-‐gb|AAC75299.1|(195)

periplasmic glycerophosphodiester phosphodiesterase |(140)-‐periplasmic

glycerophosphodiester phosphodiesterase |(195)

FPMGKSDFR(5)-‐KYGYTGK(1)

6 3.87E-‐14 YES YES 19.8 Å

328 gb|AAA58200.1|(91)-‐gb|AAA58200.1|(68)

phosphoenolpyruvate carboxykinase |(91)-‐phosphoenolpyruvate

carboxykinase |(68)

GKNDNKPLSPETWQHLK(2)-‐SPKDK(3)

1 1.26E-‐04 YES YES 12.1 Å

329 gb|AAC73782.1|(273)-‐gb|AAC73782.1|(2)

phosphoglucomutase |(273)-‐phosphoglucomutase |(2)

FMHLDKDGAIR(6)-‐AIHNR(1)

3 3.63E-‐12 NO

330 gb|AAA69093.1|(27)-‐gb|AAA69093.1|(84)

phosphoglycerate kinase |(27)-‐phosphoglycerate kinase |(84)

ADLNVPVKDGKVTSDAR(8)-‐DKLSNPVR(2)

2 1.56E-‐08 YES YES 18.9 Å

331 gb|AAA69093.1|(30)-‐gb|AAA69093.1|(84)


DGKVTSDAR(3)-‐DKLSNPVR(2)

2 5.22E-‐07 YES YES 14.0 Å

332 gb|AAA69093.1|(49)-‐gb|AAA69093.1|(84)


ASLPTIELALKQGAK(11)-‐DKLSNPVR(2)

2 1.95E-‐08 YES YES 14.5 Å

333 gb|AAC73842.1|(100)-‐gb|AAC73842.1|(113)

phosphoglyceromutase 1 |(100)-‐phosphoglyceromutase 1 |(113)

HYGALQGLNKAETAEK(10)-‐YGDEQVKQWR(7)

5 9.31E-‐13 YES YES 9.9 Å

334 gb|AAC73842.1|(146)-‐gb|AAC73842.1|(86)

phosphoglyceromutase 1 |(146)-‐phosphoglyceromutase 1 |(86)

LSEKELPLTESLALTIDR(4)-‐SWKLNER(3)

1 1.31E-‐05 YES YES 13.4 Å

335 ref|YP_026170.1|(853)-‐ref|YP_026170.1|(622)

phosphoribosylformyl-‐glycineamide synthetase |(853)-‐phosphoribosylformyl-‐

glycineamide synthetase |(622)

QLGDKPADVR(5)-‐AKGDALAR(2)

1 3.45E-‐10 NO

336 gb|AAC75738.1|(38)-‐gb|AAC75738.1|(1)

pleiotropic regulatory protein for carbon source metabolism |(38)-‐pleiotropic regulatory protein for carbon source

metabolism |(1)

IGVNAPKEVSVHR(7)-‐MLILTR(1)

2 6.06E-‐08 YES YES 23.6 Å

337 gb|AAA57967.1|(286)-‐ref|NP_417633.4|(1)

polynucleotide phosphorylase |(286)-‐polynucleotide

phosphorylase/polyadenylase |(1) ITDKQER(4)-‐MLNPIVR(1) 2 29 4.17E-‐08 YES YES 15.3 Å

338 ref|NP_418204.2|(35)-‐ref|NP_418204.2|(35)

predicted cytoplasmic sugar-‐binding protein |(35)-‐predicted cytoplasmic

sugar-‐binding protein |(35)

LGHTDTLVVCDAGLPIPKSTTR(18)-‐

LGHTDTLVVCDAGLPIPKSTTR(18)

1 2.15E-‐09 NO

339 ref|NP_415943.4|(13)-‐ref|NP_415943.4|(1)

predicted protein |(13)-‐predicted protein |(1)

LKNENPR(2)-‐MFPEYR(1) 3 5.47E-‐10 NO

340 ref|NP_415943.4|(25)-‐ref|NP_415943.4|(36)

predicted protein |(25)-‐predicted protein |(36)

FMSLFDKHNK(7)-‐KEGSDGR(1)

2 8.45E-‐13 NO

341 ref|NP_415193.2|(331)-‐ref|NP_415193.2|(281)

predicted protein with nucleoside triphosphate hydrolase domain |(331)-‐

predicted protein with nucleoside triphosphate hydrolase domain |(281)

KAALAAER(1)-‐NTKSGLR(3) 1 13 3.47E-‐07 NO

342 gb|AAA57971.1|(131)-‐gb|AAA57971.1|(125)

protein chain initiation factor 2 |(131)-‐protein chain initiation factor 2 |(125)

EAQQKAER(5)-‐EAEESAKR(7)

9 3.23E-‐08 NO

343 gb|AAA57971.1|(131)-‐gb|AAA57971.1|(149)


EAQQKAER(5)-‐EAAEQAKR(7)

8 2.63E-‐10 NO

344 gb|AAA57971.1|(149)-‐gb|AAA57971.1|(149)


EAAEQAKR(7)-‐EAAEQAKR(7)

1 10 2.97E-‐09 NO

345 gb|AAA57971.1|(184)-‐gb|AAA57971.1|(149)


EQEAAELKR(8)-‐EAAEQAKR(7)

1 1.54E-‐06 NO

346 gb|AAA57971.1|(186)-‐gb|AAA57971.1|(194)


RKAEEEAR(2)-‐KLEEEAR(1) 4 1.34E-‐09 NO

347 gb|AAA57971.1|(186)-‐gb|AAA57971.1|(194)


KAEEEARR(1)-‐KLEEEAR(1) 10 1.52E-‐15 NO

348 gb|AAA57971.1|(194)-‐gb|AAA57971.1|(186)


KLEEEARR(1)-‐KAEEEARR(1) 1 2.58E-‐09 NO

349 gb|AAA57971.1|(194)-gb|AAA57971.1|(186)

protein chain initiation factor 2 |(194)-protein chain initiation factor 2 |(186)

KLEEEAR(1)-KAEEEAR(1)

16 4.84E-11 NO

350 gb|AAC74788.1|(97)-‐gb|AAC74788.1|(125)

protein chain initiation factor IF-‐3 |(97)-‐protein chain initiation factor IF-‐3 |(125)

EIKFRPGTDEGDYQVK(3)-‐AKITLR(2)

3 7.18E-‐15 YES NO

351 gb|AAC73575.1|(238)-‐gb|AAC73575.1|(362)

protein refolding molecular co-‐chaperone Hsp90, Hsp70-‐dependent; heat-‐shock

protein; ATPase |(238)-‐protein refolding molecular co-‐chaperone Hsp90, Hsp70-‐dependent; heat-‐shock protein; ATPase

|(362)

NKSEITDEEYKEFYK(2)-‐VLQMLEKLAK(7)

3 1.71E-‐06 YES NO

352 gb|AAC73575.1|(516)-‐gb|AAC73575.1|(524)

protein refolding molecular co-‐chaperone Hsp90, Hsp70-‐dependent; heat-‐shock

protein; ATPase |(516)-‐protein refolding molecular co-‐chaperone Hsp90, Hsp70-‐dependent; heat-‐shock protein; ATPase

|(524)

VKALLGER(2)-‐VKDVR(2) 2 2.73E-‐06 YES NO

353 gb|AAA97280.1|(57)-‐gb|AAA97280.1|(84)

purine-‐nucleoside phosphorylase |(57)-‐purine-‐nucleoside phosphorylase |(84)

KISVMGHGMGIPSCSIYTK(1)-‐ELITDFGVKK(9)

2 9.58E-‐05 YES YES 10.7 Å

354 gb|AAC73225.1|(368)-‐gb|AAC73225.1|(305)

pyruvate dehydrogenase, decarboxylase component E1, thiamin-‐binding |(368)-‐pyruvate dehydrogenase, decarboxylase component E1, thiamin-‐binding |(305)

GGHDPKK(6)-‐KDTSGK(1) 2 9.03E-‐12 YES YES 20.5 Å

355 gb|AAC73225.1|(375)-‐gb|AAC73225.1|(381)

pyruvate dehydrogenase, decarboxylase component E1, thiamin-‐binding |(375)-‐pyruvate dehydrogenase, decarboxylase component E1, thiamin-‐binding |(381)

IYAAFKK(6)-‐AQETKGK(5) 1 4 1.64E-‐09 YES YES 12.3 Å

356 gb|AAC73988.1|(97)-‐gb|AAC73988.1|(2)

pyruvate formate lyase activating enzyme 1 |(97)-‐pyruvate formate lyase activating

enzyme 1 |(2)

KEGIHTCLDTNGFVR(1)-‐SVIGR(1)

2 7.19E-‐15 YES YES 18.9 Å

357 gb|AAC73989.1|(117)-‐gb|AAC73989.1|(162)

pyruvate formate lyase I |(117)-‐pyruvate formate lyase I |(162)

ALIPFGGIKMIEGSCK(9)-‐KSGVLTGLPDAYGR(1)

7 1.31E-‐17 YES YES 17.6 Å

358 gb|AAC73989.1|(162)-‐gb|AAC73989.1|(616)


KSGVLTGLPDAYGR(1)-‐KTGNTPDGR(1)

4 1.49E-‐15 YES YES 9.4 Å

359 gb|AAC73989.1|(195)-‐gb|AAC73989.1|(2)


VALYGIDYLMKDK(11)-‐SELNEK(1)

3 5 9.10E-‐13 YES YES 10.6 Å

360 gb|AAC73989.1|(195)-‐gb|AAC73989.1|(235)


VALYGIDYLMKDK(11)-‐ALGQMKEMAAK(6)

1 9.63E-‐06 YES YES 14.1 Å

361 gb|AAC73989.1|(454)-‐gb|AAC73989.1|(616)


TMLYAINGGVDEKLK(13)-‐KTGNTPDGR(1)

3 3.66E-‐10 YES YES 7.4 Å

362 gb|AAC73989.1|(725)-‐gb|AAC73989.1|(591)


EMLLDAMENPEKYPQLTIR(12)-‐IQKLHTYR(3)

1 4.57E-‐10 YES NO

363 gb|AAC74746.1|(68)-‐gb|AAC74746.1|(76)

pyruvate kinase I |(68)-‐pyruvate kinase I |(76)

TAAILLDTKGPEIR(9)-‐TMKLEGGNDVSLK(3)

1 4.35E-‐05 YES YES 20.7 Å

364 gb|AAC73283.1|(1)-‐gb|AAC73283.1|(7)

ribosome recycling factor |(1)-‐ribosome recycling factor |(7)

MISDIR(1)-‐KDAEVR(1) 1 17 4.69E-‐07 YES YES 10.4 Å

365 gb|AAC73283.1|(138)-‐gb|AAC73283.1|(1)


RDANDKVK(6)-‐MISDIR(1) 2 2.03E-‐08 YES YES 9.3 Å

366 gb|AAC73283.1|(138)-‐gb|AAC73283.1|(1)


DANDKVK(5)-‐MISDIR(1) 9 2.70E-‐09 YES YES 9.3 Å

367 gb|AAC73283.1|(138)-‐gb|AAC73283.1|(7)


DANDKVK(5)-‐KDAEVR(1) 8 3.67E-‐06 YES YES 12.9 Å

368 gb|AAC73283.1|(15)-‐gb|AAC73283.1|(26)


MDKCVEAFK(3)-‐TQISKIR(5) 1 7.67E-‐05 YES YES 17.2 Å

369 gb|AAC73283.1|(15)-‐gb|AAC73283.1|(7)


MDKCVEAFK(3)-‐KDAEVR(1)

2 4.57E-‐10 YES YES 12.5 Å

370 ref|NP_415894.4|(96)-‐ref|NP_415894.4|(104)

stress-‐induced protein, ATP-‐binding protein |(96)-‐stress-‐induced protein, ATP-‐

binding protein |(104)

VHVHVEEGSPKDR(11)-‐ILELAKK(6)

2 5.36E-‐10 NO

371 gb|AAC73822.1|(215)-‐gb|AAC73822.1|(1)

succinyl-‐CoA synthetase, beta subunit |(215)-‐succinyl-‐CoA synthetase, beta

subunit |(1)

QGDLICLDGKLGADGNALFR(10)-‐MNLHEYQAK(1)

4 9.28E-‐11 YES YES 11.1 Å

372 gb|AAC73822.1|(295)-‐gb|AAC73822.1|(360)

succinyl-‐CoA synthetase, beta subunit |(295)-‐succinyl-‐CoA synthetase, beta

subunit |(360)

LHGGEPANFLDVGGGATKER(18)-‐KLADSGLNIIAAK(1)

5 2.78E-‐30 YES YES 14.9 Å

373 gb|AAC74789.1|(286)-‐gb|AAC74789.1|(570)

threonyl-‐tRNA synthetase |(286)-‐threonyl-‐tRNA synthetase |(570)

LKEYQYQEVK(2)-‐VKADLR(2)

17 3.71E-‐08 YES NO

374 gb|AAC74789.1|(614)-‐gb|AAC74789.1|(638)


GKDLGSMDVNEVIEK(2)-‐SLKQLEE(3)

3 2.34E-‐05 YES YES 13.4 Å

375 gb|AAC74789.1|(638)-‐gb|AAC74789.1|(241)


SLKQLEE(3)-‐LEEAAKR(6) 3 5.87E-‐06 YES YES 21.3 Å

376 gb|AAA97278.1|(299)-‐gb|AAA97278.1|(291)

thymidine phosphorylase |(299)-‐thymidine phosphorylase |(291)

AKLQAVLDNGK(2)-‐LAKDDAEAR(3)

3 1.08E-‐07 NO

377 gb|AAA69102.1|(46)-‐gb|AAA69102.1|(226)

transketolase |(46)-‐transketolase |(226) DFLKHNPQNPSWADR(4)-‐

DIDGHDAASIKR(11) 4 6 9.58E-‐16 YES YES 15.2 Å

378 gb|AAA69102.1|(46)-‐gb|AAA69102.1|(226)

transketolase |(46)-‐transketolase |(226) DFLKHNPQNPSWADRDR(4

)-‐DIDGHDAASIKR(11) 2 1.82E-‐07 YES YES 15.2 Å

379 ref|YP_026188.1|(591)-‐gb|AAA69102.1|(453)

transketolase 1, thiamin-‐binding |(591)-‐transketolase |(453)

VVSMPSTDAFDKQDAAYR(12)-‐MAALMKQR(6)

1 1.98E-‐06 YES YES 16.4 Å


Yang et al. 6/7/12 2:08 PM 24


#Spec exp_2

Best E-‐Value

Struct. in PDB?

Cα-‐Cα <24 Å?

Note

380 gb|AAC73970.1|(39)-‐gb|AAC73970.1|(42)

translation initiation factor IF-‐1 |(39)-‐translation initiation factor IF-‐1 |(42)

VELENGHVVTAHISGKMR(16)-‐KNYIR(1)

3 5 1.31E-‐17 NO

381 gb|AAC74709.1|(85)-‐gb|AAC74709.1|(90)

tyrosyl-‐tRNA synthetase |(85)-‐tyrosyl-‐tRNA synthetase |(90)

FQQAGHKPVALVGGATGLIGDPSFKAAER(25)-‐

KLNTEETVQEWVDK(1) 12 6.70E-‐16 YES YES 11.2 Å

382 ref|NP_417046.1|(293)-‐ ref|NP_417046.1|(375)

serine hydroxymethyltransferase |(293)-‐ serine hydroxymethyltransferase |(375)

TYQQQVAKNAK(8)-‐GFKEAEAK(3)

4 6.45E-‐17 YES YES 8.1 Å

383 ref|NP_417046.1|(331)-‐ ref|NP_417046.1|(62)


NLTGKEADAALGR(5)-‐YAEGYPGKR(8)

4 25 4.28E-‐22 YES YES 11.4 Å

384 ref|NP_417046.1|(346)-‐ ref|NP_417046.1|(331)


ANITVNKNSVPNDPK(7)-‐NLTGKEADAALGR(5)

3 3.93E-‐10 YES YES 5.5 Å

385 ref|NP_417046.1|(346)-‐ ref|NP_417046.1|(62)


ANITVNKNSVPNDPK(7)-‐YAEGYPGKR(8)

2 4.67E-‐09 YES YES 12.4 Å

386 ref|NP_416993.2|(116)-‐ref|NP_416993.2|(148)

uracil phosphoribosyltransferase |(116)-‐uracil phosphoribosyltransferase |(148)

NEETLEPVPYFQKLVSNIDER(13)-‐KAGCSSIK(1)

1 9.67E-‐06 YES YES 13.5 Å

387 ref|NP_416993.2|(26)-‐ref|NP_416993.2|(14)


EQDISTKR(7)-‐HKLGLMR(2) 1 3.59E-‐09 YES YES 14.2 Å

388 ref|NP_416993.2|(26)-‐ref|NP_416993.2|(26)


EQDISTKR(7)-‐EQDISTKR(7) 3 1.57E-‐06 YES ? 0.0 Å

389 ref|NP_416993.2|(67)-‐ref|NP_416993.2|(70)


VTIEGWNGPVEIDQIKGK(16)-‐KITVVPILR(1)

2 7.26E-‐17 YES YES 8.3 Å

390 gb|AAC73282.1|(10)-‐gb|AAC73282.1|(2)

uridylate kinase |(10)-‐uridylate kinase |(2) PVYKR(4)-‐ATNAK(1) 1 1.76E-‐07 YES NO

391 gb|AAC73282.1|(68)-‐gb|AAC73282.1|(212)

uridylate kinase |(68)-‐uridylate kinase |(212)

GAGLAKAGMNR(6)-‐DHKLPIR(3)

1 4.51E-‐07 YES YES 17.8 Å

392 gb|AAA97155.1|(909)-‐gb|AAA97155.1|(926)

valyl-‐tRNA synthetase |(909)-‐valyl-‐tRNA synthetase |(926)

IENKLANEGFVAR(4)-‐APEAVIAKER(8)

5 17 4.22E-‐16 NO

393 gb|AAA57977.1|(392)-‐gb|AAA57977.1|(412)

yhbF; phosphoglucosamine mutase |(392)-‐yhbF; phosphoglucosamine mutase

|(412)

YTAGSGDPLEHESVKAVTAEVEAALGNR(15)-‐KSGTEPLIR(1)

10 2.08E-‐13 NO

394 gb|AAA57977.1|(412)-‐gb|AAA57977.1|(305)

yhbF; phosphoglucosamine mutase |(412)-‐yhbF; phosphoglucosamine mutase

|(305) KSGTEPLIR(1)-‐AKVGDR(2) 1 2.16E-‐05 NO

Summary

Out of the 394 cross-links, 237 correspond to proteins or protein complexes with structural models deposited in PDB. Of these, 179 (75.5%) are consistent with the structural models.

                 From Exp_1           From Exp_2            Overlap             Union
All cross-links  235 (656 spectra)    208 (1372 spectra)    49 (923 spectra)    394 (2028 spectra)
Inter-protein    51 (91 spectra)      75 (195 spectra)      2 (68 spectra)      124 (286 spectra)
Intra-protein    184 (565 spectra)    133 (1177 spectra)    47 (855 spectra)    270 (1742 spectra)
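The counts in the summary can be cross-checked with simple arithmetic: each union equals Exp_1 + Exp_2 minus the overlap, the union spectra are the sum of the spectra from the two experiments, and 179/237 gives the quoted percentage. The Python sketch below only re-derives numbers already listed above; the variable and function names are illustrative.

```python
# Cross-link counts copied from the summary: (exp_1, exp_2, overlap, union)
cross_links = {
    "all": (235, 208, 49, 394),
    "inter-protein": (51, 75, 2, 124),
    "intra-protein": (184, 133, 47, 270),
}
# Spectra counts copied from the summary: (exp_1, exp_2, union)
spectra = {
    "all": (656, 1372, 2028),
    "inter-protein": (91, 195, 286),
    "intra-protein": (565, 1177, 1742),
}


def union_size(a, b, overlap):
    """Inclusion-exclusion for two sets."""
    return a + b - overlap


def check_summary():
    """Return True if every row of the summary is internally consistent."""
    links_ok = all(union_size(a, b, ov) == un for a, b, ov, un in cross_links.values())
    spectra_ok = all(a + b == un for a, b, un in spectra.values())
    return links_ok and spectra_ok


# Fraction of structurally mappable cross-links consistent with the models
consistent_percent = round(100 * 179 / 237, 1)
```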



Supplementary Table 12. Inter-linked peptides identified from a C. elegans lysate. Filtering criteria: mass accuracy 10 ppm, FDR < 5%, E-value < 0.01.

No.  Protein1-Protein2  CGC name  Peptide1-Peptide2  #Spec-total  #Spec-exp1  #Spec-exp2  Best E-value

inter-molecular
1  C04F12.4(102)-C52B9.9(135)  [RPL-14]-[MEC-18]  AKLTDFER(2)-KAQMR(1)  1  1  1.48E-06
2  C06A8.2(387)-C53H9.1(108)  [C06A8.2]-[RPL-27]  ANAKLVEVK(4)-KALVEVK(1)  1  1  7.05E-03
3  C09D4.5(152)-T08A11.2(673)  [RPL-19]-[T08A11.2]  AKQLADQAQAR(2)-KSWQAR(1)  1  1  1.10E-03
4  C16A3.9(106)-C41G7.3(281)/R10D12.17(232)  [RPS-13]-[C41G7.3/SRW-14]  KDIDSK(1)-KHIER(1)  2  2  7.30E-07
5  C53B4.5(1)/Y41E3.2(1)-M163.4(731)  [COL-119/DPY-4]-[GFI-3]  DIDSKIKAYR(1)-LLGKDPR(4)  1  1  4.37E-03
6  F10B5.1(81)-F55C5.7a(383)  [RPL-10]-[RSKD-1]  NCGKDGFHLR(4)-KYVMK(1)  1  1  9.64E-04
7  F11C3.3(1144)-C18E9.7(832)  [UNC-54]-[C18E9.7]  AKSDLQR(2)-KSADR(1)  1  1  6.25E-07
8  F23D12.10(510)-F42G2.5(3)/Y57G11A.1a(492)/Y57G11A.1b(777)  [F23D12.10]-[F42G2.5/TAG-273]  KNWLR(1)-MPEKK(4)  1  1  8.65E-08
9  T14G12.2(206)-Y82E9BR.18(802)  [AEX-4]-[Y82E9BR.18]  LQEPKLNR(5)-IQEKR(4)  1  1  6.02E-04
10  Y111B2A.16(0)-F32D8.2(488)  [TAF-7.2]-[F32D8.2]  MSIYPGVR(1)-THKCVR(3)  1  1  9.89E-04

intra-molecular
11  C04F6.1(1572)-C04F6.1(1565)/F59D8.2(1572)-F59D8.2(1565)  VIT-5/VIT-4  FLKEAR(3)-HSKNAR(3)  1  1  2.74E-07
12  C14B9.7(77)-C14B9.7(86)  RPL-21  GAVGIIVNKR(9)-GNILPKR(6)  1  1  3.59E-07
13  C53H9.1(116)-C53H9.1(108)  RPL-27  SKFEER(2)-KALVEVK(1)  1  1  6.09E-07
14  F11C3.3(1144)-F11C3.3(1139)  UNC-54  AKSDLQR(2)-SKADR(2)  1  1  5.00E-09
15  F11C3.3(979)-F11C3.3(971)  UNC-54  QSKDHQIR(3)-KAESEK(1)  1  1  3.27E-04
16  F46E10.10a(239)-F46E10.10a(237)  MDH-1  KLSSAMSAAK(1)-GGVIIEKR(7)  2  2  1.19E-09
17  F52D10.3a(12)-F52D10.3a(76)  FTT-2  AKLAEQAER(2)-KQQMAK(1)  4  1  3  2.30E-07
18  F53G12.10(51)-F53G12.10(47)  RPL-7  AEKYVQEYR(3)-TQYFKR(5)  1  1  6.53E-07
19  F53G12.10(99)-F53G12.10(102)  RPL-7  GINQLHPKPR(8)-KALQILR(1)  3  3  1.25E-11
20  F55D10.2(60)-F55D10.2(53)  RPL-25.1  TSKMDHFR(3)-KSAPK(1)  1  1  3.17E-06
21  K02F2.2(323)-K02F2.2(332)  AHCY-1  DTIKPQVDR(4)-YTLKNGR(4)  1  1  3.66E-07
22  K10B3.7(256)-K10B3.7(265)  GDP-3  LEKPASLDDIK(3)-KVIK(1)  1  1  4.48E-04
23  K12F2.1(1850)-K12F2.1(1854)  MYO-3  HQDTEKNWR(6)-KAER(1)  1  1  3.46E-06
24  M01F1.2(100)-M01F1.2(78)  RPL-16  GNEALKNLR(6)-APGKIFWR(4)  4  4  3.04E-15
25  M117.2(12)-M117.2(76)  PAR-5  AKLAEQAER(2)-KQQLAK(1)  5  2  3  3.86E-11
26  M117.2(144)-M117.2(123)  PAR-5  AAVVEKSQK(6)-MKGDYYR(2)  2  2  1.13E-11
27  R13A5.8(56)-R13A5.8(49)  RPL-9  KWFGVR(1)-IGKSTLR(3)  1  1  1.01E-05
28  R13A5.8(62)-R13A5.8(49)  RPL-9  KELAAIR(1)-IGKSTLR(3)  9  3  6  7.73E-13
29  R13A5.8(62)-R13A5.8(56)  RPL-9  KELAAIR(1)-KWFGVR(1)  3  3  3.26E-11
30  T05E11.1(197)-T05E11.1(206)  RPS-5  KKDELER(1)-VAKSNR(3)  2  2  2.03E-08
31  T05E11.1(198)-T05E11.1(206)  RPS-5  KDELER(1)-VAKSNR(3)  2  1  1  5.66E-07
32  T05F1.3(125)-T05F1.3(129)  RPS-19  ILSKQGR(4)-KDLDR(1)  3  1  2  2.72E-06
33  Y105E8B.1a(127)-Y105E8B.1a(127)  LEV-11  KVMENR(1)-KVMENR(1)  2  2  3.08E-09
34  Y105E8B.1a(232)-Y105E8B.1a(127)  LEV-11  LKEAETR(2)-KVMENR(1)  1  1  1.30E-07
35  Y105E8B.1a(34)-Y105E8B.1a(27)  LEV-11  QITEKLER(5)-ADAAEEKVR(7)  2  2  2.30E-11
36  Y24D9A.8a(312)-Y24D9A.8a(305)  Y24D9A.8  TLEKLIEAK(4)-NFAKDAR(4)  2  2  1.31E-10
37  Y38A10A.5(368)-Y38A10A.5(361)  CRT-1  KKAEEEK(1)-KAEEEAR(1)  1  1  1.40E-07
38  Y38H6C.1a(34)-Y38H6C.1a(32)  DCT-16  KDDEPER(1)-IAATYKK(6)  2  1  1  3.04E-08
39  ZC434.2(95)-ZC434.2(101)  RPS-7  DILILAKR(7)-ILPKPQR(4)  2  2  3.33E-11



Supplementary Table 13. Cross-linking analysis of the CNGP complex using DSS

No.  Inter-linked sites     #total spec (#pair)  Best E-value  Cα-Cα Distance (Å)  Manual eval. of spec qual.  Compatible with structure?
1    Cbf5_K161-Cbf5_K114    7 (1)                1.34E-08      23.99               High                        Yes
2    Cbf5_K161-Gar1_K115    19 (2)               5.66E-12      18.85               High                        Yes
3    Cbf5_K161-Gar1_K59     4 (1)                7.67E-14      21.95               High                        Yes
5    Cbf5_K180-Cbf5_K137    2 (1)                7.99E-17      13.58               High                        Yes
6    Cbf5_K180-Nop10_K18    9 (1)                2.42E-15      12.98               High                        Yes
7    Cbf5_K180-Nop10_K19    5 (1)                1.92E-12      15.46               High                        Yes
10   Cbf5_K9-Cbf5_K50       7 (2)                2.94E-13      16.57               Low                         Yes
11   Gar1_K77-Cbf5_K161     5 (1)                4.92E-07      18.89               High                        Yes
12   Gar1_K77-Gar1_K115     22 (2)               9.61E-16      11.01               High                        Yes
13   Nhp2_K131-Nhp2_K133    2 (1)                2.79E-05      5.86                High                        Yes
16   Nop10_K1-Nop10_K19     6 (2)                1.11E-20      11.48               High                        Yes
17   Nop10_K28-Nop10_K49    2 (1)                5.55E-04      n/a                 Low
18   Nop10_K40-Nhp2_K61     15 (2)               1.26E-11      12.3                High                        Yes
19   Nop10_K40-Nhp2_K65     4 (1)                3.84E-10      13.96               High                        Yes
20   Nop10_K40-Nhp2_K69     3 (1)                1.36E-03      17.18               High                        Yes
21   Nop10_K40-Nop10_K49    2 (1)                4.80E-05      n/a                 Low
22   Gar1_K115-Gar1_K104    1 (1)                3.42E-05      24.75               Mid                         No

Filtering criteria: 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 2 spectral copies (except Gar1_K115-Gar1_K104). Shown in bold are 12 cross-linked lysine pairs observed previously in the BS3 experiments (Supplementary Table 8). The Gar1_K115-Gar1_K104 cross-link is listed because it was observed nine times with BS3. n/a, not available (either or both residues are invisible in the CNGP structure).



Supplementary Table 14. Cross-linking analysis of the CNGP complex using EDC


No.  Inter-linked sites     #total spec (#pair)  Best E-value  Cα-Cα Distance (Å)  Manual eval. of spec qual.  Compatible with structure?
1    Cbf5_K267-Cbf5_D273    9 (2)                1.27E-07      19.11               High                        No
2    Gar1_K108-Gar1_D96     2 (1)                4.91E-06      19.06               High                        No
3    Gar1_K115-Gar1_D107    11 (1)               9.44E-07      20.46               High                        No
4    Gar1_K115-Gar1_E105    4 (1)                1.49E-04      24.12               High                        No
5    Gar1_K72-Gar1_E80      3 (1)                5.21E-09      21.27               Mid                         No
6    Nhp2_K143-Nhp2_E152    6 (1)                2.36E-05      14.17               High                        No
7    Nhp2_K37-Nhp2_D18      5 (1)                5.40E-10      n/a                 Mid

Filtering criteria: 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 2 spectral copies. The maximum Cα-Cα distance of an EDC cross-linked K-D or K-E pair is expected to be 9.7 Å or 11 Å, respectively. This is calculated based on the projected distances of C-C, C-O, C-N (amide), and C-N (amine) bonds, at 1.26, 1.17, 1.15, and 1.19 Å, respectively.



Supplementary Table 15. Cross-linking analysis of the CNGP complex using AMAS


No.  Inter-linked sites     #total spec (#pair)  Best E-value  Cα-Cα Distance (Å)  Manual eval. of spec qual.  Compatible with structure?
1    Cbf5_K114-Cbf5_C339    4 (1)                8.67E-08      13.07               High                        Yes
2    Cbf5_K358-Cbf5_C339    6 (2)                1.63E-06      n/a                 High
3    Cbf5_K370-Cbf5_C339    3 (1)                3.66E-07      n/a                 Mid
5    Gar1_K77-Gar1_C94      7 (1)                6.84E-11      6.82                High                        Yes

Filtering criteria: 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 2 spectral copies. The Cα-Cα distance of a K-C pair cross-linked by AMAS is expected to be less than 16.5 Å. This is calculated based on the projected distances of C-C, C-O, C-N (amide), C-N (amine), C-S, and S-S bonds, at 1.26, 1.17, 1.15, 1.19, 1.44, and 1.50 Å, respectively.



Supplementary Table 16. Cross-linking analysis of the CNGP complex using Sulfo-GMBS


No.  Inter-linked sites     #total spec (#pair)  Best E-value  Cα-Cα Distance (Å)  Manual eval. of spec qual.  Compatible with structure?
1    Cbf5_K114-Cbf5_C339    7 (1)                2.52E-07      13.07               High                        Yes
2    Cbf5_K137-Cbf5_C125    6 (1)                2.40E-07      17.73               High                        Yes
3    Cbf5_K161-Cbf5_C125    3 (1)                9.62E-08      26.76               High                        No
6    Cbf5_K358-Cbf5_C339    15 (3)               4.08E-11      n/a                 High
14   Cbf5_K50-Gar1_C94      3 (1)                3.11E-07      61.68               High                        No
15   Gar1_K59-Cbf5_C190     2 (1)                1.44E-06      24.88               Mid                         No
16   Gar1_K59-Gar1_C94      3 (1)                3.73E-07      21.44               High                        No
17   Gar1_K77-Gar1_C94      14 (2)               1.66E-12      6.82                High                        Yes
18   Nop10_K19-Cbf5_C125    2 (1)                7.48E-08      17.13               High                        Yes
19   Nop10_K40-Cbf5_C125    4 (1)                7.64E-07      32.61               High                        No
20   Nop10_K40-Gar1_C94     3 (1)                9.30E-10      56.17               High                        No

Filtering criteria: 10 ppm mass accuracy, FDR < 5%, E-value < 0.01, and ≥ 2 spectral copies. Cross-links observed with AMAS (see Supplementary Table 15) are highlighted in yellow. The maximum Cα-Cα distance of a K-C pair cross-linked by Sulfo-GMBS is 19 Å. This is calculated based on the projected distances of C-C, C-O, C-N (amide), C-N (amine), C-S, and S-S bonds, at 1.26, 1.17, 1.15, 1.19, 1.44, and 1.50 Å, respectively.



Supplementary Table 17. xQuest search of the GST [d0/d4]-BS3 cross-linking data

On-line xQuest search parameters:
(A) 4misclv_no-a-ion_0.01MS2
(B) 4misclv_a-ion_0.01MS2
(C) 4misclv_no-a-ion_0.02MS2
(D) 4misclv_a-ion_0.02MS2
(E) 2misclv_a-ion_0.01MS2
(F) 2misclv_no-a-ion_0.10MS2

Each cell gives F-F hits / non F-F hits.

Filtering of results                                                     (A)       (B)       (C)       (D)       (E)       (F)
#spec with score >= 4                                                    47 / 81   36 / 70   42 / 42   29 / 44   30 / 61   16 / 14
#spec with score >= 5                                                    41 / 38   26 / 31   28 / 23   22 / 23   22 / 32   10 / 6
#spec with score >= 6                                                    28 / 19   21 / 15   21 / 16   20 / 18   18 / 18   6 / 4
#spec with score >= 7                                                    20 / 13   17 / 15   16 / 9    17 / 11   16 / 16   5 / 3
#spec with score >= 8                                                    17 / 10   15 / 10   11 / 6    14 / 8    13 / 12   4 / 2
#spec with score >= 9                                                    15 / 6    13 / 6    10 / 4    11 / 5    12 / 9    4 / 2
#spec with score >= 10                                                   11 / 6    10 / 6    7 / 4     8 / 5     9 / 7     2 / 2
Top 10                                                                   7 / 3     7 / 3     7 / 3     8 / 2     5 / 5     6 / 4
Top 20                                                                   14 / 6    13 / 7    14 / 6    13 / 7    11 / 9    12 / 8
(i) #spec scoring higher than the 1st reverse match in own charge group   10 / 0    9 / 0     5 / 0     5 / 0     4 / 0     4 / 0
(ii) #spec scoring higher than the 2nd reverse match in own charge group  11 / 4    10 / 4    12 / 4    11 / 4    6 / 4     10 / 4
(iii) #spec scoring higher than the 3rd reverse match in own charge group 15 / 8    10 / 8    16 / 8    11 / 8    9 / 8     15 / 8

Number of F-F hits for +3, +4, +5, and +6 spectra:

Filtering condition   (A)          (B)          (C)          (D)          (E)          (F)
Details of (i)        5, 3, 1, 1   1, 3, 2, 3   1, 1, 2, 1   1, 1, 0, 3   2, 1, 1, 0   4, 0, 0, 0
Details of (ii)       5, 3, 1, 2   1, 4, 2, 3   4, 3, 3, 2   2, 3, 3, 3   2, 2, 1, 1   6, 2, 2, 0
Details of (iii)      7, 3, 3, 2   1, 4, 2, 3   8, 3, 3, 2   2, 3, 3, 3   3, 3, 2, 1   10, 2, 3, 0

The cross-linking data of GST exp_1 in Supplementary Table 7 were searched with xQuest on-line. For the search parameters indicated in the table: 2/4misclv, 2 or 4 missed cleavages allowed; a-ion/no-a-ion, a, b, y ions or only b, y ions considered; 0.01/0.02/0.1MS2, MS/MS mass tolerance at 0.01, 0.02, or 0.1 m/z. Other parameters were: cross-linker BS3_delta4; enzyme trypsin; xlink mass-shift 138.0680796; monolink mass-shifts 156.0786442, 155.0964278; isotope shift 4.0247 Da; retention time tolerance +/- 3 minutes; reactive amino acid, K; ionization mode ESI; fixed modification C 57.02146; MS1 mass tolerance 10 ppm; min 3 AA, max 40 AA. An F-F cross-link means that both peptides match to the forward sequence of GST. The best result in each category is colored red. No spectra with > +6 charge scored 4 or higher.



Supplementary Table 18. Comparison of the GST [d0/d4]-BS3 cross-links identified by xQuest and pLink

GST cross-links identified by xQuest using parameter set (A) and filtering condition (i) in Supplementary Table 17 are shown.

Scan# Charge xQuest score

GST_pep1–pep2 Sites in pep.

Sites in GST intra/ inter

Dist. (Å)

< 24 Å?

Spec IDed by pLink? Xlink IDed by pLink? (Supp Table 7, exp_1)

1498 3 17.2 KRIEAIPQIDK–YLKSSK K1-‐K3 K180-‐K193 intra 14.2 Yes Yes, same result Yes

(13 spectral copies)

3208 4 15.4 LLLEYLEEKYEEHLYERDEG

DKWRNK–YLKSSK K22-‐K3 K39-‐K193 intra 28.8 No No No

3315 6 13.6

KFELGLEFPNLPYYIDGDVK–

YIADKHNMLGGCPKERAEISMLEGAVLDIR

K1-‐K14 K44-‐K86 inter 21.4 Yes No No

1481 5 13.3 IKGLVQPTR–

YIADKHNMLGGCPKER K2-‐K14 K10-‐K86 inter 25.2 No No No

3435 3 12.9 IAYSKDFETLKVDFLSK–

WRNKK K11-‐K4 K118-‐K43 inter 28.5 No

Yes, different result (VDFLSKLPEMLK(6)-‐

IAYSKDFETLK(5), K124-‐K112, 20.4 or 22.0 Å)

No

3719 3 12.6 DFETLKVDFLSK–WRNKK K6-‐K4 K118-‐K43 inter 28.5 No No No

3247 4 11.6 DFETLKVDFLSK–

LLLEYLEEKYEEHLYERDEGDK

K6-‐K9 K118-‐K26 intra 37.7 No No No

3767 3 11.4 IAYSKDFETLK–RIEAIPQIDKYLK


4064 3 11.2 IAYSKDFETLK–RIEAIPQIDKYLK



WRNKK K11-‐K4 K118-‐K43 inter 28.5 No

Yes, different result (VDFLSKLPEMLK(6)-‐

IAYSKDFETLK(5), K124-‐K112, 20.4 or 22.0 Å)

No



Supplementary Table 19. xQuest search of the GST [d0]-BS3 cross-linking data

On-line xQuest search parameters:
(A) 4misclv_no-a-ion_0.01MS2
(C) 4misclv_no-a-ion_0.02MS2

Each cell gives F-F hits / non F-F hits.

Filtering of results                                                     (A)        (C)
#spec with score >= 4                                                    59 / 113   59 / 113
#spec with score >= 5                                                    46 / 94    45 / 76
#spec with score >= 6                                                    42 / 71    32 / 56
#spec with score >= 7                                                    28 / 50    23 / 32
#spec with score >= 8                                                    15 / 32    15 / 22
#spec with score >= 9                                                    12 / 18    11 / 17
#spec with score >= 10                                                   8 / 12     8 / 12
Top 10                                                                   5 / 5      5 / 5
Top 20                                                                   8 / 12     8 / 12
(i) #spec scoring higher than the 1st reverse match in own charge group   5 / 0      7 / 0
(ii) #spec scoring higher than the 2nd reverse match in own charge group  7 / 4      9 / 4
(iii) #spec scoring higher than the 3rd reverse match in own charge group 9 / 8      11 / 8

Number of F-F hits for +3, +4, +5, and +6 spectra:

Filtering condition   (A)          (C)
Details of (i)        1, 2, 1, 1   1, 3, 1, 2
Details of (ii)       1, 2, 1, 3   1, 3, 3, 2
Details of (iii)      1, 2, 3, 3   1, 3, 4, 3

GST was cross-linked with [d0]-BS3 alone. Parameter sets (A) and (C) in Supplementary Table 17 were used for the on-line xQuest search, except that the isotope shift was set to zero. The best result is colored red.



Supplementary Table 20. Comparison of the GST [d0]-BS3 cross-links identified by xQuest and pLink

Cross-links colored blue are identified by both xQuest and pLink. For those colored red or orange, identical or similar ones are identified in Supplementary Table 7.

xQuest Result (search parameter set (C), filtering condition (i) from Supp Table 19)

Scan# Charge xQuest score

GST_pep1–pep2 Sites in pep.

Sites in GST Intra/ inter

Cα-‐Cα (Å)

< 24 Å? Spec IDed by pLink?

Xlink IDed by pLink?

3899 6 27.57 DEGDKWRNKKFELGLEFPNLPYYIDGDVK– DEGDKWRNKKFELGLEFPNLPYYIDGDVK


3896 5 27.18 DEGDKWRNKKFELGLEFPNLPYYIDGDVK– DEGDKWRNKKFELGLEFPNLPYYIDGDVK


3563 3 16.96 VDFLSKLPEMLK-‐ VDFLSKLPEMLK K6-‐K6 K124-‐K124 intra 31.8 No Yes Yes


WRNKK K11-‐K4 K118-‐K43 inter 28.5 No No No

3680 4 10.43 YGVSRIAYSKDFETLK–

YIAWPLQGWQATFGGGDHPPKSDLVPR K10-‐K21 K112-‐K217 n/a No No

3611 4 10.4 YIADKHNMLGGCPK– YIADKHNMLGGCPK K5-‐K5 K77-‐K77 intra 20.4 No No No

3439 6 5.94 LTQSMAIIRYIADKHNMLGGCPKER–

YGVSRIAYSKDFETLK K14-‐K10 K77-‐K112 intra 40.7 No No No

pLink Result (precursor mass accuracy 10 ppm, fragment mass accuracy 20 ppm, FDR <5%, E-‐value <0.01)

Scan#. charge

Spec qual.

Best E-‐value

GST_pep1–pep2 Sites in GST Intra/ inter

Cα-‐Cα (Å)

< 24 Å? Spec IDed by xQuest?

Xlink IDed by xQuest?

3249.3 2989.3 2984.4 3246.4 2993.5

high 7.13E-‐21

KFELGLEFPNLPYYIDGDVK(1)-‐IKGLVQPTR(2) NKKFELGLEFPNLPYYIDGDVK(3)-‐IKGLVQPTR(2) NKKFELGLEFPNLPYYIDGDVK(3)-‐IKGLVQPTR(2) KFELGLEFPNLPYYIDGDVK(1)-‐IKGLVQPTR(2) NKKFELGLEFPNLPYYIDGDVK(3)-‐IKGLVQPTR(2)

K44-‐K10 intra 20.04 Yes No No

1793.4 1799.3

high 7.87E-‐21 YIADKHNMLGGCPKER(14)-‐YIADKHNMLGGCPK(5) YIADKHNMLGGCPKER(14)-‐YIADKHNMLGGCPK(5)

K86-‐K77 inter 8.39 Yes No No

2541.3 high 1.14E-‐16 YIADKHNMLGGCPK(5)-‐SPILGYWK(1) K77-‐K1 intra 14.25 Yes No No

3387.3 high 1.30E-‐15 IEAIPQIDKYLK(9)-‐MSPILGYWK(1) K190-‐K0 n/a No No

1936.5 1940.3

high 1.26E-‐14 YIADKHNMLGGCPK(5)-‐IKGLVQPTR(2) YIADKHNMLGGCPK(5)-‐IKGLVQPTR(2)

K77-‐K10 intra 23.03 Yes No No

2377.3 high 8.19E-‐14 YIADKHNMLGGCPKER(14)-‐MSPILGYWK(1) K86-‐K0 n/a No No

2460.4 high 8.84E-‐14 IEAIPQIDKYLK(9)-‐IKGLVQPTR(2) K190-‐K10 intra 16.99 Yes No No

3092.3 high 1.43E-‐13 VDFLSKLPEMLK(6)-‐IKGLVQPTR(2) K124-‐K10 intra 23.51 Yes No No

1820.5 high 1.64E-‐13 YIADKHNMLGGCPKER(14)-‐KRIEAIPQIDK(1) K86-‐K180 intra 29.02 No No No

2335.3 2824.3

high 3.57E-‐13 IAYSKDFETLK(5)-‐IAYSKDFETLK(5)

IAYSKDFETLKVDFLSK(5)-‐IAYSKDFETLK(5) K112-‐K112 inter 16.80 Yes No No

1737.4 1738.3

high 1.57E-‐12 YIADKHNMLGGCPKER(14)-‐IKGLVQPTR(2) YIADKHNMLGGCPKER(14)-‐IKGLVQPTR(2)

K86-‐K10 inter 25.22 No No No

3534.4 high 2.28E-‐12 NKKFELGLEFPNLPYYIDGDVK(3)-‐LPEMLKMFEDR(6) K44-‐K130 inter 14.86 Yes No No

2114.5 2210.3

low 6.92E-‐12 YIADKHNMLGGCPKER(14)-‐RIEAIPQIDKYLK(10) YIADKHNMLGGCPKER(14)-‐IEAIPQIDKYLK(9)


3695.3. high 2.09E-‐10 NKKFELGLEFPNLPYYIDGDVK(2)-‐MSPILGYWK(1) K43-‐K0 n/a No No

3774.3 3776.5

high 4.93E-‐10 NKKFELGLEFPNLPYYIDGDVK(3)-‐VDFLSKLPEMLK(6) NKKFELGLEFPNLPYYIDGDVK(3)-‐VDFLSKLPEMLK(6)


2267.3 2263.4 2261.5

high 8.66E-‐10 YIADKHNMLGGCPKER(14)-‐MSPILGYWKIK(9) YIADKHNMLGGCPKER(14)-‐MSPILGYWKIK(9) YIADKHNMLGGCPKER(14)-‐MSPILGYWKIK(9)


2677.4 high 1.77E-09 NKKFELGLEFPNLPYYIDGDVK(3)-YIADKHNMLGGCPKER(14) K44-K86 inter 21.35 Yes No No

1744.3 high 3.08E-‐09 KRIEAIPQIDK(1)-‐YLKSSK(3) K180-‐K193 intra 14.24 Yes No No

3801.3 mid 8.33E-‐09 KFELGLEFPNLPYYIDGDVK(1)-‐LPEMLKMFEDR(6) K44-‐K130 inter 14.86 Yes No No

1741.4 high 1.13E-‐08 YIADKHNMLGGCPKER(14)-‐LVCFKK(5) K86-‐K179 intra 26.29 No No No

3773.4 3894.4

mid 7.69E-‐06 NKKFELGLEFPNLPYYIDGDVK(2)-‐VDFLSKLPEMLK(6)

NKKFELGLEFPNLPYYIDGDVK(2)-‐DFETLKVDFLSKLPEMLK(12)


3129.4 high 1.89E-‐05 NKKFELGLEFPNLPYYIDGDVK(2)-‐DEGDKWR(5) K43-‐K39 intra 6.42 Yes No No

2634.3 high 2.30E-‐05 YIADKHNMLGGCPK(5)-‐MSPILGYWK(1) K77-‐K0 n/a No No

3737.4 high 2.52E-‐05 VDFLSKLPEMLK(6)-‐IAYSKDFETLK(5)

K124-‐K112 intra/*inter

20.39/*22.02

Yes No No

1628.6 high 7.21E-‐05 YIADKHNMLGGCPKER(14)-‐YIADKHNMLGGCPKER(14) K86-‐K86 inter 21.29 Yes No No

3981.3 high 1.45E-‐04 FELGLEFPNLPYYIDGDVKLTQSMAIIR(19)-‐SPILGYWK(1) K63-‐K1 intra 12.10 Yes No No

2999.4 low 1.66E-‐04 YIAWPLQGWQATFGGGDHPPKSDLVPR(21)-‐LVCFKK(5) K217-‐K179 n/a No No

2150.4 high 2.03E-‐04 YIADKHNMLGGCPKER(14)-‐IEAIPQIDKYLKSSK(12) K86-‐K193 inter 32.23 No No No

3498.5 low 4.58E-‐04 KFELGLEFPNLPYYIDGDVK(1)-‐LLLEYLEEKYEEHLYER(9) K44-‐K26 intra 23.61 Yes No No

3563.3 low 4.54E-‐03 VDFLSKLPEMLK(6)-‐VDFLSKLPEMLK(6) K124-‐K124 inter 31.80 No Yes Yes



Supplementary Note for “Identification of Cross-linked Peptides from Complex Samples” by Yang et al.

pLink

• Datasets for software training and testing

The key to optimizing CXMS is to understand the fragmentation patterns of cross-linked peptides. Thus, we synthesized peptides mimicking cross-linked sequences after trypsin digestion. A total of 38 peptides of 5 to 28 amino acids were synthesized, each containing a lysine (K) or arginine (R) at the C-terminus and at least one non-C-terminal K for cross-linking (Supplementary Table 2). All possible pair combinations (741 in total, including self-self pairs) were treated with a 1:1 mix of [d0]- and [d4]-BS3 and analyzed one by one using reverse-phase HPLC coupled to an LTQ-Orbitrap-ETD mass spectrometer. We collected normal-resolution collision-induced dissociation (CID), normal-resolution electron-transfer dissociation (ETD), and high-resolution higher-energy collisional dissociation (HCD) spectra for each precursor ion and found that HCD produced the best results, followed by ETD (data not shown). Therefore, we focused on HCD. A total of 2077 non-redundant HCD spectra were collected from inter-linked synthetic peptides, including cross-linking isoforms of the same pair (Supplementary Fig. 4).
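The figure of 741 pairs follows from counting unordered pairs of 38 peptides with self-self pairs included, that is, 38 × 39 / 2. A short Python sketch of this count is given below; the function names are illustrative and the enumeration simply uses integer indices in place of the actual peptide sequences.

```python
from itertools import combinations_with_replacement


def count_pairs_with_self(n_peptides):
    """Closed form: unordered pairs plus self-self pairs, n(n + 1) / 2."""
    return n_peptides * (n_peptides + 1) // 2


def enumerate_pairs(n_peptides):
    """Explicit enumeration of (i, j) index pairs with i <= j."""
    return list(combinations_with_replacement(range(n_peptides), 2))


n_pairs = count_pairs_with_self(38)
```

Both the closed form and the explicit enumeration give 741 for 38 peptides (703 distinct pairs plus 38 self-self pairs).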

The datasets are listed in Supplementary Table 3. Positive datasets A and B were generated from pair-wise cross-linking of two synthetic peptides. The resulting HCD spectra were searched against a database containing only the two target peptides present in the cross-linking reaction. The two peptide sequences were considered in every possible form: unmodified, mono-linked, loop-linked, or inter-linked in silico. A precursor mass tolerance of 50 ppm was specified for the database search using pFind with simple adaptation (ref. 1). Only single cleavage products b+1, b+2, y+1, and y+2 were considered at this step, and a positive match required an E-value no greater than 0.005 and an FDR no greater than 1%. Dataset A is made up of 2077 non-redundant HCD spectra of cross-linked peptides, 1030 of which are from light [d0]-BS3 (subset A1) and 1047 from heavy [d4]-BS3 (subset A2). Dataset B contains 13267 spectra and is equivalent to dataset A plus redundant spectral copies. Dataset B can be divided into subsets B1 ([d0]-BS3 cross-links) and B2 ([d4]-BS3 cross-links). The rest of the spectra from the 741 peptide-pair experiments, which failed to identify inter-links, were collected into dataset C. Some spectra in dataset C represent mono- or loop-linked peptides, and some may be inter-link spectra of very poor quality. Dataset D contains HCD spectra of regular peptides identified from C. elegans lysates (not cross-linked). Datasets C and D are negative datasets. From the negative datasets C and D, 5060 spectra were randomly selected for training and another 5060 spectra were selected for testing. From the positive dataset B, 5468 and 5467 spectra were randomly selected for training and testing, respectively. There was no overlap between the training set and the test set.

• Spectral quality filtering

Because the computational cost of a cross-link search is high, it is necessary to remove spectra that are almost certain not to yield cross-link identifications. A Spectral Quality Score (SQS) was calculated as described before using 14 spectral features listed in Supplementary Table 4. Of these, 12 spectral features have been described for CID spectra (ref. 2). Since high-resolution, high-mass-accuracy HCD spectra generally allow charge state determination of peptides and their fragments, we added two new features: the number of peaks with known charge states and the fraction of peaks with known charge states.

To determine the weight of each spectral feature in SQS calculation, we used Linear Discriminant Analysis (LDA). The high-quality cross-link spectra in positive dataset B were randomly sorted into two groups (6633 spectra in each), one for training and the other for testing. Similarly, two non-overlapping negative datasets were randomly selected from dataset C, one containing 6135 spectra for training and the other containing 6134 spectra for testing. The weights of the 14 features obtained from LDA are shown in Supplementary Table 4.

We found from the test datasets that an SQS threshold of 4.2 achieved 99.5% sensitivity, 94% accuracy, and 93% specificity; that is, 99.5% of the positive spectra were correctly retained and 93% of the negative spectra were correctly removed (Supplementary Fig. 6). Hence, all spectra were filtered by requiring an SQS greater than 4.2.
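To make the three quoted figures concrete, the Python sketch below shows how sensitivity, specificity, and accuracy would be computed for a given SQS threshold from lists of scores for positive and negative test spectra. It is a minimal illustration only: the function name and the example scores are hypothetical, and the SQS values themselves (the LDA-weighted combination of the 14 features) are assumed to have been computed elsewhere.

```python
def evaluate_threshold(positive_scores, negative_scores, threshold=4.2):
    """Summarize a score cutoff; spectra are retained when score > threshold."""
    true_pos = sum(score > threshold for score in positive_scores)
    true_neg = sum(score <= threshold for score in negative_scores)
    n_pos = len(positive_scores)
    n_neg = len(negative_scores)
    return {
        # fraction of positive spectra correctly retained
        "sensitivity": true_pos / n_pos,
        # fraction of negative spectra correctly removed
        "specificity": true_neg / n_neg,
        # fraction of all spectra classified correctly
        "accuracy": (true_pos + true_neg) / (n_pos + n_neg),
    }


# Hypothetical toy scores, not taken from the datasets described here
example = evaluate_threshold([5.0, 6.1, 4.3, 3.9], [1.0, 2.5, 4.5, 3.0])
```

With these toy scores, three of four positives are retained and three of four negatives are removed, so all three measures equal 0.75.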

• Pre-processing

All peaks in a spectrum are classified into the 6 categories shown below. Only the ones from the first category are used for peptide-spectrum matching; the rest are removed.

1. Main peaks (monoisotopic)
2. Isotopic peaks
3. Peaks resulting from a neutral loss of ammonia
4. Peaks resulting from a neutral loss of water
5. Precursor ions
6. Noise peaks

Precursor ions and noise peaks are discarded first. Noise peaks are those that do not match any theoretical ions and appear in almost every spectrum. Shown in Supplementary Fig. 7 is the histogram (in 1 m/z bins) of unmatched peaks from 1030 HCD spectra (dataset Sub_A1, Supplementary Table 3). There are obvious noise peaks in the low m/z range, especially at m/z 108, 153, and 200. In most spectra, these noise peaks tend to appear simultaneously.

Then isotopic peaks and ammonia- and water-loss peaks are identified and removed. These peaks are redundant and subsidiary to the main peaks they are affiliated with. If a main peak exists, its affiliated peaks are removed to reduce random matches.

After removing the peaks that contribute little to peptide-spectrum matching as above, the remaining peaks are called main peaks and their charge states are labeled if possible. The intensities of all main peaks are square root transformed to prevent peptide-spectrum matching from being excessively influenced by a few high-intensity peaks.
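The removal of affiliated peaks and the square-root transform can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the pLink implementation: all peaks are assumed to share one known charge, the m/z tolerance of 0.01 is an arbitrary choice, precursor and noise peaks are assumed to have been discarded already, and the function names are hypothetical. The mass constants are the standard 13C-12C spacing and the monoisotopic masses of water and ammonia.

```python
import math

ISOTOPE_SPACING = 1.00335  # 13C - 12C mass difference, Da
WATER = 18.01056           # monoisotopic mass of H2O, Da
AMMONIA = 17.02655         # monoisotopic mass of NH3, Da


def has_peak_near(mz_values, target, tol):
    """True if any observed m/z lies within tol of target."""
    return any(abs(mz - target) <= tol for mz in mz_values)


def keep_main_peaks(peaks, charge=1, tol=0.01):
    """Drop isotopic and neutral-loss peaks, then square-root the intensities.

    peaks: list of (mz, intensity) tuples, all assumed to carry `charge`.
    """
    mz_values = [mz for mz, _ in peaks]
    main = []
    for mz, intensity in peaks:
        # Isotopic peak: another peak sits one isotope spacing below it.
        is_isotope = has_peak_near(mz_values, mz - ISOTOPE_SPACING / charge, tol)
        # Neutral-loss peak: its parent sits one water or ammonia mass above it.
        is_loss = (has_peak_near(mz_values, mz + WATER / charge, tol)
                   or has_peak_near(mz_values, mz + AMMONIA / charge, tol))
        if not (is_isotope or is_loss):
            main.append((mz, math.sqrt(intensity)))
    return main


# Hypothetical toy spectrum: a main peak at 500 with its isotope,
# water-loss and ammonia-loss peaks, plus an unrelated main peak at 300.
toy = [(500.0, 100.0), (501.00335, 40.0), (481.98944, 9.0),
       (482.97345, 16.0), (300.0, 25.0)]
main_peaks = keep_main_peaks(toy)
```

For the toy spectrum, only the peaks at m/z 500 and 300 survive, with intensities 10 and 5 after the square-root transform.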

• Open search against large databases for complex protein mixtures

For CXMS analysis of complex protein mixtures, the database search space (all possible peptide pair combinations, or combinatorial mode) is prohibitively large. To make it feasible, we devised an open-search mode for any protein database of more than 100 proteins (Supplementary Fig. 8a). Briefly, before searching for a cross-linked peptide pair α-β as a whole, pLink looks for possible sequences of α and β separately by treating the other peptide as a modification on a lysine residue (pre-scoring). Then, the spectrum is scored against candidate α and β sequence pairs (fine-scoring).



The pre-scoring step resembles a conventional database search in that no candidate sequences are cross-linked in silico. However, unlike a conventional search, which stipulates a narrow mass tolerance, the mass tolerance window is "opened up" to allow a spectrum to match all candidate peptides whose theoretical masses are the same as or lower than that of the precursor. In addition, the candidate sequences must bear the specificity of the cross-linker and the digestion enzyme, and contain no more than a user-defined number of missed cleavages. With BS3 and trypsin and no more than two missed cleavages, this means that the candidate sequence must have a K/R at the C-terminus and at least one but no more than three non-C-terminal K, unless it is a protein N-terminal sequence, in which case zero, one, or two non-C-terminal K are allowed. For each filtered and pre-processed spectrum, a fast pre-scoring algorithm (described below) finds a list of possible sequences with a modification on K. The mass of the modification is the difference between the mass of the precursor and that of the peptide sequence being examined. By requiring a minimum -log(pre-score) value of 3.0, 89% of the non-target sequences were removed, while 99.7% of the targets were retained (Supplementary Fig. 8b).
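The sequence rule stated above for BS3, trypsin, and at most two missed cleavages can be written as a small filter. The Python sketch below encodes only that stated rule (C-terminal K/R and a bounded number of non-C-terminal K); the function name and the protein_n_terminal flag are hypothetical, and other constraints such as the precursor mass bound are not modeled.

```python
def is_open_search_candidate(sequence, protein_n_terminal=False):
    """Apply the K/R and lysine-count rule for BS3 + trypsin, <= 2 missed cleavages."""
    if not sequence or sequence[-1] not in "KR":
        return False  # tryptic peptides must end in K or R
    internal_k = sequence[:-1].count("K")  # non-C-terminal lysines
    if protein_n_terminal:
        # Protein N-terminal sequences may carry zero, one, or two such K
        return internal_k <= 2
    # Otherwise at least one but no more than three such K
    return 1 <= internal_k <= 3


examples = {
    "AKLTDFER": is_open_search_candidate("AKLTDFER"),
    "MISDIR": is_open_search_candidate("MISDIR"),
    "MISDIR (N-term)": is_open_search_candidate("MISDIR", protein_n_terminal=True),
}
```

Here AKLTDFER passes with one internal K, while MISDIR passes only when flagged as a protein N-terminal sequence.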

After pre-scoring, candidate sequences are sorted into two groups. Those with a theoretical mass (the peptide itself, ignoring the mass of the modification) greater than (M – 50 ppm)/2, where M is the precursor mass, are classified as candidate α sequences. Similarly, those with a theoretical mass less than (M + 50 ppm)/2 are classified as candidate β sequences. In a cross-linked peptide pair α−β, α is always the one with a higher mass.
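The sorting rule above can be sketched in Python as follows. The sketch assumes that "50 ppm" is taken relative to the precursor mass M, so the two cutoffs are (M - 50 ppm × M)/2 and (M + 50 ppm × M)/2; the function name is hypothetical. Note that a candidate whose mass falls within the narrow window around M/2 is placed in both groups.

```python
def classify_candidates(candidate_masses, precursor_mass, ppm=50.0):
    """Split candidate peptide masses into alpha and beta groups."""
    delta = precursor_mass * ppm * 1e-6  # absolute tolerance in Da
    alpha = [m for m in candidate_masses if m > (precursor_mass - delta) / 2]
    beta = [m for m in candidate_masses if m < (precursor_mass + delta) / 2]
    return alpha, beta


# Hypothetical masses: with M = 3000 the cutoffs are 1499.925 and 1500.075
alpha_masses, beta_masses = classify_candidates([1000.0, 1500.0, 2000.0], 3000.0)
```

In this toy case the 1500 Da candidate lies within the window around M/2 and therefore appears in both lists.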

When all the spectra in the positive dataset B (13267 cross-link spectra, Supplementary Table 3) were searched against the source peptide sequences plus a background E. coli protein database (6164 forward and 6164 reversed protein sequences), it was found that 99% of the time, the correct α sequence was among the top 20 candidates while the correct β sequence was among the top 250. The average rank orders for the correct α and β sequences are 1.65 and 8.73, respectively. As such, the top 500 α and β candidates are kept for each spectrum, and the associated probability of losing the correct α or β sequence is 0.003.

Next, candidate α and β sequences are paired up to satisfy the following relationship within a specified mass tolerance range (default is 50 ppm for HCD): $M_\alpha + M_\beta + M_{\text{linker}} = M$.
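The pairing step can be sketched in Python as a search for (α, β) combinations whose masses, together with the linker mass, add up to the precursor mass within the ppm tolerance. This is an illustration rather than the pLink implementation: the nested loop is the simplest possible approach, the candidate names and masses are hypothetical, and the default linker mass of 138.0680796 is the BS3 cross-link mass shift quoted for the xQuest searches in Supplementary Table 17.

```python
def pair_candidates(alpha, beta, precursor_mass,
                    linker_mass=138.0680796, ppm=50.0):
    """Return (alpha_name, beta_name) pairs with M_alpha + M_beta + M_linker = M.

    alpha, beta: lists of (name, mass) tuples.
    """
    tolerance = precursor_mass * ppm * 1e-6  # absolute tolerance in Da
    pairs = []
    for name_a, mass_a in alpha:
        for name_b, mass_b in beta:
            if mass_a < mass_b:
                continue  # alpha is defined as the heavier peptide
            total = mass_a + mass_b + linker_mass
            if abs(total - precursor_mass) <= tolerance:
                pairs.append((name_a, name_b))
    return pairs


# Hypothetical candidates; A1 + B2 + linker is constructed to equal 2000.0
alpha_candidates = [("A1", 1200.0), ("A2", 1300.0)]
beta_candidates = [("B1", 800.0), ("B2", 661.9319204)]
matched = pair_candidates(alpha_candidates, beta_candidates, 2000.0)
```

With up to 500 α and 500 β candidates per spectrum, the nested loop examines at most 250,000 combinations; sorting one list and scanning with a moving mass window would reduce this, but the simple form is kept here for clarity.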

The last step of the open search is to score the spectrum again against candidate α−β pairs cross-linked in silico using a refined scoring algorithm (see below). For low-complexity samples, filtered and pre-processed spectra can go straight to fine scoring against candidate α−β pairs generated directly from a small protein database (Fig. 1a).

• Pre-scoring algorithm

The pre-scoring algorithm in the open-search mode measures the significance of the match between ions in an experimental spectrum and peaks in the theoretical spectrum of a candidate sequence. It consists of three components X, R, and T, representing the number of matched ions, average intensity ranking of matched ions, and the length of the longest sequence tag, respectively, that arise by chance.

(1) Statistic X: number of matched ions

In the case of a random match, the number of matched ions X follows a hypergeometric distribution. The probability mass function of X is as follows:

$$ P(X = x) = \frac{\binom{n}{x}\binom{N-n}{l-x}}{\binom{N}{l}} \tag{1} $$

In equation (eq.) 1, N is the maximal number of distinguishable peaks, equal to the theoretical mass of a candidate sequence divided by the mass tolerance width for fragments; n is the number of peaks in the theoretical spectrum; l is the number of fragment ions in an experimental spectrum; and x is the number of fragment ions that match to the candidate sequence.

The probability of “X,” the number of matched ions arising by chance, exceeding “x” is calculated using eq. 2.

$$P(X > x) = \sum_{k=x+1}^{\min(n,\,l)} P(X = k) \qquad (2)$$
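The chance-match calculation for statistic X can be sketched in a few lines of Python using only the standard library. The function names and the `inclusive` switch (whether the observed count x itself is counted in the tail) are assumptions made for illustration; the symbols N, n, l, and x follow the definitions given above.

```python
from math import comb


def hypergeom_pmf(x, N, n, l):
    """P(X = x): x of the l experimental ions hit n theoretical peaks among N bins."""
    if x < max(0, l - (N - n)) or x > min(n, l):
        return 0.0                      # outside the support of the distribution
    return comb(n, x) * comb(N - n, l - x) / comb(N, l)


def tail_prob(x, N, n, l, inclusive=False):
    """Probability that a random match yields more than x matched ions.

    inclusive=True counts x itself (P(X >= x)); this switch is an assumption
    added for illustration.
    """
    start = max(0, x if inclusive else x + 1)
    return sum(hypergeom_pmf(k, N, n, l) for k in range(start, min(n, l) + 1))
```

For example, with N = 10, n = 4, and l = 3, the probability of more than one matched ion arising by chance is 1/3, and the probability of at least one is 5/6.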

(2) Statistic R: mean intensity ranking of matched ions

Ions are ranked by intensity from high to low. The relative ranking of the ith ion is Ri (Ri = i/n, where i = 1, 2, 3, …, n). For randomly matched ions, we assume that the Ri are independently and uniformly distributed between 0 and 1, so the expectation of Ri is 0.5 and the variance is 1/12. Let “R” be the mean relative ranking of a group of randomly matched ions. Then R approximately follows a normal distribution, and the probability density function of R is:

$$f(R) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(R-\mu)^2}{2\sigma^2}\right) \qquad (3)$$

In eq. 3, µ and σ² are the expectation and variance of R, respectively. For x matched ions:

$$\mu = \frac{1}{2} \qquad (4)$$

$$\sigma^2 = \frac{1}{12x} \qquad (5)$$

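For R (the mean intensity ranking of randomly matched ions) to be less than r (the observed mean intensity ranking of matched ions), the probability can be calculated as follows:

$$P(R < r) = \int_{-\infty}^{r} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)dt \qquad (6)$$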
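A minimal sketch of the rank statistic, assuming the normal approximation described above (mean 0.5, variance 1/12 per ion, so variance 1/(12x) for the mean of x ions). The function name and the input layout (a list of 1-based intensity ranks plus the total number of ranked ions) are hypothetical.

```python
from math import erf, sqrt


def rank_pvalue(matched_ranks, n_ions):
    """P(R < r) for the mean relative intensity rank of the matched ions.

    matched_ranks: 1-based intensity ranks (1 = most intense) of matched ions.
    n_ions: total number of ranked ions in the spectrum.
    """
    x = len(matched_ranks)
    if x == 0:
        raise ValueError("at least one matched ion is required")
    r = sum(i / n_ions for i in matched_ranks) / x   # observed mean relative rank
    mu = 0.5
    sigma = sqrt(1.0 / (12.0 * x))
    # Normal cumulative distribution function evaluated at r.
    return 0.5 * (1.0 + erf((r - mu) / (sigma * sqrt(2.0))))
```

A match whose ions are all among the most intense peaks gives a small value (strong evidence against a random match), whereas a mean relative rank of exactly 0.5 gives a value of 0.5.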

(3) Statistic T: length of the longest sequence tag

T is the amino acid length of the longest sequence tag obtained from a spectrum divided by that of a candidate peptide, so T ∈ [0,1]. Let a string of 0s and 1s indicate whether or not an amino acid residue is correctly identified at each position of a peptide. Assuming that there is a 50% chance of correct identification at any position, and that identification at each position is an independent event, T is the longest run of 1s normalized to the length of the string. The probability distribution of T for different peptide lengths can be simulated using the Monte Carlo method and stored in a table. Given the length of a candidate peptide, the probability of T (by chance) > t (observed) can be looked up from the table. Supplementary Fig. 9 shows the probability values as a function of T for peptides of 5, 8, 10, 15, 18, 20, or 25 aa, as generated from Monte Carlo simulation.

Lastly, the pre-score is the product of the three p-values described above.

$$\mathrm{Pre\_score} = P(X > x)\cdot P(R < r)\cdot P(T > t) \qquad (7)$$
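The Monte Carlo table for T and the final product can be sketched as follows. This is an illustration under the stated assumptions (independent positions, 50% chance of correct identification); the function names, the number of trials, and the fixed seed are arbitrary choices and do not come from pLink.

```python
import math
import random


def longest_run(bits):
    """Length of the longest run of truthy values in a sequence."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b else 0
        best = max(best, cur)
    return best


def simulate_t_table(length, trials=20000, p_correct=0.5, seed=0):
    """table[k] estimates P(longest run > k), i.e. P(T > k/length) by chance."""
    rng = random.Random(seed)
    counts = [0] * (length + 1)
    for _ in range(trials):
        bits = [rng.random() < p_correct for _ in range(length)]
        counts[longest_run(bits)] += 1
    table, above = [0.0] * (length + 1), 0
    for k in range(length, -1, -1):      # accumulate from the longest run down
        table[k] = above / trials
        above += counts[k]
    return table


def neg_log_pre_score(p_x, p_r, p_t):
    """-log10 of the product of the three chance probabilities."""
    product = p_x * p_r * p_t
    return math.inf if product == 0.0 else -math.log10(product)
```

For a 10-residue candidate with an observed tag of 6 residues, `simulate_t_table(10)[6]` gives the simulated chance probability of a longer tag. The table only needs to be built once per peptide length and can then be reused for every spectrum.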

Supplementary Fig. 10 illustrates how Pre_score is computed. We found that with a Pre_score threshold of 10⁻³, or –log(Pre_score) ≥ 3, sensitivity reached 99.7%, specificity 89%, and accuracy 70% (Supplementary Fig. 8b).

Pre-scoring is essential for large database search. When 5455 spectra were searched against a database of 108 proteins on a personal computer (Intel Core 2 Quad CPU, 2.4 GHz, 2 GB RAM), pre-scoring reduced the search time from 10 hours to 43 minutes.

• Fine-scoring algorithm

The fine-scoring algorithm is an extension of the Kernel Spectral Dot Product (KSDP) algorithm previously developed for the database search engine pFind⁶. We took the following steps to refine the KSDP algorithm for cross-link identification.

(1) Selection of fragment ion types for scoring

In theory, many types of fragment ions can be generated from inter-linked peptides, but in practice some are rarely seen and some never. It is crucial to select the appropriate ion types for scoring. Failing to consider ions that are abundant in cross-link spectra compromises identification, and so does careless inclusion of ions that exist only in theory, because it does nothing but increase the chance of random matches.

To distinguish various fragment ion types, we used the systematic nomenclature suggested previously³, with slight modifications.

In an inter-linked peptide pair, the higher-mass peptide is α and the lower-mass peptide is β; if two peptides are of the same mass but different sequence (extremely rare), the α sequence precedes the β sequence in alphabetical order; if two molecules of the same peptide cross-link, then that peptide is both α and β.

For single backbone cleavage products, only the dissociated peptide is indicated, e.g. βy₂¹⁺, βb₅²⁺, αy₈²⁺. If αy₈²⁺ contains the cross-linking site, it may be labeled as αy₈²⁺x to indicate that the intact β peptide is attached to the α fragment through the linker. A suffix such as “–H2O” or “–18.0153” indicates neutral loss, e.g. αb₅¹⁺–NH3. Fragments that require two backbone cleavage events are labeled like αy₅αb₃¹⁺, βy₆βa₅²⁺, αb₅βy₂¹⁺, or αy₈βa₇³⁺ (illustrated in Supplementary Fig. 11a-c). α always precedes β if both peptides are dissociated, and y precedes b or a when both cleavages occur on the same peptide. These fragments all contain the cross-link and may be decorated with “x” (e.g. αy₈βa₇³⁺x). Among these types are two special fragments, αyαa and βyβa, that result from enhanced cleavage of the nearest peptide bond N-terminal to a cross-linked lysine Cα atom and breakage of the C-C bond involving this Cα atom³. These are called K-linked fragments, labeled as KLα or KLβ, equivalent to a lysine immonium ion attached to an intact α or β peptide through the linker (Supplementary Fig. 11c). KLα or KLβ ions with a neutral loss of ammonia (KLα/β –17) are also common, as reported before⁴.

Either amide bond in the BS3 linker can also break³,⁴. α/βL5 or α/βL13 refers to the resulting α or β peptide with a linker fragment (illustrated in Supplementary Fig. 11d). The suffix number refers to the number of atoms between the lysine Cα atom and the cleaved amide bond.

Since we know the absolute identifications of the 2077 non-redundant cross-link spectra from dataset A (Supplementary Table 3), we matched all theoretical fragments (peaks) to ions found in the experimental spectra after pre-processing. For each ion type, the following features were collected:

1. Count Ratio, calculated as the number of matched peaks of an ion type divided by the number of matched peaks of all ion types

2. Gain Ratio, calculated as the number of matched peaks of an ion type divided by the total number of theoretical peaks of that ion type

3. Average Intensity, average of normalized intensity of matched peaks belonging to the same ion type (normalized to the base peak intensity)

4. Match Significance, the product of Count Ratio, Gain Ratio, and Average Intensity

5. Average Mass Deviation, average mass deviation of matched ions from the theoretical peaks they are matched to

6. Tag Length, amino acid length of the longest sequence tag deduced from ions of the same type, normalized to the length of the peptide (for cleavages that are all on one side of the cross-link on either α or β peptide)

7. Average Number of Amino Acids, indicating on average how many amino acid residues constitute an ion of a given type
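The first four features in this list can be computed as in the following Python sketch. The input layout (a list of matched peaks tagged with their ion type and normalized intensity, plus a dictionary of theoretical peak counts per ion type) and all names are hypothetical; they only illustrate the definitions given above.

```python
from collections import defaultdict


def ion_type_features(matches, theoretical_counts):
    """Per-ion-type Count Ratio, Gain Ratio, Average Intensity, Match Significance.

    matches: list of (ion_type, normalized_intensity) for matched peaks, with
             intensity normalized to the base peak (0..1).
    theoretical_counts: dict mapping ion_type -> number of theoretical peaks.
    """
    by_type = defaultdict(list)
    for ion_type, intensity in matches:
        by_type[ion_type].append(intensity)
    total_matched = len(matches)
    features = {}
    for ion_type, intensities in by_type.items():
        count_ratio = len(intensities) / total_matched
        gain_ratio = len(intensities) / theoretical_counts[ion_type]
        avg_intensity = sum(intensities) / len(intensities)
        features[ion_type] = {
            "count_ratio": count_ratio,
            "gain_ratio": gain_ratio,
            "avg_intensity": avg_intensity,
            "match_significance": count_ratio * gain_ratio * avg_intensity,
        }
    return features


def rank_by_significance(features):
    """Ion types ordered from most to least significant."""
    return sorted(features, key=lambda t: features[t]["match_significance"],
                  reverse=True)
```

Because Match Significance is a product, an ion type ranks highly only if it is frequent among matches, well covered relative to its theoretical peaks, and intense.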

We ranked all ion types by match significance. The top 20 ion types are shown in Supplementary Table 5. The most significant ion type is y1+. Of all matched peaks, 21.8% are y1+, and 67.1% of all theoretical y1+ peaks have matching ions in experimental spectra. Moreover, matched y1+ peaks have much higher intensity compared to the rest of the ion types. The y1+ ions also tend to be continuous, generating sequence tags that can cover, on average, 61.7% of the peptide sequence.

Besides single-cleavage ion types, several double-cleavage ion types are also significant, such as yb1+, ya1+, α/βL, and KLα/β. The yb1+ ions include the αyαb1+, βyβb1+, αyβb1+, and αbβy1+ sub-types. Similarly, the ya1+ ions include αyαa1+, βyβa1+, αyβa1+, and αaβy1+. The related ions αyβy1+, αbβb1+, and αaβa1+ are not included because they are too rare.

Precursor ions [M+5H]5+ and [M+4H]4+ are intense in HCD spectra of high charge state cross-links. Without pre-processing, [M+5H]5+ and [M+4H]4+ ranked 3rd and 4th in match significance, behind only y1+ and y2+ (data not shown). These precursor ions provide no sequence information and are therefore not used for scoring. Although a fraction remains after pre-processing, they fall outside the top 10 in match significance.

To further analyze the properties of fragment ions common in cross-link spectra, we divided each ion type into two sub-types depending on whether or not a fragment contains a cross-link. Their theoretical peak-experimental ion match properties were analyzed as above, and the results indicate that: a) the charge states of matched peaks containing a cross-link tend to be higher than those of matched peaks without one; b) most of the matched peaks and ion intensity are accounted for by those containing no cross-links; c) for ion types b1+, y1+, a1+, and ya1+, the sub-types that contain a cross-link can be ignored; d) for ion types b2+, b3+, a2+, a3+, and y3+, the sub-types that do not contain a cross-link are negligible; e) for ion types y2+ and yb1+, both sub-types should be taken into account.

(2) Weighting factors in KSDP

The refined scoring function is based on the KSDP model¹:

(8)

In eq. 8, “S” is the sum of the intensity of all matched ions, “L” is the number of peaks to be weighted for continuity, “K” is the sum of the weight for ion continuity and the weight for coexistence of different types of ions supporting the same cleavage, and “θ” is a constant balancing K and S.

Based on ion continuity (the “Tag Length” feature), only ion types with a large Tag Length value, such as y1+, y2+, b1+, and b2+, are weighted for continuity. In contrast, all ion types are weighted for co-existence.

(9)

(10)

In the equations above, “M” is a matrix in which each row is an ion type and each column represents a cleavage site. The value in each cell indicates whether or not a theoretical peak finds a matching ion in the experimental spectrum, 1 for yes and 0 for no. “l1” and “l2” are the lower and upper boundaries of a continuity window from which Kcontinuity is to be calculated. Similarly, “l3” and “l4” are the boundaries of a coexistence window. An example is shown in Supplementary Fig. 12. There are two continuity windows, 1 (for b ions) and 2 (for y ions). There are also four coexistence windows, 3 through 6, whose Kcoexistence values are to be computed. Window 3 emphasizes the cleavages immediately C-terminal to the cross-linking site in different ion types, and window 6 highlights fragments resulting from linker cleavage.

For inter-linked peptides α-β, Kcontinuity and Kcoexistence are computed separately for α and β, and the final K value is calculated as follows:

(11)

(3) Optimizing the use of ion types

To find out the best way to utilize the information carried by each ion type, we considered the following options: a) whether or not an ion type is to be used in KSDP scoring; b) for each ion type, whether its cross-link-containing and cross-link-free sub-types should be treated separately; c) for each ion type or sub-type, whether it should be weighted for continuity.

To be able to decide on each option or combination of options, we searched dataset_A1 (1030 non-redundant HCD spectra, Supplementary Table 3) against a target database and a decoy database.


The goal was to find the setting that maximizes the difference between the target database score and the decoy database score. Here we took normalized Refined_scores (E-values, see below) and used the negative values of their natural logarithms, i.e. –ln(E-value), to compare differences. The initial setting and the optimal setting we finally arrived at are shown in Supplementary Table 6. In the final setting, yb1+ and ya1+ ions are limited to those that contain no more than 4 amino acids to reduce random matches, since matched yb1+ and ya1+ peaks have an average of 3.226 and 1.772 amino acids, respectively (Supplementary Table 5). After optimization, the negative ln(E-value) of the target database search increased by an average of 7.1772, whereas that of the decoy database search increased by only 1.918 (Supplementary Fig. 13). Similar results were obtained with 240 positively identified cross-link spectra from the CXMS analyses of GST and CNGP. The negative ln(E-value) of the target database search increased by an average of 20.9 from the initial setting to the optimized setting, whereas that of the decoy database search increased by 11.9. Consideration of cross-link specific ions made a significant contribution to this improvement (Supplementary Fig. 14).

• Calculation of E-values

Refined_scores cannot be compared between spectra. The absolute score value is related to the precursor charge state, the number of ions in a spectrum, and the length of a matching peptide, so a “busy” spectrum matched to a wrong sequence may have a higher Refined_score than a “sparse” spectrum matched to the right sequence. We normalize Refined_scores by converting them to E-values. The following steps are taken to calculate E-values.

1) Generate 5000 Theoretical Peptide Candidates (TPCs). For each TPC, randomly generate an amino acid at every position until the cumulative mass is just above the precursor mass minus the linker mass. Then split each TPC into halves, one half as α (marked as α’) and the other as β (marked as β’). TPCs from this step mimic the situation where both the α and β identifications are random matches.

2) Keep α peptide the same and randomly generate 5000 TPCs in place of β (marked as β’). For each TPC, randomly generate an amino acid at every position until the cumulative mass is just above the mass of β peptide.

3) Keep β peptide the same and randomly generate 5000 TPCs in place of α (marked as α’) in the same way. TPCs from step 2 and step 3 are meant to mimic the situation where the identification of one cross-linked peptide is correct but the other is by chance.

4) Compute the Refined_score for a spectrum against each TPC from steps 1 to 3 (5000 TPCs per step, 15,000 TPCs in total).

5) Take the top 10% Refined_scores and rank them from low to high. These are background data points.

6) Calculate empirical p-value for each Refined_score value, that is, at each score value, the empirical p-value for a score higher than current one is the number of data points with a higher score divided by the number of total data points (5000).

7) From the background data points above, find the maximum likelihood estimate of the parameters of the linear model between log(p-value) and Refined_score:
log(p-value) = a*x + b (x is Refined_score, a<0, b>0)

8) Obtain the p-value(s) for the candidate cross-link(s) matched to a spectrum.

9) Calculate the E-value by multiplying the p-value of a match by the number of candidate cross-links falling within the mass tolerance window from a search database.
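The steps above can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not confirmed by the text: natural logarithms are used, ordinary least squares stands in for the maximum likelihood fit, the single highest background score is dropped because its empirical p-value is zero, ties are ignored, and the residue table is a small illustrative subset of monoisotopic residue masses. All function names are hypothetical.

```python
import math
import random

# Illustrative subset of monoisotopic residue masses (Da); not a full alphabet.
AA_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
           "V": 99.06841, "L": 113.08406, "K": 128.09496}


def random_tpc(target_mass, rng):
    """Append random residues until the cumulative mass is just above target_mass."""
    residues = sorted(AA_MASS)
    seq, mass = [], 0.0
    while mass <= target_mass:
        aa = rng.choice(residues)
        seq.append(aa)
        mass += AA_MASS[aa]
    return "".join(seq)


def fit_log_p(background_scores, top_fraction=0.10):
    """Fit ln(p) = a*x + b on the top fraction of background scores."""
    scores = sorted(background_scores, reverse=True)      # highest first
    n_total = len(scores)
    k = int(round(n_total * top_fraction))
    # Index j has j higher-scoring points, so its empirical p-value is j / n_total.
    pts = [(s, math.log(j / n_total)) for j, s in enumerate(scores[:k]) if j > 0]
    if len(pts) < 2:
        raise ValueError("not enough background points for a linear fit")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    a = sum((x - mx) * (y - my) for x, y in pts) / sxx
    return a, my - a * mx


def e_value(score, a, b, n_candidates):
    """E-value = p-value of the match times the number of candidates in the window."""
    return n_candidates * math.exp(a * score + b)
```

In use, the background scores would be the Refined_scores of a spectrum against the TPCs, and `n_candidates` would be the number of candidate cross-links within the precursor mass tolerance window. Fitting only the upper tail lets the p-value of a real match be extrapolated beyond the best background score.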


We tested other methods to generate background candidate peptides, for example, taking Real Peptide Candidates (RPCs) from the search database, or using other methods to generate TPCs (data not shown). The simple TPC method described above proved to be fast, effective and stable.

• Estimation and control of FDR

A major problem of CXMS is how to calculate and control the FDR associated with cross-link identifications. For this, we devised a modified version of the reversed database strategy⁵ to estimate FDR. The sequence of each protein entry in a forward or target database is reversed to create a decoy database the same size as the forward database. Peptide sequences from both the forward (F) and the reversed (R) database are cross-linked in silico in every possible combination, producing three categories of cross-links: those between two forward peptides (F-F), those between two reversed peptides (R-R), and those between a forward peptide and a reversed peptide (F-R and R-F). As shown in Supplementary Fig. 15a, the F-F category is marked “T” for true because only this category contains true identifications; the R-R category is marked “F” for false; the F-R and R-F categories are collectively called “U” for union of F and R. The size ratio among T, F, and U is 1:1:2. For a random match, a spectrum has a ¼, ¼, and ½ chance of matching to T, F, and U, respectively.

To find out if this theoretical prediction holds true, we searched the cross-link data of the yeast UTP-B complex against three databases (a human protein database, an archaea protein database, and a random sequence database built by computer) that contain none of the UTP-B subunit sequences. Therefore, all matches are random. We found that the numbers of matches to T, F, and U closely followed the expected ratio of 1:1:2 for all three databases (Supplementary Fig. 15b). If the UTP-B protein sequences were included, then the number of matches to T increased, accompanied by a drop in both F and U. Thus, the theory holds true and we can use the number of matches to T, F, and U to estimate FDR.

Among the spectra that match to peptide pairs in T (NT), there are two types of false matches: (1) both peptide sequences are wrong and (2) one peptide sequence is correct but the other is wrong. The number of type 1 false matches is estimated by the number of spectra that match to F (NF); twice as many random matches (2NF) are expected to match to U. Therefore, the number of type 2 false matches is estimated by (NU – 2NF). Hence, we derive the following:

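$$N_{\mathrm{false}} = N_F + (N_U - 2N_F) = N_U - N_F \qquad (12)$$

$$\mathrm{FDR} = \frac{N_U - N_F}{N_T} \qquad (13)$$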

In rare cases where NU – NF ≤ 0, FDR is estimated using the ratio NF/NT.

To test the accuracy of the FDR estimated using eq. 13, we pooled dataset A (cross-link spectra with known identity) and dataset D (spectra from non-cross-linked samples) together and searched a database containing target and decoy sequences. The result shows that the estimated FDR is in excellent agreement with the true FDR (Supplementary Fig. 15c). The estimate begins to deviate only when the FDR exceeds 40%, which is too high to merit fine distinction.
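The FDR estimate described above can be written as a short Python sketch. It follows the reasoning in the text (type 1 false matches estimated by NF, type 2 by NU – 2NF, with NF/NT as the fallback); the function names and the boolean decoy flags are hypothetical.

```python
def classify_pair(alpha_is_reversed, beta_is_reversed):
    """Label a matched peptide pair as 'T' (F-F), 'F' (R-R) or 'U' (F-R or R-F)."""
    if not alpha_is_reversed and not beta_is_reversed:
        return "T"
    if alpha_is_reversed and beta_is_reversed:
        return "F"
    return "U"


def estimate_fdr(n_t, n_f, n_u):
    """Estimated FDR among the n_t spectra matched to the T category.

    False matches in T = type 1 (n_f) + type 2 (n_u - 2*n_f) = n_u - n_f.
    When n_u - n_f <= 0, fall back to n_f / n_t as described in the text.
    """
    if n_t <= 0:
        raise ValueError("n_t must be positive")
    if n_u - n_f <= 0:
        return n_f / n_t
    return (n_u - n_f) / n_t
```

For example, 100 matches to T, 1 to F, and 5 to U give an estimated FDR of 4%. In practice the three counts would be taken above a given score threshold, and the threshold adjusted until the estimate reaches the desired level.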


• pLink performance tests

(1) Test#1–Small Dataset against Small Database

The test data consisted of 2077 cross-link spectra from dataset A and 3016 non-cross-link spectra from dataset D. The database contains the sequences of 38 synthetic peptides and the UTP-B proteins (six proteins), along with the reversed sequences. The search parameters are: precursor mass tolerance 50 ppm, fragment mass tolerance 20 ppm, cross-linker [d0]-BS3, search mode combinatorial (all peptide pair combinations enter fine scoring). Of the 2077 spectra in dataset A, 1030 are [d0]-BS3 cross-links and 1047 are [d4]-BS3 cross-links. Therefore, in this search the [d4]-BS3 cross-link spectra can match to one correct peptide sequence at best. This was done on purpose to create the scenario where only one peptide identification is right and the other is wrong. Altogether, we expect 1030 positive identifications and 4063 negative identifications in a perfect search. Sensitivity, accuracy, and specificity are calculated as follows.

(14)

(15)

(16)

The search result shows that positive and negative identifications are well separated by E-value and precursor mass accuracy. A 10 ppm precursor mass accuracy filter was applied and the result was examined at varying FDR control levels. At 5% FDR, sensitivity is as high as 99%, accuracy 98%, and specificity 99%.

(2) Test#2–Small Dataset against Large Database

In open-search mode, the same test data as above were searched against an E. coli database with 38 synthetic peptide sequences appended to it, along with the reversed sequences. There are 12328 protein sequences (forward+reversed) in this database, from which 657195 peptide sequences are derived. Repeating the same analysis method, we find a small drop in sensitivity, but sensitivity, accuracy, and specificity are still at 95% or above.

(3) Test#3–Large Dataset against Small Database

A total of 42051 spectra from datasets B and D and a subset of C (7668 spectra) constituted the test data. The search and filtering parameters were the same as above. Similarly, only 7706 [d0]-BS3 cross-link spectra (dataset_B1) were expected to be identified because the cross-linker was set to [d0]-BS3 only in the search parameters. The small database was the same one used in test#1. In the combinatorial-mode search, sensitivity, accuracy, and specificity are all above 97% at 5% FDR. The pLink performance in this test is comparable to that in test#1, suggesting that with a small database, having many interfering negative spectra hardly affects the result.


(4) Test#4–Large Dataset against Large Database

The same dataset as in test#3 (42051 spectra) was searched using the open-search mode against the database used in test#2 (12328 proteins, or 657195 peptides). At 5% FDR, the sensitivity decreased to 92%, but accuracy and specificity remained above 95% (Fig. 1b).

• pLink run time in typical experiments

(1) Example #1: 5267 spectra + tiny database (6 proteins + reverse sequences)
hardware: personal computer, Intel Core 2 Quad CPU, 2.33 GHz, 2 GB RAM
search parameters: same as above
run time: 120 min in combinatorial search mode

(2) Example #2: 5455 spectra + medium database (108 proteins + reverse sequences)
hardware: personal computer, Intel Core 2 Quad CPU, 2.33 GHz, 2 GB RAM
search parameters: same as above
run time: 43 min in open search mode

(3) Example #3: 6403 spectra + large database (10065 proteins + reverse sequences)
hardware: computer cluster, 8 units of Dual Xeon 5405, 2.0 GHz, 4 GB RAM, 8 cores/unit
search parameters: same as above
run time: 46 min in open search mode

• Comparison of pLink and xQuest

To benchmark the performance of pLink, we compared it to xQuest⁶. The GST cross-linking data from experiment 1 in Supplementary Table 7 (using [d0]/[d4]-BS3) and a database containing GST and trypsin, along with their reversed sequences, were submitted to the xQuest web server (http://prottools.ethz.ch/orinner/public/htdocs/xquest/xquest_review.html).

As shown in Supplementary Table 17, the search parameters were varied in order to find the best setting for HCD data. Allowing 4 missed trypsin cleavage sites, no consideration of a-ions, MS1 mass tolerance of 10 ppm, and MS2 mass tolerance of either 0.01 or 0.02 m/z (equivalent to 10 or 20 ppm at 1000 m/z) yielded better results than other settings. The best result (setting A with filtering condition (i), bolded and colored red in Supplementary Table 17) identified 10 spectra, corresponding to 8 cross-linked lysine pairs (Supplementary Table 18). Most of these lysine pairs have a Cα-Cα distance greater than 24 Å; only two are within the distance limit for BS3. The overlap between the xQuest result and the pLink result is marginal. Only the K180-K193 cross-link, which has the highest xQuest score and shortest Cα-Cα distance, was identified with both programs. From the same data and similar parameters (MS2 mass tolerance at 20 ppm instead of 0.01 m/z), pLink identified 7 cross-linked lysine pairs from over 40 spectra; all except one are formed between two lysine residues less than 24 Å apart (Supplementary Table 7, exp_1). Repeating the comparison using the GST exp_2 data from Supplementary Table 7 recapitulated the difference. Thus, the pLink results appear to be more reliable and encompass more successful spectral identifications.
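The equivalence between absolute (m/z) and relative (ppm) tolerances quoted above depends on the m/z at which it is evaluated. A small Python sketch of the conversion, with hypothetical function names:

```python
def ppm_to_mz(mz, ppm):
    """Absolute tolerance (in m/z units) that corresponds to `ppm` at a given m/z."""
    return mz * ppm * 1e-6


def mz_to_ppm(mz, delta_mz):
    """Relative tolerance (in ppm) that corresponds to `delta_mz` at a given m/z."""
    return delta_mz / mz * 1e6
```

At 1000 m/z, 10 ppm corresponds to 0.01 m/z, as stated in the text. The same 0.01 m/z window corresponds to 20 ppm at 500 m/z and to 5 ppm at 2000 m/z, so a fixed m/z tolerance is effectively looser for low-mass fragments and tighter for high-mass fragments than a fixed ppm tolerance.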


For samples treated with light cross-linker only, pLink is equally effective. For instance, a similar number of cross-links are identified from E. coli experiment #1 (with [d0]-BS3) and #2 (with 1/1 [d0]/[d4]-BS3) using our CXMS method (Supplementary Table 11). The xQuest algorithm relies on isotopic spectral pairs to differentiate common and xlink ions, which are the basis of its identification strategy⁶. Without light/heavy isotope labeling, xQuest is expected to be ineffective. This was verified experimentally using data from a GST sample cross-linked with light [d0]-BS3. As shown in Supplementary Tables 19 and 20, the best xQuest identification result consists of 7 GST cross-links, none of which is supported by the GST structure (Cα-Cα distance > 24 Å for six of them, one without structural evidence). In contrast, pLink identified 30 cross-linked lysine pairs from 43 spectra. Most of them are structurally sound (Cα-Cα distance < 24 Å for 18 pairs, > 24 Å for 7 pairs, and 5 without structural evidence). There is only one overlap between the xQuest result and the pLink result (colored blue in Supplementary Table 20). Between the cross-links identified by pLink from the light BS3 experiment and those from the [d0]/[d4]-BS3 experiments (Supplementary Table 7), two are identical (colored red in Supplementary Table 20) and another pair is very similar (K63-K1, colored orange in Supplementary Table 20, vs K63-M0, i.e. the protein N-terminus, in Supplementary Table 7).

Supplementary Discussion

The earliest software for CXMS can be traced back to the year 2000⁷. Since then, much effort has been invested in this technology. Divide-and-conquer is one strategy, whose critical component is a cross-linker that breaks in an MS2 scan, thereby allowing sequencing of the two released peptides in subsequent MS3 scans. Among these are disuccinimidyl sulfoxide (DSSO), Protein Interaction Reporter (PIR) cross-linkers, Isotopically Coded Cleavable (ICC) cross-linkers, and cyanurbiotindipropionylsuccinimide (CBDPS)⁸⁻¹¹. Another approach focuses mainly on developing software tools that identify two inter-linked peptides without having to separate them, including SearchXLinks¹², X!Links¹³, xComb¹⁴, Popitam¹⁵, Xi¹⁶, PepLynx¹⁷, Xlink-Identifier¹⁸, and a SEQUEST-like search engine for crosslink analysis¹⁹. Here we took the latter approach and developed a software tool compatible with a variety of common cross-linkers. Our goal is to make CXMS easy, effective, and readily available.

In our BS3 cross-linking analysis of GST and the CNGP complex, most of the observed cross-links are consistent with the crystal structure data, but a few are not. In repeated experiments, the problematic cross-links are mostly observed only 1–3 times; in contrast, the structurally supported cross-links are observed an average of 14.7 times for GST (three repeats) and 13.4 times for CNGP (two repeats) (Supplementary Tables 7-8). The structurally incompatible cross-links may be non-specific ones coming from multiple sources. They could arise by chance when two non-interacting proteins in Brownian motion happen to be momentarily close enough to be captured by a cross-linker. We show experimentally that this does occur, but the frequency is very low, at least 100-fold lower than that of cross-links within a protein or protein complex (Supplementary Fig. 5). Another possibility is that they result from cross-linking of protein aggregates, i.e. denatured or partially denatured proteins. We are careful to avoid protein aggregates and over-cross-linking. For example, we abandon cross-linking conditions that cause any visible precipitation. However, microscopic aggregation remains a possibility. A third possibility is that, in some regions, a protein adopts alternative structures in solution that differ from its structure in the crystal lattice.

Notably, there are cross-linking “hot” sites; these are lysine residues that form cross-links with two or more sites, e.g. GST_K26 (Supplementary Table 7) or Cbf5_K180 (Supplementary Table 8). Conversely, there are “cold” sites that are not observed at all even though they could theoretically form cross-links with other lysine residues. Out of >300 K-K combinations in GST, 54 K-K cross-links are theoretically possible using BS3 after applying surface accessibility and distance constraints (Supplementary Fig. 17)¹⁸. Yet only eight of them were identified and, of the eight, three involve K26 (Supplementary Table 7). In GST and CNGP, cross-links that are incompatible with the structure often originate from hot sites. For the UTP-B complex, out of 345 amine groups (339 lysines plus 6 N-termini), only 94 (27%) were detected as cross-linking sites. Yet 39 (41%) of these sites cross-link with two or more sites, accounting for 88.5% of the total cross-link spectra.

The uneven distribution of observed cross-links is likely a combined result of solution phase chemistry (accessibility and reactivity of two amine groups within 11.4 Å distance) and gas phase chemistry (ionization, m/z, and fragmentation of cross-linked peptides). The former determines how many cross-links are formed and the amount of each; the latter governs the visibility of cross-linked peptides in mass spectrometry. Solvent accessible surface distance can be calculated using Xwalk²⁰. Here we focus on the reactivity of lysine residues. The pKa value of the ε-amino group of lysine is usually around 10.5²¹, but it changes with local environment and can be as low as 5.3²². Those with higher pKa values are less nucleophilic, i.e. less reactive to BS3. A positively charged environment favors the deprotonated form of the lysine ε-amine, lowering its pKa and increasing its reactivity. In contrast, a negatively charged surrounding stabilizes the protonated form, increasing its pKa and lowering its reactivity. Moreover, positively charged regions would attract negatively charged BS3 better. So, everything else being the same, a lysine adjacent to an arginine would be more reactive than one next to serine or glutamic acid. If a lysine side chain forms a salt bridge with an acidic residue, then it is unlikely to react with BS3. Cross-linking hot sites possibly occupy regions that are positively charged and highly accessible to BS3.
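The effect of pKa on the reactive (deprotonated) fraction of a lysine ε-amine can be made concrete with the Henderson-Hasselbalch relation. The following Python sketch assumes ideal single-site acid-base behavior; the function name and the choice of pH 7.5 in the usage note are illustrative assumptions.

```python
def deprotonated_fraction(pka, ph):
    """Fraction of an amine in the deprotonated form at a given pH.

    Henderson-Hasselbalch for a single ionizable group:
    [base] / ([base] + [acid]) = 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

At pH 7.5, a lysine with a pKa of 10.5 is only about 0.1% deprotonated, whereas one with a pKa of 5.3 is more than 99% deprotonated. Under this simplified picture, a shift in pKa caused by the local environment can change the reactive fraction by orders of magnitude.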

So far, pLink is the only algorithm that has been optimized with a large standard dataset in which the absolute identity of each cross-linked peptide pair is known. However, it is optimized only for BS3. For comprehensive structural analysis by CXMS, other homo- or hetero-bifunctional cross-linkers would be helpful. pLink is compatible with other cross-linkers, whether they are specific to lysine or not, but for best performance some of the ion types may need adjustment if the fragmentation behavior of a cross-linker differs from that of BS3. As expected, we find that pLink is equally effective for DSS, a functional homolog of BS3 (Supplementary Table 13). Similar NHS-type cross-linkers like BS2G and DSG should work just as well.

pLink also works for three hetero-bifunctional cross-linkers we have tested: EDC (inducing K-D or K-E cross-links, zero-length), AMAS (K-C cross-link, 4.4-Å spacer arm) and sulfo-GMBS (K-C cross-link, 7.3-Å spacer arm) (Supplementary Tables 14-16). The percentage of the cross-links that are structurally sound is lower than that obtained with the amine-specific cross-linkers BS3 and DSS (compare Supplementary Tables 14-16 to Supplementary Tables 8 and 13). For EDC in particular, none of the cross-links identified fits the distance constraint. This may be due to imperfections of the cross-linking conditions. For hetero-bifunctional cross-linkers, different functional groups call for different reaction conditions, and the best balance between preserving the native conformation of proteins and maximizing cross-linking efficiency has yet to be determined. The EDC-carboxyl reaction is most efficient at pH 4.5, the maleimide-sulfhydryl reaction is best at pH 6.5-7.5, whereas the amine-targeting NHS-ester reaction is performed at pH 7-9. For EDC, the reaction has to be carried out in two steps, first at slightly acidic pH, then at neutral or slightly alkaline pH. The pH change may affect protein conformation somewhat. In spite of this, the general trend remains true that the longer the spacer arm of a cross-linker, the more cross-links are identified. For example, the sulfo-GMBS cross-links of the CNGP complex included all five AMAS cross-links and 15 additional ones (Supplementary Tables 15-16).

Besides chemical cross-linking, pLink is applicable to natural cross-links such as disulfide bonds or sumoylation. Again, for optimal results, a large enough standard dataset of disulfide-linked peptides or sumoylated peptides will be helpful for fine-tuning of the pLink parameters.

The current version of CXMS still has limited sensitivity to detect protein-protein interactions in endogenous protein complexes. With further improvement, such as the development of cross-linkers with increased efficiency and affinity tags for specific enrichment of cross-linked peptides, the technique will become more powerful for exploring the interactome of complex samples.

Overall, CXMS provides reliable structural information, trading resolution for ease and speed. It complements high-resolution approaches, particularly in cases where a protein of interest is difficult to crystallize. CXMS may also be utilized in kinetic studies of protein folding and unfolding, or protein complex assembly and disassembly.

References

1. Fu, Y. et al. Bioinformatics 20, 1948-1954 (2004).

2. Nesvizhskii, A. I. et al. Mol Cell Proteomics 5, 652-670 (2006).

3. Schilling, B., Row, R. H., Gibson, B. W., Guo, X. & Young, M. M. J Am Soc Mass Spectrom 14, 834-850 (2003).

4. Gaucher, S. P., Hadi, M. Z. & Young, M. M. J Am Soc Mass Spectrom 17, 395-405 (2006).

5. Elias, J. E. & Gygi, S. P. Nature Methods 4, 207-14 (2007).

6. Rinner, O. et al. Nature Methods 5, 4 (2008).

7. Young, M. M. et al. Proc Natl Acad Sci U S A 97, 5802-5806 (2000).

8. Kao, A. et al. Mol Cell Proteomics 10, M110 002212 (2011).

9. Anderson, G. A., Tolic, N., Tang, X., Zheng, C. & Bruce, J. E. J Proteome Res 6, 3412-3421 (2007).

10. Petrotchenko, E. V. & Borchers, C. H. BMC Bioinformatics 11, 64 (2010).

11. Petrotchenko, E. V., Serpa, J. J. & Borchers, C. H. Mol Cell Proteomics 10, M110 001420 (2011).

12. Wefing, S., Schnaible, V. & Hoffmann, D. Anal Chem 78, 1235-1241 (2006).

13. Lee, Y. J., Lackner, L. L., Nunnari, J. M. & Phinney, B. S. J Proteome Res 6, 3908-3917 (2007).

14. Panchaud, A., Singh, P., Shaffer, S. A. & Goodlett, D. R. J Proteome Res 9, 2508-2515 (2010).

15. Singh, P. et al. Anal Chem 80, 8799-806 (2008).

16. Chen, Z. A. et al. EMBO J 29, 717-726 (2010).

17. Zelter, A. et al. J Proteome Res 9, 3583-3589 (2010).

18. Du, X. et al. J Proteome Res 10, 923-931 (2011).

19. McIlwain, S., Draghicescu, P., Singh, P., Goodlett, D. R. & Noble, W. S. J Proteome Res 9, 2488-2495 (2010).

20. Kahraman, A., Malmstrom, L. & Aebersold, R. Bioinformatics 27, 2163-2164 (2011).

21. Grimsley, G. R., Scholtz, J. M. & Pace, C. N. Protein Sci 18, 247-251 (2009).

22. Isom, D. G., Castaneda, C. A., Cannon, B. R. & Garcia-Moreno, B. Proc Natl Acad Sci U S A 108, 5260-5265 (2011).

23. Su, C. et al. Nucleic Acids Res 36, D632–D636 (2008).

