a proteomic atlas of the legume medicago truncatula and ... · pdf fileof symbiosis between...

11
1198 VOLUME 34 NUMBER 11 NOVEMBER 2016 NATURE BIOTECHNOLOGY RESOURCE To meet the food demand of a growing world population, current plant research aims to improve yield, nutrient content, and resist- ance to pathogens while enhancing agricultural sustainability and lowering environmental impacts. Leguminous plants are of special interest as they acquire nitrogen from the air through association with rhizobia, nitrogen-fixing soil bacteria, reducing the need for fertilizers. Rhizobia fix nitrogen within root nodules 1,2 , a special- ized organ where they encounter plant-derived factors that induce the differentiation of free-living rhizobia into bacteroids that express the nitrogenase and fix nitrogen in exchange for photosynthetically derived dicarboxylic acids 3–8 . The molecular mechanisms that enable legumes to host nitrogen-fixing symbionts are not well understood. A comprehensive understanding of this process may allow nitrogen fixation to be engineered into non-leguminous crops, including cereals such as maize, rice, and wheat. Several large-scale studies have probed the molecular mechanisms of symbiosis between the model legume M. truncatula and one of the best-characterized rhizobial species, S. meliloti 9–18 . Recently, a large-scale genome-wide transcriptome analyses was conducted to evaluate organ-specific gene expression, especially during seed and nodule development in M. truncatula 19 . Further, the recently pub- lished genome sequence of M. truncatula considerably improved the gene inventory 20 . We leveraged these resources, together with high-resolution mass spectrometry 21,22 , and we generated a compre- hensive, quantitative atlas of protein expression of M. truncatula in association with S. meliloti. This proteomic atlas provides evidence for 23,013 protein groups (19,679 from the eukaryotic host plant, M. truncatula; 3,334 from S. meliloti) along with 20,120 phospho- rylation sites and 734 lysine acetylation sites. Using this resource, we discovered a core M. truncatula proteome and defined a subset of proteins that display organ-specific regulation. We identified organ-specific processes and assigned putative functions to several uncharacterized proteins. Finally, we generated a symbiosis-specific regulatory network and identified putative stages of host-factor expression and their endosymbiont targets. RESULTS A protein and post-translational modification atlas of M. truncatula and its endosymbiont S. meliloti To compile a comprehensive proteome profile of M. truncatula, we harvested plants and dissected them into seven major organs: flower, apical meristem (bud), leaf, stem, root, seed, and nodule (Supplementary Fig. 1a). To analyze stages of nodule development, we collected nodules at 10 (young), 14 (mature), and 28 (old) d A proteomic atlas of the legume Medicago truncatula and its nitrogen-fixing endosymbiont Sinorhizobium meliloti Harald Marx 1,9 , Catherine E Minogue 1,9 , Dhileepkumar Jayaraman 2 , Alicia L Richards 1 , Nicholas W Kwiecien 1 , Alireza F Siahpirani 3 , Shanmugam Rajasekar 2 , Junko Maeda 4 , Kevin Garcia 4 , Angel R Del Valle-Echevarria 4 , Jeremy D Volkening 5 , Michael S Westphall 1 , Sushmita Roy 6,7 , Michael R Sussman 5 , Jean-Michel Ané 2,4 & Joshua J Coon 1,8 Legumes are essential components of agricultural systems because they enrich the soil in nitrogen and require little environmentally deleterious fertilizers. A complex symbiotic association between legumes and nitrogen-fixing soil bacteria called rhizobia culminates in the development of root nodules, where rhizobia fix atmospheric nitrogen and transfer it to their plant host. Here we describe a quantitative proteomic atlas of the model legume Medicago truncatula and its rhizobial symbiont Sinorhizobium meliloti, which includes more than 23,000 proteins, 20,000 phosphorylation sites, and 700 lysine acetylation sites. Our analysis provides insight into mechanisms regulating symbiosis. We identify a calmodulin-binding protein as a key regulator in the host and assign putative roles and targets to host factors (bioactive peptides) that control gene expression in the symbiont. Further mining of this proteomic resource may enable engineering of crops and their microbial partners to increase agricultural productivity and sustainability. 1 Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA. 2 Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, USA. 3 Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA. 4 Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA. 5 Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA. 6 Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA. 7 Wisconsin Institute for Discovery, University of Wisconsin- Madison, Madison, Wisconsin, USA. 8 Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA. 9 These authors contributed equally to this work.Correspondence should be addressed to J.-M.A. ([email protected]) or J.J.C. ([email protected]). Received 3 June; accepted 23 August; published online 17 October 2016; corrected online 20 October 2016; doi:10.1038/nbt.3681 © 2016 Nature America, Inc., part of Springer Nature. All rights reserved.

Upload: buingoc

Post on 28-Mar-2018

219 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

1198 VOLUME 34 NUMBER 11 NOVEMBER 2016 nature biotechnology

r e s o u r c e

To meet the food demand of a growing world population, current plant research aims to improve yield, nutrient content, and resist-ance to pathogens while enhancing agricultural sustainability and lowering environmental impacts. Leguminous plants are of special interest as they acquire nitrogen from the air through association with rhizobia, nitrogen-fixing soil bacteria, reducing the need for fertilizers. Rhizobia fix nitrogen within root nodules1,2, a special-ized organ where they encounter plant-derived factors that induce the differentiation of free-living rhizobia into bacteroids that express the nitrogenase and fix nitrogen in exchange for photosynthetically derived dicarboxylic acids3–8. The molecular mechanisms that enable legumes to host nitrogen-fixing symbionts are not well understood. A comprehensive understanding of this process may allow nitrogen fixation to be engineered into non-leguminous crops, including cereals such as maize, rice, and wheat.

Several large-scale studies have probed the molecular mechanisms of symbiosis between the model legume M. truncatula and one of the best-characterized rhizobial species, S. meliloti9–18. Recently, a large-scale genome-wide transcriptome analyses was conducted to evaluate organ-specific gene expression, especially during seed and nodule development in M. truncatula19. Further, the recently pub-lished genome sequence of M. truncatula considerably improved

the gene inventory20. We leveraged these resources, together with high-resolution mass spectrometry21,22 , and we generated a compre-hensive, quantitative atlas of protein expression of M. truncatula in association with S. meliloti. This proteomic atlas provides evidence for 23,013 protein groups (19,679 from the eukaryotic host plant, M. truncatula; 3,334 from S. meliloti) along with 20,120 phospho-rylation sites and 734 lysine acetylation sites. Using this resource, we discovered a core M. truncatula proteome and defined a subset of proteins that display organ-specific regulation. We identified organ-specific processes and assigned putative functions to several uncharacterized proteins. Finally, we generated a symbiosis-specific regulatory network and identified putative stages of host-factor expression and their endosymbiont targets.

RESULTSA protein and post-translational modification atlas of M. truncatula and its endosymbiont S. melilotiTo compile a comprehensive proteome profile of M. truncatula, we harvested plants and dissected them into seven major organs: flower, apical meristem (bud), leaf, stem, root, seed, and nodule (Supplementary Fig. 1a). To analyze stages of nodule development, we collected nodules at 10 (young), 14 (mature), and 28 (old) d

A proteomic atlas of the legume Medicago truncatula and its nitrogen-fixing endosymbiont Sinorhizobium melilotiHarald Marx1,9, Catherine E Minogue1,9, Dhileepkumar Jayaraman2, Alicia L Richards1, Nicholas W Kwiecien1, Alireza F Siahpirani3, Shanmugam Rajasekar2, Junko Maeda4, Kevin Garcia4, Angel R Del Valle-Echevarria4, Jeremy D Volkening5, Michael S Westphall1, Sushmita Roy6,7, Michael R Sussman5, Jean-Michel Ané2,4 & Joshua J Coon1,8

Legumes are essential components of agricultural systems because they enrich the soil in nitrogen and require little environmentally deleterious fertilizers. A complex symbiotic association between legumes and nitrogen-fixing soil bacteria called rhizobia culminates in the development of root nodules, where rhizobia fix atmospheric nitrogen and transfer it to their plant host. Here we describe a quantitative proteomic atlas of the model legume Medicago truncatula and its rhizobial symbiont Sinorhizobium meliloti, which includes more than 23,000 proteins, 20,000 phosphorylation sites, and 700 lysine acetylation sites. Our analysis provides insight into mechanisms regulating symbiosis. We identify a calmodulin-binding protein as a key regulator in the host and assign putative roles and targets to host factors (bioactive peptides) that control gene expression in the symbiont. Further mining of this proteomic resource may enable engineering of crops and their microbial partners to increase agricultural productivity and sustainability.

1Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA. 2Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, USA. 3Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA. 4Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA. 5Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA. 6Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA. 7Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, USA. 8Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA. 9These authors contributed equally to this work.Correspondence should be addressed to J.-M.A. ([email protected]) or J.J.C. ([email protected]).

Received 3 June; accepted 23 August; published online 17 October 2016; corrected online 20 October 2016; doi:10.1038/nbt.3681

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 2: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

nature biotechnology VOLUME 34 NUMBER 11 NOVEMBER 2016 1199

r e s o u r c e

post-rhizobial inoculation, using previously established experi-mental conditions19. Peptides resulting from proteolytic digestion of the proteome were fractionated and analyzed by capillary liquid chromatography coupled with high-resolution tandem mass spec-trometry (nHPLC-MS/MS) (Supplementary Fig. 1b). In total, we performed 216 nHPLC-MS/MS experiments from which 10,322,254 MS/MS spectra were generated. Next, we constructed a non-redun-dant consensus protein sequence database that comprised both known (UniProt, JCVI, Ensembl Plants, and RefSeq) and novel gene models for M. truncatula (Wisconsin Medicago Group (WMG) data-base, 105,280 entries). To permit the detection of proteins originating from either rhizobial bacteria or M. truncatula, we concatenated the S. meliloti UniProt and WMG database to form a single, all-inclusive database. We correlated 2,565,645 MS/MS spectra to peptide sequence (1% false-discovery rate, FDR), providing evidence for 23,013 protein groups (1% protein FDR)—19,679 from M. truncatula and 3,334 from S. meliloti (Supplementary Table 1). Of the 19,679 M. truncatula protein groups, 17,213 could unambiguously be mapped to 15,813 known genes and 324 genes that had previously not been described. There were instances of peptide evidence for either intra- or intergenic regions that refined existing gene models (Supplementary Table 2)16. The WMG database enabled identification of 3.9–33.7% (JCVI and RefSeq) more proteins than any individual database.

We used affinity enrichment technologies to profile protein phosphorylation and lysine acetylation, which are important post- translational modifications (PTMs; Supplementary Fig. 1b). Altogether we detected 20,120 sites of phosphorylation and 734 sites of lysine acetylation, which were localized with single amino acid res-olution (localization probability > 0.75). The 324 previously unknown M. truncatula proteins contained 175 phosphorylation and five lysine acetylation sites. The distribution of identified phosphoryla-tion sites—86.2% pSer, 13.2% pThr, and 0.6% pTyr—is in the range previously observed in Mus musculus, Homo sapiens, Zea mays, Oryza sativa, Arabidopsis thaliana, and M. truncatula15,23–28. To measure protein expression profiles across organs and over the rhizobial infection time course, we employed both label-free (LFQ) and iso-baric label-based (TMT) quantification. LFQ in comparison to TMT afforded a greater proteomic depth (20,114 vs. 12,521 protein groups) and a higher dynamic range (Supplementary Fig. 2), whereas TMT mitigated low run-to-run overlap in PTM quantification by allowing comparison across six organs in a single MS analysis. The respective acquisition times were about 100 h (LFQ) and 150 h (TMT). The atlas is accessible at http://compendium.medicago.wisc.edu/.

Global characterization of protein expressionTo provide a bird’s-eye view of global protein expression, we constructed circular proteome maps (CPM, Fig. 1). While each M. truncatula organ expresses over 9,000 proteins (Fig. 1a, heat maps), a subset of proteins (5,701) was detected in all organs (Fig. 1a and Supplementary Fig. 3a). This “core” proteome is significantly enriched for proteins (FDR q < 0.01, Fisher’s exact test) involved in protein folding, pro-tein transport, glycolytic process, gluconeogenesis, and translation, pervasive processes used throughout the plant (Supplementary Fig. 3b). Further, members of the core proteome are highly expressed and share expression similarity from one organ to the next (mean r = 0.76, Fig. 1a, black histogram and Supplementary Fig. 3c). In fact, while the core proteome represents roughly half of all detected proteins in a given organ, it accounts for 80.3% of the total protein abundance (P = 3.076−08, Student’s t-test). Setting the core proteome aside, by correlating all protein expression across organs we revealed three groups: aerial (flower, meristem, leaf, and stem), subterraneous

(roots and nodules), and seeds (Supplementary Fig. 3c). In the nod-ule rhizobia, we identified about half (3,334) of the 6,308 unreviewed UniProt database entries; 2,831 (about 86%) of the detected proteins were expressed at all time points (Fig. 1b, heat maps).

To investigate the spatial and temporal specificity of protein expression and PTM regulation, we used Shannon entropy (Fig. 1 and Supplementary Fig. 4a); further, our multi-omics data sets enabled us to capture organ-specific and interplay of protein expres-sion (LFQ) and PTM regulation (TMT) (Supplementary Fig. 4b). Flowers, roots, and seeds have each the most specialized proteomes and are significantly enriched (FDR q < 0.01, Fisher’s exact test) for associated biological processes (Fig. 1a, red histogram). Proteins uniquely expressed in flower and root comprised invertase/pectin methylesterase inhibitors and pectinesterases (inhibitor), among others. These proteins are important for cell wall metabolism, plant development, and defense against pathogens. The flower pectineste-rases act in concert with polygalacturonase, UDP-d-glucuronate 4-epimerase and cellulose synthase in the starch and sucrose meta-bolism pathway. By comparing the phosphorylation sites to those of other protein family members (paralogs), we hypothesize that phosphorylation may deactivate those pathway members.

The specialized seed proteome encompasses well-known proteins with major roles of nutrient and lipid storage, for example, cupin, legu-min, glycinin, vicilin, allergen, and oleosins. Notably, cupin, legumin, vicilin, and oleosins displayed phosphorylation and/or lysine acetyla-tion that, to our knowledge, have not been described previously. We observed that most cupins (RmlC type) were phosphorylated in seeds and not in other organs, suggesting phosphorylation may play a role in the storage mechanism, possibly similar to oleosins during ger-mination29. The nodule proteomes contained specific proteins that could be linked to a wide range of functions (Fig. 1b, red histogram). Among the most commonly observed nodule proteins were transcrip-tional regulator families (e.g., PaaX, TetR, LysR) that probably act as transcription switches during nodule development30,31.

Post-translational modification regulation of the core proteomeSince protein PTMs provide a regulatory mechanism to modulate protein function, we wondered to what extent do proteins observed in all organs, i.e., the core proteome, have organ-specific and ubiquitous PTMs. We characterized organ-specific phosphorylation (Fig. 1a, blue histogram) and lysine acetylation (Fig. 1a, purple histogram) in the core proteome to gain insight into processes (FDR q < 0.01, Fisher’s exact test). These PTMs may facilitate organ-specific requirements using the same set of core proteins. For instance, in the root nodules, bacteroids rely on the host plant to provide organic acids, mainly malate, to produce the ATP necessary to fix atmospheric nitrogen32. In the core proteome, we observed that the malate dehydrogenase phosphorylation occupancy was increasing, suggesting a regulatory role during nodule maturation. Similarly, rhizobial proteins associ-ated with nitrogen-fixation were present across the rhizobial time course in the nitrogen fixation zone (Fig. 1b). PTM regulation in prokaryotes is not as common as in eukaryotes and is not well-studied (Supplementary Fig. 4b). We observed phosphorylation of crucial bacterial proteins in the nitrogen fixation process, including the nitrogenase alpha chain (F6BZI0, UniProt), nifH (F6BZA5, UniProt), nifX (F6BZI3, UniProt), and Ferredoxin III, nif-specific (F6BZI5, UniProt). Lysine acetylation supplements the regulatory repertoire on the nitrogenase alpha chain and nifH, as well as on nitrogenase cofactor biosynthesis protein nifB (F6BZ98, UniProt), nitrogenase molybdenum-iron beta chain (F6BZI1, UniProt), fixT (F6BZ96, UniProt), and fixX (F6BZA0, UniProt). Further experiments will be

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 3: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

1200 VOLUME 34 NUMBER 11 NOVEMBER 2016 nature biotechnology

r e s o u r c e

required to elucidate when these modifications occur, the protein kinases and acetylases responsible for them, and their function in the regulation of biological nitrogen fixation.

Functional characterization and annotation of the M. truncatula proteomeTo provide some basic functional characterization of proteins and PTMs (TMT; median CV = 15.6%, protein, 14.7%, phospho, 24.6%, acetyl), we applied variance analysis (FDR q < 0.01, ANOVA), hierar-chical clustering, and gene ontology enrichment (FDR q < 0.01, Fisher’s exact test). We leveraged this analysis to perform protein co-expression and PTM co-regulation analysis, concomitant with domain (PFAM) and sequence (BLAST) analysis, and achieved putative functional annotation of several uncharacterized proteins and predicted gene products (Fig. 2, Supplementary Fig. 5 and Supplementary Table 3). This approach linked known proteins to novel PTMs. In several cases, we discovered proteins and PTMs acting in concert. For example, to support the bacteroid nitrogenase enzyme, nodules regulate oxygen levels by expressing leghemoglobins33,34. Leghemoglobin Lb120-1 family members display similar protein expression profiles (Fig. 2, protein cluster 2) as well as multiple phosphorylation sites that had not been previously described (S44, S50, S55, S60, and S75, Fig. 2, phospho cluster 12). We identified 13 sites of phosphorylation

on these proteins, of which six are conserved in the legume Lotus japonicus. One of these sites, S44 in Lb120-1, corresponds to S45 in L. japonicus (Q3C1F7, UniProt), a site that has been associated with the regulation of oxygen binding in L. japonicus35.

We extended our analysis with a guilt-by-association approach36. This expression clustering revealed distinct groups of proteins of the ribosomal machinery as a highly conserved functional module (Fig. 2, protein cluster 7). One of the groups comprised the unchar-acterized proteins (00024160|MEDTR, 00013439|MEDTR, WMG), small (40S) and large (60S) ribosomal subunits, and other ribosomal proteins (Fig. 2, left). Beyond the correlated expression profiles, these uncharacterized genes had domains consistent with ribosomal function (MRP-S34 and Coilin, N-terminal, respectively), indicating putative roles in the mitochondrial ribosome and post-transcriptional modification of small nuclear and nucleolar RNAs. Another predicted gene product (00061383|MEDTR, WMG) clustered with other ribo-somal proteins (Fig. 2, left). Sequence and domain analysis revealed a PsbP domain (G7JBQ7, UniProt), but the molecule, which is a part of the photosystem II oxygen evolving complex, also contains a ribosomal S11 family domain (E-value = 5.5 × 10−31, PFAM).

Focusing solely on PTM, we identified another predicted gene product (00048984|MEDTR, WMG) with an RNA recognition motif (RRM 1; E-value = 2.5 × 10−14, PFAM) and two zinc knuckle

# organs protein identified in

19 8 7 3 26 5 4

Organ-specific protein

Organ-specific acetyl-site

Organ-specific phospho-site

Relative protein abundance

b

0

1,00

0

2,000

3,0000

1,000

2,00

0

3,0000

1,000

2,000

3,000

a

0 1,00

02,

000

3,00

04,

000

5,00

06,

000

7,00

08,

000

9,00

010

,000

11,0

000

1,00

0

2,000

3,000

4,000

5,000

6,000

7,000

8,000

9,000

10,000

11,000

12,000

13,000

01,0002,0003,000

4,0005,000

6,0007,0008,0009,00010,00011,0000

1,0002,000

3,000

4,000

5,000

6,000

7,000

8,000

9,000

10,000

11,00001,0002,

000

3,00

0

4,00

0

5,00

0

6,00

0

7,00

0

8,00

0

9,00

0 0

1,00

0

2,00

0

3,00

04,00

05,0006,0007,0008,0009,000

01,000

2,0003,000

4,0005,0006,0007,000

8,000

9,000

10,000

11,000 0

1,000

2,000

3,000

4,000

5,000

6,000

7,000

8,000

9,00010,000

11,000 0

1,0002,000

3,0004,000

5,0006,000

7,0008,0009,000

Flower

Leaf

Root

Seed

Stem

Nod

ule

(day

14)

Nod

ule

(day

10)

Nodule (day 28)

Apical meristem

(day 10)

(day

28)

(day 14)Rhizobia

Rhizobia

Rhi

zobi

aFigure 1 In-depth proteome sequencing reveals organ-specific proteins and post-translational modifications. (a,b) Circular proteome maps depict the similarities and differences in the organ-specific proteomes acquired following proteome analysis of seven M. truncatula organs (apical meristem, flower, leaf, root, seed, stem, and nodules 10, 14, and 28 d past infection) (a) and nodule rhizobia (10, 14, and 28 d past infection) (b). The number of protein identifications associated with each organ is displayed by heat maps. The color gradient within these heat maps and respective links reflect the number of organs each protein identification is associated with, where the lightest green region represents the core proteome. The relative abundance of each protein within a given organ is represented by the black histograms. Organ-specific proteins, phosphorylation sites, and lysine acetylation sites are indicated by the red, blue, and purple histograms, respectively. Note that the length of each bar reflects the organ specificity of each protein or PTM site, i.e., longer bars represent a greater degree of specificity.

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 4: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

nature biotechnology VOLUME 34 NUMBER 11 NOVEMBER 2016 1201

r e s o u r c e

(zf-CCHC; E-value < 9.9 × 10−5, PFAM) domains and co-regulation through phosphorylation (S290, S189, and S227;S231) that is specific to seeds (P < 0.01, Tukey’s test; Fig. 2, left). All these co-regulated

phosphorylation sites are conserved in a close soybean homolog (Glycine soja, A0A0B2PP71, UniProt). Altogether these data suggest this protein plays a role in the spliceosome37.

Proteome dynamics and transcriptome co-expression analyses reveal a nodule-specific networkFrom the nodule development time series, we identified numerous significantly (FDR q < 0.01, Student’s t-test) regulated proteins and phosphorylation events in nodules in comparison to roots. These proteins encompassed well-known processes associated with nodule function such as nodulins, nodule senescence, oxygen transport, and immune response (Fig. 3a). Noteworthy upregulated proteins include Annexin D8 (ref. 38), which likely plays a similar role as MtAnn1 in the symbiotic association with rhizobia, a pentatricopeptide repeat (PPR) protein, and a serine/threonine kinase haspin39,40. Our results suggest these proteins are likely important for nodule growth and development (Fig. 3a).

To provide a backbone from which to leverage our proteomic atlas, we constructed an expression network using M. truncatula transcrip-tomic data. With this approach, we aimed to predict regulator–target relationships and to annotate proteins with an interaction and/or reg-ulatory context. The resulting inferred network connected 5,464 genes to 5,469 putative interactions (WMG global network, edge confidence > 0.75). With an emphasis on nodule biology, we overlaid nodule- specific proteins, their modifications, and a motif filter (Rxxs, sP, sPK, and sF) onto the global M. truncatula network (Supplementary Fig. 6). Many of these co-localized to one sub-network, which we extracted using a ‘greedy’ approach (Fig. 3b). The WMG Protein Atlas provides evidence for 200 of the 380 nodes of the predicted sub- network; further, 50 network members are modulated by PTM.

Several serine/threonine kinases (STK) and cell-wall-associated kinases occupy six out of ten hubs within our symbiosis sub-network (Fig. 3a). Perhaps the most striking lead stemming from this analysis is the potential regulatory role of a predicted calmodulin-binding protein (00034805|MEDTR, WMG). This calmodulin-binding protein has the highest degree of connectivity. Its transcript abun-dance is co-regulated with genes that are known to play critical roles in symbiosis, i.e., leghemoglobins, nodule-specific cysteine-rich peptides (NCR), and calcium-binding proteins. Our protein atlas revealed this calmodulin-binding protein is heavily phosphorylated (S42, S74, S416, S436, and S432) and either has a previously unrec-ognized translation start (with N-terminal acetylation) or is post- translationally processed (Supplementary Fig. 7). Calmodulin-binding proteins and kinases are classically involved in signaling pathways. For instance, periodic oscillations of the nuclear and perinuclear calcium concentration (calcium spiking) occur in the signaling process leading to nodule formation and are decoded by a calcium/calmodulin-dependent kinase (CCaMK). We conclude these candidate genes likely play a central role in regulating and maintaining symbiosis and will be prime targets for future functional analyses of nodule development.

Putative targets of host factors in the endosymbiont S. melilotiBeyond cataloging proteins and their PTMs, our Protein Atlas con-tains entries for 252 detected and quantified nodule-specific cysteine-rich (NCR) peptides. NCR peptides are similar to anti-microbial peptides and, when released by M. truncatula, act by permeabilizing bacterial membranes, disrupting cell division, and inhibiting protein synthesis—all for the purpose of inducing bacteroid development. Figure 4a presents the expression patterns and correlations of the NCR peptides at the different nodule developmental stages.

Nodule morphogenesis, metal ion binding

Oxygen binding and transport

Nutrient reservoir activity

NADH dehydrogenase (plastoquinone) activity,photosynthesis (PSN), light harvesting

Isopentenyl diphosphate, chlorophyll, andcarotenoid biosynthetic processes

Translation, intracellular protein transport

Peroxidase activity, response to oxidativestress and biotic stimulus

Threonine-type endopeptidase activity

Response to biotic stimulus, defense response

PhotosynthesisDNA binding

Chloroplast relocation, pentose phosphate shunt

Sequence-specific DNA binding transcriptionfactor activity

Phosphate-containing compound and glutamatemetabolic processes, ATPase activity

Monolayer-surrounded lipid storage body

ATP binding, protein Ser/Thr kinase activity

Lipid binding, regulation of transcription DNA-templated

Oxygen transport and binding, heme binding

Nutrient reservoir activity

Protein import into nucleus, protein transporter activity

Translation, Ribosome

Protein polymerization, microtubule-based process

Nodule morphogenesis, calcium-dependent phospholipid binding, TCA cycle

Photosynthesis

Enzyme inhibitor activity, pectinesterase activity,cell wall modification, pectin catabolic process

Inorganic diphophatase activity, phosphate-containing compound metabolic process

Phospho-isoforms (n = 11,101)

Acetyl-isoforms (n = 234)

log 2

(TM

T e

xpre

ssio

n)

Protein groups (n = 4,765)

Seed

Leaf

Stem

Root

Nodule

s

Flower

[1]

[2]

[4]

[5]

[6]

[7]

[9]

[10]

[11]

[1]

[2]

[3]

[5]

[6]

[7]

[8]

[10]

[12]

[1]

[4]

[5]

[7]

[8]

[10]

[11]

[12]

Cluster Id

Seed

Seed

−2

−1

0

1

2

Nodule

sRoo

t

Stem

Leaf

Flower

00024160|MEDTRMRP-S34 family

Nodule

sRoo

t

Stem

Leaf

Flower

−2

−1

0

1

2

3

00061383|MEDTRRibosomal S11 family

−2

0

2

4

6

8

Nodule

sRoo

t

Stem

Leaf

Flower

Seed

00048984|MEDTR - S18900048984|MEDTR - S227;S231

−2

0

2

4

6

8

Nodule

sRoo

t

Stem

Leaf

Flower

Seed

00048984|MEDTR - S290RRM 1 domainzf-CCHC domains

−4

−2

0

2

4

6

−6

8

00013439|MEDTRCoilin, N-terminal family

Figure 2 Functional characterization of proteins and post-translational modifications in M. truncatula. Heat maps are composed of proteins, phospho-isoforms and lysine acetyl-isoforms significantly changing (FDR q < 0.01, ANOVA) between six M. truncatula organs (n = 4,765, n = 11,101 and n = 234, respectively). Expression data were grouped by hierarchical clustering, both on the organ level and on the protein/PTM-isoform level. For each data set, proteins/PTM-isoforms were grouped into 12 clusters, each of which was subjected to gene ontology (GO) enrichment (FDR q < 0.01, Fisher’s exact test). Each heat map is associated with the top GO terms (right) significantly enriched within select clusters. Expression profiles (left) depict co-expressed and co-regulated (uncharacterized) proteins. Uncharacterized proteins and assigned PFAM domains are highlighted by WMG accession.

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 5: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

1202 VOLUME 34 NUMBER 11 NOVEMBER 2016 nature biotechnology

r e s o u r c e

The analysis revealed three distinct stages, indicative of temporal regulation to facilitate the symbiosis. Early-stage NCR peptides (clus-ters 1–3, Fig. 4a) may enable other NCRs to access the cytoplasm of S. meliloti. Only a few candidate molecules have so far been suggested to have this function10,41. Here we extend this concept with temporal expression evidence that implicates about 70 NCR peptides as having the putative function of increasing permeability42. The intermediate (cluster 4, Fig. 4a) and late (clusters 5–8, Fig. 4a) stage NCR peptides most likely target S. meliloti proteins. We hypothesize that these clus-ters reflect the role that the NCR peptides play in legume symbiosis— with those in the early stage driving nodule morphogenesis and those in the late stage driving symbiosis.

To investigate the targets of these NCR peptides, we next analyzed S. meliloti protein expression. The proteome (LFQ) comparison of old (28 d) to young (10 d) nodules showed up- or downregulation (log2 (|fold-change|) > 1.0) of almost 50% of the 3,334 proteins. Analysis of the most significant (FDR q ≤ 0.05, Fisher’s exact test) biological processes regulated within S. meliloti using gene ontology (GO) revealed the protein classes targeted by these NCR peptides. Downregulated proteins, almost 20% of all proteins, were enriched in DNA binding and transcription factor activity, among others (Fig. 4b). Previous studies identified some cell cycle regulators as targets of

NCR peptides43,44. Our data are in line with the hypothesis that NCR peptides could target specific protein families of transcriptional regulators (AraC, RpiR, LysR, TetR, GntR, and LuxR), proteins related to cell division (FtsZ, FtsW), and RNA polymerase cofactors (Fig. 4b). To better associate the downregulated and enriched proteins with possible NCR peptide interactors, we used the STRING (v9.1) protein–protein interactions and mapped these with BLAST (E-value < 1 × 10−5, sequence identity ≥ 95%) to S. meliloti, resulting in a network of 407 proteins and 1,172 interactions. This analysis revealed that the NifA protein is downregulated between 10 and 28 d, whereas other Nif and Fix proteins were constantly expressed45. As expected for a regulator, NifA displayed a high degree of connectivity (25 interac-tors) to symbiosis-relevant proteins; however, this analysis implicated several other interactors of NifA, including nodulation protein A, signal transduction histidine kinase (NtrB), uroporphyrinogen decarboxylase (HemE).

Other plant peptides and signal peptidasesIn addition to the NCR peptides described above, the WMG Protein Atlas contains evidence for a few hundred plant peptides—expressed genes whose mature protein product are mostly < 20 amino acids. These peptides are increasingly recognized as having critical roles

−5 0 5 10

DNA binding BRDClass II KNOXProtamine P1SEC14p

IQWSD1-likeCyclin T1

PGM

NAT

CaM-binding

CaM-binding

NOX

ALDH

LysM kinase K1B

PDR

SYP122

Auxin-regulated

Exo70

CDPKACT-like kinase

−5 0 5 10

8

6

4

2

0

8

6

4

2

0

log2 (nodule/root)

–log

10 (

adju

sted

P v

alue

)

SymbiosisUnknown function

NBS-LRR

PPR

Haspin

Annexin D8P4H alpha

SPP

SPC

HMG

HMG

P = 0.05

PGIPGH18

2OG-Fe(II) oxygenase family oxidoreductase

PE inhibitor

UDP-glucose 6-dehydrogenase

Annexin

P = 0.05

log2 (nodule/root)

–log

10 (

adju

sted

P v

alue

)

Protein

Phosphorylated protein

Acetylated protein

Protein not identified by MS

b

Latenodulation

Nodulesenescence

Embryo-specific

Calmodulin Kinase

Latenodulation

WAK

STK STK

CRK

LRR-RK

MTCaM-binding

PUP

IQ

STK STK

P4HMtN21

P4HSPC

LRR-RKPTR1

WAK

CPN1

Leghemoglobin

NA 0.5 6.51.5 2.5 3.5 4.5 5.5

log2 (nodule / root)

a

Figure 3 Nodule-specific proteins and post-translational modifications provide evidence for key regulators in symbiosis. (a) Volcano plots compare protein (top) and phospho-isoform (bottom) expression in nodules to root organs, indicating known proteins key to symbiosis, proteins of unknown function and candidates with a putative role in symbiosis. Known symbiosis proteins encompass processes such as nodule senescence, oxygen transport, and immune response, among others. (b) A nodule-specific sub-network was extracted from our global transcript co-expression network by mapping significantly (FDR q < 0.01, Student’s t-test) upregulated proteins and motifs from (a) to the network. The sub-network is organized based on the classification of hub genes, which were either calmodulin-like/calmodulin-binding or kinases. The color of each node reflects the fold-change from (a). Select gene families have been illustrated by dashed boxes. Genes not identified by our protein analysis have been faded.

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 6: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

nature biotechnology VOLUME 34 NUMBER 11 NOVEMBER 2016 1203

r e s o u r c e

in development, cell–cell communication, immune response, and symbiosis46. We detected several different peptide classes including stomagen, defensins, rapid alkalinization factors (RALF), S-locus cysteine-rich (SCR), and nodule- specific cysteine-rich (NCR). Active plant peptides result from cleavage of the N-terminal signal peptide from the full-length gene product, a process carried out by special-ized signal peptidases. One such signal peptidase, DNF1, is required for the differentiation of rhizobia into bacteroids. Even though most signal peptidases are ubiquitously expressed, we identified signal peptidases that are specific (P < 0.01, Tukey’s test, Fig. 2) to the leaf (Medtr2g058970.1, JCVI) and root nodule (AES99530, Ensembl Plants). Additionally, we localized two nodule-specific (FDR q < 0.01, Student’s t-test, Fig. 3a) signal peptidases, DNF1 (Medtr3g027890.1, JCVI) and a signal peptide peptidase-like protein (Medtr1g008280.2, JCVI), and suggest a possible role as an NCR peptidase (Fig. 3). Finally, another key element of this peptide-driven regulatory process are peptide transporters, of which, our nodule-specific network contains

two, MtN21, PTR1 (Fig. 3b). Finally, we provide evidence that NCR peptides are post-translationally modified by both phosphoryla-tion and lysine acetylation (Medtr6g043380.1, Medtr7g051320.1, Medtr6g463200.1, Medtr7g029760.1, Medtr3g065710.1, Medtr2g0443 30.1, Medtr4g033290.1, JCVI; AES72906, Ensembl Plants). Investigating the role of these proteins and the post-translational modification of these peptides will improve our understanding of molecular mecha-nisms allowing symbiotic nitrogen fixation.

DISCUSSIONWe have conducted a two-species-in-one large-scale quantitative proteomic analyses47, providing a unique view of protein and PTM expression across the major organs of the eukaryotic model legume M. truncatula and its prokaryotic endosymbiont S. meliloti. The resulting extensive data set will be a valuable resource for the study of legume biology and symbiosis, enabling the elucidation of organ-specific functionality and providing unique insights into protein

8765

31 2

Downegulated

Upregulated

Uncha

ngin

g−0.5

0.0

0.5

1.0

NC

R c

hang

es

−0.5

0.0

0.5

1.0

NC

R c

hang

es

10 14 28

Time (d)

−0.5

0.0

0.5

1.0

10 14

NC

R c

hang

es

Time (d)10 14 28

Time (d)

Low High

Cluster membership

0

–1

–0.5

0.5

1P

ears

on c

orre

latio

n

32

1

DNAreplication

UvrABC system

AraC family

LysR family

RpiR family

Adenylate cyclase

TetR family

NifA

NosR

NifB

NtrCFtsZ

DivL

FtsEGntRfamily

Lacl family

AsnC family

Winged helixfamily

IclR family

RNApolymerase

LuxR family

GreA/GreBfamily

FixX

FixTNusGRecR

NifN

NifE

NusA

NifH

NifD

RecQ MutL

LexA

ChvG

FixNAddB

MraZ

Ndk

NodA

PykA

Pnp

RecF

NtrB

GyrA

Dcd

DnaA

RecA

FtsW

CtrA

RpoD

0

–4

8

–1

–2

–3

1

2

3

4

5

6

7

log2 (day 28 / day 10)

Degree

1

2

3

4

5

6

7

8

10 14 28

Time (d)

28

4

10 14

Time (d)28 10 14

Time (d)28 10 14

Time (d)28 10 14

Time (d)28

a

b

Figure 4 Temporal stages of host-factor expression in M. truncatula and putative targets in S. meliloti. (a) Host factor time-course profiles obtained from the deep sequencing (LFQ) analysis (n = 252) were grouped by hierarchical (left) and fuzzy c-means clustering (right). Expression profiles were pairwise correlated (Pearson), clustered (hierarchical) by correlation coefficients and reordered by fuzzy clusters. Darker trace colors reflect stronger membership to the given fuzzy cluster. (b) S. meliloti downregulated (more than twofold) proteins from 10–28 d past rhizobial infection were gene ontology enriched (FDR q ≤ 0.05, Fisher’s exact test), extracted (shown along border) and mapped to their corresponding protein-protein interactions (STRING database, shown within inner network circle). The size of each protein node reflects its degree within the network and the color of each node reflects how its protein expression changed over the 18-day time course.

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 7: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

1204 VOLUME 34 NUMBER 11 NOVEMBER 2016 nature biotechnology

r e s o u r c e

(LFQ) and PTM (TMT) changes. The depth and dynamic range of the LFQ data set allows us to report a core proteome and our calcu-lation of Shannon entropy is robust to minimal variation to assess organ-specificity. TMT has well-known interference issues for com-plex mixtures, which we countered by reducing sample complexity with phospho- or lysine acetyl-peptide enrichment, performing experiments in biological triplicates, and rigorous test statistics to identify bona fide differential protein and PTM expression. Among the multitude of specific proteins involved, we have, for example, noted protein and/or PTM level changes in a large number of NCR peptides that play a role in the symbiotic interplay between host and bacterial cells. The proposed model that NCR peptides target cell cycle regulators, transcriptional regulators, proteins related to cell division, and RNA polymerase co-factors is in line with previous studies. We also observed downregulation of DnaA, CtrA, and FtsZ among others, which validates prior studies (Fig. 4b)43,44. Beyond symbiotic processes, we provide evidence for phosphorylation sites regulating photosynthesis, phototropism, and blue-light-sensing in leaves and stems48. Future studies in the model legume M. truncatula, and its bacterial endosymbionts will allow testing of these hypotheses through various experimental approaches, including protein–protein interactions, protein–DNA interactions, and reverse genetics. In the long term, this resource may also be leveraged to identify critical pro-teins and PTMs ultimately useful for engineering biological nitrogen fixation in non-leguminous crops such as cereals that do not naturally perform this function efficiently.

METHODSMethods and any associated references are available in the online version of the paper.

Accession codes. The raw mass spectrometry data was submitted via ProteomeXchange49 to the PRIDE repository (PXD002692)50.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNoWLEDGMENtSWe thank T. Rhoads, K. Overmyer, and A. Hebert for fruitful discussions. This work was supported mainly by funds from the NSF IOS-PGRP Grants 1237936 and 1546742. S.R. gratefully acknowledges support from an NSF CAREER award (DBI 1350677). C.E.M. was supported by an NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359). A.L.R. gratefully acknowledges support from a National Institutes of Health-funded Genomic Sciences Training Program (5T32HG002760), the ACS Division of Analytical Chemistry, and the Society of Analytical Chemists of Pittsburgh (SACP).

AUtHoR CoNtRIBUtIoNSH.M., C.E.M., D.J., J.-M.A., and J.J.C. conceived and designed the study. D.J., S. Rajasekar, and J.M., provided the plant material. C.E.M., A.L.R., and M.S.W. performed the proteomics experiments. H.M. and C.E.M. analyzed the data. H.M., C.E.M., D.J., K.G., and A.R.D.V. interpreted the data. N.W.K. and H.M. built the website. A.F.S., J.D.V., and S. Roy generated the regulatory network. The paper was written by H.M., C.E.M., D.J., M.R.S., K.G., A.R.D.V., J.-M.A., and J.J.C. and was edited by all authors.

CoMPEtING FINANCIAL INtEREStSThe authors declare no competing financial interests.

reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Timmers, A.C., Auriac, M.C. & Truchet, G. Refined analysis of early symbiotic steps of the Rhizobium-Medicago interaction in relationship with microtubular cytoskeleton rearrangements. Development 126, 3617–3628 (1999).

2. Xiao, T.T. et al. Fate map of Medicago truncatula root nodules. Development 141, 3517–3528 (2014).

3. Gibson, K.E., Kobayashi, H. & Walker, G.C. Molecular determinants of a symbiotic chronic infection. Annu. Rev. Genet. 42, 413–441 (2008).

4. Mergaert, P. et al. A novel family in Medicago truncatula consisting of more than 300 nodule-specific genes coding for small, secreted polypeptides with conserved cysteine motifs. Plant Physiol. 132, 161–173 (2003).

5. Vasse, J., de Billy, F., Camut, S. & Truchet, G. Correlation between ultrastructural differentiation of bacteroids and nitrogen fixation in alfalfa nodules. J. Bacteriol. 172, 4295–4306 (1990).

6. Lauressergues, D. et al. Primary transcripts of microRNAs encode regulatory peptides. Nature 520, 90–93 (2015).

7. Oldroyd, G.E.D., Murray, J.D., Poole, P.S. & Downie, J.A. The rules of engagement in the legume-rhizobial symbiosis. Annu. Rev. Genet. 45, 119–144 (2011).

8. Van de Velde, W. et al. Aging in legume symbiosis. A molecular view on nodule senescence in Medicago truncatula. Plant Physiol. 141, 711–720 (2006).

9. Limpens, E. et al. cell- and tissue-specific transcriptome analyses of Medicago truncatula root nodules. PLoS One 8, e64377 (2013).

10. Maunoury, N. et al. Differentiation of symbiotic cells and endosymbionts in Medicago truncatula nodulation are coupled to two transcriptome-switches. PLoS One 5, e9519 (2010).

11. Lohar, D.P. et al. Transcript analysis of early nodulation events in Medicago truncatula. Plant Physiol. 140, 221–234 (2006).

12. El Yahyaoui, F. et al. Expression profiling in Medicago truncatula identifies more than 750 genes differentially expressed during nodulation, including many potential regulators of the symbiotic program. Plant Physiol. 136, 3159–3176 (2004).

13. Grimsrud, P.A. et al. Large-scale phosphoprotein analysis in Medicago truncatula roots provides insight into in vivo kinase activity in legumes. Plant Physiol. 152, 19–28 (2010).

14. Rose, C.M. et al. Medicago PhosphoProtein Database: a repository for Medicago truncatula phosphoprotein data. Front. Plant Sci. 3, 122 (2012).

15. Rose, C.M. et al. Rapid phosphoproteomic and transcriptomic changes in the rhizobia-legume symbiosis. Mol. Cell. Proteomics 11, 724–744 (2012).

16. Volkening, J.D. et al. A proteogenomic survey of the Medicago truncatula genome. Mol. Cell. Proteomics 11, 933–944 (2012).

17. Clarke, V.C. et al. Proteomic analysis of the soybean symbiosome identifies new symbiotic proteins. Mol. Cell. Proteomics 14, 1301–1322 (2015).

18. Durgo, H. et al. Identification of nodule-specific cysteine-rich plant peptides in endosymbiotic bacteria. Proteomics 15, 2291–2295 (2015).

19. Benedito, V.A. et al. A gene expression atlas of the model legume Medicago truncatula. Plant J. 55, 504–513 (2008).

20. Young, N.D. et al. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480, 520–524 (2011).

21. Senko, M.W. et al. Novel parallelized quadrupole/linear ion trap/Orbitrap tribrid mass spectrometer improving proteome coverage and peptide identification rates. Anal. Chem. 85, 11710–11714 (2013).

22. Hebert, A.S. et al. The one hour yeast proteome. Mol. Cell. Proteomics 13, 339–347 (2014).

23. Huttlin, E.L. et al. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143, 1174–1189 (2010).

24. Sharma, K. et al. Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell Rep. 8, 1583–1594 (2014).

25. Nakagami, H. et al. Large-scale comparative phosphoproteomics identifies conserved phosphorylation sites in plants. Plant Physiol. 153, 1161–1174 (2010).

26. Walley, J.W. et al. Reconstruction of protein networks from an atlas of maize seed proteotypes. Proc. Natl. Acad. Sci. USA 110, E4808–E4817 (2013).

27. Roitinger, E. et al. Quantitative phosphoproteomics of the ataxia telangiectasia-mutated (ATM) and ataxia telangiectasia-mutated and rad3-related (ATR) dependent DNA damage response in Arabidopsis thaliana. Mol. Cell. Proteomics 14, 556–571 (2015).

28. van Wijk, K.J., Friso, G., Walther, D. & Schulze, W.X. Meta-analysis of Arabidopsis thaliana phospho-proteomics data reveals compartmentalization of phosphorylation motifs. Plant Cell 26, 2367–2389 (2014).

29. Deruyffelaere, C. et al. Ubiquitin-mediated proteasomal degradation of oleosins is involved in oil body mobilization during post-germinative seedling growth in Arabidopsis. Plant Cell Physiol. 56, 1374–1387 (2015).

30. Lang, C. & Long, S.R. Transcriptomic analysis of sinorhizobium meliloti and Medicago truncatula symbiosis using nitrogen fixation-deficient nodules. Mol. Plant Microbe Interact. 28, 856–868 (2015).

31. Moreau, M. et al. EDS1 contributes to nonhost resistance of Arabidopsis thaliana against Erwinia amylovora. Mol. Plant Microbe Interact. 25, 421–430 (2012).

32. Udvardi, M. & Poole, P.S. Transport and metabolism in legume-rhizobia symbioses. Annu. Rev. Plant Biol. 64, 781–805 (2013).

33. Ott, T. et al. Symbiotic leghemoglobins are crucial for nitrogen fixation in legume root nodules but not for general plant growth and development. Curr. Biol. 15, 531–535 (2005).

34. Dixon, R. & Kahn, D. Genetic regulation of biological nitrogen fixation. Nat. Rev. Microbiol. 2, 621–631 (2004).

35. Bhar, K. et al. Phosphorylation of leghemoglobin at S45 is most effective to disrupt the molecular environment of its oxygen binding pocket. Protein J. 34, 158–167 (2015).

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 8: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

nature biotechnology VOLUME 34 NUMBER 11 NOVEMBER 2016 1205

r e s o u r c e

36. Stuart, J.M., Segal, E., Koller, D. & Kim, S.K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).

37. Gallardo, K., Le Signor, C., Vandekerckhove, J., Thompson, R.D. & Burstin, J. Proteomics of Medicago truncatula seed development establishes the time frame of diverse metabolic processes related to reserve accumulation. Plant Physiol. 133, 664–682 (2003).

38. Niebel, Fde.C., Lescure, N., Cullimore, J.V. & Gamas, P. The Medicago truncatula MtAnn1 gene encoding an annexin is induced by Nod factors and during the symbiotic interaction with Rhizobium meliloti. Mol. Plant Microbe Interact. 11, 504–513 (1998).

39. Barkan, A. & Small, I. Pentatricopeptide repeat proteins in plants. Annu. Rev. Plant Biol. 65, 415–442 (2014).

40. Kurihara, D., Matsunaga, S., Omura, T., Higashiyama, T. & Fukui, K. Identification and characterization of plant Haspin kinase as a histone H3 threonine kinase. BMC Plant Biol. 11, 73 (2011).

41. Nallu, S. et al. Regulatory patterns of a large family of defensin-like genes expressed in nodules of Medicago truncatula. PLoS One 8, e60355 (2013).

42. Tiricz, H. et al. Antimicrobial nodule-specific cysteine-rich peptides induce membrane depolarization-associated changes in the transcriptome of Sinorhizobium meliloti. Appl. Environ. Microbiol. 79, 6737–6746 (2013).

43. Farkas, A. et al. Medicago truncatula symbiotic peptide NCR247 contributes to bacteroid differentiation through multiple mechanisms. Proc. Natl. Acad. Sci. USA 111, 5183–5188 (2014).

44. Penterman, J. et al. Host plant peptides elicit a transcriptional response to control the Sinorhizobium meliloti cell cycle during symbiosis. Proc. Natl. Acad. Sci. USA 111, 3561–3566 (2014).

45. Gong, Z.Y., He, Z.S., Zhu, J.B., Yu, G.Q. & Zou, H.S. Sinorhizobium meliloti nifA mutant induces different gene expression profile from wild type in Alfalfa nodules. Cell Res. 16, 818–829 (2006).

46. Marshall, E., Costa, L.M. & Gutierrez-Marcos, J. Cysteine-rich peptides (CRPs) mediate diverse aspects of cell-cell communication in plant reproduction and development. J. Exp. Bot. 62, 1677–1686 (2011).

47. Poliakov, Anton et al. Large-scale label-free quantitative proteomics of the pea aphid-Buchnera symbiosis. Mol. Cell. Proteomics 10, M110.007039 (2011).

48. Inoue, S. et al. Blue light-induced autophosphorylation of phototropin is a primary step for signaling. Proc. Natl. Acad. Sci. USA 105, 5626–5631 (2008).

49. Vizcaíno, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226 (2014).

50. Vizcaíno, J.A. et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 41, D1063–D1069 (2013).

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 9: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

nature biotechnology doi:10.1038/nbt.3681

ONLINE METHODSPlant materials and growth. M. truncatula cv. Jemalong A17 plants were grown and harvested under similar conditions to those used for the gene expression atlas presented by Benedito et al.19 (2008). Seeds were acid scarified, surface sterilized, and vernalized for 3 d at 4 °C. Germinated seedlings were grown in pots containing Turface in the greenhouse. Non-inoculated plants, fertilized daily with half-strength BD solution51 containing 2 mM KNO3 and 2 mM NH4NO3, were used for the collection of organs such as roots, stems, leaves, flowers, shoot buds, and seeds. All the organs mentioned above, except for the flowers and seeds, were collected at 28 d after planting. For flowers and seeds, the initial vernalization was carried out for 2 weeks, and the organs were harvested on the first day of flower opening and at maturity, respectively. Simultaneously a group of seeds vernalized for 3 d at 4 °C and grown on Turface, inoculated with S. meliloti on the 1st and the 7th day after planting and fertilized with half- strength BD solution containing 0.5 mM KNO3, were used for the collection of 28-d-old nodules. For the collection of 10- and 14-d-old nodules, plants were grown under aeroponic conditions and nodules were harvested 10 and 14 d after inoculation. Multiple plants were used for the organ collection and pooled to represent the biological replicates.

Protein extraction. Organs were ground into a powder under liquid nitro-gen using a mortar and pestle. Ice-cold extraction buffer was added to each organ sample such that the volume of extraction buffer (290 mM sucrose, 250 mM TRIS (pH 7.6), 25 mM EDTA (pH 8.0), 10 mM KCl, 25 mM NaF, 50 mM Na pyrophosphate, 1 mM ammonium molybdate, 1 mM PMSF, 1 µg/mL leupeptin, 1 µg/mL pepstatin, 1 µg/mL aprotinin) added (mL) was equal to five times the dry weight (g) of the ground plant organ. Sample mixtures were vortexed, subjected to probe sonication continuously for 3 min, and then filtered through Miracloth. Homogenized organ material was snap frozen in liquid nitrogen and stored at −80 °C.

Protein was precipitated from organ extract using chloroform/methanol precipitation. One volume of chloroform was added to one volume of organ extract before vortexing. Three volumes of water were then added, followed by vortexing and subsequent centrifugation for 5 min (4,696g, 4 °C). The resulting top layer was removed and discarded with a serological pipet. Three volumes of methanol were then added to the solution, followed by vortexing and subsequent centrifugation for 5 min (4,696g, 4 °C). The resulting protein pellet was transferred to a 2 mL Eppendorf tube and washed 3× by vortex-ing with ice-cold 80% acetone and centrifuging for 3 min (10,000g, 4 °C). Following these washes, the protein pellet was dried on ice for ~1 h and then immediately lysed.

Protein lysis and digestion. Protein pellets were resuspended in ~1.25 mL of lysis buffer (8 M urea, 50 mM Tris-HCl (pH 8), 30 mM NaCl, 1 mM CaCl2, 20 mM sodium butyrate, 10 mM nicotinamide, mini EDTA-free protease inhibitor (Roche Diagnostics, Indianapolis, IN), and phosSTOP phosphatase inhibitor (Roche Diagnostics, Indianapolis, IN), and lysed on ice with a probe sonicator. Protein content was evaluated using a BCA assay (Thermo Fisher Scientific, San Jose, CA).

Proteins were reduced with 5 mM dithiothreitol (incubation at 58 °C for 40 min) and alkylated with 15 mM iodoacetamide (incubation in the dark, at ambient temperature, for 40 min). Alkylation was quenched by adding five mM dithiothreitol (incubation at ambient temperature for 15 min). Proteins were enzymatically digested in a two-step process. First, proteinase LysC (Wako Chemicals, Richmond, VA) was added to each sample at a ratio of 1:200 (enzyme:protein), and the resulting mixtures were incubated at 37 °C for 2.5 h. Next, samples were diluted to a final concentration of 1.5 M urea (pH 8) with a solution of 50 mM Tris and 5 mM CaCl2. Sequencing-grade trypsin (Promega, Madison, WI) was added to each sample at a ratio of 1:50 (enzyme:protein), and the resulting mixtures were rocked at ambient temperature overnight. Digests were quenched by bringing the pH~2 with trifluoroacetic acid and immediately desalted using C18 solid-phase extraction columns (SepPak, Waters, Milford, MA).

Protein isobaric labeling. Desalted material was labeled with TMT 10-plex isobaric tags (Thermo-Pierce, Rockford, IL). Only eight of the ten labels were used per experiment for the comparison of the eight M. truncatula organs.

Note that bud protein was only analyzed using deep proteome sequencing due to sample limitations. Before quenching the TMT reactions, ~5 µg of material from each TMT channel was combined into a test mix and analyzed by LC-MS/MS to evaluate labeling efficiency and obtain optimal ratios for sample recombination. Following quenching, tagged peptides were combined in equal amounts by mass (~750 µg per channel) and desalted. All experiments had ≥ 99% labeling efficiency, calculated by the number of labeled peptides divided by the total number of peptide identifications.

Peptide fractionation. TMT-labeled peptides were fractionated by strong cat-ion exchange (SCX) using a polysulfoethylaspartamide column (9.4 × 200 mm; PolyLC) on a Surveyor LC quaternary pump (Thermo Scientific). Buffer compositions were as follows: buffer A (5 mM KH2PO4, 30% acetonitrile (pH 2.65)), buffer B (5 mM KH2PO4, 350 mM KCl, 30% acetonitrile (pH 2.65)), buffer C (50 mM KH2PO4, 500 mM KCl (pH 7.5)). Each TMT mixture was resuspended in buffer A, injected onto the column, and subjected to the following gradient for separation: 100% buffer A from 0–2 min, 0–15% buffer from 2–5 min, and 15– 100% buffer B from 5–35 min. Buffer B was held at 100% for 10 min, and then the column was washed extensively with buffer C and water before recalibration. Flow rate was held at 3.0 mL/min throughout the separation. SCX samples were pooled into 12 fractions. Fractions were collected as follows: 1: 0–8 min, 2: 8–9.5 min, 3: 9.5–11 min, 4: 11–12.5 min, 5: 12.5–14 min, 6: 14–15.5 min, 7: 15.5– 17 min, 8: 17–20 min, 9: 20–23 min, 10: 23–26 min, 11: 26–32 min, 12: 32–42 min. Fractions were immediately frozen, lyophilized, and desalted. A small portion of each, 5%, was extracted and used for protein analysis. The remaining material was retained for phosphorylation and acetyl-lysine enrichment. Unlabeled peptides were fractionated by high pH reversed-phase using a column packed with the non- polar material (Gemini 5 µm C18 110 Å LC Column 250 × 4.6 mm, Phenomenex, Torrance, CA) on a Surveyor LC quaternary pump (Thermo Scientific). Buffer compositions were as follows: buffer A (pH 10.0) and buffer B (pH 10.0). Peptides from each organ were separately resuspended in buffer A, injected onto the column, and subjected to the following gradient for separation. Flow rate was held at 0.8 mL/min for the first 40 min of frac-tionation and 1.0 mL/min for the final 20 min. Forty fractions were collected between 6 and 46 min of the elution period, then recombined into ten fractions (combination scheme – final fraction 1: RP fractions 1, 11, 21, and 31; final fraction 2: RP fractions 2, 12, 22, and 32; …, final fraction 10: RP fractions 10, 20, 30, and 40). Recombined fractions were immediately frozen, lyophilized, and stored in at −80 °C until analyzed for deep proteome analysis.

Phosphopeptide enrichment. Phosphopeptides were enriched using immobilized metal affinity chromatography (IMAC) with magnetic beads (Qiagen, Valencia, CA). Following equilibration with water, the magnetic beads were incubated with 40 mM EDTA (pH 8.0) for 1 h, with shaking. Next, the beads were washed four times with water and incubated with 30 mM FeCl3 for 1 h, with shaking. Beads were then washed four times with 80% acetonitrile/0.15% trifluoroacetic acid (TFA). Each of the 12 SCX fractions was resuspended in 80% acetonitrile/0.15% TFA and incubated with the magnetic beads for 45 min, with shaking. Following this incubation, all unbound peptides were collected for subsequent acetyl-lysine enrichment. Bound peptides were washed three times with 80% acetonitrile/0.15% TFA and eluted with 50% acetonitrile, 0.7% NH4OH. Eluted peptides were immediately acidified with 4% FA, frozen, and lyophilized. Each phosphopeptide fraction was re-suspended in 20 µL 0.2% FA for LC-MS/MS analysis.

Acetyl-lysine enrichment. Peptides unbound by IMAC enrichment were combined into six fractions. The 12 SCX fractions were pooled as follows: 1: 0–9.5 min; 11–12.5 min, 2: 9–11.5 min, 3: 12.5–14 min, 4: 14–15.5 min, 5: 15.5–23 min, 6: 23–42 min. Enrichment was performed at a ratio of ~200 µg anti-acetyl-lysine beads per 1 mg of protein. Each fraction was dissolved in 50 mM HEPES (pH 7.5)/100 mM KCl buffer, combined with ~50 µL pan-acetyl-lysine antibody-agarose conjugate (Immunechem, ICP0388), and rotated overnight at 4 °C. Samples were rinsed eight times with 50 mM HEPES (pH 7.5)/100 mM KCl buffer before elution with 0.1% TFA. Eluted peptides were then desalted, frozen, and lyophilized. Each acetyl-lysine fraction was resuspended in 20 µL 0.2% FA for LC-MS/MS analysis.

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 10: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

nature biotechnologydoi:10.1038/nbt.3681

LC-MS/MS analysis. All experiments were performed using a NanoAcquity UPLC system (Waters, Milford, MA) coupled to an Orbitrap Fusion (Q-OT-qIT) mass spectrometer (Thermo Fisher Scientific, San Jose, CA). Reversed-phase columns were made in-house by packing a fused silica capillary (75 µm i.d., 360 µm o.d, with a laser-pulled electrospray tip) with 1.7 µm diameter, 130 Å pore size Bridged Ethylene Hybrid C18 particles (Waters) to a final length of 30 cm. The column was heated to 60 °C for all experiments. Samples were loaded onto the column for 12 min in 95:5 buffer A (water, 0.2% formic acid, and 5% DMSO):buffer B (acetonitrile, 0.2% formic acid, and 5% DMSO) at a flow-rate of 0.30 µL/min. Peptides were eluted using the following gradient: an increase to 7% B over 1 min, followed by a 42 min linear gradient from 7% to 18% B, followed by a 28 min linear gradient from 18% to 27% B, followed by a final 1 min ramp to 75% B which was held for 3 min. The column was equilibrated with 5% buffer B for an additional 25 min. Precursor peptide cations were generated from the eluent through the utilization of a nanoESI source. Phospho-enriched fractions were analyzed in duplicate.

Mass spectrometry instrument methods for TMT-labeled sample analy-sis consisted of MS1 survey scans (5 × 105 target value; 60,000 resolution; 350–1,400Th) that were used to guide subsequent data-dependent MS/MS scans analyzed in the Orbitrap (5 s cycle time, 1.0 Th isolation window, HCD fragmentation; normalized collision energy of 37; 5 × 104 target value, 60,000 resolution). Dynamic exclusion duration was set to 30 s, with a maximum exclusion list of 500 and an exclusion width of ± 10 p.p.m. the selected average mass. Maximum injection times were set to 100 ms for all MS1 scans, 100 ms for MS/MS scans in whole protein analyses, and 250 ms for MS/MS scans in phospho- and acetyl- enrichment analyses.

Mass spectrometry instrument methods for unlabeled sample analysis consisted of MS1 survey scans in the Orbitrap (5 × 105 target value; 60,000 resolution; 350– 1500 Th) that were used to guide subsequent data-dependent MS/MS scans analyzed in the ion trap (1.0 Th isolation window, higher-energy collision dissociation (HCD) fragmentation; normalized collision energy of 30; 1 × 104 target value, IT rapid scan). Dynamic exclusion duration was set to 30 s, with a maximum exclusion list of 500 and an exclusion width of ± 10 p.p.m. the selected average mass. Maximum injection times were set to 100 ms for all MS1 scans and 35 ms for MS/MS scans.

Data analysis. LFQ and TMT data sets were processed separately using the MaxQuant software (version 1.5.1.0)52. Searches were performed against a non-redundant target-decoy consensus database53 composed of four well-known M. truncatula protein sequence databases (UniProt, 2014.12.18 (ref. 54) Ensembl Plants, 2014.12.18 (ref. 55), RefSeq, 2014.12.18 (ref. 56), and JCVI, 2013.07.31 (ref. 57), de novo gene models and a rhizobial protein database (S. meliloti, UniProt, 2015.01.30). The gene models were predicted with Augustus58 based on the Mt4.0 genome assembly (GCA_000219495.2, Ensembl Plants) and A. thaliana training parameters. The consensus protein descriptions were assigned based on a pre-defined order (UniProt, JCVI, Ensembl Plants, and RefSeq). Searches were conducted using the default precursor mass toler-ances set by Andromeda (20 p.p.m. first search, 4.5 p.p.m. main search). LFQ data and TMT data were searched using a product mass tolerance of 0.35 Da and 0.015 Da, respectively. A maximum of two missed tryptic cleavages was allowed. The fixed modifications specified were carbamidomethylation of cysteine residues, TMT 10-plex on peptide N termini, and TMT 10-plex on lysine residues. The variable modifications specified were oxidation of methio-nine and TMT 10-plex on tyrosine residues. Additional variable modifications were specified for phosphopeptide analyses (phosphorylation of threonine, serine, and tyrosine residues) and acetyl-peptide analyses (acetylation of lysine residues). Note that the acetylation modification mass shift was set to −187.1523 Da to account for the difference between an acetyl group and a TMT 10-plex tag, which enables the use of TMT 10-plex as a fixed modification on lysine residues, even for acetylated peptides. For all experiments, peptides and their corresponding proteins groups were both filtered to a 1% FDR.

Label-free quantification was performed within MaxQuant using MaxLFQ59. Missing values were imputed using the Perseus60 tool available with MaxQuant by replacing missing values from a normal distribution to simulate low abundance values in a typical MS experiment. TMT quantification was per-formed within MaxQuant. Following MaxQuant analysis, noise-band capping and protein normalization was performed at the peptide level using in-house

software that employs strategies described previously61 (Supplementary Code). To consolidate the TMT experimental data for the implementation of statistical analyses, quantitative data from each experiment was log2 trans-formed and mean-normalized across all organs for each given protein or PTM isoform. Each column was also mean-normalized before statistical analyses to center the distribution of quantitative values obtained for each organ within each replicate on zero.

PTM localization was performed within the MaxQuant software suite62. Only those PTMs that could be localized to a specific amino acid residue with over 75% confidence were included in our analysis. PTM peptides were consoli-dated into PTM-isoforms as is performed in the COMPASS software suite63.

Bioinformatics. The circular proteome map (CPM) was constructed with Circos64 (Supplementary Code). For the CPM, organ LFQ values (Lo) were normalized for each protein (l) as follows: norm. l = l – min(Lo) / max(Lo) – min(Lo), with l ∈ Lo.

To assign organ specificity to modified and unmodified proteins, we used Shannon entropy as previously described with a 10% quantile threshold (Supplementary Fig. 4a)65. In brief, a relative expression value for each pro-tein was calculated, as follows:

rel. l l/ Lo= Σ

And an entropy index score (s) was calculated as

s = −1 2× ×Σ( log ( ))rel. l rel. l

A low score indicates organ specificity. In the case of modified proteins, we calculated scores for all isoforms for a protein and selected the isoform with the lowest score to represent the protein. For illustration purposes we inverted the scale, as follows:

′ = − =s n s nlog ( )2 with number of samples

All statistical analyses were performed using R66. Assessment of organ similar-ity using the LFQ data was generated by performing hierarchical clustering (Euclidean distance) on the Pearson correlation values calculated for each organ pair using the rcorr and corrplot package. ANOVA analyses were per-formed using only those proteins and PTM isoforms quantified across two or three replicates. For each data set, the P-values resulting from ANOVA were corrected for multiple hypotheses with false-discovery rate. Hierarchical clustering (Euclidean distance) was performed using only those proteins or PTM isoforms that were found to contain significant variance, as determined by ANOVA. Post hoc testing using the Tukey Honest Significant Differences (HSD) method was performed to evaluate organ specificity for each protein/PTM isoform found to contain significant variance. Heat maps were produced using the pheatmap package. Protein clusters were generated based on the protein dendrogram generated during the hierarchical clustering. The number of clusters produced following hierarchical clustering was user-defined and was selected based on observable heat map trends, cluster membership, and gene ontology (GO) enrichment (each of these factors was evaluated following the generation of 6, 8, 10, 12, 15, and 20 clusters for each data set; the cluster count that produced the most intuitive grouping was selected). GO enrichment was performed using the Fisher’s exact test and false-discovery rate correction. The comparison of protein/PTM isoform expression between the nodules and roots was performed by using the Student’s t-test and false-discovery rate cor-rection. Phospho-serine and acetyl-lysine motif analysis were performed using nodule-specific (FDR q < 0.01, Student’s t-test and fold-change > 2) phospho- and acetyl-isoforms, respectively. Motif analysis was performed using v1.2 of motif-x. Sequences were manually aligned and all M. truncatula proteins identified in our study were used as background. A width of 13 residues, the minimum number of 15 occurrences, and a significance of 0.001 were speci-fied before running motif-x. To identify NCR-specific expression stages, we combined hierarchical (hard) clustering and fuzzy c-means (soft) clustering (Mfuzz package in R). The global ordering of the hierarchical clustering was refined with the fuzzy clusters by reordering according to the three stages.

Network analysis. To infer the regulatory network for M. truncatula, we used an expression-based network inference algorithm, modular regulatory network learning with per gene information (MERLIN)67 algorithm within a stability

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

Page 11: A proteomic atlas of the legume Medicago truncatula and ... · PDF fileof symbiosis between the model legume . M. truncatula. and one of the best-characterized rhizobial species, S

nature biotechnology doi:10.1038/nbt.3681

selection framework68. MERLIN models a regulatory network as a statistical dependency network representing regulatory relationships among regulators (such as transcription factors, signaling proteins, post-translational modify-ing enzymes) and target genes as a set of statistical dependencies. For each gene, MERLIN learns a separate regulatory program with the constraint that genes with similar expression profile should have similar (but not necessarily identical regulatory programs). MERLIN takes as input candidate regulators and genome-wide expression levels and uses an iterative algorithm to learn the regulators for individual genes while imposing a module constraint. Briefly, in each iteration, MERLIN uses a greedy approach to update the regulatory program of each gene based on the expression of the regulators and module assignment of their targets, and then updates the module assignment using co- expression and co-regulation of the genes. Stability selection is a subsam-pling approach for estimating confidence on inferred network connections. MERLIN was applied on expression data downloaded from the M. truncatula Gene Expression Atlas (MtGEA)69 and created the expression data matrix. The data matrix has 254 samples from 33 experiments.

We quantile normalized70 the expression data and mapped the probeset IDs to Mt4.0 locus IDs. Briefly, we used BWA to align the probe sequences to the Mt4.0 reference transcriptome allowing no more than one mismatch in alignment. We calculated the specificity of each probeset/gene mapping based on the number of individual probes mapping to the target locus vs. off-target. We considered only those probeset/gene mappings such that there were at least five probes specific to the assigned gene and that at least 80% of the probes in the probeset mapped uniquely to this gene. We removed genes that did not vary significantly across samples. Specifically, we computed the mean for each gene, zero-mean transformed its expression level and included the gene only if its value varied at least ±1 from the mean in at least five samples. Our input included 15,643 genes, and 2,821 candidate regulators defined based on literature as well as sequence annotations to identify new kinases. These regulators included transcription factors, kinases, as well additional regula-tors associated with post-translational modifications and genes involved in hormone metabolism. To assess confidence in the inferred network connec-tions we used stability selection. Briefly, we created 40 sub-sets of the data set by randomly selecting 50% columns from the complete data set, and infer a MERLIN network for each sub-set. We estimate the confidence of each edge by calculating the percentage of times out of 40 we observed the edge in the inferred networks. The final inferred network was selected to be that which had an edge confidence of at least 60%. The network connected 1,248 regula-tors to 8,194 target genes. The inference process is time-consuming and it takes ~72 h to infer each of these networks. To speed up this process we used HTCondor71 high-throughput computing cluster in UW-Madison to execute the network inference runs in parallel.

Website. All relevant information from the search results and data analysis were imported into a MySQL database. The results were supplemented

by ordered gene locus identifiers and cross references to public databases. The interface to access the database was built with the Bootstrap front-end framework and data visualization implemented with the D3.js JavaScript library. The WMG Atlas, protein sequence database, mapping file to other public databases and the transcription regulatory network are available via http://compendium.medicago.wisc.edu/.

51. Broughton, W.J. & Dilworth, M.J. Control of leghaemoglobin synthesis in snake beans. Biochem. J. 125, 1075–1080 (1971).

52. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).

53. Marx, H., Lemeer, S., Klaeger, S., Rattei, T. & Kuster, B. MScDB: a mass spectrometry-centric protein sequence database for proteomics. J. Proteome Res. 12, 2386–2398 (2013).

54. Bairoch, A. et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 33, D154–D159 (2005).

55. Flicek, P. et al. Ensembl 2013. Nucleic Acids Res. 41, D48–D55 (2013).56. Tatusova, T., Ciufo, S., Fedorov, B., O’Neill, K. & Tolstoy, I. RefSeq microbial

genomes database: new representation and annotation strategy. Nucleic Acids Res. 42, D553–D559 (2014).

57. Tang, H. et al. An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC Genomics 15, 312 (2014).

58. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 (Suppl. 2), ii215–ii225 (2003).

59. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014).

60. Tyanova, S. et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740 (2016).

61. Phanstiel, D.H. et al. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nat. Methods 8, 821–827 (2011).

62. Olsen, J.V. et al. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635–648 (2006).

63. Wenger, C.D., Phanstiel, D.H., Lee, M.V., Bailey, D.J. & Coon, J.J. COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA. Proteomics 11, 1064–1074 (2011).

64. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).

65. Schug, J. et al. Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol. 6, R33 (2005).

66. Ihaka, R. & Gentleman, R.R. A language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).

67. Roy, S. et al. Integrated module and gene-specific regulatory inference implicates upstream signaling networks. PLoS Comput. Biol. 9, e1003252 (2013).

68. Meinshausen, N. & Bahlmann, P. Stability selection. J. R. Stat. Soc. Ser. A Stat. Soc. 72, 417–473 (2010).

69. He, J. et al. The Medicago truncatula gene expression atlas web server. BMC Bioinformatics 10, 441 (2009).

70. Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).

71. Thain, D., Tannenbaum, T. & Livny, M. Distributed computing in practice: the Condor experience. Concurr. Comput. 17, 323–356 (2005).

© 2

016

Nat

ure

Am

eric

a, In

c., p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.