

Molecular Cell

Review

A Million Peptide Motifs for the Molecular Biologist

Peter Tompa,1,2,* Norman E. Davey,3 Toby J. Gibson,4 and M. Madan Babu5,*

1 VIB Structural Biology Research Center (SBRC), Vrije Universiteit Brussel, Brussels 1050, Belgium

2 Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest 1519, Hungary

3 Department of Physiology, University of California, San Francisco, San Francisco, CA 94158, USA

4 Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany

5 MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK

*Correspondence: [email protected] (P.T.), [email protected] (M.M.B.)

http://dx.doi.org/10.1016/j.molcel.2014.05.032

A molecular description of functional modules in the cell is the focus of many high-throughput studies in the postgenomic era. A large portion of biomolecular interactions in virtually all cellular processes is mediated by compact interaction modules, referred to as peptide motifs. Such motifs are typically less than ten residues in length, occur within intrinsically disordered regions, and are recognized and/or posttranslationally modified by structured domains of the interacting partner. In this review, we suggest that there might be over a million instances of peptide motifs in the human proteome. While this staggering number suggests that peptide motifs are numerous and the most understudied functional module in the cell, it also holds great opportunities for new discoveries.

Introduction

A molecular description of the full complement of functional modules of proteins in the cell, ranging from protein complexes to posttranslational modification sites (Figure 1), is the focus of many current high-throughput studies. Due to the relentless pace of the generation of novel information, a census of distinct molecular entities at various levels of description is needed to understand and appreciate how organisms use this basic toolkit to increase their functional versatility (Hartwell et al., 1999). In an early and seminal paper addressing this problem, Chothia (1992) suggested that there are about a thousand distinct protein folds in nature. For instance, the ~22,000 or so protein-coding genes in the human genome are estimated to contain 35,000 instances of folded domains belonging to about 10,000 families (Sammut et al., 2008). Their functions are often realized through interactions with other proteins, mediated by an estimated 10,000 distinct interaction types (Aloy and Russell, 2004), resulting in several hundred thousand binary interactions (Hart et al., 2006; Stumpf et al., 2008; Venkatesan et al., 2009) and 600–1,000 distinct protein complexes (Havugimana et al., 2012; Levy et al., 2009; Perica et al., 2012). While the exact number of instances of these functional modules will likely be influenced by the nuances of the definition used and by biological processes such as alternative splicing, these estimates nevertheless represent a cornerstone of our concepts of modularity of protein structure, function, and evolution.

In recent years it has become increasingly clear that the functional and molecular toolkit of eukaryotes extends beyond structured domains (Babu et al., 2012; Dyson and Wright, 2005; Tompa, 2012; Uversky and Dunker, 2010). A large portion of functional biomolecular interactions is mediated by compact interaction interfaces within intrinsically disordered regions (IDRs; i.e., segments of the polypeptide chain that do not autonomously fold into a defined tertiary structure) that are recognized and/or posttranslationally modified by structured domains of the interacting partner (Bhattacharyya et al., 2006; Davey et al., 2012a; Diella et al., 2008; Dinkel et al., 2014; Hornbeck et al., 2012). These short interaction interfaces (which we herein refer to as "peptide motifs" to include both binding motifs and posttranslational modification [PTM] sites; see next section) are typically less than ten residues in length and confer both high functional diversity and high functional density on the polypeptide segments containing them. On evolutionary timescales, they can evolve very rapidly by appearing de novo or conversely disappearing with equal rapidity, conferring exceptional evolutionary plasticity on the interactome (Davey et al., 2012b; Mosca et al., 2012).

In this review, we argue that the inventory of peptide motifs in proteins lags far behind that of domains and protein complexes. We then describe the current understanding of peptide motifs and provide estimates that there might be more than a hundred thousand, and possibly up to a million, instances in the human proteome. This overwhelming number implies that the quest for their identification and functional characterization is a formidable challenge. However, acquiring a deeper understanding of peptide motifs is imperative for gaining a more complete molecular description of the physiological and pathological processes of the cell.

Attributes of Peptide Motifs

The known instances of peptide motifs have been shown to mediate a wide range of important cellular tasks (Table 1). They typically fall into two major classes: binding motifs (Figures 2A–2C), which mediate interactions with globular domains, and posttranslational modification sites (Figure 2D), which are recognized and altered by modifying enzymes. These distinct terms (which might also be referred to as linear motifs [LMs] or short/eukaryotic linear motifs [SLiMs/ELMs]) denote functional sites that often occur within disordered regions of proteins. The two terms are frequently used synonymously, although they describe somewhat distinct, but overlapping, sets of functional modules.

Molecular Cell 55, July 17, 2014 © 2014 Elsevier Inc. 161


Figure 1. Functional Modules of Proteins

Modularity of protein function is manifest at distinct levels, from the recurrent use of whole-protein elements (complexes and structural domains [folds]) to protein parts (binding motifs and modification sites). The figure highlights their typical length scale and estimated number of instances in the human proteome, as outlined in detail in this manuscript.


For binding motifs, amino acid sequence specificity determinants can be described. These determinants are often discovered by identifying recurring patterns (or motifs) of residues from functionally equivalent peptide sequences. Posttranslational modification sites are less homogeneous, and many appear to lack specificity determinants outside of the modified residue (e.g., protein phosphatase sites), although certain enzymes have recognizable sequence specificities (e.g., proline-directed kinases) for the region they modify.

In the ELM database (http://elm.eu.org), the most comprehensive repository of experimentally characterized peptide motifs, there are about 2,475 documented instances of functional binding motifs and posttranslational modification sites in 1,547 proteins. These motifs are recognized and/or modified by 86 globular domain families, and the majority of them were discovered by low-throughput binding assays and structure-function studies of the domain-peptide motif complex (Dinkel et al., 2014). The number of posttranslational modification sites discovered by high-throughput studies (HTSs), however, is significantly higher, in a range of well above 100,000 sites (Beltrao et al., 2012; Choudhary and Mann, 2010). The number of structural domains known to participate in peptide-mediated interactions has also rapidly increased in recent years (~200 domain families; Stein et al., 2011). The documented instances of binding motifs, structural domains, and PTM sites form much of the basis of our understanding of peptide motifs. Nonetheless, as we discuss below, the number of actual instances of peptide motifs and peptide-domain interactions in the entire human proteome is likely to be significantly higher than what has been documented in current databases.

Based on the known instances of peptide motifs, the general features and evolutionary principles of binding motifs and PTM sites are becoming increasingly clear (Davey et al., 2012b). Binding motifs are on average 6–7 amino acids in length, with only 3–4 core positions conferring the majority of the interaction specificity (see consensus in Figure 2C). The limited binding surface area results in low-affinity, transient, and modulatable interactions, usually in the low micromolar range. PTM sites in general have weaker intrinsic specificity determinants (Perkins et al., 2010; Van Roey et al., 2012). Often, experimental approaches allow only the detection of the modified residue. Subsequent computational analysis is required to identify residues complementary with the enzyme's active site and infer the motif directing the modification (Mahrus et al., 2008; Zielinska et al., 2010).

Because only a small number of residues are involved in peptide motifs, these functional modules exhibit high evolutionary plasticity, and they can arise de novo and evolve convergently by one or a few mutations within disordered regions (Davey et al., 2012b; Holt et al., 2009; Mosca et al., 2012). Accordingly, peptide motifs are highly evolutionarily variable and, compared to globular domains within the same protein sequence, less conserved across large evolutionary distances. Most domain-binding motifs in human proteins are not typically conserved outside vertebrates, and PTM sites in particular are rapidly gained and lost during evolution (Beltrao et al., 2012; Holt et al., 2009; Kim et al., 2012), though a subset of motifs involved in critical regulatory processes (e.g., in cell cycle; Galea et al., 2008) can be conserved over large evolutionary distances (Davey et al., 2012a, 2012b).

The amino acid composition of the polypeptide segment in which peptide motifs are embedded shows strong resemblance to that of IDRs (Fuxreiter et al., 2007). The majority of known binding motifs and tens of thousands of PTM sites have been discovered in predicted disordered regions (Beltrao et al., 2012; Davey et al., 2012b). This preference likely reflects (a) the accessibility of such motifs within disordered regions, which allows recognition by their binding partner, and (b) the flexibility of such motifs, which enables them to adopt a specific conformation upon binding (Davey et al., 2012b; Galea et al., 2008).

Peptide Motifs Guide Proteins through Their Life

Peptide motifs provide a functional interface in a compact module that is structurally and functionally autonomous and can emerge in a polypeptide sequence without much interference with the structural and functional integrity of the rest of the protein. This modularity, in combination with their regulatory potential and their easy de novo generation, provides a simple


Table 1. Motif Instances and Their Regulatory Roles

Each type is followed by its description; each example lists the binding partner; the motif; and the motif function.

Binding Motifs (Mediate Interactions with Globular Domains)

Classical binding motifs: act as simple binding modules, often functioning in the assembly of protein complexes as scaffolding modules
- SH2 domain family; pYXXΦ; phosphodependent SH2 domain binding motif
- UEV domain of TSG101; P[TS]AP; Tsg101 UEV domain binding motif

Functional Subsets

Trafficking motifs: mediate binding to protein necessary for localization to a particular subcellular location
- exportin-1; ΦX{2,3}ΦX{2,3}ΦXΦ; nuclear export signal
- AP2; YXXΦ; endocytic signal

Targeting motifs: retain protein at a subcellular location by binding to a previously localized protein
- PCNA; QXXΦXX[FHT][FHY]; replication fork localization motif
- microtubule-associated protein EB family; [ST]X[IL]P; microtubule end-binding motif

Docking motifs: recruit enzyme to a protein and promote modification at a site distinct from the docking site
- cyclin family; [RK]XLX{0,1}Φ; CDK/cyclin kinase complex docking motif
- calcineurin; PXIX[IV]; calcineurin phosphatase docking motif

Degron motifs: a functional subset of docking motifs that controls the stability of a protein by recruiting ubiquitin ligases, resulting in polyubiquitination and subsequent proteasomal degradation
- Cdc20 and Cdh1 APC/C subunits; RXXLXXΦ; APC/C-specific degradation motif
- Fbw7 SCF ubiquitin ligase complex subunit; p[ST]PXXp[ST]; phosphodependent SCF degradation motif

Posttranslational Modification Sites (Sites Are Recognized and Altered by Modifying Enzymes)

Moiety addition sites: recognized by modifying enzymes for the attachment or removal of biochemical functional groups
- SUMO-conjugating enzyme UBC9; Φ(K)X[DE]; site SUMOylated by SUMO ligase
- Polo-like kinase family; [DE]X([ST])Φ; site phosphorylated by Polo-like kinases

Cleavage sites: recognized by proteases as sites for proteolytic cleavage
- caspase-3; [DE]XXD|[GSAN]; site of scission by caspase protease

Structural modification sites: recognized by enzymes that catalyze the structural conversion of a region
- peptidyl-prolyl cis/trans-isomerase Pin1; p[ST](P); site isomerised by PIN prolyl cis/trans-isomerase

Φ represents bulky hydrophobic residues; X represents any amino acid; p signifies a phosphorylation; () signifies a site of modification; | signifies a site of cleavage; {} signifies how many times the preceding residue can be used; and [] indicates the use of one of the residues within the brackets to create a peptide motif.
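The motif syntax defined in the Table 1 footnote maps closely onto regular expressions. The following is a minimal sketch in Python that translates a motif string into a regular expression and scans a protein sequence for matches. The residue set chosen for the hydrophobic class, the helper names `motif_to_regex` and `find_motif`, and the toy sequences are illustrative assumptions rather than part of the table. Phosphorylation, modification, and cleavage marks are simply dropped, because a plain amino acid sequence cannot encode them.

```python
import re

# Assumption: the table defines the hydrophobic class only as "bulky
# hydrophobic residues"; this particular residue set is an illustrative choice.
HYDROPHOBIC = "[ILMVFWY]"


def motif_to_regex(motif: str) -> str:
    """Translate a Table 1 style motif into a Python regular expression."""
    out = []
    for ch in motif:
        if ch == "X":
            out.append(".")              # X: any amino acid
        elif ch in "\u03a6\u03c6f":
            out.append(HYDROPHOBIC)      # phi (or lowercase f): hydrophobic class
        elif ch in "p()|j":
            continue                     # phospho, modification and cleavage marks
        else:
            out.append(ch)               # residues, [...] and {...} pass through
    return "".join(out)


def find_motif(motif: str, sequence: str):
    """Return (1-based start, matched text) for every match, overlaps included."""
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequences, used only to exercise the functions.
print(motif_to_regex("[DE]XXDj[GSAN]"))   # [DE]..D[GSAN]
print(find_motif("P[TS]AP", "MAAPTAPLLK"))  # [(4, 'PTAP')]
```

A sequence match of this kind identifies only a candidate site, not a validated functional instance. The sketch also assumes that the hydrophobic placeholder never appears inside square brackets, which holds for the motifs listed in Table 1.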


evolutionary mechanism for generating proteins that can be subjected to extensive regulation by cellular pathways involved in protein homeostasis. Consequently, peptide motifs influence the life of almost all proteins from the cradle to the grave, regulating and coordinating their processing, localization, and degradation, as illustrated through the example of the cell cycle phosphatase CDC25A (Boutros et al., 2007). The biochemical activity of this enzyme is modulated at multiple levels by its long disordered region harboring numerous peptide motifs (see Figure 2A), which mediate regulatory interactions with cognate domains (Figure 2B). The newly synthesized protein is targeted within the cell via localization signals (e.g., nuclear localization signal [NLS] motif to target to the nucleus). The protein is posttranslationally modified both constitutively and conditionally on peptide motifs that act as modification sites (e.g., Chk1 kinase phosphorylation sites) or as docking sites (e.g., cyclin box).


Figure 2. Binding Motifs and Posttranslational Modification Sites in Proteins

(A) Architecture of functional modules in the cell cycle phosphatase CDC25A. The protein is largely intrinsically disordered, with the exception of the phosphatase catalytic domain (residues 376–482, PDB ID: 1C25) in the C terminus. The protein contains multiple peptide motifs that regulate the protein from translation to degradation, such as intracellular localization (a nuclear localization signal [NLS], yellow), regulated degradation by polyubiquitination (degrons KEN box, red, and a phosphodependent noncanonical DSG degron, green), binding to cyclin (cyclin box, blue), and direct regulation of enzymatic activity (phosphodependent 14-3-3 binding motifs, orange). The phosphorylation state of CDC25A is controlled by multiple sites with specificities for various kinases including Chk1 and Cdk1.

(B) Structures of instances of the peptide motif classes found in CDC25A, solved in other proteins that have convergently evolved the motifs, bound to their corresponding protein domains. Peptide motifs are color coded according to the representation of the modular domain architecture (i.e., the order of appearance of structured domains from N to C terminus in the protein) in (A). The structures from left to right are as follows: a CDK/cyclin docking motif from P53 bound to cyclin A (PDB ID: 1H26), a SCFβTrCP degron from β-catenin bound to the SCF substrate recognition subunit βTrCP (PDB ID: 1P22), a KEN box from BUBR1 bound to the APC substrate recognition subunit CDC20 (PDB ID: 4GGD), a NLS from pRb bound to Importin alpha (PDB ID: 1PJM), and two 14-3-3 motifs from PKCε bound to a 14-3-3 dimer (PDB ID: 2WH0).

(C) Convergently evolved instances of the regulatory motifs found in CDC25 that have been experimentally validated in other nonhomologous eukaryotic proteins as documented in the Eukaryotic Linear Motif resource (ELM). Sequences from CDC25 are color coded according to the representation of the domain architecture in (A). The major specificity-determining residues in the motifs are in bold and highlighted in light blue. Peptide motifs were validated in human unless stated; F, peptide motifs validated in Xenopus laevis; *, sequence of protein used in the structural panel.

(D) p300 is a transcription coactivator that regulates hundreds of transcription factors. The protein is 2,414 amino acids in length and contains nine domains (colored) and long connecting disordered regions (white) that constitute more than half of its length. UniProt (UniProt Consortium, 2013) and PhosphoSite (Hornbeck et al., 2012) list 123 PTMs in p300, such as phosphorylation (P, yellow), acetylation (A, blue), methylation, or double methylation (M, purple). Certain residues undergo two types of modification, such as acetylation and sumoylation (S, pink) or acetylation and ubiquitination (U, green). As it appears, PTMs can occur within disordered protein regions and also in folded domains, often in their exposed loop regions.


The enzyme's activity is controlled by peptide motifs, which mediate recruitment of binding partners to form functionally distinct complexes (e.g., 14-3-3 binding motif). In addition, its stability and half-life are controlled by degradation motifs, resulting in its ubiquitination and degradation by the proteasome (e.g., TrCP degron). The example of CDC25A highlighted here illustrates how peptide motifs may form an essential part of the homeostasis of many, if not all, proteins and are involved in nearly all cellular processes. This premise makes a large number of motif instances in the human proteome very likely.

High-Throughput Studies Suggest a Very High Number of Binding Motif Instances

Transient interactions mediated by short peptide motifs are often overlooked in classical proteomics methods of protein-protein interaction discovery that are based on tandem affinity purification tag (TAP-tag) or yeast two-hybrid (Y2H) experiments, as such methods are typically biased toward stable interactions (Diella et al., 2008; Landry et al., 2013; Liu et al., 2012; Neduva and Russell, 2006). Motif-domain interactions are inherently weaker, generally promiscuous, and more transient than domain-domain interactions (Schreiber et al., 2009; Zhou, 2012). The protein-protein interactions they mediate often depend on other factors, such as the PTM status of the motif and the abundance of the interacting protein, neither of which is generally accounted for in experiments identifying interactions (Landry et al., 2013). This may not be an issue for certain protein domains, such as SH3 (Tonikian et al., 2009) and PDZ (Belotti et al., 2013), as the interactions they mediate are rather stable and are much better represented in HTSs. However, it becomes an important consideration for identifying PTM site-mediated interactions using Y2H, as the relevant modifying enzyme that modifies and activates the motif may not be expressed under standard experimental conditions (Liu et al., 2012). The bias of human genome-scale Y2H interaction studies in determining motif-mediated interaction is also highlighted by the observation that only 1% of the detected interactions are mediated by peptide motifs, whereas in certain instances, such as the TGF-β pathway, more than half of the interactions are actually mediated by peptide motifs (Neduva and Russell, 2006).

Still, recent HTSs and computational studies (e.g., direct peptide library- and protein chip-based approaches of motif discovery) indicate very high numbers of individual motif instances (Liu et al., 2012; Neduva and Russell, 2005) in the human proteome. Several distinct structural domain families in the human proteome (e.g., SH2, SH3, PDZ, WW, and PTB domains) that function through peptide motif recognition are represented multiple times; for example, there are approximately 270 PDZ, 280 SH3, and 110 SH2 domains in the human genome (te Velthuis et al., 2011; Wang et al., 2010). Typically, each member recognizes multiple, often similar, peptide motifs in different proteins. For example, a single SH3 domain can bind peptide motifs in up to 80 different partners (Landgraf et al., 2004), and a recent analysis has shown that ~600 different human proteins have a C-terminal PDZ domain-binding motif (Kim et al., 2012). While the peptide motifs occur frequently in multiple proteins, it will be important to infer what proportion of these are actually functionally relevant (see below). Given (i) the above-mentioned observations on the typical number of motif-mediated interactions that protein domains participate in and (ii) that no interacting peptide partner has yet been identified for about 75% of the ~10,000 structural domain families (Stein et al., 2011), we suggest that many novel instances and classes of peptide-domain interactions remain to be discovered.

Unbiased Estimates on the Number of Binding Motifs in the Human Proteome

A simple census based on the foregoing fragmentary estimates might project the number of binding motif instances in the range of tens or even hundred(s) of thousands. Can we provide independent evidence for this formidable number of binding motifs? To overcome the statistical problem of sequence comparisons involving peptide motifs, we devised two straightforward ways of estimating the number of binding motif instances in the human proteome, based on independent sequence/structure attributes. Both estimates put the number of binding motifs above a hundred thousand.

Binding motifs can be predicted by virtue of their relationship with local structural disorder (Fuxreiter et al., 2007) and their tendency to undergo induced folding upon binding to their partner (Fuxreiter et al., 2004; Wright and Dyson, 2009). A predictor based on these premises, α-MoRF-Pred II, identifies short recognition features that adopt α-helical structure in the bound state (α-MoRFs); the predictor estimates about 3 such motifs in every 1,000 residues in eukaryotes (Cheng et al., 2007). Assuming ~22,000 proteins of 500 amino acids on average, this estimation suggests ~33,000 α-MoRFs in the human proteome. Given that only about 27% of the binding motifs are in an α-helical conformation when bound (Mohan et al., 2006) (the rest can be in β strand or another conformation), this allows us to estimate that the human proteome might contain as many as 122,000 MoRFs.
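The arithmetic behind the α-MoRF-based estimate can be retraced in a few lines. The sketch below uses only the figures quoted in the paragraph above; the function name `estimate_morfs` and its parameter names are hypothetical labels introduced for illustration.

```python
def estimate_morfs(n_proteins=22_000, mean_length=500,
                   alpha_morfs_per_1000=3, alpha_fraction=0.27):
    """Retrace the alpha-MoRF-based estimate from the quoted figures."""
    total_residues = n_proteins * mean_length            # 11,000,000 residues
    # About 3 alpha-MoRFs per 1,000 residues.
    alpha_morfs = total_residues * alpha_morfs_per_1000 // 1000
    # Only ~27% of binding motifs are alpha-helical when bound,
    # so scale up to all bound-state conformations.
    all_morfs = alpha_morfs / alpha_fraction
    return alpha_morfs, all_morfs


alpha_morfs, all_morfs = estimate_morfs()
print(f"alpha-MoRFs: {alpha_morfs:,}")    # alpha-MoRFs: 33,000
print(f"all MoRFs:   {all_morfs:,.0f}")   # all MoRFs:   122,222
```

The unrounded result, about 122,222, corresponds to the figure of 122,000 given in the text.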

A different calculation can be made using the ANCHOR program that predicts likely binding motifs. ANCHOR works by analyzing the sequence of a disordered region and estimating the energy of interaction of such a region with a general interaction partner (Meszaros et al., 2009). Proteome-scale ANCHOR estimations suggested that higher eukaryotes have about 30% of their residues in disordered regions, 60% of which are likely to be involved in binding (Meszaros et al., 2009). By conservative estimates on the human proteome as above, and with an upper limit of 15 residues per motif (Tompa et al., 2009), a simple calculation (22,000 × 500 × 0.3 × 0.6 / 15) estimates a number of about 132,000 putative binding motifs in the human proteome.
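The ANCHOR-based calculation can be written out step by step in the same way. This is a minimal sketch using the quoted values; the function name `estimate_anchor_motifs` and its parameter names are hypothetical, and the alternative motif length in the last line is an illustrative assumption, not a value from the text.

```python
def estimate_anchor_motifs(n_proteins=22_000, mean_length=500,
                           disordered_fraction=0.3, binding_fraction=0.6,
                           residues_per_motif=15):
    """Retrace the ANCHOR-based estimate: 22,000 x 500 x 0.3 x 0.6 / 15."""
    total_residues = n_proteins * mean_length             # all residues
    disordered = total_residues * disordered_fraction     # ~30% disordered
    binding = disordered * binding_fraction               # ~60% of those bind
    return binding / residues_per_motif                   # residues -> motifs


print(f"{estimate_anchor_motifs():,.0f}")                       # 132,000
# Hypothetical shorter motif length, to show the direction of the effect.
print(f"{estimate_anchor_motifs(residues_per_motif=10):,.0f}")  # 198,000
```

Because 15 residues is an upper limit on motif length, dividing by it yields the smaller, more conservative count; any shorter assumed length increases the estimate.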

What about the Number of Posttranslational Modification Sites?

These estimates all put the number of binding motifs above the hundred thousand mark. Here we argue that the number of PTM sites is several times higher, and together there may easily be a million instances of peptide motifs in the human proteome.

Eukaryotic proteins are subject to more than 300 distinct types of enzyme-catalyzed posttranslational modifications (Jensen, 2004), most of which have yet to be investigated in detail. PTMs regulate many of the dynamic interactions of the interactome by creating a new binding interface, modulating the affinity of existing binding surfaces either positively or negatively, or altering the activity of a folded domain (Beltrao et al., 2012; Beltrao et al., 2013; Nishi et al., 2011; Van Roey et al., 2012). Consequently, a large portion of higher eukaryotic proteomes consists of modifying enzymes. For example, hundreds of kinases, peptidases, and ubiquitin ligases are encoded in the human genome (Bhowmick et al., 2013; Manning et al., 2002). Any of these enzymes can modify multiple sites on a large number of substrate proteins. For instance, 6,000 classical NX[TS] glycosylation motifs are modified by a single oligosaccharyltransferase complex (Zielinska et al., 2010), as many as 1,001 sites may be phosphorylated by the acidophilic casein kinase II (Meggio and Pinna, 2003), and 333 aspartate-directed cleavage sites, described in 292 substrates, have been attributed to caspase-like proteases alone (Mahrus et al., 2008). If a single enzyme can modify so many sites, the large number of modifying enzymes encoded in the human genome suggests a very high number of PTM sites. In fact, mass spectrometry (MS)-based HTS studies suggest that low-throughput studies have so far characterized only the tip of the iceberg: a recent analysis of human modification sites cataloged 31,165 phosphorylation, 22,057 ubiquitination, 8,042 acetylation, and 6,422 proteolytic cleavage sites (Beltrao et al., 2012), and another study suggested that there may be more than 100,000 sites in the phosphoproteome of human cells (Choudhary and Mann, 2010).
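To illustrate how short such a modification motif is, the NX[TS] pattern mentioned above can be expressed as a regular expression and scanned against a sequence. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and the pattern is taken literally as written in the text. A common refinement, not stated here, excludes proline at the X position.

```python
import re

# Pattern as written in the text: Asn, any residue, then Thr or Ser.
# Wrapping it in a lookahead lets overlapping candidate sites be reported.
SEQUON = re.compile(r"(?=(N.[TS]))")


def find_sequons(sequence):
    """Return (1-based position, matched tripeptide) for each NX[TS] match."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence)]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKNNTSALNPSQNGT"

for position, tripeptide in find_sequons(toy_sequence):
    print(position, tripeptide)
```

A sequence match of this kind only marks a candidate site; whether it is modified in the cell depends on context, which is the point made later about functional relevance.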

Unbiased Estimates of the Number of PTMs in the Human Proteome

An important limitation even of HTS studies is that these experiments are generally carried out in a limited set of conditions; i.e., they never sample all possible cell states/conditions, and they invariably introduce a systematic bias against chemically labile modifications. Furthermore, we know very little about the stoichiometry of the modifications, so many may be very rare and rapidly demodified (Beltrao et al., 2013). To assess the real numbers, one might refer to a few actual proteins that have been extensively studied by a combination of high-throughput and low-throughput methods due to their biological and/or biomedical importance. For example, the central transcription coactivator p300 is 2,414 residues in length and contains 123 identified PTMs (Figure 2D), such as phosphorylation, acetylation, methylation, ubiquitination, and sumoylation, as compiled in the PhosphoSite (http://www.phosphosite.org) (Hornbeck et al., 2012) and UniProt (www.uniprot.org) (UniProt Consortium, 2013) databases. Similar studies of p53 (62 PTMs/393 residues in length), tau protein (123/758), BRCA1 (88/1,863), α-synuclein (22/140), c-Myc (45/439), Hdm2 (68/491), retinoblastoma protein (72/928), and Rad18 (43/495) suggest that the PTM density of the proteome (523/5,507, pooled over these eight proteins) may be as high as one modification in every ten residues. Applied to the roughly 11 million residues of the human proteome (22,000 proteins of 500 residues on average), this suggests close to a million PTM instances. Given that there are more than 300 different PTM types (Jensen, 2004), of which we usually focus on only a handful as mentioned above, this number does not seem to be exaggerated.
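The pooled density and its extrapolation can be checked directly from the per-protein numbers quoted above. The sketch below is a plain recomputation; the dictionary layout and function name are ours, and the proteome size reuses the assumption of 22,000 proteins of 500 residues made earlier in the text.

```python
# (identified PTMs, length in residues) for each protein, as quoted in the text.
PROTEINS = {
    "p300": (123, 2414),
    "p53": (62, 393),
    "tau": (123, 758),
    "BRCA1": (88, 1863),
    "alpha-synuclein": (22, 140),
    "c-Myc": (45, 439),
    "Hdm2": (68, 491),
    "retinoblastoma protein": (72, 928),
    "Rad18": (43, 495),
}

# Assumed proteome size: 22,000 proteins of 500 residues on average.
PROTEOME_RESIDUES = 22_000 * 500


def pooled_density(entries):
    """Return total PTMs, total residues, and PTMs per residue for the entries."""
    ptms = sum(count for count, _ in entries)
    residues = sum(length for _, length in entries)
    return ptms, residues, ptms / residues


# The 523/5,507 figure in the text corresponds to the eight proteins after p300.
eight = [value for name, value in PROTEINS.items() if name != "p300"]
ptms8, residues8, density8 = pooled_density(eight)
ptms9, residues9, density9 = pooled_density(PROTEINS.values())

print(f"eight proteins: {ptms8}/{residues8}, one PTM per {1 / density8:.1f} residues")
print(f"extrapolated PTM sites: {density8 * PROTEOME_RESIDUES:,.0f}")
print(f"including p300: {ptms9}/{residues9}, "
      f"extrapolated {density9 * PROTEOME_RESIDUES:,.0f}")
```

Including p300 in the pool lowers the density somewhat, but the extrapolated total stays in the same range, so the order-of-magnitude conclusion does not hinge on that choice.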

Interplay between Binding Motifs and PTM Sites Increases Regulatory Potential

By all the above-mentioned estimates and considerations, the full complement of peptide motifs (the “motifome”) may well reach into the millions in eukaryotes such as humans. Many regions will contain several distinct motifs, including overlapping autonomous recognition sites, different types of modification of the same residue by different enzyme classes, and the same modification on a single site by distinct enzymes under different cellular conditions. Furthermore, due to the promiscuity of peptide motifs, which allows the recognition of the same binding motif by multiple members of the same structural domain family, the number of motif-mediated interactions by a protein region can far exceed the number of instances of peptide motifs.

The issue of the real number of instances of peptide motifs is further complicated by the complex interplay between the binding motifs and PTM sites, in which the two categories cooperate and complement each other. A PTM within the binding motif or, more often, in the immediate flanking regions is a common mechanism to regulate interactions mediated by a binding motif; such arrangements are referred to as PTM switches (Van Roey et al., 2012, 2013). For example, the canonical nonconstitutive ligand motifs (e.g., 14-3-3 binding sites) require modification of specific residues to be recognized by their recognition domains. Adjacent or overlapping binding motifs can function competitively and allow the manifestation of even more complex switches. Conversely, the cooperative multivalent binding of low-affinity binding motifs can recruit binding partners in a high-avidity interaction. Such conditional multisite interaction interfaces, which are dependent on the state of the cell, result in complex regulatory mechanisms, which are exploited in information processing and cellular signaling (Van Roey et al., 2012). Higher-order signaling phenomena might also arise from phase transitions mediated by high motif valency (Li et al., 2012; Tompa, 2013). Complex patterns of modifications at multiple PTM sites in the interface region can integrate multiple signals to regulate an interaction in either a binary or rheostatic manner (Wu, 2013). Not unexpectedly, peptide motifs are enriched in nonconstitutive, alternatively spliced exons, tuning the regulatory potential of a protein by adding or removing peptide motifs (Buljan et al., 2012; Romero et al., 2006; Weatheritt et al., 2012) and also facilitating the rewiring of the interactome in different tissues (Buljan et al., 2012, 2013; Ellis et al., 2012).

Discussions of modular protein functions are somewhat hampered by the definition of function, as applied to genes and gene products. For instance, function may be defined differently by different people in the sense that no objective definition is offered (Greenspan, 2011). In this sense, just like the question of what a gene is and thus how many genes there are in the genome (Pearson, 2006), the question of what a peptide motif is and how many there are in the proteome is rendered somewhat philosophical by the above-described complex combinatorial interplay between nearby binding motifs and between adjacent binding motifs and PTM sites, and also by the issue of defining function at different molecular and cellular levels.

Are All Binding Motifs and PTM Sites Functionally Relevant?

Given the enormous plasticity in the evolution of binding motifs and PTM sites, they may arise convergently and be lost rapidly on a short timescale (Davey et al., 2011, 2012b; Holt et al., 2009). Not surprisingly, many PTM sites and binding motifs are not evolutionarily well conserved (Beltrao et al., 2012; Freschi et al., 2011; Holt et al., 2009). This rapid evolution may give rise to “noisy” interactions that may not have functional consequence (Landry et al., 2013; Levy et al., 2009), thus raising an important question: how many of these binding motifs, PTM sites, and interactions mediated by these modules are functionally relevant? While all peptide motifs recognize their binding partner, it is unlikely that all these binding events are functionally important and hence evolutionarily conserved across diverse lineages. Nevertheless, it has been noted that their rapid evolution may lead to new interactions, upon which selection can operate. This may eventually lead to functional rewiring of molecular interactions and the emergence of species-specific phenotypes during the course of organismal evolution (Buljan et al., 2013; Kim et al., 2012; Mosca et al., 2012). Thus, absence of conservation of some peptide motifs across orthologous proteins does not mean they are nonfunctional. Depending on the complexity of the proteome of an organism, specificity in peptide-mediated interactions can be achieved by continually eliminating promiscuous, nonfunctional binding motifs by negative selection (Zarrinpar et al., 2003).

In accord, recent studies on individual proteins suggest that nonconserved PTM sites may be functional (Holt et al., 2008). A good example is Securin, an inhibitor of Separase, the protease responsible for the separation of sister chromatids. In yeast, two Cdk1 phosphorylation sites inhibit the APC/C-mediated ubiquitination of Securin. These sites protect Securin from APC/C-mediated degradation until the appropriate point of the cell cycle when the Cdc14 phosphatase can remove the phosphate group (Holt et al., 2008). These Cdk1 phosphorylation sites are important regulatory modifications in S. cerevisiae. However, the phosphodependent stabilization mechanism is absent outside of species closely related to yeast.

Taken together, in addition to the evolutionary conservation of peptide motifs, the relevance of their occurrence in proteomes needs to be characterized in the appropriate model organism to assess their true functionality. A careful choice of experimental design to test motif functionality is crucial, as motif functionality is sensitive to expression level, and a poorly chosen design may lead to incorrect inferences about functional relevance (Gibson et al., 2013). Therefore, an important challenge for the community is not only to identify binding motifs and PTM sites, but also to functionally characterize these peptide motifs by investigating them in the right biological context.

Conclusions and Future Directions

All the points raised here highlight the exceptional evolutionary agility and functional plasticity of peptide motifs. They are evolutionarily, functionally, dynamically, and contextually different from other types of functional protein modules, filling a distinct and self-evidently essential gap in the continuum of functional protein module types (Figure 1).

When we consider the functional repertoire of peptide motifs, the possibility of de novo evolution, and their propensity to be enriched in alternatively spliced regions and IDRs, the evidence strongly suggests that they play an important role in creating functional diversity in the proteome (Buljan et al., 2013; Weatheritt et al., 2012; Weatheritt and Gibson, 2012). On an evolutionary timescale, mutations in IDRs with embedded binding motifs facilitate rewiring of interactomes and tuning of regulation through the gain and loss of binding motifs and modification sites, thereby enabling the emergence of complexity and new phenotypes (Beltrao et al., 2009; Buljan et al., 2012, 2013; Mosca et al., 2012). If mutations disrupt binding peptides and/or PTM sites, they may result in a disease outcome, e.g., Noonan syndrome, Usher syndrome, cherubism, and aberrant signaling in cancer (Guettler et al., 2011; Pajkos et al., 2012; Reimand and Bader, 2013; Vacic et al., 2012). Their evolutionary plasticity is also exploited by invading pathogens, such as viruses, which often mimic peptide motifs of their host proteins to hijack their cellular pathways (Davey et al., 2011; Hagai et al., 2014).

Identifying peptide motifs through sequence similarity searches is generally subject to high levels of statistical uncertainty, and the motifs are also difficult to identify experimentally. However, to gain a better and more complete description of the complex physiological and pathological processes of the cell, much more focus should be placed not only on identifying them, but also on establishing their functionality through a combination of high- and low-throughput studies. A particular emphasis should be placed on identifying and functionally characterizing species-specific peptide motifs that are not evolutionarily conserved, but may be relevant, by studying them in the right biological context and in the appropriate model system. This endeavor will bring rich rewards not only in terms of a better understanding of cell physiology, but also in our ability to modulate the regulation of therapeutically important proteins.

ACKNOWLEDGMENTS

The authors thank Dr. Angela Bekesi for preparing Figure 2D and Tilman Flock, Natasha Latysheva, and Robert Weatheritt for their comments on our manuscript. This work was supported by the Odysseus grant G.0029.12 from Research Foundation Flanders (FWO) to P.T. M.M.B. acknowledges the Medical Research Council (MC_U105185859), HFSP (RGY0073/201), and EMBO YI for funding his research.

REFERENCES

Aloy, P., and Russell, R.B. (2004). Ten thousand interactions for the molecular biologist. Nat. Biotechnol. 22, 1317–1321.

Babu, M.M., Kriwacki, R.W., and Pappu, R.V. (2012). Structural biology. Versatility from protein disorder. Science 337, 1460–1461.

Belotti, E., Polanowska, J., Daulat, A.M., Audebert, S., Thome, V., Lissitzky, J.C., Lembo, F., Blibek, K., Omi, S., Lenfant, N., et al. (2013). The human PDZome: a gateway to PSD95-Disc large-zonula occludens (PDZ)-mediated functions. Mol. Cell. Proteomics 12, 2587–2603.

Beltrao, P., Trinidad, J.C., Fiedler, D., Roguev, A., Lim, W.A., Shokat, K.M., Burlingame, A.L., and Krogan, N.J. (2009). Evolution of phosphoregulation: comparison of phosphorylation patterns across yeast species. PLoS Biol. 7, e1000134.

Beltrao, P., Albanese, V., Kenner, L.R., Swaney, D.L., Burlingame, A., Villen, J., Lim, W.A., Fraser, J.S., Frydman, J., and Krogan, N.J. (2012). Systematic functional prioritization of protein posttranslational modifications. Cell 150, 413–425.

Beltrao, P., Bork, P., Krogan, N.J., and van Noort, V. (2013). Evolution and functional cross-talk of protein post-translational modifications. Mol. Syst. Biol. 9, 714.

Bhattacharyya, R.P., Remenyi, A., Yeh, B.J., and Lim, W.A. (2006). Domains, motifs, and scaffolds: the role of modular interactions in the evolution and wiring of cell signaling circuits. Annu. Rev. Biochem. 75, 655–680.

Bhowmick, P., Pancsa, R., Guharoy, M., and Tompa, P. (2013). Functional diversity and structural disorder in the human ubiquitination pathway. PLoS ONE 8, e65443.

Boutros, R., Lobjois, V., and Ducommun, B. (2007). CDC25 phosphatases in cancer cells: key players? Good targets? Nat. Rev. Cancer 7, 495–507.

Buljan, M., Chalancon, G., Eustermann, S., Wagner, G.P., Fuxreiter, M., Bateman, A., and Babu, M.M. (2012). Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol. Cell 46, 871–883.

Buljan, M., Chalancon, G., Dunker, A.K., Bateman, A., Balaji, S., Fuxreiter, M., and Babu, M.M. (2013). Alternative splicing of intrinsically disordered regions and rewiring of protein interactions. Curr. Opin. Struct. Biol. 23, 443–450.

Cheng, Y., Oldfield, C.J., Meng, J., Romero, P., Uversky, V.N., and Dunker, A.K. (2007). Mining alpha-helix-forming molecular recognition features with cross species sequence alignments. Biochemistry 46, 13468–13477.

Chothia, C. (1992). Proteins. One thousand families for the molecular biologist. Nature 357, 543–544.

Choudhary, C., and Mann, M. (2010). Decoding signalling networks by mass spectrometry-based proteomics. Nat. Rev. Mol. Cell Biol. 11, 427–439.

Davey, N.E., Trave, G., and Gibson, T.J. (2011). How viruses hijack cell regulation. Trends Biochem. Sci. 36, 159–169.

Davey, N.E., Cowan, J.L., Shields, D.C., Gibson, T.J., Coldwell, M.J., and Edwards, R.J. (2012a). SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions. Nucleic Acids Res. 40, 10628–10641.

Davey, N.E., Van Roey, K., Weatheritt, R.J., Toedt, G., Uyar, B., Altenberg, B., Budd, A., Diella, F., Dinkel, H., and Gibson, T.J. (2012b). Attributes of short linear motifs. Mol. Biosyst. 8, 268–281.


Diella, F., Haslam, N., Chica, C., Budd, A., Michael, S., Brown, N.P., Trave, G., and Gibson, T.J. (2008). Understanding eukaryotic linear motifs and their role in cell signaling and regulation. Front. Biosci. 13, 6580–6603.

Dinkel, H., Van Roey, K., Michael, S., Davey, N.E., Weatheritt, R.J., Born, D., Speck, T., Kruger, D., Grebnev, G., Kuban, M., et al. (2014). The eukaryotic linear motif resource ELM: 10 years and counting. Nucleic Acids Res. 42 (Database issue), D259–D266.

Dyson, H.J., and Wright, P.E. (2005). Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 6, 197–208.

Ellis, J.D., Barrios-Rodiles, M., Colak, R., Irimia, M., Kim, T., Calarco, J.A., Wang, X., Pan, Q., O’Hanlon, D., Kim, P.M., et al. (2012). Tissue-specific alternative splicing remodels protein-protein interaction networks. Mol. Cell 46, 884–892.

Freschi, L., Courcelles, M., Thibault, P., Michnick, S.W., and Landry, C.R. (2011). Phosphorylation network rewiring by gene duplication. Mol. Syst. Biol. 7, 504.

Fuxreiter, M., Simon, I., Friedrich, P., and Tompa, P. (2004). Preformed structural elements feature in partner recognition by intrinsically unstructured proteins. J. Mol. Biol. 338, 1015–1026.

Fuxreiter, M., Tompa, P., and Simon, I. (2007). Local structural disorder imparts plasticity on linear motifs. Bioinformatics 23, 950–956.

Galea, C.A., Wang, Y., Sivakolundu, S.G., and Kriwacki, R.W. (2008). Regulation of cell division by intrinsically unstructured proteins: intrinsic flexibility, modularity, and signaling conduits. Biochemistry 47, 7598–7609.

Gibson, T.J., Seiler, M., and Veitia, R.A. (2013). The transience of transient overexpression. Nat. Methods 10, 715–721.

Greenspan, N.S. (2011). Attributing functions to genes and gene products. Trends Biochem. Sci. 36, 293–297.

Guettler, S., LaRose, J., Petsalaki, E., Gish, G., Scotter, A., Pawson, T., Rottapel, R., and Sicheri, F. (2011). Structural basis and sequence rules for substrate recognition by Tankyrase explain the basis for cherubism disease. Cell 147, 1340–1354.

Hagai, T., Azia, A., Babu, M.M., and Andino, R. (2014). Use of host-like peptide motifs in viral proteins is a prevalent strategy in host-virus interactions. Cell Rep 7, 1729–1739.

Hart, G.T., Ramani, A.K., and Marcotte, E.M. (2006). How complete are current yeast and human protein-interaction networks? Genome Biol. 7, 120.

Hartwell, L.H., Hopfield, J.J., Leibler, S., and Murray, A.W. (1999). From molecular to modular cell biology. Nature 402 (Suppl), C47–C52.

Havugimana, P.C., Hart, G.T., Nepusz, T., Yang, H., Turinsky, A.L., Li, Z., Wang, P.I., Boutz, D.R., Fong, V., Phanse, S., et al. (2012). A census of human soluble protein complexes. Cell 150, 1068–1081.

Holt, L.J., Krutchinsky, A.N., and Morgan, D.O. (2008). Positive feedback sharpens the anaphase switch. Nature 454, 353–357.

Holt, L.J., Tuch, B.B., Villen, J., Johnson, A.D., Gygi, S.P., and Morgan, D.O. (2009). Global analysis of Cdk1 substrate phosphorylation sites provides insights into evolution. Science 325, 1682–1686.

Hornbeck, P.V., Kornhauser, J.M., Tkachev, S., Zhang, B., Skrzypek, E., Murray, B., Latham, V., and Sullivan, M. (2012). PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40 (Database issue), D261–D270.

Jensen, O.N. (2004). Modification-specific proteomics: characterization of post-translational modifications by mass spectrometry. Curr. Opin. Chem. Biol. 8, 33–41.

Kim, J., Kim, I., Yang, J.S., Shin, Y.E., Hwang, J., Park, S., Choi, Y.S., and Kim, S. (2012). Rewiring of PDZ domain-ligand interaction network contributed to eukaryotic evolution. PLoS Genet. 8, e1002510.

Landgraf, C., Panni, S., Montecchi-Palazzi, L., Castagnoli, L., Schneider-Mergener, J., Volkmer-Engert, R., and Cesareni, G. (2004). Protein interaction networks by proteome peptide scanning. PLoS Biol. 2, E14.


Landry, C.R., Levy, E.D., Abd Rabbo, D., Tarassov, K., and Michnick, S.W. (2013). Extracting insight from noisy cellular networks. Cell 155, 983–989.

Levy, E.D., Landry, C.R., and Michnick, S.W. (2009). How perfect can protein interactomes be? Sci. Signal. 2, pe11.

Li, P., Banjade, S., Cheng, H.C., Kim, S., Chen, B., Guo, L., Llaguno, M., Hollingsworth, J.V., King, D.S., Banani, S.F., et al. (2012). Phase transitions in the assembly of multivalent signalling proteins. Nature 483, 336–340.

Liu, B.A., Engelmann, B.W., and Nash, P.D. (2012). High-throughput analysis of peptide-binding modules. Proteomics 12, 1527–1546.

Mahrus, S., Trinidad, J.C., Barkan, D.T., Sali, A., Burlingame, A.L., and Wells, J.A. (2008). Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell 134, 866–876.

Manning, G., Whyte, D.B., Martinez, R., Hunter, T., and Sudarsanam, S. (2002). The protein kinase complement of the human genome. Science 298, 1912–1934.

Meggio, F., and Pinna, L.A. (2003). One-thousand-and-one substrates of protein kinase CK2? FASEB J. 17, 349–368.

Meszaros, B., Simon, I., and Dosztanyi, Z. (2009). Prediction of protein binding regions in disordered proteins. PLoS Comput. Biol. 5, e1000376.

Mohan, A., Oldfield, C.J., Radivojac, P., Vacic, V., Cortese, M.S., Dunker, A.K., and Uversky, V.N. (2006). Analysis of molecular recognition features (MoRFs). J. Mol. Biol. 362, 1043–1059.

Mosca, R., Pache, R.A., and Aloy, P. (2012). The role of structural disorder in the rewiring of protein interactions through evolution. Mol. Cell. Proteomics 11, 014969.

Neduva, V., and Russell, R.B. (2005). Linear motifs: evolutionary interaction switches. FEBS Lett. 579, 3342–3345.

Neduva, V., and Russell, R.B. (2006). Peptides mediating interaction networks: new leads at last. Curr. Opin. Biotechnol. 17, 465–471.

Nishi, H., Hashimoto, K., and Panchenko, A.R. (2011). Phosphorylation in protein-protein binding: effect on stability and function. Structure 19, 1807–1815.

Pajkos, M., Meszaros, B., Simon, I., and Dosztanyi, Z. (2012). Is there a biological cost of protein disorder? Analysis of cancer-associated mutations. Mol. Biosyst. 8, 296–307.

Pearson, H. (2006). Genetics: what is a gene? Nature 441, 398–401.

Perica, T., Marsh, J.A., Sousa, F.L., Natan, E., Colwell, L.J., Ahnert, S.E., and Teichmann, S.A. (2012). The emergence of protein complexes: quaternary structure, dynamics and allostery. Colworth Medal Lecture. Biochem. Soc. Trans. 40, 475–491.

Perkins, J.R., Diboun, I., Dessailly, B.H., Lees, J.G., and Orengo, C. (2010). Transient protein-protein interactions: structural, functional, and network properties. Structure 18, 1233–1243.

Reimand, J., and Bader, G.D. (2013). Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol. Syst. Biol. 9, 637.

Romero, P.R., Zaidi, S., Fang, Y.Y., Uversky, V.N., Radivojac, P., Oldfield, C.J., Cortese, M.S., Sickmeier, M., LeGall, T., Obradovic, Z., and Dunker, A.K. (2006). Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proc. Natl. Acad. Sci. USA 103, 8390–8395.

Sammut, S.J., Finn, R.D., and Bateman, A. (2008). Pfam 10 years on: 10,000 families and still growing. Brief. Bioinform. 9, 210–219.

Schreiber, G., Haran, G., and Zhou, H.X. (2009). Fundamental aspects of protein-protein association kinetics. Chem. Rev. 109, 839–860.

Stein, A., Mosca, R., and Aloy, P. (2011). Three-dimensional modeling of protein interactions and complexes is going ’omics. Curr. Opin. Struct. Biol. 21, 200–208.


Stumpf, M.P.H., Thorne, T., de Silva, E., Stewart, R., An, H.J., Lappe, M., and Wiuf, C. (2008). Estimating the size of the human interactome. Proc. Natl. Acad. Sci. USA 105, 6959–6964.

te Velthuis, A.J., Sakalis, P.A., Fowler, D.A., and Bagowski, C.P. (2011). Genome-wide analysis of PDZ domain binding reveals inherent functional overlap within the PDZ interaction network. PLoS ONE 6, e16047.

Tompa, P. (2012). Intrinsically disordered proteins: a 10-year recap. Trends Biochem. Sci. 37, 509–516.

Tompa, P. (2013). Hydrogel formation by multivalent IDPs. A reincarnation of the microtrabecular lattice? Intrinsically Disordered Proteins 1, e24068.

Tompa, P., Fuxreiter, M., Oldfield, C.J., Simon, I., Dunker, A.K., and Uversky, V.N. (2009). Close encounters of the third kind: disordered domains and the interactions of proteins. Bioessays 31, 328–335.

Tonikian, R., Xin, X., Toret, C.P., Gfeller, D., Landgraf, C., Panni, S., Paoluzi, S., Castagnoli, L., Currell, B., Seshagiri, S., et al. (2009). Bayesian modeling of the yeast SH3 domain interactome predicts spatiotemporal dynamics of endocytosis proteins. PLoS Biol. 7, e1000218.

UniProt Consortium (2013). Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 41 (Database issue), D43–D47.

Uversky, V.N., and Dunker, A.K. (2010). Understanding protein non-folding. Biochim. Biophys. Acta 1804, 1231–1264.

Vacic, V., Markwick, P.R., Oldfield, C.J., Zhao, X., Haynes, C., Uversky, V.N., and Iakoucheva, L.M. (2012). Disease-associated mutations disrupt functionally important regions of intrinsic protein disorder. PLoS Comput. Biol. 8, e1002709.

Van Roey, K., Gibson, T.J., and Davey, N.E. (2012). Motif switches: decision-making in cell regulation. Curr. Opin. Struct. Biol. 22, 378–385.

Van Roey, K., Dinkel, H., Weatheritt, R.J., Gibson, T.J., and Davey, N.E. (2013). The switches.ELM resource: a compendium of conditional regulatory interaction interfaces. Sci. Signal. 6, rs7.

Venkatesan, K., Rual, J.F., Vazquez, A., Stelzl, U., Lemmens, I., Hirozane-Kishikawa, T., Hao, T., Zenkner, M., Xin, X., Goh, K.I., et al. (2009). An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90.

Wang, C.K., Pan, L., Chen, J., and Zhang, M. (2010). Extensions of PDZ domains as important structural and functional elements. Protein Cell 1, 737–751.

Weatheritt, R.J., and Gibson, T.J. (2012). Linear motifs: lost in (pre)translation. Trends Biochem. Sci. 37, 333–341.

Weatheritt, R.J., Davey, N.E., and Gibson, T.J. (2012). Linear motifs confer functional diversity onto splice variants. Nucleic Acids Res. 40, 7123–7131.

Wright, P.E., and Dyson, H.J. (2009). Linking folding and binding. Curr. Opin. Struct. Biol. 19, 31–38.

Wu, H. (2013). Higher-order assemblies in a new paradigm of signal transduction. Cell 153, 287–292.

Zarrinpar, A., Park, S.H., and Lim, W.A. (2003). Optimization of specificity in a cellular protein interaction network by negative selection. Nature 426, 676–680.

Zhou, H.X. (2012). Intrinsic disorder: signaling via highly specific but short-lived association. Trends Biochem. Sci. 37, 43–48.

Zielinska, D.F., Gnad, F., Wiśniewski, J.R., and Mann, M. (2010). Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints. Cell 141, 897–907.

Molecular Cell 55, July 17, 2014 ©2014 Elsevier Inc. 169