
Post-translational modifications regulatory networks: evolution, mechanisms and implications

Thesis

Luca Freschi

Doctorate in Biology, Philosophiae Doctor (Ph.D.)

Québec, Canada

© Luca Freschi, 2015


Résumé

Post-translational modifications (PTMs) are chemical modifications of proteins that allow the cell to finely regulate its functions and to encode and integrate environmental signals. Recent progress in experimental and bioinformatic techniques has allowed us to determine PTM profiles for entire proteomes and to identify the molecules responsible for "writing" or "erasing" these PTMs. With these data, it has become possible to begin defining cellular regulatory networks based on PTMs. Here, we studied the evolution of these networks to better understand how they may help explain the complexity and diversity of organisms, and to better understand their mechanisms of action. First, we addressed the question of how PTM regulatory networks can be rewired after a gene duplication event by studying how the phosphoregulatory network of budding yeast was rewired after a whole-genome duplication event that took place 100 million years ago. Our results highlight the role of gene duplication as a key mechanism for the innovation and complexification of PTM regulatory networks. Next, we addressed the question of how PTMs can contribute to organismal diversity by comparing the phosphorylation profiles of human and mouse. We found substantial differences in the PTM profiles of these two species that have the potential to explain, at least in part, the phenotypic differences observed between them. We also found evidence supporting the idea that PTMs can "jump" to new positions and still regulate the same biological functions. This phenomenon must be taken into account when comparing PTM profiles from different species, to avoid overestimating the divergence due to PTM regulation. Finally, we investigated how several alternative PTMs at the same residue can interact to regulate cellular functions. We examined two of the best-known PTMs, phosphorylation and O-GlcNAcylation, which modify serines and threonines, and studied the potential mechanisms of interaction between these two PTMs. Our results support the hypothesis that these two PTMs control several biological functions rather than a single one. Overall, the results presented in this thesis shed light on the evolutionary dynamics, the functional mechanisms and the biological implications of PTMs.


Abstract

Post-translational modifications (PTMs) are chemical modifications of proteins that allow the cell to finely tune its functions and to encode and integrate environmental signals. Recent advances in experimental and bioinformatic techniques have allowed us to determine the PTM profiles of entire proteomes and to identify the molecules that write PTMs to, or erase them from, each protein. These data have made it possible to define cellular PTM regulatory networks. Here, we study the evolution of these networks to gain new insights into how they may contribute to increasing organismal complexity and diversity, and to better understand their molecular mechanisms. We first address the question of how, and to what extent, a PTM network can be rewired after a gene duplication event, by studying how the budding yeast phosphoregulatory network was rewired after a whole-genome duplication event that occurred 100 million years ago. Our results highlight the role of gene duplication as a key mechanism for innovating and increasing the complexity of PTM regulatory networks. Then, we address the question of how PTM networks may contribute to organismal diversity by comparing the human and mouse phosphorylation profiles. We find substantial differences in the PTM profiles of these two species that have the potential to explain, at least in part, the phenotypic differences observed between them. Moreover, we find evidence supporting the idea that PTMs can jump to new positions during evolution and still regulate the same biological functions. This phenomenon should be taken into account when comparing the PTM profiles of different species, in order to avoid overestimating the divergence in PTM regulation. Finally, we investigate how multiple, alternative PTMs that affect the same residues interact with each other to control protein functions. We focus on two of the most studied PTMs, protein phosphorylation and O-GlcNAcylation, which both affect serine and threonine residues, and we study their potential mechanisms of interaction in human and mouse. Our results support the hypothesis that these two PTMs control multiple biological functions rather than a single one. Overall, this work provides new findings that elucidate the evolutionary dynamics, the functional mechanisms and the biological implications of PTMs.


Table of Contents

RÉSUMÉ
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
FOREWORD

CHAPTER 1 - INTRODUCTION
1.1 - POST-TRANSLATIONAL MODIFICATIONS
1.2 - HOW PTMS REGULATE PROTEIN FUNCTIONS
1.3 - PTM REGULATORY NETWORKS
1.4 - CROSS-TALK BETWEEN PTMS
1.5 - PTM NETWORKS AND THE EVOLUTION OF BIOLOGICAL COMPLEXITY AND DIVERSITY
1.6 - THE TECHNOLOGICAL ADVANCEMENTS OF THE LAST DECADE MAKE POSSIBLE THE STUDY OF PTM NETWORKS
1.7 - AIMS OF THIS THESIS

CHAPTER 2 - PHOSPHORYLATION NETWORK REWIRING BY GENE DUPLICATION
2.1 - RÉSUMÉ
2.2 - ABSTRACT
2.3 - INTRODUCTION
2.4 - MATERIALS AND METHODS
2.5 - RESULTS AND DISCUSSION
2.5.1 - Paralogous phosphoproteins substantially diverged after WGD
2.5.2 - Conservation and compensation of phosphosite loss by site-position turnover
2.5.3 - Life after WGD: rewiring the cellular regulatory networks
2.5.4 - Phosphosite loss dominates site-divergence
2.6 - CONCLUSION
2.7 - ACKNOWLEDGEMENTS

CHAPTER 3 - WHERE DO PHOSPHOSITES COME FROM AND WHERE DO THEY GO AFTER GENE DUPLICATION?
3.1 - RÉSUMÉ
3.2 - ABSTRACT
3.3 - INTRODUCTION
3.4 - METHODS
3.5 - RESULTS
3.6 - CONCLUSION
3.7 - ACKNOWLEDGEMENTS

CHAPTER 4 - FUNCTIONAL DIVERGENCE AND EVOLUTIONARY TURNOVER IN MAMMALIAN PHOSPHOPROTEOMES
4.1 - RÉSUMÉ
4.2 - ABSTRACT
4.3 - INTRODUCTION
4.4 - METHODS
4.5 - RESULTS
4.5.1 - Conservation and divergence between human and mouse phosphoproteomes
4.5.2 - A role for state-diverged sites in phosphoproteome divergence
4.5.3 - Evolutionary turnover of mammalian phosphorylation sites
4.6 - CONCLUSION
4.7 - ACKNOWLEDGEMENTS

CHAPTER 5 - CROSS-TALK BETWEEN O-GLCNACYLATION AND PHOSPHORYLATION IN MAMMALIAN PROTEOMES
5.1 - RÉSUMÉ
5.2 - ABSTRACT
5.3 - INTRODUCTION
5.4 - METHODS
5.5 - RESULTS
5.5.1 - An extensive dataset of phosphorylation and O-GlcNAcylated sites
5.5.2 - Phosphorylation and O-GlcNAcylation are found in the same residues more than expected by chance alone
5.5.3 - Clues of independent regulation of multiple functions in humans but not in mouse
5.5.4 - Three state sites and 2-state ones have different preferences for protein kinases
5.6 - CONCLUSION

CHAPTER 6 - GENERAL CONCLUSIONS
6.1 - SUMMARY OF THE STUDY
6.2 - PERSPECTIVES

ANNEX 1 - SUPPLEMENTARY INFORMATION FOR CHAPTER 2

ANNEX 4 - QPCA: A SCALABLE ASSAY TO MEASURE THE PERTURBATION OF PROTEIN-PROTEIN INTERACTIONS IN LIVING CELLS
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
RESULTS AND DISCUSSION
The DHFR-qPCA signal reflects the amount of protein complex formed in the cell
DHFR-qPCA allows to study the effect of metabolic, drug and genetic perturbations on protein complexes
Conclusions
ACKNOWLEDGEMENTS

REFERENCES


List of Figures

FIGURE 2.1. CONSERVATION AND DIVERGENCE OF PHOSPHOREGULATION AMONG WGD PARALOGS.
FIGURE 2.2. GAINS AND LOSSES OF PHOSPHOSITES AFTER GENE DUPLICATION.
FIGURE 2.3. L. KLUYVERI PHOSPHOPROTEOMICS CONFIRMS THAT PHOSPHOSITES ARE PREFERENTIALLY LOST IN PARALOGOUS PHOSPHOPROTEINS.
FIGURE 3.1. ALGORITHM USED TO CALCULATE AND COMPARE THE PROPORTIONS OF TRANSITIONS BETWEEN PHOSPHORYLATED AND PHOSPHOMIMETIC RESIDUES RELATIVE TO CONTROL SITES.
FIGURE 3.2. PHOSPHOSITES THAT ARE DIFFERENTIALLY LOST IN PARALOGOUS PHOSPHOPROTEINS EVOLVE TOWARD NEGATIVELY CHARGED RESIDUES.
FIGURE 3.3. DETAILED ANALYSIS OF THE PATTERNS OF EVOLUTION OF PSER AND PTHR SITES.
FIGURE 3.4. TRANSITIONS BETWEEN PHOSPHORYLATABLE AND PHOSPHOMIMETIC AMINO ACIDS NEED TO GO THROUGH A NON-NEGATIVELY CHARGED INTERMEDIATE.
FIGURE 3.5. A DUPLICATION EVENT COULD PROVIDE THE CONDITIONS FOR THE INTERMEDIATE NON-FUNCTIONAL SITE TO BE NEUTRAL, WHICH WOULD ALLOW A TRANSITION WITHOUT AFFECTING THE FITNESS OF THE ORGANISM.
FIGURE 4.1. PURIFYING SELECTION IS ACTING ON MAMMALIAN PHOSPHORYLATION SITES AND THEIR PHOSPHORYLATION STATUS.
FIGURE 4.2. ANALYSIS OF NETPHOREST SCORES FOR THE DIFFERENT CLASSES OF SITES.
FIGURE 4.3. COMPARISON OF A PAIR OF STC AND STD SITES.
FIGURE 4.4. PROPORTION OF SITES THAT ARE PHOSPHORYLATED BY THE SAME PROTEIN KINASE.
FIGURE 4.5. EVOLUTIONARY HISTORIES OF CANDIDATE FUNCTIONALLY REDUNDANT SITE PAIRS.
FIGURE 5.1. NUMBER OF 3-STATE SITES IN HUMAN AND MOUSE AND COMPARISONS TO RANDOM EXPECTATIONS.
FIGURE 5.2. FRACTION OF SITES AS A FUNCTION OF PROTEIN ABUNDANCE FOR HUMAN AND MOUSE O-GLCNACYLATION SITES AND COMPARISON OF AVERAGE PROTEIN ABUNDANCE BETWEEN ALL PROTEINS AND PROTEINS THAT CONTAIN 3-STATE SITES FOR HUMANS AND MOUSE.
FIGURE 5.3. COMPARISON OF RESIDUE CONSERVATION FOR 1-STATE, 2-STATE AND 3-STATE RESIDUES IN THE HUMAN AND MOUSE PROTEOMES.
FIGURE 5.4. COMPARISON OF THE EVOLUTIONARY CONSERVATION OF THE REGIONS SURROUNDING 1-STATE, 2-STATE AND 3-STATE RESIDUES (+/- 5 AMINO ACIDS) FOR THE HUMAN PROTEOME.
FIGURE 5.5. KINASE PREFERENCES OF 3-STATE RESIDUES FOR HUMAN AND MOUSE PROTEINS.


List of Abbreviations

cSer: control Serine

cThr: control Threonine

DHFR: Dihydrofolate reductase

FN: False Negative

FP: False Positive

My: Million years

PCA: Protein Complementation Assay

PTM: Post-Translational Modification

PWM: Position Weight Matrix

StC: State-Conserved

StD: State-Diverged

SiD: Site-Diverged

WGD: Whole Genome Duplication


For Marcello and Elodia


Acknowledgements

My first thought goes to my advisor, Christian Landry. He has always done everything possible (and, sometimes, even the impossible) to make me a better scientist and a better man.

I would also like to thank the members of my thesis and PhD committees: Prof. Pedro Beltrao, Prof. Yves Bourbonnais, Prof. Nicolas Derome and Prof. Sabine Elowe. Thanks to their suggestions, the quality of my PhD and of this thesis has improved a lot.

I cannot forget to mention here all the members of the Landry and Aubin-Horth labs: François-Christophe Marois-Blanchet, Guillaume Diss, José-Francisco Torres-Quiroz, Isabelle Gagnon-Arsenault, Jean-Baptiste Leducq, Alexandre Dubé, Guillaume Charron, Andrée-Ève Chrétien, Francis Rousseau-Brochu, Marie Filteau, Samuel Rochette, Lou Nielly-Thibault, Mélissa Giroux, Jukka-Pekka Verta, Martha Nigg, Nadia Aubin-Horth, François-Olivier Gagnon-Hébert, Sergio Cortez-Ghio, Jennyfer Lacasse, Carole Di Poi and Lucie Grecias. They have been my family during these 5 years in Québec.

Many thanks to my lovely wife, Maryam. She has always been by my side during these months, in the good and in the tough moments.

I would also like to thank my parents, Elodia and Marcello, for all the efforts they have made during all these years to allow me to follow my dreams. What I have accomplished up to now and what I will accomplish in the future is not only my success, but also theirs.

Finally, I would like to say a few words of thanks to a special person who unfortunately has not been able to share with me the moments and the feelings of the PhD defence: my beloved grandmother Sara. Without her advice I would not be where I am now.


Foreword

This thesis is organized in 6 chapters including a general introduction and a general conclusion. Chapters 2, 3 and 4 have already been published as independent scientific articles. Chapter 5 will be submitted for publication to a scientific journal. Annex 4 includes a further paper whose subject is not directly connected with the main theme of this thesis.

Chapter 2 has been published as: Freschi L., Courcelles M., Thibault P., Michnick S.W. and Landry C.R (2011) Phosphorylation network rewiring by gene duplication, Molecular Systems Biology. 7:504

Chapter 3 has been published as: Diss G., Freschi L., Landry C.R (2012) Where do phosphosites come from and where do they go after gene duplication? Journal of Evolutionary Biology - special issue: Molecular Evolutionary Routes that Lead to Innovations 2012: 843167

Chapter 4 has been published as: Freschi L., Osseni M., Landry C.R (2014) Functional Divergence and Evolutionary Turnover in Mammalian Phosphoproteomes, PLoS genetics 10 (1), e1004062

Annex 4 has been published as: Freschi L., Torres-Quiroz F., Dubé A.K., Landry C. R (2013) qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells, Molecular Biosystems. 9(1):36-43

The analysis of the results and the writing of the articles have been performed by L. Freschi, under the direction of C. R Landry.

For Chapter 2, C. Landry, M. Courcelles and P. Thibault performed the phosphoproteomics experiments. S. W. Michnick contributed reagents, tools and guidance on the phosphoproteomics experiments.

For Chapter 3, G. Diss participated in the analysis of the results and the writing of the manuscript, and he is therefore co-first author of the article.

For Chapter 4, M. Osseni contributed to building the data set used in the study.


Chapter 1 - Introduction

1.1 - Post-translational modifications

In cells, the blueprint for all cellular functions is stored in DNA. However, the actual effectors of cellular functions are a myriad of different molecules, among which proteins have a prominent role. The information flow follows a simple rule: the information stored in DNA is transcribed into RNA, an intermediate messenger molecule, and then translated into proteins by ribosomes (Crick, 1970). All these steps are tightly regulated so that each protein is expressed in the right place at the right time. For instance, transcription factors that bind to the regulatory regions of genes contribute to defining their expression levels (Brewster et al, 2014). Different RNA sequences have different stabilities and are translated at different rates by ribosomes, and this affects both protein abundance and protein folding (Schwanhausser et al, 2011). Finally, after translation, proteins can be further modified by the addition of chemical groups (Prabakaran et al, 2012). These additions are called post-translational modifications, or PTMs. To date, more than 300 different PTMs have been reported in the literature, affecting more than 300,000 residues in prokaryotic and eukaryotic proteomes (Khoury et al, 2011). The attached chemical groups vary in size, ranging from small groups such as methyl to entire proteins such as ubiquitin. Moreover, most PTMs are reversible, meaning that a protein can switch between different states (modified/non-modified) over time or as internal conditions change (Olsen et al, 2006). Notable examples of PTMs include phosphorylation, the addition of a phosphate group to serine, threonine and tyrosine residues; glycosylation, the addition of sugar moieties to several amino acid residues; and ubiquitylation, the addition of ubiquitin to lysine residues. PTMs are an important mechanism through which the cell fine-tunes its functions, an aspect that we now examine in detail.


1.2 - How PTMs regulate protein functions

PTMs can modify the properties of a protein in different ways: (i) they can activate or deactivate one or more functions by inducing a conformational change of the protein (Sprang et al, 1988), (ii) they can modulate protein-protein interactions by changing the bulk charge at the interaction surface of the protein (Khmelinskii et al, 2009), (iii) they can contribute to determining the stability (Vazquez et al, 2000) and the half-life (Koivomagi et al, 2011) of the protein and (iv) they can determine the localization of the protein (Madeo et al, 1998). A spectacular example of the first of these scenarios is glycogen phosphorylase, an enzyme involved in glycogen metabolism (Livanova et al, 2002). This protein is present in the cell in two forms, named a and b. The b form of the enzyme has a low catalytic activity, while the a form is characterized by a high one. The transition from the b form to the a form is made possible by a phosphorylation event on a specific residue (Ser-14). This molecular event induces a conformational change that increases the enzymatic activity of the protein, which breaks down glycogen chains into glucose-1-phosphate units that become available for cellular catabolic processes. An example of how PTMs regulate protein-protein interactions is provided by the proteins p53 and MDM2. p53 is a tumor suppressor protein that plays important roles in angiogenesis, genomic stability and apoptosis (Levine & Oren, 2009). p53 interacts with MDM2, a p53-specific E3 ubiquitin ligase, to form the p53-MDM2 complex (Moll & Petrenko, 2003), which prevents p53 activation in unstressed cells. The phosphorylation of the Ser-106 residue of p53 under stress conditions inhibits the interaction between these two proteins (Hsueh et al, 2013), ultimately contributing to p53 activation. PTMs can also affect the half-life of a protein. The attachment of several ubiquitin units to lysine residues is a general mechanism by which the cell targets proteins for proteasome-mediated degradation. An example of the importance of this mechanism is provided by cell cycle regulation. Indeed, progression through the cell cycle is made possible by the expression and subsequent ubiquitin-mediated proteolysis of cyclins and CDK inhibitors (Glotzer et al, 1991). Finally, PTMs can also affect the localization of proteins. For instance, the phosphorylation of two key serine residues (Ser-358 and Ser-361) of the Ikaros protein, a hematopoietic-specific transcription factor, determines the re-localization of this protein to the nucleus, where it can promote the transcription of the genes involved in lymphocyte differentiation (Uckun et al, 2012). The different mechanisms described above show how PTMs can change the properties of proteins in many different ways, thus representing a versatile mechanism to tune and regulate protein functions.

1.3 - PTM regulatory networks

One of the most important characteristics of PTMs is that, in general, they are reversible modifications of proteins. This means that the cell must possess molecular mechanisms to add and remove PTMs in order to regulate protein function. Further, these mechanisms need to be specific, since different PTMs (e.g. phosphorylation and ubiquitylation) occur at different residue types (phosphorylation occurs on serine, threonine and tyrosine residues, while ubiquitylation occurs on lysine residues). In the last decades we have started to unravel the molecular details of these mechanisms for many PTMs and to understand the principles behind them, the best-known example being protein phosphorylation. For this PTM we now know that the phosphate groups are added by a specific set of proteins called protein kinases and are removed by another specific set of proteins called protein phosphatases. In the human genome there are about 500 protein kinases (corresponding to about 2% of all human genes) and 200 phosphatases (Manning et al, 2002). More generally, for each PTM type (e.g. phosphorylation, acetylation, ubiquitylation, etc.) there is a set of proteins, called writers, that can add the PTM to other proteins and another set, called erasers, that can remove the PTM (Lim & Pawson, 2010). In order to understand how the cell regulates its functions through PTMs, we need to consider and study the entire network composed of all PTMs, writers and erasers, also called the PTM regulatory network. At the moment we are still far from having a complete understanding of the cellular PTM regulatory network, and even the most recent studies mostly focus on a single PTM network or a few of them (Hunter, 2007).
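The writer/eraser view of a PTM regulatory network described above can be illustrated with a small data structure. The following is only a minimal sketch and not a method used in this thesis: the class names `Site` and `PTMNetwork` and the enzyme and protein names (`KinaseA`, `PhosphataseB`, `ProteinX`) are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A modifiable residue: protein name, one-letter residue code, position."""
    protein: str
    residue: str
    position: int


class PTMNetwork:
    """Toy PTM regulatory network: writers add a PTM to a site, erasers remove it."""

    def __init__(self):
        # Both maps go from (site, PTM type) to the set of enzymes acting on it.
        self.writers = defaultdict(set)
        self.erasers = defaultdict(set)

    def add_writer(self, enzyme, site, ptm):
        self.writers[(site, ptm)].add(enzyme)

    def add_eraser(self, enzyme, site, ptm):
        self.erasers[(site, ptm)].add(enzyme)

    def is_reversible(self, site, ptm):
        """A PTM at a site is reversible here if it has both a writer and an eraser."""
        key = (site, ptm)
        return bool(self.writers.get(key)) and bool(self.erasers.get(key))

    def ptms_at(self, site):
        """Sorted list of PTM types that at least one writer can add to this site."""
        return sorted({ptm for (s, ptm) in self.writers if s == site})


# Hypothetical example: one kinase and one phosphatase acting on the same serine.
network = PTMNetwork()
ser14 = Site("ProteinX", "S", 14)
network.add_writer("KinaseA", ser14, "phosphorylation")
network.add_eraser("PhosphataseB", ser14, "phosphorylation")
print(network.is_reversible(ser14, "phosphorylation"))
print(network.ptms_at(ser14))
```

Keying both maps on the pair (site, PTM type) keeps the networks of different PTMs separate while still allowing them to share sites, which is the situation in which interactions between PTM networks can arise.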


1.4 - Cross-talk between PTMs

An important aspect that characterizes PTM regulatory networks is that they are not independent from each other. Indeed, several studies reviewed by Hunter (2007) have revealed that the presence of one PTM at one residue can interfere with the addition of other PTMs at the same residue or at adjacent residues. This interference is often referred to as cross-talk. Two types of cross-talk have been described in the literature: positive cross-talk and negative cross-talk (Hunter, 2007). The term positive cross-talk refers to a situation in which the presence of one PTM at one residue favours the addition or the removal of another PTM. The term negative cross-talk, instead, describes a situation in which one residue can potentially undergo two or more PTMs, so that there is direct competition between the writers of the different PTM networks to modify that residue. An example of positive cross-talk between PTMs is the phosphorylation-dependent ubiquitylation of cyclin D1 (Lin et al, 2006). Cyclin D1 is a regulator of the CDK4/6 kinases. The cyclin D1/CDK complexes trigger the G1/S transition of the cell cycle. However, during S-phase cyclin D1 has to be degraded. This event is primed by the phosphorylation of Thr-286, which promotes the subsequent ubiquitylation of cyclin D1 by an E3 ligase, targeting the protein for degradation. An example of the second type of cross-talk, negative cross-talk, is provided by the protein p53. This tumor suppressor is stabilized by the acetylation of multiple specific lysine residues at the C-terminus (Lys-370, Lys-371, Lys-372, Lys-381, Lys-382) (Li et al, 2002). The acetylation of these residues impedes their ubiquitylation by MDM2, thus contributing to stabilizing p53 and increasing its half-life. These examples show that PTM regulatory networks are interconnected and that, in order to understand how the whole cellular PTM regulatory network works, we need to take into account and study the cross-talk between PTMs.
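As a simple illustration of the negative cross-talk scenario, residues that are candidates for competition between PTM networks can be found by looking for sites annotated with more than one PTM type. The sketch below is illustrative only: the function name `competing_sites` and the input format (a list of protein, position, PTM type tuples) are assumptions, and the example annotations are taken from the residues mentioned in the text above rather than from a real dataset.

```python
from collections import defaultdict


def competing_sites(annotations):
    """Return {(protein, position): sorted PTM types} for residues
    annotated with two or more different PTM types."""
    ptms_by_site = defaultdict(set)
    for protein, position, ptm in annotations:
        ptms_by_site[(protein, position)].add(ptm)
    return {
        site: sorted(ptms)
        for site, ptms in ptms_by_site.items()
        if len(ptms) > 1
    }


# Example annotations based on the residues discussed in the text.
annotations = [
    ("p53", 370, "acetylation"),
    ("p53", 370, "ubiquitylation"),
    ("cyclin D1", 286, "phosphorylation"),
]
print(competing_sites(annotations))
# {('p53', 370): ['acetylation', 'ubiquitylation']}
```

Finding such a site only shows that competition is possible; whether the two PTMs actually exclude each other in the cell has to be established experimentally.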

1.5 - PTM networks and the evolution of biological complexity and

diversity

The organisms that populate our planet have evolved from simpler ones through

evolutionary trajectories determined by natural selection and PTM regulatory networks

5

appearance and complexification is thought to have a role in the emergence of biological

complexity and diversity. A notable example that supports this scenario has been revealed

by recent studies about the evolution of tyrosine phosphorylation. This PTM regulatory

network, indeed, is the result of several evolutionary steps that started more than a billion

years ago in a single-cell eukaryotic organism. Pincus and collaborators (Pincus et al, 2008)

shed light on these evolutionary steps. Limited tyrosine phosphorylation, catalyzed through

cross-phosphorylation by Ser/Thr kinases, was observed in premetazoan organisms.

Premetazoan organisms also possessed a reduced set of erasers for tyrosine phosphorylation

(tyrosine phosphatases). However, the complete PTM network that included the set of

writers (tyrosine kinases) was only observed in metazoans and choanoflagellates. The

emergence of the set of writers in metazoans was also associated with an expansion of the set

of erasers. The emergence of the tyrosine phosphorylation PTM network is thought to have

had a huge impact on the emergence of multicellular organisms (metazoa), since tyrosine

phosphorylation is a key component of the molecular machinery that allows cell-cell

communication. This example shows how, by studying the evolution of PTM networks, we

can understand the basis of organismal complexity and diversity.

1.6 - The technological advancements of the last decade make possible the

study of PTM networks

The advancements in mass spectrometry, genomics, biochemistry and bioinformatics of the

last decade have given us an unprecedented set of tools to study PTM networks. While

classic techniques that rely on antibody-based Western blot analysis are still useful to detect

specific PTM events, the development of protocols that enrich samples for

peptides carrying a PTM of interest, coupled to mass spectrometry (Zhao & Jensen, 2009),

has made it possible to determine for the first time proteome-wide PTM profiles at high

throughput, in particular those of model organisms like yeast (Gruhler et al, 2005; Holt

et al, 2009; Li et al, 2007), C. elegans (Zielinska et al, 2009), mouse (Huttlin et al, 2010)

and human (Sharma et al, 2014). Further, techniques like peptide arrays (Chen & Turk,

2010) have allowed us to explore the specificity of the writers and provided the basis for

determining writer-site associations. These associations allowed us to reconstruct the

topology and the organization of some PTM networks. The results of these studies have

also led to the development of algorithms (e.g. (Miller et al, 2008)) that can predict

PTM sites on proteins or writer-site associations. While each one of these tools and

techniques has several limitations, they allowed us to investigate for the first time whole

PTM networks at high-throughput.

1.7 – Aims of this thesis

The general aim of this thesis is to study the evolution of PTM regulatory networks in order

to understand how they rewire over time and in different organisms and how they cross-talk

with each other. Shedding light on these aspects of PTM networks will allow us to achieve a

better understanding of (i) the molecular mechanisms that contribute to increasing

biological complexity, (ii) the molecular basis of species divergence and (iii)

how the cell integrates different signals to make decisions.

We will now review in more detail the specific questions addressed in each chapter.

Chapter 2 addresses the question of how eukaryotic PTM networks are rewired after gene

duplication. Gene duplication is a mechanism that provides raw genetic material that can be

shaped by evolution and it is thought to be one of the mechanisms at the origin of

organismal complexity. Using budding yeast as a model system, we study the extent to which

gene duplication changed the phosphoregulatory network of this model organism. We

chose this PTM network because it is the one for which we have the most complete data. In

this chapter we also investigate the molecular mechanisms involved in the rewiring of this

phosphoregulatory network and we discuss how these mechanisms may have contributed to

increase its complexity.

In Chapter 3 we further develop the analyses of the evolutionary trajectories of yeast

phosphorylation sites after the duplication event (Chapter 2) and we study how some of

these trajectories that imply the loss of phosphorylation sites may actually contribute to

complexify the cellular regulatory network. We then discuss these results in the context of

how gene duplication may lead to biological innovations.

In Chapter 4 we address the question of how a mammalian PTM regulatory network has

been rewired by evolution, by comparing the mouse and human phosphoproteomes. In this

case too, we focus on this regulatory network because it is the best characterized one. Comparing

the PTMs of human and mouse represents an important step to both understand the

regulatory differences between these species and, more generally, the molecular basis of

species divergence.

Finally, in Chapter 5 we study the cross-talk of two mammalian PTM regulatory networks

that share the same target residues, the phosphorylation and O-GlcNAcylation PTM

networks (the target residues being Ser and Thr), in mouse and human. While examples of

cross-talk between these two PTMs had already been reported in the literature, a global

assessment of the cross-talk between these two PTM networks is not available yet. In our

analysis we first find evidence for global cross-talk between these two PTM networks

and we then determine if phosphorylation and O-GlcNAcylation could act as two molecular

switches that regulate a single function or two molecular switches for two functions. By

answering these questions we can understand some of the mechanisms by which different

PTM networks interact with each other, allowing the cell to integrate different signals and to

make decisions.

Chapter 2 - Phosphorylation network rewiring by gene duplication

Published as: Freschi L., Courcelles M., Thibault P., Michnick S.W. and Landry C.R.

(2011) Phosphorylation network rewiring by gene duplication. Molecular Systems Biology

7:504

2.1 – Résumé

To understand how complex regulatory networks have assembled over the course of

evolution, we need a detailed understanding of the dynamics that follow gene duplication

events, including changes in post-translational modification profiles. We compared the

phosphorylation profile of paralogous proteins in budding yeast to that of a species that

diverged from Saccharomyces cerevisiae before the gene duplication event occurred.

We found that 100 million years of divergence after the duplication event are sufficient

for the majority of phosphorylation sites to be lost or gained in one paralog or the other,

with a strong bias towards losses. However, some losses may be partly compensated for by

the evolution of other phosphorylation sites, since paralogs tend to preserve the same

number of sites over time. We also found that about 50% of kinase-substrate relationships

may have changed during this period. Our results suggest that after duplication, proteins

tend to undergo subfunctionalization at the level of post-translational modifications.

Moreover, even when phosphorylation sites are conserved during evolution, there is a

turnover of the kinases that phosphorylate these sites.

2.2 - Abstract

Elucidating how complex regulatory networks have assembled during evolution requires a

detailed understanding of the evolutionary dynamics that follow gene duplication events,

including changes in post-translational modifications. We compared the phosphorylation

profiles of paralogous proteins in the budding yeast Saccharomyces cerevisiae to that of a

species that diverged from the budding yeast prior to the duplication of those genes. We

found that 100 million years of post-duplication divergence are sufficient for the majority

of phosphorylation sites to be lost or gained in one paralog or the other, with a strong bias

towards losses. However, some losses may be partly compensated for by the evolution of

other phosphosites, as paralogous proteins tend to preserve similar numbers of phosphosites

over time. We also found that up to 50% of kinase-substrate relationships may have been

rewired during this period. Our results suggest that after gene duplication, proteins tend to

subfunctionalize at the level of posttranslational regulation and that even when

phosphosites are preserved, there is a turnover of the kinases that phosphorylate them.

2.3 – Introduction

Genomes and organisms gain in complexity during evolution by gene duplication followed

by the functional divergence of the duplicates (Hurles, 2004). Signalling and regulatory

proteins are thought to play a particularly important role in the evolution of organismal

complexity (Gough & Wong, 2010). We know very little about the early evolutionary

steps that follow the duplication of regulatory proteins and of the substrates they regulate.

Studies on short time scales and on well-characterized organisms are needed in order to

estimate the contribution of the different evolutionary forces to the assembly of novel

regulatory pathways and networks.

Here we address the evolution of phosphoregulatory networks by directly studying

phosphoproteins and their associated protein kinases. Protein phosphorylation regulates

many, if not most, protein functions by affecting protein stability, localization, activity and

ability to interact (Moses & Landry, 2010). When maintained, paralogous proteins may

diverge in function following two evolutionary paths, which are not mutually exclusive.

First, one paralog may evolve new functions (neofunctionalization) (Conant & Wolfe,

2008). Second, degenerative mutations may accumulate in one or both paralogs leading to

the loss of redundant functions (subfunctionalization) (Force et al, 1999; Lynch & Force,

2000). If we assume a model under which each phosphosite in a protein has a function

(Holmberg et al, 2002), neofunctionalization would correspond to sites acquired after the

duplication event and subfunctionalization to sites lost in one of the two paralogs. In the

first case, new connections are created in the kinase-substrate network; in the second case,

no new function has evolved and regulatory links are lost rather than created. We (Landry

et al, 2009) and others (Lienhard, 2008) have recently suggested that a fraction of

phosphorylation sites may have no specific functions and represent the result of kinase-

substrate interactions that evolved neutrally or nearly neutrally. Accordingly, a fraction of

the links that are created or lost after gene duplication in these networks would represent

gains and losses of phosphosites without sub- or neofunctionalization of the proteins.

In this study we used the budding yeast Saccharomyces cerevisiae phosphorylation network

as a model. The lineage leading to the budding yeast underwent a whole genome

duplication (WGD) 100 million years (My) ago (Wolfe & Shields, 1997) that affected its

signalling networks significantly: while only 10% of all genes (~500 pairs) were

maintained as duplicates, 30% and 33% of protein kinases and phosphatases have been

retained as duplicates respectively (Seoighe & Wolfe, 1999). Furthermore,

phosphoproteins were significantly more likely to be retained as paralogs than

nonphosphorylated proteins (Amoutzias et al, 2010). Finally, duplicated kinases and their

regulatory proteins differ in sequence and functions (Musso et al, 2008) and many of them

show accelerated amino acid changes after the WGD (Kellis et al, 2004). Using

computational and experimental analyses, we examined the extent to which phosphosites

diverged after gene duplication, we addressed whether there have been accelerated gains

and losses of phosphosites among these phosphoproteins and whether kinase-substrate

relationships have been modified since the WGD.

2.4 - Materials and Methods

We compiled a set of 20342 phosphorylation sites on 2688 proteins from 8 large-scale

studies using 21068 phosphopeptides from 6 studies (Albuquerque et al, 2008; Bodenmiller

et al, 2007; Chi et al, 2007; Gruhler et al, 2005; Li et al, 2007; Reinders et al, 2007), as

compiled by Amoutzias et al. (Amoutzias et al, 2010) to which we added 3616

phosphopeptides from Beltrao et al. (Beltrao et al, 2009) and 3620 phosphopeptides from

Gnad et al. (Gnad et al, 2009). Raw phosphopeptides from these studies were filtered

according to the following criteria: for the Gnad dataset, we considered peptides with a

probability score above 0.95; for the Beltrao dataset we selected the peptides with score

greater than 0.02 and not being acetylated at the amino or carboxy terminus; then for all

datasets we selected all the peptides that matched one exact hit on S. cerevisiae proteins

using Blat searches (Kent, 2002). Peptides that matched more than one protein were

eliminated because they could not be assigned unambiguously to a single protein. We used

this data to assemble a first dataset. We then compiled a second dataset using the same data

about the phosphosites, but this time we did not apply the filtering step with Blat. To our

knowledge, these data sets of phosphorylation sites are the most comprehensive ones

currently available. Finally, we compiled a third dataset of manually curated phosphosites

that have been shown to be phosphorylated in small scale experiments and whose function

has been determined (Ba & Moses, 2010). The compiled data and all the other data

described below are available at: http://www.bio.ulaval.ca/landrylab/download/.

We estimated the state-divergence of phosphosites between paralogous proteins by

comparing cross-study conservation and reproducibility. Our data set comes from 8 distinct

studies, so there are 28 possible pairwise comparisons. We only considered sites that were

S/T in both paralogs. For each pair of studies we considered 2 sets of concatenated

paralogous proteins, para.1 and para.2. We counted the number of sites found in para.1 in

study 1 and examined how many were also found in para.1 in study 2 (cross-study

reproducibility) and para.2 in study 2 (cross-study conservation) (Annex 1, Figure S2.1).

We did the same comparison for these two studies between sites identified in para.2 of

study 1 and also in para.2 of study 2 (cross-study reproducibility) and of para.1 of study 2

(cross-study conservation). Each pair of studies therefore yields two ratios of cross-study

conservation/cross-study reproducibility and this ratio gives a measure of the extent of

conservation between paralogs while taking into account the reproducibility of the two

studies.

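\[
\text{State conservation} \approx \frac{\text{cross-study conservation}}{\text{cross-study reproducibility}}
\]

\[
\text{State conservation} \approx \frac{\left|\text{Study.1 para.1} \cap \text{Study.2 para.2}\right| / \left|\text{Study.1 para.1}\right|}{\left|\text{Study.1 para.1} \cap \text{Study.2 para.1}\right| / \left|\text{Study.1 para.1}\right|}
\]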

A regression of the cross-study conservation on the cross-study reproducibility provides a

rough estimate of the state-conservation between paralogs while taking reproducibility into

account (Figure 2.1A).
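
The two ratios obtained for a pair of studies can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the Perl and R scripts used for the analyses: the function name `state_conservation`, the dictionary layout and the use of shared alignment coordinates to identify homologous positions are all hypothetical choices made for the example.

```python
def state_conservation(study1, study2):
    """Return the conservation/reproducibility ratios for one pair of studies.

    study1 and study2 are dicts with keys "para1" and "para2" (hypothetical
    layout). Each value is the set of alignment positions observed as
    phosphorylated in that concatenated paralog set; positions are assumed
    to be expressed in shared alignment coordinates and restricted to sites
    that are S/T in both paralogs.
    """
    ratios = []
    for ref, other in (("para1", "para2"), ("para2", "para1")):
        seen = study1[ref]
        if not seen:
            continue  # nothing observed in study 1, ratio undefined
        # Sites seen again in the same paralog set in study 2
        reproducibility = len(seen & study2[ref]) / len(seen)
        # Sites seen at the homologous position of the other paralog set
        conservation = len(seen & study2[other]) / len(seen)
        if reproducibility > 0:
            ratios.append(conservation / reproducibility)
    return ratios
```

Applied to all 28 pairs of studies, such a function would yield the points whose regression is described in the text.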

Local phosphosite turnover was tested as follows. We took all the pairs of WGD

phosphoproteins where both paralogs had one or more phosphosites. For each phosphosite

present in the first paralog, we examined a window of length l centered on the site, thus

defining a range of positions along the sequence. Excluding all state-conserved sites (at the

exact same position), we counted all the phosphosites present in the aligned second paralog

inside the corresponding range of positions within a window. A site was conserved if for a

given phosphosite in the first paralog there was at least one phosphosite in the second

paralog inside the range of positions. We then determined the ratio of conserved sites over

all sites for each window size. The random expectation was estimated using 100

randomizations of phosphosites as described below.
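
The window test can be sketched in Python as follows. This is a minimal sketch under assumptions that go beyond the text: positions are alignment coordinates, the window spans `window // 2` positions on each side of the site, and the random expectation is obtained by drawing the same number of sites uniformly from a list of candidate S/T positions. The function names are hypothetical.

```python
import random

def window_conservation(sites_a, sites_b, window):
    """Fraction of sites in paralog A with at least one site in paralog B
    inside a window centered on the site, the exact position excluded."""
    if not sites_a:
        return 0.0
    half = window // 2
    hits = 0
    for pos in sites_a:
        if any(other != pos and abs(other - pos) <= half for other in sites_b):
            hits += 1
    return hits / len(sites_a)

def expected_conservation(sites_a, n_sites_b, candidates_b, window,
                          n_rand=100, seed=0):
    """Sorted null values from n_rand random placements of the B sites.

    candidates_b is a list of positions where a site could be placed
    (assumed here to be the S/T positions of paralog B)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_rand):
        placed = set(rng.sample(candidates_b, n_sites_b))
        values.append(window_conservation(sites_a, placed, window))
    return sorted(values)
```

Comparing the observed fraction with the upper quantile of the values returned by `expected_conservation` gives a simple significance criterion for each window size.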

The Position Weight Matrices (PWM) used for the prediction of the protein kinases

associated with each of the phosphosites were derived empirically by Mok et al. (Mok et al,

2010) through in vitro peptide screening using 61 of the 122 kinases from S. cerevisiae.

While this data is incomplete, it is the best currently available as it relies on empirically

derived consensus motifs rather than completely in silico predictions. In order to assign all

of the phosphosites to their most likely corresponding kinases, we extracted all of the 15-

mers of the yeast proteome that correspond to a phosphosite and its 14 flanking (±7)

residues. All phosphosites were then scored by summing the logarithm of the values

present in each kinase PWM matrix corresponding to each of the amino acids of the 15-

mer. We then assigned a protein kinase to a particular site based on the highest score for

that site (Annex 1, Figure S2.2). Data on kinase-substrate interactions were obtained from

Ptacek et al. (Ptacek et al, 2005) and Ubersax et al. (Ubersax et al, 2003). In the first case the data

represents microarray interactions between 87 different kinases and more than 4000

potential substrates. We estimated the fraction of paralogs that were phosphorylated by the

same kinase, considering only paralogs that were both phosphorylated by at least one

kinase. The second dataset comes from an in vitro experiment testing for interactions between

Cdc28 and the yeast proteome. We calculated the number of times both paralogs were

phosphorylated by the kinase among all cases where at least one of the two was

phosphorylated.
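
The PWM-based assignment of a kinase to a phosphosite can be sketched as follows. This is an illustrative sketch, not the scripts used in this work: the PWM layout (one dictionary per position), the padding character for sites near a protein terminus and the small pseudocount for residues absent from a column are assumptions introduced for the example.

```python
import math

def flanking_15mer(sequence, index, flank=7, pad="-"):
    """Return the site at 0-based `index` with its ±flank residues.
    Positions beyond the protein ends are filled with `pad` (assumption)."""
    padded = pad * flank + sequence + pad * flank
    return padded[index:index + 2 * flank + 1]

def score_peptide(peptide, pwm, pseudocount=1e-6):
    """Sum of the logarithms of the PWM values matching each residue.
    `pwm` is a list with one dict per position mapping residue -> value;
    residues missing from a column receive `pseudocount` (assumption)."""
    return sum(math.log(column.get(aa, pseudocount))
               for column, aa in zip(pwm, peptide))

def assign_kinase(peptide, pwms):
    """Return the kinase whose PWM gives the highest score for the peptide."""
    return max(pwms, key=lambda kinase: score_peptide(peptide, pwms[kinase]))
```

With real data each PWM would have 15 columns; shorter toy matrices behave the same way because scoring simply pairs columns with residues.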

Gains and losses of phosphosites were inferred as described in Figure 2.2A. We estimated

the expected numbers of gains and losses by randomly sampling S/T sites. We divided the

phosphosites into four classes according to the type of the residue (S or T) and the type of

region where the residue was located (ordered or disordered), and the representation of each

class was respected in the resampling. Disordered regions of proteins were predicted using

DISOPRED (Ward et al, 2004b) using all the fungal protein sequences as a reference

database. We performed a random sampling of S/T positions 1000 times, calculating the

number of gains and losses after each resampling. The ancestral residues occupying the

phosphosite position were determined as follows. We aligned all of S. cerevisiae genes to

the Lachancea kluyveri and Zygosaccharomyces rouxii orthologs, these two species having

diverged from the S. cerevisiae lineage prior to the whole-genome duplication. All the

sequences and the orthology relationships were obtained from YGOB (Gordon et al, 2009)

and alignments performed with MUSCLE (Edgar, 2004) using default parameters.

Orthology relationships were found for 4401 genes (among which 516 out of 553 of S.

cerevisiae paralogous genes). For each quartet of sequences, we inferred the ancestral

sequence at the first node joining the two paralogs (Figure 2.2A). The ancestral protein

sequences were inferred using the codeml method implemented in PAML (Yang, 2007)

using the following parameters (fix_alpha = 0, alpha = 0.04, fix_blength = 2). We

reconstructed ancestral sequences using two different substitution matrices (wag and

dayhoff) and both gave similar results so we are presenting only results derived from the

wag matrix. We examined the robustness of the reconstruction by performing the same

analyses including an additional pre-WGD species (K. thermotolerans) to our set. In this

case we were able to reconstruct the orthology relationships and the ancestral sequence for

4388 genes (among which 516 out of 551 of S. cerevisiae paralogous genes) (Dataset 4).

All analyses were performed using Perl (http://www.perl.org) and R (http://www.r-

project.org/) scripts.
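
The class-preserving resampling of S/T sites described above can be sketched in Python. The sketch is illustrative only: the class labels, the `statistic` callback (which would compute, for instance, the number of gains or losses for a resampled set) and the function names are hypothetical, and the rules for calling gains and losses themselves are not reproduced here.

```python
import random
from collections import Counter

def stratified_sample(observed_classes, candidates_by_class, rng):
    """Draw random S/T positions while keeping the class composition.

    observed_classes: one class label per observed phosphosite, e.g.
    ("S", "disordered"). candidates_by_class: dict mapping each class to
    the list of S/T positions of that class available for sampling."""
    sample = []
    for cls, count in Counter(observed_classes).items():
        # Sample without replacement within each class
        sample.extend(rng.sample(candidates_by_class[cls], count))
    return sample

def null_distribution(observed_classes, candidates_by_class, statistic,
                      n_resamples=1000, seed=0):
    """Apply `statistic` to each of n_resamples stratified resamplings."""
    rng = random.Random(seed)
    return [statistic(stratified_sample(observed_classes,
                                        candidates_by_class, rng))
            for _ in range(n_resamples)]
```

The observed numbers of gains and losses can then be compared with the distribution returned by `null_distribution`.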

The Lachancea kluyveri phosphosites were identified as follows. L. kluyveri (formerly

known as Saccharomyces kluyveri) strain FM628 (MATa ura3) was obtained from Marc

Johnston (Washington University). Pre-cultures of 75 ml were grown to OD600 ~ 3

overnight in standard yeast YPD medium at 30°C, agitated at 600 rpm and diluted to OD600

= 0.1 in the morning in 1L of YPD. Cells were harvested at OD600~0.6-0.8 by

centrifugation at 4,000 rpm for 20 minutes. The pellet (about 2-3 grams) was resuspended in

20 ml of lysis buffer following (Albuquerque et al, 2008) with slight modifications: 50 mM

Tris-HCl (pH 8.0), 150 mM NaCl, 0.2% NonidetP-50, 1.5 mM MgCl2, 0.2 mM EDTA.

The lysis buffer also contained phosphatase inhibitors (PhosSTOP, Roche), protease

inhibitors (complete protease cocktail, Roche) and 1 mM PMSF. Samples were quickly

frozen directly in liquid nitrogen drop-by-drop to make 1 cm³ frozen pellets and stored

at -80 °C. Yeast powder extracts were then produced using a Freezer/Mill (Spex

SamplePrep), which cryogenically pulverizes small pellets with a magnetically driven

impactor submerged in liquid nitrogen. The fine powder was then centrifuged at 14,500

RPM (rotor SA600) for 30 minutes at 4°C. The clear supernatant was treated with

Benzonase (Novagen) to eliminate nucleic acids overnight at 4°C and then cold acetone

precipitated.

Protein pellets were resuspended in 1% SDS/50 mM ammonium bicarbonate (AB) and

microBCA (Pierce) was used to determine protein concentration. Protein extracts (1 mg)

were reduced for 20 min at 37°C with 0.5 mM tris(2-carboxyethyl)phosphine

(TCEP; Pierce), alkylated with 50 mM iodoacetamide for 20 min at 37°C and quenched by

adding 50 mM DTT. Samples were diluted 10x with 50 mM AB and digested overnight at

37°C with sequencing grade trypsin (enzyme:substrate, 1:100) (Promega). The digestion

was stopped by adding trifluoroacetic acid (TFA) and was followed by evaporation on a

SpeedVac (Thermo Fisher Scientific, San Jose, CA). Phosphopeptides were enriched on

home-made TiO2-affinity columns (1.25 mg Titansphere, 5 µm, GL Sciences), using 250

mM lactic acid (Fluka) and eluted with 30 µl of 1% ammonium hydroxide, as described

previously (Thingholm et al, 2006). Samples were acidified with 1 µL of TFA, desalted

using a 30 mg HLB cartridge (Waters Corporation, Milford, MA), dried and resuspended in

2% acetonitrile (ACN; Thermo Fisher Scientific), 0.2% formic acid (FA; EMD Chemicals

Inc., Gibbstown) prior to analysis.

Triplicate 2D-nanoLC-MS/MS analysis of phosphopeptides was performed on an LTQ-

Orbitrap XL mass spectrometer (Thermo Fisher Scientific) coupled to an Eksigent LC

system. Online SCX separation (Opti-Guard 1mm cation column, Optimize Technologies)

was performed using five different ammonium acetate salt fractions, pH 3.0 (0, 250, 500,

1000 & 2000 mM) in 2% ACN (0.2% FA). Peptides eluted from each salt fraction were

transferred to a pre-column reverse phase trap (4 mm length, 360 µm i.d.) and injected on a

reverse phase analytical column (10 cm length, 150 µm i.d.) (Jupiter C18, 3µm, 300 Å,

Phenomenex). A linear gradient (2 to 25% ACN over 63 min followed by 25 to 40% ACN

over the next 15 min) was applied to separate phosphopeptides, which were directly

injected into the mass spectrometer at a flow rate of 600 nL/min. Detailed MS operation

procedure is described in (Marcantonio et al, 2008). Mascot Distiller v2.1.1 (Matrix

Science, London, UK) was used to extract and preprocess MS/MS spectra from raw data

files. Peptide identification was done with Mascot v2.2 using the Lachancea (Saccharomyces)

kluyveri protein sequence database (http://www.ebi.ac.uk/embl/). The following parameters

were used: parent and fragment tolerance of 0.02 Da and 0.5 Da respectively, trypsin with 2

missed cleavages and the following modifications: carbamidomethyl (C), deamidation

(NQ), oxidation (M), phosphorylation (STY). ProteoConnections (Courcelles et al, 2011)

was used to limit peptide false discovery rate to 1% and evaluate the confidence of

phosphorylation site localisation. MS/MS of all peptide identifications are available at

http://www.thibault.iric.ca/proteoconnections. Phosphosites with a confidence score above

60% were considered for the evolutionary analyses (711 sites in 396 proteins).

2.5 – Results and discussion

2.5.1 - Paralogous phosphoproteins substantially diverged after WGD

Our dataset consists of 2726 phosphosites (serines (S), 82%; threonines (T), 16%; tyrosines

(Y), 2%) that belong to one or the other member of the 352 pairs of yeast WGD paralogs

for which at least one of the two proteins is a phosphoprotein. In this work we focused on

S/T phosphosites as they make up 98% of all phosphosites. Among these sites, 2445 are

unique to one paralog and 118 (that correspond to 236 phosphosites) occur at homologous

positions, a number 7.4 times higher than expected by chance (P<< 0.001, Annex 1, Figure

S2.3). Phosphosites diverge in two ways. First are cases where an S/T residue is

phosphorylated in a protein and a residue that cannot be phosphorylated occupies the

homologous position in its paralog (site-divergence). Site-divergence accounts for 69% of

the sites that are unique to one paralog. Second, an S/T is phosphorylated in one paralog and

its homologous position is conserved (S/T) but not observed to be phosphorylated (state-

divergence). Eighty-six percent of homologous sites that are phosphorylated are in fact

state-diverged. This measure of state-divergence is strongly upwardly biased by false

negative (FN) and false positive (FP) identifications and also by the fact that

phosphopeptides that match more than one protein are not included in this dataset. We

considered these issues by comparing the cross-study conservation with the cross-study

reproducibility. We found that state-conservation between paralogs is around 36% for

filtered peptides (considering only phosphopeptides that match a single position in the

proteome) and 54% for unfiltered peptides (considering all phosphopeptides) (Figure

2.1A). Protein sequence, function, localization and/or recognition by protein kinases have

diverged to such extent in 100 My that only 36-54% of their post-translational regulation

by phosphorylation appears to be conserved despite a conservation of the actual residues.
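
The distinction between site-divergence and state-divergence can be made explicit with a small Python sketch. It is only an illustration: the function name and arguments are hypothetical, and any residue other than S or T (including an alignment gap) at the homologous position is treated here as non-phosphorylatable, in line with the focus on S/T sites.

```python
PHOSPHORYLATABLE = {"S", "T"}  # this work focuses on S/T phosphosites

def classify_site(residue_b, phosphorylated_b):
    """Classify a phosphorylated S/T site of one paralog from what is
    found at the homologous position of the other paralog.

    residue_b: residue at the homologous position ("-" for a gap).
    phosphorylated_b: True if that position is an observed phosphosite."""
    if residue_b not in PHOSPHORYLATABLE:
        return "site-diverged"      # residue cannot be phosphorylated
    if phosphorylated_b:
        return "state-conserved"    # S/T and phosphorylated in both
    return "state-diverged"         # S/T conserved, not observed as phosphorylated
```

Because observed phosphorylation depends on detection, the "state-diverged" class of such a naive classification includes false negatives, which is why the cross-study comparison is needed.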

Figure 2.1. Conservation and divergence of phosphoregulation among WGD paralogs.

(A) The state-conservation of paralogous proteins was estimated as a regression of the cross-study

conservation on the cross-study reproducibility. A 1:1 relationship is expected if all phosphosites

were state-conserved. Deviation from this 1:1 relationship provides an estimate of state divergence.

Filtered data: phosphopeptides that match a single protein; unfiltered data: all phosphopeptides. (B)

Positive correlation in the number of phosphosites of WGD paralogous proteins. Red dots indicate

average numbers in binned data and green dots the actual data. Green intensities indicate the

number of points at these positions. (C) Proportion of paralogous pairs with significant conservation

as a function of the window size considered. A site is considered conserved if there is a

phosphorylated site in the other paralog within the window (excluding the exact position). (D) Case

of putative local compensation. The fraction of conserved sites as a function of window size is

shown. Blue: observed value; Grey: 95th quantile (100 permutations); Red: average of the expected

distribution. (E) Fraction of paralogous phosphosites or phosphoproteins assigned to the same

protein kinase. Assignments are based on PWMs from (Mok et al, 2010). The observed fraction is

calculated using these assignments while the expected fraction is estimated after shuffling the

assigned kinases among the pairs of paralogous sites. Ptacek: large-scale in vitro kinase-substrate

interactions on microarrays (Ptacek et al, 2005). Ubersax: in vitro Cdc28-substrate interactions

(Ubersax et al, 2003). (F) Distributions of the PWM scores for different classes of sites.

2.5.2 - Conservation and compensation of phosphosite loss by site-position turnover

Surprisingly, despite the low level of site-conservation between paralogous proteins, there

is a highly significant correlation in the number of phosphorylation sites between paralogs

(rho = 0.35, p-value < 2.2×10⁻¹⁶; Figure 2.1B). This correlation remains significant when

the number of phosphosites is normalized by protein length (rho = 0.32, p-value < 6.9×10⁻¹⁴)

or the length of disordered regions (rho = 0.27, p-value < 3.8×10⁻¹⁰), which both tend to

be preserved between paralogs. The correlation is also significant when only site-diverged

phosphosites are considered (rho = 0.28, p-value = 2.0×10⁻¹¹). This correlation suggests

that stabilizing selection is acting to maintain the overall number of phosphosites. This

result is in agreement with a recent study (Beltrao et al, 2009) reporting that the

phosphorylation levels of orthologous protein complexes or pathways between Candida

albicans and S. cerevisiae tend to be conserved. The turnover of phosphosite position over

time could be made possible by the fact that sites that appear at a position near a site that

is lost can compensate for the loss (Serber & Ferrell Jr, 2007), particularly when the charge

of a region rather than that of a particular residue is important. The redundancy in the

position of phosphosites has been previously proposed to explain the weak site-

conservation among species (Landry et al, 2009), but so far there has been limited evidence

for this (Ba & Moses, 2010; Moses & Landry, 2010).

If this local turnover model is responsible for the overall conservation of the number of

phosphosites, the proportion of conservation between paralogs should increase significantly

if we consider regions of proteins rather than actual positions. We found that to be the case

for a significant but limited number of paralogous pairs. We re-considered the proportion of

state-conserved sites as the proportion of sites in a protein that have a phosphosite in the

homologous region of a given window size in its paralog. We first found that the window

size that maximizes the signal is about 33 amino acids in length (Figure 2.1C). Then, we

found that among the 167 pairs of paralogous proteins where both paralogs have at least

one phosphosite, 11 of them (6.6%) showed a significant level of conservation at that

window length (an example is shown in Figure 2.1D). This result may suggest either that

compensation by nearby sites is relatively uncommon and is specific to some types of

proteins, or that the relatively limited coverage of the yeast phosphoproteome leaves us

with limited power to detect significant compensation. Another possibility is that such

compensation takes place only in highly phosphorylated proteins. Indeed, we found that

paralogous pairs for which there is significant functional compensation have significantly

more phosphosites (mean: 9.28 vs. 3.87; Wilcoxon test: p-value < 9.5×10⁻¹¹) and also tend

to contain a larger proportion of disordered residues (mean: 53% versus 42%, p = 0.01)

compared to all pairs.

2.5.3 - Life after WGD: rewiring the cellular regulatory networks

Phosphosites are phosphorylated by a variety of kinases that recognize specific motifs

surrounding the phosphosites. As in many eukaryotes, around 2% (120 total) of yeast

protein-coding genes code for protein kinases (Zhu et al, 2000). We examined the

conservation of the relationships between our set of phosphosites and yeast kinases by

assigning each phosphosite to a kinase using empirically derived Position Weight Matrices

(PWM) for 61 yeast kinases (Data set S1 from (Mok et al, 2010)). We first found that

WGD paralogs are generally not biased in terms of the protein kinases that regulate them

(rho = 0.99, p-value < 2×10⁻¹⁶, Annex 1, Figure S2.4). Secondly, we found that

state-conserved sites are assigned to the same kinase 44% of the time, a twenty-fold increase

over what is expected if phosphosites were randomly matched between paralogs (p-value <

0.0001; Figure 2.1E). This number drops to 23% for state-diverged sites, again supporting

the idea that state-divergence does not entirely result from false-negative (FN) identifications. These sites

are either not being phosphorylated or being phosphorylated by a different kinase in a

different condition not addressed so far in phosphoproteomics studies. This first hypothesis

is supported by the fact that, for state-diverged sites, the assigned scores are significantly

higher for the phosphorylated sites than the non-phosphorylated ones (Figure 2.1F). We

estimated that the state-diverged nonphosphorylated S/T sites in reality comprise 50% of

nonphosphorylated sites (Annex 1, Figure S2.5).
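
As an illustration of the assignment step, the sketch below scores the residues flanking a site against a set of PWMs and assigns the site to the best-scoring kinase. Everything specific in it is an assumption made for the example: the two toy matrices and their names, the three-residue window, the uniform background frequency and the pseudo-probability. The matrices actually used come from Mok et al (2010), and the scoring scheme applied to them may differ from this generic log-odds score.

```python
import math

BACKGROUND = 0.05   # assumed uniform background frequency (1/20)
PSEUDO = 0.01       # assumed probability for residues absent from a column


def pwm_score(peptide, pwm):
    """Log-odds score of a peptide against a PWM given as a list of
    columns (dict residue -> probability); None marks an uninformative
    position that is skipped."""
    score = 0.0
    for residue, column in zip(peptide, pwm):
        if column is None:
            continue
        score += math.log2(column.get(residue, PSEUDO) / BACKGROUND)
    return score


def assign_kinase(peptide, pwms):
    """Return the name of the kinase whose PWM gives the highest score."""
    return max(pwms, key=lambda name: pwm_score(peptide, pwms[name]))


# Toy matrices over positions -1, 0, +1 (hypothetical, for illustration only)
toy_pwms = {
    "toy_proline_directed": [None, {"S": 0.5, "T": 0.5}, {"P": 0.9}],
    "toy_basophilic": [{"R": 0.8, "K": 0.15}, {"S": 0.6, "T": 0.4}, None],
}

site_in_paralog_1 = "ASP"
site_in_paralog_2 = "RSA"
kinase_1 = assign_kinase(site_in_paralog_1, toy_pwms)
kinase_2 = assign_kinase(site_in_paralog_2, toy_pwms)
same_kinase = kinase_1 == kinase_2
print(kinase_1, kinase_2, same_kinase)
```

In this toy case the phosphorylated serine is conserved in both paralogs, but the diverged flanking residues lead to different kinase assignments.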

The low percentage of assignment (44%) of the same kinase to state-conserved sites

suggests that the kinases that phosphorylate paralogous sites have changed since the WGD

(Moses & Landry, 2010). We found independent support for this from large-scale and

small-scale kinase-substrate interaction experiments (Ptacek et al, 2005; Ubersax et al,

2003) in which kinase-substrate relationships are also conserved in similar proportions

(Figure 2.1E). Overall, these analyses suggest that while a significant fraction of sites is

conserved and phosphorylated in both paralogs, the flanking sequences and/or protein

structure and/or localization have diverged enough for the substrate to be regulated by a

different protein kinase, a regulatory network turnover that is similar to what is observed

for transcriptional networks (Gasch et al, 2004; Moses & Landry, 2010). After 100 My of

evolution, up to 50% of kinase-substrate relationships may be rewired, while preserving the

phosphorylation status of the substrates.

2.5.4 - Phosphosite loss dominates site-divergence

A recent study on the budding yeast reported putative cases of neo- and

subfunctionalization of phosphosites (Amoutzias et al, 2010), but did not compare the

extent of those changes to a null model. We therefore sought to quantify whether site

divergence resulted from losses or gains of phosphosites by reconstructing the ancestral

sequences of the paralogous proteins and comparing the observed proportions to the neutral

expectations (Figure 2.2A, 2.2B & 2.2C). We found that 25% of sites correspond to gains

and 31% of sites correspond to losses. These proportions are, respectively, significantly less

and more than expected by chance alone, based on the resampling of phosphorylatable sites

in the same set of phosphoproteins (Figure 2.2C). This remains true for ordered and

disordered regions of proteins, which have been shown to evolve at different rates. We

consider that these losses represent several subfunctionalization events as non-functional

phosphosites (Landry et al, 2009) are expected to evolve as randomly selected S/T. These

results are also unlikely to result from false positives, as we performed the same analyses

on a smaller number of manually curated phosphosites (Nguyen Ba & Moses, 2010;

Annex 1, Figure S2.6) and we observed similar results. Our results are also robust to data

filtering (Annex 1, Figure S2.7) and variation in ancestral sequence reconstruction (Annex

1, Figure S2.8).
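
The inference scheme summarized in Figure 2.2A can be sketched as a simple classification rule. This is a minimal reading of that scheme, assuming single-letter residues taken from one alignment column and treating S and T as equivalent; the function name and the example columns are hypothetical.

```python
from collections import Counter

PHOSPHORYLATABLE = {"S", "T"}   # S and T are treated as equivalent


def classify_site(ancestor, phospho_residue, other_residue):
    """Classify one alignment column.

    ancestor        : inferred ancestral residue
    phospho_residue : residue of the paralog carrying the phosphosite
    other_residue   : homologous residue in the other paralog
    """
    if phospho_residue not in PHOSPHORYLATABLE:
        raise ValueError("the phosphorylated residue must be S or T")
    if other_residue in PHOSPHORYLATABLE:
        return "not site-diverged"      # both paralogs keep an S/T
    if ancestor in PHOSPHORYLATABLE:
        return "loss"                   # ancestral S/T lost in one paralog
    return "gain"                       # S/T appeared in one paralog only


# Hypothetical columns: (ancestor, phosphorylated paralog, other paralog)
columns = [("S", "S", "A"), ("N", "S", "N"), ("T", "S", "T"), ("S", "T", "D")]
counts = Counter(classify_site(*column) for column in columns)
print(counts)
```

The observed counts of gains and losses would then be compared with those obtained by resampling phosphorylatable sites, a step not shown here.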

Figure 2.2. Gains and losses of phosphosites after gene duplication.

(A) Inference of gains and losses of phosphosites. Serines (S) and threonines (T) are considered

equivalent with respect to phosphorylation. !S/!T indicates residues that are neither S nor T, and

pS/pT indicates phosphorylated S/T. (B) Examples of a lost (S72), a gained (S121) and a conserved site

(S103) from the curated dataset (Dataset 2). (C) The number of observed losses is greater than

expected by chance alone and the number of gains shows the opposite result. Results in ordered and

disordered regions agree with each other.

A limitation of this analysis is that we have to assume that the phosphorylatable sites (S/T)

of the ancestral sequence that correspond to phosphorylation sites in S. cerevisiae were

phosphorylated in the ancestor. Only a direct observation of the phosphorylation state of

the ancestral proteins would alleviate this problem. We therefore performed a

phosphoproteomics experiment on Lachancea kluyveri (Souciet et al, 2009), a species that

diverged from S. cerevisiae before the WGD event and that can be used as a proxy for

ancestral functions (van Hoof, 2005). We identified 855 phosphosites on 429 proteins

(Annex 1, File S2.1) that we mapped on our alignments. We found that a smaller

proportion of phosphosites identified in L. kluyveri are also phosphorylated in the S.

cerevisiae WGD paralogs (1:2) compared to the 1:1 S. cerevisiae orthologs (Figure 2.3A).

Figure 2.3. L. kluyveri phosphoproteomics confirms that phosphosites are preferentially lost in

paralogous phosphoproteins.

(A) L. kluyveri phosphosites are more likely to be phosphorylated in S. cerevisiae if they are in 1:1

orthologs (142/469 sites in 108 proteins) than in 1:2 orthologs (31/181 sites in 45 proteins). (B)

Ratios of the number of sites unique to S. cerevisiae to the number of shared ones with L. kluyveri

for 1:1 orthologs (142/6644 sites in 108 proteins) and 2:1 orthologs (62/2681 sites in 45 proteins).

Assuming that the rate of phosphosite gain in the L. kluyveri lineage was similar in these

two categories of genes (1:1 and 1:2 L. kluyveri-S. cerevisiae orthologs), this result confirms

that phosphosites were more likely to be lost in the S. cerevisiae WGD paralogs and thus

that gene duplication has significantly accelerated the rate of phosphosite divergence. We

also found that the proportion of sites that are uniquely phosphorylated in S. cerevisiae (not

found to be phosphorylated in L. kluyveri) in the WGD paralogs is actually comparable to

the one for the 1:1 orthologs (Figure 2.3B). Under a scenario where phosphosite gains

accelerated the divergence of the WGD paralogs, we would have expected to see a

significantly higher fraction of gains for the 2:1 orthologs compared to the 1:1 ones. Our

phosphoproteomics results therefore support our bioinformatics analyses based solely on

ancestral sequence reconstruction and confirm the prevalence of phosphosite losses in the

divergence of paralogous phosphoproteins.
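
As a worked illustration of the comparison in Figure 2.3A, the sketch below applies a two-sided Fisher's exact test to the counts given in the figure legend (142/469 sites for 1:1 orthologs and 31/181 sites for 1:2 orthologs). The choice of test for this particular comparison is an assumption made for the example, and the implementation is a plain standard-library version written for illustration rather than the code used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables (with fixed margins) that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Counts from the legend of Figure 2.3A
shared_1to1, total_1to1 = 142, 469
shared_1to2, total_1to2 = 31, 181

frac_1to1 = shared_1to1 / total_1to1
frac_1to2 = shared_1to2 / total_1to2
p_value = fisher_exact_two_sided(
    shared_1to1, total_1to1 - shared_1to1,
    shared_1to2, total_1to2 - shared_1to2,
)
print(f"1:1 orthologs: {frac_1to1:.3f}, 1:2 orthologs: {frac_1to2:.3f}, p = {p_value:.2g}")
```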

2.6 - Conclusion

A previous study considering the ancestral function of duplicated WGD proteins has shown

the importance of subfunctionalization in shaping the function of WGD paralogs acting at

the level of protein functions (van Hoof, 2005), whereas investigations of transcriptional

regulation have also found a significant contribution of neofunctionalization in the

divergence of paralogs (Papp et al, 2003; Tirosh & Barkai, 2007). Our results suggest that

at the level of post-translational regulation, subfunctionalization may have been the most

important driving force in shaping the yeast regulatory network. One limitation of our

analysis is that we consider that, when functional, each phosphosite has an independent

function, which may not be necessarily the case, as several cooperative effects among

phosphosites have been reported (Kapoor et al, 2000). The combined and individual effects

of the sub- and neofunctionalized sites will need to be addressed experimentally to estimate

the functional effects of these divergences. Further integrative analyses will also be

required to elucidate the importance of neo- and subfunctionalization that take place at

multiple levels (transcription, protein function, PTMs), as these may be largely dependent

on each other (Jensen et al, 2006). Another key finding of our study is that 100 My may be

sufficient to rewire half of the kinase-substrate relationships in a cell. This result is in

agreement with the idea that protein-protein interaction networks evolve rapidly. In about

300 My of evolution, half of all the interactions are supposed to be replaced by new

interactions (Wagner, 2001b).

2.7 - Acknowledgements

We thank H. Wurtele and A. Verreault for the use of their facilities and N. Lartillot and A.

Moses for comments on the manuscript. This work was supported by a Canadian Institutes

of Health Research (CIHR) grant GMX-191597 and FRSQ to C. R. Landry. C. R. Landry is

a CIHR New Investigator. L. Freschi was supported by a Quebec Research Network on

Protein Function, Structure and Engineering (PROTEO) fellowship.

Chapter 3 - Where Do Phosphosites Come from and Where Do They Go after Gene Duplication?

Published as: Diss G., Freschi L., Landry C. R. (2012), Where do phosphosites come from

and where do they go after gene duplication? Journal of Evolutionary Biology - special

issue: Molecular Evolutionary Routes that Lead to Innovations 2012: 843167.

3.1 – Résumé

Gene duplication followed by divergence is an important mechanism for promoting

innovation at the molecular level. While divergence at the level of transcriptional

regulation is well documented, little is known about the divergence caused by

posttranslational modifications (PTMs). Here, we test whether gains and losses of

phosphorylated amino acids after gene duplication can specifically modify the regulation

of these duplicated proteins. We show that when a phosphorylation site is lost in one

paralog, transitions toward negatively charged amino acids (which can constitutively mimic

the phosphorylated state) are significantly favoured. These transitions cannot occur through

a single mutation, meaning that the function must be lost before being regained through

phosphomimetic residues. In conclusion, we discuss how gene duplication can facilitate

transitions between phosphorylated and phosphomimetic amino acids.

3.2 – Abstract

Gene duplication followed by divergence is an important mechanism that leads to

molecular innovation. Whereas regulatory divergence at the transcriptional level is well

documented, little is known about divergence of posttranslational modifications (PTMs).

Here we test whether gains and losses of phosphorylated amino acids after gene duplication

may specifically modify the regulation of these duplicated proteins. We show that when

phosphosites are lost in one paralog, transitions from phosphorylated serines and threonines

are significantly biased toward negatively charged amino acids, which can mimic their

phosphorylated status in a constitutive manner. Surprisingly, these favoured transitions

cannot be reached by single mutational steps, which suggests that the function of a

phosphosite needs to be completely abolished before it is restored through substitution by

these phosphomimetic residues. We conclude by discussing how gene duplication could

facilitate the transitions between phosphorylated and phosphomimetic amino acids.
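
The statement that these transitions cannot be reached by single mutational steps can be checked directly against the standard genetic code. The sketch below computes the minimum number of nucleotide substitutions separating any Ser or Thr codon from any Asp or Glu codon; the helper names are hypothetical and only the four relevant codon families are listed.

```python
from itertools import product

# Standard genetic code, restricted to the four amino acids of interest
CODONS = {
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Thr": ["ACT", "ACC", "ACA", "ACG"],
    "Asp": ["GAT", "GAC"],
    "Glu": ["GAA", "GAG"],
}


def hamming(codon_a, codon_b):
    """Number of nucleotide differences between two codons."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def min_steps(source, target):
    """Minimum number of substitutions needed to go from any codon of
    `source` to any codon of `target`."""
    return min(hamming(a, b) for a, b in product(CODONS[source], CODONS[target]))


steps = {
    (source, target): min_steps(source, target)
    for source in ("Ser", "Thr")
    for target in ("Asp", "Glu")
}
for (source, target), n in steps.items():
    print(f"{source} -> {target}: at least {n} substitutions")
```

Every pair requires at least two substitutions, so any path between a phosphorylatable and a phosphomimetic residue passes through an intermediate amino acid that is neither.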

3.3 – Introduction


Gene duplication is one of the most prominent mechanisms by which organisms acquire

new functions (Ohno, 1970). Spectacular examples of such gains of function resulting from

gene duplications are the evolution of trichromatic vision in primates (Dulai et al, 1999),

the evolution of human beta-globin genes that are involved in the oxygen transport at

different developmental stages (Efstratiadis et al, 1980) as well as the expansion of the

family of immunoglobulins and other immunity related genes that shaped the vertebrate

immune system (Boulais et al, 2010; Zhang, 2003). Because of the central role of gene

duplication in evolution, there has been profound interest in better understanding

how these new functions evolve at the molecular level (Hurles, 2004), in determining at

what rate gene duplication occurs (Lynch & Conery, 2000; Lynch et al, 2008; Wagner,

2001a) and in testing whether the retention of paralogous genes necessarily requires the

evolution of new functions (Force et al, 1999; Hurles, 2004; van Hoof, 2005). One of the

most important challenges has been to determine mechanistically how specific mutations

translate into new functions, as establishing sequence-function relationships remains a

difficult task (Dean & Thornton, 2007).

After a gene duplication event, the two sister paralogs are identical copies of their ancestor

and encode two identical functions, thus relaxing the selective constraints on each paralog

(Lynch & Conery, 2000). Under most evolutionary models, both paralogs have to diverge

to be retained on evolutionary time scales, otherwise one paralog would be lost and the

system would return to its ancestral state (non-functionalization) (Hurles, 2004). There are

two ways for paralogs to diverge in function. The first one is the acquisition of new

functions by one or both of the two paralogs, a mechanism called neofunctionalization

(Force et al, 1999; Lynch & Conery, 2000; Ohno, 1970). The second mechanism, called

subfunctionalization, implies the complementary partitioning of the ancestral function

between the two paralogs by losses of functions (Force et al, 1999; Lynch & Conery, 2000;

Lynch & Force, 2000). These two mechanisms are not mutually exclusive because the

ancestral function can be partitioned by subfunctionalization and then one or both paralogs

may acquire new functions by neofunctionalization, a mechanism called

neosubfunctionalization (He & Zhang, 2005). An increase in the dosage of a gene product

by the addition of a second identical copy of the ancestral gene can also contribute to the

retention of paralogous pairs, without the need for the gain or loss of functions

(Kondrashov & Koonin, 2004; Kondrashov et al, 2002).

Divergence between paralogs does not necessarily imply a divergence in a specific function

but can also involve a change in the regulation of that function. For instance, the regulatory

control of a protein function can be modified at the transcriptional or at the

posttranslational level. Divergence in the expression patterns of duplicated transcripts is well

documented (Ferris & Whitt, 1979; Force et al, 1999; Gu et al, 2002; Ohno, 1970). For

example, Gu et al. showed that a large fraction of ancient duplicated gene pairs in yeast

shows divergent gene expression patterns (Gu et al, 2002). A more recent study showed

that nearly half of the genes that duplicated after a Whole Genome Duplication event

(WGD) in a forest tree species have diverged in expression by a random degeneration

process (Rodgers-Melnick et al, 2012). However, little is known about the divergence of

regulation by posttranslational modifications (PTMs), which take place after transcription

and translation and directly affect protein activities (Moses & Landry, 2010).

PTMs are covalent modifications of one or more amino acids that affect the activity of a

protein, its localization in the cell, its turnover rate, and its interactions with other

molecules (Mann & Jensen, 2003). Cells use a wide range of different PTMs to exert

distinct regulations on proteins. Although only 20 amino acids are encoded by the genetic

code, more than 200 amino acid variants or their derivatives are found in proteins after

PTMs (Seo & Lee, 2004). Phosphorylation, the addition of a phosphate moiety from an

ATP donor to a serine (Ser), threonine (Thr) or tyrosine (Tyr) residue by a protein kinase, is

by far the best-known PTM, as it is the most common and is involved in the regulation of

key biological processes of fundamental and medical interest, such as signal transduction

and cell-cycle regulation (Hunter, 2000). Phosphorylation of these amino acids modifies

their biochemical properties in several manners. Of particular interest for this study is the

fact that the addition of a phosphate group brings two new negative charges that allow the

formation of a salt-bridge or contribute to the local charge of the protein (Serber & Ferrell,

2007). Given that a phosphate group is a relatively large molecule, phosphorylation can

also have steric effects. Such properties can notably induce conformational changes of the

protein, modify its catalytic activity or block the access to its catalytic site, which result in

the activation or inhibition of the activity of the target protein by direct or allosteric effects

(Serber & Ferrell, 2007).

Several of the effects of protein phosphorylation can be mimicked by the negatively

charged amino acids aspartic acid (Asp) and glutamic acid (Glu). Indeed, the biochemical

properties of these amino acids are close to those of phosphorylated Ser or Thr residues

(Tarrant & Cole, 2009). In particular conditions, Asp and Glu are constitutive functional

equivalents of phosphosites in a phosphorylated state. This functional resemblance has

been exploited by biochemists by replacing Ser and Thr residues by Asp and Glu in

proteins of interest in order to mimic their phosphorylated status. This molecular mimicry

led them to call Asp and Glu phosphomimetic amino acids (Tarrant & Cole, 2009). This

trick appears to have been also used by nature to evolve new phosphosites. A striking

example comes from the evolution of the Activation-Induced Cytidine Deaminase (AID)

across vertebrates, an enzyme involved in the generation of antibody diversity. The

interaction of this enzyme with the Replication Protein A (RPA) promotes AID access to

transcribed double-stranded DNA during immunoglobulin class switch recombination. This

interaction requires a negative charge on AID, which is provided by an Asp in bony fish. In

these organisms, the enzyme is constitutively capable of interacting with RPA. In

amphibians and mammals, the function of the Asp residue is carried out by a

phosphorylatable Ser (pSer), which allows the regulation of the protein interaction by

protein kinases in a condition-specific fashion (Basu et al, 2008). It was recently suggested

that this type of evolutionary transition might be common. Globally, it was shown that

pSer tend to evolve from or to phosphomimetic amino acids (Asp and Glu) when gained

and lost respectively throughout the evolution of eukaryotes (Kurmangaliyev et al, 2011;

Pearlman et al, 2011).

Protein phosphoregulation has been suggested to play a role in the evolutionary fate of

paralogous proteins. Most studies done so far focused on the paralogous genes of the

budding yeast Saccharomyces cerevisiae because its phosphoproteome has been intensely

studied (Albuquerque et al, 2008; Beltrao et al, 2009; Gnad et al, 2009). Using the yeast

paralogs that derive from the WGD event, Amoutzias et al. showed that the number of

phosphosites on a phosphoprotein is an important determinant for the retention of its

duplicated descendants (Amoutzias et al, 2010). In a subsequent study, Freschi et al. studied

the gains and losses of phosphosites in paralogous phosphoproteins and found that the great

majority of them are present in one paralog and not in the other. This divergence was

shown to be principally driven by losses rather than gains of phosphosites on one paralog

(Freschi et al, 2011). Finally, Kaganovich and Snyder found that phosphosites tend to

diverge more asymmetrically than non-phosphorylated amino acids, thus playing an

important role in paralogous gene divergence and retention (Kaganovich & Snyder, 2012).

These observations raise the question of where phosphosites come from and where they

go after a gene duplication. According to the observations on phosphomimetic amino

acids described above, gains and losses of phosphosites could represent two distinct types

of divergence. On the one hand, the gain or the loss of phosphosites from or to a non-

phosphomimetic residue would represent a divergence in the function of the protein. On the

other hand, a gain or a loss could occur from or to phosphomimetic residues, leading to a

modification of the control of the charged residue by the cell rather than a modification of

function per se. Here we test whether this second scenario could have contributed to the

divergence of paralogous proteins using the yeast phosphoproteome as a model.

3.4 - Methods

Dataset

All analyses were performed using the dataset we compiled in a previous study (Freschi et

al, 2011) and that is available at http://www.bio.ulaval.ca/landrylab/download/. This dataset

contains 20,342 phosphosites on 2,688 proteins from eight large-scale studies (Albuquerque

et al, 2008; Beltrao et al, 2009; Bodenmiller et al, 2007; Chi et al, 2007; Gnad et al, 2009;

Gruhler et al, 2005; Li et al, 2007; Reinders et al, 2007). It also provides the alignments of

all S. cerevisiae WGD paralogous genes with their ancestral sequence and with the

orthologs of L. kluyveri and Z. rouxii. The alignments were performed using MUSCLE

(Edgar, 2004) while the ancestral sequence was inferred using the Codeml method

implemented in PAML (Yang, 2007). We chose to analyze only two species that diverged

before the Whole Genome Duplication event for the following reasons. The majority of

phosphorylation sites are located in disordered regions (Landry et al, 2009) and these

regions are fast evolving. Alignment of sequences from distantly related species leads to

spurious alignments or to alignments that may contain several indels. Indels decrease the

number of phosphorylation sites available for the analysis, as ancestral sequences cannot be

computed at these positions. Further, in our previous study (Freschi et al, 2011), we

performed the analyses including an additional species that diverged prior to the whole-

genome duplication and we found that this did not significantly affect our results. Finally,

this dataset also provides information about the localization of each residue in ordered or

disordered regions of the protein, according to predictions made with DISOPRED (Ward et

al, 2004b).

Approaches to study gains and losses of phosphosites

We applied different approaches to study gains and losses coming from, or going to

negatively charged amino acids. In the first approach, we used the ancestral sequence as a

reference to assess the presence of a gain or a loss at a specific position. For the gains, we

compared the proportion of phosphomimetic amino acids in the ancestral sequence (Asp or

Glu) going to pSer or pThr to the proportion of phosphomimetic amino acids going to cSer

and cThr. For the losses, we compared the proportion of phosphorylated residues (pSer and

pThr) going to Asp or Glu to the proportion of non-phosphorylated residues (cSer and

cThr) going to Asp or Glu, respectively. In the latter case, we required the ancestral sequence to have a

phosphorylatable residue and one of the two paralogs to be phosphorylated at the

homologous position. Comparisons of proportions were performed using Fisher’s exact

tests as implemented in R. In our second approach, we used a parsimony method to

calculate the same proportions. This time we used the sequences of L. kluyveri and Z. rouxii

as reference. In the case of a gain of phosphosites, we required the presence of the same

negatively charged residue (Asp or Glu) in the reference species as well as in one of the

two paralogs and a phosphorylatable residue (Ser or Thr) in the other paralog. In the case of

losses of phosphosites, we required the presence of the same phosphorylatable residue (Ser

or Thr) in the reference species as well as in one of the two paralogs and a negatively

charged residue (Asp or Glu) in the other paralog. All proportions were calculated by

dividing the number of sites coming from or going to an Asp or a Glu by the number of

sites that come from or go to any of the 17 non-phosphorylatable amino acids following the

same criteria (Figure 3.1).
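
The parsimony criteria and the resulting proportions can be sketched as follows. This is a minimal illustration that assumes each site is given as four single-letter residues from one alignment column (the two reference species and the two paralogs); the function names and the example columns are hypothetical. Separating phosphorylated sites from control sites would be an additional filter applied before the proportions are computed.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
NON_PHOSPHORYLATABLE = AMINO_ACIDS - set("STY")   # the 17 remaining residues
ACIDIC = set("DE")


def parsimony_call(lk, zr, paralog_1, paralog_2):
    """Return ('loss', residue lost to), ('gain', residue gained from)
    or None when the column does not satisfy the parsimony criteria."""
    if lk != zr:
        return None                      # reference species must agree
    for same, other in ((paralog_1, paralog_2), (paralog_2, paralog_1)):
        if same != lk:
            continue                     # one paralog must match the references
        if lk in "ST" and other in NON_PHOSPHORYLATABLE:
            return ("loss", other)
        if lk in NON_PHOSPHORYLATABLE and other in "ST":
            return ("gain", lk)
    return None


def acidic_proportion(calls, kind):
    """Share of gains (or losses) that involve Asp or Glu."""
    residues = [residue for call_kind, residue in calls if call_kind == kind]
    return sum(r in ACIDIC for r in residues) / len(residues)


# Hypothetical columns: (L. kluyveri, Z. rouxii, paralog 1, paralog 2)
columns = [
    ("S", "S", "S", "D"), ("T", "T", "A", "T"), ("E", "E", "E", "S"),
    ("S", "T", "S", "D"), ("K", "K", "T", "K"), ("S", "S", "S", "-"),
]
calls = [c for c in (parsimony_call(*col) for col in columns) if c is not None]
loss_to_acidic = acidic_proportion(calls, "loss")
gain_from_acidic = acidic_proportion(calls, "gain")
print(calls, loss_to_acidic, gain_from_acidic)
```

Columns in which the reference species disagree or in which a paralog carries a gap are discarded, which mirrors the conservative nature of the parsimony approach.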

3.5 – Results

The phosphoproteome of S. cerevisiae is the best described among eukaryotes and has been

mapped by mass spectrometry, leading to the identification of high-confidence

phosphosites (Albuquerque et al, 2008; Beltrao et al, 2009; Gnad et al, 2009). We

assembled a data set (Freschi et al, 2011) that consists of 2,726 phosphosites (Ser, 82%;

Thr, 16%; Tyr, 2%) that belong to one or the other member of the 352 pairs of yeast WGD

paralogs for which at least one of the two proteins is a phosphoprotein. We inferred the

ancestral sequence for each pair of paralogs using alignments with orthologous sequences

from Lachancea kluyveri and Zygosaccharomyces rouxii, two species that diverged from S.

cerevisiae before the WGD event. For each pair, we aligned all five sequences, we mapped

the phosphosites on the sequences of the paralogs and analysed phosphosites that diverged,

i.e. cases where a phosphorylatable residue was present in only one paralog.

Under a scenario where gains of phosphosites would result from selection for transitions

from phosphomimetic amino acids to phosphorylated residues, we would expect

phosphorylated Ser or Thr (pSer and pThr, respectively) to evolve more often from Asp or

Glu than non-phosphorylated ones (cSer and cThr, respectively). Similarly, under a

scenario where losses of phosphosites would result from transitions from phosphorylated

residues to phosphomimetic amino acids, we would expect pSer and pThr to evolve more

often to Asp and Glu than equivalent cSer and cThr. We tested these two hypotheses as

described in Figure 3.1.

In the first case, we compared the proportion of pSer and pThr that were gained from Asp

and Glu with that of cSer and cThr, i.e. all serines and threonines from the same set of

proteins that were gained from Asp and Glu but that are not known to be phosphorylated. In

the second case, we compared the ratio of sites that were lost and replaced by

phosphomimetic residues in only one paralog with the ratios derived from cSer and cThr.

We performed the analysis using paralogous ancestral sequences inferred with a likelihood

method and also using a parsimony approach, whereby the ancestral state of phosphosites

was inferred based on the conservation of the site in one of the two paralogs and its two

orthologs (Figure 3.1A). Global results are presented in Figure 3.2 and detailed analyses are

presented in Figure 3.3.

Figure 3.1. Algorithm used to calculate and compare the proportions of transitions between

phosphorylated and phosphomimetic residues relative to control sites.

(A) Phosphosite (pS, pT) gains from phosphomimetic amino acids were identified as cases where

only one of the paralogs has a phosphosite and the ancestral sequence has a phosphomimetic residue

at the same position. Control sites (cS, cT) were identified in the same way but considering Ser and

Thr that are not known to be phosphorylated. The ancestral sequence was inferred using likelihood

or parsimony approaches. Phosphosite losses to phosphomimetic amino acids were identified as

cases where one paralog has a phosphosite in a position that is occupied by a phosphomimetic

amino acid in the other paralog and a phosphorylatable amino acid at the same position in the

ancestral sequence. (B) The proportion of pS or pT that evolved from or to D or E was compared to

the proportion of cS or cT that evolved from or to D or E. X represents any amino acid with the

exception of Ser, Thr and Tyr.

Figure 3.2. Phosphosites that are differentially lost in paralogous phosphoproteins evolve

toward negatively charged residues.

Each bar represents the percentage of sites (pSer and pThr, cSer and cThr) that evolved from or to

Asp or Glu. Numbers above the bars represent the total number of pSer, cSer, pThr or cThr sites

that were gained or lost. Numbers above the arrows indicate p-values of the Fisher’s exact tests,

bold ones being below 0.05.

Figure 3.3. Detailed analysis of the patterns of evolution of pSer and pThr sites.

Each bar represents the percentage of sites (pSer, cSer, pThr or cThr) that evolved from or to Asp or

Glu. Numbers above the bars represent the total number of pSer, cSer, pThr or cThr sites that were

gained or lost. Numbers above the arrows indicate p-values of the Fisher’s exact tests, bold ones

being below 0.05. The top panel shows results obtained by ancestral sequence reconstruction using

a likelihood approach and the bottom panel using parsimony.

A global analysis of pSer, pThr, Asp and Glu shows that phosphosites tend to be lost to Asp

and Glu more frequently than cSer and cThr, and this holds true for both likelihood (16.6%

vs 12.1%, respectively, p = 0.002) and parsimony (17.1% vs 9.6%, respectively, p = 0.006)

reconstruction methods (Figure 3.2). However, although there is a tendency towards the

gains of phosphosites from Asp and Glu, the observed differences are not significant

(Figure 3.2). When studied separately, phosphosites in ordered and disordered regions

show the same global tendency to go toward phosphomimetic amino acids (Likelihood:

17.5% vs 10.0% in ordered regions, p = 0.058; 16.5% vs 13.7% in disordered regions, p =

0.086. Parsimony: 20.0% vs 8.1% in ordered regions, p = 0.076; 16.7% vs 11.7% in

disordered regions, p = 0.110). This suggests that the effect might be more important in

ordered regions of proteins, as would be expected if these residues were playing structural

roles. Further, we found that phosphosites are not preferentially gained from

phosphomimetic amino acids in disordered regions, while there is a non-significant

tendency for this type of transition in ordered regions (Likelihood: 16.0% vs 15.7% in

disordered regions, p = 0.943; 18.8% vs 13.7% in ordered regions, p = 0.294. Parsimony:

14.1% vs 14.2% in disordered regions, p = 1.000; 11.8% vs 10.2% in ordered regions, p =

0.691). Because the distinction between order and disorder reduces the number of sites in each

category and does not provide opposite results, we considered both regions simultaneously

in the following analyses.

We also examined which class of substitution could be contributing to this overall result

(Figure 3.3). We first found that pSer and pThr that were gained after gene duplication

follow trends that are in the expected direction although some of the comparisons are not

statistically significant and other results are in the opposite direction (Figure 3.3). However,

this detailed analysis showed that pSer are significantly more likely to evolve to Glu than

cSer (11.6% vs 5.3%, p = 0.008) and pThr evolve significantly more frequently to Asp than

cThr (9.8% vs 4.3% respectively, p = 0.013).

3.6 – Discussion

Protein phosphorylation is known to have a key role in regulating protein activities (Cohen,

2000). Evolutionary events such as gains and losses of phosphosites can lead to changes in

protein regulation, thus rewiring the protein regulatory network of the cell (Freschi et al,

2011). In the literature, there is evidence for gains of new phosphosites coming from

negatively charged residues among orthologs (Basu et al, 2008; Pearlman et al, 2011) as

well as cases of losses of phosphosites to these amino acids (Kurmangaliyev et al, 2011).

The biochemical properties of Glu and Asp mimic the ones of pSer and pThr with the

exception that their charge is not regulatable (Tarrant & Cole, 2009). These observations

led us to hypothesize that coding sequence divergence of paralogous genes by neo- and

subfunctionalization does not strictly involve the appearance or the partitioning of protein

function. Paralogous genes could also diverge in how these functions are regulated.

Divergence in the regulatory control is well known at the transcriptional level (Gu et al,

2005; Rodgers-Melnick et al, 2012), but has not been specifically addressed at the

posttranslational level. We tested this hypothesis on the complete set of WGD

phosphoproteins of the budding yeast S. cerevisiae.

Using two different methods to infer the ancestral state of phosphorylated and non-

phosphorylated Ser and Thr, we found that pSer and pThr globally have a tendency to

evolve from negatively charged amino acids in paralogous phosphoproteins compared to

their non-phosphorylated counterparts. The tendencies observed are in agreement with our

hypothesis and with the observations made by Pearlman et al. across eukaryotes (Pearlman

et al, 2011). However, the observed differences are not significant, which could be

explained by a few non-exclusive scenarios. First, we are looking at a narrow evolutionary

window (100 My), which contrasts with the analysis conducted by Pearlman et al., who

used aligned sequences from organisms spanning the entire tree of life (Pearlman et al,

2011). Further, the mechanism proposed may apply primarily to a few sites in ordered

regions of proteins. Only a few phosphosites in these regions could be analysed here since

the majority of them are found in disordered regions [37], which reduces the statistical power

of our analysis. Our results regarding losses of phosphosites are in line with this hypothesis.

Finally, a significant fraction of phosphosites are thought to be non-functional (Landry et

al, 2009). Because these non-functional sites are not under selective pressure, they may

dilute the signal coming from functional sites. Nevertheless, from our

results, we cannot rule out the possibility that gains of phosphosites are not more likely to

derive from phosphomimetic residues after gene duplications. A larger sample size, the

study of a time window of a different length and a better knowledge of the functional

importance of phosphosites may be needed to provide a final answer.

Following the same approach, we examined whether phosphorylated residues, when lost,

are more likely to be replaced by Asp and Glu than when non-phosphorylated equivalent

residues are lost. We found that this is the case globally and also when considering

individual cases for both pSer and pThr; pSer are more likely to be replaced by Glu

residues while pThr by Asp residues. A similar trend was detectable for the transitions from

pThr to Glu. These results are in agreement with those from Kurmangaliyev et al.

(Kurmangaliyev et al, 2011) who also showed that pSer are more likely to evolve to

phosphomimetic amino acids than cSer in the divergence of orthologs between species. Our

results show that the evolutionary trajectories of pSer and pThr provide a mechanism for

paralogous protein divergence. Our analyses support the hypothesis that divergence

between paralogs can be generated by a loss of the posttranslational regulatory control on a

function rather than by the complete loss of the function itself. Indeed, the substitution of a

phosphosite for an Asp or a Glu residue may block one paralog into a single constitutive

functional state whereas the other one remains regulatable by protein kinases and

phosphatases.

3.6 - Conclusion

Our results raise the question of how these transitions are made possible during evolution.

The genetic code is organized in such a way that transitions between phosphorylatable and

phosphomimetic amino acids involve a transition state with an amino acid that is not

negatively charged, except for transitions between two Asp and two Ser codons that

involve a Tyr residue (Figure 3.4).
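
The codon-level argument above can be checked by enumeration. The following is a minimal sketch, assuming the standard nuclear genetic code and single-nucleotide substitution steps; the variable and function names are illustrative and are not taken from the original analysis. It lists the amino acids encoded by the intermediate codons on every two-step path between a Ser/Thr codon and an Asp/Glu codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of nucleotide differences between two codons."""
    return sum(x != y for x, y in zip(a, b))


def neighbours(codon):
    """All codons that differ from `codon` by exactly one nucleotide."""
    for i, base in product(range(3), BASES):
        if base != codon[i]:
            yield codon[:i] + base + codon[i + 1:]


phospho = [c for c, aa in CODE.items() if aa in "ST"]   # Ser and Thr codons
mimetic = [c for c, aa in CODE.items() if aa in "DE"]   # Asp and Glu codons

# Smallest number of substitutions separating any Ser/Thr codon from any Asp/Glu codon.
min_steps = min(hamming(s, m) for s in phospho for m in mimetic)

# Amino acids encoded by the intermediate codon on each two-step path.
intermediates = {
    (s, m): sorted(CODE[n] for n in neighbours(s) if hamming(n, m) == 1)
    for s in phospho for m in mimetic if hamming(s, m) == 2
}

# Codon pairs whose two-step path can pass through Tyr.
tyr_paths = sorted(pair for pair, aas in intermediates.items() if "Y" in aas)
```

Under these assumptions, `min_steps` shows that no single substitution connects the two classes of codons, and `tyr_paths` isolates the Ser and Asp codon pairs that can pass through Tyr.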

Figure 3.4. Transitions between phosphorylatable and phosphomimetic amino acids need to

go through a non-negatively charged intermediate.

However, Tyr is only rarely phosphorylated in yeast and Tyr residues are not

phosphorylated by the serine/threonine kinases (Ubersax & Ferrell, 2007), which suggests

that this path would not be favoured. A non-negatively charged intermediate could lead to a

complete loss of the function that was performed by the negative charge and could thus be

deleterious (Figure 3.5A).

Figure 3.5. A duplication event could provide the conditions for the intermediate non-

functional site to be neutral, which would allow a transition without affecting the fitness of the

organism.

(A) Without a duplication event, the loss of a negative charge could have deleterious effects if the

charge is important for the function of the protein. (B) The redundant paralogous gene copy could

serve as a backup and prevent deleterious effects created by the loss of the charge. The backup copy

could then be retained or lost. In the latter case, the system would be different from its ancestor.

Here we propose that the relaxed constraints that follow a gene duplication event could

provide the means to reach this intermediate state and to go beyond it (Figure 3.5B). After

gene duplication, when one of the duplicated copies is lost, the system is assumed to go

back to its ancestral state, a process called non-functionalization (Lynch & Conery, 2000).

However, following our model, the duplicated copy could serve as a backup for a transition

period, which would allow the other copy to reach a state that would have been unreachable

otherwise (Gordon, 1994; Hansen et al, 2000; Scannell & Wolfe, 2008). After the loss of

the backup copy, the system would remain different from its ancestral state since the

phosphorylation profile and thus the phosphoregulation of this protein has changed. The

term non-functionalization may thus not be suitable for such cases. In the case of a WGD

event, where the vast majority of the duplicated genes are eventually lost and are thought to

return to their ancestral state, these 2-step transitions could potentially lead to a great

burst in the evolution of phosphoregulation. Further studies at different time points

following gene duplication would be needed to determine how important this mechanism

could be for the evolution of phosphosites.

3.7 - Acknowledgements

This work was supported by a Canadian Institutes of Health Research (CIHR) Grant GMX-191597

and a Natural Sciences and Engineering Research Council of Canada discovery grant

to C. R. Landry. C. R. Landry is a CIHR New Investigator. G. Diss and L. Freschi were

supported by fellowships from the Quebec Research Network on Protein Function,

Structure and Engineering (PROTEO). We thank the members of the Landry laboratory,

two anonymous referees and N. Aubin-Horth for comments on the manuscript.

Chapter 4 - Functional Divergence and Evolutionary Turnover in Mammalian Phosphoproteomes

Published as: Freschi L., Osseni M., Landry C.R. (2014) Functional Divergence and

Evolutionary Turnover in Mammalian Phosphoproteomes, PLoS Genetics 10 (1), e1004062

4.1 - Summary

Here, we studied the evolution of phosphoregulation in mammals by comparing the human
and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated
in one species or the other are conserved at the residue level. Twenty percent of these
conserved sites are phosphorylated in both species. This proportion is 2.5 times greater
than expected by chance, which suggests that purifying selection tends to preserve
phosphoregulation. The other 80% of the sites that are conserved at the residue level are
differentially phosphorylated between human and mouse. Our results suggest that at least
5% of these sites have the potential to be true cases of divergence between the
phosphorylation networks of the two species, even though the residue is conserved in the
orthologous proteins of both species. We also showed that the evolutionary turnover of
phosphorylation sites located at adjacent positions in human or mouse leads to an
overestimation of the divergence in phosphoregulation between the two species. Our study
provides advanced analyses of phosphoproteomes and a framework for studying their
contribution to phenotypic evolution.

4.2 - Abstract

Here, we studied the evolution of mammalian phosphoregulation by comparing the human

and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated

in one species or the other are conserved at the residue level. Twenty percent of these

conserved sites are phosphorylated in both species. This proportion is 2.5 times more than

expected by chance alone, suggesting that purifying selection is preserving

phosphoregulation. The other 80% of sites that are conserved at the residue level are

differentially phosphorylated between species. We showed that at least 5% of them are likely

to reflect true cases of phosphoregulatory divergence between mouse and humans.

Moreover, we showed that evolutionary turnover of phosphosites at adjacent positions in

human or mouse leads to an overestimation of the divergence in phosphoregulation

between these two species. Our study provides a framework for studying the contribution of

phosphoregulatory divergence to phenotypic evolution.

4.3 - Introduction

Most proteins undergo chemical modifications after their synthesis (post-translational

modifications, PTMs). These modifications allow a fine-tuning of protein functions and

represent a mechanism to expand the coding capacity of genes (Nussinov et al, 2012). Over

the past decade, methods based on mass spectrometry have accelerated the discovery of

PTMs (Beausoleil et al, 2004; Choudhary et al, 2009; Huttlin et al, 2010; Kim et al, 2011;

Olsen et al, 2006; Zielinska et al, 2010). Each experiment can now detect thousands of

modified residues, making it possible to probe the functional state of entire proteomes. The PTM that

has been studied the most is protein phosphorylation: the addition of a phosphate group to

specific amino acids (serine (S), threonine (T) and tyrosine (Y) in eukaryotes).

Phosphorylation has been shown to affect protein functions, interactions, stability and

localization (Khmelinskii et al, 2009; Madeo et al, 1998; Sprang et al, 1988; Vazquez et al,

2000). It is thus of fundamental importance to understand how protein phosphorylation

evolves within and between species because changes in phosphorylation profiles may cause

changes in protein function and regulation and in organismal phenotypes, including disease

development (e.g. (Herbig et al, 2000)).

There have been several reports recently on the evolution of phosphoproteomes. For

instance, Kim and Hahn (Kim & Hahn, 2011) identified phosphorylation sites that emerged

after the split between humans and chimpanzees and found that these sites are located in

proteins involved in crucial biological processes such as cell division and chromatin

remodelling. Other studies have looked at the evolution of a subset of phosphoproteomes

on a broader evolutionary scale (Boulais et al, 2010; Malik et al, 2008). For example,

Boulais and collaborators (Boulais et al, 2010) performed a phosphoproteomics analysis of

mouse phagosomal proteins and then compared these proteins to their orthologs from 10

model organisms, from Drosophila to mouse. They observed that the

phagosomal phosphoproteome was extensively rewired during evolution, but that some

phosphorylation sites have been maintained for more than a billion years, suggesting their

importance for phagosomal functions. Finally, other studies looked at the conservation and

divergence of entire phosphoproteomes over a broad evolutionary scale (Boekhorst et al,

2008; Gnad et al, 2007; Landry et al, 2009; Tan et al, 2009) (and reviewed in (Levy et al,

2012)) in order to understand the evolutionary mechanisms and the constraints acting on

phosphorylation sites. These studies found that phosphorylated residues tend to be on

average more conserved than their non-phosphorylated counterparts (Gnad et al, 2007;

Landry et al, 2009) and that this is particularly true for those that were experimentally

shown to play functional roles (Landry et al, 2009).

Most studies that aimed at studying the evolution of phosphoproteomes so far have looked

at the evolutionary conservation of phosphorylation sites in several species without

knowing if these sites are actually phosphorylated in species other than the reference. In

other words, if a phosphorylation site in one species corresponds to a phosphorylatable

amino acid in another species, both residues were considered as conserved phosphorylation

events. This assumption was necessary because of the lack of phosphorylation data

available for more than one species. However, we can hypothesize that residue

conservation does not always imply phosphoregulatory conservation. Indeed, sites could be

conserved at the residue level but differ in their phosphoregulation due to changes

elsewhere in the protein, for instance in the motifs recognized by kinases and

phosphatases (Ubersax & Ferrell, 2007) or upstream (in trans) in the signalling cascade.

This aspect has not been addressed by previous studies, except in a few cases (Beltrao et al,

2009; Boulais et al, 2010). However, identifying such sites is of great interest since sites

that differ in their phosphoregulation despite being conserved at the residue level could lead

to changes in the architecture of phosphorylation networks and, ultimately, contribute to

phenotypic evolution. We examine this issue here.

Another aspect of phosphoproteomes that can be studied using evolutionary analysis is how

phosphorylation sites alone or in combination may affect the function of a protein

(Nussinov et al, 2012). Many models of phosphorylation site function stress the importance

of conformational changes by protein phosphorylation (Barr & Bogoyevitch, 2001;

Nussinov et al, 2012; Skou, 1965). In other models, phosphorylation sites regulate protein

functions without the need for conformational changes but rather through changes in the

local charge of the protein (Serber & Ferrell Jr, 2007), i.e. simply through bulk

electrostatics. A corollary of this last model is that the protein phosphorylation code is

redundant, i.e. that phosphorylation sites can change their position over time and still

maintain their biological function as long as the number of sites in a given protein region is

preserved, without affecting organismal phenotypes. By looking at the patterns of evolution

of phosphorylation sites, one could find traces of this redundancy by studying rapid

phosphorylation site evolutionary turnover (phosphorylation site gains and losses). This

evolutionary turnover has been invoked for interpreting the global rapid pattern of

evolution in different species (Ba & Moses, 2010; Freschi et al, 2011; Gnad et al, 2007;

Landry et al, 2009; Macek et al, 2008; Malik et al, 2008). However, evidence for positional

redundancy of phosphorylation sites is relatively limited. Two independent pieces of

evidence come from the cell cycle phosphorylation networks. Moses and collaborators

(Moses et al, 2007) studied the evolution of cyclin-dependent kinase (CDK) consensus

phosphorylation sites of the yeast pre-replicative complex (Bell & Dutta, 2002). They

found that although orthologous proteins contained clusters of CDK consensus sites, the

position and the number of phosphorylatable sites were not conserved, suggesting that

phosphorylation sites tend to shift their positions during evolution. In a more recent

investigation, Holt and collaborators (Holt et al, 2009) compared the positions of 547

phosphorylation sites on 308 Cdk1 substrates in vivo in the budding yeast and their

orthologous sites in other fungi. They found that the precise positioning is conserved only

in the very closely related species. However, in both cases the phosphorylation status of the

sites in other species was not investigated so it is not clear whether the phosphorylation

sites were absent from the orthologous proteins or if they actually shifted during evolution

through gains or losses to another position. The extent to which phosphorylation site

positional redundancy plays a role in overall phosphoproteome turnover therefore awaits

comprehensive phosphorylation data from closely related species, which we have

assembled here.

We performed an integrated analysis of phosphorylation site evolution between the human

and mouse proteomes using a large dataset of phosphorylation sites (Beltrao et al, 2012;

Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava

Prasad et al, 2009; Minguez et al, 2012). These two phosphoproteomes are the ones for

which we have the greatest amount of phosphoproteomics data between closely related

species. We estimated the extent of divergence and conservation between the two

phosphoproteomes and we investigated whether phosphorylation site evolutionary turnover

could contribute to this divergence.

4.4 - Methods

Phosphoproteomics and sequence data

An extensive dataset of human and mouse phosphorylation sites was built by combining

data from 7 different databases and experimental studies (Beltrao et al, 2012; Dinkel et al,

2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava Prasad et al,

2009; Minguez et al, 2012). All protein sequences and orthology relationships were

retrieved from ENSEMBL (version 69). In this study, only protein sequences for which we

could find orthology relationships between a human protein and at least a mouse, dog and

opossum protein were considered. This step allowed us to study the evolutionary history of

phosphorylation sites. For humans and mouse, orthology relationships were determined for

the longest isoforms of each protein. Each group of orthologous sequences was aligned

using MUSCLE (Edgar, 2004). Disordered and ordered regions of proteins were predicted

using DISOPRED (Ward et al, 2004a). In order to map phosphorylation sites to our

sequences, the following procedure was applied. The sites that were already mapped onto

proteins associated with ENSEMBL IDs in the original datasets were directly mapped to

our sequences. For all other cases, phosphopeptides were mapped onto proteins using

BLAT (Kent, 2002). All peptides that mapped to more than one protein were removed at

this step. Mapped phosphorylation sites and information about protein disorder are

available in Annex 2, Dataset S2.
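
The uniqueness filter applied to the mapped phosphopeptides can be illustrated as follows. This is only a sketch: exact substring matching is swapped in for BLAT, and the function name and input layout are hypothetical rather than those of the original pipeline.

```python
def map_peptides(peptides, proteins):
    """Map phosphopeptides onto proteins and drop ambiguous peptides.

    peptides: dict peptide_id -> (peptide sequence, 0-based offset of the
              phosphorylated residue within the peptide)
    proteins: dict protein_id -> protein sequence
    Returns a dict peptide_id -> (protein_id, 1-based position of the site).
    Assumption: exact substring search replaces BLAT, and only the first
    occurrence within a protein is used.
    """
    mapped = {}
    for pep_id, (seq, offset) in peptides.items():
        hits = [(pid, pseq.find(seq)) for pid, pseq in proteins.items() if seq in pseq]
        if len(hits) == 1:  # keep peptides that map to exactly one protein
            pid, start = hits[0]
            mapped[pep_id] = (pid, start + offset + 1)
    return mapped
```

A peptide found in two proteins is discarded, which mirrors the removal of all peptides that mapped to more than one protein.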

Calculating random expectations for phosphorylation sites

In order to calculate the random expectation for the number of sites belonging to each one

of the different categories (StC, StD and SiD), statuses (0: non-phosphorylated, 1:

phosphorylated) of phosphorylatable amino acids were shuffled in each protein by

preserving the overall proportion of sites for each residue (S, T or Y) and the localization in

disordered/ordered regions. The null distributions were estimated by iterating this

procedure 100 times, calculating each time the number of sites belonging to each category.

We calculated random expectations by shuffling the mouse sites only. We also performed

the calculations by independently shuffling both human and mouse sites and found similar

results.
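
The shuffling procedure described above can be sketched as a permutation stratified by residue type and structural region. The data layout (one list of dictionaries per protein) and the function names are assumptions made for illustration, not the original implementation.

```python
import random
from collections import defaultdict


def shuffle_statuses(sites, rng):
    """Permute phosphorylation statuses within one protein.

    sites: list of dicts with keys 'residue' ('S', 'T' or 'Y'),
           'region' ('ordered' or 'disordered') and 'status' (0 or 1).
    Statuses are permuted only among sites sharing the same residue and
    region, so the number of phosphorylated sites per stratum is preserved.
    The input list is left unchanged.
    """
    strata = defaultdict(list)
    for i, site in enumerate(sites):
        strata[(site["residue"], site["region"])].append(i)
    shuffled = [dict(site) for site in sites]
    for indices in strata.values():
        statuses = [sites[i]["status"] for i in indices]
        rng.shuffle(statuses)
        for i, status in zip(indices, statuses):
            shuffled[i]["status"] = status
    return shuffled


def null_distribution(sites, statistic, n_iter=100, seed=0):
    """Evaluate `statistic` on `n_iter` shuffled copies of `sites`."""
    rng = random.Random(seed)
    return [statistic(shuffle_statuses(sites, rng)) for _ in range(n_iter)]
```

Here `statistic` stands for any function that counts the sites falling in a given category (StC, StD or SiD) after comparison with the other species.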

Protein abundance data and classes of abundance

Data on protein abundance were taken from PaxDb (Wang et al, 2012) (H. sapiens whole

organism integrated dataset). In the analysis presented in Figure 4.1D, proteins were

ordered by their abundance and divided into four equal bins.
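
As an illustration, the binning step could be written as below. The sketch assumes abundances are held in a dictionary keyed by protein identifier and that, when the number of proteins is not divisible by four, the extra proteins go to the lowest-abundance bins; both choices are ours.

```python
def equal_bins(abundance, n_bins=4):
    """Order proteins by abundance and split them into n_bins equal-sized groups.

    abundance: dict protein_id -> abundance value.
    Returns a list of lists of protein ids, from least to most abundant.
    """
    ranked = sorted(abundance, key=abundance.get)
    size, extra = divmod(len(ranked), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        end = start + size + (1 if b < extra else 0)
        bins.append(ranked[start:end])
        start = end
    return bins
```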

Housekeeping proteins, tissue specific proteins and sites with known function

Data on housekeeping genes were retrieved from Eisenberg and Levanon (Eisenberg &

Levanon, 2003) who identified 575 human genes that are expressed in 47 different tissues

and cell lines based on microarray data. Data on tissue-specific genes derive from an

independent dataset and were retrieved from the TiGER database (Liu et al, 2008). About

5.3 million human ESTs were mapped to UniGene clusters and the expression pattern of

all UniGenes in 30 human tissues was determined using the NCBI EST database. 7,261

tissue-specific genes were identified. Manually curated data on functional phosphorylation

sites (n = 156) were retrieved from Landry et al. (Landry et al, 2009). These sites were

derived from the manual curation of the primary literature.

NetPhorest and position weight matrices scores

NetPhorest (Miller et al, 2008) was downloaded from (http://netphorest.info) and was run

locally using default options. In order to calculate position weight matrix scores, 29

position weight matrices whose scores are based on the same metric were obtained from

Benjamin Turk (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al,

2009; Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010;

Pike et al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012). These

matrices were used to score all amino acid 10-mers in the mouse and human proteomes that

have a phosphorylatable amino acid on the sixth position. The score reflects the probability

of each 10-mer to be phosphorylated by a specific kinase.
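
The windowing and scoring steps can be sketched as follows. The matrix format (a list of ten dictionaries mapping residues to weights), the log-sum score and the pseudo-weight for missing residues are assumptions for illustration; the metric used by the original matrices is not specified here.

```python
import math

PHOSPHORYLATABLE = set("STY")


def candidate_10mers(protein):
    """Yield (1-based site position, 10-mer) for every window whose sixth
    residue is S, T or Y. Sites too close to either terminus to fit a full
    10-mer are skipped (an assumption of this sketch)."""
    for start in range(len(protein) - 9):
        window = protein[start:start + 10]
        if window[5] in PHOSPHORYLATABLE:
            yield start + 6, window


def pwm_score(window, pwm, pseudo=1e-6):
    """Sum of log weights across the 10 positions of a window.

    pwm: list of 10 dicts, each mapping an amino acid to a positive weight.
    Residues absent from a column receive the pseudo-weight `pseudo`.
    """
    return sum(math.log(col.get(aa, pseudo)) for col, aa in zip(pwm, window))
```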

Comparison of proportions, distributions and correlations

Proportions were compared with 2-sample tests for equality of proportions with continuity

correction. Distributions were compared with non-parametric Wilcoxon Rank Sum tests.

Correlations were calculated with the Spearman method. All these statistical analyses were

performed as implemented in R.
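
For readers who want to reproduce the comparison of proportions outside R, the sketch below implements a two-sample test for equality of proportions with Yates' continuity correction, which we assume corresponds to what R's `prop.test` computes for two groups. It uses only the standard library; the function name is ours, and it assumes all expected counts are non-zero.

```python
import math


def prop_test(x1, n1, x2, n2):
    """Two-sample test for equality of proportions with continuity correction.

    x1, x2: number of successes in each sample; n1, n2: sample sizes.
    Returns (chi-square statistic with 1 degree of freedom, two-sided p-value).
    """
    pooled = (x1 + x2) / (n1 + n2)
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    # Yates' correction, capped so that it never exceeds |observed - expected|.
    yates = min(0.5, min(abs(o - e) for o, e in zip(observed, expected)))
    chi2 = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `prop_test(15, 50, 25, 50)` compares 30% against 50% in two samples of 50 observations each.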

Algorithm to identify evolutionary clustered phosphorylation site pairs

Site colocalization in orthologous proteins was estimated using a window of positions

(centered on each human phosphorylation site). The fraction of colocalized sites over the

total number of sites was calculated for a range of window sizes. In order to determine

which sites were closer in sequence linear space than expected by chance alone, the mouse

phosphorylation sites were shuffled in each protein by preserving the overall proportion of

sites for each residue (S, T or Y) and disordered/ordered regions, and the fraction of

colocalized sites was calculated for each window length. One thousand iterations were

performed in order to generate the null model. Also, we masked all the positions in which a

phosphorylatable amino acid was present at a given position in both human and mouse.

Evolutionary clustered sites were defined as sites that were more likely to be colocalized

than expected by chance alone (null model). The closest pair of phosphorylation sites

present in these windows was then selected (see also Annex 2, Figure S4.1). The

phosphorylatable amino acids serine (S) and threonine (T) differ in biochemical properties

compared to tyrosine (Y), another phosphorylatable amino acid (Taylor et al, 1995).

Therefore, S/T and Y sites were considered as belonging to separate classes and not

considered to be able to compensate each other. Only 1,529 pairs of orthologous proteins

that had at least two phosphorylation sites that diverged (site-divergence) in human and

mouse respectively were considered. Among these pairs, 563 had at least one SiD site that

involves a phospho-serine or phospho-threonine in both humans and mouse. Only one

single pair had a SiD site that involves a phospho-tyrosine in both humans and mouse.
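
The colocalization measure can be sketched as below. The sketch assumes that sites are given as (protein, alignment position, residue class) tuples, that positions carrying a phosphorylatable residue in both species have already been masked upstream, and that the window extends `half_window` positions on each side of the human site; all names are illustrative.

```python
def colocalized_fraction(human_sites, mouse_sites, half_window):
    """Fraction of human sites with a mouse site of the same class nearby.

    human_sites, mouse_sites: lists of (protein_id, alignment_position, cls)
    where cls is 'ST' or 'Y' (the two classes are never allowed to pair).
    """
    mouse_index = {}
    for prot, pos, cls in mouse_sites:
        mouse_index.setdefault((prot, cls), []).append(pos)
    hits = 0
    for prot, pos, cls in human_sites:
        candidates = mouse_index.get((prot, cls), [])
        if any(abs(pos - m) <= half_window for m in candidates):
            hits += 1
    return hits / len(human_sites) if human_sites else 0.0


def closest_site(human_pos, mouse_positions):
    """Mouse position closest to a human site (first one wins on ties)."""
    return min(mouse_positions, key=lambda m: abs(m - human_pos))
```

Running `colocalized_fraction` on the observed mouse sites and on shuffled mouse sites for a range of window sizes would give the observed and null curves that are compared in the text.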

Testing if evolutionary clustered sites tend to be phosphorylated by the same kinase or

group of kinases

The kinase that was the most likely to phosphorylate each one of the evolutionary clustered

sites was inferred using NetPhorest (Miller et al, 2008) and the proportion of evolutionary

clustered site pairs phosphorylated by the same kinase was determined. This number was

compared to a null distribution obtained by randomly shuffling (10,000 iterations) the

kinase-phosphorylation site associations between different evolutionary clustered sites.

Analogous analyses were performed for StC and StD sites. We then performed the same

analysis but this time using the three best kinases predicted by NetPhorest, as proposed by

Tan et al. (Tan et al, 2009). We therefore considered two evolutionary clustered sites as

being phosphorylated by the same group of kinases if they shared one or more kinases

(kinase group) among the three best kinases predicted to be associated to each site

according to NetPhorest. This number was compared to a null distribution obtained by

randomly shuffling (100 iterations) the kinases-phosphorylation site associations between

different evolutionary clustered sites. Analogous analyses were performed for StC and StD

sites. Finally, we performed again all the analyses described above but this time using

position weight matrices from the literature (see section NetPhorest and position weight

matrices scores for further details) instead of NetPhorest to infer the kinase that was the

most likely to phosphorylate each one of the StD, StC and evolutionary clustered sites.
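
One way to read the randomization described above is as a permutation test in which the kinase sets of one member of each pair are reassigned at random among pairs. The sketch below follows that reading, which is our assumption, and takes predicted kinases as plain sets of names (a single kinase, or the three best kinases, per site).

```python
import random


def shared_fraction(pairs):
    """Fraction of site pairs whose predicted kinase sets overlap.

    pairs: list of (kinases_site1, kinases_site2), each a set of kinase names.
    """
    return sum(bool(a & b) for a, b in pairs) / len(pairs)


def permutation_p(pairs, n_iter=10000, seed=0):
    """Observed shared fraction and one-sided empirical p-value.

    The kinase sets of the second site are shuffled among pairs n_iter times;
    the p-value is the proportion of shuffles with a shared fraction at least
    as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = shared_fraction(pairs)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    at_least = 0
    for _ in range(n_iter):
        rng.shuffle(right)
        if shared_fraction(list(zip(left, right))) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_iter + 1)
```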

4.5 – Results

4.5.1 - Conservation and divergence between human and mouse phosphoproteomes

We assembled a dataset of human (n = 106,877) and mouse (n = 54,400) phosphorylation

sites by collecting data from 7 different databases and experimental studies (Beltrao et al,

2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010;

Keshava Prasad et al, 2009; Minguez et al, 2012) (Annex 2, Table S4.1). We successfully

mapped 128,705 sites onto 11,150 human and mouse orthologous proteins: 86,065 in

humans and 42,640 in mouse (Annex 2, Figure S4.2). As previously observed (Iakoucheva

et al, 2004; Landry et al, 2009), phosphorylation sites are preferentially located in

disordered regions of proteins (observed vs. expected proportions: 0.69 vs. 0.62, p-value =

2.2 × 10⁻¹⁶). Given this asymmetry in the localization of phosphorylation sites, we

generated all the null models of our analyses by respecting the proportion of sites in these

two structural categories. Our dataset allows comparing the human and mouse

phosphoproteomes using both sequence information and the phosphorylation status of each

site. Accordingly, we classified orthologous sites into three classes following Freschi et al.

(Freschi et al, 2011) (Figure 4.1A): i) Site-diverged (SiD): sites phosphorylated in one

species and non-phosphorylatable in the other; ii) State-conserved (StC): sites

phosphorylated in both species; iii) State-diverged (StD): sites that are conserved at the

residue level but that have been reported to be phosphorylated in only one of the two

species.
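
The three classes can be expressed as a simple decision rule on a pair of aligned residues. The sketch below is illustrative only: the function name and arguments are ours, and treating an alignment gap as a non-phosphorylatable residue is an assumption.

```python
PHOSPHORYLATABLE = set("STY")


def classify_site(res_human, phos_human, res_mouse, phos_mouse):
    """Classify a pair of orthologous (aligned) residues.

    res_*: one-letter amino acid ('-' for a gap); phos_*: True if the residue
    is reported as phosphorylated in that species.
    Returns 'StC', 'StD', 'SiD', or None when neither residue is phosphorylated.
    """
    if not (phos_human or phos_mouse):
        return None
    if phos_human and phos_mouse:
        return "StC"  # phosphorylated in both species
    other = res_mouse if phos_human else res_human
    # The unphosphorylated partner is either still phosphorylatable (StD)
    # or has been replaced by a non-phosphorylatable residue (SiD).
    return "StD" if other in PHOSPHORYLATABLE else "SiD"
```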

In order to examine the extent of conservation of phosphorylation between human and

mouse, we estimated the fraction of sites belonging to each of these three categories

compared to the total number of sites that are phosphorylated in human, mouse or in both

species. We first looked at phosphorylation site divergence. We found that 16,863 sites

(16% of the sites that are phosphorylated in human or mouse or both species) are SiD

(Figure 4.1B). These sites are about 1% less abundant than random expectations obtained

by shuffling the phosphorylation statuses of S/T/Y residues (Figure 4.1B), suggesting that

purifying selection is acting on phosphorylation sites to maintain their function but to a

limited extent, as previously observed with different approaches (e.g. (Landry et al, 2009)).

These sites, if functional, are expected to reflect differences in phosphoregulation between

human and mouse. However, a fraction of these SiD sites might be positionally redundant

site pairs such that the functional divergence may be overestimated (see below).

We examined other types of conservation and divergence. We first found that 20,146

phosphorylation sites (18% of the sites that are phosphorylated in human or mouse or both

species, Figure 4.1B) are StC. This proportion is 2.5 times greater than what is expected by

chance alone (Figure 4.1B). We observed this strong signal for conservation in both

disordered and ordered regions (Annex 2, Figure S4.3). These results suggest an overall

conservation of the phosphorylation profiles between the two species, most likely as a

result of purifying selection acting to maintain the phosphoregulation of these sites. We

performed a similar analysis on clusters of poly-S/T/Y (stretches of two or more

consecutive S/T/Y residues) rather than single residues and found the same patterns of

conservation and divergence (Annex 2, Figure S4.4).

Figure 4.1. Purifying selection is acting on mammalian phosphorylation sites and their

phosphorylation status.

(A) Site-diverged (SiD) sites are orthologous residues where one is phosphorylated and the other is

a non-phosphorylatable amino acid (any amino acid but S, T and Y). State-conserved (StC) sites are

orthologous phosphorylatable residues (S, T, Y) that are both reported to be phosphorylated. Finally,

state-diverged (StD) sites are orthologous phosphorylatable residues for which only one of the two

is phosphorylated. Circles with the P symbol indicate residue phosphorylation. Colors indicate the

different categories of sites. (B) Number of observed SiD, StC and StD sites and their respective

expected distributions as estimated by randomizing mouse phosphorylation sites. (C) Three

scenarios for StD sites: false positive and false negative identifications; rapidly evolving non-

functional phosphorylation sites; divergence in phosphoregulation. (D) Relationship between state-

conservation and protein abundance. The four classes of protein abundance have the same number

of proteins. (E) Comparison of the proportion of StC and StD sites in housekeeping and tissue-

specific proteins. (F) Comparison of the proportion of sites with known functions present in StC and

StD sites.

Despite an overall signal of conservation on the phosphorylation status of proteins, the

most represented category of sites in our dataset is StD sites (71,550 sites or 66% of the

sites that are phosphorylated in human, mouse or both species). Three different non-

exclusive scenarios could explain this large number of StD sites (Figure 4.1C). The first

one implies that state divergence results from an incomplete coverage of phosphoproteomic

data, which means that the phosphoproteomes of the two species might have been

undersampled, for instance sampled at different depths or in different conditions or tissues

(e.g. (Huttlin et al, 2010)). The second scenario is that a large fraction of the StD sites

identified might result from non-functional phosphorylation sites. Non-functional

phosphorylation sites evolve rapidly (Landry et al, 2009) and could therefore lead to the

poor conservation on the phosphorylation status we observed. The third scenario is that a

fraction of StD phosphorylation sites is actually diverging in its regulation. Finally, state-

divergence could also be inflated by false positive identifications in one species or the

other.

We examined which scenario or scenarios were compatible with our data. According to the

first scenario, StD may mostly result from false-negative phosphorylation sites in the data.

This is certainly the case for a substantial part of the data as our dataset contains twice as

much phosphorylation data for humans as for mouse, and humans are not expected to have

more phosphorylation sites than mouse. We reasoned that if state-divergence is caused by

false-negatives in the datasets, we would expect to see the fraction of StC to increase as a

function of protein abundance, since highly abundant proteins are more likely to be

sampled in both species than rare proteins. Indeed, we found that the proportion of state

conserved sites almost doubles between the two extreme classes of abundance (Figure

4.1D, see also Figure 4.2A). Admittedly, this effect could also be caused by the fact that

phosphoregulation is more conserved on highly-expressed proteins but it is unlikely, as it

was recently shown that abundant proteins are enriched in non-functional phosphorylation

sites (Levy et al, 2012) that evolve relatively rapidly (Landry et al, 2009). In addition, only

conserved residues are considered in this analysis.
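
The abundance analysis described above can be sketched as follows. This is a minimal illustration and not the code behind Figure 4.1D: the function name, the use of equal-sized abundance classes and the toy data are all assumptions made for the example.

```python
def stc_fraction_by_abundance(sites, n_classes=4):
    """Return the fraction of state-conserved (StC) sites per abundance class.

    `sites` is a list of (protein_abundance, is_stc) pairs, one per site that
    is conserved at the residue level. Sites are ranked by abundance and split
    into `n_classes` classes of (nearly) equal size, lowest abundance first.
    """
    if not sites:
        raise ValueError("no sites provided")
    if not 1 <= n_classes <= len(sites):
        raise ValueError("n_classes must be between 1 and the number of sites")
    ranked = sorted(sites, key=lambda site: site[0])
    n = len(ranked)
    fractions = []
    for k in range(n_classes):
        chunk = ranked[k * n // n_classes:(k + 1) * n // n_classes]
        fractions.append(sum(1 for _, is_stc in chunk if is_stc) / len(chunk))
    return fractions


# Toy data (hypothetical): abundance in arbitrary units, True = StC, False = StD.
toy_sites = [(1, False), (2, False), (3, False), (4, True),
             (5, False), (6, True), (7, True), (8, True)]
low, high = stc_fraction_by_abundance(toy_sites, n_classes=2)
print(low, high)
```

In the toy data the StC fraction rises from 0.25 in the low-abundance class to 0.75 in the high-abundance class, which is the kind of trend expected if StD sites are inflated by false negatives on rarely sampled proteins.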

We also examined whether StC or StD phosphorylation sites were more likely to be found

in housekeeping or tissue-specific proteins. Housekeeping proteins are expressed in all

tissues, while tissue-specific ones are expressed in one or a few tissues. Accordingly, if StD

sites are affected by false negatives we would expect to find them preferentially in tissue-

specific proteins. We examined the dataset of housekeeping genes (Eisenberg & Levanon,

2003) and tissue-specific genes (Liu et al, 2008) and found that StC sites are preferentially

found in housekeeping proteins compared to StD sites (proportions: 0.027 vs. 0.019, p-

value = 0.005, Figure 4.1E), while the trend is reversed if we look at tissue-specific proteins

(proportions: 0.268 for StD vs. 0.219 for StC, p-value = 6.1 × 10⁻⁵, Figure 4.1E).

This result is in agreement with our hypothesis that StD sites are affected by false

negatives, although this effect could be due to the fact that phosphoregulation is more

conserved on housekeeping proteins.

In order to examine whether non-functional phosphorylation sites could contribute to poor

state-conservation between species, we used a manually curated dataset of functional

phosphorylation sites compiled by Landry and collaborators (Landry et al, 2009).

Functional sites were identified as sites for which a phenotype was observed when

phosphorylatable residues were mutated. If non-functional sites contribute to state-

divergence, we would expect functional sites to be overrepresented in StC sites. We found

that StC sites are enriched in functional phosphorylation sites compared to StD sites

(proportions: 0.0025 vs. 0.00046, p-value < 1.19 × 10⁻¹⁴, Figure 4.1F). This observation

suggests that a fraction of the StD sites we identified might be non-functional

phosphorylation sites, which would explain their poor conservation status between species.

It is important to consider that in both cases these observations are not biased by residue

conservation as both StC and StD categories are composed of only phosphorylatable

residues.
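
Comparisons of proportions such as those reported above can be illustrated with a simple test. The text does not state which statistical test was used, so the two-sided two-proportion z-test below is an assumption chosen only for illustration, and the site counts are hypothetical (only the proportions 0.027 and 0.019 come from the text).

```python
import math


def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for the difference between two proportions.

    k1 of n1 sites in the first class (e.g. StC) and k2 of n2 sites in the
    second class (e.g. StD) carry the property of interest.
    Returns (z, p_value).
    """
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (k1 / n1 - k2 / n2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts giving proportions of 0.027 and 0.019.
z, p_value = two_proportion_z_test(270, 10_000, 190, 10_000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

The resulting p-value depends strongly on the number of sites in each class, so the hypothetical counts used here are not expected to reproduce the p-values reported in the text.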

4.5.2 - A role for state-diverged sites in phosphoproteome divergence

Our observation that the majority of StD sites might result from false-negative

phosphorylation site identifications or might be non-functional does not rule out the

possibility that at least some of these sites could be actual StD sites that diverge in

regulation, for instance due to the sequences surrounding the phosphorylated residues.

Kinase recognition motifs on substrates are difficult to compare directly due to their

degeneracy (Ubersax & Ferrell, 2007). We therefore relied on kinase prediction tools for

our analyses. We assigned each site to a protein kinase using the NetPhorest classifier

(Miller et al, 2008) to associate protein kinases with all phosphorylation sites based on the

site flanking sequences. NetPhorest classification is based on an atlas of consensus

sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding

domains and was built using in vivo and in vitro experimental data (Miller et al, 2008). If a

site is phosphorylated in one species but not in the other, the sequences surrounding the

phosphorylatable residue should match a kinase consensus motif better for the

phosphorylated site than for the orthologous non-phosphorylated one. Given that

NetPhorest provides a score (from 0 to 1) for many possible kinase-substrate associations,

we selected the kinase having the best NetPhorest score and we used this score as a proxy

to assess the probability of a given site to be phosphorylated. We relaxed this assumption in

some of our analyses. In addition, we performed the same analyses directly using a

collection of position weight matrices derived from mammalian kinases and the results are

in agreement with what we find with the NetPhorest predictions (Figure S4.5).
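
The selection of the best-scoring kinase for each site can be sketched as follows. This is an illustrative sketch only: it does not call NetPhorest, the function names are hypothetical, and the per-kinase scores are invented values on the 0 to 1 scale described above.

```python
def best_kinase(scores):
    """Return the (kinase, score) pair with the highest score for one site."""
    if not scores:
        raise ValueError("no kinase scores for this site")
    kinase = max(scores, key=scores.get)
    return kinase, scores[kinase]


def score_difference(phospho_scores, ortholog_scores):
    """Best score of the phosphorylated site minus best score of its ortholog."""
    return best_kinase(phospho_scores)[1] - best_kinase(ortholog_scores)[1]


# Hypothetical per-kinase scores for one pair of orthologous sites.
human_site = {"CK2": 0.41, "PKA": 0.20, "DMPK": 0.12}
mouse_site = {"CK2": 0.18, "PKA": 0.15, "DMPK": 0.33}

print(best_kinase(human_site))
print(best_kinase(mouse_site))
print(round(score_difference(human_site, mouse_site), 2))
```

Keeping only the top score discards information about the other candidate kinases, which is one reason to relax this assumption in some analyses, for example by considering several of the best-scoring kinases.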

We first examined whether there was an association between S/T/Y phosphorylation and

NetPhorest scores and found that the probability for a site to be phosphorylated strongly

increases with increasing NetPhorest scores in both mouse and human data (Figure S4.6).

Another result in support of this observation is that the fraction of state conserved sites

increases as a function of NetPhorest scores (Figure 4.2A) and this relationship is

independent from protein abundance. We also found that prediction scores are very similar

for StC sites (median scores: 0.32 for the human phosphorylation sites vs. 0.32 in mouse

ones, p-value = 0.54) and higher than those of sites conserved at residue level but non-

phosphorylated in both species (median scores: 0.32 for StC vs. 0.20 for non-

phosphorylated residues, p-value = 2.2 × 10⁻¹⁶; Figure 4.2B and Figure S4.7A-B). This

confirms again a strong association between NetPhorest scores and the probability that a

site is phosphorylated. Surprisingly, we found that scores of StC sites were also higher than

the scores of the phosphorylated residues in the StD class (median scores: 0.32 vs. 0.22 for

humans, p-value = 2.2 × 10⁻¹⁶; 0.32 vs. 0.26 for mouse, p-value = 2.2 × 10⁻¹⁶; Figure 4.2B-

C and Figure S4.7A-B). This means that sites that are conserved and phosphorylated in

both species have a significantly better match to consensus kinase motifs than those that are

conserved at the residue level but phosphorylated in one species only.

Figure 4.2. Analysis of NetPhorest scores for the different classes of sites.

(A) Fraction of StC sites as a function of NetPhorest scores and protein abundance. (B) Comparison

of NetPhorest scores for human and mouse phosphorylated and non-phosphorylated residues

(Wilcoxon tests). (C) Comparison of NetPhorest scores for StD sites (Wilcoxon tests). (D)

Correlation between human and mouse NetPhorest scores for StC sites (red) and StD sites

phosphorylated in human but not in mouse (black). (E) Correlation between human and mouse

NetPhorest scores for StC sites (red) and StD sites phosphorylated in mouse but not in human

(black). (F) Proportion of phosphorylated sites that have higher NetPhorest scores compared to their

corresponding site in the other species for StC and StD sites. Comparisons of human and mouse

scores calculated with position weight matrices are shown in Figure S4.5. *: p-value < 0.05; **: p-

value < 0.01; ***: p-value < 0.001.

There are several possible explanations for these differences. First, this result could derive

from how predictive tools have been developed. For instance, phosphorylation sites may be

more often studied on abundant proteins, which would imply that kinase prediction tools

are better trained at recognizing phosphorylation sites present on abundant proteins. We

tested this hypothesis and found that there is no increase in the average NetPhorest scores

as a function of protein abundance (Figure S4.8), showing that the NetPhorest classification

is not biased towards sites present in highly abundant proteins. Another possibility is that

StD sites contain a significantly higher proportion of false-positive phosphorylation sites

compared to StC sites, as the latter have been found to be phosphorylated in the two species

in completely independent experiments and thus have much stronger experimental support.

Indeed, false-positive sites would have low NetPhorest scores, similar to non-

phosphorylated ones, and would therefore contribute to lowering the average NetPhorest score

for the residues that are phosphorylated in StD sites compared to StC sites. A third

possibility is that StD sites could contain a proportion of non-functional phosphorylation

sites with non-consensus motifs as shown before by Landry and collaborators (Landry et al,

2009) who found that phosphorylation sites matching kinase motifs have a higher degree of

evolutionary conservation and are thus more likely to be functional. Altogether, these

results suggest that the match to a consensus sequence motif could be used to prioritize

phosphorylation sites for downstream functional analysis in phosphoproteomics

experiments.

Despite these potentially confounding factors, we found evidence that StD is at least partly

caused by divergence in regulatory motifs. We found that scores of phosphorylated StD

sites are significantly higher than those of their non-phosphorylated orthologous

counterparts in both pairwise comparisons (phosphorylated in human vs. non-

phosphorylated in mouse, median scores: 0.216 vs. 0.214, p-value = 3.93 × 10⁻⁵;

phosphorylated in mouse vs. non-phosphorylated in humans, median scores: 0.255 vs.

0.245, p-value = 6.38 × 10⁻⁵; Figure 4.2C). The fact that we see the effects in both

directions rules out the possibility that NetPhorest scores are systematically higher in

humans. In order to identify among the set of StD sites the ones that have the potential to be

true StD sites, we directly compared matching orthologous NetPhorest scores of StC and

StD sites. We found a strong correlation between the NetPhorest scores for StC sites (rho =

0.95, p-value < 2.2 × 10⁻¹⁶) and a weaker correlation between the scores of the StD sites,

both for those phosphorylated in humans but not in mouse (rho = 0.89, p-value <

2.2 × 10⁻¹⁶, Figure 4.2D) and for those phosphorylated in mouse but not in humans (rho =

0.88, p-value < 2.2 × 10⁻¹⁶, Figure 4.2E). This result is confirmed when comparing the

proportion of StD sites having higher scores in humans than in mouse to the same

proportion calculated for StC. We found a slight but significant excess of StD sites having

higher scores in human than in mouse compared to StC sites (proportions: 0.284 vs. 0.258,

p-value = 8.69 × 10⁻¹³, Figure 4.2F). We found similar results for the StD sites having

higher scores in mouse compared to humans (proportions: 0.291 vs. 0.261, p-value = 8.69 ×

10⁻¹¹, Figure 4.2F). By summing up all these excess StD sites that show high NetPhorest

scores in one organism but low scores in the other, we concluded that at least 5% of the

StD sites (either phosphorylated in human or mouse) present in our dataset have the

potential to be sites that are differentially regulated between species, despite a conservation

of the actual phosphorylatable residues. Our results do not depend on the NetPhorest

algorithm as we performed the same analyses using position weight matrices available from

the literature (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al,

2009; Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010;

Pike et al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012) and all of

our conclusions about StC and StD sites were mirrored in these tests, as shown in Figure

S4.5. Overall, our results show that in addition to the actual divergence in phosphorylated

sites (SiD), a significant fraction of the mouse and human phosphoproteomes have diverged

through changes in the kinase recognition motifs. These changes in the phosphoregulatory

status of proteins represent changes in the protein regulatory network, as illustrated for a

particular subnetwork in Figure 4.3.
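
The estimate of at least 5% can be illustrated with the proportions reported above. The calculation below is one possible reading of how the figure follows from those proportions; the exact procedure is not detailed in this section, and the variable names are chosen for the example.

```python
# Proportions reported in the text (Figure 4.2F).
std_higher_in_human, stc_higher_in_human = 0.284, 0.258
std_higher_in_mouse, stc_higher_in_mouse = 0.291, 0.261

# Excess of StD sites over the StC baseline in each direction.
excess_human = std_higher_in_human - stc_higher_in_human
excess_mouse = std_higher_in_mouse - stc_higher_in_mouse

total_excess = excess_human + excess_mouse
print(f"{excess_human:.3f} + {excess_mouse:.3f} = {total_excess:.3f}")
```

Under this reading the two excesses are 0.026 and 0.030, which sum to about 0.056 and are consistent with the lower bound of 5% of StD sites.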

Figure 4.3. Comparison of a pair of StC and StD sites.

(A) Example of StC site (human protein: NUCL; site S28). Both sites are predicted to be

phosphorylated by the same kinase (CK2) by NetPhorest. The human and mouse kinase-

phosphorylation networks are shown for the 10 StC sites with the highest NetPhorest scores (Table

S2). The width of the edges is proportional to the NetPhorest score. (B) Example of StD site

(human protein: NIN; site S1145). The two phosphorylation sites are predicted to be phosphorylated

by different kinases (human: CK2, mouse: DMPK) by NetPhorest. The human and mouse kinase-

phosphorylation networks are shown for the 10 StD sites with the highest difference in NetPhorest

scores (Annex 2, Table S4.2). Dotted lines represent predicted kinase-phosphorylation site

associations that have been rewired in mouse considering the human network as reference.

Potential StD sites are located in proteins that have fundamental cellular functions, making

them good candidates for the investigation of species-specific mechanisms of regulation.

Further examples are available in Annex 2, Table S4.2.

4.5.3 - Evolutionary turnover of mammalian phosphorylation sites

We next examined whether the positional turnover of phosphorylation sites could

contribute to SiD between mouse and humans. One prediction of this model is that sites

that are lost in one lineage could be compensated for by the gain of other sites in the

proximity (Freschi et al, 2011). Similarly, sites could change their positions as a result of

insertions and deletions in the surrounding regions. In order to test this prediction, we

developed an algorithm to identify evolutionary clustered sites (Freschi et al, 2011), i.e.

pairs of sites that are SiD between mouse and humans and that are closer to each other in

the linear protein space than expected by chance alone (Annex 2, Figure S4.1).
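
The idea of testing whether two SiD sites lie closer to each other than expected by chance can be illustrated with a deliberately simplified neutral model. The sketch below is not the algorithm of Freschi et al (2011), whose details are not given in this section. It assumes, for illustration only, that the site of one species could have occurred with equal probability at any of a list of candidate alignment columns; the function name and the positions are hypothetical.

```python
def clustering_p_value(human_pos, mouse_pos, candidate_positions):
    """Fraction of candidate placements at least as close as the observed pair.

    Positions are alignment columns. `candidate_positions` lists every column
    where the mouse-specific site could have been placed under the simplified
    neutral model (uniform placement), including the observed one.
    """
    if mouse_pos not in candidate_positions:
        raise ValueError("observed position must be one of the candidates")
    observed = abs(human_pos - mouse_pos)
    as_close = sum(1 for pos in candidate_positions
                   if abs(human_pos - pos) <= observed)
    return as_close / len(candidate_positions)


# Hypothetical example: a human-specific site at column 100, a mouse-specific
# site at column 104, and 20 columns where the mouse site could have occurred.
candidates = [104] + list(range(150, 530, 20))
p = clustering_p_value(100, 104, candidates)
print(p)
```

With only 20 candidate positions the smallest attainable p-value is 0.05, which shows why proteins offering few possible configurations give little power to detect clustering under a model of this kind.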

We found that 123 site pairs belonging to 68 proteins show significant evolutionary

clustering of SiD phosphorylation sites (Annex 2, Table S4.3; alignments are available in

Annex 2, Dataset S1). Ninety percent of the proteins that contain evolutionary clustered site

pairs have only one or two of them (Annex 2, Figure S4.9) with few exceptions (Annex 2,

Table S4.4). This number also excludes proteins for which we found a high number of

evolutionary clustered site pairs due to large clusters of sites that we did not consider

(NOL8, 10; KI67, 27; MDC1, 180 site pairs). The median NetPhorest score for these sites

is 0.29, suggesting that they are likely to be phosphorylated and not false-positives (0.20 is

the median score for non-phosphorylated residues while 0.32 is the median score for

phosphorylated residues). The typical window within which we found significant clustering

between SiD sites is 10 amino acids (Annex 2, Figure S4.10) and approximately 80% of the

sites are less than 40 amino acids distant in the alignment. The observed number of site

pairs (n = 123) is likely an underestimate of the contribution of evolutionary site turnover

because we need many possible configurations in the neutral model to identify them and

phosphoproteomes have likely been undersampled. We found that the proportion of

proteins that show significant evolutionary clustering increases with the proportion of

available sites (Annex 2, Figure S4.11). Furthermore, we found that the number of

evolutionary clustered sites is correlated with protein size (rho = 0.26, p-value = 0.03) and

may thus be biased towards large proteins.

If these clustered SiD sites were functionally equivalent at the network level between the

two species, we would expect them to be phosphorylated by the same kinases or group of

kinases. We again used NetPhorest to test this hypothesis. We determined the proportion of

StC, StD and evolutionary clustered sites that were likely to be phosphorylated by the same

kinases or group of kinases (overlap of one or more kinases among the three best kinases

predicted by NetPhorest) (Tan et al, 2009) and we compared these observations to the

random expectations obtained by shuffling the mouse kinase-substrate associations. We

found that the proportion of StC and StD sites predicted to be phosphorylated by the same

kinases or group of kinases was more than 7 times greater than expected by chance alone,

suggesting that, globally, these sites tend to be phosphorylated by the same kinases or

group of kinases (Figure 4.4A-B).
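
The comparison with random expectations obtained by shuffling the mouse kinase-substrate associations can be sketched as a permutation test. This is a minimal sketch under stated assumptions: the function names, the number of permutations, the way the p-value is computed and the toy kinase labels are all hypothetical, and the predictions are not produced by NetPhorest here.

```python
import random


def shares_kinase(human_top, mouse_top):
    """True if the two sets of predicted kinases overlap by one or more kinases."""
    return bool(set(human_top) & set(mouse_top))


def overlap_proportion(human_sets, mouse_sets):
    """Proportion of orthologous site pairs that share a predicted kinase."""
    if not human_sets or len(human_sets) != len(mouse_sets):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(shares_kinase(h, m) for h, m in zip(human_sets, mouse_sets))
    return hits / len(human_sets)


def shuffle_test(human_sets, mouse_sets, n_perm=1000, seed=0):
    """Compare the observed overlap with overlaps after shuffling mouse sites."""
    rng = random.Random(seed)
    observed = overlap_proportion(human_sets, mouse_sets)
    shuffled = list(mouse_sets)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between orthologous sites
        null.append(overlap_proportion(human_sets, shuffled))
    expected = sum(null) / n_perm
    # One-sided empirical p-value with the usual +1 correction.
    p_value = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, expected, p_value


# Toy data (hypothetical labels): 8 site pairs whose three best kinases are
# identical in human and mouse and do not overlap between different sites.
human_top3 = [{f"kinase{3 * i + j}" for j in range(3)} for i in range(8)]
mouse_top3 = [set(s) for s in human_top3]
observed, expected, p_value = shuffle_test(human_top3, mouse_top3)
print(observed, expected, p_value)
```

The ratio of the observed proportion to the mean shuffled proportion gives a fold enrichment of the kind reported in the text, although the toy data are far more extreme than real predictions.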

Figure 4.4. Proportion of sites that are phosphorylated by the same protein kinase.

(A) Proportion of sites phosphorylated by the same kinases (NetPhorest predictions) for the

different categories of sites (StD: state diverged, StC: state conserved, ECS: evolutionary clustered

sites). Black dots represent the observed proportion. Orange lines represent the range of proportions

expected by chance alone. P-values for StC and StD: < 0.0001; p-value for ECS: 0.03. The

histogram shows the distribution of expected proportions for ECS. A similar analysis was performed

using position weight matrices (Figure S4.5). (B) Proportion of sites phosphorylated by one or more

shared kinases (kinase group) among the three best kinases predicted to be associated with each site

according to NetPhorest. P-values for StC, StD and ECS: < 0.01.

We found a marginally significant tendency (p-value = 0.03) for the evolutionary clustered

sites to be phosphorylated by the same kinase (Figure 4.4A). We then performed the same

analysis, but considering the three best kinases found by NetPhorest assuming that

phosphorylation sites could be functionally conserved if they are phosphorylated by closely

related kinases as well, as in Tan et al. (Tan et al, 2009). We found that evolutionary

clustered sites were 1.4 times more likely to be phosphorylated by the same group of

kinases than expected by chance alone (p-value < 0.01; Figure 4.4B). This result suggests

that, in general, many evolutionary clustered sites may actually be functionally equivalent.

Finally, we performed this analysis using position weight matrices available from the

literature (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al, 2009;

Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010; Pike et

al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012) and found

qualitatively similar results (Annex 2, Figure S4.5F).

Evolutionary clustered sites could arise through losses and gains of phosphorylation sites in

the two lineages. Our algorithm identifies evolutionary clustered sites, but it cannot tell

whether these represent gains of phosphorylation sites that compensated for deleterious

losses in the same lineage or whether they were simply the result of indels that affected the

position of the sites in the human and mouse protein alignments. We therefore aligned the

mouse and human proteins with several orthologs belonging to species that diverged after

the human-mouse divergence (Figure 4.5A) and manually curated the data in order to

identify the possible evolutionary steps that led to these configurations of phosphorylation

sites.

Figure 4.5. Evolutionary histories of candidate functionally redundant site pairs.

(A) Phylogeny of the species considered for the analysis of evolutionary clustered sites. For all

species we show the species name, the three-letter identifier and the common name. (B) Alignment

of the Fanconi anemia group M protein (FANCM). Evolutionary clustered sites are indicated in

bold. Residues that have been reported to be phosphorylated are on a green background. (C)

Alignment of the disabled homolog 2 protein (DAB2). (D) Alignment of the low-density lipoprotein

receptor-related protein (LRP2).

We manually identified many cases (n = 17, 14%) of evolutionary clustered sites that were

most likely caused by indels changing protein length and thus alignment. An example is in

the Fanconi anemia group M protein, an ATPase implicated in DNA repair (Meetei et al,

2005) in which S1673 and S1674 are shifted towards the C-terminus in the mouse lineage

(Figure 4.5B). The remaining 86% (n = 106) of the cases of evolutionarily clustered sites

could not be simply explained by indels and may thus represent compensatory evolutionary

events. We observed such a case in the protein DAB2 (human site: S723; mouse site:

S731), which plays a potential role in ovarian carcinogenesis (Fazili et al, 1999) (Figure

4.5C). The human S723 has been gained after the split of the Haplorrhini from the other

primates, while the second one (S731) has been lost after the split between the rodents and

the primates. Another example involves the human T4634 and the mouse site S4632 on

LRP2 (Figure 4.5D). This protein is a membrane receptor of absorptive epithelial cells.

Mutations in this protein are associated with Donnai-Barrow syndrome, a genetic syndrome

that leads to defects in vision, hearing, craniofacial features and structural abnormalities in

brain (Kantarci et al, 2007). In this case the human T4634 site appeared in primates after

the split from rodents, while the mouse S4632 site was lost after the split of the

Strepsirrhini from the other primates. The biological function of these phosphorylation sites

has not been determined but they represent prime candidates for exploring, at the molecular

level, the positional redundancy of phosphorylation sites.

4.6 – Conclusion

Here we compared the human and mouse phosphoproteomes in order to gain a detailed

picture of phosphoregulatory conservation and divergence between these two species. We

found that, globally, phosphorylation sites tend to be conserved between human and mouse.

By using phosphorylation data from both species, we showed that the number of the sites

that are phosphorylated in both human and mouse is 2.5 times higher than expected by

chance alone. In addition, we estimated phosphorylation status divergence. We found that

the majority of phosphorylation sites that are conserved at the residue level between human

and mouse are actually divergent with respect to their phosphorylation status (StD sites).

While this is most likely largely due to incomplete coverage between the two species, we

showed that at least 5% of the StD sites are actually diverging at the kinase-substrate

interaction level. We also found that phosphorylation sites that are phosphorylated in both

species are more likely to be functional and have higher kinase assignment scores,

suggesting that this conservation criterion could be used to prioritize phosphorylation sites

for further characterization (Beltrao et al, 2012; Landry et al, 2009). Taken together, these

results suggest that more data is needed in these two species to be able to completely assess

the conservation and divergence of their phosphoproteomes. Furthermore, the candidate

StD sites might have specific regulatory properties that still have to be characterized and

understood. A better understanding of these properties will allow us to make an important

step forward in our attempt to describe and explain how small regulatory differences map

to the important phenotypic differences among species. Mouse is the best model system to

study human biology and diseases. It is therefore important to understand how these two

species diverge and phosphoregulatory evolution may play an important role.

We identified sites that are phosphorylated in one species but that have diverged in the

other so that the site is not phosphorylatable (SiD sites). While the biological meaning of

the majority of these sites still remains to be assessed, our analysis suggests that many of

them could be functionally redundant. This result supports the finding by Moses and

collaborators that phosphorylation site evolutionary turnover has a role in shaping

phosphoregulation (Moses et al, 2007). If the redundancy hypothesis holds true, we might

need to revisit estimations of phosphorylation conservation, since omitting positional

redundancy may lead to an underestimation of phosphorylation site functional

conservation. Moreover, this implies that we should consider different categories of

phosphorylation sites: the ones for which the position along the protein is a determinant for

their function (positionally-dependent phosphorylation sites) and those for which the global

charge rather than the exact position is responsible for their function (positionally-flexible

phosphorylation sites).

4.7 – Acknowledgements

We thank A. Moses, A. Nguyen Ba and all members of the Landry laboratory for their

comments on the manuscript. We also thank B. Turk (Yale University) for providing the

position weight matrices used in this study. This work was supported by Canadian

Institutes of Health Research (CIHR) (GMX-191597). C. R. Landry is a CIHR New

Investigator. L. Freschi was supported by a fellowship from the Fonds de Recherche du

Québec - Nature et Technologies (FRQ-NT) and L. Freschi and M. Osseni by the Quebec

Research Network on Protein Function, Structure and Engineering (PROTEO).

Chapter 5 – Cross-talk between O-GlcNAcylation and phosphorylation in mammalian proteomes

5.1 – Résumé

Post-translational modifications are molecular switches that allow the cell to exert fine

control over the function of its proteins. In some cases a residue can undergo several of

these modifications, which can switch the same protein function or different functions on

and off. This is the case, for example, for phosphorylation and glycosylation, which affect

the serines and threonines of proteins. Here, we studied whether these two modifications

could act as switches for the same biological function or for different functions. We found

that residues that can reach three states (unmodified, phosphorylated, O-GlcNAcylated)

have a higher level of conservation than those that can reach only two states (unmodified,

phosphorylated or unmodified, O-GlcNAcylated). Moreover, we found that residues that

can reach three states tend to be phosphorylated by different kinases compared with

residues that can reach only two states. Our results support the hypothesis that

phosphorylation and O-GlcNAcylation control two different functions rather than the same

function.

5.2 - Abstract

Post-translational modifications (PTMs) are molecular switches that allow the cell to finely

tune protein functions. In some cases a residue can be modified by multiple and alternative

PTMs that can activate/deactivate the same protein function or different functions. This is

the case for serine and threonine residues, which can be phosphorylated and O-GlcNAcylated.

Here, we investigate whether these two PTMs may act as switches for the same biological

function or different functions. We found that there is a greater evolutionary constraint for

the residues that can shuttle between 3 states (non-modified, phosphorylated, O-

GlcNAcylated) compared to the ones that can shuttle between 2 states only (non-modified,

phosphorylated or non-modified, O-GlcNAcylated). Moreover, we found that 3-state and 2-

state residues are likely to be regulated by different sets of kinases. Our results support the

hypothesis that at least in humans, phosphorylation and O-GlcNAcylation control multiple

functions rather than the same one.

5.3 - Introduction

Post-translational modifications (PTMs) are chemical modifications of proteins that allow

the modulation of protein functions and represent a means to extend the coding capacity of

genes (Prabakaran et al, 2012). PTMs modulate protein activity, localization, degradation

and interactions (Khmelinskii et al, 2009; Madeo et al, 1998; Sprang et al, 1988; Vazquez

et al, 2000). Proteins can undergo several PTMs, and advances in mass spectrometry

technologies over the last decade make it possible to screen entire proteomes for the

identification and quantification of these PTMs (Olsen & Mann, 2013). Examples of PTMs

include protein phosphorylation, the addition of a phosphate group to serines, threonines

and tyrosines and O-GlcNAcylation, the addition of an O-linked β-N-acetylglucosamine

moiety to serines and threonines (Zeidan & Hart, 2010). Given the large number of

modifications any protein can bear, one major question that emerged recently is whether

these PTMs affect each other’s function, i.e. whether they cross-talk with each other (Beltrao

et al, 2013; Brooks & Gu, 2003; Hunter, 2007; Latham & Dent, 2007). This interaction

would in principle define a PTM “code” that would allow the cell to implement complex

regulatory programs at the level of single proteins. Indeed, each PTM allows the protein to

assume a new configuration or state that often determines changes in protein function

(Deribe et al, 2010).

Two general modes of cross-talk have been reported in the literature: positive and negative

cross-talk (Hunter, 2007). Positive cross-talk refers to a scenario in which one PTM

promotes the direct or indirect addition or removal of a second modification. An example

of this mode of cross-talk is the phosphorylation-dependent ubiquitination of the Sic1p

protein in yeast (Nash et al, 2001; Verma et al, 1997). Sic1p is an inhibitor of the Cyclin

Dependent Kinases (CDK), important regulators of cell cycle progression. This inhibition

has to be released in order to start DNA replication. The phosphorylation of Sic1p by

CDKs at multiple sites allows the ubiquitinylation of Sic1p by Cdc4. This event triggers

Sic1p degradation, thus allowing the cell to progress through the cell cycle. Another

notable example of positive cross-talk is the interplay between lysine residues on the

human histones whereby the methylation of Lys-27 of histone H3 increases the probability

of the methylation of Lys-36 (Schwammle et al, 2014).

The opposite mode of action, negative cross-talk, implies that one PTM prevents

another modification from occurring. Examples of negative cross-talk have been reported and

include again the methylated lysines of histone H3. For instance, the tri-methylation of Lys-

4 inhibits the methylation of Lys-9 (Schwammle et al, 2014). The importance of these

cross-talks is illustrated by their use by microorganisms to take control of the cell or

shut down the immune response. An example of this scenario is represented by

the human protein MAPKK6. The phosphorylation of this protein on critical serine and

threonine residues is required to activate the downstream MAPK kinases in the innate

immune response to pathogens. Mukherjee and collaborators (Mukherjee et al, 2006) found

that Yersinia species use the effector protein YopJ to acetylate these critical residues on

MAPKK6. This competition between phosphorylation and acetylation for the same sites

prevents the activation of MAPKK6, allowing Yersinia to usurp the eukaryotic cellular

signalling and block a pathway that is crucial for the innate immune response activation.

Of particular interest is the cross-talk among PTMs that occur on the same residues.

PTMs occurring on the same residues are by definition mutually exclusive and thus have the

potential to directly affect each other’s functions. Examples of such PTMs reported in the

literature are acetylation, ubiquitinylation, methylation and SUMOylation on lysine

residues (Latham & Dent, 2007) as well as O-GlcNAcylation and phosphorylation on

serine and threonine residues (Hart et al, 2011). Although different PTMs can regulate the

same function, they could also regulate different protein functions (Beltrao et al, 2013;

Benayoun & Veitia, 2009). For instance, previous studies have shown that the cross-talk

between protein lysine acetylation and ubiquitination has effects on protein stability (Caron

et al, 2005). Acetylation at lysine residues of proteins prevents their ubiquitination and,

ultimately, their degradation. In this case, the cross-talk between acetylation and

ubiquitinylation regulates the same protein property or function: protein stability. On the

other hand, different PTMs occurring on the same site can regulate two distinct functions in

different contexts, for instance in different tissues, steps of the cell cycle or different

cell compartments. For instance, Kamemura and collaborators (Kamemura et al, 2002)

found that the Thr-58 residue of the c-Myc protein is preferentially phosphorylated or O-


GlcNAcylated in a condition-dependent fashion, depending on the presence or absence of mitogens,

suggesting that different cellular roles of c-Myc are regulated by these two PTMs.

Here we examine the putative cross-talk between protein phosphorylation and O-

GlcNAcylation. In the last decade, interest in O-GlcNAcylation and its interaction with

phosphorylation has grown, as shown by recent studies that have unveiled the role of

this modification in regulating key steps of cellular metabolism (Ruan et al, 2013). Further,

O-GlcNAcylation is one of the few post-translational modifications for which more than

1,000 sites have been experimentally detected (Khoury et al., 2011).

Phosphorylation and O-GlcNAcylation occur on a specific set of serine and threonine

residues. Some residues are not modified and can therefore be found in only one

configuration (1-state sites); others are phosphorylated but not glycosylated or vice-versa,

thus having one more possible configuration (2-state sites). Finally some sites can be

glycosylated and phosphorylated (on different molecules of the same protein or at different

times) and can therefore be found in three states (3-state sites). We sought to determine if

phosphorylation and O-GlcNAcylation may act on 3-state sites as two independent

switches that regulate different biological functions or may act as a single switch to

regulate one single function. We hypothesized that the two PTMs regulate two functions. In

this case, we should observe that (i) phosphorylation and O-GlcNAcylation do occur at the

same residues more often than expected by chance. In addition, we would expect to observe

that (ii) the 3-state sites evolve slower than the 2-state ones, since the two functions

constitute a stronger constraint. This trend should be observed if we consider the site or its

flanking regions, since PTM sites are defined by motifs of amino acids rather than single

amino acids. Finally, we would expect (iii) the 3-state sites to show different preferences

for protein kinases compared to the 2-state ones, since for the 3-state sites the

phosphorylation is expected to be more condition-dependent (e.g. (Kamemura et al, 2002)).

We tested all these predictions on the human and mouse phosphoproteomes. Overall, our

analyses support the hypotheses that there is a cross-talk between phosphorylation and O-

GlcNAcylation and that these two PTMs are likely to control different cellular functions.
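The 1-state, 2-state and 3-state categories defined above can be made concrete with a short sketch. The Python below is illustrative only: the function names and the representation of modification sites as sets of 1-based positions are assumptions made for this example, not the format of the datasets used in this study.

```python
from typing import Dict, Set


def classify_site(position: int, phospho: Set[int], glyco: Set[int]) -> str:
    """Return the state label of a single serine/threonine position."""
    is_phospho = position in phospho
    is_glyco = position in glyco
    if is_phospho and is_glyco:
        return "3-state"   # unmodified, phosphorylated or O-GlcNAcylated
    if is_phospho or is_glyco:
        return "2-state"   # unmodified or carrying one PTM type
    return "1-state"       # never observed as modified


def classify_protein(sequence: str, phospho: Set[int], glyco: Set[int]) -> Dict[int, str]:
    """Label every S/T residue of a protein (positions are 1-based)."""
    return {
        pos: classify_site(pos, phospho, glyco)
        for pos, residue in enumerate(sequence, start=1)
        if residue in "ST"
    }


# Hypothetical toy protein: S2 is phosphorylated only, T3 carries both PTMs.
labels = classify_protein("MSTAS", phospho={2, 3}, glyco={3})
```

In this toy example, position 5 is a serine with no reported modification and is therefore labelled as a 1-state site.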


5.4 - Methods

Phosphorylation, O-GlcNAcylation, protein disorder, sequence data and protein

abundance data

An extensive dataset of human and mouse phosphorylation and O-GlcNAcylation sites was

built by combining data from 8 different databases and experimental studies about

phosphorylation (Beltrao et al, 2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al,

2012; Huttlin et al, 2010; Keshava Prasad et al, 2009; Minguez et al, 2012; Trinidad et al,

2012) and 5 about O-GlcNAcylation (Alfaro et al, 2012; Hornbeck et al, 2012; Lu et

al, 2013; Trinidad et al, 2012; Wang et al, 2011). To our knowledge this set of

phosphorylation and O-GlcNAcylation sites is representative of the data currently available

in the literature. All protein sequences and orthology relationships were retrieved from

ENSEMBL (version 69). Only protein sequences for which we could find orthology

relationships between a human protein and at least a mouse, dog and opossum protein were

considered. For human and mouse, orthology relationships were determined for the

longest isoforms of each protein. Each group of orthologous sequences was aligned using

MUSCLE (Edgar, 2004). Disordered and ordered regions of proteins were predicted using

DISOPRED (Ward et al, 2004a). In order to map phosphorylation sites to our sequences,

the following procedure was applied. The sites that were already mapped onto proteins

associated with ENSEMBL IDs in the original datasets were directly mapped to our

sequences. For all other cases, phosphopeptides were mapped onto proteins using BLAT

(Kent, 2002). All peptides that mapped to more than one protein were removed at this step.

Mapped phosphorylation and O-GlcNAcylation sites and information about protein

disorder are available in Dataset S5.1 (available on request). Finally, data about protein

abundance was retrieved from PaxDb (Wang et al, 2012).

Shuffling procedure used to determine random expectations

In order to calculate the random expectation for 3-state modified sites, O-GlcNAcylation

sites were shuffled in each protein by preserving the overall proportion of sites for each

residue (S or T) and the localization in disordered/ordered regions. The null distributions


were estimated by iterating this procedure 1000 times, calculating each time the number of

3-state sites. To calculate the random expectations for the localization of the 3-state sites in

disordered or ordered regions we considered the two PTMs as one single modification and

we performed the shuffling reassigning this modification preserving the overall proportion

of co-occurrences per residue (S or T).
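The shuffling procedure described above can be sketched as a within-protein permutation test. This is a minimal illustration under stated assumptions, not the code used for the analyses: the data layout (a dictionary of candidate S/T positions annotated with residue type and disorder class), the function names, and the one-sided empirical p-value with a +1 correction are all choices made for this example.

```python
import random
from collections import defaultdict
from statistics import mean


def count_three_state(phospho, glyco):
    """Number of positions carrying both PTM types."""
    return len(set(phospho) & set(glyco))


def shuffle_glyco(candidates, glyco, rng):
    """Reassign O-GlcNAcylation sites within one protein.

    candidates maps each S/T position to a (residue, region) class,
    e.g. ("S", "disordered"). The number of sites drawn from each
    class matches the observed one, so the proportions per residue
    type and per ordered/disordered region are preserved.
    """
    by_class = defaultdict(list)
    for pos, cls in candidates.items():
        by_class[cls].append(pos)
    needed = defaultdict(int)
    for pos in glyco:
        needed[candidates[pos]] += 1
    shuffled = set()
    for cls, n in needed.items():
        shuffled.update(rng.sample(sorted(by_class[cls]), n))
    return shuffled


def null_distribution(proteins, n_iter=1000, seed=0):
    """Total number of 3-state sites across proteins for each shuffle."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        total = 0
        for prot in proteins:
            g = shuffle_glyco(prot["candidates"], prot["glyco"], rng)
            total += count_three_state(prot["phospho"], g)
        null.append(total)
    return null


def empirical_p_value(observed, null):
    """One-sided p-value (assumed +1 correction to avoid p = 0)."""
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))


def fold_enrichment(observed, null):
    """Observed count relative to the mean of the null distribution."""
    return observed / mean(null)
```

A hypothetical protein would be passed as `{"candidates": {...}, "phospho": {...}, "glyco": {...}}`; the observed count is obtained by summing `count_three_state` over all proteins before comparing it to the null distribution.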

Evolutionary conservation

The Rate4Site software with default options was used to calculate the evolutionary rates for

the 1-state, 2-state and 3-state serines and threonines (Pupko et al, 2002). The raw

evolutionary rates were normalized with the following procedure. For each residue type

(e.g. serine located in a disordered region) the average evolutionary rate for that residue

type in that protein was calculated. Then, for each residue the evolutionary rate calculated

with Rate4Site was divided by the average evolutionary rate of the residue type in that

protein. In order to avoid the bias of having a different number of species in the alignments

used to determine the evolutionary rates, the same analysis was performed using alignments

from a previous study (Landry et al, 2009). Finally, in order to avoid a potential bias

introduced by the algorithm used to calculate the evolutionary rates, the analyses were also

performed with another algorithm, as described by (Gray & Kumar, 2011).
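The normalization step described above (dividing each residue's rate by the average rate of its residue type within the same protein) can be sketched as follows. The record layout and the function name are assumptions made for this example, and the guard against a zero average is an addition that is not described in the text.

```python
from collections import defaultdict
from statistics import mean


def normalize_rates(residues):
    """Normalize per-residue evolutionary rates.

    residues is a list of dicts with the keys "protein", "residue"
    (S or T), "region" (ordered or disordered) and "rate" (the raw
    rate). A residue type is the combination of residue and region.
    """
    groups = defaultdict(list)
    for r in residues:
        groups[(r["protein"], r["residue"], r["region"])].append(r["rate"])
    averages = {key: mean(values) for key, values in groups.items()}

    normalized = []
    for r in residues:
        avg = averages[(r["protein"], r["residue"], r["region"])]
        # Assumed guard: a zero average gives an undefined ratio.
        value = r["rate"] / avg if avg != 0 else float("nan")
        normalized.append({**r, "normalized_rate": value})
    return normalized


# Hypothetical records for two proteins.
example = normalize_rates([
    {"protein": "P1", "residue": "S", "region": "disordered", "rate": 1.0},
    {"protein": "P1", "residue": "S", "region": "disordered", "rate": 3.0},
    {"protein": "P2", "residue": "T", "region": "ordered", "rate": 4.0},
])
```

After this step a value below 1 indicates a residue that evolves more slowly than the average residue of the same type in the same protein.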

Kinase-phosphorylation site associations

NetPhorest (Miller et al, 2008) was downloaded from (http://netphorest.info) and was run

locally using default options. The kinase-phosphorylation site associations were determined

by ranking all possible associations determined by NetPhorest according to their score and

taking the one with the best score.
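Selecting the best-scoring association for each site can be sketched as below. The input is assumed to be a list of (site, kinase, score) tuples already parsed from the predictor output, and a higher score is assumed to be better; neither the tuple layout nor the function name reflects the actual NetPhorest output format.

```python
def best_kinase_per_site(predictions):
    """Keep, for each site, the kinase with the highest score.

    predictions is an iterable of (site_id, kinase, score) tuples.
    Ties are resolved in favour of the first association seen.
    """
    best = {}
    for site, kinase, score in predictions:
        if site not in best or score > best[site][1]:
            best[site] = (kinase, score)
    return best


# Hypothetical predictions for two sites.
assignments = best_kinase_per_site([
    ("P1_S10", "ATM_ATR_group", 0.42),
    ("P1_S10", "GSK3_group", 0.61),
    ("P2_T58", "GSK3_group", 0.30),
])
```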


5.5 - Results

5.5.1 - An extensive dataset of phosphorylation and O-GlcNAcylation sites

We built a dataset of human (n = 86,065) and mouse (n = 43,013) phosphorylation sites by

collecting data from 8 different databases and experimental studies (Beltrao et al, 2012;

Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava

Prasad et al, 2009; Minguez et al, 2012; Trinidad et al, 2012). We successfully mapped

these sites onto 8,889 human and 5,903 mouse proteins. We also built a dataset of O-

GlcNAcylation sites in human (n = 613) and mouse (n = 810) from 5 different databases

and experimental studies (Alfaro et al, 2012; Hornbeck et al, 2012; Lu et al, 2013; Trinidad

et al, 2012; Wang et al, 2011). We mapped these sites onto 262 human and 316 mouse

proteins respectively, in which we counted 105 and 156 co-occurrences (in humans and

mouse, respectively). Sixty-five human proteins and 84 mouse proteins contained at least

one co-occurrence of phosphorylation and O-GlcNAcylation sites (3-state sites). Previous

studies reported that phosphorylation sites tend to be located in disordered (unstructured)

regions (Iakoucheva et al, 2004; Landry et al, 2009). Three-state residues also tend to be

located in disordered regions in both organisms (p-value < 0.005; Annex 3, Figure S5.1).

5.5.2 – Phosphorylation and O-GlcNAcylation are found in the same residues more

than expected by chance alone

We first examined whether within the co-modified proteins the two PTMs occur on the

same residue. We counted the number of 3-state residues and we randomly shuffled the O-

GlcNAcylation sites in each protein to generate a null model that reflects the random

expectations for each species separately. We found that the number of 3-state residues is

1.3 times greater than expected by chance in both species (p-value < 0.001), thus

supporting the rejection of the null hypothesis of independence between phosphorylation and O-

GlcNAcylation (Figure 5.1).


Figure 5.1. Number of 3-state sites in human and mouse and comparisons to random

expectations.

Number of 3-state (phosphorylatable and O-GlcNAcylatable) sites in human (A) and mouse (B) and comparisons to random expectations (1000 iterations, p-value < 0.001).

One potential confounding factor of our analyses is that phosphorylation and O-

GlcNAcylation tend to be sampled more often on highly-abundant proteins, which would

artificially inflate their co-occurrence. For instance, we recently showed that there is a

detection bias for phosphorylation sites towards highly abundant proteins (e.g. (Freschi et

al, 2014)) and this could also be true for O-GlcNAcylated proteins, thereby increasing the

probability of finding both modifications on highly abundant proteins. Our results are in line

with these expectations (Figure 5.2A,B), suggesting that in general, PTMs tend to be

preferentially detected on highly abundant proteins.


Figure 5.2. Fraction of sites as a function of protein abundance for human and mouse O-

GlcNAcylation sites and comparison of average protein abundance between all proteins and

proteins that contain 3-state sites for humans and mouse.

Fraction of sites as a function of protein abundance for human (A) and mouse (B) O-

GlcNAcylation sites and comparison of average protein abundance between all proteins and

proteins that contain 3-state sites for humans (C) and mouse (D). Data about protein abundance was

retrieved from PaxDb (Wang et al, 2012).

We also found that 3-state sites tend to be preferentially found in proteins with high

average protein abundance in mouse, but not in humans (Figure 5.2C,D). This difference

could reflect a functional difference between humans and mice, but more likely it is a side

effect of biases in the protein sampling in mouse. We examined the distributions of protein

abundance for the proteins with 3-state sites, showing that the sampling bias towards highly

abundant proteins that we observed in mouse is common across the different studies (Annex 3,

Figure S5.2).

5.5.3 - Clues of independent regulation of multiple functions in humans but not in mouse

In previous studies of phosphorylation site evolution, residue conservation has been

associated to functional roles in protein regulation (Beltrao et al, 2012). We therefore

hypothesized that if phosphorylation and O-GlcNAcylation regulate independent protein

functions, 3-state sites should be more conserved than 2-state sites while if they regulate the

same function, the evolutionary rates of the 3-state sites should be approximately the same

as 2-state sites. We estimated the rates of evolution of all serines and threonines of

phosphorylated or glycosylated proteins using alignments of 16 species. We normalized

each rate by the average evolutionary rate of each residue type within each protein (see

methods) so that the rates become independent of protein abundance or structural properties

(order or disorder), both of which have been shown to affect rates of evolution (Landry et

al, 2009; Levy et al, 2012). We compared the distribution of the evolutionary rates of 1-

state (non-modified) serines and threonines to those of 2-state and 3-state modified

residues. We found that 3-state modified residues are more conserved over evolution


compared to 2-state and 1-state residues (Figure 5.3A) in humans. We also observed that on

average O-GlcNAcylated sites are more conserved than phosphorylation sites. However,

we did not observe the same trend in mouse (Figure 5.3B).


Figure 5.3. Comparison of residue conservation for 1-state, 2-state and 3-state residues in the

human and mouse proteomes.

Comparison of residue conservation for 1-state, 2-state and 3-state residues in the human (A) and mouse (B)

proteomes (Wilcoxon tests: n.s.: non-significant; *: p-value < 0.05; **: p-value < 0.01; ***: p-value

< 0.001). Panels (C) and (D) show the same analysis on the human proteome using a different

measure of evolutionary conservation (Gray & Kumar, 2011) or different sequence alignments

(Landry et al, 2009). A green circle indicates phosphorylation, while a blue one indicates O-

GlcNAcylation.

In order to avoid a potential bias determined by the number of species used to calculate the

evolutionary rates or the method used, we performed the same analysis using an

independent method (Figure 5.3C) (Gray & Kumar, 2011) and alignments that have been

used to calculate evolutionary rates in previous studies (Figure 5.3D) (Landry et al, 2009).

We also looked at the regions (+/- 5 amino acids) surrounding the 1-state, 2-state and 3-

state sites and we found that 3-state sites tend to be more evolutionarily conserved compared

to 1-state and 2-state sites (Figure 5.4).


Figure 5.4. Comparison of the evolutionary conservation of the regions surrounding 1-state, 2-

state and 3-state residues (+/- 5 amino acids) for the human proteome.

(Wilcoxon tests: *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001). A green circle

indicates phosphorylation, while a blue one indicates O-GlcNAcylation.

This result again supports our hypothesis that phosphorylation and O-GlcNAcylation

overall are likely to control independent functions.

5.5.4 – Three-state sites and 2-state sites have different preferences for protein kinases

If 3-state sites allow the regulation of multiple functions, they should have some features

that distinguish them from 2-state ones. Both 2-state phosphorylated and 3-state sites can

be phosphorylated. We would therefore expect 3-state sites to be phosphorylated by a set of

kinases that differs from the ones for 2-state ones. To test this prediction, we determined

the kinase-phosphorylation site associations for all the phosphorylated and 3-state residues

using NetPhorest ((Miller et al, 2008), see also methods) and we compared the likelihoods

of being phosphorylated by a given kinase for 2-state phosphorylated residues and 3-state

residues. We found that 3-state residues show a clear preference for certain kinases


compared to the residues that are phosphorylated only, and this holds true for both mouse

and human (Figure 5.5).

Figure 5.5. Kinase preferences of 3-state residues for human and mouse proteins.

Kinase preferences of 3-state residues for human (A) and mouse (B) proteins. The associations were

determined using NetPhorest. Colored bars represent significant trends (p-value < 0.05): green

indicates significant preference while orange indicates significant avoidance.

Examples of such kinases include ATM/ATR, GSK and RCK. The example of ATM is of

particular interest, since this protein is involved in the response to DNA damage and its

deregulation leads to cancer (Kastan, 2008). The link between O-GlcNAcylation and ATM

has already been shown in a recent study (Miura et al, 2012). Moreover, the function

of this protein is also regulated by protein phosphorylation (Kozlov et al, 2011). The role of

the 3-state sites in this protein now has to be investigated experimentally in order to

understand how they integrate the regulatory programs encoded by phosphorylation and O-

GlcNAcylation.


5.6 - Conclusion

Here, we focused on the cross-talk between phosphorylation and O-GlcNAcylation in

human and mouse. We sought to extend the previous studies on the cross-talk between these

two PTMs (Hart et al, 2011; Zeidan & Hart, 2010) by performing a proteome-wide analysis

in which we used the most recent available proteomics data. We found that the number of

3-state (glycosylated and phosphorylated) serines and threonines is greater than expected by

chance, suggesting that these sites could have a potential role for protein regulation.

Evolutionary conservation is an indicator of functionality (Beltrao et al, 2012; Landry et al,

2009) and our results show that in humans the 3-state sites tend to be significantly more

conserved than the 2-state ones. This suggests that phosphorylation and O-

GlcNAcylation may act as independent switches to regulate two sets of protein functions,

since if they acted on the same function we would have expected to see the 3-state modified

sites not differing in their level of conservation compared to the 2-state modified ones.

Finally, we tried to associate some putative functions to the 3-state modified sites and we

found that they are more often associated with some kinases compared to sites that are

phosphorylated only. Our finding that phosphorylation and O-GlcNAcylation of 3-state

sites are likely to regulate independent functions does not rule out the possibility that for

many 3-state sites phosphorylation and glycosylation may act as one single switch. The

most realistic scenario is that the cell uses a mixture of these two modes of function to

finely tune its functions. The study of the cross-talk between phosphorylation and O-

GlcNAcylation is still at its dawn, but the die is cast.


Chapter 6 – General conclusions

6.1 - Summary of the study

In this thesis we studied the evolution of PTM networks to answer the following key questions:

(i) how a PTM regulatory network is rewired after gene duplication and how this

process may contribute to increase the organismal complexity

(ii) how a PTM regulatory network evolves in different species

(iii) how two PTM regulatory networks that share the same target residue interfere

with each other and what are the possible functional consequences of this

interference

In Chapters 2 and 3 we focused on the first of these questions and we determined to

what extent gene duplication followed by divergence contributed to rewiring a specific

eukaryotic PTM regulatory network: the phosphoregulatory network of the budding yeast

S. cerevisiae. Our results (Chapter 2) show that 100 million years of evolution were

sufficient to extensively rewire this PTM network. We observed major changes both at the

level of phosphorylation sites and at the level of the network of kinases that phosphorylate

them so that 95% of the PTM profiles and up to 50% of the kinase-phosphorylation sites

associations have changed between paralogs. We then investigated the evolutionary

mechanisms responsible for this rewiring and we found that phosphorylation sites tended to

be lost rather than gained between paralogous proteins. We proposed that this mechanism

could potentially contribute to increasing the biological complexity and the fitness of the cell.

Indeed, in the case of multi-functional proteins whose functions are regulated by multiple

independent phosphorylation sites, a duplication event followed by the differential loss of

phosphorylation sites would allow the two duplicates to split the functions between each

other. Finally, we also showed that at least a fraction of sites that have been lost between

paralogs may actually have been compensated by the emergence of new phosphorylation

sites at positions close to the original ones (evolutionary turnover of phosphorylation sites).

Overall, our results show the effects of a duplication event on a PTM regulatory network,

pointing out the importance of this mechanism in driving biological innovation.


In Chapter 3 we continued on the path paved by Chapter 2 by investigating in detail the

evolutionary trajectories of the phosphorylation sites that are lost in WGD paralogs. We

found that a significant fraction of them tend to be preferentially lost towards aspartic and

glutamic acid (Asp and Glu) residues. This is an interesting finding because these two

amino acids have chemical properties that mimic those of phosphorylated serines and

threonines (both Asp and Glu are negatively charged amino acids). By looking at the

genetic code we noticed that transitions of this kind, from phosphorylatable amino acids to

negatively charged ones, necessarily require two mutations and imply a non-functional

intermediate, i.e. an amino acid that does not carry any negative charge and cannot be

phosphorylated. If a site is important to regulate an essential function, the presence of the

non-functional intermediate would lead to a fitness defect, making this kind of transition

unlikely to occur due to purifying selection. We reasoned that gene duplication would

represent a way to bypass this problem since one of the duplicated proteins could

accumulate mutations while the other could exert the original function, allowing the second

mutation to occur. We searched in the literature and we found that examples compatible

with this evolutionary scenario in which gene duplication allows transitions between

phosphorylation sites and negatively charged amino acids have already been observed and

reported (e.g. (Basu et al, 2008)). Our results suggest that this mechanism could be general

and that it has the potential to lead to new regulatory opportunities for the cell. Moreover,

our results also point out once again the importance of gene duplication as a mechanism

that can lead to biological innovations.

Chapter 4 focuses on the second objective of this thesis by investigating the evolutionary

rewiring of the phosphoregulatory network between human and mouse. We found that a

large number of phosphorylation sites are conserved between these two species and that, in

general, purifying selection is acting to maintain them. However, we also found that a lot of

phosphorylation sites are species-specific, since a phosphorylated site in one species

corresponds to a non-phosphorylatable residue in the other. These sites represent good

candidates to explain the molecular bases of the phenotypic divergence observed between

human and mouse. We also found for the first time more subtle differences in


phosphoregulation, represented by sites that are conserved at residue level in the two

species but are divergent with respect to the phosphoregulation. The biological impact and

the functions of these sites now have to be assessed. Finally, we reported some

observations that support the hypothesis of the evolutionary turnover of phosphorylation

sites, which states that phosphorylation sites can jump to different but close locations during

evolution and still retain their biological function. In fact, by aligning the human and mouse

phosphoproteins we identified more than 100 site pairs that tended to be found in the same

region of the protein more than expected by chance and tended to be phosphorylated by the

same kinases. These results suggest that evolutionary turnover should be taken into

account when comparing different phosphoproteomes, in order to avoid overestimating the

divergence between them.

In the last chapter (Chapter 5) we focused on the third objective of the thesis by studying

one of the mechanisms by which PTMs regulate protein functions. Different PTMs can

occur at the same residues, meaning that at a given time a residue can carry no PTM or one

of the two PTMs but not both at the same time (interference or cross-talk between PTMs).

The presence of this interference raises the question of whether both PTMs control the

same protein function or each PTM type controls a specific one. We investigated this

problem by studying the phosphorylation and O-GlcNAcylation profiles of human and

mouse. We first found that these two PTMs tend to occur together more than expected by

chance, confirming that there is an interference or cross-talk between phosphorylation and

O-GlcNAcylation. Further, we found that the residues that have been experimentally found

to be phosphorylated and O-GlcNAcylated tend, in general, to have a higher level of

evolutionary conservation than the residues that have been found to be phosphorylated or

O-GlcNAcylated only, suggesting that they may be involved in the regulation of a larger set

of functions compared to the sites that can carry only one of the two PTMs. Finally we

found that the set of kinases that phosphorylate doubly modified residues differs from the

one that phosphorylates the sites that undergo only one modification. All these observations

represent a description of the effects of the interference between two PTM regulatory

networks at global level.


6.2 - Perspectives

Although impressive progress has been made in the last century, a lot of work still has to

be done to understand how PTM networks are organized, how they evolve and how they

are implemented in different organisms. In this section we will propose some research

paths that emerge from this thesis and that can bring us closer to our objective.

First, our study highlights the need for both large-scale and small-scale

experimental studies of PTMs. Large scale studies are needed in order to saturate the

coverage of the PTMs that have been extensively studied up to now (phosphorylation,

ubiquitination, glycosylation) or add some data for those that we have not studied yet. To

this aim, we need to develop new enrichment protocols that would allow us to take full

advantage of the last generation of mass spectrometry instruments. Small scale studies are

also needed to assign functions to PTMs. Indeed, even for protein phosphorylation, the

most studied PTM, although thousands of phosphorylation sites have been reported in the

literature, the ones with known function remain a small fraction (Landry et al, 2009).

Another important aspect that needs to be further developed is the study of PTM writers

and erasers across several PTM networks. If we want to study how different PTM

networks are integrated and how they participate in cellular regulation we need to know the

spatio-temporal regulatory patterns of the writers and erasers as well as their specificities in

order to link this information to PTM profiles, substrate abundance and PTM site

occupancy.

A completely new dimension that has to be explored in order to understand how PTM

networks work is to study how they rewire in different conditions, as also suggested by

Beltrao and collaborators (Beltrao et al, 2013) in a recent review. These studies would

provide both information about the network configuration in different conditions and also

allow us to identify the PTMs that are condition-specific, thus helping us in the task of

assigning functions to PTMs.


New data about the profiles of different PTMs in different organisms are also needed to

study the evolution of PTM networks. At the moment we have limited data even for the

model organisms (Gruhler et al, 2005; Holt et al, 2009; Huttlin et al, 2010; Li et al, 2007;

Sharma et al, 2014; Zielinska et al, 2009); however, with more data about model and

non-model organisms we could get a better understanding of species divergence. For

example, we could study how differences in regulation may reflect phenotypic differences.

Further, we could study more in detail the evolutionary events that shaped PTM networks

and contributed to increasing their complexity. Chapters 2 and 3 of this thesis point out the

importance of events like gene duplication in these processes. By having more data about

PTMs (in particular in mammals) we could understand when specific regulations appeared

and how the same biological process can be regulated in a different way in different

species.

Another limit to our understanding of PTM networks is that comprehensive

datasets of PTMs for specific organs or tissues are still substantially missing (Hornbeck et

al, 2012). By having these datasets we could study which parts of the whole PTM

regulatory network are of key importance in a cell type, tissue or organ. This approach

should be applied to different PTM networks, to understand how they are integrated

together at different scales from cells to tissues to organs.

Future studies should also investigate the links between PTM regulation and disease. Even

if this topic has not been addressed directly in this thesis, a lot of phosphorylation sites are

located in proteins implicated in diseases but their role remains unknown. The role of

PTMs as markers for disease and disease progression has also not been fully assessed, apart

from specific cases (e.g. (Jin & Zangar, 2009)).

All these considerations suggest that we are still far from cracking the code of PTM

networks, i.e. the logic and the dynamics by which PTM networks cross-talk to each other

to regulate cellular functions. Nevertheless, the studies of the last decade, as well as this

study, have set the directions to take in order to get there.


Annex 1 – Supplementary information for Chapter 2

Figure S2.1. Comparison of independent studies allows measuring phosphosite conservation

among paralogs.

Cross-study reproducible sites are sites that are phosphorylated in one study and also found to be

phosphorylated in another study for the same set of proteins. Cross-study conserved sites are sites

found to be phosphorylated in one paralog in one study and in its paralog in a second study. Under a

scenario where paralogous proteins were perfectly conserved, these two numbers should be equal as

they are equally affected by false positive and false negative identifications. Thus, the ratio of

cross-study conservation over cross-study reproducibility provides an estimate of the true state-

conservation (Figure 2.1A). Only S/T sites conserved between the two paralogs are considered. 0 and 1

indicate nonphosphorylated and phosphorylated sites respectively.


Figure S2.2. Distribution of all PWM scores and of the maximal scores used to assign a

protein kinase to a particular phosphosite.

Overall distribution of PWM scores (red) and distribution of the maximal scores for each phosphosite (blue).

Maximum scores were used to assign a kinase to each phosphosite. See methods for details.

103

Figure S2.3. State-conserved sites are more abundant than expected by chance alone.

In order to calculate the expected number of state-conserved sites, we randomly re-assigned

phosphosites to conserved S/T sites of the paralogous phosphoproteins and calculated how many

occurred at homologous positions. This process was repeated 1000 times. There are 118 sites that

occur at homologous positions (236 phosphosites), a number that is 7.4 times higher than expected

by chance alone (P<< 0.001).

104

Figure S2.4. Correlation between the relative abundance of kinases assigned to the global

phosphoproteome and those found in the WGD phosphoproteome.

There is strong positive correlation between the relative fractions of kinases assigned to the global

phosphoproteome and those found in the WGD phosphoproteome (rho = 0.99, p-value < 2×10⁻¹⁶).

This suggests that there is no specific group of kinases that preferentially phosphorylates WGD

phosphoproteins.

105

Figure S2.5. State-diverged nonphosphorylated S/T sites likely comprise 50% of

nonphosphorylated sites.

Nonphosphorylated S/T of the state-diverged sites likely comprise cases that are false negative

identifications. Because sites that have not been reported to be phosphorylated (i.e. randomly

selected S/T) have lower PWM scores than phosphorylated ones (Figure 2.1F), we can use PWM

scores to estimate this proportion. Thus, in order to estimate the true ratio of nonphosphorylated S/T

sites in state-diverged nonphosphorylated sites, we re-created the PWM score distribution ii (sites

not found to be phosphorylated among the state-diverged sites) from Figure 2.1F by randomly

sampling different ratios of PWM scores from distribution i (non-phosphorylated) and distribution

iv (state-conserved phosphorylated sites). We calculated the median of these distributions and

iterated this procedure 100 times. Each box-plot represents the distribution of the 100 medians

calculated for each of the ratios considered. A mixture of 50% of each distribution gives the same

median as the median score of the state-diverged nonphosphorylated S/T.
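The mixture procedure summarized in this caption can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions, not the original analysis: the sample size, sampling with replacement, and the toy Gaussian score distributions are all invented here.

```python
import random
import statistics

def mixture_medians(nonphospho_scores, phospho_scores, fraction_nonphospho,
                    sample_size=200, n_iter=100, seed=0):
    """Medians of mixtures drawn from two PWM score distributions.

    fraction_nonphospho is the share of the sample taken from the
    non-phosphorylated distribution (distribution i in the caption); the
    remainder comes from the state-conserved phosphorylated one
    (distribution iv). Sampling is done with replacement (an assumption).
    """
    rng = random.Random(seed)
    n_non = round(sample_size * fraction_nonphospho)
    n_pho = sample_size - n_non
    medians = []
    for _ in range(n_iter):
        sample = (rng.choices(nonphospho_scores, k=n_non)
                  + rng.choices(phospho_scores, k=n_pho))
        medians.append(statistics.median(sample))
    return medians

# Toy score distributions (invented values, for illustration only).
toy_rng = random.Random(1)
nonphospho = [toy_rng.gauss(0.3, 0.05) for _ in range(500)]
phospho = [toy_rng.gauss(0.6, 0.05) for _ in range(500)]

# One median-of-medians per mixing ratio, analogous to one box-plot each.
summary = {
    f: statistics.median(mixture_medians(nonphospho, phospho, f))
    for f in (0.0, 0.25, 0.5, 0.75, 1.0)
}
```

The mixing ratio whose medians best match the median score of the state-diverged nonphosphorylated sites would then be read off `summary`, which is the comparison the box-plots display.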


Figure S2.6. Gains and losses of phosphosites for the literature curated dataset (Dataset 2)

(Nguyen Ba & Moses, 2010).

This dataset contains 394 phosphosites mapped on 118 proteins. The comparison of the number of

gains and losses with the respective null models (see methods) confirms the trend observed in

phosphoproteomics experiments. Random sampling was performed as detailed for Figure 2.2C. The

black square represents the observed numbers in each case.


Figure S2.7. Gains and losses of phosphosites for the unfiltered dataset (Dataset 3).

The comparison of the number of gains and losses with the random expectation confirms the trend

for phosphosites to be preferentially lost. Here all phosphosites that correspond to phosphopeptides

that are assigned to more than one protein are also considered. Random sampling is done as detailed

for Figure 2.2C. The black square represents the observed numbers in each case.


Figure S2.8. Gains and losses of phosphosites for Dataset 4 (additional species considered for

the inference of the ancestral sequence).

The comparison of the number of gains and losses with the random expectation confirms the trend

for phosphosites to be preferentially lost. Here the ancestral sequences were reconstructed as

detailed in the methods section but including an additional species that diverged from S. cerevisiae

prior to the WGD. Random sampling was done as detailed for Figure 2.2C. The black square

represents the observed numbers in each case.


Supplementary Files

Supplementary files can be found at the address:

http://www.bio.ulaval.ca/landrylab/download

File S2.1 Phosphosites identified in L. kluyveri. The file includes information about the position of the site along the protein, the type of residue and the confidence score.

File S2.2 Mini-website of alignments of the paralogous pairs, their orthologs and the ancestral sequences. Disordered regions of proteins are indicated by asterisks and phosphorylation sites are in bold.


Annex 2 – Supplementary information for Chapter 4

                 Human                       Mouse
Study            #prot  S      T      Y      #prot  S      T      Y
Minguez et al.   5899   25799  8179   6631   3278   11474  2689   1708
Beltrao et al.   7349   28666  9240   8460   5760   21876  4706   2361
Phosida et al.   2312   7948   2056   454    2818   10001  1704   238
HPRD             4670   21493  6511   2340   -      -      -      -
Phosphosite.Org  8355   39796  16160  15610  5607   26010  6538   3958
phosphoELM       2923   11085  2743   1203   1510   3070   636    379
Huttlin et al.   -      -      -      -      3193   14679  2849   426
Non-redundant    12341  61401  24257  20962  8179   38718  9821   5501

Table S4.1. Number of phosphoproteins and phosphorylation sites (sorted by phosphorylatable residue) for all the studies we considered as well as the corresponding non-redundant values.

Protein       Site                     Sequence         NetPhorest score  Predicted kinase
NUCL_HUMAN    ENSP00000318195_28       PKEVEEDSEDEEMSE  0.649539          CK2
NUCL_MOUSE    ENSMUSP00000027438_28    PKEVEEDSEDEEMSE  0.649539          CK2
PAXB1_HUMAN   ENSP00000328992_262      REDENDASDDEDDDE  0.649397          CK2
PAXB1_MOUSE   ENSMUSP00000113835_264   REDENDASDDEDDDE  0.649397          CK2
ARI4A_HUMAN   ENSP00000347602_160      DEKEEESSEEEDEDK  0.649337          CK2
ARI4A_MOUSE   ENSMUSP00000035512_160   DEKEEESSEEEDEDK  0.649337          CK2
B3KYA7_HUMAN  ENSP00000429744_344      LEEEEENSDEDELDS  0.648665          CK2
B3KYA7_MOUSE  ENSMUSP00000018476_316   LEEEEENSDEDELDS  0.648665          CK2
RPC7L_HUMAN   ENSP00000358320_163      KKEEEVTSEEDEEKE  0.648586          CK2
RPC7L_MOUSE   ENSMUSP00000089544_163   KKEEEVTSEEDEEKE  0.648586          CK2
ARI4A_HUMAN   ENSP00000347602_159      EDEKEEESSEEEDED  0.648118          CK2
ARI4A_MOUSE   ENSMUSP00000035512_159   EDEKEEESSEEEDED  0.648118          CK2
ARI4B_HUMAN   ENSP00000355562_295      EKEKEDNSSEEEEEI  0.647742          CK2
ARI4B_MOUSE   ENSMUSP00000106163_295   EKEKEDNSSEEEEEI  0.647742          CK2
SENP3_HUMAN   ENSP00000403712_75       PSFDASASEEEEEEE  0.647679          CK2
SENP3_MOUSE   ENSMUSP00000005336_73    PSFDASASEEEEEEE  0.647679          CK2
TBD2B_HUMAN   ENSP00000300584_957      PDKGELVSDEEEDT   0.647633          CK2
TBD2B_MOUSE   ENSMUSP00000045413_959   PDKGELVSDEEEDT   0.647633          CK2
U5S1_HUMAN    ENSP00000392094_19       YIGPELDSDEDDDEL  0.647600          CK2
U5S1_MOUSE    ENSMUSP00000021306_19    YIGPELDSDEDDDEL  0.647600          CK2
NIN_HUMAN     ENSP00000245441_1145     VTRRHVLSDLEDDEV  0.632661          CK2
NIN_MOUSE     ENSMUSP00000082422_1133  PATKHFLSDLGDHEA  0.103702          DMPK
F111A_HUMAN   ENSP00000434435_607      QQDVEMMSDEDL     0.633808          CK2
F111A_MOUSE   ENSMUSP00000119518_610   VQNVEMLSIDF      0.139290          CK2
OSTP_HUMAN    ENSP00000378517_191      ATDEDITSHMESEEL  0.601545          CK2
OSTP_MOUSE    ENSMUSP00000031243_176   ATDEDLTSHMKSGES  0.107948          CK1
ORC2_HUMAN    ENSP00000234296_177      LIVPRSHSDSESEYS  0.576608          CK2
ORC2_MOUSE    ENSMUSP00000027198_176   IIASRSHYDSESEYS  0.090376          MAP2K6_MAP2K3_MAP2K4_MAP2K7
K1551_HUMAN   ENSP00000310338_1198     NSIKNSSSEEEKQKE  0.602660          CK2
K1551_MOUSE   ENSMUSP00000041180_956   VPQCHCSSTEKKEKD  0.119368          ACTR2_ACTR2B_TGFbR2
LTV1_HUMAN    ENSP00000356548_211      YDSAGLLSDEDCMSV  0.612627          CK2
LTV1_MOUSE    ENSMUSP00000019950_206   RSSAGFLSDGGDLSA  0.129578          CK2
RBP2_HUMAN    ENSP00000283195_2583     KCELSKNSDIEQSSD  0.593321          CK2
RBP2_MOUSE    ENSMUSP00000003310_2421  KCELPQNSDIKQSSD  0.115794          GRK
SYCC_HUMAN    ENSP00000369897_264      LTGEEVNSCVEVLLE  0.567883          CK2
SYCC_MOUSE    ENSMUSP00000010899_264   LSGEEVDSKVQVLL   0.093388          CK2
TSN1_HUMAN    ENSP00000361072_156      CCGFTNYTDFEDSPY  0.585526          CK2
TSN1_MOUSE    ENSMUSP00000030465_156   CCGFNNYTDFNASRF  0.115104          CK2
SETB1_HUMAN   ENSP00000271640_474      LSPQAGDSDLESQLA  0.543830          CK2
SETB1_MOUSE   ENSMUSP00000015841_473   LSPQAADTESLESQL  0.080428          CK2

Table S4.2. Comparison of StC and StD sites. The first ten site pairs present in the table are the pairs of StC sites with the highest NetPhorest scores. The last ten pairs are the pairs of StD sites with the highest difference of NetPhorest scores between the phosphorylated site and its non-phosphorylated counterpart. Green rows refer to phosphorylated sites while grey to non-phosphorylated ones. Differences between orthologous 15mers centered on each site are highlighted in yellow.

Protein ID Description Human site Mouse site

PKP2 plakophilin 2 ENSP00000070846_203 ENSMUSP00000036890_181

PKP2 plakophilin 2 ENSP00000070846_267 ENSMUSP00000036890_235

PNN pinin, desmosome associated protein ENSP00000216832_552 ENSMUSP00000021381_559

NUP210 nucleoporin 210kDa ENSP00000254508_1862 ENSMUSP00000032179_1839

NUP210 nucleoporin 210kDa ENSP00000254508_1863 ENSMUSP00000032179_1839

VPS13C vacuolar protein sorting 13 homolog C (S. cerevisiae) ENSP00000261517_542 ENSMUSP00000077040_839


ENSMUSP00000077040_839



ENSMUSP00000077040_839


ENSMUSP00000077040_1956

DSG2 desmoglein 2 ENSP00000261590_984 ENSMUSP00000057096_921

ADNP2 ADNP homeobox 2 ENSP00000262198_1024 ENSMUSP00000068560_1052

LRP2 low density lipoprotein receptor-related protein 2 ENSP00000263816_4527 ENSMUSP00000079752_4533

LRP2 low density lipoprotein receptor-related protein 2 ENSP00000263816_4634 ENSMUSP00000079752_4632

EHBP1 EH domain binding protein 1 ENSP00000263991_751 ENSMUSP00000105191_765

EHBP1 EH domain binding protein 1 ENSP00000263991_769 ENSMUSP00000105191_765

ALMS1 Alstrom syndrome 1 ENSP00000264448_2751 ENSMUSP00000071904_1916

ALMS1 Alstrom syndrome 1 ENSP00000264448_2754 ENSMUSP00000071904_1916

NBN nibrin ENSP00000265433_402 ENSMUSP00000029879_429



FANCM Fanconi anemia, complementation group M ENSP00000267430_1413 ENSMUSP00000054797_1379





C10orf47 chromosome 10 open reading frame 47 ENSP00000277570_146 ENSMUSP00000060780_225

UHRF1BP1L UHRF1 binding protein 1-like ENSP00000279907_446 ENSMUSP00000020112_797

PDE3B phosphodiesterase 3B, cGMP-inhibited ENSP00000282096_561 ENSMUSP00000032909_536

RANBP2 RAN binding protein 2 ENSP00000283195_1146 ENSMUSP00000003310_1141

RANBP2 RAN binding protein 2 ENSP00000283195_2802 ENSMUSP00000003310_2638

RANBP2 RAN binding protein 2 ENSP00000283195_2807 ENSMUSP00000003310_2641

ZNF646 zinc finger protein 646 ENSP00000300850_1448 ENSMUSP00000052641_1412

CLSPN claspin ENSP00000312995_69 ENSMUSP00000045344_84





DAB2 disabled homolog 2, mitogen-responsive phosphoprotein (Drosophila) ENSP00000313391_723 ENSMUSP00000079689_731

FAM123C family with sequence similarity 123C ENSP00000314914_307 ENSMUSP00000054748_267

MAP1S microtubule-associated protein 1S ENSP00000325313_582 ENSMUSP00000019405_532



DDX24 DEAD (Asp-Glu-Ala-Asp) box polypeptide 24 ENSP00000328690_302 ENSMUSP00000105628_329

LRRC16A leucine rich repeat containing 16A ENSP00000331983_1314 ENSMUSP00000072662_1320

EFCAB13 EF-hand calcium binding domain 13 ENSP00000332111_385 ENSMUSP00000116040_452

BMP2K BMP2 inducible kinase ENSP00000334836_728 ENSMUSP00000037970_715



KIF18B kinesin family member 18B ENSP00000341466_676 ENSMUSP00000021311_558

FGD6 FYVE, RhoGEF and PH domain containing 6 ENSP00000344446_553 ENSMUSP00000020208_557



HTT huntingtin ENSP00000347184_411 ENSMUSP00000078945_638

MKL1 megakaryoblastic leukemia (translocation) 1 ENSP00000347847_295 ENSMUSP00000105207_335

MKL1 megakaryoblastic leukemia (translocation) 1 ENSP00000347847_305 ENSMUSP00000105207_345

SVIL supervillin ENSP00000348128_221 ENSMUSP00000115078_218









CLCC1 chloride channel CLIC-like 1 ENSP00000349456_506 ENSMUSP00000102224_502

CLCC1 chloride channel CLIC-like 1 ENSP00000349456_509 ENSMUSP00000102224_503

PDE3A phosphodiesterase 3A, cGMP-inhibited ENSP00000351957_475 ENSMUSP00000038749_472

PDE3A phosphodiesterase 3A, cGMP-inhibited ENSP00000351957_523 ENSMUSP00000038749_526

PCNT pericentrin ENSP00000352572_1703 ENSMUSP00000001179_1444

PCNT pericentrin ENSP00000352572_2370 ENSMUSP00000001179_1990

PARP9 poly (ADP-ribose) polymerase family, member 9 ENSP00000353512_61 ENSMUSP00000110528_20


RP5-862P8.2 Mitogen-activated protein kinase kinase kinase MLK4 ENSP00000355583_542 ENSMUSP00000034316_521

RP5-862P8.2 Mitogen-activated protein kinase kinase kinase MLK4 ENSP00000355583_546 ENSMUSP00000034316_521

PTPRC protein tyrosine phosphatase, receptor type, C ENSP00000356346_1281 ENSMUSP00000027645_1265

PTPRC protein tyrosine phosphatase, receptor type, C ENSP00000356346_1287 ENSMUSP00000027645_1271

CEP350 centrosomal protein 350kDa ENSP00000356579_1195 ENSMUSP00000120085_1200





F5 coagulation factor V (proaccelerin, labile factor) ENSP00000356770_1155 ENSMUSP00000083204_903

KNDC1 kinase non-catalytic C-lobe domain (KIND) containing 1 ENSP00000357561_257 ENSMUSP00000050586_267

SLK STE20-like kinase ENSP00000358770_569 ENSMUSP00000049977_554

DST dystonin ENSP00000359790_2635 ENSMUSP00000110756_2521

ZNF217 zinc finger protein 217 ENSP00000360526_568 ENSMUSP00000104783_621

ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_33 ENSMUSP00000109203_43




ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_706 ENSMUSP00000109203_669


ENSMUSP00000109203_871


ENSMUSP00000109203_1075

ITPR3 inositol 1,4,5-trisphosphate receptor, type 3 ENSP00000363435_1861 ENSMUSP00000038150_1831

SPEN spen homolog, transcriptional regulator (Drosophila) ENSP00000364912_1622 ENSMUSP00000101412_1669



ATXN2 ataxin 2 ENSP00000366843_872 ENSMUSP00000056715_830

RTN3 reticulon 3 ENSP00000367050_268 ENSMUSP00000065810_225

HIVEP1 human immunodeficiency virus type I enhancer binding protein 1 ENSP00000368698_479 ENSMUSP00000056147_591

BRCA2 breast cancer 2, early onset ENSP00000369497_384 ENSMUSP00000038576_400

SHROOM2 shroom family member 2 ENSP00000370299_229 ENSMUSP00000098701_237

FANCA Fanconi anemia, complementation group A ENSP00000373952_850 ENSMUSP00000045217_1071

CCDC88C coiled-coil domain containing 88C ENSP00000374507_1023 ENSMUSP00000082177_794

ACD adrenocortical dysplasia homolog (mouse) ENSP00000377496_411 ENSMUSP00000048180_290

MYLK3 myosin light chain kinase 3 ENSP00000378288_450 ENSMUSP00000034133_432

GLI3 GLI family zinc finger 3 ENSP00000379258_850 ENSMUSP00000106137_851

CD44 CD44 molecule (Indian blood group) ENSP00000398632_686 ENSMUSP00000005218_728



GORASP2 golgi reassembly stacking protein 2, 55kDa ENSP00000410208_448 ENSMUSP00000028509_432

TOP2A topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1360 ENSMUSP00000068896_1379





CAMSAP3 calmodulin regulated spectrin-associated protein family, member 3 ENSP00000416797_704 ENSMUSP00000125993_583

CAMSAP3 calmodulin regulated spectrin-associated protein family, member 3 ENSP00000416797_811 ENSMUSP00000125993_882

CCDC110 coiled-coil domain containing 110 ENSP00000427246_807 ENSMUSP00000092964_644

PRAGMIN Tyrosine-protein kinase SgK223 ENSP00000428054_148 ENSMUSP00000106118_131

RNF214 ring finger protein 214 ENSP00000431643_56 ENSMUSP00000060941_48

DMXL1 Dmx-like 1 ENSP00000439479_1841 ENSMUSP00000045559_1829

SLC1A5 solute carrier family 1 (neutral amino acid transporter), member 5 ENSP00000444408_9 ENSMUSP00000104136_33


OBSCN obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF ENSP00000455507_3159 ENSMUSP00000038264_3281

MKL2 MKL/myocardin-like 2 ENSP00000459626_852 ENSMUSP00000009713_846

Table S4.3. List of evolutionary clustered sites. For each pair of evolutionary clustered sites, the list includes the name of the protein, a description of the protein and the identifiers of the human and mouse sites.

Protein ID  Description                                               Num. ECS
SVIL        supervillin                                               8
ATRX        alpha thalassemia/mental retardation syndrome X-linked    6
FANCM       Fanconi anemia, complementation group M                   5
CEP350      centrosomal protein 350kDa                                5
TOP2A       topoisomerase (DNA) II alpha 170kDa                       4
VPS13C      vacuolar protein sorting 13 homolog C (S. cerevisiae)     4
CLSPN       claspin                                                   4
SPEN        spen homolog, transcriptional regulator (Drosophila)      3
BMP2K       BMP2 inducible kinase                                     3
NBN         nibrin                                                    3
CD44        CD44 molecule (Indian blood group)                        3
MAP1S       microtubule-associated protein 1S                         3
FGD6        FYVE, RhoGEF and PH domain containing 6                   3
RBP2        RAN binding protein 2                                     3

Table S4.4. List of proteins with more than two evolutionary clustered sites. For each protein, the list includes its name, a description of the protein and the number of evolutionary clustered sites (Num. ECS).


Figure S4.1. Algorithm to detect evolutionary clustered sites (ECS).

(A) Estimation of the colocalization of phosphorylation sites inside a window of length L.

Calculations were performed for windows of amino acids of increasing length. (B) Shuffling of

phosphorylation sites respecting their biochemical properties (residue: S, T or Y; location in

ordered/disordered regions) and calculation of the null expectations for the colocalization inside a

window of length L. Calculations were performed for windows of amino acids of increasing length.

(C) Comparison of the observed and expected values of colocalization. (D) Determination of the

closest phosphorylation sites for which the observed colocalization score is higher than expected by

chance (null expectation).
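Steps (A) to (D) of this caption can be illustrated with a simplified Python sketch. It is only one possible reading of the figure and not the implementation used in the study: the colocalization score (number of human sites with a mouse site within the window), the uniform shuffling over candidate positions (which ignores the residue type and order/disorder constraints mentioned in panel B), the significance threshold and all names are assumptions.

```python
import random

def colocalization(human_sites, mouse_sites, window):
    """(A) Count human sites with at least one mouse site within `window` residues."""
    return sum(any(abs(h - m) <= window for m in mouse_sites) for h in human_sites)

def null_colocalization(human_sites, mouse_sites, candidates, window,
                        n_iter=1000, seed=0):
    """(B) Null scores obtained by shuffling sites over candidate positions.

    Simplification: every candidate position is treated as equivalent.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        h = rng.sample(candidates, len(human_sites))
        m = rng.sample(candidates, len(mouse_sites))
        scores.append(colocalization(h, m, window))
    return scores

def smallest_significant_window(human_sites, mouse_sites, candidates, windows,
                                alpha=0.05):
    """(C, D) Smallest window whose observed score exceeds the null expectation."""
    for window in windows:
        observed = colocalization(human_sites, mouse_sites, window)
        null = null_colocalization(human_sites, mouse_sites, candidates, window)
        p_value = (sum(s >= observed for s in null) + 1) / (len(null) + 1)
        if observed > 0 and p_value < alpha:
            return window, observed, p_value
    return None

# Toy alignment coordinates (invented): sites that lie close to each other
# in the two species without occupying the same aligned position.
candidates = list(range(2000))
human = [100, 500, 900, 1300]
mouse = [102, 503, 898, 1302]
result = smallest_significant_window(human, mouse, candidates, [1, 2, 3, 5, 10])
```

Scanning windows of increasing length and stopping at the first significant one mirrors the idea of reporting the closest sites for which the colocalization exceeds chance.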


Figure S4.2. Comparison of human and mouse phosphorylation sites present in our dataset.

(A) Global number of phosphorylation sites. (B) Proportion of the different phosphorylated residues

(S: serine, T: threonine, Y: tyrosine).

Figure S4.3. Localization of SiD, StC and StD sites.

Fraction of sites located in disordered, ordered or mixed regions for each of the three categories and

comparison with the expectations. Mixed regions are regions where one site is located in a

disordered region while the orthologous one is located in an ordered region.


Figure S4.4. Conservation and divergence of clusters of poly-S/T/Y.

There are 158,970 poly-S/T/Y clusters (stretches of two or more consecutive S/T/Y residues) in the

human proteome and 158,022 in the mouse. We defined three categories of clusters: i) Site-diverged

clusters (SiD-c): human or mouse clusters that do not overlap with a cluster in the other species,

even though they can overlap with single phosphorylation sites; ii) state-conserved clusters (StC-c):

overlapping human and mouse clusters in which both the human and the mouse clusters contain at

least one phosphorylation site; iii) state-diverged clusters (StD-c): overlapping human and mouse

clusters in which only one among the human and the mouse clusters contains at least one

phosphorylation site. The plots show the number of observed SiD-c, StC-c and StD-c clusters of

poly-S/T/Y (orange dots) and the comparison to random expectations (distributions in grey). The

null model was generated by 1,000 iterations in which human and mouse clusters were randomized.
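The cluster definitions given in this caption can be made concrete with a short Python sketch. It assumes that both sequences are expressed in shared alignment coordinates (0-based) and that phosphosites are given as sets of positions; the function names and the toy sequences are hypothetical, and the sketch does not reproduce the randomization used for the null model.

```python
import re

def poly_sty_clusters(sequence):
    """Spans (start, end; end exclusive) of two or more consecutive S/T/Y residues."""
    return [m.span() for m in re.finditer(r"[STY]{2,}", sequence)]

def has_phosphosite(span, phosphosites):
    """True if any phosphosite position falls inside the span."""
    start, end = span
    return any(start <= p < end for p in phosphosites)

def overlaps(a, b):
    """True if two half-open spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def classify_cluster(human_span, human_sites, mouse_spans, mouse_sites):
    """Assign a human cluster to SiD-c, StC-c or StD-c as defined in the caption."""
    partners = [s for s in mouse_spans if overlaps(human_span, s)]
    if not partners:
        return "SiD-c"
    human_phos = has_phosphosite(human_span, human_sites)
    mouse_phos = any(has_phosphosite(s, mouse_sites) for s in partners)
    if human_phos and mouse_phos:
        return "StC-c"
    if human_phos != mouse_phos:
        return "StD-c"
    return None  # overlapping clusters with no phosphosite in either species

# Toy aligned sequences and phosphosite positions (invented).
human_clusters = poly_sty_clusters("MASSKTYTAYL")
mouse_clusters = poly_sty_clusters("MATSKAYTAYL")
labels = [classify_cluster(c, {3}, mouse_clusters, {2, 7}) for c in human_clusters]
```

Running the same classification from the mouse side, and on randomized clusters, would give the observed counts and the grey null distributions respectively.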


Figure S4.5. Analysis of position weight matrix (PWM) scores for the different classes of sites

and probability of being phosphorylated by the same protein kinase. (A) Comparison of the

distributions of PWM scores for human and mouse phosphorylated and non-phosphorylated

residues (Wilcoxon tests). (B) Comparison of the distributions of PWM scores for StD sites

(Wilcoxon tests; *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001). (C) Correlation

between human and mouse PWM scores for StC sites (red) and StD sites phosphorylated in human

but not in mouse (black). (D) Correlation between human and mouse PWM scores for StC sites

(red) and StD sites phosphorylated in mouse but not in human (black). (E) Proportion of

phosphorylated sites that have higher PWM scores compared to their corresponding site in the other

species for StC and StD sites. (F) Proportion of sites phosphorylated by the same kinase for the

different categories of sites (StD: state diverged, StC: state conserved, ECS: evolutionary clustered

sites). Black dots represent the observed proportion. Orange lines represent the range of proportions

expected by chance. The histogram shows the distribution of random expectations for ECS. P-value

for StD and StC: < 0.00001; p-value for ECS: 0.006.


Figure S4.6. NetPhorest scores and phosphorylation sites. Fraction of phosphorylated sites

(human and mouse) as a function of the NetPhorest score.

Figure S4.7. Distributions of NetPhorest scores for the different classes of sites.

(A,B) Distribution of NetPhorest scores for StC, StD and non-phosphorylated sites. Non-phosphorylated sites (red) are orthologous sites that are conserved at the residue level and both non-phosphorylated according to our phosphoproteomics data. For StD sites (in which one site is phosphorylated while the orthologous one is phosphorylatable but not phosphorylated) we present two distributions: one for phosphorylated residues and another for non-phosphorylated residues.

[Panels A and B: density as a function of NetPhorest score. Legend: StC, StD (pho.), StD (non-pho.), Non phosphorylated.]

Figure S4.8. Relationship between NetPhorest scores in state-conserved sites and protein

abundance.

Distributions of NetPhorest scores for state-conserved sites (only the scores for the human residue

were considered) for four classes of relative protein abundance.


Figure S4.9. Distribution of the number of evolutionary clustered sites per protein.


Figure S4.10. Distance between evolutionary clustered sites.

(A) Proportion of evolutionary clustered sites as a function of the length of the window (expressed

in number of amino acids) in which the clustered sites are contained. (B) Cumulative distribution of

the proportion of evolutionary clustered sites as a function of the distance between them (1-100 aa).

Figure S4.11. Relationship between evolutionary clustered sites and available sites.

(A) Proportion of protein pairs having evolutionary clustered sites as a function of the available

sites (SiD sites). (B) Distribution of available sites present in the proteins that have evolutionary

clustered sites.


Supplementary files

Supplementary files can be found at the address:

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1004062#s4

Dataset S1. Alignments of orthologous mammalian proteins for the 68 proteins that show significant clustering of SiD phosphorylation sites (i.e. that contain evolutionary clustered sites). The ENSEMBL IDs of the aligned proteins are provided. Alignments are in table format. The column IDs provide information about the organism (following the ENSEMBL convention; e.g. “hsa” indicates Homo sapiens and “mus”, Mus musculus) and the type of data included (“aa” for amino acid and “p” for phosphorylation).

Dataset S2. Human and mouse phosphorylation sites for the 11,150 proteins present in our dataset (i.e. that contains evolutionary clustered sites). ENSEMBL IDs of the aligned proteins are provided. The alignment is in table format. The column IDs provide information on the organism (following the ENSEMBL convention; e.g. “hsa” indicates Homo sapiens and “mus”, Mus musculus) and the type of data included (“aa” for amino acid, “p” for phosphorylation and “diso” for disorder/order). Two columns (“hsa” and “mus”) provide information about the position of the residues along the human or mouse sequences. Protein disorder is indicated by the “*” symbol, while order is indicated by the “.” symbol. For phosphorylation sites, we provide information (columns <organism_id>.p.db) about the papers/datasets that list the site as being phosphorylated (“Be”, Beltrao et al., 2012; “Hp”, Keshava Prasad et al., 2009; “Hu”, Huttlin et al., 2010; “Mi”, Minguez et al., 2012; “Pe”, Dinkel et al., 2011; “Ph”, Gnad et al., 2011; “Po”, Hornbeck et al., 2012).


Annex 3 – Supplementary information for Chapter 5

Figure S5.1. Localization of 3-state sites (phosphorylatable and O-GlcNAcylatable residues) in disordered (A) or ordered regions (B) of human proteins. Localization of 3-state sites in disordered (C) or ordered regions (D) of mouse proteins. The distribution represents the random expectations for each one of the regions.


Figure S5.2. Distribution of human (A) and mouse (B) protein abundances for the proteins containing 3-state sites. PhosOrg: phosphositeOrg.


Annex 4 - qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells

Abstract

One of the most important challenges in systems biology is to understand how cells

respond to genetic and environmental perturbations. Here we show that the yeast DHFR-

PCA, coupled with high-resolution growth profiling (DHFR-qPCA), is a straightforward

assay to study the modulation of protein-protein interactions (PPIs) in vivo as a response to

genetic, metabolic and drug perturbations. Using the canonical Protein Kinase A (PKA)

pathway as a test system, we show that changes in PKA activity can be measured in living

cells as a modulation of the interaction between its regulatory (Bcy1) and catalytic (Tpk1,

Tpk2) subunits in response to changes in carbon metabolism, caffeine and methyl

methanesulfonate (MMS) treatments and to modifications in the dosage of its enzymatic

regulators, the phosphodiesterases. Our results show that the DHFR-qPCA is easily

implementable and amenable to high-throughput. The DHFR-qPCA will pave the way to

the study of the effects of drug, genetic and environmental perturbations on in vivo PPI

networks, thus allowing the exploration of new spaces of the eukaryotic interactome.

Introduction

Protein-protein interactions (PPIs) are fundamental for all cellular functions (Vidal et al,

2011). In particular, they allow the cell to perceive external stimuli and generate

appropriate physiological responses. In the last decade, the development of high-throughput

techniques to study PPIs has led to the first maps of the protein interactome of several

model organisms (Giot et al, 2003; Ito et al, 2001; Krogan et al, 2006; Li et al, 2004; Rual

et al, 2005; Tarassov et al, 2008). These maps described new associations within and

among functional modules (Tarassov et al, 2008) and among protein complexes and

cellular functions (Krogan et al, 2006). One limit of these maps is that they have mainly

been determined in one single experimental condition. Studying how protein interactomes


change in different conditions would allow us to understand how cells adapt to different

environments, how they respond to drugs, stressors and genetic perturbations as well as to

understand the basis of cellular development and differentiation (Ideker & Krogan, 2012).

In order to achieve these objectives, it is necessary to develop new techniques or to adapt

existing ones. Ideally, these techniques should be simple, easily implementable and

amenable to high-throughput. Because studying how cells respond to perturbations of PPIs

requires being able to detect them in their endogenous environments, these assays should

also be performed in living cells and among proteins that are natively regulated.

Protein Complementation Assay (PCA) is a family of techniques that is now widely used to

study PPI networks (Michnick et al, 2007; Morell et al, 2009). All different variants of

PCA are based on the same principles: two complementary fragments of a reporter protein

are fused to two proteins of interest. If the two proteins interact, the activity of the reporter

protein is reconstituted such that it provides a detectable signal. One of the most cost-

effective PCA techniques is the DHFR-PCA. In this case the reporter protein is a modified

murine dihydrofolate reductase (Dhfr) that confers resistance to the chemical methotrexate

(MTX) (Tarassov et al, 2008). Therefore, the presence of an interaction between two proteins of interest can be detected and measured as cellular growth on media with MTX. This PCA has recently been successfully used to determine a large part of the budding yeast PPI network in living cells (Tarassov et al, 2008). In yeast, the assay requires that the coding sequences of the DHFR fragments (DHFR F[1,2] and DHFR F[3]) are inserted in the genome at the 3’ end of two genes of interest to produce proteins with the DHFR fragments fused at the C-termini (Fig. 1).

Fig. 1 Rationale of the DHFR-PCA. The gene encoding the engineered mouse DHFR is split into two complementary fragments, DHFR F[1,2] and DHFR F[3] and fused to the genes encoding two proteins of interest A and B. The concentration of tetrahydrofolate (THF) produced increases with the amount of DHFR complexes formed and is expected to affect growth in a dose-dependent fashion.

The endogenous gene modification offers the advantage of minimally perturbing the transcriptional regulation of the gene and does not require a modification of the native localization of the proteins.

Here we show that the fitness-based yeast DHFR-PCA

(Tarassov et al, 2008), combined with high-resolution growth profiling (DHFR-qPCA), can

be successfully used to study changes in PPIs in vivo in different conditions and genetic

backgrounds, and thus represents a tool that can be used to explore new dimensions of

protein interactomes. Using high-resolution growth profiling it is possible to follow the

growth of yeast strains in microchambers and in real-time. This allows a precise and

sensitive measurement of the growth curves of several strains in parallel in different growth

media.


Materials and methods

Bioinformatic analysis of previous DHFR-PCA data

The integrated dataset on protein abundance was downloaded from PaxDb (Wang et al,

2012). Data on colony size were taken from Tarassov et al. (Tarassov et al, 2008). In order

to determine the correlation between PCA signal and protein abundance we calculated the

Spearman’s rank correlation coefficient as implemented in R (The R project for Statistical

Computing).

Construction of the strains used to test the DHFR-qPCA

Diploid strains with different numbers of DHFR-fused genes were constructed as follows.

Haploid strains (BY4741 and BY4742 backgrounds; BY4741: MATa, his3Δ1, leu2Δ0,

met15Δ0, ura3Δ0; BY4742: MATα, his3Δ1, leu2Δ0, lys2Δ0, ura3Δ0) carrying a single

DHFR fragment (DHFR F[1,2] for MATa strains and DHFR F[3] for MATα strains) were

purchased from Open Biosystems (http://www.openbiosystems.com) and were crossed as

described by Tarassov et al. (Tarassov et al, 2008). For each interaction listed in

Supplementary Table 1, we crossed the corresponding BY4741 MATa GENE A-DHFR

F[1,2] strain with the BY4742 MATα GENE B-DHFR F[3] strain to generate diploid

heterozygous strains GENE A-DHFR F[1,2]/GENE A, GENE B/GENE B-DHFR F[3].

These strains are 1:1 strains, as only one allele of each gene is tagged with a DHFR

fragment. Then we sporulated these 1:1 strains, dissected the tetrads and genotyped the

segregation of the DHFR fragments (Treco & Winston, 2008) in order to obtain haploid

strains carrying one allele of each gene tagged (MATa GENE A-DHFR F[1,2] GENE B-

DHFR F[3] and MATα GENE A-DHFR F[1,2] GENE B-DHFR F[3], respectively). Finally,

we crossed these haploid strains to generate 2:2 diploid strains with both alleles of each

gene tagged (MATa/MATα GENE A-DHFR F[1,2]/GENE A-DHFR F[1,2], GENE B-DHFR

F[3]/GENE B-DHFR F[3]). All strain genotypes are listed in Supplementary Table 2. We

confirmed all integrations by colony PCR (Amberg et al, 2005) with oligonucleotides listed

in Supplementary Table 3.


High-resolution growth profiling

Saturated overnight cultures in YPD with suitable antibiotics were diluted to an OD600 of 1

and then diluted 1/30 in 150µl of SC (0.669% YNB w/o ammonium sulphate w/o amino

acids, 2% glucose, drop-out -lys, -met, -ade) with methotrexate 200µg/mL (Bioshop

Canada Inc.) or the methotrexate solvent DMSO (Bioshop Canada Inc.) in a 100 well

honeycomb plate (Growth Curves USA). We measured growth profiles using a Bioscreen C

(Growth Curves USA) by reading the OD420-580 every 15 minutes with continuous agitation

at 30°C. For the experiments in different conditions we added caffeine (Bioshop Canada

Inc.) and MMS (Sigma-Aldrich, concentration 99%) in a gradient of concentrations (final

concentrations: 1 mM, 2 mM, 4 mM, 6 mM and 0.002%, 0.005%, 0.008%, 0.011%,

respectively) or we used 2% galactose instead of 2% glucose in the SC medium. Growth

curves were measured for at least three replicates from three independent precultures.

Comparison of the different parameters used to estimate growth

For each growth curve, three parameters were measured in order to obtain a quantitative

estimate of cellular growth: the maximum growth rate (∆OD/∆t), the efficiency (ODfinal-

ODinitial) and the lag time. The maximum growth rate was calculated as follows. We used a

sliding window approach to calculate a regression line for each interval spanning 31

measurements. Then, we sorted all regression coefficients and we determined the 98th

percentile. We assumed this value to be the maximum growth rate. This approach allowed us

to eliminate any extreme value that could result from experimental errors (Hill & Otto,

2007). The lag time was defined as the average time point of the interval where the

maximum growth rate was observed. Finally, the efficiency was calculated as the difference

between the final and initial ODs. All analyses were performed in R (The R project for

Statistical Computing).
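The three growth parameters described above can be sketched in Python as follows (the original analyses were performed in R; this is an illustrative reconstruction, not the original script). Two points are assumptions: the 98th percentile is taken with a nearest-rank rule so that it corresponds to an actual window, whereas R's default `quantile` interpolates, and the lag time is taken as the mean time of that same window. The logistic toy curve is invented.

```python
import math

def slope(times, ods):
    """Least-squares slope of OD against time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_od = sum(ods) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (o - mean_od) for t, o in zip(times, ods))
    return sxy / sxx

def growth_parameters(times, ods, window=31, percentile=0.98):
    """Return (maximum growth rate, lag time, efficiency) for one growth curve."""
    fits = []
    for i in range(len(times) - window + 1):
        t = times[i:i + window]
        fits.append((slope(t, ods[i:i + window]), sum(t) / window))
    ranked = sorted(fits)  # sorted by slope (ties broken by window time)
    rank = math.ceil(percentile * len(ranked)) - 1  # nearest-rank percentile
    max_rate, lag_time = ranked[rank]
    efficiency = ods[-1] - ods[0]
    return max_rate, lag_time, efficiency

# Toy curve: one reading every 15 minutes for 48 h, logistic growth
# centred on t = 20 h (invented values).
times = [0.25 * i for i in range(193)]
ods = [0.05 + 1.0 / (1.0 + math.exp(-(t - 20.0) / 2.0)) for t in times]
max_rate, lag_time, efficiency = growth_parameters(times, ods)
```

Taking a high percentile rather than the absolute maximum of the window slopes is what makes the estimate robust to isolated outlier readings.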

Growth curve analysis

In order to measure the relative interaction score (M) we used the following algorithm. We

subtracted the lag time calculated in MTX (test) to the lag calculated in DMSO (control),

which estimates the effect of the perturbation on growth rate independently from the


interaction. We called this difference ∆L. Then, for each experiment, in order to evaluate

the differences in lag times between strains or conditions on a relative scale, we calculated

a relative interaction score value (M) as a proxy for effective cellular growth

(Supplementary Fig 2). We first computed the maximum ∆L among the strains or

conditions tested (∆L max). For each strain or condition we then determined M by

subtracting ∆L from ∆L max. The minimal interaction score is thus arbitrarily set to 0.
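
The M score computation can be sketched in Python as below. This is an illustration only: the function name `interaction_scores` and the dictionary layout are hypothetical, and the sign convention (∆L taken as the MTX lag minus the DMSO lag) is an assumption chosen so that the weakest interaction receives M = 0, in line with the statement that the minimal score is set to 0.

```python
def interaction_scores(lag_mtx, lag_dmso):
    """Map each strain or condition to its relative interaction score M."""
    # Delta L: lag in MTX (test) minus lag in DMSO (control).
    # Assumed sign convention; see the lead-in.
    delta_l = {name: lag_mtx[name] - lag_dmso[name] for name in lag_mtx}
    # The largest Delta L corresponds to the weakest PCA signal.
    delta_l_max = max(delta_l.values())
    # M = Delta L max - Delta L, so the weakest interaction scores 0.
    return {name: delta_l_max - dl for name, dl in delta_l.items()}
```

With invented lag times (in hours) of 20, 30 and 45 in MTX and 10, 10 and 12 in DMSO, the ∆L values are 10, 20 and 33, and the M scores are 23, 13 and 0.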

Spot assays

An overnight preculture in YPD was diluted to an OD600 of 1 and a ten-fold serial dilution

was performed. Five µL of each cell suspension was spotted onto SC medium plates

containing 200 µg/mL methotrexate or DMSO.

Interactions between Ras and different RBD mutants

The DHFR F[1,2] coding sequence was amplified by PCR with oligonucleotides that

contain restriction sites BspEI and XhoI respectively, using the plasmid pAG25-linker-

DHFR-F[1,2]-ADHterm (Tarassov et al, 2008) as template and subcloned in the plasmid

p413Gal1-Ras-yCD-F[1] (Ear & Michnick, 2009) cutting with BspEI and XhoI to fuse the

Ras coding gene to the DHFR F[1,2]. The sequence encoding the DHFR F[3] was amplified

from pAG32-linker-DHFR(3)-ADHterm (Tarassov et al, 2008) using oligos that contain

restriction sites BspEI and XhoI. The resulting PCR fragment was digested with the

restriction enzymes BspEI and XhoI and subcloned in the six plasmids p415Gal1-RBD-

yCD-F[2] (Ear & Michnick, 2009) that contain the wild-type RBD (residues 55-133) and five

mutant RBDs (residues 55-133). All constructed plasmids were verified by sequencing. A BY4741

strain was transformed with the plasmid p413Gal1-Ras-DHFR F[1,2] while six BY4742

strains were transformed with the plasmids p415Gal1-RBD-DHFR F[3] containing the

wild-type and the five RBD mutants. The BY4741 strain was crossed with all the six

BY4742 strains. The resulting diploid strains containing both plasmids were grown

overnight in SC-Raffinose medium (0.669% YNB w/o ammonium sulphate w/o amino

acids, 2% raffinose, drop-out –lys, -met, –ade, -his, -leu). High-resolution growth profiling

experiments were performed in SC-Galactose medium (0.669% YNB w/o ammonium

sulphate w/o amino acids, 2% galactose, drop-out –lys, -met, –ade, -his, -leu). Growth

curves were measured in two experiments. In the first, five replicates from five

independent pre-cultures were used to test each interaction. In the second, twelve

independent replicates were used to test each interaction. Fig 4 shows the combined results

of the two experiments.

PKA regulatory and catalytic subunits interactions in different conditions

In order to study the perturbation of the PKA complex in different conditions, we

constructed diploid strains using strains from the DHFR collection (Tarassov et al, 2008).

We crossed MATa strains (BY4741 background) carrying the TPK1 or the TPK2 gene fused

to the DHFR F[1,2] with a MATα strain (BY4742 background) carrying the BCY1 gene

tagged with DHFR F[3] to generate the diploid strains JFL001 and JFL002.

Strains carrying one additional copy of the PDE genes (PDE1 or PDE2) were obtained by

transforming the JFL001 and JFL002 strains with plasmids from the MoBY collection (Ho

et al, 2009). Plasmids of the MoBY collection carry one yeast gene (in this case PDE1 or

PDE2) under the control of its native promoter and terminator as well as a URA selection

cassette and a yeast centromeric sequence (CEN) (Ho et al, 2009). We also transformed the

JFL001 and JFL002 strains with an empty pRS316 plasmid, which also has a URA cassette

and a CEN sequence, to generate a no-insert control strain. We then constructed strains

carrying a deletion of one allele of a different PDE gene. We performed two independent

transformations to obtain haploid MATα BY4742 strains where TPK1 or TPK2 genes were

tagged with the DHFR F[1,2] and BCY1 tagged with the DHFR F[3] (JFL003 and JFL004).

Then we crossed the JFL003 and JFL004 strains with the ho∆ (control strain), pde1∆ and

pde2∆ MATa strains from the YKO deletion collection (Winzeler et al, 1999). The strains

of the YKO collection are BY4741 MATa strains where a single gene is interrupted by a

fragment containing a KanMX cassette, which provides resistance to the antibiotic G418.

We therefore obtained diploid strains that carry a deletion of one allele of a different PDE

gene (JFL005, JFL006, JFL007 and JFL008) and control strains (JFL009 and JFL010). All

strains were grown in rich medium (YPD) with antibiotics. Strains carrying the DHFR

F[1,2] cassette were grown in the presence of Nourseothricin (Werner BioAgents; 100

µg/mL), those carrying DHFR F[3] were grown in the presence of Hygromycin B (Bioshop

Canada Inc.; 250 µg/mL). Finally, strains carrying a KanMX cassette were grown in the

presence of G418 (Bioshop Canada Inc.; 200 µg/mL).

Co-immunoprecipitation and western blotting of Tpk2-Bcy1 (PKA)

Strain JFL011 used for immunoprecipitation was constructed as follows. Plasmid pYM17

(Janke et al, 2004) that contains six repeats of the HA tag and a natNT2 marker was

amplified with specific oligonucleotides for the integration at the BCY1 locus (C-terminus)

in a BY4742 strain. Then, we amplified plasmid pYM20 (Janke et al, 2004) that has nine

Myc tags in tandem repeats with oligos for integration at the TPK2 locus (C-terminus). All

oligonucleotides mentioned above are listed in Supplementary Table 3. This PCR product

was transformed into a BY4741 strain for homologous recombination. These haploid

strains were crossed to generate the JFL011 diploid strain. Six independent cultures of the

strain JFL011 were grown in 5mL of YPD overnight at 30°C with shaking. The next day,

cells were diluted to OD600 of 0.1 in 100mL and grown to OD600 of 0.5. Three of the

cultures were treated with caffeine (final concentration 6 mM in water) while the

remaining three (controls) were treated with the same volume of water. All cultures were

incubated for 30 min. After incubation, the equivalent of 15 OD600 units of cells was collected,

washed once with 2 mL of zymolyase buffer (1 M sorbitol, 0.01 M phosphate buffer pH 7.6,

0.02 M EDTA) and then resuspended in 4 mL of the same buffer. 4 µL of

β-mercaptoethanol and 5 µL of Zymolyase 20T (20 mg/mL) were added to the cells and

incubated at 37°C for 23 min with agitation. The spheroplasts were washed with 1M

sorbitol, resuspended in 200µL of lysis buffer (50mM Tris-HCl pH 7.4, 0.01M EDTA,

150mM NaCl, 1% Triton X-100, PMSF 1mM, Aprotinin 2µg/mL, Leupeptin 20µg/mL,

Pepstatin A 2µg/mL) and incubated 2h on ice. Lysates were immunoprecipitated for 2h at

4°C with THE™ c-Myc Tag Antibody, mAB, Mouse (GenScript A00704) coupled to

Dynabeads M-280 Tosylactivated (Life Technologies Corporation), washed 3 times with

500µL of cold washing buffer (0.1M Na-Phosphate pH 7.4, 0.08% Tween 20) and eluted in

40 µL of boiling 2X Laemmli Buffer for 10 min. The primary antibodies for Western

blotting were the rabbit Anti-HA antibody (Rockland 600-401-384) for the HA tag, and

THE™ c-Myc Tag Antibody. The secondary antibodies were IRDye 680 conjugated Goat

Anti-Rabbit (926-32221) and IRDye 800 conjugated Goat Anti-Mouse (926-32210)

(LiCor). Dried membranes were scanned and processed using an Odyssey Infrared Imaging

System. Pixel quantification was performed using ImageJ64 (ImageJ).

Results and discussion

The DHFR-qPCA signal reflects the amount of protein complex formed in the cell

We first examined whether the DHFR-PCA signal provides a quantitative measure of PPIs

(Fig. 1), which is a minimal requirement for the measurement of changes in PPIs in

different conditions. Here by quantitative we mean that the PCA signal correlates with the

quantity of protein complex formed by two interacting proteins and changes in PCA signal

can be reproducibly measured. We first tested this hypothesis by combining protein

abundance data (Wang et al, 2012) with the PCA data from Tarassov et al. (Tarassov et al,

2008), where PCA signal is measured as colony size on agar plates containing

methotrexate. We found a highly significant correlation between the average abundance of

two interacting proteins and PCA signal (rho = 0.18, p-value < 2.2e-16), and this

correlation is significantly improved by considering the abundance of the least expressed

protein of the interacting pair (rho = 0.32, p-value < 2.2e-16, Fig 2). This result suggests

that PCA signal reflects the abundance of the protein complexes formed.
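
The comparison between the two abundance summaries can be sketched as follows. This Python illustration assumes that rho denotes Spearman's rank correlation, which the text does not state explicitly. The function names and the tuple layout `(abundance_a, abundance_b, pca_signal)` are hypothetical, and no real data are reproduced.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def abundance_correlations(pairs):
    """pairs: list of (abundance_a, abundance_b, pca_signal) tuples.

    Returns (rho using the pair's mean abundance,
             rho using the abundance of the least expressed partner).
    """
    signal = [s for _, _, s in pairs]
    mean_ab = [(a + b) / 2 for a, b, _ in pairs]
    min_ab = [min(a, b) for a, b, _ in pairs]
    return spearman(mean_ab, signal), spearman(min_ab, signal)
```

One reading of the result is that the amount of complex is capped by the scarcer partner, which the minimum captures and the mean does not; the sketch only shows how the two correlations would be computed side by side.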

Fig. 2 Relationship between colony size (DHFR-PCA signal) and the abundance of the least

expressed protein of interacting pairs. Grey dots represent raw data and blue dots binned data.

We then tested whether changes in PPIs could be detected in the absence of modification of

protein abundance. We randomly selected 15 PPIs from the yeast DHFR-PCA network

with high (7 pairs), medium (2 pairs) and low (6 pairs) protein abundances (see

Supplementary Table 1). We constructed yeast strains carrying different combinations

of DHFR-tagged genes in diploid cells: one allele of each locus (1:1) or both alleles of each

locus (2:2) (Fig. 3A). These constructs allowed us to directly manipulate, by four-fold, the

amount of reconstituted DHFR without modifying protein abundance.

Fig. 3 (A) Fifteen diploid strains with one (1:1) or both alleles (2:2) of two genes (GENE A, light

blue and GENE B, yellow) tagged with DHFR fragments (DHFR F[1,2] in red and DHFR F[3] in

dark blue, respectively) were constructed in order to test whether the DHFR-PCA signal could be

modulated without changing protein abundance (the genes are under the control of their native

promoters). In these diploid strains, only the number of alleles of each locus that are fused to the

DHFR fragments varies. (B) Parameters used to describe yeast growth curves (Slope (∆OD/∆t),

Efficiency (ODfinal-ODinitial) and Lag time) and their correlations. (C) Example of raw data of a

DHFR-qPCA experiment showing the growth profiles for the Vps29-Vps35 interaction. Each curve

represents an independent biological replicate. While in DMSO (control) the 1:1 and 2:2 strains

have the same growth profile, in methotrexate (MTX) the 2:2 has a significantly shorter lag time

than the 1:1 (t-test; p-value < 0.001). (D) Results of DHFR-qPCA test for 14 PPIs (1:1 and 2:2

backgrounds; 15th interaction shown in panel C). All independent replicates are shown for each

interaction. Grey points represent growth in DMSO (control), while colored points represent growth

in MTX. Red points show interactions among proteins with high expression levels; blue, medium

expression and black low expression. Dashed lines associate the same interaction in the two

different backgrounds. The significance of the difference in lag time between the 1:1 and 2:2

backgrounds in MTX is shown for each interaction (t-test; ***: p-value < 0.001; *: p-value between

0.01 and 0.05; NS: non-significant). (E) Spot-dilution assays show that differences in growth rate

can also be detected on solid medium. Results for the Vps29-Vps35 interaction are shown (cell

dilution 1:10). An isogenic strain carrying the two DHFR fragments alone expressed on plasmids

provides a negative control. DMSO is the MTX solvent and is thus used as a control for growth.

When only one allele of each gene is fused to the DHFR fragments (A’ and B’ for the

tagged alleles, A and B for the untagged alleles), four types of protein complexes can be

formed: A’-B’, A-B’, A’-B, A-B. Therefore in the 1:1 strains, only the A’-B’ complexes

(1/4) would provide a DHFR-PCA signal. If both alleles of both genes are tagged (2:2

strains), all complexes would be of type A’-B’ and thus 100% of PPIs of the complex

would provide a DHFR-PCA signal. In both cases, the concentrations of proteins A and B

are unaltered. We applied high-resolution growth profiling (see Methods) to these strains in

DMSO (control) and MTX and estimated growth parameters from the growth profile. As a

first step, we determined which growth parameter would maximise the power to detect

changes in PPIs. We compared the slope, the efficiency and the lag time required to reach

the maximum growth rate (Fig. 3B). We found that all parameters are strongly correlated

with each other (Fig. 3B) and thus largely redundant. Because the lag time maximizes

correlation between replicates (rho = 0.90, p-value < 6.81e-5) and is therefore less sensitive

to experimental error, we used it as an estimate of growth and thus PCA signal. For 14 out

of 15 protein pairs, lag time was significantly lower for the 2:2 strains than for the 1:1

strains (Fig. 3C-D). These four-fold differences in DHFR-PCA signal can also be detected

on solid medium (Fig. 3E). We also sought to test whether the PCA signal would reflect the

known dissociation constant (Kd) of a protein complex. Block et al. (Block et al, 1996)

showed how point mutations in the Ras Binding Domain (RBD) affect the Kd of the Ras-

RBD complex. We tested the interactions between Ras and six RBD mutants with different

Kds by DHFR-qPCA. As expected, a decrease in Kd corresponds to an increase in PCA

signal (Fig. 4).
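
The four-fold difference expected between the 1:1 and 2:2 strains can be checked by enumerating allele combinations. The Python sketch below is illustrative and rests on simplifying assumptions that go beyond the text: both alleles of a gene are equally expressed, partners pair at random, and the DHFR tag does not affect binding. The function name `reporting_fraction` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def reporting_fraction(tagged_a, tagged_b, ploidy=2):
    """Fraction of A-B complexes in which both partners carry a DHFR fragment.

    tagged_a, tagged_b: number of tagged alleles of gene A and gene B.
    """
    # True marks a tagged allele (A' or B'), False an untagged one (A or B).
    alleles_a = [i < tagged_a for i in range(ploidy)]
    alleles_b = [i < tagged_b for i in range(ploidy)]
    # Every allele of A can pair with every allele of B.
    combos = list(product(alleles_a, alleles_b))
    reporting = sum(1 for a, b in combos if a and b)
    return Fraction(reporting, len(combos))
```

Under these assumptions `reporting_fraction(1, 1)` is 1/4 and `reporting_fraction(2, 2)` is 1, which reproduces the four-fold ratio between the two backgrounds.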

Fig. 4 DHFR-qPCA signal for interactions between Ras and different RBD mutants. PCA signal

increases with a decrease in Kd of the different mutant Ras-RBD protein complexes. The R89L

mutant shows no interaction and its M score is arbitrarily set on this graph. The numbers in

parentheses indicate the Kd of each complex in µM units. Seventeen independent biological

replicates in two independent experiments were used to perform DHFR-qPCA assays for each

mutant Ras-RBD complex.

Altogether, these results (Fig. 2,3,4) indicate that the DHFR-qPCA provides a quantitative

readout of the amount of protein complexes formed between interaction partners, even

when changes in the amount of complex formed do not involve changes in protein

abundance.

DHFR-qPCA allows the study of the effects of metabolic, drug and genetic perturbations on protein complexes

We next sought to test directly the approach on a canonical signaling pathway by using

different perturbations and combinations of perturbations. For this purpose, we chose the

well-characterized protein kinase A (PKA) complex. The PKA is a tetramer formed by two

regulatory and two catalytic subunits (Zhang et al, 2012) that is regulated by cAMP levels

(Fig. 5A).

Fig. 5 Condition-dependence of the interactions between the PKA regulatory and catalytic

subunits in response to different perturbations.

(A) The activation/inactivation of the PKA is regulated by intracellular levels of cAMP, which is

modulated among other mechanisms by the enzymes adenylate cyclase and phosphodiesterase

(Pde1 and Pde2 in yeast). (B) Comparison of the DHFR-qPCA signal for the Bcy1-Tpk2 interaction

in glucose and galactose. (C) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in cells grown in

media supplemented with caffeine at different concentrations. (D) Co-immunoprecipitation of Bcy1

and Tpk2 in standard medium (YPD) and in YPD supplemented with caffeine confirms the DHFR-

qPCA results (t-test, p-value < 0.05). (E) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in cells

grown in media supplemented with methyl methanesulfonate (MMS). (F) DHFR-qPCA signal for

the Bcy1-Tpk2 interaction in cells grown in media supplemented with galactose and MMS at

different concentrations. (G) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in strains carrying

an additional copy (on a low copy number plasmid) or a deletion of one copy (heterozygous strain)

of the genes coding for the PDE enzymes (left and right panel, respectively). In all cases, n

represents the number of independent biological replicates.

The PKA pathway regulates different processes such as glucose metabolism (Dechant &

Peter, 2008), protein translation (Ashe et al, 2000), ribosome biogenesis (Martin et al,

2004), stress responses (Ramachandran et al, 2011), autophagy (Budovskaya et al, 2005)

and lifespan (Longo, 2003). In yeast, the regulatory subunits are encoded by the gene

BCY1, while the catalytic subunits are encoded by three different genes: TPK1, TPK2 and

TPK3 (Johnson et al, 1987; Toda et al, 1987). We studied the interactions between the

regulatory subunit (Bcy1) and the catalytic subunits (Tpk1 and Tpk2, respectively) by

DHFR-qPCA in response to four different perturbations: 1) galactose, which leads to a

decrease in PKA activity relative to glucose (Portela et al, 2003); 2) caffeine, which was

shown to indirectly inhibit the PKA (Soulard et al, 2010) through the TORC1 pathway; 3)

methyl methanesulfonate (MMS), a DNA damaging agent, which has recently been

associated with the PKA pathway (Bandyopadhyay et al, 2010) and was shown to lead to

the phosphorylation of the PKA regulatory subunit (Searle et al, 2011) and 4) dosage of the

PKA regulator PDE (phosphodiesterase), which negatively regulates the PKA by degrading

cAMP (Ma et al, 1999). Each of these conditions is known or has been hypothesized to

affect the PKA pathway in yeast but has not been assessed at the level of the protein

complex dissociation.

We estimated the effect of each perturbation in control conditions (without MTX) and

subtracted this effect to estimate the net PCA signal in MTX. We obtained the difference in

lag time between MTX and DMSO (∆L) from which we computed a relative interaction

score value (M, see Methods). Our results show that the DHFR-qPCA can detect changes

in the PKA activity as a response to metabolic, drug and genetic perturbations (Fig. 5 for

Bcy1-Tpk2; Supplementary Fig 1 for Bcy1-Tpk1). We observed more interaction between

the regulatory and catalytic subunits of the PKA in galactose (Fig. 5B; Supplementary Fig

1A) and in the presence of caffeine (Fig. 5C; Supplementary Fig 1B) when compared with

standard growth conditions. This confirms that the PKA is inhibited in these conditions

(Portela et al, 2003; Soulard et al, 2010) and this inhibition involves changes in Bcy1-Tpk1

and Bcy1-Tpk2 interactions. Further, our results show that the DHFR-qPCA can detect

concentration-dependent effects on the PKA complex, as PKA inhibition increases as

caffeine concentration increases (Fig. 5C; Supplementary Fig 1B). In this particular case,

we also measured changes in PKA complex using co-immunoprecipitation and confirmed

this quantitative effect on the PKA (for Bcy1-Tpk2) (Fig. 5D). The DHFR-qPCA signal

appears to provide a larger amplitude of changes in the interaction than co-

immunoprecipitation, which is most likely due to the fact that the DHFR-qPCA assay is a

fitness based assay which serves as a signal amplifier. This property could be exploited for

instance for the measurement of subtle changes in PPIs. We also found that the effect of

MMS on the PKA was stronger in galactose than in glucose (Fig. 5E,F; Supplementary Fig

1C,D). The PKA might indeed be maximally activated in glucose, so that the addition of MMS

cannot activate it further to an extent that can be detected with this assay. It is also

possible that DNA damage affects the PKA pathway in a carbon source-dependent fashion,

with a greater effect in non-fermentable carbon sources. Finally, we found that modifying

the dosage of PDE1 or PDE2 leads to significant changes in the amount of PKA complex

formed, consistent with their roles as negative regulators of the PKA complex (Ma et al,

1999) (Fig. 5G; Supplementary Fig 1E). This result shows that the DHFR-qPCA allows the

detection of subtle quantitative genetic perturbations (50% of gene product) that affect

PPIs.

Conclusions

PPIs regulate many cellular processes and are therefore expected to be dynamic and

condition-dependent, i.e. the degree of association among proteins will depend on the

conditions to which cells are exposed. There is therefore a strong need for the development

of simple assays to measure changes in PPIs. Here we show that the yeast DHFR-qPCA is a

quantitative technique that allows the screening of PPIs in different conditions at low cost.

Our results, together with those of Schlecht et al. (Schlecht et al, 2012), show that the DHFR-qPCA

can be used to study PPIs in different conditions and at high-throughput. Ninety-six

interactions could be tested simultaneously in a standard plate-reader or more in dedicated

instruments (see Methods). Unlike PCA assays based on luciferase (Stefan et al, 2007), this

assay does not allow the measurement of dynamic PPIs in real time, because it is based on fitness.

However, this offers the advantage that fitness differences among conditions or strains can

be amplified through generations and may thus allow the detection of very small changes in

interactions. The PCA signal might be saturated for interactions with low Kd and/or highly

abundant proteins. However, we expect that this would occur for a limited number of

interactions under natural conditions as we see a strong correlation between the abundance

of proteins and PCA signal over five orders of magnitude of protein abundance without

saturation (Fig. 2). With the availability of entire yeast collections tagged with the DHFR

fragments (Tarassov et al, 2008), any pairwise interaction of interest could be investigated

in different conditions and in different genetic backgrounds. The DHFR-qPCA will pave

the way to the study of the effects of drug, genetic and environmental perturbations on in

vivo PPI networks, thus allowing the exploration of new spaces of the model eukaryotic

interactome.

Acknowledgements

This work was supported by a Canadian Institutes of Health Research (CIHR) Grant GMX-

191597 and partly by a Human Frontier Science Program grant (RGY0073/2010) and a

Genome Québec grant. C. R. Landry is a CIHR New Investigator. L. Freschi was supported

by a fellowship from the Fonds de recherche du Québec – Nature et technologies

(FRQNT). L. Freschi and F. Torres-Quiroz were supported by fellowships from the Quebec

Research Network on Protein Function, Structure and Engineering (PROTEO). We thank

all members of the Landry laboratory and N. Aubin-Horth for their comments on the

manuscript.

Supplementary information

Gene A-DHFR F[1,2]   Gene B-DHFR F[3]   Interaction (gene names)   Interaction (ORF names)   Abundance
VMA21                VPH1               VMA21-VPH1                 YGR105W-YOR270C           H
ARX1                 YBR267W            ARX1-YBR267W               YDR101C-YBR267W           H
DHH1                 EDC3               DHH1-EDC3                  YDL160C-YEL015W           H
DHH1                 LSM7               DHH1-LSM7                  YDL160C-YNL147W           H
DHH1                 PBP1               DHH1-PBP1                  YDL160C-YGR178C           H
SGN1                 PUB1               SGN1-PUB1                  YIR001C-YNL016W           H
TOM70                ALO1               TOM70-ALO1                 YNL121C-YML086C           H
CKB1                 CKA2               CKB1-CKA2                  YGL019W-YOR061W           M
NOT5                 MOT2               NOT5-MOT2                  YPR072W-YER068W           M
LSB3                 CUE5               LSB3-CUE5                  YFR024C-A-YOR042W         L
MMS2                 SIP5               MMS2-SIP5                  YGL087C-YMR140W           L
PEX14                PEX17              PEX14-PEX17                YGL153W-YNL214W           L
SLA1                 END3               SLA1-END3                  YBL007C-YNL084C           L
YKE2                 GIM5               YKE2-GIM5                  YLR200W-YML094W           L
VPS29                VPS35              VPS29-VPS35                YHR012W-YJL154C           L

For each pair we used the data by Ghaemmaghami et al. (1) and we calculated the average abundance. Then, we classified the pairs in 3 classes: low abundance (L), medium abundance (M) and high abundance (H).

Supplementary Table 1. Protein-protein interactions selected to test for the relationship between growth and the amount of DHFR complex formed.

References

1. S. Ghaemmaghami, W. Huh, K. Bower, R. W. Howson, A. Belle, N. Dephoure, E. K. O'Shea and J. S. Weissman, Nature, 2003, 425, 737-741.

Strain Genotype

LTQ001 MATa, VMA21-DHFR F[1,2]-natNT2, VPH1-DHFR F[3]-hphNT1

LTQ002 MATa, ARX1-DHFR F[1,2]-natNT2, YBR267W-DHFR F[3]-hphNT1

LTQ003 MATa, DHH1-DHFR F[1,2]-natNT2, EDC3-DHFR F[3]-hphNT1

LTQ004 MATa, DHH1-DHFR F[1,2]-natNT2, LSM7-DHFR F[3]-hphNT1

LTQ005 MATa, DHH1-DHFR F[1,2]-natNT2, PBP1-DHFR F[3]-hphNT1

LTQ006 MATa, SGN1-DHFR F[1,2]-natNT2, PUB1-DHFR F[3]-hphNT1

LTQ007 MATa, TOM70-DHFR F[1,2]-natNT2, ALO1-DHFR F[3]-hphNT1

LTQ008 MATa, CKB1-DHFR F[1,2]-natNT2, CKA2-DHFR F[3]-hphNT1

LTQ009 MATa, NOT5-DHFR F[1,2]-natNT2, MOT2-DHFR F[3]-hphNT1

LTQ010 MATa, LSB3-DHFR F[1,2]-natNT2, CUE5-DHFR F[3]-hphNT1

LTQ011 MATa, MMS2-DHFR F[1,2]-natNT2, SIP5-DHFR F[3]-hphNT1

LTQ012 MATa, PEX14-DHFR F[1,2]-natNT2, PEX17-DHFR F[3]-hphNT1

LTQ013 MATa, SLA1-DHFR F[1,2]-natNT2, END3-DHFR F[3]-hphNT1

LTQ014 MATa, YKE2-DHFR F[1,2]-natNT2, GIM5-DHFR F[3]-hphNT1

LTQ015 MATa, VPS29-DHFR F[1,2]-natNT2, VPS35-DHFR F[3]-hphNT1

LTQ016 MATα, VMA21-DHFR F[1,2]-natNT2, VPH1-DHFR F[3]-hphNT1

LTQ017 MATα, ARX1-DHFR F[1,2]-natNT2, YBR267W-DHFR F[3]-hphNT1

LTQ018 MATα, DHH1-DHFR F[1,2]-natNT2, EDC3-DHFR F[3]-hphNT1

LTQ019 MATα, DHH1-DHFR F[1,2]-natNT2, LSM7-DHFR F[3]-hphNT1

LTQ020 MATα, DHH1-DHFR F[1,2]-natNT2, PBP1-DHFR F[3]-hphNT1

LTQ021 MATα, SGN1-DHFR F[1,2]-natNT2, PUB1-DHFR F[3]-hphNT1

LTQ022 MATα, TOM70-DHFR F[1,2]-natNT2, ALO1-DHFR F[3]-hphNT1

LTQ023 MATα, CKB1-DHFR F[1,2]-natNT2, CKA2-DHFR F[3]-hphNT1

LTQ024 MATα, NOT5-DHFR F[1,2]-natNT2, MOT2-DHFR F[3]-hphNT1

LTQ025 MATα, LSB3-DHFR F[1,2]-natNT2, CUE5-DHFR F[3]-hphNT1

LTQ026 MATα, MMS2-DHFR F[1,2]-natNT2, SIP5-DHFR F[3]-hphNT1

LTQ027 MATα, PEX14-DHFR F[1,2]-natNT2, PEX17-DHFR F[3]-hphNT1

LTQ028 MATα, SLA1-DHFR F[1,2]-natNT2, END3-DHFR F[3]-hphNT1

LTQ029 MATα, YKE2-DHFR F[1,2]-natNT2, GIM5-DHFR F[3]-hphNT1

LTQ030 MATα, VPS29-DHFR F[1,2]-natNT2, VPS35-DHFR F[3]-hphNT1

JFL001 MATa/MATα, TPK1-DHFR F[1,2]-natNT2/TPK1, BCY1/BCY1-DHFR F[3]-hphNT1

JFL002 MATa/MATα, TPK2-DHFR F[1,2]-natNT2/TPK2, BCY1/BCY1-DHFR F[3]-hphNT1

JFL003 MATα, TPK1-DHFR F[1,2]-natNT2, BCY1-DHFR F[3]-hphNT1

JFL004 MATα, TPK2-DHFR F[1,2]-natNT2, BCY1-DHFR F[3]-hphNT1

JFL005 MATa/MATα, TPK1/TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, pde1∆-KanMX/PDE1




JFL009 MATa/MATα, TPK1/TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, ho∆-KanMX/HO

JFL010 MATa/MATα, TPK2/TPK2-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, ho∆-KanMX/HO

JFL011 MATa/MATα, TPK2-Myc-hphNT1/TPK2, BCY1/BCY1-HA-natNT2

Supplementary Table 2. Genotypes of the strains constructed in this study.

Experiments Primer Information Primer Sequence 5’ to 3’

qPCA C Oligo Forward YGR105W (VMA21) GTTTAGCTGCTGCAATGGCC

qPCA C Oligo Forward YOR270C (VPH1) AAGTTTTTCGTGGGTGAAGG

qPCA C Oligo Forward YDR101C (ARX1) GCCAAGGATAAGAGGTTCGG

qPCA C Oligo Forward YBR267W (YBR267W) GACTCAACAGCGTGTTTGGC

qPCA C Oligo Forward YDL160C (DHH1) ACAGGCGTATCCTCCACCGC

qPCA C Oligo Forward YEL015W (EDC3) CTGGCTGGCCTTTGATTGCC


qPCA C Oligo Forward YNL147W (LSM7) TTATAGGTGTCCTAAAAGGC


qPCA C Oligo Forward YGR178C (PBP1) AGCGAACGGGTCGGCAATGC

qPCA C Oligo Forward YIR001C (SGN1) AAAAACACTTCAACAGTGCC

qPCA C Oligo Forward YNL016W (PUB1) ACAGCAGCAGCAACAGGGCG

qPCA C Oligo Forward YNL121C (TOM70) ATTACTTTTGCTGAAGCCGC

qPCA C Oligo Forward YML086C (ALO1) AGGATTTGAAAAAGTTCCGG

qPCA C Oligo Forward YGL019W (CKB1) GATGAGGCAGTATCTGGTCC

qPCA C Oligo Forward YOR061W (CKA2) ATTAGCTGTTCCTGAAGTGG

qPCA C Oligo Forward YPR072W (NOT5) AATCTGAGGAGGAATCATGG

qPCA C Oligo Forward YER068W (MOT2) TAAGGTTCCTATTCAGCAGC

qPCA C Oligo Forward YFR024C-A (LSB3) ACCATTCAGAAAGGGTGACG

qPCA C Oligo Forward YOR042W (CUE5) GAACCCCTGGATACTACACC

qPCA C Oligo Forward YGL087C (MMS2) ACTGGAAAAGAGCCTACACC

qPCA C Oligo Forward YMR140W (SIP5) CGAACTTGAAGATCAAATGG

qPCA C Oligo Forward YGL153W (PEX14) GATAGCAACGCCTCCATTCC

qPCA C Oligo Forward YNL214W (PEX17) TTAACAGATAGGTCCCGAGC

qPCA C Oligo Forward YBL007C (SLA1) TTACAGAACCAACCTACTGG

qPCA C Oligo Forward YNL084C (END3) GTCGATAACTGATGACTTGG

qPCA C Oligo Forward YLR200W (YKE2) ATGCGAAAAGAACATAAGGG

qPCA C Oligo Forward YML094W (GIM5) TTCCTTGTCCATCGAGGCCC

qPCA C Oligo Forward YHR012W (VPS29) TAATTCACCAAGTTTCTGCC

qPCA C Oligo Forward YJL154C (VPS35) CACCAACTGAAGTATATCCC

qPCA Oligo Reverse to test DHFR integration CCATCTTTTCGTAAATTTCTG

PKA BCY1-DHFR integration Forward TGCAGTAGACGTATTAAAGCTCAATGATCCTACAAGACATGGCGGTGGCGGATCAGGAGGC

PKA BCY1-DHFR integration Reverse AGGAAATTCATGTGGATTTAAGATCGCTTCCCCTTTTTACTTCGACACTGGATGGCGGCGTTAG

PKA TPK1-DHFR integration Forward TCAAGGTGAAGACCCATATGCTGATCTTTTCCGGGACTTCGGCGGTGGCGGATCAGGAGGC

PKA TPK1-DHFR integration Reverse AATATAGATACGAGAGGAAAATACAACAAAACATTAGTCATTCGACACTGGATGGCGGCGTTAG

PKA TPK2-DHFR integration Forward TCAAGGCGATGATCCATATGCTGAATACTTTCAAGATTTCGGCGGTGGCGGATCAGGAGGC

PKA TPK2-DHFR integration Reverse GTACTTGAAAATTGTTTTTGTGTTTTTTGGTTCATGGAACTTCGACACTGGATGGCGGCGTTAG

PKA C Oligo Forward BCY1 GTGATCAAGGGGAGAACTTTTATTT

PKA C Oligo Forward TPK1 CGACTCTAACACGATGAAAACCTAT

PKA C Oligo Forward TPK2 GGTATCGGTGACACGTCT

CoIP BCY1-HA Forward TACTGGGTCCTGCAGTAGACGTATTAAAGCTCAATGATCCTACAAGACATCGTACGCTGCAGGTCGAC

CoIP BCY1-HA Reverse AAGAGAAAGGAAATTCATGTGGATTTAAGATCGCTTCCCCTTTTTACTTAATCGATGAATTCGAGCTCG

CoIP TPK1-MYC Forward ACTACGGTGTTCAAGGTGAAGACCCATATGCTGATCTTTTCCGGGACTTCCGTACGCTGCAGGTCGAC

CoIP TPK1-MYC Reverse AAAAAAAAATATAGATACGAGAGGAAAATACAACAAAACATTAGTCATTAATCGATGAATTCGAGCTCG

CoIP TPK2-MYC Forward ATTATGGTATTCAAGGCGATGATCCATATGCTGAATACTTTCAAGATTTCCGTACGCTGCAGGTCGAC

CoIP TPK2-MYC Reverse AGAGAAAGTACTTGAAAATTGTTTTTGTGTTTTTTGGTTCATGGAACTTAATCGATGAATTCGAGCTCG

CoIP Oligo Reverse to test MYC or HA integration CGACAGTCACATCATGC

Kd Oligo used to check Ras and RBD plasmid constructions CAACATTTTCGGTTTGTATTAC

Kd Oligo Forward to amplify DHFR F[1,2] and clone in p413Gal1-Ras (contains a BspEI restriction site) ATCGCAGGCTCCGGAGGTGGAGGTTCTGGAGGTATGGTTCGACCATTGAACTGC

Kd Oligo Reverse to amplify DHFR F[1,2] and clone in p413Gal1-Ras (contains a XhoI restriction site) CGATGCCCGCCCCCGCTCGAGCTATGTTCTAGATTAGGTACCCAA

Kd Oligo Forward to amplify DHFR F[3] and clone in p415Gal1-RBD (contains a BspEI restriction site) CGTTGAGGCTCCGGAGGTGGAGGTTCTGGAGGTATGAGTAAAGTAGACATGGTT

Kd Oligo Reverse to amplify DHFR F[3] and clone in p415Gal1-RBD (contains a XhoI restriction site) AGATCGCCGCCCCCGCTCGAGCTAAGTTCTAGATTAGTCTTTCTT

C Oligos were used to confirm the integration at the proper locus.

Supplementary Table 3. Oligonucleotides used in this study.

Supplementary Fig. 1. Dynamics of the interactions between the PKA regulatory and catalytic subunits in response to different perturbations. (A) Comparison of the DHFR-qPCA signal for the Bcy1-Tpk1 interaction in glucose and galactose. (B) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with caffeine at different concentrations. (C) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with methyl methanesulfonate. (D) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with galactose and methyl methanesulfonate at different concentrations. (E) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in strains carrying an additional copy (on a low copy number plasmid) or a deletion of one copy (heterozygous strain) of the genes coding for the PDE enzymes (left and right panel, respectively). In all cases, n represents the number of independent replicates.

Supplementary Fig. 2. Comparing PPIs using the M relative interaction score. (A) The difference between the lag times in DMSO and MTX (∆L) is calculated for all interactions. (B) M scores are calculated for each interaction by subtracting the ∆L of that interaction from the maximum ∆L across all interactions. (C) Bar graphs are generated to compare the relative interaction scores of all interactions tested.

References

Albuquerque CP, Smolka MB, Payne SH, Bafna V, Eng J, Zhou HL (2008) A multidimensional chromatography technology for in-depth phosphoproteome analysis. Mol Cell Proteomics 7: 1389-1396

Alfaro JF, Gong CX, Monroe ME, Aldrich JT, Clauss TR, Purvine SO, Wang Z, Camp DG, 2nd, Shabanowitz J, Stanley P, Hart GW, Hunt DF, Yang F, Smith RD (2012) Tandem mass spectrometry identifies many mouse brain O-GlcNAcylated proteins including EGF domain-specific O-GlcNAc transferase targets. Proc Natl Acad Sci U S A 109: 7280-7285

Amberg DC, Burke DJ, Strathern JN (2005) Methods in yeast genetics: Cold Spring Harbor Laboratory Press.

Amoutzias GD, He Y, Gordon J, Mossialos D, Oliver SG, Van de Peer Y (2010) Posttranslational regulation impacts the fate of duplicated genes. Proceedings of the National Academy of Sciences of the United States of America 107: 2967-2971

Ashe MP, De Long SK, Sachs AB (2000) Glucose depletion rapidly inhibits translation initiation in yeast. Mol Biol Cell 11: 833-848

Ba ANN, Moses AM (2010) Evolution of characterized phosphorylation sites in budding yeast. Molecular Biology and Evolution 27: 2027-2037

Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guenole A, van Attikum H, Shokat KM, Kolodner RD, Huh WK, Aebersold R, Keogh MC, Krogan NJ et al (2010) Rewiring of Genetic Networks in Response to DNA Damage. Science 330: 1385-1389

Barr RK, Bogoyevitch MA (2001) The c-Jun N-terminal protein kinase family of mitogen-activated protein kinases (JNK MAPKs). Int J Biochem Cell B 33: 1047-1063

Basu U, Wang YB, Alt FW (2008) Evolution of Phosphorylation-Dependent Regulation of Activation-Induced Cytidine Deaminase. Mol Cell 32: 285-291

Beausoleil SA, Jedrychowski M, Schwartz D, Elias JE, Villen J, Li J, Cohn MA, Cantley LC, Gygi SP (2004) Large-scale characterization of HeLa cell nuclear phosphoproteins. Proceedings of the National Academy of Sciences of the United States of America 101: 12130-12135

Bell SP, Dutta A (2002) DNA replication in eukaryotic cells. Annu Rev Biochem 71: 333-374

Beltrao P, Albanese V, Kenner LR, Swaney DL, Burlingame A, Villen J, Lim WA, Fraser JS, Frydman J, Krogan NJ (2012) Systematic functional prioritization of protein posttranslational modifications. Cell 150: 413-425

Beltrao P, Bork P, Krogan NJ, van Noort V (2013) Evolution and functional cross-talk of protein post-translational modifications. Mol Syst Biol 9: 714

Beltrao P, Trinidad JC, Fiedler D, Roguev A, Lim WA, Shokat KM, Burlingame AL, Krogan NJ (2009) Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species. PLoS Biology 7: e1000134-e1000134

Benayoun BA, Veitia RA (2009) A post-translational modification code for transcription factors: sorting through a sea of signals. Trends in cell biology 19: 189-197

Block C, Janknecht R, Herrmann C, Nassar N, Wittinghofer A (1996) Quantitative structure-activity analysis correlating Ras/Raf interaction in vitro to Raf activation in vivo. Nat Struct Biol 3: 244-251

Bodenmiller B, Mueller LN, Mueller M, Domon B, Aebersold R (2007) Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nature Methods 4: 231-237

Boekhorst J, van Breukelen B, Heck A, Jr., Snel B (2008) Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome biology 9: R144

Boulais J, Trost M, Landry CR, Dieckmann R, Levy ED, Soldati T, Michnick SW, Thibault P, Desjardins M (2010) Molecular characterization of the evolution of phagosomes. Mol Syst Biol 6: 423

Brewster RC, Weinert FM, Garcia HG, Song D, Rydenfelt M, Phillips R (2014) The transcription factor titration effect dictates level of gene expression. Cell 156: 1312-1323

Brooks CL, Gu W (2003) Ubiquitination, phosphorylation and acetylation: the molecular basis for p53 regulation. Curr Opin Cell Biol 15: 164-171

Budovskaya YV, Stephan JS, Deminoff SJ, Herman PK (2005) An evolutionary proteomics approach identifies substrates of the cAMP-dependent protein kinase. Proceedings of the National Academy of Sciences of the United States of America 102: 13933-13938

Bullock AN, Das S, Debreczeni JE, Rellos P, Fedorov O, Niesen FH, Guo K, Papagrigoriou E, Amos AL, Cho S, Turk BE, Ghosh G, Knapp S (2009) Kinase domain insertions define distinct roles of CLK kinases in SR protein phosphorylation. Structure 17: 352-362

Bullock AN, Debreczeni J, Amos AL, Knapp S, Turk BE (2005) Structure and substrate specificity of the Pim-1 kinase. The Journal of biological chemistry 280: 41675-41682

Bunkoczi G, Salah E, Filippakopoulos P, Fedorov O, Muller S, Sobott F, Parker SA, Zhang H, Min W, Turk BE, Knapp S (2007) Structural and functional characterization of the human protein kinase ASK1. Structure 15: 1215-1226

Caron C, Boyault C, Khochbin S (2005) Regulatory cross-talk between lysine acetylation and ubiquitination: role in the control of protein stability. BioEssays : news and reviews in molecular, cellular and developmental biology 27: 408-415

Chen C, Turk BE (2010) Analysis of serine-threonine kinase specificity using arrayed positional scanning peptide libraries. Current protocols in molecular biology / edited by Frederick M Ausubel [et al] Chapter 18: Unit 18 14

Chi A, Huttenhower C, Geer LY, Coon JJ, Syka JEP, Bai DL, Shabanowitz J, Burke DJ, Troyanskaya OG, Hunt DF (2007) Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America 104: 2193-2198

Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, Mann M (2009) Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science 325: 834-840

Cohen P (2000) The regulation of protein function by multisite phosphorylation - a 25 year update. Trends in Biochemical Sciences 25: 596-601

Conant GC, Wolfe KH (2008) Turning a hobby into a job: how duplicated genes find new functions. Nature reviews Genetics 9: 938-950

Courcelles M, Lemieux S, Voisin L, Meloche S, Thibault P (2011) ProteoConnections: a bioinformatics platform to facilitate proteome and phosphoproteome analyses. Proteomics 11: 2654-2671

Crick F (1970) Central dogma of molecular biology. Nature 227: 561-563

Davis TL, Walker JR, Allali-Hassani A, Parker SA, Turk BE, Dhe-Paganon S (2009) Structural recognition of an optimized substrate for the ephrin family of receptor tyrosine kinases. Febs J 276: 4395-4404

Dean AM, Thornton JW (2007) Mechanistic approaches to the study of evolution: the functional synthesis. Nature reviews Genetics 8: 675-688

Dechant R, Peter M (2008) Nutrient signals driving cell growth. Curr Opin Cell Biol 20: 678-687

Deribe YL, Pawson T, Dikic I (2010) Post-translational modifications in signal integration. Nature structural & molecular biology 17: 666-672

Dinkel H, Chica C, Via A, Gould CM, Jensen LJ, Gibson TJ, Diella F (2011) Phospho.ELM: a database of phosphorylation sites--update 2011. Nucleic acids research 39: D261-267

Dulai KS, von Dornum M, Mollon JD, Hunt DM (1999) The evolution of trichromatic color vision by opsin gene duplication in New World and Old World primates. Genome research 9: 629-638

Ear PH, Michnick SW (2009) A general life-death selection strategy for dissecting protein functions. Nature methods 6: 813-816

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792-1797

Efstratiadis A, Posakony JW, Maniatis T, Lawn RM, O'Connell C, Spritz RA, DeRiel JK, Forget BG, Weissman SM, Slightom JL, Blechl AE, Smithies O, Baralle FE, Shoulders CC, Proudfoot NJ (1980) The structure and evolution of the human beta-globin gene family. Cell 21: 653-668

Eisenberg E, Levanon EY (2003) Human housekeeping genes are compact. Trends Genet 19: 362-365

Fazili Z, Sun WP, Mittelstaedt S, Cohen C, Xu XX (1999) Disabled-2 inactivation is an early step in ovarian tumorigenicity. Oncogene 18: 3104-3113

Ferris SD, Whitt GS (1979) Evolution of the Differential Regulation of Duplicate Genes after Polyploidization. J Mol Evol 12: 267-317

Filippakopoulos P, Kofler M, Hantschel O, Gish GD, Grebien F, Salah E, Neudecker P, Kay LE, Turk BE, Superti-Furga G, Pawson T, Knapp S (2008) Structural coupling of SH2-kinase domains links Fes and Abl substrate recognition and kinase activation. Cell 134: 793-803

Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531-1545

Freschi L, Courcelles M, Thibault P, Michnick SW, Landry CR (2011) Phosphorylation network rewiring by gene duplication. Mol Syst Biol 7

Freschi L, Osseni M, Landry CR (2014) Functional divergence and evolutionary turnover in mammalian phosphoproteomes. PLoS Genet 10: e1004062

Gasch AP, Moses AM, Chiang DY, Fraser HB, Berardini M, Eisen MB (2004) Conservation and evolution of cis-regulatory systems in ascomycete fungi. PLoS Biol 2: e398-e398

Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M et al (2003) A protein interaction map of Drosophila melanogaster. Science 302: 1727-1736

Glotzer M, Murray AW, Kirschner MW (1991) Cyclin is degraded by the ubiquitin pathway. Nature 349: 132-138

Gnad F, de Godoy LMF, Cox J, Neuhauser N, Ren S, Olsen JV, Mann M (2009) High-accuracy identification and bioinformatic analysis of in vivo protein phosphorylation sites in yeast. Proteomics 9: 4642-4652

Gnad F, Gunawardena J, Mann M (2011) PHOSIDA 2011: the posttranslational modification database. Nucleic acids research 39: D253-260

Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M (2007) PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome biology 8: R250

Gordon JL, Byrne KP, Wolfe KH (2009) Additions, Losses, and Rearrangements on the Evolutionary Route from a Reconstructed Ancestor to the Modern Saccharomyces cerevisiae Genome. PLoS Genetics 5: e1000485-e1000485

Gordon R (1994) Evolution Escapes Rugged Fitness Landscapes by Gene or Genome Doubling - the Blessing of Higher Dimensionality. Comput Chem 18: 325-331

Gough NR, Wong W (2010) Focus Issue: The Evolution of Complexity. Sci Signal 3: eg5-eg5

Gray VE, Kumar S (2011) Rampant purifying selection conserves positions with posttranslational modifications in human proteins. Mol Biol Evol 28: 1565-1568

Gruhler A, Olsen JV, Mohammed S, Mortensen P, Faergeman NJ, Mann M, Jensen ON (2005) Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Molecular & Cellular Proteomics: MCP 4: 310-327

Gu X, Zhang Z, Huang W (2005) Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proceedings of the National Academy of Sciences of the United States of America 102: 707-712

Gu ZL, Nicolae D, Lu HHS, Li WH (2002) Rapid divergence in expression between duplicate genes inferred from microarray data. Trends in Genetics 18: 609-613

Gwinn DM, Shackelford DB, Egan DF, Mihaylova MM, Mery A, Vasquez DS, Turk BE, Shaw RJ (2008) AMPK phosphorylation of raptor mediates a metabolic checkpoint. Mol Cell 30: 214-226

Hansen TF, Carter AJR, Chiu CH (2000) Gene conversion may aid adaptive peak shifts. J Theor Biol 207: 495-511

Hart GW, Slawson C, Ramirez-Correa G, Lagerlof O (2011) Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annu Rev Biochem 80: 825-858

He XL, Zhang JZ (2005) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169: 1157-1164

Herbig U, Griffith JW, Fanning E (2000) Mutation of cyclin/cdk phosphorylation sites in HsCdc6 disrupts a late step in initiation of DNA replication in human cells. Mol Biol Cell 11: 4117-4130

Hill JA, Otto SP (2007) The role of pleiotropy in the maintenance of sex in yeast. Genetics 175: 1419-1427

Ho CH, Magtanong L, Barker SL, Gresham D, Nishimura S, Natarajan P, Koh JLY, Porter J, Gray CA, Andersen RJ, Giaever G, Nislow C, Andrews B, Botstein D, Graham TR, Yoshida M, Boone C (2009) A molecular barcoded yeast ORF library enables mode-of-action analysis of bioactive compounds. Nat Biotechnol 27: 369-377

Holmberg CI, Tran SEF, Eriksson JE, Sistonen L (2002) Multisite phosphorylation provides sophisticated regulation of transcription factors. Trends in Biochemical Sciences 27: 619-627

Holt LJ, Tuch BB, Villen J, Johnson AD, Gygi SP, Morgan DO (2009) Global Analysis of Cdk1 Substrate Phosphorylation Sites Provides Insights into Evolution. Science 325: 1682-1686

Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic acids research 40: D261-270

Hsueh KW, Fu SL, Chang CB, Chang YL, Lin CH (2013) A novel Aurora-A-mediated phosphorylation of p53 inhibits its interaction with MDM2. Biochimica et biophysica acta 1834: 508-515

Hunter T (2000) Signaling - 2000 and beyond. Cell 100: 113-127

Hunter T (2007) The age of crosstalk: phosphorylation, ubiquitination, and beyond. Mol Cell 28: 730-738

Hurles M (2004) Gene duplication: the genomic trade in spare parts. PLoS Biology 2: 900-904

Hutti JE, Jarrell ET, Chang JD, Abbott DW, Storz P, Toker A, Cantley LC, Turk BE (2004) A rapid method for determining protein kinase phosphorylation specificity. Nature methods 1: 27-29

Huttlin EL, Jedrychowski MP, Elias JE, Goswami T, Rad R, Beausoleil SA, Villen J, Haas W, Sowa ME, Gygi SP (2010) A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143: 1174-1189

Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic acids research 32: 1037-1049

Ideker T, Krogan NJ (2012) Differential network biology. Mol Syst Biol 8

ImageJ -- imagej.nih.gov/ij/

Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences of the United States of America 98: 4569-4574

Janke C, Magiera MM, Rathfelder N, Taxis C, Reber S, Maekawa H, Moreno-Borchart A, Doenges G, Schwob E, Schiebel E, Knop M (2004) A versatile toolbox for PCR-based tagging of yeast genes: new fluorescent proteins, more markers and promoter substitution cassettes. Yeast 21: 947-962

Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P (2006) Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature 443: 594-597

Jin H, Zangar RC (2009) Protein modifications as potential biomarkers in breast cancer. Biomarker insights 4: 191-200

Johnson KE, Cameron S, Toda T, Wigler M, Zoller MJ (1987) Expression in Escherichia-Coli of Bcy1, the Regulatory Subunit of Cyclic Amp-Dependent Protein-Kinase from Saccharomyces-Cerevisiae - Purification and Characterization. J Biol Chem 262: 8636-8642

Kaganovich M, Snyder M (2012) Phosphorylation of yeast transcription factors correlates with the evolution of novel sequence and function. Journal of Proteome Research 11: 261-268

Kamemura K, Hayes BK, Comer FI, Hart GW (2002) Dynamic interplay between O-glycosylation and O-phosphorylation of nucleocytoplasmic proteins: alternative glycosylation/phosphorylation of THR-58, a known mutational hot spot of c-Myc in lymphomas, is regulated by mitogens. J Biol Chem 277: 19229-19235

Kantarci S, Al-Gazali L, Hill RS, Donnai D, Black GC, Bieth E, Chassaing N, Lacombe D, Devriendt K, Teebi A, Loscertales M, Robson C, Liu T, MacLaughlin DT, Noonan KM, Russell MK, Walsh CA, Donahoe PK, Pober BR (2007) Mutations in LRP2, which encodes the multiligand receptor megalin, cause Donnai-Barrow and facio-oculo-acoustico-renal syndromes. Nature genetics 39: 957-959

Kapoor M, Hamm R, Yan W, Taya Y, Lozano G (2000) Cooperative phosphorylation at multiple sites is required to activate p53 in response to UV radiation. Oncogene 19: 358-364

Kastan MB (2008) DNA damage responses: mechanisms and roles in human disease: 2007 G.H.A. Clowes Memorial Award Lecture. Molecular cancer research : MCR 6: 517-524

Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617-624

Kent WJ (2002) BLAT---The BLAST-Like Alignment Tool. Genome Research 12: 656-664

Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M et al (2009) Human Protein Reference Database--2009 update. Nucleic acids research 37: D767-772

Khmelinskii A, Roostalu J, Roque H, Antony C, Schiebel E (2009) Phosphorylation-Dependent Protein Interactions at the Spindle Midzone Mediate Cell Cycle Regulation of Spindle Elongation. Dev Cell 17: 244-256

Khoury GA, Baliban RC, Floudas CA (2011) Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Scientific reports 1

Kikani CK, Antonysamy SA, Bonanno JB, Romero R, Zhang FF, Russell M, Gheyi T, Iizuka M, Emtage S, Sauder JM, Turk BE, Burley SK, Rutter J (2010) Structural bases of PAS domain-regulated kinase (PASK) activation in the absence of activation loop phosphorylation. The Journal of biological chemistry 285: 41034-41043

Kim DS, Hahn Y (2011) Identification of novel phosphorylation modification sites in human proteins that originated after the human-chimpanzee divergence. Bioinformatics 27: 2494-2501

Kim W, Bennett EJ, Huttlin EL, Guo A, Li J, Possemato A, Sowa ME, Rad R, Rush J, Comb MJ, Harper JW, Gygi SP (2011) Systematic and quantitative assessment of the ubiquitin-modified proteome. Mol Cell 44: 325-340

Koivomagi M, Valk E, Venta R, Iofik A, Lepiku M, Balog ER, Rubin SM, Morgan DO, Loog M (2011) Cascades of multisite phosphorylation control Sic1 destruction at the onset of S phase. Nature 480: 128-131

Kondrashov FA, Koonin EV (2004) A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet 20: 287-290

Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV (2002) Selection in the evolution of gene duplications. Genome biology 3

Kozlov SV, Graham ME, Jakob B, Tobias F, Kijas AW, Tanuji M, Chen P, Robinson PJ, Taucher-Scholz G, Suzuki K, So S, Chen D, Lavin MF (2011) Autophosphorylation and ATM activation: additional sites add to the complexity. J Biol Chem 286: 9107-9119

Krogan NJ, Cagney G, Yu HY, Zhong GQ, Guo XH, Ignatchenko A, Li J, Pu SY, Datta N, Tikuisis AP, Punna T, Peregrin-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440: 637-643

Kurmangaliyev YZ, Goland A, Gelfand MS (2011) Evolutionary patterns of phosphorylated serines. Biol Direct 6

Landry CR, Levy ED, Michnick SW (2009) Weak functional constraints on phosphoproteomes. Trends in Genetics: TIG 25: 193-197

Latham JA, Dent SY (2007) Cross-regulation of histone modifications. Nature structural & molecular biology 14: 1017-1024

Levine AJ, Oren M (2009) The first 30 years of p53: growing ever more complex. Nature reviews Cancer 9: 749-758

Levy E, Michnick S, Landry C (2012) Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philosophical transactions of the Royal Society of London Series B, Biological sciences 367: 2594-2606

Li M, Luo J, Brooks CL, Gu W (2002) Acetylation of p53 inhibits its ubiquitination by Mdm2. J Biol Chem 277: 50607-50611

Li SM, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JDJ, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF et al (2004) A map of the interactome network of the metazoan C-elegans. Science 303: 540-543

Li X, Gerber SA, Rudner AD, Beausoleil SA, Haas W, Villen J, Elias JE, Gygi SP (2007) Large-scale phosphorylation analysis of alpha-factor-arrested Saccharomyces cerevisiae. Journal of Proteome Research 6: 1190-1197

Lienhard GE (2008) Non-functional phosphorylations? Trends in Biochemical Sciences 33: 351-352

Lim WA, Pawson T (2010) Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142: 661-667

Lin DI, Barbash O, Kumar KG, Weber JD, Harper JW, Klein-Szanto AJ, Rustgi A, Fuchs SY, Diehl JA (2006) Phosphorylation-dependent ubiquitination of cyclin D1 by the SCF(FBX4-alphaB crystallin) complex. Mol Cell 24: 355-366

Liu X, Yu X, Zack DJ, Zhu H, Qian J (2008) TiGER: a database for tissue-specific gene expression and regulation. BMC Bioinformatics 9: 271

Livanova NB, Chebotareva NA, Eronina TB, Kurganov BI (2002) Pyridoxal 5'-phosphate as a catalytic and conformational cofactor of muscle glycogen phosphorylase B. Biochemistry Biokhimiia 67: 1089-1098

Longo VD (2003) The Ras and Sch9 pathways regulate stress resistance and longevity. Exp Gerontol 38: 807-811

Lu CT, Huang KY, Su MG, Lee TY, Bretana NA, Chang WC, Chen YJ, Chen YJ, Huang HD (2013) DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res 41: D295-305

Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155

Lynch M, Force A (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics 154: 459-459

Lynch M, Sung W, Morris K, Coffey N, Landry CR, Dopman EB, Dickinson WJ, Okamoto K, Kulkarni S, Hartl DL, Thomas WK (2008) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proceedings of the National Academy of Sciences of the United States of America 105: 9272-9277

Ma PS, Wera S, Van Dijck P, Thevelein JM (1999) The PDE1-encoded low-affinity phosphodiesterase in the yeast Saccharomyces cerevisiae has a specific function in controlling agonist-induced cAMP signaling. Mol Biol Cell 10: 91-104

Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M (2008) Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Molecular & cellular proteomics : MCP 7: 299-307

Madeo F, Schlauer J, Zischka H, Mecke D, Frohlich KU (1998) Tyrosine phosphorylation regulates cell cycle-dependent nuclear localization of Cdc48p. Mol Biol Cell 9: 131-141

Malik R, Nigg EA, Korner R (2008) Comparative conservation analysis of the human mitotic phosphoproteome. Bioinformatics 24: 1426-1432

Mann M, Jensen ON (2003) Proteomic analysis of post-translational modifications. Nat Biotechnol 21: 255-261

Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S (2002) The protein kinase complement of the human genome. Science 298: 1912-1934

Marcantonio M, Trost M, Courcelles M, Desjardins M, Thibault P (2008) Combined enzymatic and data mining approaches for comprehensive phosphoproteome analyses: application to cell signaling events of interferon-gamma-stimulated macrophages. Molecular & Cellular Proteomics: MCP 7: 645-660

Martin DE, Soulard A, Hall MN (2004) TOR regulates ribosomal protein gene expression via PKA and the forkhead transcription factor FHL1. Cell 119: 969-979

Meetei AR, Medhurst AL, Ling C, Xue Y, Singh TR, Bier P, Steltenpool J, Stone S, Dokal I, Mathew CG, Hoatlin M, Joenje H, de Winter JP, Wang W (2005) A human ortholog of archaeal DNA repair protein Hef is defective in Fanconi anemia complementation group M. Nature genetics 37: 958-963

Michnick SW, Ear PH, Manderson EN, Remy I, Stefan E (2007) Universal strategies in research and drug discovery based on protein-fragment complementation assays. Nat Rev Drug Discov 6: 569-582

Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, Olhovsky M, Pasculescu A, Alexander J, Knapp S, Blom N, Bork P, Li S, Cesareni G, Pawson T, Turk BE et al (2008) Linear motif atlas for phosphorylation-dependent signaling. Science signaling 1: ra2

Minguez P, Parca L, Diella F, Mende DR, Kumar R, Helmer-Citterich M, Gavin AC, van Noort V, Bork P (2012) Deciphering a global network of functionally associated post-translational modifications. Mol Syst Biol 8: 599

Miura Y, Sakurai Y, Endo T (2012) O-GlcNAc modification affects the ATM-mediated DNA damage response. Biochimica et biophysica acta 1820: 1678-1685

Mok J, Kim PM, Lam HYK, Piccirillo S, Zhou X, Jeschke GR, Sheridan DL, Parker SA, Desai V, Jwa M, Cameroni E, Niu H, Good M, Remenyi A, Ma J-LN, Sheu Y-J, Sassi HE, Sopko R, Chan CSM, De Virgilio C et al (2010) Deciphering protein kinase specificity through large-scale analysis of yeast phosphorylation site motifs. Science Signaling 3: ra12-ra12

Moll UM, Petrenko O (2003) The MDM2-p53 interaction. Molecular cancer research : MCR 1: 1001-1008

Morell M, Ventura S, Aviles FX (2009) Protein complementation assays: Approaches for the in vivo analysis of protein interactions. Febs Lett 583: 1684-1691

Moses AM, Landry CR (2010) Moving from transcriptional to phospho-evolution: generalizing regulatory evolution? Trends in Genetics: TIG 26: 462-467

Moses AM, Liku ME, Li JJ, Durbin R (2007) Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites. Proceedings of the National Academy of Sciences of the United States of America 104: 17713-17718

Mukherjee S, Keitany G, Li Y, Wang Y, Ball HL, Goldsmith EJ, Orth K (2006) Yersinia YopJ acetylates and inhibits kinase activation by blocking phosphorylation. Science 312: 1211-1214

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis B-J, Boone C, Giaever G, Nislow C, Emili A, Zhang Z (2008) The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Research 18: 1092-1099

Nash P, Tang X, Orlicky S, Chen Q, Gertler FB, Mendenhall MD, Sicheri F, Pawson T, Tyers M (2001) Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication. Nature 414: 514-521

Nguyen Ba AN, Moses AM (2010) Evolution of Characterized Phosphorylation Sites in Budding Yeast. Molecular Biology and Evolution 27: 2027-2037

Nussinov R, Tsai CJ, Xin F, Radivojac P (2012) Allosteric post-translational modification codes. Trends in biochemical sciences 37: 447-455

Ohno S (1970) Evolution by gene duplication. London, New York: Allen & Unwin; Springer-Verlag.

Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127: 635-648

Olsen JV, Mann M (2013) Status of large-scale analysis of post-translational modifications by mass spectrometry. Mol Cell Proteomics 12: 3444-3452

Papp B, Pál C, Hurst LD (2003) Evolution of cis-regulatory elements in duplicated genes of yeast. Trends in Genetics: TIG 19: 417-422

Pearlman SM, Serber Z, Ferrell JE (2011) A Mechanism for the Evolution of Phosphorylation Sites. Cell 147: 934-946

Pike AC, Rellos P, Niesen FH, Turnbull A, Oliver AW, Parker SA, Turk BE, Pearl LH, Knapp S (2008) Activation segment dimerization: a mechanism for kinase autophosphorylation of non-consensus sites. Embo J 27: 704-714

Pincus D, Letunic I, Bork P, Lim WA (2008) Evolution of the phospho-tyrosine signaling machinery in premetazoan lineages. Proc Natl Acad Sci U S A 105: 9680-9684

Portela P, Van Dijck P, Thevelein JM, Moreno S (2003) Activation state of protein kinase A as measured in permeabilised Saccharomyces cerevisiae cells correlates with PKA-controlled phenotypes in vivo. Fems Yeast Res 3: 119-126

Prabakaran S, Lippens G, Steen H, Gunawardena J (2012) Post-translational modification: nature's escape from genetic imprisonment and the basis for dynamic information encoding. Wiley interdisciplinary reviews Systems biology and medicine 4: 565-583

Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, McCartney RR, Schmidt MC, Rachidi N, Lee S-J, Mah AS, Meng L, Stark MJR, Stern DF, De Virgilio C, Tyers M et al (2005) Global analysis of protein phosphorylation in yeast. Nature 438: 679-684

Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18 Suppl 1: S71-77

Ramachandran V, Shah KH, Herman PK (2011) The cAMP-Dependent Protein Kinase Signaling Pathway Is a Key Regulator of P Body Foci Formation. Mol Cell 43: 973-981

Reinders J, Wagner K, Zahedi RP, Stojanovski D, Eyrich B, van der Laan M, Rehling P, Sickmann A, Pfanner N, Meisinger C (2007) Profiling phosphoproteins of yeast mitochondria reveals a role of phosphorylation in assembly of the ATP synthase. Mol Cell Proteomics 6: 1896-1906

Rennefahrt UE, Deacon SW, Parker SA, Devarajan K, Beeser A, Chernoff J, Knapp S, Turk BE, Peterson JR (2007) Specificity profiling of Pak kinases allows identification of novel phosphorylation sites. The Journal of biological chemistry 282: 15667-15678

Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, Strauss SH, Brunner AM, DiFazio SP (2012) Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. Genome research 22: 95-105

Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li SM et al (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature 437: 1173-1178

Ruan HB, Singh JP, Li MD, Wu J, Yang X (2013) Cracking the O-GlcNAc code in metabolism. Trends in endocrinology and metabolism: TEM 24: 301-309

Scannell DR, Wolfe KH (2008) A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome research 18: 137-147

Schlecht U, Miranda M, Suresh S, Davis RW, St Onge RP (2012) Multiplex assay for condition-dependent changes in protein-protein interactions. Proceedings of the National Academy of Sciences of the United States of America

Schwammle V, Aspalter CM, Sidoli S, Jensen ON (2014) Large scale analysis of co-existing post-translational modifications in histone tails reveals global fine structure of cross-talk. Molecular & cellular proteomics : MCP 13: 1855-1865

Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Searle JS, Wood MD, Kaur M, Tobin DV, Sanchez Y (2011) Proteins in the Nutrient-Sensing and DNA Damage Checkpoint Pathways Cooperate to Restrain Mitotic Progression following DNA Damage. PLoS Genetics 7

Seo J, Lee KJ (2004) Post-translational modifications and their biological functions: Proteomic analysis and systematic approaches. J Biochem Mol Biol 37: 35-44

Seoighe C, Wolfe KH (1999) Yeast genome evolution in the post-genome era. Current opinion in microbiology 2: 548-554

Serber Z, Ferrell JE (2007) Tuning bulk electrostatics to regulate protein function. Cell 128: 441-444

Sharma K, D'Souza RC, Tyanova S, Schaab C, Wisniewski JR, Cox J, Mann M (2014) Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell reports 8: 1583-1594

Sheridan DL, Kong Y, Parker SA, Dalby KN, Turk BE (2008) Substrate discrimination among mitogen-activated protein kinases through distinct docking sequence motifs. J Biol Chem 283: 19511-19520

Skou JC (1965) Enzymatic Basis for Active Transport of Na+ and K+ across Cell Membrane. Physiol Rev 45: 596-&

Souciet J-L, Dujon B, Gaillardin C, Johnston M, Baret PV, Cliften P, Sherman DJ, Weissenbach J, Westhof E, Wincker P, Jubin C, Poulain J, Barbe V, Ségurens B, Artiguenave F, Anthouard V, Vacherie B, Val M-E, Fulton RS, Minx P et al (2009) Comparative genomics of protoploid Saccharomycetaceae. Genome Research 19: 1696-1709

Soulard A, Cremonesi A, Moes S, Schutz F, Jeno P, Hall MN (2010) The Rapamycin-sensitive Phosphoproteome Reveals That TOR Controls Protein Kinase A Toward Some But Not All Substrates. Mol Biol Cell 21: 3475-3486

Sprang SR, Acharya KR, Goldsmith EJ, Stuart DI, Varvill K, Fletterick RJ, Madsen NB, Johnson LN (1988) Structural changes in glycogen phosphorylase induced by phosphorylation. Nature 336: 215-221

Stefan E, Aquin S, Berger N, Landry CR, Nyfeler B, Bouvier M, Michnick SW (2007) Quantification of dynamic protein complexes using Renilla luciferase fragment complementation applied to protein kinase A activities in vivo. Proceedings of the National Academy of Sciences of the United States of America 104: 16916-16921

Tan CS, Bodenmiller B, Pasculescu A, Jovanovic M, Hengartner MO, Jorgensen C, Bader GD, Aebersold R, Pawson T, Linding R (2009) Comparative analysis reveals conserved protein phosphorylation networks implicated in multiple diseases. Science signaling 2: ra39

Tarassov K, Messier V, Landry CR, Radinovic S, Molina MMS, Shames I, Malitskaya Y, Vogel J, Bussey H, Michnick SW (2008) An in vivo map of the yeast protein interactome. Science 320: 1465-1470

Tarrant MK, Cole PA (2009) The Chemical Biology of Protein Phosphorylation. Annu Rev Biochem 78: 797-825

Taylor SS, Radzio-Andzelm E, Hunter T (1995) Protein-Kinases 8. How Do Protein-Kinases Discriminate between Serine/Threonine and Tyrosine - Structural Insights from the Insulin-Receptor Protein-Tyrosine Kinase. FASEB J 9: 1255-1266

The R project for Statistical Computing -- www.r-project.org/

Thingholm TE, Jørgensen TJD, Jensen ON, Larsen MR (2006) Highly selective enrichment of phosphorylated peptides using titanium dioxide. Nature Protocols 1: 1929-1935

Tirosh I, Barkai N (2007) Comparative analysis indicates regulatory neofunctionalization of yeast duplicates. Genome Biology 8: R50

Toda T, Cameron S, Sass P, Zoller M, Wigler M (1987) Three different genes in S. cerevisiae encode the catalytic subunits of the cAMP-dependent protein kinase. Cell 50: 277-287

Treco DA, Winston F (2008) Growth and manipulation of yeast. Current protocols in molecular biology Chapter 13: Unit 13.12

Trinidad JC, Barkan DT, Gulledge BF, Thalhammer A, Sali A, Schoepfer R, Burlingame AL (2012) Global identification and characterization of both O-GlcNAcylation and phosphorylation at the murine synapse. Mol Cell Proteomics 11: 215-229

Ubersax JA, Ferrell JE, Jr. (2007) Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol 8: 530-541

Ubersax JA, Woodbury EL, Quang PN, Paraz M, Blethrow JD, Shah K, Shokat KM, Morgan DO (2003) Targets of the cyclin-dependent kinase Cdk1. Nature 425: 859-864

Uckun FM, Ma H, Zhang J, Ozer Z, Dovat S, Mao C, Ishkhanian R, Goodman P, Qazi S (2012) Serine phosphorylation by SYK is critical for nuclear localization and transcription factor function of Ikaros. Proc Natl Acad Sci U S A 109: 18072-18077

van Hoof A (2005) Conserved Functions of Yeast Genes Support the Duplication, Degeneration and Complementation Model for Gene Duplication. Genetics 171: 1455-1461

Vazquez F, Ramaswamy S, Nakamura N, Sellers WR (2000) Phosphorylation of the PTEN tail regulates protein stability and function. Mol Cell Biol 20: 5010-5018

Verma R, Annan RS, Huddleston MJ, Carr SA, Reynard G, Deshaies RJ (1997) Phosphorylation of Sic1p by G1 Cdk required for its degradation and entry into S phase. Science 278: 455-460

Vidal M, Cusick ME, Barabasi AL (2011) Interactome networks and human disease. Cell 144: 986-998

Wagner A (2001a) Birth and death of duplicated genes in completely sequenced eukaryotes. Trends Genet 17: 237-239

Wagner A (2001b) The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Molecular Biology and Evolution 18: 1283-1292

Wang J, Torii M, Liu H, Hart GW, Hu ZZ (2011) dbOGAP - an integrated bioinformatics resource for protein O-GlcNAcylation. BMC Bioinformatics 12: 91

Wang M, Weiss M, Simonovic M, Haertinger G, Schrimpf SP, Hengartner MO, von Mering C (2012) PaxDb, a Database of Protein Abundance Averages Across All Three Domains of Life. Mol Cell Proteomics 11: 492-500

Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT (2004a) The DISOPRED server for the prediction of protein disorder. Bioinformatics 20: 2138-2139

Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT (2004b) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. Journal of molecular biology 337: 635-645

Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, Chu AM, Connelly C, Davis K, Dietrich F, Dow SW, El Bakkoury M, Foury F, Friend SH, Gentalen E, Giaever G et al (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285: 901-906

Wolfe KH, Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387: 708-713

Wong A, Zhang YW, Jeschke GR, Turk BE, Rudnick G (2012) Cyclic GMP-dependent Stimulation of Serotonin Transport Does Not Involve Direct Transporter Phosphorylation by cGMP-dependent Protein Kinase. J Biol Chem 287: 36051-36058

Yang ZH (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Molecular biology and evolution 24: 1586-1591

Zeidan Q, Hart GW (2010) The intersections between O-GlcNAcylation and phosphorylation: implications for multiple signaling pathways. Journal of cell science 123: 13-22

Zhang JZ (2003) Evolution by gene duplication: an update. Trends Ecol Evol 18: 292-298

Zhang P, Smith-Nguyen EV, Keshwani MM, Deal MS, Kornev AP, Taylor SS (2012) Structure and Allostery of the PKA RII beta Tetrameric Holoenzyme. Science 335: 712-716

Zhao Y, Jensen ON (2009) Modification-specific proteomics: strategies for characterization of post-translational modifications using enrichment techniques. Proteomics 9: 4632-4641

Zhu H, Klemic JF, Chang S, Bertone P, Casamayor A, Klemic KG, Smith D, Gerstein M, Reed MA, Snyder M (2000) Analysis of yeast protein kinases using protein chips. Nat Genet 26: 283-289

Zielinska DF, Gnad F, Jedrusik-Bode M, Wisniewski JR, Mann M (2009) Caenorhabditis elegans has a phosphoproteome atypical for metazoans that is enriched in developmental and sex determination proteins. J Proteome Res 8: 4039-4049

Zielinska DF, Gnad F, Wisniewski JR, Mann M (2010) Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints. Cell 141: 897-907
