Post-translational modifications regulatory networks: evolution, mechanisms and implications
Thesis
Luca Freschi
Doctorate in Biology, Philosophiae Doctor (Ph.D.)
Québec, Canada
© Luca Freschi, 2015
Résumé
Post-translational modifications (PTMs) are chemical modifications of proteins that allow
the cell to finely regulate its functions and to encode and integrate environmental signals.
Recent progress in experimental and bioinformatic techniques has allowed us to determine
PTM profiles for entire proteomes and to identify the molecules responsible for "writing"
or "erasing" these PTMs. With these data, it has become possible to begin defining cellular
PTM regulatory networks. Here, we studied the evolution of these networks to better
understand how they may help explain the complexity and diversity of organisms, and to
better understand their mechanisms of action. First, we addressed the question of how PTM
regulatory networks can be rewired after a gene duplication event, by studying how the
phosphoregulatory network of budding yeast was rewired after a whole-genome duplication
event that took place 100 million years ago. Our results highlight the role of gene
duplication as a key mechanism for the innovation and complexification of PTM regulatory
networks. Next, we addressed the question of how PTMs may contribute to organismal
diversity by comparing the phosphorylation profiles of human and mouse. We found
substantial differences in the PTM profiles of these two species that have the potential to
explain, at least in part, the phenotypic differences observed between them. We also found
evidence supporting the idea that PTMs can "jump" to new positions and still regulate the
same biological functions. This phenomenon must be taken into account when comparing
PTM profiles from different species, to avoid overestimating the divergence caused by PTM
regulation. Finally, we investigated how several alternative PTMs for the same residue can
interact to regulate cellular functions. We examined two of the best-known PTMs,
phosphorylation and O-GlcNAcylation, which modify serines and threonines, and we
studied the potential mechanisms of interaction between these two PTMs. Our results
support the hypothesis that these two PTMs control several biological functions rather than
a single one. Overall, the results presented in this thesis help elucidate the evolutionary
dynamics, the functional mechanisms and the biological implications of PTMs.
Abstract
Post-translational modifications (PTMs) are chemical modifications of proteins that allow
the cell to finely tune its functions as well as to encode and integrate environmental signals.
Recent advances in experimental and bioinformatic techniques have allowed us
to determine the PTM profiles of entire proteomes as well as to identify the molecules that
write or erase PTMs to/from each protein. These data have made it possible to define cellular
PTM regulatory networks. Here, we study the evolution of these networks to gain new
insights into how they may contribute to increasing organismal complexity and diversity
and to better understand their molecular mechanisms. We first address the
question of how and to what extent a PTM network can be rewired after a gene
duplication event, by studying how the budding yeast phosphoregulatory network was
rewired after a whole genome duplication event that occurred 100 million years ago. Our
results highlight the role of gene duplication as a key mechanism to innovate and
complexify PTM regulatory networks. Then, we address the question of how PTM
networks may contribute to organismal diversity by comparing the human and mouse
phosphorylation profiles. We find that there are substantial differences in the PTM profiles
of these two species that have the potential to explain, at least in part, the phenotypic
differences observed between them. Moreover, we find evidence supporting the idea that
PTMs can jump to new positions during evolution and still regulate the same biological
functions. This phenomenon should be taken into account when comparing the PTM
profiles of different species, in order to avoid overestimating the divergence in PTM
regulation. Finally, we investigate how multiple alternative PTMs that affect the same
residues interact with each other to control protein functions. We focus on two of the most
studied PTMs, protein phosphorylation and O-GlcNAcylation, which affect serine and
threonine residues, and we study their potential mechanisms of interaction in human and
mouse. Our results support the hypothesis that these two PTMs control multiple biological
functions rather than a single one. Overall, this work provides new findings that elucidate
the evolutionary dynamics, the functional mechanisms and the biological implications of
PTMs.
Table of Contents
RÉSUMÉ
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
FOREWORD
CHAPTER 1 - INTRODUCTION
1.1 - POST-TRANSLATIONAL MODIFICATIONS
1.2 - HOW PTMS REGULATE PROTEIN FUNCTIONS
1.3 - PTM REGULATORY NETWORKS
1.4 - CROSS-TALK BETWEEN PTMS
1.5 - PTM NETWORKS AND THE EVOLUTION OF BIOLOGICAL COMPLEXITY AND DIVERSITY
1.6 - THE TECHNOLOGICAL ADVANCEMENTS OF THE LAST DECADE MAKE POSSIBLE THE STUDY OF PTM NETWORKS
1.7 - AIMS OF THIS THESIS
CHAPTER 2 - PHOSPHORYLATION NETWORK REWIRING BY GENE DUPLICATION
2.1 - RÉSUMÉ
2.2 - ABSTRACT
2.3 - INTRODUCTION
2.4 - MATERIALS AND METHODS
2.5 - RESULTS AND DISCUSSION
2.5.1 - Paralogous phosphoproteins substantially diverged after WGD
2.5.2 - Conservation and compensation of phosphosite loss by site-position turnover
2.5.3 - Life after WGD: rewiring the cellular regulatory networks
2.5.4 - Phosphosite loss dominates site-divergence
2.6 - CONCLUSION
2.7 - ACKNOWLEDGEMENTS
CHAPTER 3 - WHERE DO PHOSPHOSITES COME FROM AND WHERE DO THEY GO AFTER GENE DUPLICATION?
3.1 - RÉSUMÉ
3.2 - ABSTRACT
3.3 - INTRODUCTION
3.4 - METHODS
3.5 - RESULTS
3.6 - CONCLUSION
3.7 - ACKNOWLEDGEMENTS
CHAPTER 4 - FUNCTIONAL DIVERGENCE AND EVOLUTIONARY TURNOVER IN MAMMALIAN PHOSPHOPROTEOMES
4.1 - RÉSUMÉ
4.2 - ABSTRACT
4.3 - INTRODUCTION
4.4 - METHODS
4.5 - RESULTS
4.5.1 - Conservation and divergence between human and mouse phosphoproteomes
4.5.2 - A role for state-diverged sites in phosphoproteome divergence
4.5.3 - Evolutionary turnover of mammalian phosphorylation sites
4.6 - CONCLUSION
4.7 - ACKNOWLEDGEMENTS
CHAPTER 5 - CROSS-TALK BETWEEN O-GLCNACYLATION AND PHOSPHORYLATION IN MAMMALIAN PROTEOMES
5.1 - RÉSUMÉ
5.2 - ABSTRACT
5.3 - INTRODUCTION
5.4 - METHODS
5.5 - RESULTS
5.5.1 - An extensive dataset of phosphorylation and O-GlcNAcylated sites
5.5.2 - Phosphorylation and O-GlcNAcylation are found in the same residues more than expected by chance alone
5.5.3 - Clues of independent regulation of multiple functions in humans but not in mouse
5.5.4 - Three state sites and 2-state ones have different preferences for protein kinases
5.6 - CONCLUSION
CHAPTER 6 - GENERAL CONCLUSIONS
6.1 - SUMMARY OF THE STUDY
6.2 - PERSPECTIVES
ANNEX 1 - SUPPLEMENTARY INFORMATION FOR CHAPTER 2
ANNEX 2 - SUPPLEMENTARY INFORMATION FOR CHAPTER 4
ANNEX 3 - SUPPLEMENTARY INFORMATION FOR CHAPTER 5
ANNEX 4 - QPCA: A SCALABLE ASSAY TO MEASURE THE PERTURBATION OF PROTEIN-PROTEIN INTERACTIONS IN LIVING CELLS
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
RESULTS AND DISCUSSION
The DHFR-qPCA signal reflects the amount of protein complex formed in the cell
DHFR-qPCA allows to study the effect of metabolic, drug and genetic perturbations on protein complexes
Conclusions
ACKNOWLEDGEMENTS
REFERENCES
List of Figures
FIGURE 2.1. CONSERVATION AND DIVERGENCE OF PHOSPHOREGULATION AMONG WGD PARALOGS.
FIGURE 2.2. GAINS AND LOSSES OF PHOSPHOSITES AFTER GENE DUPLICATION.
FIGURE 2.3. L. KLUYVERI PHOSPHOPROTEOMICS CONFIRMS THAT PHOSPHOSITES ARE PREFERENTIALLY LOST IN PARALOGOUS PHOSPHOPROTEINS.
FIGURE 3.1. ALGORITHM USED TO CALCULATE AND COMPARE THE PROPORTIONS OF TRANSITIONS BETWEEN PHOSPHORYLATED AND PHOSPHOMIMETIC RESIDUES RELATIVE TO CONTROL SITES.
FIGURE 3.2. PHOSPHOSITES THAT ARE DIFFERENTIALLY LOST IN PARALOGOUS PHOSPHOPROTEINS EVOLVE TOWARD NEGATIVELY CHARGED RESIDUES.
FIGURE 3.3. DETAILED ANALYSIS OF THE PATTERNS OF EVOLUTION OF PSER AND PTHR SITES.
FIGURE 3.4. TRANSITIONS BETWEEN PHOSPHORYLATABLE AND PHOSPHOMIMETIC AMINO ACIDS NEED TO GO THROUGH A NON-NEGATIVELY CHARGED INTERMEDIATE.
FIGURE 3.5. A DUPLICATION EVENT COULD PROVIDE THE CONDITIONS FOR THE INTERMEDIATE NON-FUNCTIONAL SITE TO BE NEUTRAL, WHICH WOULD ALLOW A TRANSITION WITHOUT AFFECTING THE FITNESS OF THE ORGANISM.
FIGURE 4.1. PURIFYING SELECTION IS ACTING ON MAMMALIAN PHOSPHORYLATION SITES AND THEIR PHOSPHORYLATION STATUS.
FIGURE 4.2. ANALYSIS OF NETPHOREST SCORES FOR THE DIFFERENT CLASSES OF SITES.
FIGURE 4.3. COMPARISON OF A PAIR OF STC AND STD SITES.
FIGURE 4.4. PROPORTION OF SITES THAT ARE PHOSPHORYLATED BY THE SAME PROTEIN KINASE.
FIGURE 4.5. EVOLUTIONARY HISTORIES OF CANDIDATE FUNCTIONALLY REDUNDANT SITE PAIRS.
FIGURE 5.1. NUMBER OF 3-STATE SITES IN HUMAN AND MOUSE AND COMPARISONS TO RANDOM EXPECTATIONS.
FIGURE 5.2. FRACTION OF SITES AS A FUNCTION OF PROTEIN ABUNDANCE FOR HUMAN AND MOUSE O-GLCNACYLATION SITES AND COMPARISON OF AVERAGE PROTEIN ABUNDANCE BETWEEN ALL PROTEINS AND PROTEINS THAT CONTAIN 3-STATE SITES FOR HUMANS AND MOUSE.
FIGURE 5.3. COMPARISON OF RESIDUE CONSERVATION FOR 1-STATE, 2-STATE AND 3-STATE RESIDUES IN THE HUMAN AND MOUSE PROTEOMES.
FIGURE 5.4. COMPARISON OF THE EVOLUTIONARY CONSERVATION OF THE REGIONS SURROUNDING 1-STATE, 2-STATE AND 3-STATE RESIDUES (+/- 5 AMINO ACIDS) FOR THE HUMAN PROTEOME.
FIGURE 5.5. KINASE PREFERENCES OF 3-STATE RESIDUES FOR HUMAN AND MOUSE PROTEINS.
List of Abbreviations
cSer: control Serine
cThr: control Threonine
DHFR: Dihydrofolate reductase
FN: False negative
FP: False Positive
My: Million years
PCA: Protein Complementation Assay
PTM: Post-Translational Modification
PWM: Position Weight Matrix
StC: State-Conserved
StD: State-Diverged
SiD: Site-Diverged
WGD: Whole Genome Duplication
For Marcello and Elodia
Acknowledgements
My first thought goes to my advisor, Christian Landry. He has always done everything possible (and, sometimes, even the impossible) to make me a better scientist and a better man.
I would also like to thank the members of my thesis and PhD committees: Prof. Pedro Beltrao, Prof. Yves Bourbonnais, Prof. Nicolas Derome and Prof. Sabine Elowe. Thanks to their suggestions, the quality of my PhD and of this thesis has improved a lot.
I cannot forget to mention here all the members of the Landry and Aubin-Horth labs: François-Christophe Marois-Blanchet, Guillaume Diss, José-Francisco Torres-Quiroz, Isabelle Gagnon-Arsenault, Jean-Baptiste Leducq, Alexandre Dubé, Guillaume Charron, Andrée-Ève Chrétien, Francis Rousseau-Brochu, Marie Filteau, Samuel Rochette, Lou Nielly-Thibault, Mélissa Giroux, Jukka-Pekka Verta, Martha Nigg, Nadia Aubin-Horth, François-Olivier Gagnon-Hébert, Sergio Cortez-Ghio, Jennyfer Lacasse, Carole Di Poi and Lucie Grecias. They have been my family in these 5 years in Québec.
Many thanks to my lovely wife, Maryam. She has always been at my side during these months, in the good and in the tough moments.
I would also like to thank my parents, Elodia and Marcello, for all the efforts they have made during all these years to allow me to follow my dreams. What I have accomplished up to now and what I will accomplish in the future is not only my success, but also theirs.
Finally, I would like to say a few words of thanks to a special person who unfortunately has not been able to share with me the moments and the feelings of the PhD defence: my beloved grandmother Sara. Without her advice I would not be where I am now.
Foreword
This thesis is organized in 6 chapters including a general introduction and a general conclusion. Chapters 2, 3 and 4 have already been published as independent scientific articles. Chapter 5 will be submitted for publication to a scientific journal. Annex 4 includes a further paper whose subject is not directly connected with the main theme of this thesis.
Chapter 2 has been published as: Freschi L., Courcelles M., Thibault P., Michnick S.W. and Landry C.R (2011) Phosphorylation network rewiring by gene duplication, Molecular Systems Biology. 7:504
Chapter 3 has been published as: Diss G., Freschi L., Landry C.R (2012) Where do phosphosites come from and where do they go after gene duplication? International Journal of Evolutionary Biology - special issue: Molecular Evolutionary Routes that Lead to Innovations 2012: 843167
Chapter 4 has been published as: Freschi L., Osseni M., Landry C.R (2014) Functional Divergence and Evolutionary Turnover in Mammalian Phosphoproteomes, PLoS genetics 10 (1), e1004062
Annex 4 has been published as: Freschi L., Torres-Quiroz F., Dubé A.K., Landry C. R (2013) qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells, Molecular Biosystems. 9(1):36-43
The analysis of the results and the writing of the articles have been performed by L. Freschi, under the direction of C. R Landry.
For Chapter 2, C. Landry, M. Courcelles and P. Thibault performed the phosphoproteomics experiments. S. W. Michnick contributed reagents, tools and guidance on the phosphoproteomics experiments.
For Chapter 3, G. Diss participated in the analysis of the results and the writing of the manuscript and is therefore co-first author of the article.
For Chapter 4, M. Osseni contributed to building the data set used in the study.
Chapter 1 - Introduction
1.1 - Post-translational modifications
In cells the blueprint for all cellular functions is stored in the DNA. However, the actual
effectors of the cellular functions are a myriad of different molecules, among which
proteins have a prominent role. The information flow follows a simple rule: the
information stored in the DNA is transcribed into RNA, an intermediate messenger
molecule, and then translated into proteins by ribosomes (Crick, 1970). All these steps are
tightly regulated so that each protein is expressed in the right place at the right time. For
instance, transcription factors that bind to the regulatory regions of genes contribute to
setting the expression level of genes (Brewster et al, 2014). Different RNA sequences have
different stabilities and are translated at different rates by ribosomes, and this affects both
protein abundance and protein folding (Schwanhausser et al, 2011). Finally, after
translation, proteins can be further modified by the addition of chemical groups
(Prabakaran et al, 2012). These additions are called post-translational modifications or
PTMs. To date, more than 300 different PTMs have been reported in the literature,
affecting more than 300,000 protein residues in prokaryotic and eukaryotic
proteomes (Khoury et al, 2011). The attached chemical groups vary in size, ranging
from small groups such as methyl to entire proteins such as ubiquitin. Moreover, most
PTMs are reversible, meaning that a protein can switch between different states
(modified/non-modified) over time or as internal conditions change (Olsen et al, 2006). Notable
examples of PTMs include phosphorylation, the addition of a phosphate group to serine,
threonine and tyrosine residues; glycosylation, the addition of sugar moieties to several
types of amino acid residues; and ubiquitylation, the addition of ubiquitin to lysine residues. PTMs
are an important mechanism through which the cell fine-tunes its functions,
and we now examine this aspect in detail.
1.2 - How PTMs regulate protein functions
PTMs can modify the properties of a protein in different ways: (i) they can activate or
deactivate one or more functions by inducing a conformational change of the protein
(Sprang et al, 1988), (ii) they can enable protein-protein interactions by changing the bulk
charge at the interaction surface of the protein (Khmelinskii et al, 2009), (iii) they can
contribute to determining the stability (Vazquez et al, 2000) and the half-life (Koivomagi et
al, 2011) of the protein and (iv) they can determine the localization of the protein (Madeo et
al, 1998). A striking example of the first of these scenarios is
glycogen phosphorylase, an enzyme involved in glycogen metabolism (Livanova et al,
2002). This protein is present in the cell in two forms, named a and b. The b form of the
enzyme has low catalytic activity while the a form has high activity. The
transition from the b form to the a form is made possible by a phosphorylation event on a
specific residue (Ser-14). This molecular event ultimately induces a conformational
change that increases the enzymatic activity of the protein, which
breaks down glycogen chains into glucose 1-phosphate molecules that become available for cellular
catabolic processes. An example of how PTMs regulate protein-protein interactions is
represented by the proteins p53 and MDM2. p53 is a tumor suppressor protein that plays
important roles in angiogenesis, genomic stability and apoptosis (Levine & Oren, 2009).
p53 interacts with MDM2, a p53 specific E3 ubiquitin ligase, to form the p53-MDM2
complex (Moll & Petrenko, 2003) that prevents p53 activation in unstressed cells. The
phosphorylation of the Ser-106 residue of p53 under stress conditions inhibits the
interaction between these two proteins (Hsueh et al, 2013) ultimately contributing to p53
activation. PTMs can also affect the half-life of a protein. The attachment of several
ubiquitin units to lysine residues is a general mechanism by which the cell targets proteins for
proteasome-mediated degradation. An example of the importance of this mechanism is
provided by cell cycle regulation. Indeed, progression through the cell cycle is made possible
by the expression and subsequent ubiquitin-mediated proteolysis of cyclins and CDK
inhibitors (Glotzer et al, 1991). Finally, PTMs can also affect the localization of proteins.
For instance, the phosphorylation of two key serine residues (Ser-358 and Ser-361) of the
Ikaros protein, a hematopoietic-specific transcription factor, determines the re-localization
of this protein to the nucleus, where it can promote the transcription of the genes involved in
lymphocyte differentiation (Uckun et al, 2012). The different mechanisms described above
show how PTMs can change the properties of proteins in many different ways, thus
representing a versatile mechanism to tune and regulate protein functions.
1.3 - PTM regulatory networks
One of the most important characteristics of PTMs is that, in general, they are reversible
modifications of proteins. This means that the cell must possess molecular mechanisms to
add and remove PTMs in order to regulate protein function. Further, these
mechanisms need to be specific, since different PTMs (e.g. phosphorylation and
ubiquitylation) occur at different residue types (phosphorylation occurs on serine, threonine
and tyrosine residues while ubiquitylation occurs on lysine residues). In recent decades we
have started to unravel the molecular details of these mechanisms for many PTMs and to
understand the principles behind them, the best-known example being protein
phosphorylation. For this PTM we now know that the phosphate groups are added by a
specific set of proteins called protein kinases and are removed by another specific set of
proteins called protein phosphatases. The human genome encodes about 500 protein
kinases (corresponding to about 2% of all human genes) and 200 phosphatases (Manning et
al, 2002). More generally, for each PTM type (e.g. phosphorylation, acetylation, ubiquitylation,
etc.) there is a set of proteins, called writers, that can add the PTM to other proteins and
another set, called erasers, that can remove the PTM (Lim & Pawson, 2010). In order to
understand how the cell regulates its functions through PTMs we need to consider and
study the entire network composed of all PTMs, writers and erasers, also called the PTM
regulatory network. At the moment we are still far from having a complete understanding
of the cellular PTM regulatory network, and even the most recent studies mostly focus on a
single PTM network or a few of them (Hunter, 2007).
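As a rough illustration of the writer/eraser picture described above, a PTM regulatory network can be represented as a set of labelled links between enzymes and the residues they modify. The following Python sketch is only one convenient representation under stated assumptions, not the data model used in this thesis; the names `Site`, `PTMNetwork`, `reversible_sites`, and the enzyme and protein names in the usage lines are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Site:
    """A modifiable residue, e.g. Site("ProteinA", "S", 14)."""
    protein: str
    residue: str
    position: int


@dataclass
class PTMNetwork:
    """Toy PTM regulatory network linking enzymes to the sites they modify."""
    # (site, ptm_type) -> set of enzymes that add the PTM (writers)
    writers: dict = field(default_factory=lambda: defaultdict(set))
    # (site, ptm_type) -> set of enzymes that remove the PTM (erasers)
    erasers: dict = field(default_factory=lambda: defaultdict(set))

    def add_writer(self, enzyme, site, ptm_type):
        self.writers[(site, ptm_type)].add(enzyme)

    def add_eraser(self, enzyme, site, ptm_type):
        self.erasers[(site, ptm_type)].add(enzyme)

    def reversible_sites(self, ptm_type):
        """Sites with at least one known writer and one known eraser."""
        return sorted(
            (site for (site, ptm) in self.writers
             if ptm == ptm_type and self.erasers.get((site, ptm))),
            key=lambda s: (s.protein, s.position),
        )


# Usage with hypothetical enzymes and sites (illustrative only).
net = PTMNetwork()
s14 = Site("ProteinA", "S", 14)
t50 = Site("ProteinA", "T", 50)
net.add_writer("KinaseX", s14, "phosphorylation")
net.add_eraser("PhosphataseY", s14, "phosphorylation")
net.add_writer("KinaseX", t50, "phosphorylation")  # no eraser known for this site
print(net.reversible_sites("phosphorylation"))
```

Keying both maps by (site, PTM type) keeps the networks of different PTMs separable while still allowing them to share residues, which is the situation that matters when several PTM networks are studied together.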
1.4 - Cross-talk between PTMs
An important aspect that characterizes PTM regulatory networks is that they are not
independent of each other. Indeed, several studies reviewed in (Hunter, 2007) have
revealed that the presence of one PTM at one residue can interfere with the addition of
other PTMs at the same residue or at adjacent residues. This interference is often referred to as
cross-talk. Two types of cross-talk have been described in the literature: positive cross-talk
and negative cross-talk (Hunter, 2007). The term positive cross-talk refers to a situation in
which the presence of one PTM at one residue favours the addition or the removal of
another PTM. The term negative cross-talk, instead, describes a situation in which one
residue can potentially undergo two or more PTMs, so there is direct competition between
the writers of the different PTM networks to modify that residue. An example of positive
cross-talk between PTMs is represented by the phosphorylation-dependent ubiquitylation of
cyclin D1 (Lin et al, 2006). Cyclin D1 is a regulator of the CDK4/6 kinases. The cyclin
D1/CDK complexes trigger the G1/S transition through the cell cycle. However, during the
S-phase cyclin D1 has to be degraded. This event is primed by the phosphorylation of Thr-
286, which promotes the subsequent ubiquitylation of cyclin D1 by an E3 ligase, targeting
the protein for degradation. An example of the second type of cross-talk, negative cross-
talk, is provided by the protein p53. This tumor suppressor is stabilized by the
acetylation of multiple specific lysine residues at the C-terminus (Lys-370, Lys-372, Lys-
373, Lys-381, Lys-382) (Li et al, 2002). The acetylation of these residues impedes their
ubiquitylation by MDM2, thus contributing to stabilizing p53 and increasing its half-life. These
examples show that PTM regulatory networks are interconnected and that in order to
understand how the whole cellular PTM regulatory network works, we need to take into
account and study the cross-talk between PTMs.
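The notion of negative cross-talk suggests a simple computational check: given a list of annotated PTM sites, find the residues that carry two or more distinct PTM types, since these are the positions where writers of different PTM networks could compete. The Python sketch below is a minimal illustration under assumptions; the function name `competing_sites` and the tuple layout are hypothetical, and the example records are a hand-written illustration based on the p53 and cyclin D1 cases mentioned in the text, not a real dataset.

```python
from collections import defaultdict


def competing_sites(annotations):
    """Return residues annotated with two or more distinct PTM types.

    `annotations` is an iterable of (protein, position, ptm_type) tuples.
    Residues returned here are only candidates for negative cross-talk:
    sharing a residue does not by itself prove competition.
    """
    ptms_by_site = defaultdict(set)
    for protein, position, ptm_type in annotations:
        ptms_by_site[(protein, position)].add(ptm_type)
    return {
        site: sorted(ptms)
        for site, ptms in ptms_by_site.items()
        if len(ptms) >= 2
    }


# Illustrative records (assumed layout), based on examples from the text.
example = [
    ("p53", 370, "acetylation"),
    ("p53", 370, "ubiquitylation"),
    ("p53", 382, "acetylation"),
    ("p53", 382, "ubiquitylation"),
    ("cyclin D1", 286, "phosphorylation"),
]
shared = competing_sites(example)
print(shared)
```

In this toy input the two p53 lysines are reported because each carries both acetylation and ubiquitylation, while the cyclin D1 threonine, annotated with a single PTM, is not.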
1.5 - PTM networks and the evolution of biological complexity and
diversity
The organisms that populate our planet have evolved from simpler ones through
evolutionary trajectories determined by natural selection and PTM regulatory networks
5
appearance and complexification is thought to have a role in the emergence of biological
complexity and diversity. A notable example that supports this scenario has been revealed
by recent studies about the evolution of tyrosine phosphorylation. This PTM regulatory
network, indeed, is the result of several evolutionary steps that started more than a billion
years ago in a single-cell eukaryotic organism. Pincus and collaborators (Pincus et al, 2008)
shed light on these evolutionary steps. Limited tyrosine phosphorylation, catalyzed through
cross-phosphorylation by Ser/Thr kinases, was observed in premetazoan organisms.
Premetazoan organisms also possessed a reduced set of erasers for tyrosine phosphorylation
(tyrosine phosphatases). However, the complete PTM network that included the set of
writers (tyrosine kinases) was only observed in metazoans and choanoflagellates. The
emergence of the set of writers in metazoans was also associated with an expansion of the set
of erasers. The emergence of the tyrosine phosphorylation PTM network is thought to have
had a huge impact on the emergence of multicellular organisms (metazoa), since tyrosine
phosphorylation is a key component of the molecular machinery that allows cell-cell
communications. This example shows how by studying the evolution of PTM networks we
can understand the basis of organismal complexity and diversity.
1.6 - The technological advancements of the last decade make possible the
study of PTM networks
The advancements in mass-spectrometry, genomics, biochemistry and bioinformatics of the
last decade have given us an unprecedented set of tools to study PTM networks. While
classic techniques that rely on antibody-based Western blot analysis are still useful to detect
specific PTM events, the development of protocols that enrich the sample for
peptides carrying a PTM of interest, coupled to mass spectrometry (Zhao & Jensen, 2009),
has made it possible to determine proteome-wide PTM profiles at high throughput for the
first time, in particular those of model organisms like yeast (Gruhler et al, 2005; Holt
et al, 2009; Li et al, 2007), C. elegans (Zielinska et al, 2009), mouse (Huttlin et al, 2010)
and human (Sharma et al, 2014). Further, techniques like peptide arrays (Chen & Turk,
2010) have allowed us to explore the specificity of the writers and provided the basis for
determining writer-site associations. These associations allowed us to reconstruct the
topology and the organization of some PTM networks. The results of these studies have
also led to the development of algorithms (e.g. (Miller et al, 2008)) that can predict
PTM sites on proteins or writer-site associations. While each of these tools and
techniques has several limitations, they allowed us to investigate whole
PTM networks at high throughput for the first time.
1.7 – Aims of this thesis
The general aim of this thesis is to study the evolution of PTM regulatory networks in order
to understand how they rewire over time and in different organisms and how they cross-talk
with each other. Shedding light on these aspects of PTM networks will allow us to better
understand (i) the molecular mechanisms that contribute to increasing
biological complexity, (ii) the molecular basis of species divergence and (iii)
how the cell integrates different signals to make decisions.
We will now review in more detail the specific questions addressed in each chapter.
Chapter 2 addresses the question of how eukaryotic PTM networks are rewired after gene
duplication. Gene duplication is a mechanism that provides raw genetic material that can be
shaped by evolution and it is thought to be one of the mechanisms at the origin of
organismal complexity. Using budding yeast as a model system, we study to what extent
gene duplication changed the phosphoregulatory network of this model organism. We
chose this PTM network because it is the one for which we have the most complete data. In
this chapter we also investigate the molecular mechanisms involved in the rewiring of this
phosphoregulatory network and we discuss how these mechanisms may have contributed to
increase its complexity.
In Chapter 3 we further develop the analyses of the evolutionary trajectories of yeast
phosphorylation sites after the duplication event (Chapter 2) and we study how some of
these trajectories that imply the loss of phosphorylation sites may actually contribute to
increasing the complexity of the cellular regulatory network. We then discuss these results in the context of
how gene duplication may lead to biological innovations.
In Chapter 4 we address the question of how a mammalian PTM regulatory network has
been rewired by evolution, by comparing the mouse and human phosphoproteomes. Here
too, we focus on this regulatory network because it is the best characterized one. Comparing
the PTMs of human and mouse represents an important step towards understanding both the
regulatory differences between these species and, more generally, the molecular basis of
species divergence.
Finally, in Chapter 5 we study the cross-talk of two mammalian PTM regulatory networks
that share the same target residues, the phosphorylation and O-GlcNAcylation PTM
networks (the target residues being Ser and Thr), in mouse and human. While examples of
cross-talk between these two PTMs had already been reported in the literature, a global
assessment of the cross-talk between these two PTM networks is not available yet. In our
analysis we first find evidence for global cross-talk between these two PTM networks
and we then determine if phosphorylation and O-GlcNAcylation could act as two molecular
switches that regulate a single function or two molecular switches for two functions. By
answering these questions we can understand some of the mechanisms by which different
PTM networks interact with each other allowing the cell to integrate different signals and to
make decisions.
Chapter 2 - Phosphorylation network rewiring by gene duplication
Published as: Freschi L., Courcelles M., Thibault P., Michnick S.W. and Landry C.R.
(2011) Phosphorylation network rewiring by gene duplication, Molecular Systems Biology.
7:504
2.1 – Résumé
To understand how complex regulatory networks were assembled during evolution, we need
a detailed understanding of the dynamics that follow gene duplication events, including
changes in post-translational modification profiles. We compared the phosphorylation
profile of paralogous proteins in budding yeast to that of a species that diverged from
Saccharomyces cerevisiae before the gene duplication event occurred.
We found that 100 million years of divergence after the duplication event are
sufficient for the majority of phosphorylation sites to be lost or gained by one
paralog or the other, with a strong bias towards losses. However, some losses may be
partly compensated for by the evolution of other phosphorylation sites, since paralogs
tend to preserve the same number of sites over time. We also found that about 50% of
kinase-substrate relationships may have changed during this period. Our results suggest
that after duplication, proteins tend to undergo subfunctionalization at the level of
post-translational modifications. Moreover, even when phosphorylation sites are conserved
during evolution, there is a turnover of the kinases that phosphorylate these sites.
2.2 - Abstract
Elucidating how complex regulatory networks have assembled during evolution requires a
detailed understanding of the evolutionary dynamics that follow gene duplication events,
including changes in post-translational modifications. We compared the phosphorylation
profiles of paralogous proteins in the budding yeast Saccharomyces cerevisiae to that of a
species that diverged from the budding yeast prior to the duplication of those genes. We
found that 100 million years of post-duplication divergence are sufficient for the majority
of phosphorylation sites to be lost or gained in one paralog or the other, with a strong bias
towards losses. However, some losses may be partly compensated for by the evolution of
other phosphosites, as paralogous proteins tend to preserve similar numbers of phosphosites
over time. We also found that up to 50% of kinase-substrate relationships may have been
rewired during this period. Our results suggest that after gene duplication, proteins tend to
subfunctionalize at the level of posttranslational regulation and that even when
phosphosites are preserved, there is a turnover of the kinases that phosphorylate them.
2.3 – Introduction
Genomes and organisms gain in complexity during evolution by gene duplication followed
by the functional divergence of the duplicates (Hurles, 2004). Signalling and regulatory
proteins are thought to play a particularly important role in the evolution of organismal
complexity (Gough & Wong, 2010). We know very little about the early evolutionary
steps that follow the duplication of regulatory proteins and of the substrates they regulate.
Studies on short time scales and on well-characterized organisms are needed in order to
estimate the contribution of the different evolutionary forces to the assembly of novel
regulatory pathways and networks.
Here we address the evolution of phosphoregulatory networks by directly studying
phosphoproteins and their associated protein kinases. Protein phosphorylation regulates
many, if not most, protein functions by affecting their stability, localization, activity and
ability to interact (Moses & Landry, 2010). When maintained, paralogous proteins may
diverge in function following two evolutionary paths, which are not mutually exclusive.
First, one paralog may evolve new functions (neofunctionalization) (Conant & Wolfe,
2008). Second, degenerative mutations may accumulate in one or both paralogs leading to
the loss of redundant functions (subfunctionalization) (Force et al, 1999; Lynch & Force,
2000). If we assume a model under which each phosphosite in a protein has a function
(Holmberg et al, 2002), neofunctionalization would correspond to sites acquired after the
duplication event and subfunctionalization to sites lost in one of the two paralogs. In the
first case, new connections are created in the kinase-substrate network; in the second case,
no new function has evolved and regulatory links are lost rather than created. We (Landry
et al, 2009) and others (Lienhard, 2008) have recently suggested that a fraction of
phosphorylation sites may have no specific functions and represent the result of kinase-
substrate interactions that evolved neutrally or nearly neutrally. Accordingly, a fraction of
the links that are created or lost after gene duplication in these networks would represent
gains and losses of phosphosites without sub- or neofunctionalization of the proteins.
In this study we used the budding yeast Saccharomyces cerevisiae phosphorylation network
as a model. The lineage leading to the budding yeast underwent a whole genome
duplication (WGD) 100 million years (My) ago (Wolfe & Shields, 1997) that affected its
signalling networks significantly: while only 10% of all genes (~500 pairs) were
maintained as duplicates, 30% and 33% of protein kinases and phosphatases have been
retained as duplicates respectively (Seoighe & Wolfe, 1999). Furthermore,
phosphoproteins were significantly more likely to be retained as paralogs than
nonphosphorylated proteins (Amoutzias et al, 2010). Finally, duplicated kinases and their
regulatory proteins differ in sequence and functions (Musso et al, 2008) and many of them
show accelerated amino acid changes after the WGD (Kellis et al, 2004). Using
computational and experimental analyses, we examined the extent to which phosphosites
diverged after gene duplication, we addressed whether there have been accelerated gains
and losses of phosphosites among these phosphoproteins and whether kinase-substrate
relationships have been modified since the WGD.
2.4 - Materials and Methods
We compiled a set of 20342 phosphorylation sites on 2688 proteins from 8 large-scale
studies using 21068 phosphopeptides from 6 studies (Albuquerque et al, 2008; Bodenmiller
et al, 2007; Chi et al, 2007; Gruhler et al, 2005; Li et al, 2007; Reinders et al, 2007), as
compiled by Amoutzias et al. (Amoutzias et al, 2010) to which we added 3616
phosphopeptides from Beltrao et al. (Beltrao et al, 2009) and 3620 phosphopeptides from
Gnad et al. (Gnad et al, 2009). Raw phosphopeptides from these studies were filtered
according to the following criteria: for the Gnad dataset, we considered peptides with a
probability score above 0.95; for the Beltrao dataset we selected the peptides with score
greater than 0.02 and not being acetylated at the amino or carboxy terminus; then for all
datasets we selected all the peptides that matched one exact hit on S. cerevisiae proteins
using Blat searches (Kent, 2002). Peptides that matched more than one protein were
eliminated because they could not be assigned unambiguously to a single protein. We used
these data to assemble a first dataset. We then compiled a second dataset using the same
phosphosite data, but this time without applying the Blat filtering step. To our
knowledge these data sets of phosphorylation sites are the most comprehensive ones
currently available. Finally, we compiled a third dataset of manually curated phosphosites
that have been shown to be phosphorylated in small scale experiments and whose function
has been determined (Ba & Moses, 2010). The compiled data and all the other data
described below are available at: http://www.bio.ulaval.ca/landrylab/download/.
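The unique-hit filter described above can be sketched as follows. This is a minimal illustration, not the original pipeline: plain exact substring search stands in for the Blat searches, and the function name `filter_unique_peptides` and the toy proteome are hypothetical.

```python
def filter_unique_peptides(peptides, proteome):
    """Keep peptides with exactly one exact hit in the proteome.

    peptides: iterable of peptide strings
    proteome: dict mapping protein name -> amino acid sequence
    Returns a dict peptide -> (protein name, 0-based start position).
    Assumption: exact string matching replaces the Blat search.
    """
    kept = {}
    for pep in peptides:
        hits = []
        for name, seq in proteome.items():
            start = seq.find(pep)
            while start != -1:  # collect every occurrence, overlaps included
                hits.append((name, start))
                start = seq.find(pep, start + 1)
        if len(hits) == 1:  # unmatched or ambiguous peptides are dropped
            kept[pep] = hits[0]
    return kept


# Toy proteome (hypothetical sequences)
proteome = {"A": "MKSPTAAK", "B": "MKSPTGGR"}
kept = filter_unique_peptides(["SPTAAK", "MKSPT", "XXXX"], proteome)
```

Here `MKSPT` is discarded because it matches both toy proteins, which mirrors the reason given above for eliminating ambiguous peptides.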
We estimated the state-divergence of phosphosites between paralogous proteins by
comparing cross-study conservation and reproducibility. Our data set comes from 8 distinct
studies, so there are 28 possible pairwise comparisons. We only considered sites that were
S/T in both paralogs. For each pair of studies we considered 2 sets of concatenated
paralogous proteins, para.1 and para.2. We counted the number of sites found in para.1 in
study 1 and examined how many were also found in para.1 in study 2 (cross-study
reproducibility) and para.2 in study 2 (cross-study conservation) (Annex 1, Figure S2.1).
We did the same comparison for these two studies between sites identified in para.2 of
study 1 and also in para.2 of study 2 (cross-study reproducibility) and of para.1 of study 2
(cross-study conservation). Each pair of studies therefore yields two ratios of cross-study
conservation/cross-study reproducibility and this ratio gives a measure of the extent of
conservation between paralogs while taking into account the reproducibility of the two
studies.
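\[
\text{State conservation} \approx \frac{\text{cross-study conservation}}{\text{cross-study reproducibility}}
\approx \frac{|\text{Study.1 para.1} \cap \text{Study.2 para.2}| \,/\, |\text{Study.1 para.1}|}{|\text{Study.1 para.1} \cap \text{Study.2 para.1}| \,/\, |\text{Study.1 para.1}|}
\]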
A regression of the cross-study conservation on the cross-study reproducibility provides a
rough estimate of the state-conservation between paralogs while taking reproducibility into
account (Figure 2.1A).
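As an illustration of the ratio defined above, the following sketch computes it for one pair of studies. It assumes that sites are represented as sets of aligned positions, so that a site in para.1 and its homologous site in para.2 share the same key; the function name and the toy values are hypothetical, and the regression step across study pairs is not shown.

```python
from itertools import combinations


def state_conservation(s1_para1, s2_para1, s2_para2):
    """Cross-study conservation divided by cross-study reproducibility.

    Each argument is a set of aligned phosphosite positions.
    Returns None when the ratio is undefined.
    """
    if not s1_para1:
        return None
    reproducibility = len(s1_para1 & s2_para1) / len(s1_para1)
    conservation = len(s1_para1 & s2_para2) / len(s1_para1)
    if reproducibility == 0:
        return None
    return conservation / reproducibility


# 8 studies give 28 unordered pairs of studies
study_pairs = list(combinations(range(1, 9), 2))

# Toy example: 3 of 4 sites are reproduced, 1 of 4 is seen in the paralog
ratio = state_conservation({1, 2, 3, 4}, {1, 2, 3}, {1, 9})
```

Swapping the roles of para.1 and para.2 in the call gives the second ratio for the same pair of studies.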
Local phosphosite turnover was tested as follows. We took all the pairs of WGD
phosphoproteins where both paralogs had one or more phosphosites. For each phosphosite
present in the first paralog, we examined a window of length l centered on the site, thus
defining a range of positions along the sequence. Excluding all state-conserved sites (at the
exact same position), we counted all the phosphosites present in the aligned second paralog
inside the corresponding range of positions. A site was considered conserved if, for a
given phosphosite in the first paralog there was at least one phosphosite in the second
paralog inside the range of positions. We then determined the ratio of conserved sites over
all sites for each window size. The random expectation was estimated using 100
randomizations of phosphosites as described below.
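The window test can be sketched as below. This is a simplified reading of the procedure, with two labeled assumptions: positions are alignment coordinates shared by both paralogs, and the denominator is the set of sites of the first paralog that remain after state-conserved sites are excluded. The function name is hypothetical and the randomization step is not shown.

```python
def window_conserved_fraction(sites_a, sites_b, window):
    """Fraction of sites in paralog A with a site in paralog B inside the window.

    sites_a, sites_b: phosphosite positions in alignment coordinates
    window: window length l, centered on each site of paralog A
    """
    shared = set(sites_a) & set(sites_b)  # state-conserved sites, excluded
    a = sorted(set(sites_a) - shared)
    b = sorted(set(sites_b) - shared)
    if not a:
        return 0.0
    half = window // 2
    conserved = sum(1 for p in a if any(abs(p - q) <= half for q in b))
    return conserved / len(a)


# Toy example: position 10 is state-conserved and ignored;
# site 50 has a neighbour at 55, site 100 has none nearby.
fraction = window_conserved_fraction([10, 50, 100], [10, 55, 300], window=33)
```

Repeating the call over a range of window sizes gives the observed curve that would be compared with the randomized expectation.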
The Position Weight Matrices (PWM) used for the prediction of the protein kinases
associated with each of the phosphosites were derived empirically by Mok et al. (Mok et al,
2010) through in vitro peptide screening using 61 of the 122 kinases from S. cerevisiae.
While this data is incomplete, it is the best currently available as it relies on empirically
derived consensus motifs rather than completely in silico predictions. In order to assign all
of the phosphosites to their most likely corresponding kinases, we extracted all of the 15-
mers of the yeast proteome that correspond to the phosphosite and their 14 flanking (±7)
residues. All phosphosites were then scored by summing the logarithm of the values
present in each kinase PWM corresponding to each of the amino acids of the 15-
mer. We then assigned a protein kinase to a particular site based on the highest score for
that site (Annex 1, Figure S2.2). Data on kinase-substrate interactions were obtained from
Ptacek (Ptacek et al, 2005) and Ubersax (Ubersax et al, 2003). In the first case the data
represents microarray interactions between 87 different kinases and more than 4000
potential substrates. We estimated the fraction of paralogs that were phosphorylated by the
same kinase, considering only paralogs that were both phosphorylated by at least one
kinase. The second dataset comes from an in vitro experiment testing for interactions between
Cdc28 and the yeast proteome. We calculated the number of times both paralogs were
phosphorylated by the kinase among all cases where at least one of the two was
phosphorylated.
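The scoring and assignment steps can be illustrated with the sketch below. It assumes that each PWM is stored as a list of 15 dictionaries mapping a residue to its matrix value; the padding character for sites near protein ends and the floor value for residues absent from a column are assumptions introduced here, and the kinase labels in the toy example are hypothetical.

```python
import math

FLANK = 7  # residues on each side of the phosphosite, giving a 15-mer


def extract_15mer(seq, pos, pad="-"):
    """Return the 15-mer centered on 0-based position pos, padded at protein ends."""
    left = max(0, pos - FLANK)
    window = seq[left:pos + FLANK + 1]
    window = pad * (FLANK - (pos - left)) + window
    return window + pad * (2 * FLANK + 1 - len(window))


def pwm_score(peptide, pwm, floor=1e-6):
    """Sum of the logarithms of the PWM values matching each residue."""
    return sum(math.log(col.get(aa, floor)) for col, aa in zip(pwm, peptide))


def assign_kinase(peptide, pwms):
    """Name of the kinase whose PWM gives the highest score for the peptide."""
    return max(pwms, key=lambda name: pwm_score(peptide, pwms[name]))


def toy_pwm(pos, aa):
    """Toy matrix with a single strong preference (illustration only)."""
    matrix = [{} for _ in range(2 * FLANK + 1)]
    matrix[pos][aa] = 0.9
    return matrix


pwms = {"K1": toy_pwm(4, "R"), "K2": toy_pwm(8, "P")}
best = assign_kinase("AAAARAASAAAAAAA", pwms)
```

With the toy matrices, the peptide carrying an arginine at position -3 is assigned to `K1`, since only that matrix rewards the residue.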
Gains and losses of phosphosites were inferred as described in Figure 2.2A. We estimated
the expected numbers of gains and losses by randomly sampling S/T sites. We divided the
phosphosites into four classes according to the type of the residue (S or T) and the type of
region where the residue was located (ordered or disordered), and the representation of each
class was respected in the resampling. Disordered regions of proteins were predicted using
DISOPRED (Ward et al, 2004b) using all the fungal protein sequences as a reference
database. We performed a random sampling of S/T positions 1000 times, calculating the
number of gains and losses after each resampling. The ancestral residues occupying the
phosphosite position were determined as follows. We aligned all of S. cerevisiae genes to
the Lachancea kluyveri and Zygosaccharomyces rouxii orthologs, these two species having
diverged from the S. cerevisiae lineage prior to the whole-genome duplication. All the
sequences and the orthology relationships were obtained from YGOB (Gordon et al, 2009)
and alignments performed with MUSCLE (Edgar, 2004) using default parameters.
Orthology relationships were found for 4401 genes (among which 516 out of 553 of S.
cerevisiae paralogous genes). For each quartet of sequences, we inferred the ancestral
sequence at the first node joining the two paralogs (Figure 2.2A). The ancestral protein
sequences were inferred using the codeml method implemented in PAML (Yang, 2007)
using the following parameters (fix_alpha = 0, alpha = 0.04, fix_blength = 2). We
reconstructed ancestral sequences using two different substitution matrices (wag and
dayhoff) and both gave similar results so we are presenting only results derived from the
wag matrix. We examined the robustness of the reconstruction by performing the same
analyses with an additional pre-WGD species (K. thermotolerans) added to our set. In this
case we were able to reconstruct the orthology relationships and the ancestral sequence for
4388 genes (among which 516 out of 551 of S. cerevisiae paralogous genes) (Dataset 4).
All analyses were performed using Perl (http://www.perl.org) and R (http://www.r-
project.org/) scripts.
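The class-preserving resampling of S/T sites described above can be sketched as follows. This is an illustration under stated assumptions rather than the original Perl/R code: sites are tuples of a hypothetical identifier, residue type and region type, and each class must contain at least as many candidate positions as observed phosphosites.

```python
import random
from collections import Counter, defaultdict


def resample_sites(observed, candidates, rng):
    """Draw random S/T positions while preserving the four class sizes.

    observed:   list of (site_id, residue, region) for real phosphosites
    candidates: list of (site_id, residue, region) for all S/T positions
    rng:        random.Random instance
    Returns a list of sampled site_ids, one per observed phosphosite.
    """
    by_class = defaultdict(list)
    for site_id, residue, region in candidates:
        by_class[(residue, region)].append(site_id)
    quota = Counter((residue, region) for _, residue, region in observed)
    sample = []
    for cls, n in quota.items():
        sample.extend(rng.sample(by_class[cls], n))  # without replacement
    return sample


# Toy data: 10 serines in disordered regions, 5 threonines in ordered regions
candidates = [(f"s{i}", "S", "disordered") for i in range(10)]
candidates += [(f"t{i}", "T", "ordered") for i in range(5)]
observed = [c for c in candidates if c[0] in {"s0", "s1", "s2", "t0", "t1"}]
sample = resample_sites(observed, candidates, random.Random(0))
```

Calling the function 1000 times and counting gains and losses on each sample would give the expected distribution against which the observed numbers are compared.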
The Lachancea kluyveri phosphosites were identified as follows. L. kluyveri (formerly
known as Saccharomyces kluyveri) strain FM628 (MATa ura3) was obtained from Marc
Johnston (Washington University). Pre-cultures of 75 ml were grown to OD600 ~ 3
overnight in standard yeast YPD medium at 30°C, agitated at 600 rpm and diluted to OD600
= 0.1 in the morning in 1 L of YPD. Cells were harvested at OD600 ~ 0.6-0.8 by
centrifugation at 4,000 rpm for 20 minutes. The pellet (about 2-3 grams) was resuspended in
20 ml of lysis buffer following (Albuquerque et al, 2008) with slight modifications: 50 mM
Tris-HCl (pH 8.0), 150 mM NaCl, 0.2% NonidetP-50, 1.5 mM MgCl2, 0.2 mM EDTA.
The lysis buffer also contained phosphatase inhibitors (PhosSTOP, Roche), protease
inhibitors (Complete protease cocktail, Roche) and 1 mM PMSF. Samples were quickly
frozen directly in liquid nitrogen drop-by-drop to make 1 cm³ frozen pellets and stored
at -80 °C. Yeast powder extracts were then produced using a Freezer/Mill (Spex
SamplePrep), which cryogenically pulverizes small pellets with a magnetically driven
impactor submerged in liquid nitrogen. The fine powder was then centrifuged at 14,500
RPM (rotor SA600) for 30 minutes at 4°C. The clear supernatant was treated with
Benzonase (Novagen) to eliminate nucleic acids overnight at 4°C and then cold acetone
precipitated.
Protein pellets were resuspended in 1% SDS/50 mM ammonium bicarbonate (AB) and
microBCA (Pierce) was used to determine protein concentration. Protein extracts (1 mg)
were reduced for 20 min at 37°C with 0.5 mM tris(2-carboxyethyl)phosphine
(TCEP; Pierce), alkylated with 50 mM iodoacetamide for 20 min at 37°C and quenched by
adding 50 mM DTT. Samples were diluted 10x with 50 mM AB and digested overnight at
37°C with sequencing grade trypsin (enzyme:substrate, 1:100) (Promega). The digestion
was stopped by adding trifluoroacetic acid (TFA) and was followed by evaporation on a
SpeedVac (Thermo Fisher Scientific, San Jose, CA). Phosphopeptides were enriched on
home-made TiO2-affinity columns (1.25 mg Titansphere, 5 µm, GL Sciences), using 250
mM lactic acid (Fluka) and eluted with 30 µl of 1% ammonium hydroxide, as described
previously (Thingholm et al, 2006). Samples were acidified with 1 µL of TFA, desalted
using 30 mg HLB cartridge (Waters Corporation, Milford, MA), dried and resuspended in
2% acetonitrile (ACN; Thermo Fisher Scientific), 0.2% formic acid (FA; EMD Chemicals
Inc., Gibbstown) prior to analysis.
Triplicate 2D-nanoLC-MS/MS analysis of phosphopeptides was performed on an LTQ-
Orbitrap XL mass spectrometer (Thermo Fisher Scientific) coupled to an Eksigent LC
system. Online SCX separation (Opti-Guard 1mm cation column, Optimize Technologies)
was performed using five different ammonium acetate salt fractions, pH 3.0 (0, 250, 500,
1000 & 2000 mM) in 2% ACN (0.2% FA). Peptides eluted from each salt fraction were
transferred to a pre-column reverse phase trap (4 mm length, 360 µm i.d.) and injected on a
reverse phase analytical column (10 cm length, 150 µm i.d.) (Jupiter C18, 3µm, 300 Å,
Phenomenex). A linear gradient (2 to 25% ACN over 63 min followed by 25 to 40% ACN
over the next 15 min) was applied to separate phosphopeptides, which were directly
injected into the mass spectrometer at a flow rate of 600 nL/min. Detailed MS operation
procedure is described in (Marcantonio et al, 2008). Mascot Distiller v2.1.1 (Matrix
Science, London, UK) was used to extract and preprocess MS/MS spectra from raw data
files. Peptide identification was done with Mascot v2.2 using the Lachancea (Saccharomyces)
kluyveri protein sequence database (http://www.ebi.ac.uk/embl/). The following parameters
were used: parent and fragment tolerance of 0.02 Da and 0.5 Da respectively, trypsin with 2
missed cleavages and the following modifications: carbamidomethyl (C), deamidation
(NQ), oxidation (M), phosphorylation (STY). ProteoConnections (Courcelles et al, 2011)
was used to limit peptide false discovery rate to 1% and evaluate the confidence of
phosphorylation site localisation. MS/MS spectra of all peptide identifications are available at
http://www.thibault.iric.ca/proteoconnections. Phosphosites with a confidence score above
60% were considered for the evolutionary analyses (711 sites in 396 proteins).
2.5 – Results and discussion
2.5.1 - Paralogous phosphoproteins substantially diverged after WGD
Our dataset consists of 2726 phosphosites (serines (S), 82%; threonines (T), 16%; tyrosines
(Y), 2%) that belong to one or the other member of the 352 pairs of yeast WGD paralogs
for which at least one of the two proteins is a phosphoprotein. In this work we focused on
S/T phosphosites as they make up 98% of all phosphosites. Among these sites, 2445 are
unique to one paralog and 118 (that correspond to 236 phosphosites) occur at homologous
positions, a number 7.4 times higher than expected by chance (P << 0.001, Annex 1, Figure
S2.3). Phosphosites diverge in two ways. First are cases where an S/T residue is
phosphorylated in a protein and a residue that cannot be phosphorylated occupies the
homologous position in its paralog (site-divergence). Site-divergence accounts for 69% of
the sites that are unique to one paralog. Second, an S/T is phosphorylated in one paralog and
its homologous position is conserved (S/T) but not observed to be phosphorylated (state-
divergence). Eighty-six percent of homologous sites that are phosphorylated are in fact
state-diverged. This measure of state-divergence is strongly upwardly biased by false
negative (FN) and false positive (FP) identifications and also by the fact that
phosphopeptides that match more than one protein are not included in this dataset. We
considered these issues by comparing the cross-study conservation with the cross-study
reproducibility. We found that state-conservation between paralogs is around 36% for
filtered peptides (considering only phosphopeptides that match a single position in the
proteome) and 54% for unfiltered peptides (considering all phosphopeptides) (Figure
2.1A). Protein sequence, function, localization and/or recognition by protein kinases have
diverged to such an extent in 100 My that only 36-54% of their post-translational regulation
by phosphorylation appears to be conserved despite a conservation of the actual residues.
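The two kinds of divergence defined in this section can be expressed as a small classification rule. The sketch below is illustrative only: it treats S and T as the only phosphorylatable residues, in line with the focus on S/T sites, and the function name and labels are hypothetical.

```python
PHOSPHORYLATABLE = {"S", "T"}  # assumption: tyrosines are ignored, as in the S/T analysis


def classify_position(res_a, phos_a, res_b, phos_b):
    """Classify one aligned position of a paralog pair.

    res_a, res_b:   residues at the homologous position ("-" for a gap)
    phos_a, phos_b: True if the residue was observed to be phosphorylated
    """
    if phos_a and phos_b:
        return "state-conserved"
    if not (phos_a or phos_b):
        return "not phosphorylated"
    other = res_b if phos_a else res_a  # residue in the unphosphorylated paralog
    if other in PHOSPHORYLATABLE:
        return "state-diverged"  # residue conserved, phosphorylation not observed
    return "site-diverged"       # residue cannot be phosphorylated


label = classify_position("S", True, "A", False)
```

A position labelled `state-diverged` by this rule may still be a false negative in the underlying data, which is why the cross-study comparison is needed.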
Figure 2.1. Conservation and divergence of phosphoregulation among WGD paralogs.
(A) The state-conservation of paralogous proteins was estimated as a regression of the cross-study
conservation on the cross-study reproducibility. A 1:1 relationship is expected if all phosphosites
were state-conserved. Deviation from this 1:1 relationship provides an estimate of state divergence.
Filtered data: phosphopeptides that match a single protein; unfiltered data: all phosphopeptides. (B)
Positive correlation in the number of phosphosites of WGD paralogous proteins. Red dots indicate
average numbers in binned data and green dots the actual data. Green intensities indicate the
number of points at these positions. (C) Proportion of paralogous pairs with significant conservation
as a function of the window size considered. A site is considered conserved if there is a
phosphorylated site in the other paralog within the window (excluding the exact position). (D) Case
of putative local compensation. The fraction of conserved sites as a function of window size is
shown. Blue: observed value; Grey: 95th quantile (100 permutations); Red: average of the expected
distribution. (E) Fraction of paralogous phosphosites or phosphoproteins assigned to the same
protein kinase. Assignments are based on PWMs from (Mok et al, 2010). The observed fraction is
calculated using these assignments while the expected fraction is estimated after shuffling the
assigned kinases among the pairs of paralogous sites. Ptacek: large-scale in vitro kinase-substrate
interactions on microarrays (Ptacek et al, 2005). Ubersax: in vitro Cdc28-substrate interactions
(Ubersax et al, 2003). (F) Distributions of the PWM scores for different classes of sites.
2.5.2 - Conservation and compensation of phosphosite loss by site-position turnover
Surprisingly, despite the low level of site-conservation between paralogous proteins, there
is a highly significant correlation in the number of phosphorylation sites between paralogs
(rho = 0.35, p-value < 2.2×10⁻¹⁶; Figure 2.1B). This correlation remains significant when
the number of phosphosites is normalized by protein length (rho = 0.32, p-value < 6.9×10⁻¹⁴)
or the length of disordered regions (rho = 0.27, p-value < 3.8×10⁻¹⁰), which both tend to
be preserved between paralogs. The correlation is also significant when only site-diverged
phosphosites are considered (rho = 0.28, p-value = 2.0×10⁻¹¹). This correlation suggests
that stabilizing selection is acting to maintain the overall number of phosphosites. This
result is in agreement with a recent study (Beltrao et al, 2009) reporting that the
phosphorylation levels of orthologous protein complexes or pathways between Candida
albicans and S. cerevisiae tend to be conserved. The turnover of phosphosite position over
time could be made possible by the fact that sites that appear at a position near a site that
is lost can compensate for the loss (Serber & Ferrell Jr, 2007), particularly when the charge
of a region rather than that of a particular residue is important. The redundancy in the
position of phosphosites has been previously proposed to explain the weak site-
conservation among species (Landry et al, 2009), but so far there has been limited evidence
for this (Ba & Moses, 2010; Moses & Landry, 2010).
If this local turnover model is responsible for the overall conservation of the number of
phosphosites, the proportion of conservation between paralogs should increase significantly
if we consider regions of proteins rather than actual positions. We found that to be the case
for a significant but limited number of paralogous pairs. We re-considered the proportion of
state-conserved sites as the proportion of sites in a protein that have a phosphosite in the
homologous region of a given window size in its paralog. We first found that the window
size that maximizes the signal is about 33 amino acids in length (Figure 2.1C). Then, we
found that among the 167 pairs of paralogous proteins where both paralogs have at least
one phosphosite, 11 of them (6.6%) showed a significant level of conservation at that
window length (an example is shown in Figure 2.1D). This result may suggest either that
compensation by nearby sites is relatively uncommon and is specific to some types of
proteins, or that the relatively limited coverage of the yeast phosphoproteome leaves us
with limited power to detect significant compensation. Another possibility is that such
compensation takes place only in highly phosphorylated proteins. Indeed, we found that
paralogous pairs for which there is significant functional compensation have significantly
more phosphosites (mean: 9.28 vs. 3.87; Wilcoxon test: p-value < 9.5×10⁻¹¹) and also tend
to contain a larger proportion of disordered residues (mean: 53% versus 42%, p = 0.01)
compared to all pairs.
2.5.3 - Life after WGD: rewiring the cellular regulatory networks
Phosphosites are phosphorylated by a variety of kinases that recognize specific motifs
surrounding the phosphosites. As in many eukaryotes, around 2% (120 total) of yeast
protein-coding genes code for protein kinases (Zhu et al, 2000). We examined the
conservation of the relationships between our set of phosphosites and yeast kinases by
assigning each phosphosite to a kinase using empirically derived Position Weight Matrices
(PWM) for 61 yeast kinases (Data set S1 from (Mok et al, 2010)). We first found that
WGD paralogs are generally not biased in terms of the protein kinases that regulate them
(rho = 0.99, p-value < 2×10⁻¹⁶, Annex 1, Figure S2.4). Secondly, we found that state-
conserved sites are assigned to the same kinase 44% of the time, a twenty-fold increase
over what is expected if phosphosites were randomly matched between paralogs (p-value <
0.0001; Figure 2.1E). This number drops to 23% for state-diverged sites, again supporting
the fact that state-divergence does not entirely result from FN identifications. These sites
are either not being phosphorylated or being phosphorylated by a different kinase in a
different condition not addressed so far in phosphoproteomics studies. This first hypothesis
is supported by the fact that, for state-diverged sites, the assigned scores are significantly
higher for the phosphorylated sites than the non-phosphorylated ones (Figure 2.1F). We
estimated that the state-diverged nonphosphorylated S/T sites in reality comprise 50% of
nonphosphorylated sites (Annex 1, Figure S2.5).
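As an illustration of how a phosphosite can be assigned to a kinase with Position Weight Matrices, the following is a minimal sketch and not the pipeline used in this study. The two toy matrices, the three-residue window, the uniform background frequency and the pseudo-probability are all assumptions made for the example; the empirically derived matrices of Mok et al (2010) are not reproduced here.

```python
import math

BACKGROUND = 1 / 20  # assumed uniform background frequency per amino acid
PSEUDO = 0.01        # assumed probability for residues absent from a column

# Hypothetical toy matrices over positions (-1, 0, +1) around the phosphosite.
# A column set to None expresses no residue preference and contributes 0.
TOY_PWMS = {
    "proline_directed": [None, {"S": 0.5, "T": 0.5}, {"P": 0.9}],
    "basophilic": [{"R": 0.6, "K": 0.3}, {"S": 0.5, "T": 0.5}, None],
}


def pwm_score(window, pwm):
    """Sum of log2-odds of each residue in the window under the PWM."""
    if len(window) != len(pwm):
        raise ValueError("window and PWM must have the same length")
    score = 0.0
    for residue, column in zip(window, pwm):
        if column is None:
            continue
        score += math.log2(column.get(residue, PSEUDO) / BACKGROUND)
    return score


def assign_kinase(window, pwms=TOY_PWMS):
    """Return the name of the kinase whose PWM scores the window highest."""
    return max(pwms, key=lambda kinase: pwm_score(window, pwms[kinase]))


def same_kinase(window_a, window_b, pwms=TOY_PWMS):
    """True if two paralogous sites are assigned to the same kinase."""
    return assign_kinase(window_a, pwms) == assign_kinase(window_b, pwms)


print(assign_kinase("ASP"))       # window centred on S and followed by P
print(same_kinase("ASP", "RSA"))  # flanking residues differ between paralogs
```

In this sketch, two paralogous sites that are both phosphorylatable can still be assigned to different kinases once their flanking residues have diverged, which is the situation discussed for state-conserved sites.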
The low percentage of assignment (44%) of the same kinase to state-conserved sites
suggests that the kinases that phosphorylate paralogous sites have changed since the WGD
(Moses & Landry, 2010). We found independent support for this from large scale and
small-scale kinase-substrate interaction experiments (Ptacek et al, 2005; Ubersax et al,
2003) in which kinase-substrate relationships are also conserved in similar proportions
(Figure 2.1E). Overall, these analyses suggest that while a significant fraction of sites is
conserved and phosphorylated in both paralogs, the flanking sequences and/or protein
structure and/or localization have diverged enough for the substrate to be regulated by a
different protein kinase, a regulatory network turnover that is similar to what is observed
for transcriptional networks (Gasch et al, 2004; Moses & Landry, 2010). After 100 My of
evolution, up to 50% of kinase-substrate relationships may be rewired, while preserving the
phosphorylation status of the substrates.
2.5.4 - Phosphosite loss dominates site-divergence
A recent study on the budding yeast reported putative cases of neo- and
subfunctionalization of phosphosites (Amoutzias et al, 2010), but did not compare the
extent of those changes to a null model. We therefore sought to quantify whether site
divergence resulted from losses or gains of phosphosites by reconstructing the ancestral
sequences of the paralogous proteins and comparing the observed proportions to the neutral
expectations (Figure 2.2A, 2.2B & 2.2C). We found that 25% of sites correspond to gains
and 31% of sites correspond to losses. These proportions are, respectively, significantly less
and more than expected by chance alone, based on the resampling of phosphorylatable sites
in the same set of phosphoproteins (Figure 2.2C). This remains true for ordered and
disordered regions of proteins, which have been shown to evolve at different rates. We
consider that these losses represent several subfunctionalization events as non-functional
phosphosites (Landry et al, 2009) are expected to evolve as randomly selected S/T. These
results are also unlikely to result from false positives, as we performed the same analyses
on a smaller number of manually curated phosphosites (Nguyen Ba & Moses, 2010;
Annex 1, Figure S2.6) and we observed similar results. Our results are also robust to data
filtering (Annex 1, Figure S2.7) and variation in ancestral sequence reconstruction (Annex
1, Figure S2.8).
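The inference of gains and losses from a reconstructed ancestral residue can be sketched as follows. This is a simplified reading of the scheme in Figure 2.2A, assuming one alignment column at a time and treating S and T as equivalent; the function name and the returned labels are hypothetical and chosen for the example only.

```python
PHOSPHORYLATABLE = {"S", "T"}  # S and T are treated as equivalent


def classify_site(ancestor, paralog_a, paralog_b, phospho_a, phospho_b):
    """Classify one alignment column of a paralogous pair.

    ancestor, paralog_a, paralog_b: one-letter residues at the column.
    phospho_a, phospho_b: whether the residue is an observed phosphosite.
    Returns 'conserved', 'loss', 'gain' or 'other'.
    """
    if phospho_a and paralog_a not in PHOSPHORYLATABLE:
        raise ValueError("paralog_a is flagged phosphorylated but is not S/T")
    if phospho_b and paralog_b not in PHOSPHORYLATABLE:
        raise ValueError("paralog_b is flagged phosphorylated but is not S/T")
    if not (phospho_a or phospho_b):
        return "other"  # no phosphosite observed at this column
    if paralog_a in PHOSPHORYLATABLE and paralog_b in PHOSPHORYLATABLE:
        return "conserved"  # the site is phosphorylatable in both paralogs
    # Site-diverged column: only one paralog has a phosphorylatable residue.
    # An S/T ancestor implies a loss in the other paralog, otherwise a gain.
    return "loss" if ancestor in PHOSPHORYLATABLE else "gain"


print(classify_site("S", "S", "A", True, False))  # ancestral S/T kept in one copy
print(classify_site("A", "S", "A", True, False))  # S/T absent from the ancestor
```

Counting the labels over all columns gives the observed proportions, which can then be compared with the proportions obtained by resampling phosphorylatable sites from the same proteins.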
Figure 2.2. Gains and losses of phosphosites after gene duplication.
(A) Inference of gains and losses of phosphosites. Serines (S) and threonines (T) are considered
equivalent with respect to phosphorylation. !S/!T indicates residues that are neither S nor T, and
pS/pT indicates phosphorylated S/T. (B) Examples of lost (S72), gained (S121) and conserved
(S103) sites from the curated dataset (Dataset 2). (C) The number of observed losses is greater than
expected by chance alone and the number of gains shows the opposite result. Results in ordered and
disordered regions agree with each other.
A limitation of this analysis is that we have to assume that the phosphorylatable sites (S/T)
of the ancestral sequence that correspond to phosphorylation sites in S. cerevisiae were
phosphorylated in the ancestor. Only a direct observation of the phosphorylation state of
the ancestral proteins would alleviate this problem. We therefore performed a
phosphoproteomics experiment on Lachancea kluyveri (Souciet et al, 2009), a species that
diverged from S. cerevisiae before the WGD event and that can be used as a proxy for
ancestral functions (van Hoof, 2005). We identified 855 phosphosites on 429 proteins
(Annex 1, File S2.1) that we mapped on our alignments. We found that a smaller
proportion of phosphosites identified in L. kluyveri are also phosphorylated in the S.
cerevisiae WGD paralogs (1:2) compared to the 1:1 S. cerevisiae orthologs (Figure 2.3A).
Figure 2.3. L. kluyveri phosphoproteomics confirms that phosphosites are preferentially lost in
paralogous phosphoproteins.
(A) L. kluyveri phosphosites are more likely to be phosphorylated in S. cerevisiae if they are in 1:1
orthologs (142/469 sites in 108 proteins) than in 1:2 orthologs (31/181 sites in 45 proteins). (B)
Ratios of the number of sites unique to S. cerevisiae to the number of shared ones with L. kluyveri
for 1:1 orthologs (142/6644 sites in 108 proteins) and 2:1 orthologs (62/2681 sites in 45 proteins).
Assuming that the rate of phosphosite gain in the L. kluyveri lineage was similar in these
two categories of genes (1:1 and 1:2 L. kluyveri-S. cerevisiae orthologs), this result confirms
that phosphosites were more likely to be lost in the S. cerevisiae WGD paralogs and thus
that gene duplication has significantly accelerated the rate of phosphosite divergence. We
also found that the proportion of sites that are uniquely phosphorylated in S. cerevisiae (not
found to be phosphorylated in L. kluyveri) in the WGD paralogs is actually comparable to
the one for the 1:1 orthologs (Figure 2.3B). Under a scenario where phosphosite gains
accelerated the divergence of the WGD paralogs, we would have expected to see a
significantly higher fraction of gains for the 2:1 orthologs compared to the 1:1 ones. Our
phosphoproteomics results therefore support our bioinformatics analyses based solely on
ancestral sequence reconstruction and confirm the prevalence of phosphosite losses in the
divergence of paralogous phosphoproteins.
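The comparison reported in Figure 2.3A can be worked through with the counts given in its caption. The short sketch below only restates that arithmetic; it reads each pair of numbers as the number of L. kluyveri phosphosites also phosphorylated in S. cerevisiae over the number of L. kluyveri phosphosites considered, which is an interpretation of the caption rather than a statement from the original analysis.

```python
# Counts from the Figure 2.3A caption (shared sites / L. kluyveri sites).
shared_1to1, total_1to1 = 142, 469  # 1:1 orthologs
shared_1to2, total_1to2 = 31, 181   # 1:2 orthologs (WGD paralogs)

prop_1to1 = shared_1to1 / total_1to1
prop_1to2 = shared_1to2 / total_1to2

# Odds of a site being shared in 1:1 orthologs relative to WGD paralogs.
odds_ratio = (shared_1to1 / (total_1to1 - shared_1to1)) / (
    shared_1to2 / (total_1to2 - shared_1to2)
)

print(f"1:1 orthologs: {prop_1to1:.1%} of sites shared")
print(f"1:2 orthologs: {prop_1to2:.1%} of sites shared")
print(f"odds ratio: {odds_ratio:.2f}")
```

With these counts, about 30% of the sites are shared in 1:1 orthologs against about 17% in the WGD paralogs, which corresponds to roughly twofold higher odds of conservation in the absence of duplication.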
2.6 - Conclusion
A previous study considering the ancestral function of duplicated WGD proteins has shown
the importance of subfunctionalization in shaping the function of WGD paralogs acting at
the level of protein functions (van Hoof, 2005), whereas investigations of transcriptional
regulation have also found a significant contribution of neofunctionalization in the
divergence of paralogs (Papp et al, 2003; Tirosh & Barkai, 2007). Our results suggest that
at the level of post-translational regulation, subfunctionalization may have been the most
important driving force in shaping the yeast regulatory network. One limitation of our
analysis is that we consider that, when functional, each phosphosite has an independent
function, which may not be necessarily the case, as several cooperative effects among
phosphosites have been reported (Kapoor et al, 2000). The combined and individual effects
of the sub- and neofunctionalized sites will need to be addressed experimentally to estimate
the functional effects of these divergences. Further integrative analyses will also be
required to elucidate the importance of neo- and subfunctionalization that take place at
multiple levels (transcription, protein function, PTMs), as these may be largely dependent
on each other (Jensen et al, 2006). Another key finding of our study is that 100 My may be
sufficient to rewire half of the kinase-substrate relationships in a cell. This result is in
agreement with the idea that protein-protein interaction networks evolve rapidly. In about
300 My of evolution, half of all the interactions are supposed to be replaced by new
interactions (Wagner, 2001b).
2.7 - Acknowledgements
We thank H. Wurtele and A. Verreault for the use of their facilities and N. Lartillot and A.
Moses for comments on the manuscript. This work was supported by a Canadian Institute
of Health Research (CIHR) grant GMX-191597 and FRSQ to C. R. Landry. C. R. Landry is
a CIHR New Investigator. L. Freschi was supported by a Quebec Research Network on
Protein Function, Structure and Engineering (PROTEO) fellowship.
Chapter 3 - Where Do Phosphosites Come from and Where Do They Go after Gene Duplication?
Published as: Diss G., Freschi L., Landry C. R. (2012), Where do phosphosites come from
and where do they go after gene duplication? Journal of Evolutionary Biology - special
issue: Molecular Evolutionary Routes that Lead to Innovations 2012: 843167.
3.1 – Summary
Gene duplication followed by divergence is an important mechanism for promoting
innovation at the molecular level. While divergence in transcriptional regulation is well
documented, we know few details about the divergence caused by post-translational
modifications (PTMs). Here, we test whether gains and losses of phosphorylated amino
acids after gene duplication can specifically modify the regulation of the duplicated
proteins. We show that when a phosphorylation site is lost in one paralog, transitions
toward negatively charged amino acids (which can constitutively mimic the
phosphorylated state) are significantly favoured. These transitions cannot occur through a
single mutation, which means that the function must be lost before being regained through
phosphomimetic residues. In conclusion, we discuss how gene duplication can facilitate
transitions between phosphorylated and phosphomimetic amino acids.
3.2 – Abstract
Gene duplication followed by divergence is an important mechanism that leads to
molecular innovation. Whereas regulatory divergence at the transcriptional level is well
documented, little is known about divergence of posttranslational modifications (PTMs).
Here we test whether gains and losses of phosphorylated amino acids after gene duplication
may specifically modify the regulation of these duplicated proteins. We show that when
phosphosites are lost in one paralog, transitions from phosphorylated serines and threonines
are significantly biased toward negatively charged amino acids, which can mimic their
phosphorylated status in a constitutive manner. Surprisingly, these favoured transitions
cannot be reached by single mutational steps, which suggests that the function of a
phosphosite needs to be completely abolished before it is restored through substitution by
these phosphomimetic residues. We conclude by discussing how gene duplication could
facilitate the transitions between phosphorylated and phosphomimetic amino acids.
3.3 – Introduction
Gene duplication is one of the most prominent mechanisms by which organisms acquire
new functions (Ohno, 1970). Spectacular examples of such gains of function resulting from
gene duplications are the evolution of trichromatic vision in primates (Dulai et al, 1999),
the evolution of human beta-globin genes that are involved in the oxygen transport at
different developmental stages (Efstratiadis et al, 1980) as well as the expansion of the
family of immunoglobulins and other immunity related genes that shaped the vertebrate
immune system (Boulais et al, 2010; Zhang, 2003). Because of the central role of gene
duplication in evolution, there has been profound interest in better understanding
how these new functions evolve at the molecular level (Hurles, 2004), in determining the
rate at which gene duplication occurs (Lynch & Conery, 2000; Lynch et al, 2008; Wagner,
2001a) and in testing whether the retention of paralogous genes necessarily requires the
evolution of new functions (Force et al, 1999; Hurles, 2004; van Hoof, 2005). One of the
most important challenges has been to determine mechanistically how specific mutations
translate into new functions, as establishing sequence-function relationships remains a
difficult task (Dean & Thornton, 2007).
After a gene duplication event, the two sister paralogs are identical copies of their ancestor
and encode two identical functions, thus relaxing the selective constraints on each paralog
(Lynch & Conery, 2000). Under most evolutionary models, both paralogs have to diverge
to be retained on evolutionary time scales, otherwise one paralog would be lost and the
system would return to its ancestral state (non-functionalization) (Hurles, 2004). There are
two ways for paralogs to diverge in function. The first one is the acquisition of new
functions by one or both of the two paralogs, a mechanism called neofunctionalization
(Force et al, 1999; Lynch & Conery, 2000; Ohno, 1970). The second mechanism, called
subfunctionalization, implies the complementary partitioning of the ancestral function
between the two paralogs by losses of functions (Force et al, 1999; Lynch & Conery, 2000;
Lynch & Force, 2000). These two mechanisms are not mutually exclusive because the
ancestral function can be partitioned by subfunctionalization and then one or both paralogs
may acquire new functions by neofunctionalization, a mechanism called
neosubfunctionalization (He & Zhang, 2005). An increase in the dosage of a gene product
by the addition of a second identical copy of the ancestral gene can also contribute to the
retention of paralogous pairs, without the need for the gain or loss of functions
(Kondrashov & Koonin, 2004; Kondrashov et al, 2002).
Divergence between paralogs does not necessarily imply a divergence in a specific function
but can also involve a change in the regulation of that function. For instance, the regulatory
control of a protein function can be modified at the transcriptional or at the
posttranslational level. Divergence in expression pattern of duplicated transcripts is well
documented (Ferris & Whitt, 1979; Force et al, 1999; Gu et al, 2002; Ohno, 1970). For
example, Gu et al. showed that a large fraction of ancient duplicated gene pairs in yeast
shows divergent gene expression patterns (Gu et al, 2002). A more recent study showed
that nearly half of the genes that duplicated after a Whole Genome Duplication event
(WGD) in a forest tree species have diverged in expression by a random degeneration
process (Rodgers-Melnick et al, 2012). However, little is known about the divergence of
regulation by posttranslational modifications (PTMs), which take place after transcription
and translation and directly affect protein activities (Moses & Landry, 2010).
PTMs are covalent modifications of one or more amino acids that affect the activity of a
protein, its localization in the cell, its turnover rate, and its interactions with other
molecules (Mann & Jensen, 2003). Cells use a wide range of different PTMs to exert
distinct regulations on proteins. Although only 20 amino acids are encoded by the genetic
code, more than 200 amino acid variants or their derivatives are found in proteins after
PTMs (Seo & Lee, 2004). Phosphorylation, the addition of a phosphate moiety from an
ATP donor to a serine (Ser), threonine (Thr) or tyrosine (Tyr) residue by a protein kinase, is
by far the best-known PTM, as it is the most common and is involved in the regulation of
key biological processes of fundamental and medical interest, such as signal transduction
and cell-cycle regulation (Hunter, 2000). Phosphorylation of these amino acids modifies
their biochemical properties in several manners. Of particular interest for this study is the
fact that the addition of a phosphate group brings two new negative charges that allow the
formation of a salt-bridge or contribute to the local charge of the protein (Serber & Ferrell,
2007). Given that a phosphate group is a relatively large molecule, phosphorylation can
also have steric effects. Such properties can notably induce conformational changes of the
protein, modify its catalytic activity or block the access to its catalytic site, which result in
the activation or inhibition of the activity of the target protein by direct or allosteric effects
(Serber & Ferrell, 2007).
Several of the effects of protein phosphorylation can be mimicked by the negatively
charged amino acids aspartic acid (Asp) and glutamic acid (Glu). Indeed, the biochemical
properties of these amino acids are close to those of phosphorylated Ser or Thr residues
(Tarrant & Cole, 2009). In particular conditions, Asp and Glu are constitutive functional
equivalents of phosphosites in a phosphorylated state. This functional resemblance has
been exploited by biochemists by replacing Ser and Thr residues by Asp and Glu in
proteins of interest in order to mimic their phosphorylated status. This molecular mimicry
led them to call Asp and Glu phosphomimetic amino acids (Tarrant & Cole, 2009). This
trick appears to have been also used by nature to evolve new phosphosites. A striking
example comes from the evolution of the Activation Induced cytidine Deaminase (AID)
across vertebrates, an enzyme involved in the generation of antibody diversity. The
interaction of this enzyme with the Replication Protein A (RPA) promotes AID access to
transcribed double-stranded DNA during immunoglobulin class switch recombination. This
interaction requires a negative charge on AID, which is provided by an Asp in bony fish. In
these organisms, the enzyme is constitutively capable of interacting with RPA. In
amphibians and mammals, the function of the Asp residue is carried out by a
phosphorylatable Ser (pSer), which allows the regulation of the protein interaction by
protein kinases in a condition specific fashion (Basu et al, 2008). It was recently suggested
that this type of evolutionary transition might be common. Globally, it was shown that
pSer tend to evolve from or to phosphomimetic amino acids (Asp and Glu) when gained
and lost respectively throughout the evolution of eukaryotes (Kurmangaliyev et al, 2011;
Pearlman et al, 2011).
Protein phosphoregulation has been suggested to play a role in the evolutionary fate of
paralogous proteins. Most studies done so far focused on the paralogous genes of the
budding yeast Saccharomyces cerevisiae because its phosphoproteome has been intensely
studied (Albuquerque et al, 2008; Beltrao et al, 2009; Gnad et al, 2009). Using the yeast
paralogs that derive from the WGD event, Amoutzias et al. showed that the number of
phosphosites on a phosphoprotein is an important determinant for the retention of its
duplicated descendants (Amoutzias et al, 2010). In a following study, Freschi et al. studied
the gains and losses of phosphosites in paralogous phosphoproteins and found that the great
majority of them are present in one paralog and not in the other. This divergence was
shown to be principally driven by losses rather than gains of phosphosites on one paralog
(Freschi et al, 2011). Finally, Kaganovich and Snyder found that phosphosites tend to
diverge more asymmetrically than non-phosphorylated amino acids, playing thus an
important role in paralogous genes divergence and retention (Kaganovich & Snyder, 2012).
These observations raise the question of where phosphosites come from and where
they go after a gene duplication. According to the observations on phosphomimetic amino
acids described above, gains and losses of phosphosites could represent two distinct types
of divergence. On the one hand, the gain or the loss of phosphosites from or to a non-
phosphomimetic residue would represent a divergence in the function of the protein. On the
other hand, a gain or a loss could occur from or to phosphomimetic residues, leading to a
modification of the control of the charged residue by the cell rather than a modification of
function per se. Here we test whether this second scenario could have contributed to the
divergence of paralogous proteins using the yeast phosphoproteome as a model.
3.4 - Methods
Dataset
All analyses were performed using the dataset we compiled in a previous study (Freschi et
al, 2011) and that is available at http://www.bio.ulaval.ca/landrylab/download/. This dataset
contains 20,342 phosphosites on 2688 proteins from eight large-scale studies (Albuquerque
et al, 2008; Beltrao et al, 2009; Bodenmiller et al, 2007; Chi et al, 2007; Gnad et al, 2009;
Gruhler et al, 2005; Li et al, 2007; Reinders et al, 2007). It also provides the alignments of
all S. cerevisiae WGD paralogous genes with their ancestral sequence and with the
orthologs of L. kluyveri and Z. rouxii. The alignments were performed using MUSCLE
(Edgar, 2004) while the ancestral sequence was inferred using the Codeml method
implemented in PAML (Yang, 2007). We chose to analyze only two species that diverged
before the Whole Genome Duplication event for the following reasons. The majority of
phosphorylation sites are located in disordered regions (Landry et al, 2009) and these
regions are fast evolving. Alignment of sequences from distantly related species leads to
spurious alignments or to alignments that may contain several indels. Indels decrease the
number of phosphorylation sites available for the analysis, as ancestral sequences cannot be
computed at these positions. Further, in our previous study (Freschi et al, 2011), we
performed the analyses including an additional species that diverged prior to the whole-
genome duplication and we found that this did not significantly affect our results. Finally,
this dataset also provides information about the localization of each residue in ordered or
disordered regions of the protein, according to predictions made with DISOPRED (Ward et
al, 2004b).
Approaches to study gains and losses of phosphosites
We applied different approaches to study gains and losses coming from, or going to,
negatively charged amino acids. In the first approach, we used the ancestral sequence as a
reference to assess the presence of a gain or a loss at a specific position. For the gains, we
compared the proportion of phosphomimetic amino acids in the ancestral sequence (Asp or
Glu) going to pSer or pThr to the proportion of phosphomimetic amino acids going to cSer
and cThr. For the losses, we compared the proportion of phosphorylated residues (pSer and
pThr) going to Asp or Glu to the proportion of non-phosphorylated residues (cSer and
cThr) going to Asp or Glu. We required the ancestral sequence to have a
phosphorylatable residue and one of the two paralogs to be phosphorylated at the
homologous position. Comparisons of proportions were performed using Fisher’s exact
tests as implemented in R. In our second approach, we used a parsimony method to
calculate the same proportions. This time we used the sequences of L. kluyveri and Z. rouxii
as reference. In the case of a gain of phosphosites, we required the presence of the same
negatively charged residue (Asp or Glu) in the reference species as well as in one of the
two paralogs and a phosphorylatable residue (Ser or Thr) in the other paralog. In the case of
losses of phosphosites, we required the presence of the same phosphorylatable residue (Ser
or Thr) in the reference species as well as in one of the two paralogs and a negatively
charged residue (Asp or Glu) in the other paralog. All proportions were calculated by
dividing the number of sites coming from or going to an Asp or a Glu by the number of
sites that come from or go to any of the 17 non-phosphorylatable amino acids following the
same criteria (Figure 3.1).
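The comparison of proportions described above can be illustrated with a small, self-contained sketch. The analyses in this chapter used Fisher's exact test as implemented in R; the function below is an independent re-implementation of the two-sided test using only the Python standard library, written for illustration. The counts in the example table are hypothetical and do not come from the dataset.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    The p-value is the sum of the hypergeometric probabilities of all tables
    with the same margins that are at most as likely as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only.
# Rows: phosphorylated sites (pSer/pThr) and control sites (cSer/cThr).
# Columns: lost to Asp/Glu, lost to any other non-phosphorylatable residue.
p_to_de, p_to_other = 50, 250
c_to_de, c_to_other = 120, 880

print(p_to_de / (p_to_de + p_to_other))  # proportion for phosphosites
print(c_to_de / (c_to_de + c_to_other))  # proportion for control sites
print(fisher_exact_two_sided(p_to_de, p_to_other, c_to_de, c_to_other))
```

Each proportion has as denominator the number of sites going to any of the 17 non-phosphorylatable amino acids, so the second column of the table holds the sites that went to a residue other than Asp or Glu.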
3.5 – Results
The phosphoproteome of S. cerevisiae is the best described among eukaryotes and has been
mapped by mass spectrometry, leading to the identification of high-confidence
phosphosites (Albuquerque et al, 2008; Beltrao et al, 2009; Gnad et al, 2009). We
assembled a data set (Freschi et al, 2011) that consists of 2,726 phosphosites (Ser, 82%;
Thr, 16%; Tyr, 2%) that belong to one or the other member of the 352 pairs of yeast WGD
paralogs for which at least one of the two proteins is a phosphoprotein. We inferred the
ancestral sequence for each pair of paralogs using alignments with orthologous sequences
from Lachancea kluyveri and Zygosaccharomyces rouxii, two species that diverged from S.
cerevisiae before the WGD event. For each pair, we aligned all five sequences, we mapped
the phosphosites on the sequences of the paralogs and analysed phosphosites that diverged,
i.e. cases where a phosphorylatable residue was present in only one paralog.
Under a scenario where gains of phosphosites would result from selection for transitions
from phosphomimetic amino acids to phosphorylated residues, we would expect
phosphorylated Ser or Thr (pSer and pThr, respectively) to evolve more often from Asp or
Glu than non-phosphorylated ones (cSer and cThr, respectively). Similarly, under a
scenario where losses of phosphosites would result from transitions from phosphorylated
residues to phosphomimetic amino acids, we would expect pSer and pThr to evolve more
often to Asp and Glu than equivalent cSer and cThr. We tested these two hypotheses as
described in (Figure 3.1).
In the first case, we compared the proportion of pSer and pThr that were gained from Asp
and Glu with that of cSer and cThr, i.e. all serines and threonines from the same set of
proteins that were gained from Asp and Glu but that are not known to be phosphorylated. In
the second case, we compared the ratio of sites that were lost and replaced by
phosphomimetic residues in only one paralog with the ratios derived from cSer and cThr.
We performed the analysis using paralogous ancestral sequences inferred with a likelihood
method and also using a parsimonious approach, whereby the ancestral state of phosphosites
was inferred based on the conservation of the site in one of the two paralogs and its two
orthologs (Figure 3.1A). Global results are presented in Figure 3.2 and detailed analyses are
presented in Figure 3.3.
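The parsimony criterion can be sketched as a rule applied to one alignment column. The snippet below is a minimal illustration under the criteria stated in the Methods, namely that the two reference species and one paralog share the same residue while the other paralog differs; the function name and the returned labels are hypothetical. Whether the Ser or Thr involved is a phosphosite (pSer, pThr) or a control site (cSer, cThr) is left to the caller.

```python
PHOSPHORYLATABLE = {"S", "T"}
PHOSPHOMIMETIC = {"D", "E"}


def parsimony_event(ref1, ref2, paralog_a, paralog_b):
    """Classify a column from two pre-WGD orthologs and the two paralogs.

    Returns 'gain_from_mimetic', 'loss_to_mimetic', or None when the column
    does not meet the parsimony criteria.
    """
    if ref1 != ref2:
        return None  # reference species disagree, ancestral state not inferred
    for kept, changed in ((paralog_a, paralog_b), (paralog_b, paralog_a)):
        if kept != ref1:
            continue  # this paralog did not retain the reference residue
        if ref1 in PHOSPHOMIMETIC and changed in PHOSPHORYLATABLE:
            return "gain_from_mimetic"
        if ref1 in PHOSPHORYLATABLE and changed in PHOSPHOMIMETIC:
            return "loss_to_mimetic"
    return None


print(parsimony_event("D", "D", "D", "S"))  # Asp retained, Ser in other paralog
print(parsimony_event("S", "S", "E", "S"))  # Ser retained, Glu in other paralog
```

Requiring the same residue in both reference species and in one paralog makes the rule conservative: columns where the references disagree are simply left out of the counts.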
Figure 3.1. Algorithm used to calculate and compare the proportions of transitions between
phosphorylated and phosphomimetic residues relative to control sites.
(A) Phosphosite (pS, pT) gains from phosphomimetic amino acids were identified as cases where
only one of the paralogs has a phosphosite and the ancestral sequence has a phosphomimetic residue
at the same position. Control sites (cS, cT) were identified in the same way but considering Ser and
Thr that are not known to be phosphorylated. The ancestral sequence was inferred using likelihood
or parsimony approaches. Phosphosite losses to phosphomimetic amino acids were identified as
cases where one paralog has a phosphosite in a position that is occupied by a phosphomimetic
amino acid in the other paralog and a phosphorylatable amino acid at the same position in the
ancestral sequence. (B) The proportion of pS or pT that evolved from or to D or E was compared to
the proportion of cS or cT that evolved from or to D or E. X represents any amino acid with the
exception of Ser, Thr and Tyr.
Figure 3.2. Phosphosites that are differentially lost in paralogous phosphoproteins evolve
toward negatively charged residues.
Each bar represents the percentage of sites (pSer and pThr, cSer and cThr) that evolved from or to
Asp or Glu. Numbers above the bars represent the total number of pSer, cSer, pThr or cThr sites
that were gained or lost. Numbers above the arrows indicate p-values of the Fisher’s exact tests,
bold ones being below 0.05.
Figure 3.3. Detailed analysis of the patterns of evolution of pSer and pThr sites.
Each bar represents the percentage of sites (pSer, cSer, pThr or cThr) that evolved from or to Asp or
Glu. Numbers above the bars represent the total number of pSer, cSer, pThr or cThr sites that were
gained or lost. Numbers above the arrows indicate p-values of the Fisher’s exact tests, bold ones
being below 0.05. The top panel shows results obtained by ancestral sequence reconstruction using
a likelihood approach and the bottom panel using parsimony.
A global analysis of pSer, pThr, Asp and Glu shows that phosphosites tend to be lost to Asp
and Glu more frequently than cSer and cThr, and this holds true for both likelihood (16.6%
vs 12.1%, respectively, p = 0.002) and parsimony (17.1% vs 9.6%, respectively, p = 0.006)
reconstruction methods (Figure 3.2). However, although there is a tendency towards the
gains of phosphosites from Asp and Glu, the observed differences are not significant
(Figure 3.2). When studied separately, phosphosites in ordered and disordered regions
show the same global tendency to go toward phosphomimetic amino acids (Likelihood:
17.5% vs 10.0% in ordered regions, p = 0.058; 16.5% vs 13.7% in disordered regions, p =
0.086. Parsimony: 20.0% vs 8.1% in ordered regions, p = 0.076; 16.7% vs 11.7% in
disordered regions, p = 0.110). This suggests that the effect might be more important in
ordered regions of proteins, as would be expected if these residues were playing structural
roles. Further, we found that phosphosites are not preferentially gained from
phosphomimetic amino acids in disordered regions, while there is a non-significant
tendency for this type of transition in ordered regions (Likelihood: 16.0% vs 15.7% in
disordered regions, p = 0.943; 18.8% vs 13.7% in ordered regions, p = 0.294. Parsimony:
14.1% vs 14.2% in disordered regions, p = 1.000; 11.8% vs 10.2% in ordered regions, p =
0.691). Because the distinction between order and disorder reduces the number of sites in each
category and does not provide opposite results, we considered both regions simultaneously
in the following analyses.
We also examined which class of substitution could be contributing to this overall result
(Figure 3.3). We first found that pSer and pThr that were gained after gene duplication
follow trends that are in the expected direction although some of the comparisons are not
statistically significant and other results are in the opposite direction (Figure 3.3). However,
this detailed analysis showed that pSer are significantly more likely to evolve to Glu than
cSer (11.6% vs 5.3%, p = 0.008) and pThr evolve significantly more frequently to Asp than
cThr (9.8% vs 4.3% respectively, p = 0.013).
Protein phosphorylation is known to have a key role in regulating protein activities (Cohen,
2000). Evolutionary events such as gains and losses of phosphosites can lead to changes in
protein regulation, thus rewiring the protein regulatory network of the cell (Freschi et al,
2011). In the literature, there is evidence for gains of new phosphosites coming from
negatively charged residues among orthologs (Basu et al, 2008; Pearlman et al, 2011) as
well as cases of losses of phosphosites to these amino acids (Kurmangaliyev et al, 2011).
The biochemical properties of Glu and Asp mimic the ones of pSer and pThr with the
exception that their charge is not regulatable (Tarrant & Cole, 2009). These observations
led us to hypothesize that coding sequence divergence of paralogous genes by neo- and
subfunctionalization does not strictly involve the appearance or the partitioning of protein
function. Paralogous genes could also diverge in how these functions are regulated.
Divergence in the regulatory control is well known at the transcriptional level (Gu et al,
2005; Rodgers-Melnick et al, 2012), but has not been specifically addressed at the
posttranslational level. We tested this hypothesis on the complete set of WGD
phosphoproteins of the budding yeast S. cerevisiae.
Using two different methods to infer the ancestral state of phosphorylated and non-
phosphorylated Ser and Thr, we found that pSer and pThr globally have a tendency to
evolve from negatively charged amino acids in paralogous phosphoproteins compared to
their non-phosphorylated counterparts. The tendencies observed are in agreement with our
hypothesis and with the observations made by Pearlman et al. across eukaryotes (Pearlman
et al, 2011). However, the observed differences are not significant, which could be
explained by a few non-exclusive scenarios. First, we are looking at a narrow evolutionary
window (100 My), which contrasts with the analysis conducted by Pearlman et al., who
used aligned sequences from organisms spanning the entire tree of life (Pearlman et al,
2011). Further, the mechanism proposed may apply primarily to a few sites and in ordered
regions of proteins. Only a few phosphosites in these regions could be analysed here since
the majority of them are found in disordered regions [37], which reduces the statistical power
of our analysis. Our results regarding losses of phosphosites are in line with this hypothesis.
Finally, a significant fraction of phosphosites are thought to be non-functional (Landry et
al, 2009). Because these non-functional sites are not under selective pressure, they may
contribute to decrease the signal coming from functional sites. Nevertheless, from our
results, we cannot rule out the possibility that gains of phosphosites are not more likely to
derive from phosphomimetic residues after gene duplications. A larger sample size, the
study of a time window of a different length and a better knowledge of the functional
importance of phosphosites may be needed to provide a final answer.
Following the same approach, we examined whether phosphorylated residues, when lost,
are more likely to be replaced by Asp and Glu than when non-phosphorylated equivalent
residues are lost. We found that this is the case globally and also when considering
individual cases for both pSer and pThr; pSer are more likely to be replaced by Glu
residues while pThr by Asp residues. A similar trend was detectable for the transitions from
pThr to Glu. These results are in agreement with those from Kurmangaliyev et al.
(Kurmangaliyev et al, 2011) who also showed that pSer are more likely to evolve to
phosphomimetic amino acids than cSer in the divergence of orthologs between species. Our
results show that the evolutionary trajectories of pSer and pThr provide a mechanism for
paralogous protein divergence. Our analyses support the hypothesis that divergence
between paralogs can be generated by a loss of the posttranslational regulatory control on a
function rather than by the complete loss of the function itself. Indeed, the replacement of a
phosphosite by an Asp or a Glu residue may lock one paralog into a single constitutive
functional state whereas the other one remains regulatable by protein kinases and
phosphatases.
3.6 - Conclusion
Our results raise the question of how these transitions are made possible during evolution.
The genetic code is organized in such a way that transitions between phosphorylatable and
phosphomimetic amino acids involve a transition state with an amino acid that is not
negatively charged, except for transitions between two Asp and two Ser codons that
involve a Tyr residue (Figure 3.4).
Figure 3.4. Transitions between phosphorylatable and phosphomimetic amino acids need to
go through a non-negatively charged intermediate.
However, Tyr is only rarely phosphorylated in yeast and Tyr residues are not
phosphorylated by the serine/threonine kinases (Ubersax & Ferrell, 2007), which suggests
that this path would not be favoured. A non-negatively charged intermediate could lead to a
complete loss of the function that was performed by the negative charge and could thus be
deleterious (Figure 3.5A).
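The codon-level argument above can be checked by enumerating the standard genetic code. The following is a minimal sketch, not part of the original analysis: it assumes the standard nuclear code (DNA alphabet), considers only single-nucleotide substitutions, and the function names are illustrative. It lists every shortest (two-step) path between a Ser/Thr codon and an Asp/Glu codon together with the intermediate codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (last base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of nucleotide differences between two codons."""
    return sum(x != y for x, y in zip(a, b))


def two_step_paths(sources="ST", targets="DE"):
    """Return (source codon, intermediate codon, target codon) triples for all
    codon pairs that are exactly two substitutions apart."""
    paths = []
    for src, tgt in product(CODON_TABLE, repeat=2):
        if (CODON_TABLE[src] in sources and CODON_TABLE[tgt] in targets
                and hamming(src, tgt) == 2):
            for mid in CODON_TABLE:
                if hamming(src, mid) == 1 and hamming(mid, tgt) == 1:
                    paths.append((src, mid, tgt))
    return paths


if __name__ == "__main__":
    for src, mid, tgt in two_step_paths():
        print(src, CODON_TABLE[src], "->", mid, CODON_TABLE[mid], "->", tgt, CODON_TABLE[tgt])
```

Under these assumptions, no Ser/Thr codon is a single substitution away from an Asp/Glu codon, and the only paths with a Tyr intermediate are TCT to GAT via TAT and TCC to GAC via TAC, which corresponds to the two Ser and two Asp codons mentioned in the text.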
Figure 3.5. A duplication event could provide the conditions for the intermediate non-
functional site to be neutral, which would allow a transition without affecting the fitness of the
organism.
(A) Without a duplication event, the loss of a negative charge could have deleterious effects if the
charge is important for the function of the protein. (B) The redundant paralogous gene copy could
serve as a backup and prevent deleterious effects created by the loss of the charge. The backup copy
could then be retained or lost. In the latter case, the system would be different from its ancestor.
Here we propose that the relaxed constraints that follow a gene duplication event could
provide the means to reach this intermediate state and to go beyond it (Figure 3.5B). After
gene duplication, when one of the duplicated copies is lost, the system is assumed to go
back to its ancestral state, a process called non-functionalization (Lynch & Conery, 2000).
However, following our model, the duplicated copy could serve as a backup for a transition
period, which would allow the other copy to reach a state that would have been unreachable
otherwise (Gordon, 1994; Hansen et al, 2000; Scannell & Wolfe, 2008). After the loss of
the backup copy, the system would remain different from its ancestral state since the
phosphorylation profile and thus the phosphoregulation of this protein has changed. The
term non-functionalization may thus not be suitable for such cases. In the case of a WGD
event, where the vast majority of the duplicated genes are eventually lost and are thought to
return to their ancestral state, these 2-step transitions could potentially lead to a great
burst in the evolution of phosphoregulation. Further studies at different time points
following gene duplication would be needed to determine how important this mechanism
could be for the evolution of phosphosites.
3.7 - Acknowledgements
This work was supported by a Canadian Institutes of Health Research (CIHR) Grant GMX-191597
and a Natural Sciences and Engineering Research Council of Canada discovery grant
to C. R. Landry. C. R. Landry is a CIHR New Investigator. G. Diss and L. Freschi were
supported by fellowships from the Quebec Research Network on Protein Function,
Structure and Engineering (PROTEO). We thank the members of the Landry laboratory,
two anonymous referees and N. Aubin-Horth for comments on the manuscript.
Chapter 4 - Functional Divergence and Evolutionary Turnover in Mammalian Phosphoproteomes
Published on: Freschi L., Osseni M., Landry C.R (2014) Functional Divergence and
Evolutionary Turnover in Mammalian Phosphoproteomes, PLoS genetics 10 (1), e1004062
4.1 – Résumé
Here, we studied the evolution of phosphoregulation in mammals by comparing the human
and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated
in one species or the other are conserved at the residue level. Twenty percent of these
conserved sites are phosphorylated in both species. This proportion is 2.5 times greater
than expected by chance, which suggests that purifying selection tends to preserve
phosphoregulation. The other 80% of the sites that are conserved at the residue level are
differentially phosphorylated between human and mouse. Our results suggest that at least
5% of these sites may be true cases of divergence between the phosphorylation networks of
the two species, even though the residue is conserved in the orthologous proteins of both
species. We also showed that the evolutionary turnover of phosphorylation sites located at
adjacent positions in human or mouse leads to an overestimation of the divergence in
phosphoregulation between the two species. Our study provides advanced analyses of
phosphoproteomes and a framework for studying their contribution to phenotypic
evolution.
4.2 - Abstract
Here, we studied the evolution of mammalian phosphoregulation by comparing the human
and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated
in one species or the other are conserved at the residue level. Twenty percent of these
conserved sites are phosphorylated in both species. This proportion is 2.5 times more than
expected by chance alone, suggesting that purifying selection is preserving
phosphoregulation. The other 80% of the sites that are conserved at the residue level are
differentially phosphorylated between species. We showed that at least 5% of them are likely
to reflect true cases of phosphoregulatory divergence between mouse and human.
Moreover, we showed that evolutionary turnover of phosphosites at adjacent positions in
human or mouse leads to an overestimation of the divergence in phosphoregulation
between these two species. Our study provides a framework for studying the contribution of
phosphoregulatory divergence to phenotypic evolution.
4.3 – Introduction
Most proteins undergo chemical modifications after their synthesis (post-translational
modifications, PTMs). These modifications allow a fine-tuning of protein functions and
represent a mechanism to expand the coding capacity of genes (Nussinov et al, 2012). Over
the past decade, methods based on mass spectrometry have accelerated the discovery of
PTMs (Beausoleil et al, 2004; Choudhary et al, 2009; Huttlin et al, 2010; Kim et al, 2011;
Olsen et al, 2006; Zielinska et al, 2010). Each experiment can now detect thousands of
modified residues, making it possible to probe the functional state of entire proteomes. The PTM that
has been studied the most is protein phosphorylation: the addition of a phosphate group to
specific amino acids (serine (S), threonine (T) and tyrosine (Y) in eukaryotes).
Phosphorylation has been shown to affect protein functions, interactions, stability and
localization (Khmelinskii et al, 2009; Madeo et al, 1998; Sprang et al, 1988; Vazquez et al,
2000). It is thus of fundamental importance to understand how protein phosphorylation
evolves within and between species because changes in phosphorylation profiles may cause
changes in protein function and regulation and in organismal phenotypes, including disease
development (e.g. (Herbig et al, 2000)).
There have been several reports recently on the evolution of phosphoproteomes. For
instance, Kim and Hahn (Kim & Hahn, 2011) identified phosphorylation sites that emerged
after the split between humans and chimpanzees and found that these sites are located in
proteins involved in crucial biological processes such as cell division and chromatin
remodelling. Other studies have looked at the evolution of a subset of phosphoproteomes
on a broader evolutionary scale (Boulais et al, 2010; Malik et al, 2008). For example,
Boulais and collaborators (Boulais et al, 2010) performed a phosphoproteomics analysis of
mouse phagosomal proteins and then compared these proteins to their orthologs from 10
model organisms, from Drosophila to mouse (Boulais et al, 2010). They observed that the
phagosomal phosphoproteome was extensively rewired during evolution, but that some
phosphorylation sites have been maintained for more than a billion years, suggesting their
importance for phagosomal functions. Finally, other studies looked at the conservation and
divergence of entire phosphoproteomes over a broad evolutionary scale (Boekhorst et al,
2008; Gnad et al, 2007; Landry et al, 2009; Tan et al, 2009) (and reviewed in (Levy et al,
2012)) in order to understand the evolutionary mechanisms and the constraints acting on
phosphorylation sites. These studies found that phosphorylated residues tend to be on
average more conserved than their non-phosphorylated counterparts (Gnad et al, 2007;
Landry et al, 2009) and that this is particularly true for those that were experimentally
shown to play functional roles (Landry et al, 2009).
Most studies that aimed at studying the evolution of phosphoproteomes so far have looked
at the evolutionary conservation of phosphorylation sites in several species without
knowing if these sites are actually phosphorylated in species other than the reference. In
other words, if a phosphorylation site in one species corresponds to a phosphorylatable
amino acid in another species, both residues were considered as conserved phosphorylation
events. This assumption was necessary because of the lack of phosphorylation data
available for more than one species. However, we can hypothesize that residue
conservation does not always imply phosphoregulatory conservation. Indeed, sites could be
conserved at the residue level but differ in their phosphoregulation due to changes
elsewhere in the protein, for instance in the motifs recognized by kinases and
phosphatases (Ubersax & Ferrell, 2007), or upstream (in trans) in the signalling cascade.
This aspect has not been addressed by previous studies, except in a few cases (Beltrao et al,
2009; Boulais et al, 2010). However, identifying such sites is of great interest since sites
that differ in their phosphoregulation despite being conserved at the residue level could lead
to changes in the architecture of phosphorylation networks and, ultimately, contribute to
phenotypic evolution. We examine this issue here.
Another aspect of phosphoproteomes that can be studied using evolutionary analysis is how
phosphorylation sites alone or in combination may affect the function of a protein
(Nussinov et al, 2012). Many models of phosphorylation site function stress the importance
of conformational changes by protein phosphorylation (Barr & Bogoyevitch, 2001;
Nussinov et al, 2012; Skou, 1965). In other models, phosphorylation sites regulate protein
functions without the need for conformational changes but rather through changes in the
local charge of the protein (Serber & Ferrell Jr, 2007), i.e. simply through bulk
electrostatics. A corollary of this last model is that the protein phosphorylation code is
redundant, i.e. that phosphorylation sites can change their position over time and still
maintain their biological function as long as the number of sites in a given protein region is
preserved, without affecting organismal phenotypes. By looking at the patterns of evolution
of phosphorylation sites, one could find traces of this redundancy by studying rapid
phosphorylation site evolutionary turnover (phosphorylation site gains and losses). This
evolutionary turnover has been invoked for interpreting the global rapid pattern of
evolution in different species (Ba & Moses, 2010; Freschi et al, 2011; Gnad et al, 2007;
Landry et al, 2009; Macek et al, 2008; Malik et al, 2008). However, evidence for positional
redundancy of phosphorylation sites is relatively limited. Two independent pieces of
evidence come from the cell cycle phosphorylation networks. Moses and collaborators
(Moses et al, 2007) studied the evolution of cyclin-dependent kinase (CDK) consensus
phosphorylation sites of the yeast pre-replicative complex (Bell & Dutta, 2002). They
found that although orthologous proteins contained clusters of CDK consensus sites, the
position and the number of phosphorylatable sites were not conserved, suggesting that
phosphorylation sites tend to shift their positions during evolution. In a more recent
investigation, Holt and collaborators (Holt et al, 2009) compared the positions of 547
phosphorylation sites on 308 Cdk1 substrates in vivo in the budding yeast and their
orthologous sites in other fungi. They found that the precise positioning is conserved only
in the very closely related species. However, in both cases the phosphorylation status of the
sites in other species was not investigated so it is not clear whether the phosphorylation
sites were absent from the orthologous proteins or if they actually shifted during evolution
through gains or losses to another position. The extent to which phosphorylation site
positional redundancy plays a role in overall phosphoproteome turnover therefore awaits
comprehensive phosphorylation data from closely related species, which we have
assembled here.
We performed an integrated analysis of phosphorylation site evolution between the human
and mouse proteomes using a large dataset of phosphorylation sites (Beltrao et al, 2012;
Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava
Prasad et al, 2009; Minguez et al, 2012). These two phosphoproteomes are the ones for
which we have the greatest amount of phosphoproteomics data between closely related
species. We estimated the extent of divergence and conservation between the two
phosphoproteomes and we investigated whether phosphorylation site evolutionary turnover
could contribute to this divergence.
4.4 - Methods
Phosphoproteomics and sequence data
An extensive dataset of human and mouse phosphorylation sites was built by combining
data from 7 different databases and experimental studies (Beltrao et al, 2012; Dinkel et al,
2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava Prasad et al,
2009; Minguez et al, 2012). All protein sequences and orthology relationships were
retrieved from ENSEMBL (version 69). In this study, only protein sequences for which we
could find orthology relationships between a human protein and at least a mouse, dog and
opossum protein were considered. This step allowed us to study the evolutionary history of
phosphorylation sites. For human and mouse, orthology relationships were determined for
the longest isoforms of each protein. Each group of orthologous sequences was aligned
using MUSCLE (Edgar, 2004). Disordered and ordered regions of proteins were predicted
using DISOPRED (Ward et al, 2004a). In order to map phosphorylation sites to our
sequences, the following procedure was applied. The sites that were already mapped onto
proteins associated with ENSEMBL IDs in the original datasets were directly mapped to
our sequences. For all other cases, phosphopeptides were mapped onto proteins using
BLAT (Kent, 2002). All peptides that mapped to more than one protein were removed at
this step. Mapped phosphorylation sites and information about protein disorder are
available in Annex 2, Dataset S2.
Calculating random expectations for phosphorylation sites
In order to calculate the random expectation for the number of sites belonging to each one
of the different categories (StC, StD and SiD), the statuses (0: non-phosphorylated, 1:
phosphorylated) of phosphorylatable amino acids were shuffled within each protein while
preserving the overall proportion of sites for each residue (S, T or Y) and the localization in
disordered/ordered regions. The null distributions were estimated by iterating this
procedure 100 times, calculating each time the number of sites belonging to each category.
We calculated random expectations by shuffling the mouse sites only. We also performed
the calculations by independently shuffling both human and mouse sites and found similar
results.
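The shuffling procedure described above can be sketched as a stratified permutation. This is an illustrative reconstruction, not the code used in the study: the tuple layout `(residue, region, status)` and the function name are assumptions. Statuses are permuted only among sites of the same protein that share both residue type and structural region, so the per-stratum number of phosphorylated sites is preserved.

```python
import random
from collections import defaultdict


def shuffle_statuses(sites, rng):
    """Permute phosphorylation statuses within one protein.

    sites: list of (residue, region, status) tuples, where residue is 'S', 'T'
    or 'Y', region is e.g. 'ordered' or 'disordered', and status is 0 or 1.
    Statuses are only exchanged between sites with the same (residue, region).
    """
    strata = defaultdict(list)
    for idx, (residue, region, _status) in enumerate(sites):
        strata[(residue, region)].append(idx)

    shuffled = list(sites)
    for indices in strata.values():
        statuses = [sites[i][2] for i in indices]
        rng.shuffle(statuses)  # in-place permutation of this stratum's statuses
        for i, status in zip(indices, statuses):
            residue, region, _ = sites[i]
            shuffled[i] = (residue, region, status)
    return shuffled


if __name__ == "__main__":
    rng = random.Random(1)
    protein = [("S", "disordered", 1), ("S", "disordered", 0),
               ("T", "ordered", 0), ("T", "ordered", 1)]
    print(shuffle_statuses(protein, rng))
```

A null distribution would then be obtained by repeating the shuffle (100 times in the text) over all proteins of one species and recounting StC, StD and SiD sites after each iteration.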
Protein abundance data and classes of abundance
Data on protein abundance were taken from PaxDb (Wang et al, 2012) (H. sapiens whole
organism integrated dataset). In the analysis presented in Figure 4.1D, proteins were
ordered by their abundance and divided into four equal bins.
Housekeeping proteins, tissue specific proteins and sites with known function
Data on housekeeping genes were retrieved from Eisenberg and Levanon (Eisenberg &
Levanon, 2003) who identified 575 human genes that are expressed in 47 different tissues
and cell lines based on microarray data. Data on tissue-specific genes derive from an
independent dataset and were retrieved from the TiGER database (Liu et al, 2008). About
5.3 million human ESTs were mapped to UniGene clusters and the expression pattern of
all UniGenes in 30 human tissues was determined using the NCBI EST database. 7,261
tissue-specific genes were identified. Manually curated data on functional phosphorylation
sites (n = 156) were retrieved from Landry et al. (Landry et al, 2009). These sites were
derived from the manual curation of the primary literature.
NetPhorest and position weight matrices scores
NetPhorest (Miller et al, 2008) was downloaded from (http://netphorest.info) and was run
locally using default options. In order to calculate position weight matrices scores, 29
position weight matrices whose scores are based on the same metric were obtained from
Benjamin Turk (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al,
2009; Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010;
Pike et al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012). These
matrices were used to score all 10-mer amino acid sequences in the mouse and human proteomes that
have a phosphorylatable amino acid at the sixth position. The score reflects the probability
of each 10-mer to be phosphorylated by a specific kinase.
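The 10-mer scoring step can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify the scoring metric of the matrices, so the additive score below (one dictionary of per-residue values per window position) is hypothetical, as are the function names. Only the window geometry, with the phosphorylatable residue at the sixth position, is taken from the text.

```python
def phospho_10mers(sequence):
    """Yield (start, window) for every 10-mer whose sixth residue is S, T or Y.

    Windows that would run past either end of the sequence are skipped.
    """
    for i, residue in enumerate(sequence):
        start, end = i - 5, i + 5  # residue i sits at index 5 of the window
        if residue in "STY" and start >= 0 and end <= len(sequence):
            yield start, sequence[start:end]


def additive_score(window, matrix):
    """Hypothetical additive score: sum of per-position values for each residue.

    matrix: list of 10 dicts mapping amino acid letters to numeric values;
    residues missing from a column contribute 0.
    """
    return sum(column.get(aa, 0.0) for aa, column in zip(window, matrix))


if __name__ == "__main__":
    toy_matrix = [{} for _ in range(10)]
    toy_matrix[5] = {"S": 1.0, "T": 0.8}  # toy values, not from a real kinase matrix
    for start, window in phospho_10mers("MKRRASLDETPQ"):
        print(start, window, additive_score(window, toy_matrix))
```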
Comparison of proportions, distributions and correlations
Proportions were compared with 2-sample tests for equality of proportions with continuity
correction. Distributions were compared with non-parametric Wilcoxon Rank Sum tests.
Correlations were calculated with the Spearman method. All these statistical analyses were
performed as implemented in R.
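For readers without R at hand, the comparison of two proportions can be sketched in plain Python. This is an illustrative re-implementation, assuming that the two-sample test with continuity correction amounts to a Yates-corrected chi-square test on the 2×2 table with one degree of freedom; it is not guaranteed to reproduce R's output in every edge case, and the function name is hypothetical.

```python
import math


def two_proportion_test(x1, n1, x2, n2, correct=True):
    """Chi-square test for equality of two proportions (1 degree of freedom).

    x1, x2: number of successes; n1, n2: number of trials.
    Returns (statistic, p_value). Assumes the pooled proportion is strictly
    between 0 and 1, otherwise the expected counts would be zero.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Continuity correction, capped so it never exceeds the observed deviation.
    yates = min(0.5, abs(p1 - p2) / (1 / n1 + 1 / n2)) if correct else 0.0

    statistic = 0.0
    for x, n in ((x1, n1), (x2, n2)):
        for observed, expected in ((x, n * pooled), (n - x, n * (1 - pooled))):
            statistic += (abs(observed - expected) - yates) ** 2 / expected

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    print(two_proportion_test(50, 100, 30, 100))
```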
Algorithm to identify evolutionary clustered phosphorylation site pairs
Site colocalization in orthologous proteins was estimated using a window of positions
(centered on each human phosphorylation site). The fraction of colocalized sites over the
total number of sites was calculated for a range of window sizes. In order to determine
which sites were closer in sequence linear space than expected by chance alone, the mouse
phosphorylation sites were shuffled in each protein by preserving the overall proportion of
sites for each residue (S, T or Y) and disordered/ordered regions, and the fraction of
colocalized sites was calculated for each window length. One thousand iterations were
performed in order to generate the null model. Also, we masked all the positions in which a
phosphorylatable amino acid was present at a given position in both human and mouse.
Evolutionary clustered sites were defined as sites that were more likely to be colocalized
than expected by chance alone (null model). The closest pair of phosphorylation sites
present in these windows was then selected (see also Annex 2, Figure S4.1). The
phosphorylatable amino acids serine (S) and threonine (T) differ in biochemical properties
compared to tyrosine (Y), another phosphorylatable amino acid (Taylor et al, 1995).
Therefore, S/T and Y sites were considered as belonging to separate classes and not
considered to be able to compensate each other. Only 1,529 pairs of orthologous proteins
that had at least two phosphorylation sites that diverged (site-divergence) in human and
mouse respectively were considered. Among these pairs, 563 had at least one SiD site that
involves a phospho-serine or phospho-threonine in both humans and mouse. Only one
single pair had a SiD site that involves a phospho-tyrosine in both humans and mouse.
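The colocalization measure can be sketched for a single pair of orthologous proteins as follows. This is a simplified illustration rather than the original implementation: positions are assumed to be alignment coordinates, the window is assumed to extend `window // 2` positions on each side of the human site, and masking of positions that are phosphorylatable in both species is assumed to have been applied beforehand.

```python
from bisect import bisect_left


def colocalized_fraction(human_sites, mouse_sites, window):
    """Fraction of human sites with at least one mouse site inside a window
    centred on the human site (half-width = window // 2 alignment positions)."""
    if not human_sites:
        return 0.0
    half = window // 2
    mouse_sorted = sorted(mouse_sites)
    hits = 0
    for pos in human_sites:
        # First mouse site at or after the left edge of the window.
        i = bisect_left(mouse_sorted, pos - half)
        if i < len(mouse_sorted) and mouse_sorted[i] <= pos + half:
            hits += 1
    return hits / len(human_sites)


if __name__ == "__main__":
    human = [10, 50, 120]
    mouse = [12, 100, 121]
    for w in (3, 5, 11):
        print(w, colocalized_fraction(human, mouse, w))
```

Comparing this fraction, for each window size, with the values obtained after shuffling the mouse sites (1,000 iterations in the text) gives the window sizes at which sites are closer than expected by chance.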
Testing if evolutionary clustered sites tend to be phosphorylated by the same kinase or
group of kinases
The kinase that was the most likely to phosphorylate each one of the evolutionary clustered
sites was inferred using NetPhorest (Miller et al, 2008) and the proportion of evolutionary
clustered site pairs phosphorylated by the same kinase was determined. This number was
compared to a null distribution obtained by randomly shuffling (10,000 iterations) the
kinase-phosphorylation site associations between different evolutionary clustered sites.
Analogous analyses were performed for StC and StD sites. We then performed the same
analysis but this time using the three best kinases predicted by NetPhorest, as proposed by
Tan et al. (Tan et al, 2009). We therefore considered two evolutionary clustered sites as
being phosphorylated by the same group of kinases if they shared one or more kinases
(kinase group) among the three best kinases predicted to be associated to each site
according to NetPhorest. This number was compared to a null distribution obtained by
randomly shuffling (100 iterations) the kinases-phosphorylation site associations between
different evolutionary clustered sites. Analogous analyses were performed for StC and StD
sites. Finally, we performed again all the analyses described above but this time using
position weight matrices from the literature (see section NetPhorest and position weight
matrices scores for further details) instead of NetPhorest to infer the kinase that was the
most likely to phosphorylate each one of the StD, StC and evolutionary clustered sites.
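The permutation test on kinase sharing can be sketched as below. This is one possible reading of the shuffling described above, stated as an assumption: the predicted kinase sets of the second site of each pair are permuted across pairs, which breaks the pairing while keeping the overall kinase frequencies. Kinase predictions are supplied as plain tuples (one kinase for the best-hit analysis, three for the kinase-group analysis); no NetPhorest call is made, and all names are illustrative.

```python
import random


def shared_kinase_fraction(pairs):
    """Fraction of site pairs whose two predicted kinase sets overlap."""
    return sum(bool(set(a) & set(b)) for a, b in pairs) / len(pairs)


def permutation_p_value(pairs, iterations, rng):
    """Return (observed fraction, one-sided empirical p-value).

    The null is built by shuffling the second member of each pair across pairs.
    """
    observed = shared_kinase_fraction(pairs)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    at_least = 0
    for _ in range(iterations):
        rng.shuffle(right)
        if shared_kinase_fraction(list(zip(left, right))) >= observed:
            at_least += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (at_least + 1) / (iterations + 1)


if __name__ == "__main__":
    toy_pairs = [(("CDK1",), ("CDK1",)), (("PKA",), ("PKA",)),
                 (("CK2",), ("CK2",)), (("AKT",), ("PKA",))]
    print(permutation_p_value(toy_pairs, 1000, random.Random(0)))
```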
4.5 – Results
4.5.1 - Conservation and divergence between human and mouse phosphoproteomes
We assembled a dataset of human (n = 106,877) and mouse (n = 54,400) phosphorylation
sites by collecting data from 7 different databases and experimental studies (Beltrao et al,
2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010;
Keshava Prasad et al, 2009; Minguez et al, 2012) (Annex 2, Table S4.1). We successfully
mapped 128,705 sites onto 11,150 human and mouse orthologous proteins: 86,065 in
humans and 42,640 in mouse (Annex 2, Figure S4.2). As previously observed (Iakoucheva
et al, 2004; Landry et al, 2009), phosphorylation sites are preferentially located in
disordered regions of proteins (observed vs. expected proportions: 0.69 vs. 0.62, p-value =
2.2 × 10^-16). Given this asymmetry in the localization of phosphorylation sites, we
generated all the null models of our analyses by respecting the proportion of sites in these
two structural categories. Our dataset allows us to compare the human and mouse
phosphoproteomes using both sequence information and the phosphorylation status of each
site. Accordingly, we classified orthologous sites into three classes following Freschi et al.
(Freschi et al, 2011) (Figure 4.1A): i) Site-diverged (SiD): sites phosphorylated in one
species and non-phosphorylatable in the other; ii) State-conserved (StC): sites
phosphorylated in both species; iii) State-diverged (StD): sites that are conserved at the
residue level but that have been reported to be phosphorylated in only one of the two
species.
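The three classes can be expressed as a small decision rule applied to each aligned position. This is a minimal sketch of the definitions given above, with a hypothetical function name; treating alignment gaps as non-phosphorylatable is an assumption and not something stated in the text.

```python
PHOSPHORYLATABLE = set("STY")


def classify_site(res_a, phos_a, res_b, phos_b):
    """Classify one aligned position between two orthologous proteins.

    res_a, res_b: one-letter residues ('-' for a gap, assumed non-phosphorylatable).
    phos_a, phos_b: True if the residue is reported as phosphorylated.
    Returns 'StC', 'StD', 'SiD', or None if neither residue is phosphorylated.
    """
    if not (phos_a or phos_b):
        return None  # not a phosphorylation site in either species
    if phos_a and phos_b:
        return "StC"  # phosphorylated in both species
    if res_a in PHOSPHORYLATABLE and res_b in PHOSPHORYLATABLE:
        return "StD"  # phosphorylatable in both, phosphorylated in only one
    return "SiD"      # the orthologous residue cannot be phosphorylated


if __name__ == "__main__":
    print(classify_site("S", True, "A", False))  # SiD
    print(classify_site("S", True, "T", True))   # StC
    print(classify_site("T", False, "T", True))  # StD
```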
In order to examine the extent of conservation of phosphorylation between human and
mouse, we estimated the fraction of sites belonging to each of these three categories
compared to the total number of sites that are phosphorylated in human, mouse or in both
species. We first looked at phosphorylation site divergence. We found that 16,863 sites
(16% of the sites that are phosphorylated in human or mouse or both species) are SiD
(Figure 4.1B). These sites are about 1% less abundant than random expectations obtained
by shuffling the phosphorylation statuses of S/T/Y residues (Figure 4.1B), suggesting that
purifying selection is acting on phosphorylation sites to maintain their function but to a
limited extent, as previously observed with different approaches (e.g. (Landry et al, 2009)).
These sites, if functional, are expected to reflect differences in phosphoregulation between
human and mouse. However, a fraction of these SiD sites might be positionally redundant
site pairs such that the functional divergence may be overestimated (see below).
We examined other types of conservation and divergence. We first found that 20,146
phosphorylation sites (18% of the sites that are phosphorylated in human or mouse or both
species, Figure 4.1B) are StC. This proportion is 2.5 times greater than what is expected by
chance alone (Figure 4.1B). We observed this strong signal for conservation in both
disordered and ordered regions (Annex 2, Figure S4.3). These results suggest an overall
conservation of the phosphorylation profiles between the two species, most likely as a
result of purifying selection acting to maintain the phosphoregulation of these sites. We
performed a similar analysis on clusters of poly-S/T/Y (stretches of two or more
consecutive S/T/Y residues) rather than single residues and found the same patterns of
conservation and divergence (Annex 2, Figure S4.4).
Figure 4.1. Purifying selection is acting on mammalian phosphorylation sites and their
phosphorylation status.
(A) Site-diverged (SiD) sites are orthologous residues where one is phosphorylated and the other is
a non-phosphorylatable amino acid (any amino acid but S, T and Y). State-conserved (StC) sites are
orthologous phosphorylatable residues (S, T, Y) that are both reported to be phosphorylated. Finally,
state-diverged (StD) sites are orthologous phosphorylatable residues for which only one of the two
is phosphorylated. Circles with the P symbol indicate residue phosphorylation. Colors indicate the
different categories of sites. (B) Number of observed SiD, StC and StD sites and their respective
expected distributions as estimated by randomizing mouse phosphorylation sites. (C) Three
scenarios for StD sites: false positive and false negative identifications; rapidly evolving non-
functional phosphorylation sites; divergence in phosphoregulation. (D) Relationship between state-
conservation and protein abundance. The four classes of protein abundance have the same number
of proteins. (E) Comparison of the proportion of StC and StD sites in housekeeping and tissue-
specific proteins. (F) Comparison of the proportion of sites with known functions present in StC and
StD sites.
Despite an overall signal of conservation on the phosphorylation status of proteins, the
most represented category of sites in our dataset is StD sites (71,550 sites or 66% of the
sites that are phosphorylated in human, mouse or both species). Three different non-
exclusive scenarios could explain this large number of StD sites (Figure 4.1C). The first
one implies that state divergence results from an incomplete coverage of phosphoproteomic
data, which means that the phosphoproteomes of the two species might have been
undersampled, for instance sampled at different depths or in different conditions or tissues
(e.g. (Huttlin et al, 2010)). The second scenario is that a large fraction of the StD sites
identified might result from non-functional phosphorylation sites. Non-functional
phosphorylation sites evolve rapidly (Landry et al, 2009) and could therefore lead to the
poor conservation on the phosphorylation status we observed. The third scenario is that a
fraction of StD phosphorylation sites is actually diverging in its regulation. Finally, state-
divergence could also be inflated by false positive identifications in one species or the
other.
We examined which scenario or scenarios were compatible with our data. According to the
first scenario, StD may mostly result from false-negative phosphorylation sites in the data.
This is certainly the case for an important part of the data as our dataset contains twice as
much phosphorylation data for human as for mouse, and humans are not expected to have
more phosphorylation sites than mouse. We reasoned that if state-divergence is caused by
false-negatives in the datasets, we would expect to see the fraction of StC to increase as a
function of protein abundance, since highly abundant proteins are more likely to be
sampled in both species than rare proteins. Indeed, we found that the proportion of state
conserved sites almost doubles between the two extreme classes of abundance (Figure
4.1D, see also Figure 4.2A). Admittedly, this effect could also be caused by the fact that
phosphoregulation is more conserved on highly-expressed proteins but it is unlikely, as it
was recently shown that abundant proteins are enriched in non-functional phosphorylation
sites (Levy et al, 2012) that evolve relatively rapidly (Landry et al, 2009). In addition, only
conserved residues are considered in this analysis.
We also examined whether StC or StD phosphorylation sites were more likely to be found
in housekeeping or tissue-specific proteins. Housekeeping proteins are expressed in all
tissues, while tissue-specific ones are expressed in one or a few tissues. Accordingly, if StD
sites are affected by false negatives we would expect to find them preferentially in tissue-
specific proteins. We examined the dataset of housekeeping genes (Eisenberg & Levanon,
2003) and tissue-specific genes (Liu et al, 2008) and found that StC sites are preferentially
found in housekeeping proteins compared to StD sites (proportions: 0.027 vs. 0.019, p-
value = 0.005, Figure 4.1E), while the trend is reversed if we look at tissue-specific proteins
(proportions: 0.268 for StD vs. 0.219 for StC, p-value = 6.1 × 10^-5, Figure 4.1E).
This result is in agreement with our hypothesis that StD sites are affected by false
negatives, although this effect could be due to the fact that phosphoregulation is more
conserved on housekeeping proteins.
In order to examine whether non-functional phosphorylation sites could contribute to poor
state-conservation between species, we used a manually curated dataset of functional
phosphorylation sites compiled by Landry and collaborators (Landry et al, 2009).
Functional sites were identified as sites for which a phenotype was observed when
phosphorylatable residues were mutated. If non-functional sites contribute to state-
divergence, we would expect functional sites to be overrepresented in StC sites. We found
that StC sites are enriched in functional phosphorylation sites compared to StD sites
(proportions: 0.0025 vs. 0.00046, p-value < 1.19 × 10⁻¹⁴, Figure 4.1F). This observation
suggests that a fraction of the StD sites we identified might be non-functional
phosphorylation sites, which would explain their poor conservation status between species.
It is important to consider that in both cases these observations are not biased by residue
conservation as both StC and StD categories are composed of only phosphorylatable
residues.
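The enrichment comparisons reported in this section contrast two proportions (for instance, the proportion of functional sites among StC sites and among StD sites). The text does not state which statistical test was used, so the following is only a sketch of one common choice, a two-sided two-proportion z-test, written with the standard library. The function name and the counts are hypothetical and do not come from the dataset.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for the difference between proportions k1/n1 and k2/n2.

    Returns (z, p_value). Uses the pooled standard error under the null
    hypothesis that both groups share the same underlying proportion.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: 25 functional sites out of 10,000 in one class,
# 5 out of 10,000 in the other.
z, p_value = two_proportion_z_test(25, 10_000, 5, 10_000)
```

For very small counts an exact test (such as Fisher's exact test) would usually be preferred over this normal approximation.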
4.5.2 - A role for state-diverged sites in phosphoproteome divergence
Our observation that the majority of StD sites might result from false-negative
phosphorylation site identifications or might be non-functional does not rule out the
possibility that at least some of these sites could be actual StD sites that diverge in
regulation, for instance due to the sequences surrounding the phosphorylated residues.
Kinase recognition motifs on substrates are difficult to compare directly due to their
degeneracy (Ubersax & Ferrell, 2007). We therefore relied on kinase prediction tools for
our analyses. We assigned each site to a protein kinase using the NetPhorest classifier
(Miller et al, 2008) to associate protein kinases with all phosphorylation sites based on the
site flanking sequences. NetPhorest classification is based on an atlas of consensus
sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding
domains and was built using in vivo and in vitro experimental data (Miller et al, 2008). If a
site is phosphorylated in one species but not in the other, the sequences surrounding the
phosphorylatable residue should match a kinase consensus motif better for the
phosphorylated site than for the orthologous non-phosphorylated one. Given that
NetPhorest provides a score (from 0 to 1) for many possible kinase-substrate associations,
we selected the kinase having the best NetPhorest score and we used this score as a proxy
to assess the probability of a given site to be phosphorylated. We relaxed this assumption in
some of our analyses. In addition, we performed the same analyses directly using a
collection of position weight matrices derived from mammalian kinases and the results are
in agreement with what we find with the NetPhorest predictions (Figure S4.5).
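The scoring rule described above (keep the best-scoring kinase for each site and use its score as a proxy for the probability of phosphorylation) can be sketched as follows. The dictionaries stand in for per-site kinase scores and do not reproduce the actual NetPhorest output format; the function names and the scores are hypothetical.

```python
def best_kinase(scores):
    """Return (kinase, score) for the highest-scoring kinase of one site.

    `scores` maps kinase names to scores between 0 and 1 (hypothetical format).
    """
    kinase = max(scores, key=scores.get)
    return kinase, scores[kinase]

def motif_score_difference(phospho_scores, ortholog_scores):
    """Best score of the phosphorylated site minus best score of its ortholog."""
    _, s_phospho = best_kinase(phospho_scores)
    _, s_ortholog = best_kinase(ortholog_scores)
    return s_phospho - s_ortholog

# Invented scores for one StD site: phosphorylated in human, not in mouse.
human_site = {"CK2": 0.41, "PKA": 0.12}
mouse_site = {"CK2": 0.18, "DMPK": 0.22}
difference = motif_score_difference(human_site, mouse_site)
```

A positive difference means that the phosphorylated site matches a kinase motif better than its non-phosphorylated ortholog, which is the pattern expected if the flanking sequence has diverged.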
We first examined whether there was an association between S/T/Y phosphorylation and
NetPhorest scores and found that the probability for a site to be phosphorylated strongly
increases with increasing NetPhorest scores in both mouse and human data (Figure S4.6).
Another result in support of this observation is that the fraction of state conserved sites
increases as a function of NetPhorest scores (Figure 4.2A) and this relationship is
independent from protein abundance. We also found that prediction scores are very similar
for StC sites (median scores: 0.32 for the human phosphorylation sites vs. 0.32 in mouse
ones, p-value = 0.54) and higher than those of sites conserved at the residue level but non-
phosphorylated in both species (median scores: 0.32 for StC vs. 0.20 for non-
phosphorylated residues, p-value = 2.2 × 10⁻¹⁶; Figure 4.2B and Figure S4.7A-B). This
again confirms a strong association between NetPhorest scores and the probability that a
site is phosphorylated. Surprisingly, we found that scores of StC sites were also higher than
the scores of the phosphorylated residues in the StD class (median scores: 0.32 vs. 0.22 for
humans, p-value = 2.2 × 10⁻¹⁶; 0.32 vs. 0.26 for mouse, p-value = 2.2 × 10⁻¹⁶; Figure 4.2B-
C and Figure S4.7A-B). This means that sites that are conserved and phosphorylated in
both species have a significantly better match to consensus kinase motifs than those that are
conserved at the residue level but phosphorylated in one species only.
Figure 4.2. Analysis of NetPhorest scores for the different classes of sites.
(A) Fraction of StC sites as a function of NetPhorest scores and protein abundance. (B) Comparison
of NetPhorest scores for human and mouse phosphorylated and non-phosphorylated residues
(Wilcoxon tests). (C) Comparison of NetPhorest scores for StD sites (Wilcoxon tests). (D)
Correlation between human and mouse NetPhorest scores for StC sites (red) and StD sites
phosphorylated in human but not in mouse (black). (E) Correlation between human and mouse
NetPhorest scores for StC sites (red) and StD sites phosphorylated in mouse but not in human
(black). (F) Proportion of phosphorylated sites that have higher NetPhorest scores compared to their
corresponding site in the other species for StC and StD sites. Comparisons of human and mouse
scores calculated with position weight matrices are shown in Figure S4.5. *: p-value < 0.05; **: p-
value < 0.01; ***: p-value < 0.001.
There are several possible explanations for these differences. First, this result could derive
from how predictive tools have been developed. For instance, phosphorylation sites may be
more often studied on abundant proteins, which would imply that kinase prediction tools
are better trained at recognizing phosphorylation sites present on abundant proteins. We
tested this hypothesis and found that there is no increase in the average NetPhorest scores
as a function of protein abundance (Figure S4.8), showing that the NetPhorest classification
is not biased towards sites present in highly abundant proteins. Another possibility is that
StD sites contain a significantly higher proportion of false-positive phosphorylation sites
compared to StC sites, as the latter have been found to be phosphorylated in the two species
in completely independent experiments and thus have much stronger experimental support.
Indeed, false-positive sites would have low NetPhorest scores, similar to non-
phosphorylated ones, and would therefore contribute to lowering the average NetPhorest score
for the residues that are phosphorylated in StD sites compared to StC sites. A third
possibility is that StD sites could contain a proportion of non-functional phosphorylation
sites with non-consensus motifs as shown before by Landry and collaborators (Landry et al,
2009) who found that phosphorylation sites matching kinase motifs have a higher degree of
evolutionary conservation and are thus more likely to be functional. Altogether, these
results suggest that the match to a consensus sequence motif could be used to
prioritize phosphorylation sites for downstream functional analysis in
phosphoproteomics experiments.
Despite these potentially confounding factors, we found evidence that StD is at least partly
caused by divergence in regulatory motifs. We found that scores of phosphorylated StD
sites are significantly higher than those of their non-phosphorylated orthologous
counterparts in both pairwise comparisons (phosphorylated in human vs. non-
phosphorylated in mouse, median scores: 0.216 vs. 0.214, p-value = 3.93 × 10⁻⁵;
phosphorylated in mouse vs. non-phosphorylated in humans, median scores: 0.255 vs.
0.245, p-value = 6.38 × 10⁻⁵; Figure 4.2C). The fact that we see the effects in both
directions rules out the possibility that NetPhorest scores are systematically higher in
humans. In order to identify among the set of StD sites the ones that have the potential to be
true StD sites, we directly compared matching orthologous NetPhorest scores of StC and
StD sites. We found a strong correlation between the NetPhorest scores for StC sites (rho =
0.95, p-value < 2.2 × 10⁻¹⁶) and a weaker correlation between the scores of the StD sites,
both for those phosphorylated in humans but not in mouse (rho = 0.89, p-value <
2.2 × 10⁻¹⁶, Figure 4.2D) and for those phosphorylated in mouse but not in humans (rho =
0.88, p-value < 2.2 × 10⁻¹⁶, Figure 4.2E). This result is confirmed when comparing the
proportion of StD sites having higher scores in humans than in mouse to the same
proportion calculated for StC. We found a slight but significant excess of StD sites having
higher scores in human than in mouse compared to StC sites (proportions: 0.284 vs. 0.258,
p-value = 8.69 × 10⁻¹³, Figure 4.2F). We found similar results for the StD sites having
higher scores in mouse compared to humans (proportions: 0.291 vs. 0.261, p-value = 8.69 ×
10⁻¹¹, Figure 4.2F). By summing up all these excess StD sites that show high NetPhorest
scores in one organism but low scores in the other, we concluded that at least 5% of the
StD sites (either phosphorylated in human or mouse) present in our dataset have the
potential to be sites that are differentially regulated between species, despite a conservation
of the actual phosphorylatable residues. Our results do not depend on the NetPhorest
algorithm as we performed the same analyses using position weight matrices available from
the literature (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al,
2009; Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010;
Pike et al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012) and all of
our conclusions about StC and StD sites were mirrored in these tests, as shown in Figure
S4.5. Overall, our results show that in addition to the actual divergence in phosphorylated
sites (SiD), a significant fraction of the mouse and human phosphoproteomes have diverged
through changes in the kinase recognition motifs. These changes in the phosphoregulatory
status of proteins represent changes in the protein regulatory network, as illustrated for a
particular subnetwork in Figure 4.3.
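The text does not spell out the arithmetic behind the lower bound of 5% given above. One reading that is consistent with it, offered here only as an assumption, is to take the excess of StD over StC proportions in each direction (Figure 4.2F) and add the two excesses. The variable names below are illustrative.

```python
# Proportions reported in the text for Figure 4.2F.
std_human_higher, stc_human_higher = 0.284, 0.258  # higher score in human
std_mouse_higher, stc_mouse_higher = 0.291, 0.261  # higher score in mouse

# Excess of StD over StC in each direction.
excess_human = std_human_higher - stc_human_higher
excess_mouse = std_mouse_higher - stc_mouse_higher

# Assumed combination: the two excesses are simply added.
total_excess = excess_human + excess_mouse
```

Under this reading the two excesses are about 0.026 and 0.030, and their sum is about 0.056, which agrees with the statement that at least 5% of StD sites may be differentially regulated.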
Figure 4.3. Comparison of a pair of StC and StD sites.
(A) Example of StC site (human protein: NUCL; site S28). Both sites are predicted to be
phosphorylated by the same kinase (CK2) by NetPhorest. The human and mouse kinase-
phosphorylation networks are shown for the 10 StC sites with the highest NetPhorest scores (Table
S2). The width of the edges is proportional to the NetPhorest score. (B) Example of StD site
(human protein: NIN; site S1145). The two phosphorylation sites are predicted to be phosphorylated
by different kinases (human: CK2, mouse DMPK) by NetPhorest. The human and mouse kinase-
phosphorylation networks are shown for the 10 StD sites with the highest difference in NetPhorest
scores (Annex 2, Table S4.2). Dotted lines represent predicted kinase-phosphorylation site
associations that have been rewired in mouse considering the human network as reference.
Potential StD sites are located in proteins that have fundamental cellular functions, making
them good candidates for the investigation of species-specific mechanisms of regulation.
Further examples are available in Annex 2, Table S4.2.
4.5.3 - Evolutionary turnover of mammalian phosphorylation sites
We next examined whether the positional turnover of phosphorylation sites could
contribute to SiD between mouse and humans. One prediction of this model is that sites
that are lost in one lineage could be compensated for by the gain of other sites in the
proximity (Freschi et al, 2011). Similarly, sites could change their positions as a result of
insertions and deletions in the surrounding regions. In order to test this prediction, we
developed an algorithm to identify evolutionary clustered sites (Freschi et al, 2011), i.e.
pairs of sites that are SiD between mouse and humans and that are closer to each other in
the linear protein space than expected by chance alone (Annex 2, Figure S4.1).
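The details of the clustering algorithm are given in Annex 2 and in Freschi et al (2011) and are not reproduced here. The general idea of testing whether two SiD sites are closer than expected by chance can nevertheless be sketched with a simple randomization test. Everything below is an assumption made for illustration: the null model (each site placed at random among the phosphorylatable positions available in its species), the function name and the toy positions.

```python
import random

def clustering_p_value(pos_a, pos_b, candidates_a, candidates_b,
                       n_perm=2000, seed=0):
    """Empirical p-value that two sites lie at least as close as observed.

    pos_a, pos_b: alignment positions of the two species-specific sites.
    candidates_a, candidates_b: alignment positions where a site could have
    been located in each species (assumed null model, see lead-in).
    """
    rng = random.Random(seed)
    observed = abs(pos_a - pos_b)
    hits = 0
    for _ in range(n_perm):
        distance = abs(rng.choice(candidates_a) - rng.choice(candidates_b))
        if distance <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy protein of about 1,000 alignment columns with invented candidate positions.
candidates_human = list(range(5, 1000, 7))
candidates_mouse = list(range(3, 1000, 11))
p_value = clustering_p_value(705, 707, candidates_human, candidates_mouse)
```

As noted in the text, a test of this kind has little power when a protein offers few possible configurations, so short proteins or proteins with few candidate positions will rarely reach significance.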
We found that 123 site pairs belonging to 68 proteins show significant evolutionary
clustering of SiD phosphorylation sites (Annex 2, Table S4.3; alignments are available in
Annex 2, Dataset S1). Ninety percent of the proteins that contain evolutionary clustered site
pairs have only one or two of them (Annex 2, Figure S4.9) with few exceptions (Annex 2,
Table S4.4). This number also excludes proteins for which we found a high number of
evolutionary clustered site pairs due to large clusters of sites that we did not consider
(NOL8, 10; KI67, 27; MDC1, 180 site pairs). The median NetPhorest score for these sites
is 0.29, suggesting that they are likely to be phosphorylated and not false-positives (0.20 is
the median score for non-phosphorylated residues while 0.32 is the median score for
phosphorylated residues). The typical window within which we found significant clustering
between SiD sites is 10 amino acids (Annex 2, Figure S4.10) and approximately 80% of the
sites are less than 40 amino acids distant in the alignment. The observed number of site
pairs (n = 123) is likely an underestimate of the contribution of evolutionary site turnover
because we need many possible configurations in the neutral model to identify them and
phosphoproteomes have likely been undersampled. We found that the proportion of
proteins that show significant evolutionary clustering increases with the proportion of
available sites (Annex 2, Figure S4.11). Furthermore, we found that the number of
evolutionary clustered sites is correlated with protein size (rho = 0.26, p-value = 0.03) and
may thus be biased towards large proteins.
If these clustered SiD sites were functionally equivalent at the network level between the
two species, we would expect them to be phosphorylated by the same kinases or group of
kinases. We again used NetPhorest to test this hypothesis. We determined the proportion of
StC, StD and evolutionary clustered sites that were likely to be phosphorylated by the same
kinases or group of kinases (overlap of one or more kinases among the three best kinases
predicted by NetPhorest) (Tan et al, 2009) and we compared these observations to the
random expectations obtained by shuffling the mouse kinase-substrate associations. We
found that the proportion of StC and StD sites predicted to be phosphorylated by the same
kinases or group of kinases was more than 7 times greater than expected by chance alone,
suggesting that, globally, these sites tend to be phosphorylated by the same kinases or
group of kinases (Figure 4.4A-B).
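The comparison against shuffled mouse kinase-substrate associations can be sketched as a permutation test. This is a minimal illustration under stated assumptions: each site is represented by the set of its best predicted kinases (one kinase, or the top three), orthologous sites are paired by index, and the function names and toy kinase sets are hypothetical.

```python
import random

def shared_kinase_fraction(human_sets, mouse_sets):
    """Fraction of orthologous site pairs that share at least one predicted kinase."""
    shared = sum(1 for h, m in zip(human_sets, mouse_sets) if h & m)
    return shared / len(human_sets)

def shuffle_test(human_sets, mouse_sets, n_perm=1000, seed=0):
    """Compare the observed shared fraction with shuffled mouse assignments.

    Returns (observed, mean_expected, p_value).
    """
    rng = random.Random(seed)
    observed = shared_kinase_fraction(human_sets, mouse_sets)
    shuffled = list(mouse_sets)
    total, at_least_as_high = 0.0, 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between orthologous sites
        fraction = shared_kinase_fraction(human_sets, shuffled)
        total += fraction
        if fraction >= observed:
            at_least_as_high += 1
    return observed, total / n_perm, (at_least_as_high + 1) / (n_perm + 1)

# Invented predictions for six orthologous site pairs.
human = [{"CK2"}, {"PKA"}, {"CDK1"}, {"ATM"}, {"GSK3"}, {"PLK1"}]
mouse = [{"CK2"}, {"PKA"}, {"CDK1"}, {"ATM"}, {"GSK3"}, {"PLK1"}]
observed, expected, p_value = shuffle_test(human, mouse)
```

The ratio of the observed fraction to the mean shuffled fraction corresponds to the fold enrichment quoted in the text (for example, "more than 7 times greater than expected by chance alone").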
Figure 4.4. Proportion of sites that are phosphorylated by the same protein kinase.
(A) Proportion of sites phosphorylated by the same kinases (NetPhorest predictions) for the
different categories of sites (StD: state diverged, StC: state conserved, ECS: evolutionary clustered
sites). Black dots represent the observed proportion. Orange lines represent the range of proportions
expected by chance alone. P-values for StC and StD: < 0.0001; p-value for ECS: 0.03. The
histogram shows the distribution of expected proportions for ECS. A similar analysis was performed
using position weight matrices (Figure S4.5). (B) Proportion of sites phosphorylated by one or more
shared kinases (kinase group) among the three best kinases predicted to be associated with each site
according to NetPhorest. P-values for StC, StD and ECS: < 0.01.
We found a marginally significant tendency (p-value = 0.03) for the evolutionary clustered
sites to be phosphorylated by the same kinase (Figure 4.4A). We then performed the same
analysis, but considering the three best kinases found by NetPhorest assuming that
phosphorylation sites could be functionally conserved if they are phosphorylated by closely
related kinases as well, as in Tan et al. (Tan et al, 2009). We found that evolutionary
clustered sites were 1.4 times more likely to be phosphorylated by the same group of
kinases than expected by chance alone (p-value < 0.01; Figure 4.4B). This result suggests
that, in general, many evolutionary clustered sites may actually be functionally equivalent.
Finally, we performed this analysis using position weight matrices available from the
literature (Bullock et al, 2009; Bullock et al, 2005; Bunkoczi et al, 2007; Davis et al, 2009;
Filippakopoulos et al, 2008; Gwinn et al, 2008; Hutti et al, 2004; Kikani et al, 2010; Pike et
al, 2008; Rennefahrt et al, 2007; Sheridan et al, 2008; Wong et al, 2012) and found
qualitatively similar results (Annex 2, Figure S4.5F).
Evolutionary clustered sites could arise through losses and gains of phosphorylation sites in
the two lineages. Our algorithm identifies evolutionary clustered sites, but it cannot tell
whether these represent gains of phosphorylation sites that compensated for deleterious
losses in the same lineage or whether they were simply the result of indels that affected the
position of the sites in the human and mouse protein alignments. We therefore aligned the
mouse and human proteins with several orthologs belonging to species that diverged after
the human-mouse divergence (Figure 4.5A) and manually curated the data in order to
identify the possible evolutionary steps that led to these configurations of phosphorylation
sites.
Figure 4.5. Evolutionary histories of candidate functionally redundant site pairs.
(A) Phylogeny of the species considered for the analysis of evolutionary clustered sites. For all
species we show the species name, the three-letter identifier and the common name. (B) Alignment
of the Fanconi anemia group M protein (FANCM). Evolutionary clustered sites are indicated in
bold. Residues that have been reported to be phosphorylated are on a green background. (C)
Alignment of the disabled homolog 2 protein (DAB2). (D) Alignment of the low-density lipoprotein
receptor-related protein (LRP2).
We manually identified many cases (n = 17, 14%) of evolutionary clustered sites that were
most likely caused by indels changing protein length and thus alignment. An example is in
the Fanconi anemia group M protein, an ATPase implicated in DNA repair (Meetei et al,
2005) in which S1673 and S1674 are shifted towards the C-terminus in the mouse lineage
(Figure 4.5B). The remaining 86% (n = 106) of the cases of evolutionarily clustered sites
could not be simply explained by indels and may thus represent compensatory evolutionary
events. We observed such a case in the protein DAB2 (human site: S723; mouse site:
S731), which plays a potential role in ovarian carcinogenesis (Fazili et al, 1999) (Figure
4.5C). The human S723 has been gained after the split of the Haplorrhini from the other
primates, while the second one (S731) has been lost after the split between the rodents and
the primates. Another example involves the human T4634 and the mouse site S4632 on
LRP2 (Figure 4.5D). This protein is a membrane receptor of absorptive epithelial cells.
Mutations in this protein are associated with Donnai-Barrow syndrome, a genetic syndrome
that leads to defects in vision, hearing, craniofacial features and structural abnormalities in
brain (Kantarci et al, 2007). In this case the human T4634 site appeared in primates after
the split from rodents, while the mouse S4632 site was lost after the split of the
Strepsirrhini from the other primates. The biological function of these phosphorylation sites
has not been determined but they represent prime candidates for exploring, at the molecular
level, the positional redundancy of phosphorylation sites.
4.6 – Conclusion
Here we compared the human and mouse phosphoproteomes in order to gain a detailed
picture of phosphoregulatory conservation and divergence between these two species. We
found that, globally, phosphorylation sites tend to be conserved between human and mouse.
By using phosphorylation data from both species, we showed that the number of the sites
that are phosphorylated in both human and mouse is 2.5 times higher than expected by
chance alone. In addition, we estimated phosphorylation status divergence. We found that
the majority of phosphorylation sites that are conserved at the residue level between human
and mouse are actually divergent with respect to their phosphorylation status (StD sites).
While this is most likely largely due to incomplete coverage between the two species, we
showed that at least 5% of the StD sites are actually diverging at the kinase-substrate
interaction level. We also found that phosphorylation sites that are phosphorylated in both
species are more likely to be functional and have higher kinase assignment scores,
suggesting that this conservation criterion could be used to prioritize phosphorylation sites
for further characterization (Beltrao et al, 2012; Landry et al, 2009). Taken together, these
results suggest that more data is needed in these two species to be able to completely assess
the conservation and divergence of their phosphoproteomes. Furthermore, the candidate
StD sites might have specific regulatory properties that still have to be characterized and
understood. A better understanding of these properties will allow us to make an important
step forward in our attempt to describe and explain how small regulatory differences map
to the important phenotypic differences among species. Mouse is the best model system to
study human biology and diseases. It is therefore important to understand how these two
species diverge and phosphoregulatory evolution may play an important role.
We identified sites that are phosphorylated in one species but that have diverged in the
other so that the site is not phosphorylatable (SiD sites). While the biological meaning of
the majority of these sites still remains to be assessed, our analysis suggests that many of
them could be functionally redundant. This result supports the finding by Moses and
collaborators that phosphorylation site evolutionary turnover has a role in shaping
phosphoregulation (Moses et al, 2007). If the redundancy hypothesis holds true, we might
need to revisit estimations of phosphorylation conservation, since omitting positional
redundancy may lead to an underestimation of phosphorylation site functional
conservation. Moreover, this implies that we should consider different categories of
phosphorylation sites: the ones for which the position along the protein is a determinant for
their function (positionally-dependent phosphorylation sites) and those for which the global
charge rather than the exact position is responsible for their function (positionally-flexible
phosphorylation sites).
4.7 – Acknowledgements
We thank A. Moses, A. Nguyen Ba and all members of the Landry laboratory for their
comments on the manuscript. We also thank B. Turk (Yale University) for providing the
position weight matrices used in this study. This work was supported by Canadian
Institutes of Health Research (CIHR) (GMX-191597). C. R. Landry is a CIHR New
Investigator. L. Freschi was supported by a fellowship from the Fonds de Recherche du
Québec - Nature et Technologies (FRQ-NT) and L. Freschi and M. Osseni by the Quebec
Research Network on Protein Function, Structure and Engineering (PROTEO).
Chapter 5 – Cross-talk between O-GlcNAcylation and phosphorylation in mammalian proteomes
5.1 – Résumé
Post-translational modifications are molecular switches that allow the cell to exert fine
control over the function of its proteins. In some cases a residue can undergo several of
these modifications, which can activate or deactivate the same protein function or
different functions. This is the case, for example, for phosphorylation and glycosylation,
which affect the serines and threonines of proteins. Here, we investigated whether these
two modifications could act as switches for the same biological function or for different
functions. We found that residues that can reach three states (unmodified, phosphorylated,
O-GlcNAcylated) are more conserved than those that can reach only two states (unmodified,
phosphorylated or unmodified, O-GlcNAcylated). In addition, we found that residues that
can reach three states tend to be phosphorylated by different kinases compared with
residues that can reach only two states. Our results support the hypothesis that
phosphorylation and O-GlcNAcylation control two different functions rather than the same
function.
5.2 - Abstract
Post-translational modifications (PTMs) are molecular switches that allow the cell to finely
tune protein functions. In some cases a residue can be modified by multiple alternative
PTMs that can activate/deactivate the same protein function or different functions. This is
the case for serine and threonine residues, which can be phosphorylated and O-GlcNAcylated.
Here, we investigate whether these two PTMs may act as switches for the same biological
function or different functions. We found that there is a greater evolutionary constraint for
the residues that can shuttle between 3 states (non-modified, phosphorylated, O-
GlcNAcylated) compared to the ones that can shuttle between 2 states only (non-modified,
phosphorylated or non-modified, O-GlcNAcylated). Moreover, we found that 3-state and 2-
state residues are likely to be regulated by different sets of kinases. Our results support the
hypothesis that at least in humans, phosphorylation and O-GlcNAcylation control multiple
functions rather than the same one.
5.3 - Introduction
Post-translational modifications (PTMs) are chemical modifications of proteins that allow
the modulation of protein functions and represent a means to extend the coding capacity of
genes (Prabakaran et al, 2012). PTMs modulate protein activity, localization, degradation
and interactions (Khmelinskii et al, 2009; Madeo et al, 1998; Sprang et al, 1988; Vazquez
et al, 2000). Proteins can undergo several PTMs, and advances in mass
spectrometry over the last decade make it possible to screen entire proteomes to
identify and quantify these PTMs (Olsen & Mann, 2013). Examples of PTMs
include protein phosphorylation, the addition of a phosphate group to serines, threonines
and tyrosines and O-GlcNAcylation, the addition of an O-linked β-N-acetylglucosamine
moiety to serines and threonines (Zeidan & Hart, 2010). Given the large number of
modifications any protein can bear, one major question that emerged recently is whether
these PTMs affect each other’s function, i.e. whether they cross-talk with each other (Beltrao
et al, 2013; Brooks & Gu, 2003; Hunter, 2007; Latham & Dent, 2007). This interaction
would in principle define a PTM “code” that would allow the cell to implement complex
regulatory programs at the level of single proteins. Indeed, each PTM allows the protein to
assume a new configuration or state that often determines changes in protein function
(Deribe et al, 2010).
Two general modes of cross-talk have been reported in the literature: positive and negative
cross-talk (Hunter, 2007). Positive cross-talk refers to a scenario in which one PTM
promotes the direct or indirect addition or removal of a second modification. An example
of this mode of cross-talk is the phosphorylation-dependent ubiquitination of the Sic1p
protein in yeast (Nash et al, 2001; Verma et al, 1997). Sic1p is an inhibitor of the Cyclin-
Dependent Kinases (CDKs), important regulators of cell cycle progression. This inhibition
has to be released in order to start DNA replication. The phosphorylation of Sic1p by
CDKs at multiple sites allows the ubiquitination of Sic1p by Cdc4. This event triggers
Sic1p degradation, thus allowing the cell to progress through the cell cycle. Another
notable example of positive cross-talk is the interplay between lysine residues on the
human histones whereby the methylation of Lys-27 of histone H3 increases the probability
of the methylation of Lys-36 (Schwammle et al, 2014).
The opposite mode of action, negative cross-talk, implies that one PTM prevents
another modification from occurring. Examples of negative cross-talk have been reported and
include again the methylated lysines of histone H3. For instance, the tri-methylation of Lys-
4 inhibits the methylation of Lys-9 (Schwammle et al, 2014). The importance of these
cross-talks is illustrated by their use by microorganisms to take control of the cell or
shut down the immune response. Indeed, an example of this scenario is represented by
the human protein MAPKK6. The phosphorylation of this protein on critical serine and
threonine residues is required to activate the downstream MAPK kinases in the innate
immune response to pathogens. Mukherjee and collaborators (Mukherjee et al, 2006) found
that Yersinia species use the effector protein YopJ to acetylate these critical residues on
MAPKK6. This competition between phosphorylation and acetylation for the same sites
prevents the activation of MAPKK6, allowing Yersinia to usurp the eukaryotic cellular
signalling and block a pathway that is crucial for the innate immune response activation.
Of particular interest are the cross-talks among PTMs that occur on the same residues.
PTMs occurring on the same residues are by definition exclusive and thus have the
potential to directly affect each other’s functions. Examples of such PTMs reported in the
literature are acetylation, ubiquitination, methylation and SUMOylation of lysine
residues (Latham & Dent, 2007) as well as O-GlcNAcylation and phosphorylation of
serine and threonine residues (Hart et al, 2011). Although different PTMs can regulate the
same function, they could also regulate different protein functions (Beltrao et al, 2013;
Benayoun & Veitia, 2009). For instance, previous studies have shown that the cross-talk
between protein lysine acetylation and ubiquitination has effects on protein stability (Caron
et al, 2005). Acetylation at lysine residues of proteins prevents their ubiquitination and,
ultimately, their degradation. In this case, the cross-talk between acetylation and
ubiquitination regulates the same protein property or function: protein stability. On the
other hand, different PTMs occurring on the same site can regulate two distinct functions in
different contexts, for instance in different tissues, steps of the cell cycle or different
cell compartments. For instance, Kamemura and collaborators (Kamemura et al, 2002)
found that the Thr-58 residue of the c-Myc protein is preferentially phosphorylated or O-
GlcNAcylated in a condition-dependent fashion in the presence or absence of mitogens,
suggesting that different cellular roles of c-Myc are regulated by these two PTMs.
Here we examine the putative cross talk between protein phosphorylation and O-
GlcNAcylation. In the last decade the interest for O-GlcNAcylation and its interaction with
phosphorylation has grown, as shown by recent studies that have unveiled the role of
this modification in regulating key steps of cellular metabolism (Ruan et al, 2013). Further,
O-GlcNAcylation is one of the few post-translational modifications for which more than
1,000 sites have been experimentally detected (Khoury et al, 2011).
Phosphorylation and O-GlcNAcylation occur on a specific set of serine and threonine
residues. Some residues are not modified and therefore can be found in only one
configuration (1-state sites); others are phosphorylated but not glycosylated or vice-versa,
thus having one more possible configuration (2-state sites). Finally some sites can be
glycosylated and phosphorylated (on different molecules of the same protein or at different
times) and can therefore be found in three states (3-state sites). We sought to determine if
phosphorylation and O-GlcNAcylation may act on 3-state sites as two independent
switches that regulate different biological functions or may act as a single switch to
regulate one single function. We hypothesized that the two PTMs regulate two functions. In
this case, we should observe that (i) phosphorylation and O-GlcNAcylation do occur at the
same residues more often than expected by chance. In addition, we would expect to observe
that (ii) the 3-state sites evolve more slowly than the 2-state ones, since the two functions
constitute a stronger constraint. This trend should be observed whether we consider the site itself or its
flanking regions, since PTM sites are defined by motifs of amino acids rather than single
amino acids. Finally, we would expect (iii) the 3-state sites to show different preferences
for protein kinases compared to the 2-state ones, since for the 3-state sites the
phosphorylation is expected to be more condition-dependent (e.g. Kamemura et al, 2002).
We tested all these predictions on the human and mouse phosphoproteomes. Overall, our
analyses support the hypotheses that there is a cross-talk between phosphorylation and O-
GlcNAcylation and that these two PTMs are likely to control different cellular functions.
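The 1-state, 2-state and 3-state terminology defined above can be illustrated with a minimal Python sketch. The function names and the example positions are hypothetical and serve only to show how a serine or threonine would be classified from two sets of experimentally detected sites.

```python
def site_state(is_phosphorylated: bool, is_glcnacylated: bool) -> int:
    """Return the number of configurations (1, 2 or 3) a S/T residue can adopt.

    1: never modified; 2: carries one of the two PTMs; 3: can carry either PTM.
    """
    return 1 + int(is_phosphorylated) + int(is_glcnacylated)


def classify_sites(st_positions, phospho, glcnac):
    """Map every S/T position of one protein to its state (1, 2 or 3)."""
    return {pos: site_state(pos in phospho, pos in glcnac) for pos in st_positions}


# Hypothetical protein with four S/T residues (positions are made up).
example = classify_sites([5, 12, 58, 62], phospho={12, 58}, glcnac={58, 62})
```

In this toy case position 58 is a 3-state site, positions 12 and 62 are 2-state sites and position 5 is a 1-state site.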
5.4 - Methods
Phosphorylation, O-GlcNAcylation, protein disorder, sequence data and protein
abundance data
An extensive dataset of human and mouse phosphorylation and O-GlcNAcylation sites was
built by combining data from 8 different databases and experimental studies about
phosphorylation (Beltrao et al, 2012; Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al,
2012; Huttlin et al, 2010; Keshava Prasad et al, 2009; Minguez et al, 2012; Trinidad et al,
2012) and 5 about O-GlcNAcylation (Alfaro et al, 2012; Hornbeck et al, 2012; Lu et
al, 2013; Trinidad et al, 2012; Wang et al, 2011). To our knowledge, this set of
phosphorylation and O-GlcNAcylation sites is representative of the data currently available
in the literature. All protein sequences and orthology relationships were retrieved from
ENSEMBL (version 69). Only protein sequences for which we could find orthology
relationships between a human protein and at least a mouse, dog and opossum protein were
considered. For human and mouse, orthology relationships were determined for the
longest isoforms of each protein. Each group of orthologous sequences was aligned using
MUSCLE (Edgar, 2004). Disordered and ordered regions of proteins were predicted using
DISOPRED (Ward et al, 2004a). In order to map phosphorylation sites to our sequences,
the following procedure was applied. The sites that were already mapped onto proteins
associated with ENSEMBL IDs in the original datasets were directly mapped to our
sequences. For all other cases, phosphopeptides were mapped onto proteins using BLAT
(Kent, 2002). All peptides that mapped to more than one protein were removed at this step.
Mapped phosphorylation and O-GlcNAcylation sites and information about protein
disorder are available in Dataset S5.1 (available on request). Finally, data about protein
abundance was retrieved from PaxDb (Wang et al, 2012).
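The filtering step of the mapping procedure (discarding peptides that map to more than one protein) can be sketched as follows. This is only an illustration: exact substring matching is used here as a stand-in for BLAT, and the function name, protein identifiers and sequences are hypothetical.

```python
def map_peptides(peptides, proteins):
    """Map peptides onto proteins and keep only unambiguous hits.

    proteins: dict protein_id -> amino acid sequence.
    Returns dict peptide -> (protein_id, 0-based start of the first match).
    Assumption: exact substring search replaces the BLAT alignment step.
    """
    mapped = {}
    for pep in peptides:
        hits = [(pid, seq.find(pep)) for pid, seq in proteins.items() if pep in seq]
        if len(hits) == 1:  # drop unmapped peptides and peptides hitting several proteins
            mapped[pep] = hits[0]
    return mapped


# Made-up sequences: "KST" occurs in two proteins and is therefore removed.
toy_proteins = {"P1": "MKSTAAS", "P2": "MKSTGGG", "P3": "QQQSPEE"}
toy_mapping = map_peptides(["KST", "SPE", "WWW"], toy_proteins)
```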
Shuffling procedure used to determine random expectations
In order to calculate the random expectation for 3-state modified sites, O-GlcNAcylation
sites were shuffled in each protein by preserving the overall proportion of sites for each
residue (S or T) and the localization in disordered/ordered regions. The null distributions
were estimated by iterating this procedure 1000 times, calculating each time the number of
3-state sites. To calculate the random expectations for the localization of the 3-state sites in
disordered or ordered regions, we considered the two PTMs as one single modification and
reassigned this modification at random while preserving the overall proportion
of co-occurrences per residue (S or T).
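The first shuffling procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code used in the study: the data layout (one dictionary of S/T positions per protein), the function names and the "+1" convention of the empirical p-value are all choices made for this example.

```python
import random


def shuffle_glcnac(sites, glcnac, rng):
    """Redraw O-GlcNAc positions within one protein.

    sites: dict position -> (residue, is_disordered) for every S/T of the protein.
    The number of O-GlcNAc sites is preserved within each (residue, disorder) class.
    """
    by_class = {}
    for pos, cls in sites.items():
        by_class.setdefault(cls, []).append(pos)
    shuffled = set()
    for positions in by_class.values():
        k = sum(1 for p in positions if p in glcnac)
        shuffled.update(rng.sample(positions, k))
    return shuffled


def null_distribution(proteins, n_iter=1000, seed=0):
    """proteins: list of (sites, phospho, glcnac) tuples, one per protein.

    Returns the number of 3-state sites obtained in each shuffling iteration.
    """
    rng = random.Random(seed)
    return [
        sum(len(set(ph) & shuffle_glcnac(sites, gl, rng)) for sites, ph, gl in proteins)
        for _ in range(n_iter)
    ]


def empirical_p(observed, null):
    """One-sided empirical p-value (assumed convention: add one to both counts)."""
    return (1 + sum(x >= observed for x in null)) / (1 + len(null))
```

The observed number of 3-state sites divided by the mean of the null distribution gives the fold enrichment over the random expectation.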
Evolutionary conservation
The Rate4Site software with default options was used to calculate the evolutionary rates for
the 1-state, 2-state and 3-state serines and threonines (Pupko et al, 2002). The raw
evolutionary rates were normalized with the following procedure. For each residue type
(e.g. serine located in a disordered region) the average evolutionary rate for that residue
type in that protein was calculated. Then, for each residue the evolutionary rate calculated
with Rate4Site was divided by the average evolutionary rate of the residue type in that
protein. In order to avoid the bias of having a different number of species in the alignments
used to determine the evolutionary rates, the same analysis was performed using alignments
from a previous study (Landry et al, 2009). Finally, in order to avoid the potential bias
introduced by the algorithm used to calculate the evolutionary rates, the analyses were also
performed with another algorithm, as described by Gray & Kumar (2011).
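The normalization described above can be sketched as follows. The record layout and function name are assumptions made for this illustration, and the parsing of the Rate4Site output is not shown; groups whose average rate is zero are skipped here, a choice the original procedure does not specify.

```python
from collections import defaultdict
from statistics import mean


def normalize_rates(records):
    """Divide each rate by the average rate of its residue type in its protein.

    records: list of (protein_id, position, residue, is_disordered, raw_rate).
    A residue type is the combination of residue and disorder status,
    e.g. serine located in a disordered region.
    Returns dict (protein_id, position) -> normalized rate.
    """
    groups = defaultdict(list)
    for pid, _pos, res, dis, rate in records:
        groups[(pid, res, dis)].append(rate)
    averages = {key: mean(values) for key, values in groups.items()}

    normalized = {}
    for pid, pos, res, dis, rate in records:
        avg = averages[(pid, res, dis)]
        if avg != 0:  # assumption: residues of groups with a zero average are left out
            normalized[(pid, pos)] = rate / avg
    return normalized


# Made-up rates for one hypothetical protein.
toy_rates = normalize_rates([
    ("P1", 1, "S", True, 2.0),
    ("P1", 2, "S", True, 4.0),
    ("P1", 3, "T", False, 1.5),
])
```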
Kinase-phosphorylation site associations
NetPhorest (Miller et al, 2008) was downloaded from http://netphorest.info and was run
locally using default options. The kinase-phosphorylation site associations were determined
by ranking all possible associations determined by NetPhorest according to their score and
taking the one with the best score.
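The selection of the best-scoring association can be illustrated with a short sketch. It assumes that the predictions have already been parsed into (site, kinase, score) tuples; the site identifiers and scores below are made up and do not reproduce the NetPhorest output format.

```python
def best_kinase(predictions):
    """Keep, for each site, the kinase with the highest score.

    predictions: iterable of (site_id, kinase, score) tuples.
    Returns dict site_id -> (kinase, score). Ties keep the first kinase seen.
    """
    best = {}
    for site, kinase, score in predictions:
        if site not in best or score > best[site][1]:
            best[site] = (kinase, score)
    return best


# Hypothetical predictions for two sites.
toy_assignments = best_kinase([
    ("site1", "CK2", 0.65),
    ("site1", "GSK3", 0.20),
    ("site2", "CK2", 0.10),
    ("site2", "ATM_ATR", 0.41),
])
```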
5.5 - Results
5.5.1 - An extensive dataset of phosphorylation and O-GlcNAcylation sites
We built a dataset of human (n = 86,065) and mouse (n = 43,013) phosphorylation sites by
collecting data from 8 different databases and experimental studies (Beltrao et al, 2012;
Dinkel et al, 2011; Gnad et al, 2011; Hornbeck et al, 2012; Huttlin et al, 2010; Keshava
Prasad et al, 2009; Minguez et al, 2012; Trinidad et al, 2012). We successfully mapped
these sites onto 8,889 human and 5,903 mouse proteins. We also built a dataset of O-
GlcNAcylation sites in human (n = 613) and mouse (n = 810) from 5 different databases
and experimental studies (Alfaro et al, 2012; Hornbeck et al, 2012; Lu et al, 2013; Trinidad
et al, 2012; Wang et al, 2011). We mapped these sites onto 262 human and 316 mouse
proteins respectively, in which we counted 105 and 156 co-occurrences (in humans and
mouse, respectively). Sixty-five human proteins and 84 mouse proteins contained at least
one co-occurrence of phosphorylation and O-GlcNAcylation sites (3-state sites). Previous
studies reported that phosphorylation sites tend to be located in disordered (unstructured)
regions (Iakoucheva et al, 2004; Landry et al, 2009). Three-state residues also tend to be
located in disordered regions in both organisms (p-value < 0.005; Annex 3, Figure S5.1).
5.5.2 - Phosphorylation and O-GlcNAcylation are found on the same residues more
often than expected by chance alone
We first examined whether within the co-modified proteins the two PTMs occur on the
same residue. We counted the number of 3-state residues and we randomly shuffled the O-
GlcNAcylation sites in each protein to generate a null model that reflects the random
expectations for each species separately. We found that the number of 3-state residues is
1.3-times greater than expected by chance in both species (p-value < 0.001), thus
supporting the rejection of our null hypothesis of independence between phosphorylation and O-
GlcNAcylation (Figure 5.1).
Figure 5.1. Number of 3-state sites in human and mouse and comparisons to random
expectations.
Number of 3-state (phosphorylatable and O-GlcNAcylatable) sites in human (A) and mouse (B) and comparisons to random expectations (1000 iterations, p-value < 0.001).
One potential confounding factor of our analyses is that phosphorylation and O-
GlcNAcylation tend to be sampled more often on highly-abundant proteins, which would
artificially inflate their co-occurrence. For instance, we recently showed that there is a
detection bias for phosphorylation sites towards highly abundant proteins (Freschi et
al, 2014) and this could also be true for O-GlcNAcylated proteins, thereby increasing the
probability of finding both modifications on highly abundant proteins. Our results are in line
with these expectations (Figure 5.2A,B), suggesting that in general, PTMs tend to be
preferentially detected on highly abundant proteins.
Figure 5.2. Fraction of sites as a function of protein abundance for human and mouse O-
GlcNAcylation sites and comparison of average protein abundance between all proteins and
proteins that contain 3-state sites for humans and mouse.
Fraction of sites as a function of protein abundance for human (A) and mouse (B) O-
GlcNAcylation sites and comparison of average protein abundance between all proteins and
proteins that contain 3-state sites for humans (C) and mouse (D). Data about protein abundance was
retrieved from PaxDb (Wang et al, 2012).
We also found that 3-state sites tend to be preferentially found in proteins with high
average protein abundance in mouse, but not in humans (Figure 5.2C,D). This difference
could reflect a functional difference between humans and mice, but more likely it is a side
effect of biases in the protein sampling in mouse. We examined the distributions of protein
abundance for the proteins with 3-state sites, showing that the sampling bias towards highly
abundant proteins that we observed in mouse is common across the different studies (Annex 3,
Figure S5.2).
5.5.3 - Clues of independent regulation of multiple functions in humans but not in mouse
In previous studies of phosphorylation site evolution, residue conservation has been
associated with functional roles in protein regulation (Beltrao et al, 2012). We therefore
hypothesized that if phosphorylation and O-GlcNAcylation regulate independent protein
functions, 3-state sites should be more conserved than 2-state sites while if they regulate the
same function, the evolutionary rates of the 3-state sites should be approximately the same
as 2-state sites. We estimated the rates of evolution of all serines and threonines of
phosphorylated or glycosylated proteins using alignments of 16 species. We normalized
each rate by the average evolutionary rate of each residue type within each protein (see
methods) so that the rates become independent of protein abundance or structural properties
(order or disorder), both of which have been shown to affect rates of evolution (Landry et
al, 2009; Levy et al, 2012). We compared the distribution of the evolutionary rates of 1-
state (non-modified) serines and threonines to those of 2-state and 3-state modified
residues. We found that 3-state modified residues are more conserved over evolution
compared to 2-state and 1-state residues (Figure 5.3A) in humans. We also observed that on
average O-GlcNAcylated sites are more conserved than phosphorylation sites. However,
we did not observe the same trend in mouse (Figure 5.3B).
Figure 5.3. Comparison of residue conservation for 1-state, 2-state and 3-state residues in the
human and mouse proteomes.
Comparison of residue conservation for 1-state, 2-state and 3-state residues in the human (A) and mouse (B)
proteomes (Wilcoxon tests: n.s.: non-significant; *: p-value < 0.05; **: p-value < 0.01; ***: p-value
< 0.001). Panels (C) and (D) show the same analysis on the human proteome using a different
measure of evolutionary conservation (Gray & Kumar, 2011) or different sequence alignments
(Landry et al, 2009). A green circle indicates phosphorylation, while a blue one indicates O-
GlcNAcylation.
In order to avoid a potential bias determined by the number of species used to calculate the
evolutionary rates or the method used, we performed the same analysis using an
independent method (Figure 5.3C) (Gray & Kumar, 2011) and alignments that have been
used to calculate evolutionary rates in previous studies (Figure 5.3D) (Landry et al, 2009).
We also looked at the regions (+/- 5 amino acids) surrounding the 1-state, 2-state and 3-
state sites and we found that 3-state sites tend to be more evolutionarily conserved compared
to 1-state and 2-state sites (Figure 5.4).
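The conservation of a flanking region can be summarized, for example, as the mean rate over the window around a site. The sketch below is an assumption-laden illustration: whether the central residue was included and how windows near the protein ends were handled is not stated in the text, so here the site is included and the window is clipped at the ends.

```python
def window_mean_rate(rates, center, flank=5):
    """Mean evolutionary rate over positions center - flank .. center + flank.

    rates: list of per-residue rates along one protein (0-based positions).
    Assumptions: the central site is included and the window is clipped
    at the protein ends.
    """
    start = max(0, center - flank)
    end = min(len(rates), center + flank + 1)
    window = rates[start:end]
    return sum(window) / len(window)


# Made-up rates: position i has rate i.
toy_profile = [float(i) for i in range(20)]
```

With this toy profile, `window_mean_rate(toy_profile, 10)` averages positions 5 to 15, while `window_mean_rate(toy_profile, 0)` averages positions 0 to 5 only.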
Figure 5.4. Comparison of the evolutionary conservation of the regions surrounding 1-state, 2-
state and 3-state residues (+/- 5 amino acids) for the human proteome.
(Wilcoxon tests: *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001). A green circle
indicates phosphorylation, while a blue one indicates O-GlcNAcylation.
This result again supports our hypothesis that phosphorylation and O-GlcNAcylation
overall are likely to control independent functions.
5.5.4 - Three-state and 2-state sites have different preferences for protein kinases
If 3-state sites allow the regulation of multiple functions, they should have some features
that distinguish them from 2-state ones. Both 2-state phosphorylated and 3-state sites can
be phosphorylated. We would therefore expect 3-state sites to be phosphorylated by a set of
kinases that differs from the ones for 2-state ones. To test this prediction, we determined
the kinase-phosphorylation site associations for all the phosphorylated and 3-state residues
using NetPhorest (Miller et al, 2008; see also methods) and we compared the likelihoods
of being phosphorylated by a given kinase for 2-state phosphorylated residues and 3-state
residues. We found that 3-state residues show a clear preference for certain kinases
compared to the residues that are phosphorylated only, and this holds true for both mouse
and human (Figure 5.5).
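One way such a comparison could be carried out is sketched below. The statistical test used in the study is not specified in the text, so the one-sided hypergeometric test shown here is an assumption, as are the function names and the toy data. It only tests for preference (over-representation among 3-state sites); avoidance would require the opposite tail.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when drawing n sites out of N, K of which belong to the kinase."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def kinase_preferences(three_state, two_state):
    """Compare kinase usage between 3-state and 2-state phosphorylated sites.

    Both arguments are non-empty lists holding one best-scoring kinase per site.
    Returns dict kinase -> (fraction in 3-state, fraction in 2-state, p-value).
    """
    c3, c2 = Counter(three_state), Counter(two_state)
    n, N = len(three_state), len(three_state) + len(two_state)
    result = {}
    for kinase in sorted(set(c3) | set(c2)):
        k, K = c3[kinase], c3[kinase] + c2[kinase]
        result[kinase] = (k / n, c2[kinase] / len(two_state), hypergeom_tail(k, K, n, N))
    return result


# Made-up assignments: five 3-state sites and five 2-state sites.
toy_prefs = kinase_preferences(["ATM"] * 5, ["CK2"] * 5)
```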
Figure 5.5. Kinase preferences of 3-state residues for human and mouse proteins.
Kinase preferences of 3-state residues for human (A) and mouse (B) proteins. The associations were
determined using NetPhorest. Colored bars represent significant trends (p-value < 0.05): green
indicates significant preference while orange indicates significant avoidance.
Examples of such kinases include ATM/ATR, GSK and RCK. The example of ATM is of
particular interest, since this protein is involved in the response to DNA damage and its
deregulation leads to cancer (Kastan, 2008). The link between O-GlcNAcylation and ATM
has already been shown in a recent study (Miura et al, 2012). Moreover, the function
of this protein is also regulated by protein phosphorylation (Kozlov et al, 2011). The role of
the 3-state sites in this protein now has to be investigated experimentally in order to
understand how they integrate the regulatory programs encoded by phosphorylation and O-
GlcNAcylation.
5.6 - Conclusion
Here, we focused on the cross-talk between phosphorylation and O-GlcNAcylation in
human and mouse. We sought to extend the previous studies on the cross-talk between these
two PTMs (Hart et al, 2011; Zeidan & Hart, 2010) by performing a proteome-wide analysis
in which we used the most recent available proteomics data. We found that the number of
3-state (glycosylated and phosphorylated) serines and threonines is greater than expected by
chance, suggesting that these sites could have a potential role for protein regulation.
Evolutionary conservation is an indicator of functionality (Beltrao et al, 2012; Landry et al,
2009) and our results show that in humans the 3-state sites tend to be significantly more
conserved than the 2-state ones. This suggests that phosphorylation and O-
GlcNAcylation may act as independent switches to regulate two sets of protein functions,
since if they acted on the same function we would have expected to see the 3-state modified
sites not differing in their level of conservation compared to the 2-state modified ones.
Finally, we tried to associate putative functions with the 3-state modified sites and we
found that they are more often associated with certain kinases than sites that are
phosphorylated only. Our finding that phosphorylation and O-GlcNAcylation of 3-state
sites are likely to regulate independent functions does not rule out the possibility that, for
many 3-state sites, phosphorylation and glycosylation act as one single switch. The
most realistic scenario is that the cell uses a mixture of these two modes of function to
finely tune its functions. The study of the cross-talk between phosphorylation and O-
GlcNAcylation is still at its dawn, but the die is cast.
Chapter 6 – General conclusions
6.1 - Summary of the study
In this thesis we studied the evolution of PTM networks to answer the following key questions:
(i) how a PTM regulatory network is rewired after gene duplication and how this
process may contribute to increasing organismal complexity
(ii) how a PTM regulatory network evolves in different species
(iii) how two PTM regulatory networks that share the same target residue interfere
with each other and what are the possible functional consequences of this
interference
In Chapters 2 and 3 we focused on the first of these questions and we determined to
what extent gene duplication followed by divergence contributed to rewiring a specific
eukaryotic PTM regulatory network: the phosphoregulatory network of the budding yeast
S. cerevisiae. Our results (Chapter 2) show that 100 million years of evolution were
sufficient to extensively rewire this PTM network. We observed major changes both at the
level of phosphorylation sites and at the level of the network of kinases that phosphorylate
them so that 95% of the PTM profiles and up to 50% of the kinase-phosphorylation site
associations have changed between paralogs. We then investigated the evolutionary
mechanisms responsible for this rewiring and we found that phosphorylation sites tended to
be lost rather than gained between paralogous proteins. We proposed that this mechanism
could potentially contribute to increasing the biological complexity and the fitness of the cell.
Indeed, in the case of multi-functional proteins whose functions are regulated by multiple
independent phosphorylation sites, a duplication event followed by the differential loss of
phosphorylation sites would allow the two duplicates to split the functions between each
other. Finally, we also showed that at least a fraction of sites that have been lost between
paralogs may actually have been compensated by the emergence of new phosphorylation
sites at positions close to the original ones (evolutionary turnover of phosphorylation sites).
Overall, our results show the effects of a duplication event on a PTM regulatory network,
pointing out the importance of this mechanism in generating biological innovations.
In Chapter 3 we continued on the path paved by Chapter 2 by investigating in detail the
evolutionary trajectories of the phosphorylation sites that are lost in WGD paralogs. We
found that a significant fraction of them tend to be preferentially lost towards aspartic and
glutamic acid (Asp and Glu) residues. This is an interesting finding because these two
amino acids have chemical properties that mimic those of phosphorylated serines and
threonines (both Asp and Glu are negatively charged amino acids). By looking at the
genetic code we noticed that this kind of transition from a phosphorylatable amino acid to
a negatively charged one necessarily requires two mutations and implies a non-functional
intermediate, i.e. an amino acid that does not carry any negative charge and cannot be
phosphorylated. If a site is important to regulate an essential function, the presence of the
non-functional intermediate would lead to a fitness defect, making this kind of transition
unlikely to occur due to purifying selection. We reasoned that gene duplication would
represent a way to bypass this problem since one of the duplicated proteins could
accumulate mutations while the other could exert the original function, allowing the second
mutation to occur. We searched in the literature and we found that examples compatible
with this evolutionary scenario in which gene duplication allows transitions between
phosphorylation sites and negatively charged amino acids have already been observed and
reported (e.g. Basu et al, 2008). Our results suggest that this mechanism could be general
and that it has the potential to lead to new regulatory opportunities for the cell. Moreover,
our results also point out once again the importance of gene duplication as a mechanism
that can lead to biological innovations.
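The observation that a transition from serine or threonine to Asp or Glu requires at least two nucleotide substitutions can be checked directly on the standard genetic code. The short sketch below is an illustration added for this purpose; the function names are hypothetical and codons are written as DNA triplets.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """All codons encoding a given amino acid (one-letter code)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_substitutions(aa_from, aa_to):
    """Smallest number of nucleotide changes turning a codon of aa_from into one of aa_to."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


# Minimum number of substitutions for S/T -> D/E transitions.
st_to_de = {(a, b): min_substitutions(a, b) for a in "ST" for b in "DE"}
```

For all four pairs (S to D, S to E, T to D, T to E) the minimum is two substitutions, so any single mutation necessarily yields an intermediate amino acid.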
Chapter 4 focuses on the second objective of this thesis by investigating the evolutionary
rewiring of the phosphoregulatory network between human and mouse. We found that a
large number of phosphorylation sites are conserved between these two species and that, in
general, purifying selection is acting to maintain them. However, we also found that many
phosphorylation sites are species-specific, since a phosphorylated site in one species
corresponds to a non-phosphorylatable residue in the other. These sites represent good
candidates to explain the molecular bases of the phenotypic divergence observed between
human and mouse. We also found for the first time more subtle differences in
phosphoregulation, represented by sites that are conserved at residue level in the two
species but are divergent with respect to the phosphoregulation. The biological impact and
the functions of these sites now have to be assessed. Finally, we reported some
observations that support the hypothesis of the evolutionary turnover of phosphorylation
sites, which states that phosphorylation sites can jump to different but nearby locations during
evolution and still retain their biological function. In fact, by aligning the human and mouse
phosphoproteins we identified more than 100 site pairs that tended to be found in the same
region of the protein more than expected by chance and tended to be phosphorylated by the
same kinases. These results suggest that evolutionary turnover should be taken into
account when comparing different phosphoproteomes, in order to avoid overestimating the
divergence between them.
In the last chapter (Chapter 5) we focused on the third objective of the thesis by studying
one of the mechanisms by which PTMs regulate protein functions. Different PTMs can
occur at the same residues, meaning that at a given time a residue can carry no PTM or one
of the two PTMs but not both at the same time (interference or cross-talk between PTMs).
The presence of this interference raises the question of whether both PTMs control the
same protein function or each PTM type controls a specific one. We investigated this
problem by studying the phosphorylation and O-GlcNAcylation profiles of human and
mouse. We first found that these two PTMs tend to occur together more than expected by
chance, confirming that there is an interference or cross-talk between phosphorylation and
O-GlcNAcylation. Further, we found that the residues that have been experimentally found
to be phosphorylated and O-GlcNAcylated tend, in general, to have a higher level of
evolutionary conservation than the residues that have been found to be phosphorylated or
O-GlcNAcylated only, suggesting that they may be involved in the regulation of a larger set
of functions compared to the sites that can carry only one of the two PTMs. Finally, we
found that the set of kinases that phosphorylate doubly modified residues differs from the
set that phosphorylates the sites that undergo only one modification. All these observations
represent a description of the effects of the interference between two PTM regulatory
networks at global level.
6.2 - Perspectives
Although in the last century we have made impressive progress, a lot of work still has to
be done to understand how PTM networks are organized, how they evolve and how they
are implemented in different organisms. In this section we will propose some research
paths that emerge from this thesis and that can bring us closer to our objective.
First, our study shows that there is a need for large scale and small scale
experimental studies of PTMs. Large scale studies are needed in order to saturate the
coverage of the PTMs that have been extensively studied up to now (phosphorylation,
ubiquitination, glycosylation) or add some data for those that we have not studied yet. To
this aim, we need to develop new enrichment protocols that would allow us to take full
advantage of the latest generation of mass spectrometry instruments. Small scale studies are
also needed to assign functions to PTMs. Indeed, even for protein phosphorylation, the
most studied PTM, although thousands of phosphorylation sites have been reported in the
literature, the ones with known function remain a small fraction (Landry et al, 2009).
Another important aspect that needs to be further developed is the study of PTM writers
and erasers across several PTM networks. If we want to study how different PTM
networks are integrated and how they participate in cellular regulation, we need to know the
spatio-temporal regulatory patterns of the writers and erasers as well as their specificities in
order to link this information to PTM profiles, substrate abundance and PTM site
occupancy.
A completely new dimension that has to be explored in order to understand how PTM
networks work is to study how they rewire in different conditions, as also suggested by
Beltrao and collaborators (Beltrao et al, 2013) in a recent review. These studies would
provide both information about the network configuration in different conditions and also
allow us to identify the PTMs that are condition-specific, thus helping us in the task of
assigning functions to PTMs.
New data about the profiles of different PTMs in different organisms are also needed to
study the evolution of PTM networks. At the moment we have limited data even for the
model organisms (Gruhler et al, 2005; Holt et al, 2009; Huttlin et al, 2010; Li et al, 2007;
Sharma et al, 2014; Zielinska et al, 2009); however, with more data about model and
non-model organisms we could get a better understanding of species divergence. For
example, we could study how differences in regulation may reflect phenotypic differences.
Further, we could study in more detail the evolutionary events that shaped PTM networks
and contributed to increasing their complexity. Chapters 2 and 3 of this thesis point out the
importance of events like gene duplication in these processes. By having more data about
PTMs (in particular in mammals) we could understand when specific regulations appeared
and how the same biological process can be regulated in a different way in different
species.
Another limit to our understanding of PTM networks is that comprehensive
datasets of PTMs for specific organs or tissues are still largely missing (Hornbeck et
al, 2012). By having these datasets we could study which parts of the whole PTM
regulatory network are of key importance in a cell type, tissue or organ. This approach
should be applied to different PTM networks, to understand how they are integrated
together at different scales from cells to tissues to organs.
Future studies should also investigate the links between PTM regulation and disease. Even
if this topic has not been addressed directly in this thesis, a lot of phosphorylation sites are
located in proteins implicated in diseases but their role remains unknown. The role of
PTMs as markers for disease and disease progression has also not been fully assessed, apart
from specific cases (e.g. Jin & Zangar, 2009).
All these considerations suggest that we are still far from cracking the code of PTM
networks, i.e. the logic and the dynamics by which PTM networks cross-talk to each other
to regulate cellular functions. Nevertheless, the studies of the last decade, as well as this
study, have set the directions to take in order to get there.
Annex 1 – Supplementary information for Chapter 2
Figure S2.1. Comparison of independent studies allows measuring phosphosite conservation
among paralogs.
Cross-study reproducible sites are sites that are phosphorylated in one study and also found to be
phosphorylated in another study for the same set of proteins. Cross-study conserved sites are sites
found to be phosphorylated in one paralog in one study and in its paralog in a second study. Under a
scenario where paralogous proteins were perfectly conserved, these two numbers should be equal as
they are equally affected by false positive and false negative identifications. Thus, the ratio of
cross-study conservation over cross-study reproducibility provides an estimate of the true state-
conservation (Figure 2.1A). Only S/T sites conserved between the two paralogs are considered. 0 and 1
indicate nonphosphorylated and phosphorylated sites respectively.
Figure S2.2. Distribution of all PWM scores and of the maximal scores used to assign a
protein kinase to a particular phosphosite.
Overall distribution of PWM scores (red) and distribution of the maximal scores for each phosphosite (blue).
Maximum scores were used to assign a kinase to each phosphosite. See methods for details.
Figure S2.3. State-conserved sites are more abundant than expected by chance alone.
In order to calculate the expected number of state-conserved sites, we randomly re-assigned
phosphosites to conserved S/T sites of the paralogous phosphoproteins and calculated how many
occurred at homologous positions. This process was repeated 1000 times. There are 118 sites that
occur at homologous positions (236 phosphosites), a number that is 7.4 times higher than expected
by chance alone (P << 0.001).
Figure S2.4. Correlation between the relative abundance of kinases assigned to the global
phosphoproteome and those found in the WGD phosphoproteome.
There is a strong positive correlation between the relative fractions of kinases assigned to the global
phosphoproteome and those found in the WGD phosphoproteome (rho = 0.99, p-value < 2×10^-16).
This suggests that there is no specific group of kinases that preferentially phosphorylates WGD
phosphoproteins.
Figure S2.5. State-diverged nonphosphorylated S/T sites likely comprise 50% of
nonphosphorylated sites.
Nonphosphorylated S/T of the state-diverged sites likely comprise cases that are false negative
identifications. Because sites that have not been reported to be phosphorylated (i.e. randomly
selected S/T) have lower PWM scores than phosphorylated ones (Figure 2.1F), we can use PWM
scores to estimate this proportion. Thus, in order to estimate the true ratio of nonphosphorylated S/T
sites in state-diverged nonphosphorylated sites, we re-created the PWM score distribution ii (sites
not found to be phosphorylated among the state-diverged sites) from Figure 2.1F by randomly
sampling different ratios of PWM scores from distribution i (non-phosphorylated) and distribution
iv (state-conserved phosphorylated sites). We calculated the median of these distributions and
iterated this procedure 100 times. Each box-plot represents the distribution of the 100 medians
calculated for each of the ratios considered. A mixture of 50% of each distribution gives the same
median as the median score of the state-diverged nonphosphorylated S/T.
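The mixture procedure described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions: sampling is done with replacement, the sample size per iteration is arbitrary, and all names are hypothetical.

```python
import random
from statistics import median


def mixture_median(scores_i, scores_iv, frac_i, size, rng):
    """Median PWM score of a random mixture of distributions i and iv.

    frac_i is the fraction of scores drawn from distribution i
    (non-phosphorylated sites); the rest comes from distribution iv
    (state-conserved phosphorylated sites). Sampling is with replacement.
    """
    n_i = round(frac_i * size)
    sample = rng.choices(scores_i, k=n_i) + rng.choices(scores_iv, k=size - n_i)
    return median(sample)


def medians_for_ratio(scores_i, scores_iv, frac_i, size=500, n_iter=100, seed=0):
    """Repeat the sampling n_iter times and return the list of medians."""
    rng = random.Random(seed)
    return [mixture_median(scores_i, scores_iv, frac_i, size, rng) for _ in range(n_iter)]
```

The mixing ratio whose medians match the observed median of the state-diverged nonphosphorylated sites is then taken as the estimate.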
Figure S2.6. Gains and losses of phosphosites for the literature curated dataset (Dataset 2)
(Nguyen Ba & Moses, 2010).
This dataset contains 394 phosphosites mapped on 118 proteins. The comparison of the number of
gains and losses with the respective null models (see methods) confirms the trend observed in
phosphoproteomics experiments. Random sampling was performed as detailed for Figure 2.2C. The
black square represents the observed numbers in each case.
Figure S2.7. Gains and losses of phosphosites for the unfiltered dataset (Dataset 3).
The comparison of the number of gains and losses with the random expectation confirms the trend
for phosphosites to be preferentially lost. Here all phosphosites that correspond to phosphopeptides
that are assigned to more than one protein are also considered. Random sampling is done as detailed
for Figure 2.2C. The black square represents the observed numbers in each case.
Figure S2.8. Gains and losses of phosphosites for Dataset 4 (additional species considered for
the inference of the ancestral sequence).
The comparison of the number of gains and losses with the random expectation confirms the trend
for phosphosites to be preferentially lost. Here the ancestral sequences were reconstructed as
detailed in the methods section but including an additional species that diverged from S. cerevisiae
prior to the WGD. Random sampling was done as detailed for Figure 2.2C. The black square
represents the observed numbers in each case.
Supplementary Files
Supplementary files can be found at the address:
http://www.bio.ulaval.ca/landrylab/download
File S2.1 Phosphosites identified in L. kluyveri. The file includes information about the position of the site along the protein, the type of residue and the confidence score.
File S2.2 Mini-website of alignments of the paralogous pairs, their orthologs and the ancestral sequences. Disordered regions of proteins are indicated by asterisks and phosphorylation sites are in bold.
Annex 2 – Supplementary information for Chapter 4

                   Human                               Mouse
Study              #prot   S       T       Y           #prot   S       T       Y
Minguez et al.     5899    25799   8179    6631        3278    11474   2689    1708
Beltrao et al.     7349    28666   9240    8460        5760    21876   4706    2361
Phosida et al.     2312    7948    2056    454         2818    10001   1704    238
HPRD               4670    21493   6511    2340        -       -       -       -
Phosphosite.Org    8355    39796   16160   15610       5607    26010   6538    3958
phosphoELM         2923    11085   2743    1203        1510    3070    636     379
Huttlin et al.     -       -       -       -           3193    14679   2849    426
Non-redundant      12341   61401   24257   20962       8179    38718   9821    5501

Table S4.1. Number of phosphoproteins and phosphorylation sites (sorted by phosphorylatable residue) for all the studies we considered as well as the corresponding non-redundant values.

Protein        Site                       Sequence          NetPhorest score   Predicted kinase
NUCL_HUMAN     ENSP00000318195_28         PKEVEEDSEDEEMSE   0.649539           CK2
NUCL_MOUSE     ENSMUSP00000027438_28      PKEVEEDSEDEEMSE   0.649539           CK2
PAXB1_HUMAN    ENSP00000328992_262        REDENDASDDEDDDE   0.649397           CK2
PAXB1_MOUSE    ENSMUSP00000113835_264     REDENDASDDEDDDE   0.649397           CK2
ARI4A_HUMAN    ENSP00000347602_160        DEKEEESSEEEDEDK   0.649337           CK2
ARI4A_MOUSE    ENSMUSP00000035512_160     DEKEEESSEEEDEDK   0.649337           CK2
B3KYA7_HUMAN   ENSP00000429744_344        LEEEEENSDEDELDS   0.648665           CK2
B3KYA7_MOUSE   ENSMUSP00000018476_316     LEEEEENSDEDELDS   0.648665           CK2
RPC7L_HUMAN    ENSP00000358320_163        KKEEEVTSEEDEEKE   0.648586           CK2
RPC7L_MOUSE    ENSMUSP00000089544_163     KKEEEVTSEEDEEKE   0.648586           CK2
ARI4A_HUMAN    ENSP00000347602_159        EDEKEEESSEEEDED   0.648118           CK2
ARI4A_MOUSE    ENSMUSP00000035512_159     EDEKEEESSEEEDED   0.648118           CK2
ARI4B_HUMAN    ENSP00000355562_295        EKEKEDNSSEEEEEI   0.647742           CK2
ARI4B_MOUSE    ENSMUSP00000106163_295     EKEKEDNSSEEEEEI   0.647742           CK2
SENP3_HUMAN    ENSP00000403712_75         PSFDASASEEEEEEE   0.647679           CK2
SENP3_MOUSE    ENSMUSP00000005336_73      PSFDASASEEEEEEE   0.647679           CK2
TBD2B_HUMAN    ENSP00000300584_957        PDKGELVSDEEEDT    0.647633           CK2
TBD2B_MOUSE    ENSMUSP00000045413_959     PDKGELVSDEEEDT    0.647633           CK2
U5S1_HUMAN     ENSP00000392094_19         YIGPELDSDEDDDEL   0.647600           CK2
U5S1_MOUSE     ENSMUSP00000021306_19      YIGPELDSDEDDDEL   0.647600           CK2

NIN_HUMAN      ENSP00000245441_1145       VTRRHVLSDLEDDEV   0.632661           CK2
NIN_MOUSE      ENSMUSP00000082422_1133    PATKHFLSDLGDHEA   0.103702           DMPK
F111A_HUMAN    ENSP00000434435_607        QQDVEMMSDEDL      0.633808           CK2
F111A_MOUSE    ENSMUSP00000119518_610     VQNVEMLSIDF       0.139290           CK2
OSTP_HUMAN     ENSP00000378517_191        ATDEDITSHMESEEL   0.601545           CK2
OSTP_MOUSE     ENSMUSP00000031243_176     ATDEDLTSHMKSGES   0.107948           CK1
ORC2_HUMAN     ENSP00000234296_177        LIVPRSHSDSESEYS   0.576608           CK2
ORC2_MOUSE     ENSMUSP00000027198_176     IIASRSHYDSESEYS   0.090376           MAP2K6_MAP2K3_MAP2K4_MAP2K7
K1551_HUMAN    ENSP00000310338_1198       NSIKNSSSEEEKQKE   0.602660           CK2
K1551_MOUSE    ENSMUSP00000041180_956     VPQCHCSSTEKKEKD   0.119368           ACTR2_ACTR2B_TGFbR2
LTV1_HUMAN     ENSP00000356548_211        YDSAGLLSDEDCMSV   0.612627           CK2
LTV1_MOUSE     ENSMUSP00000019950_206     RSSAGFLSDGGDLSA   0.129578           CK2
RBP2_HUMAN     ENSP00000283195_2583       KCELSKNSDIEQSSD   0.593321           CK2
RBP2_MOUSE     ENSMUSP00000003310_2421    KCELPQNSDIKQSSD   0.115794           GRK
SYCC_HUMAN     ENSP00000369897_264        LTGEEVNSCVEVLLE   0.567883           CK2
SYCC_MOUSE     ENSMUSP00000010899_264     LSGEEVDSKVQVLL    0.093388           CK2
TSN1_HUMAN     ENSP00000361072_156        CCGFTNYTDFEDSPY   0.585526           CK2
TSN1_MOUSE     ENSMUSP00000030465_156     CCGFNNYTDFNASRF   0.115104           CK2
SETB1_HUMAN    ENSP00000271640_474        LSPQAGDSDLESQLA   0.543830           CK2
SETB1_MOUSE    ENSMUSP00000015841_473     LSPQAADTESLESQL   0.080428           CK2

Table S4.2. Comparison of StC and StD sites. The first ten site pairs present in the table are the pairs of StC sites with the highest NetPhorest scores. The last ten pairs are the pairs of StD sites with the highest difference of NetPhorest scores between the phosphorylated site and its non-phosphorylated counterpart. Green rows refer to phosphorylated sites while grey to non-phosphorylated ones. Differences between orthologous 15mers centered on each site are highlighted in yellow.

Protein ID Description Human site Mouse site
PKP2 plakophilin 2 ENSP00000070846_203 ENSMUSP00000036890_181
PKP2 plakophilin 2 ENSP00000070846_267 ENSMUSP00000036890_235
PNN pinin, desmosome associated protein ENSP00000216832_552 ENSMUSP00000021381_559
NUP210 nucleoporin 210kDa ENSP00000254508_1862 ENSMUSP00000032179_1839
NUP210 nucleoporin 210kDa ENSP00000254508_1863 ENSMUSP00000032179_1839
VPS13C vacuolar protein sorting 13 homolog C (S. cerevisiae) ENSP00000261517_542 ENSMUSP00000077040_839
VPS13C vacuolar protein sorting 13 homolog C (S. cerevisiae) ENSP00000261517_734 ENSMUSP00000077040_839
VPS13C vacuolar protein sorting 13 homolog C (S. cerevisiae) ENSP00000261517_736 ENSMUSP00000077040_839
VPS13C vacuolar protein sorting 13 homolog C (S. cerevisiae) ENSP00000261517_1902 ENSMUSP00000077040_1956
DSG2 desmoglein 2 ENSP00000261590_984 ENSMUSP00000057096_921
ADNP2 ADNP homeobox 2 ENSP00000262198_1024 ENSMUSP00000068560_1052
LRP2 low density lipoprotein receptor-related protein 2 ENSP00000263816_4527 ENSMUSP00000079752_4533
LRP2 low density lipoprotein receptor-related protein 2 ENSP00000263816_4634 ENSMUSP00000079752_4632
EHBP1 EH domain binding protein 1 ENSP00000263991_751 ENSMUSP00000105191_765
EHBP1 EH domain binding protein 1 ENSP00000263991_769 ENSMUSP00000105191_765
ALMS1 Alstrom syndrome 1 ENSP00000264448_2751 ENSMUSP00000071904_1916
ALMS1 Alstrom syndrome 1 ENSP00000264448_2754 ENSMUSP00000071904_1916
NBN nibrin ENSP00000265433_402 ENSMUSP00000029879_429
NBN nibrin ENSP00000265433_497 ENSMUSP00000029879_533
NBN nibrin ENSP00000265433_516 ENSMUSP00000029879_543
FANCM Fanconi anemia, complementation group M ENSP00000267430_1413 ENSMUSP00000054797_1379
FANCM Fanconi anemia, complementation group M ENSP00000267430_1673 ENSMUSP00000054797_1638
FANCM Fanconi anemia, complementation group M ENSP00000267430_1686 ENSMUSP00000054797_1638
FANCM Fanconi anemia, complementation group M ENSP00000267430_1693 ENSMUSP00000054797_1638
FANCM Fanconi anemia, complementation group M ENSP00000267430_1721 ENSMUSP00000054797_1638
C10orf47 chromosome 10 open reading frame 47 ENSP00000277570_146 ENSMUSP00000060780_225
UHRF1BP1L UHRF1 binding protein 1-like ENSP00000279907_446 ENSMUSP00000020112_797
PDE3B phosphodiesterase 3B, cGMP-inhibited ENSP00000282096_561 ENSMUSP00000032909_536
RANBP2 RAN binding protein 2 ENSP00000283195_1146 ENSMUSP00000003310_1141
RANBP2 RAN binding protein 2 ENSP00000283195_2802 ENSMUSP00000003310_2638
RANBP2 RAN binding protein 2 ENSP00000283195_2807 ENSMUSP00000003310_2641
ZNF646 zinc finger protein 646 ENSP00000300850_1448 ENSMUSP00000052641_1412
CLSPN claspin ENSP00000312995_69 ENSMUSP00000045344_84
CLSPN claspin ENSP00000312995_949 ENSMUSP00000045344_948
CLSPN claspin ENSP00000312995_955 ENSMUSP00000045344_948
CLSPN claspin ENSP00000312995_1161 ENSMUSP00000045344_1123
DAB2 disabled homolog 2, mitogen-responsive phosphoprotein (Drosophila) ENSP00000313391_723 ENSMUSP00000079689_731
FAM123C family with sequence similarity 123C ENSP00000314914_307 ENSMUSP00000054748_267
MAP1S microtubule-associated protein 1S ENSP00000325313_582 ENSMUSP00000019405_532
MAP1S microtubule-associated protein 1S ENSP00000325313_640 ENSMUSP00000019405_573
MAP1S microtubule-associated protein 1S ENSP00000325313_643 ENSMUSP00000019405_573
DDX24 DEAD (Asp-Glu-Ala-Asp) box polypeptide 24 ENSP00000328690_302 ENSMUSP00000105628_329
LRRC16A leucine rich repeat containing 16A ENSP00000331983_1314 ENSMUSP00000072662_1320
EFCAB13 EF-hand calcium binding domain 13 ENSP00000332111_385 ENSMUSP00000116040_452
BMP2K BMP2 inducible kinase ENSP00000334836_728 ENSMUSP00000037970_715
BMP2K BMP2 inducible kinase ENSP00000334836_1011 ENSMUSP00000037970_888
BMP2K BMP2 inducible kinase ENSP00000334836_1080 ENSMUSP00000037970_888
KIF18B kinesin family member 18B ENSP00000341466_676 ENSMUSP00000021311_558
FGD6 FYVE, RhoGEF and PH domain containing 6 ENSP00000344446_553 ENSMUSP00000020208_557
FGD6 FYVE, RhoGEF and PH domain containing 6 ENSP00000344446_632 ENSMUSP00000020208_557
FGD6 FYVE, RhoGEF and PH domain containing 6 ENSP00000344446_693 ENSMUSP00000020208_557
HTT huntingtin ENSP00000347184_411 ENSMUSP00000078945_638
MKL1 megakaryoblastic leukemia (translocation) 1 ENSP00000347847_295 ENSMUSP00000105207_335
MKL1 megakaryoblastic leukemia (translocation) 1 ENSP00000347847_305 ENSMUSP00000105207_345
SVIL supervillin ENSP00000348128_221 ENSMUSP00000115078_218
SVIL supervillin ENSP00000348128_226 ENSMUSP00000115078_218
SVIL supervillin ENSP00000348128_240 ENSMUSP00000115078_218
SVIL supervillin ENSP00000348128_253 ENSMUSP00000115078_218
SVIL supervillin ENSP00000348128_253 ENSMUSP00000115078_248
SVIL supervillin ENSP00000348128_261 ENSMUSP00000115078_248
SVIL supervillin ENSP00000348128_263 ENSMUSP00000115078_248
SVIL supervillin ENSP00000348128_914 ENSMUSP00000115078_857
CLCC1 chloride channel CLIC-like 1 ENSP00000349456_506 ENSMUSP00000102224_502
CLCC1 chloride channel CLIC-like 1 ENSP00000349456_509 ENSMUSP00000102224_503
PDE3A phosphodiesterase 3A, cGMP-inhibited ENSP00000351957_475 ENSMUSP00000038749_472
PDE3A phosphodiesterase 3A, cGMP-inhibited ENSP00000351957_523 ENSMUSP00000038749_526
PCNT pericentrin ENSP00000352572_1703 ENSMUSP00000001179_1444
PCNT pericentrin ENSP00000352572_2370 ENSMUSP00000001179_1990
PARP9 poly (ADP-ribose) polymerase family, member 9 ENSP00000353512_61 ENSMUSP00000110528_20
C15orf39 chromosome 15 open reading frame 39 ENSP00000353854_586 ENSMUSP00000034846_579
RP5-862P8.2 Mitogen-activated protein kinase kinase kinase MLK4 ENSP00000355583_542 ENSMUSP00000034316_521
RP5-862P8.2 Mitogen-activated protein kinase kinase kinase MLK4 ENSP00000355583_546 ENSMUSP00000034316_521
PTPRC protein tyrosine phosphatase, receptor type, C ENSP00000356346_1281 ENSMUSP00000027645_1265
PTPRC protein tyrosine phosphatase, receptor type, C ENSP00000356346_1287 ENSMUSP00000027645_1271
CEP350 centrosomal protein 350kDa ENSP00000356579_1195 ENSMUSP00000120085_1200
CEP350 centrosomal protein 350kDa ENSP00000356579_1219 ENSMUSP00000120085_1200
CEP350 centrosomal protein 350kDa ENSP00000356579_2204 ENSMUSP00000120085_2219
CEP350 centrosomal protein 350kDa ENSP00000356579_2238 ENSMUSP00000120085_2219
CEP350 centrosomal protein 350kDa ENSP00000356579_2239 ENSMUSP00000120085_2221
F5 coagulation factor V (proaccelerin, labile factor) ENSP00000356770_1155 ENSMUSP00000083204_903
KNDC1 kinase non-catalytic C-lobe domain (KIND) containing 1 ENSP00000357561_257 ENSMUSP00000050586_267
SLK STE20-like kinase ENSP00000358770_569 ENSMUSP00000049977_554
DST dystonin ENSP00000359790_2635 ENSMUSP00000110756_2521
ZNF217 zinc finger protein 217 ENSP00000360526_568 ENSMUSP00000104783_621
ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_33 ENSMUSP00000109203_43
ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_52 ENSMUSP00000109203_43
ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_65 ENSMUSP00000109203_35
ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_706 ENSMUSP00000109203_669
ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_899 ENSMUSP00000109203_871
ATRX alpha thalassemia/mental retardation syndrome X-linked ENSP00000362441_1068 ENSMUSP00000109203_1075
ITPR3 inositol 1,4,5-trisphosphate receptor, type 3 ENSP00000363435_1861 ENSMUSP00000038150_1831
SPEN spen homolog, transcriptional regulator (Drosophila) ENSP00000364912_1622 ENSMUSP00000101412_1669
SPEN spen homolog, transcriptional regulator (Drosophila) ENSP00000364912_2014 ENSMUSP00000101412_2107
SPEN spen homolog, transcriptional regulator (Drosophila) ENSP00000364912_2486 ENSMUSP00000101412_2402
ATXN2 ataxin 2 ENSP00000366843_872 ENSMUSP00000056715_830
RTN3 reticulon 3 ENSP00000367050_268 ENSMUSP00000065810_225
HIVEP1 human immunodeficiency virus type I enhancer binding protein 1 ENSP00000368698_479 ENSMUSP00000056147_591
BRCA2 breast cancer 2, early onset ENSP00000369497_384 ENSMUSP00000038576_400
SHROOM2 shroom family member 2 ENSP00000370299_229 ENSMUSP00000098701_237
FANCA Fanconi anemia, complementation group A ENSP00000373952_850 ENSMUSP00000045217_1071
CCDC88C coiled-coil domain containing 88C ENSP00000374507_1023 ENSMUSP00000082177_794
ACD adrenocortical dysplasia homolog (mouse) ENSP00000377496_411 ENSMUSP00000048180_290
MYLK3 myosin light chain kinase 3 ENSP00000378288_450 ENSMUSP00000034133_432
GLI3 GLI family zinc finger 3 ENSP00000379258_850 ENSMUSP00000106137_851
CD44 CD44 molecule (Indian blood group) ENSP00000398632_686 ENSMUSP00000005218_728
CD44 CD44 molecule (Indian blood group) ENSP00000398632_717 ENSMUSP00000005218_728
CD44 CD44 molecule (Indian blood group) ENSP00000398632_717 ENSMUSP00000005218_773
GORASP2 golgi reassembly stacking protein 2, 55kDa ENSP00000410208_448 ENSMUSP00000028509_432
TOP2A topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1360 ENSMUSP00000068896_1379
TOP2A topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1361 ENSMUSP00000068896_1379
TOP2A topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1392 ENSMUSP00000068896_1367
TOP2A topoisomerase (DNA) II alpha 170kDa ENSP00000411532_1495 ENSMUSP00000068896_1469
C9orf172 chromosome 9 open reading frame 172 ENSP00000412388_104 ENSMUSP00000109855_105
CAMSAP3 calmodulin regulated spectrin-associated protein family, member 3 ENSP00000416797_704 ENSMUSP00000125993_583
CAMSAP3 calmodulin regulated spectrin-associated protein family, member 3 ENSP00000416797_811 ENSMUSP00000125993_882
CCDC110 coiled-coil domain containing 110 ENSP00000427246_807 ENSMUSP00000092964_644
PRAGMIN Tyrosine-protein kinase SgK223 ENSP00000428054_148 ENSMUSP00000106118_131
RNF214 ring finger protein 214 ENSP00000431643_56 ENSMUSP00000060941_48
DMXL1 Dmx-like 1 ENSP00000439479_1841 ENSMUSP00000045559_1829
SLC1A5 solute carrier family 1 (neutral amino acid transporter), member 5 ENSP00000444408_9 ENSMUSP00000104136_33
C14orf38 chromosome 14 open reading frame 38 ENSP00000452964_279 ENSMUSP00000021494_313
OBSCN obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF ENSP00000455507_3159 ENSMUSP00000038264_3281
MKL2 MKL/myocardin-like 2 ENSP00000459626_852 ENSMUSP00000009713_846
Table S4.3. List of evolutionary clustered sites. The list includes for each pair of evolutionary clustered sites the name of the proteins, a description of the protein and the two identifiers.
Protein ID   Description                                                   Num. ECS
SVIL         supervillin                                                   8
ATRX         alpha thalassemia/mental retardation syndrome X-linked        6
FANCM        Fanconi anemia, complementation group M                       5
CEP350       centrosomal protein 350kDa                                    5
TOP2A        topoisomerase (DNA) II alpha 170kDa                           4
VPS13C       vacuolar protein sorting 13 homolog C (S. cerevisiae)         4
CLSPN        claspin                                                       4
SPEN         spen homolog, transcriptional regulator (Drosophila)          3
BMP2K        BMP2 inducible kinase                                         3
NBN          nibrin                                                        3
CD44         CD44 molecule (Indian blood group)                            3
MAP1S        microtubule-associated protein 1S                             3
FGD6         FYVE, RhoGEF and PH domain containing 6                       3
RBP2         RAN binding protein 2                                         3
Table S4.4. List of proteins with more than two evolutionary clustered sites. The list includes for each pair of evolutionary clustered sites the name of the proteins where they are found, a description of the protein and the two identifiers of the sites.
Figure S4.1. Algorithm to detect evolutionary clustered sites (ECS).
(A) Estimation of the colocalization of phosphorylation sites inside a window of length L.
Calculations were performed for windows of amino acids of increasing length. (B) Shuffling of
phosphorylation sites respecting their biochemical properties (residue: S, T or Y; location in
ordered/disordered regions) and calculation of the null expectations for the colocalization inside a
window of length L. Calculations were performed for windows of amino acids of increasing length.
(C) Comparison of the observed and expected values of colocalization. (D) Determination of the
closest phosphorylation sites for which the observed colocalization score is higher than expected by
chance (null expectation).
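
Steps (A) to (C) of this caption can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the caption: colocalization is scored as the number of sites in one species that have at least one site of the other species within the window, positions are alignment coordinates, and the class labels (residue type plus order/disorder) are supplied by the caller. All function names are hypothetical.

```python
import random

def colocalization(sites_a, sites_b, window):
    """Number of sites in A with at least one site of B within `window` residues."""
    return sum(any(abs(a - b) <= window for b in sites_b) for a in sites_a)

def shuffle_sites(sites, pos_class, rng):
    """Redraw sites among candidate positions of the same class.

    pos_class maps every phosphorylatable position to a class label,
    e.g. ("S", "disordered"); `sites` must be a subset of its keys.
    """
    pools, counts = {}, {}
    for pos, cls in pos_class.items():
        pools.setdefault(cls, []).append(pos)
    for s in sites:
        counts[pos_class[s]] = counts.get(pos_class[s], 0) + 1
    shuffled = []
    for cls, k in counts.items():
        # Sample without replacement so that two sites never share a position.
        shuffled.extend(rng.sample(pools[cls], k))
    return shuffled

def expected_colocalization(sites_a, class_a, sites_b, class_b,
                            window, n_iter=1000, seed=0):
    """Mean colocalization over shuffled site sets (the null expectation)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_iter):
        total += colocalization(shuffle_sites(sites_a, class_a, rng),
                                shuffle_sites(sites_b, class_b, rng),
                                window)
    return total / n_iter
```

Repeating the comparison of observed and expected values for windows of increasing length gives the profile from which step (D) selects the closest sites whose colocalization exceeds the null expectation.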
Figure S4.2. Comparison of human and mouse phosphorylation sites present in our dataset.
(A) Global number of phosphorylation sites. (B) Proportion of the different phosphorylated residues
(S: serine, T: threonine, Y: tyrosine).
Figure S4.3. Localization of SiD, StC and StD sites.
Fraction of sites located in disordered, ordered or mixed regions for each of the three categories and
comparison with the expectations. Mixed regions are regions where one site is located in a
disordered region while the orthologous one is located in an ordered region.
Figure S4.4. Conservation and divergence of clusters of poly-S/T/Y.
There are 158,970 poly-S/T/Y clusters (stretches of two or more consecutive S/T/Y residues) in the
human proteome and 158,022 in the mouse. We defined three categories of clusters: i) Site-diverged
clusters (SiD-c): human or mouse clusters that do not overlap with a cluster in the other species,
even though they can overlap with single phosphorylation sites; ii) state-conserved clusters (StC-c):
overlapping human and mouse clusters in which both the human and the mouse clusters contain at
least one phosphorylation site; iii) state-diverged clusters (StD-c): overlapping human and mouse
clusters in which only one among the human and the mouse clusters contains at least one
phosphorylation site. The plots show the number of observed SiD-c, StC-c and StD-c clusters of
poly-S/T/Y (orange dots) and the comparison to random expectations (distributions in grey). The
null model was generated by 1,000 iterations in which human and mouse clusters were randomized.
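
The cluster definition used in this caption (stretches of two or more consecutive S/T/Y residues) and the StC-c/StD-c distinction can be illustrated with a short Python sketch. The function names are hypothetical, positions are 0-based, and the pairing of overlapping human and mouse clusters is assumed to have been done beforehand on the alignment.

```python
import re

def poly_sty_clusters(sequence):
    """Return (start, end) half-open intervals of runs of >= 2 consecutive S/T/Y."""
    return [(m.start(), m.end()) for m in re.finditer(r"[STY]{2,}", sequence)]

def has_phosphosite(cluster, phosphosites):
    """True if any phosphorylated position falls inside the cluster interval."""
    start, end = cluster
    return any(start <= p < end for p in phosphosites)

def classify_overlapping_pair(human_cluster, human_sites, mouse_cluster, mouse_sites):
    """Classify a pair of overlapping clusters as StC-c, StD-c or neither."""
    h = has_phosphosite(human_cluster, human_sites)
    m = has_phosphosite(mouse_cluster, mouse_sites)
    if h and m:
        return "StC-c"   # both clusters carry at least one phosphosite
    if h or m:
        return "StD-c"   # only one of the two clusters carries a phosphosite
    return None          # neither cluster is phosphorylated

print(poly_sty_clusters("ASSTKYAYT"))  # [(1, 4), (7, 9)]
```

The isolated Y at position 5 is not reported because a cluster requires at least two consecutive phosphorylatable residues.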
Figure S4.5. Analysis of position weight matrix (PWM) scores for the different classes of sites
and probability of being phosphorylated by the same protein kinase. (A) Comparison of the
distributions of PWM scores for human and mouse phosphorylated and non-phosphorylated
residues (Wilcoxon tests). (B) Comparison of the distributions of PWM scores for StD sites
(Wilcoxon tests; *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001). (C) Correlation
between human and mouse PWM scores for StC sites (red) and StD sites phosphorylated in human
but not in mouse (black). (D) Correlation between human and mouse PWM scores for StC sites
(red) and StD sites phosphorylated in mouse but not in human (black). (E) Proportion of
phosphorylated sites that have higher PWM scores compared to their corresponding site in the other
species for StC and StD sites. (F) Proportion of sites phosphorylated by the same kinase for the
different categories of sites (StD: state diverged, StC: state conserved, ECS: evolutionary clustered
sites). Black dots represent the observed proportion. Orange lines represent the range of proportions
expected by chance. The histogram shows the distribution of random expectations for ECS. P-value
for StD and StC: < 0.00001; p-value for ECS: 0.006.
Figure S4.6. NetPhorest scores and phosphorylation sites. Fraction of phosphorylated sites
(human and mouse) as a function of the NetPhorest score.
Figure S4.7. Distributions of NetPhorest scores for the different classes of sites.
(A,B) Distribution of NetPhorest scores for StC, StD and non-phosphorylated sites. Non-
phosphorylated sites (red) are orthologous sites that are conserved at the residue level and both non-
phosphorylated according to our phosphoproteomics data. For StD sites (in which one site is
phosphorylated while the orthologous one is phosphorylatable but not phosphorylated) we present
two distributions: one for phosphorylated residues and another for non-phosphorylated residues.
[Figure S4.7 plot: panels A and B; x-axis: NetPhorest score; y-axis: Density; legend: StC, StD (pho.), StD (non-pho.), Non phosphorylated.]
Figure S4.8. Relationship between NetPhorest scores in state-conserved sites and protein
abundance.
Distributions of NetPhorest scores for state-conserved sites (only the scores for the human residue
were considered) for four classes of relative protein abundance.
Figure S4.9. Distribution of the number of evolutionary clustered sites per protein.
Figure S4.10. Distance between evolutionary clustered sites.
(A) Proportion of evolutionary clustered sites as a function of the length of the window (expressed
in number of amino acids) in which the clustered sites are contained. (B) Cumulative distribution of
the proportion of evolutionary clustered sites as a function of the distance between them (1-100 aa).
Figure S4.11. Relationship between evolutionary clustered sites and available sites.
(A) Proportion of protein pairs having evolutionary clustered sites as a function of the available
sites (SiD sites). (B) Distribution of available sites present in the proteins that have evolutionary
clustered sites.
Supplementary files
Supplementary files can be found at the address:
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1004062#s4
Dataset S1. Alignments of orthologous mammalian proteins for the 68 proteins that show significant clustering of SiD phosphorylation sites (i.e. that contain evolutionary clustered sites). Proteins’ ENSEMBL IDs of the aligned proteins are provided. Alignments are in table format. The columns’ IDs provide information about the organism (following the ENSEMBL convention; e.g. “hsa” indicates Homo sapiens and “mus”, Mus musculus) and the type of data included (“aa” for amino acid and “p” for phosphorylation).
Dataset S2. Human and mouse phosphorylation sites for the 11,150 proteins present in our dataset (i.e. that contains evolutionary clustered sites). ENSEMBL IDs of the aligned proteins are provided. The alignment is in table format. The column IDs provide information on the organism (following the ENSEMBL convention; e.g. “hsa” indicates Homo sapiens and “mus”, Mus musculus) and the type of data included (“aa” for amino acid and “p” for phosphorylation, “diso” for disorder/order). Two columns (“hsa” and “mus”) provide information about the position of the residues along the human or mouse sequences. Protein disorder is indicated by the “*” symbol, while order is indicated by the “.” symbol. For phosphorylation sites, we provide information (columns <organism_id>.p.db) about the papers/dataset that lists the site as being phosphorylated. ("Be", Beltrao et al., 2012; “Hp”, Keshava Prasad et al., 2009; “Hu”, Huttlin et al., 2010; "Mi", Minguez et al., 2012; “Pe”, Dinkel et al., 2011; “Ph”, Gnad et al., 2011; "Po", Hornbeck et al., 2012).
Annex 3 – Supplementary information for Chapter 5
Figure S5.1. Localization of 3-state sites (phosphorylatable and O-GlcNAcylatable residues) in disordered (A) or ordered regions (B) of human proteins. Localization of 3-state sites (phosphorylatable and O-GlcNAcylatable residues) in disordered (C) or ordered regions (D) of mouse proteins. The distributions represent the random expectations for each of the regions.
Figure S5.2. Distribution of human (A) and mouse (B) protein abundances for the proteins containing 3-state sites. PhosOrg: phosphositeOrg.
Annex 4 - qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells
Abstract
One of the most important challenges in systems biology is to understand how cells
respond to genetic and environmental perturbations. Here we show that the yeast DHFR-
PCA, coupled with high-resolution growth profiling (DHFR-qPCA), is a straightforward
assay to study the modulation of protein-protein interactions (PPIs) in vivo as a response to
genetic, metabolic and drug perturbations. Using the canonical Protein Kinase A (PKA)
pathway as a test system, we show that changes in PKA activity can be measured in living
cells as a modulation of the interaction between its regulatory (Bcy1) and catalytic (Tpk1,
Tpk2) subunits in response to changes in carbon metabolism, caffeine and methyl
methanesulfonate (MMS) treatments and to modifications in the dosage of its enzymatic
regulators, the phosphodiesterases. Our results show that the DHFR-qPCA is easily
implementable and amenable to high-throughput. The DHFR-qPCA will pave the way to
the study of the effects of drug, genetic and environmental perturbations on in vivo PPI
networks, thus allowing the exploration of new spaces of the eukaryotic interactome.
Introduction
Protein-protein interactions (PPIs) are fundamental for all cellular functions (Vidal et al,
2011). In particular, they allow the cell to perceive external stimuli and generate
appropriate physiological responses. In the last decade, the development of high-throughput
techniques to study PPIs has led to the first maps of the protein interactome of several
model organisms (Giot et al, 2003; Ito et al, 2001; Krogan et al, 2006; Li et al, 2004; Rual
et al, 2005; Tarassov et al, 2008). These maps described new associations within and
among functional modules (Tarassov et al, 2008) and among protein complexes and
cellular functions (Krogan et al, 2006). One limit of these maps is that they have mainly
been determined in one single experimental condition. Studying how protein interactomes
change in different conditions would allow us to understand how cells adapt to different
environments, how they respond to drugs, stressors and genetic perturbations as well as to
understand the basis of cellular development and differentiation (Ideker & Krogan, 2012).
In order to achieve these objectives, it is necessary to develop new techniques or to adapt
existing ones. Ideally, these techniques should be simple, easily implementable and
amenable to high-throughput. Because studying how cells respond to perturbations of PPIs
requires being able to detect them in their endogenous environments, these assays should
also be performed in living cells and among proteins that are natively regulated.
Protein Complementation Assay (PCA) is a family of techniques that is now widely used to
study PPI networks (Michnick et al, 2007; Morell et al, 2009). All different variants of
PCA are based on the same principles: two complementary fragments of a reporter protein
are fused to two proteins of interest. If the two proteins interact, the activity of the reporter
protein is reconstituted such that it provides a detectable signal. One of the most cost-
effective PCA techniques is the DHFR-PCA. In this case the reporter protein is a modified
murine dihydrofolate reductase (Dhfr) that confers resistance to the chemical methotrexate
(MTX) (Tarassov et al, 2008). Therefore, the presence of an interaction confers resistance
to MTX. In yeast, the assay requires that the coding sequences of
the DHFR fragments (DHFR F[1,2] and DHFR F[3]) are inserted in the genome at the 3’
end of two genes of interest to produce proteins with the DHFR fragments fused at the
C-termini (Fig. 1).
Fig. 1 Rationale of the DHFR-PCA. The gene encoding the engineered mouse DHFR is split into two complementary fragments, DHFR F[1,2] and DHFR F[3] and fused to the genes encoding two proteins of interest A and B. The concentration of tetrahydrofolate (THF) produced increases with the amount of DHFR complexes formed and is expected to affect growth in a dose-dependent fashion.
The endogenous gene modification offers the advantage of minimally perturbing the
transcriptional regulation of the gene and does not require a modification of the native
localization of the protein. The interaction between two proteins of interest can be detected and
measured as cellular growth on media with MTX. This PCA has recently been successfully
used to determine a large part of the budding yeast PPI network in living cells (Tarassov et al,
2008). Here we show that the fitness-based yeast DHFR-PCA
(Tarassov et al, 2008), combined with high-resolution growth profiling (DHFR-qPCA), can
be successfully used to study changes in PPIs in vivo in different conditions and genetic
backgrounds, and thus represents a tool that can be used to explore new dimensions of
protein interactomes. Using high-resolution growth profiling it is possible to follow the
growth of yeast strains in microchambers and in real-time. This allows a precise and
sensitive measurement of the growth curves of several strains in parallel in different growth
media.
Materials and methods
Bioinformatic analysis of previous DHFR-PCA data
The integrated dataset on protein abundance was downloaded from PaxDb (Wang et al,
2012). Data on colony size were taken from Tarassov et al. (Tarassov et al, 2008). In order
to determine the correlation between PCA signal and protein abundance we calculated the
Spearman’s rank correlation coefficient as implemented in R (The R project for Statistical
Computing).
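
For readers who do not use R, the Spearman rank correlation can be illustrated with a short plain-Python sketch: it ranks both variables (ties receive the average rank) and computes the Pearson correlation of the ranks. The function names are hypothetical, and the sketch is not the code used for the analysis, which relied on R.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Here `x` would hold the PCA colony sizes and `y` the abundances of the corresponding proteins; how the abundances of the two interaction partners are combined into one value is not specified in this section.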
Construction of the strains used to test the DHFR-qPCA
Diploid strains with different numbers of DHFR-fused genes were constructed as follows.
Haploid strains (BY4741 and BY4742 backgrounds; BY4741: MATa, his3Δ1, leu2Δ0,
met15Δ0, ura3Δ0; BY4742: MATα, his3Δ1, leu2Δ0, lys2Δ0, ura3Δ0) carrying a single
DHFR fragment (DHFR F[1,2] for MATa strains and DHFR F[3] for MATα strains) were
purchased from Open Biosystems (http://www.openbiosystems.com) and were crossed as
described by Tarassov et al. (Tarassov et al, 2008). For each interaction listed in
Supplementary Table 1, we crossed the corresponding BY4741 MATa GENE A-DHFR
F[1,2] strain with the BY4742 MATα GENE B-DHFR F[3] strain to generate diploid
heterozygous strains GENE A-DHFR F[1,2]/GENE A, GENE B/GENE B-DHFR F[3].
These strains are 1:1 strains, as only one allele of each gene is tagged with a DHFR
fragment. Then we sporulated these 1:1 strains, dissected the tetrads and genotyped the
segregation of the DHFR fragments (Treco & Winston, 2008) in order to obtain haploid
strains carrying one allele of each gene tagged (MATa GENE A-DHFR F[1,2] GENE B-
DHFR F[3] and MATα GENE A-DHFR F[1,2] GENE B-DHFR F[3], respectively). Finally,
we crossed these haploid strains to generate 2:2 diploid strains with both alleles of each
gene tagged (MATa/MATα GENE A-DHFR F[1,2]/GENE A-DHFR F[1,2], GENE B-DHFR
F[3]/GENE B-DHFR F[3]). All strain genotypes are listed in Supplementary Table 2. We
confirmed all integrations by colony PCR (Amberg et al, 2005) with oligonucleotides listed
in Supplementary Table 3.
High-resolution growth profiling
Saturated overnight cultures in YPD with suitable antibiotics were diluted to an OD600 of 1
and then diluted 1/30 in 150µl of SC (0.669% YNB w/o ammonium sulphate w/o amino
acids, 2% glucose, drop-out -lys, -met, -ade) with methotrexate 200µg/mL (Bioshop
Canada Inc.) or the methotrexate solvent DMSO (Bioshop Canada Inc.) in a 100 well
honeycomb plate (Growth Curves USA). We measured growth profiles using a Bioscreen C
(Growth Curves USA) by reading the OD420-580 every 15 minutes with continuous agitation
at 30°C. For the experiments in different conditions we added caffeine (Bioshop Canada
Inc.) and MMS (Sigma-Aldrich, concentration 99%) in a gradient of concentrations (final
concentrations: 1 mM, 2 mM, 4 mM, 6 mM and 0.002%, 0.005%, 0.008%, 0.011%,
respectively) or we used 2% galactose instead of 2% glucose in the SC medium. Growth
curves were measured for at least three replicates from three independent precultures.
Comparison of the different parameters used to estimate growth
For each growth curve, three parameters were measured in order to obtain a quantitative
estimate of cellular growth: the maximum growth rate (∆OD/∆t), the efficiency (ODfinal-
ODinitial) and the lag time. The maximum growth rate was calculated as follows. We used a
sliding window approach to calculate a regression line for each interval spanning 31
measurements. Then, we sorted all regression coefficients and we determined the 98th
percentile. We assumed this value to be the maximum growth rate. This approach allowed
us to eliminate any extreme values that could result from experimental errors (Hill & Otto,
2007). The lag time was defined as the average time point of the interval where the
maximum growth rate was observed. Finally, the efficiency was calculated as the difference
between the final and initial ODs. All analyses were performed in R (The R project for
Statistical Computing).
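The three growth parameters described above can be illustrated with a minimal sketch. The original analyses were performed in R; the Python version below is only an illustration, and the function names (ols_slope, growth_parameters) are hypothetical. It assumes a nearest-rank definition of the 98th percentile, so that the selected slope belongs to one actual window whose mean time can serve as the lag time; the percentile rule used in the original analysis is not stated in the text.

```python
import math
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def growth_parameters(times, ods, window=31, pct=98):
    """Return (max_growth_rate, lag_time, efficiency) for one growth curve.

    window: number of consecutive measurements per regression (31 in the text).
    pct: percentile of the sorted slopes taken as the maximum growth rate.
    """
    if len(times) != len(ods) or len(times) < window:
        raise ValueError("need equally long series with at least `window` points")

    # One regression line per sliding window; keep the window's mean time too.
    fits = []
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        od = ods[start:start + window]
        fits.append((ols_slope(t, od), mean(t)))

    # Assumption: nearest-rank percentile, which always picks a real window.
    ranked = sorted(fits, key=lambda fit: fit[0])
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    max_rate, lag_time = ranked[rank - 1]

    # Efficiency is the difference between the final and initial ODs.
    efficiency = ods[-1] - ods[0]
    return max_rate, lag_time, efficiency
```

With readings every 15 minutes, times would be expressed in hours in steps of 0.25, so the maximum growth rate is returned in OD units per hour.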
Growth curve analysis
In order to measure the relative interaction score (M) we used the following algorithm. We
calculated the difference between the lag time measured in MTX (test) and the lag time
measured in DMSO (control); the DMSO measurement estimates the effect of the
perturbation on growth rate independently from the
interaction. We called this difference ∆L. Then, for each experiment, in order to evaluate
the differences in lag times between strains or conditions on a relative scale, we calculated
a relative interaction score value (M) as a proxy for effective cellular growth
(Supplementary Fig 2). We first computed the maximum ∆L among the strains or
conditions tested (∆L max). For each strain or condition we then determined M by
subtracting ∆L from ∆L max. The minimal interaction score is thus arbitrarily set to 0.
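The scoring procedure can be sketched in a few lines. This is an illustration only: the function name and the input layout are hypothetical, and the sign convention (∆L taken as the lag time in MTX minus the lag time in DMSO, so that a longer delay in MTX gives a lower M) is an assumption chosen to be consistent with M acting as a proxy for growth.

```python
def interaction_scores(lag_times):
    """Compute the relative interaction score M for each strain or condition.

    lag_times: dict mapping a label to a (lag_mtx, lag_dmso) tuple.
    Returns a dict mapping each label to M = max(delta_L) - delta_L.
    """
    # Assumed sign convention: delta_L = lag in MTX (test) - lag in DMSO (control).
    delta_l = {label: mtx - dmso for label, (mtx, dmso) in lag_times.items()}

    # The largest delay among the strains or conditions tested sets the zero of the scale.
    delta_l_max = max(delta_l.values())
    return {label: delta_l_max - d for label, d in delta_l.items()}


# Hypothetical lag times in hours, for illustration only.
example = {"strain_1": (20.0, 10.0), "strain_2": (15.0, 10.0), "strain_3": (30.0, 12.0)}
scores = interaction_scores(example)
```

By construction, the strain or condition with the largest ∆L receives M = 0, and all other scores are non-negative.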
Spot assays
An overnight preculture in YPD was diluted to an OD600 of 1 and a ten-fold serial dilution
was performed. Five µL of the cell suspensions were inoculated onto SC medium plates
with methotrexate 200µg/mL or DMSO.
Interactions between Ras and different RBD mutants
The DHFR F[1,2] coding sequence was amplified by PCR with oligonucleotides that
contain restriction sites BspEI and XhoI respectively, using the plasmid pAG25-linker-
DHFR-F[1,2]-ADHterm (Tarassov et al, 2008) as template and subcloned in the plasmid
p413Gal1-Ras-yCD-F[1] (Ear & Michnick, 2009) cutting with BspEI and XhoI to fuse the
Ras coding gene to the DHFR F[1,2]. Sequence encoding the DHFR F[3] was amplified
from pAG32-linker-DHFR(3)-ADHterm (Tarassov et al, 2008) using oligos that contain
restriction sites BspEI and XhoI. The resulting PCR fragment was digested with the
restriction enzymes BspEI and XhoI and subcloned in the six plasmids p415Gal1-RBD-
yCD-F[2] (Ear & Michnick, 2009) that contain the wild-type RBD (residues 55-133) and five
mutant RBDs (residues 55-133). All constructed plasmids were verified by sequencing. A BY4741
strain was transformed with the plasmid p413Gal1-Ras-DHFR F[1,2] while six BY4742
strains were transformed with the plasmids p415Gal1-RBD-DHFR F[3] containing the
wild-type and the five RBD mutants. The BY4741 strain was crossed with all the six
BY4742 strains. The resulting diploid strains containing both plasmids were grown
overnight in SC-Raffinose medium (0.669% YNB w/o ammonium sulphate w/o amino
acids, 2% raffinose, drop-out –lys, -met, –ade, -his, -leu). High-resolution growth profiling
experiments were performed in SC-Galactose medium (0.669% YNB w/o ammonium
sulphate w/o amino acids, 2% galactose, drop-out –lys, -met, –ade, -his, -leu). Growth
curves were measured in two experiments. In the first one five replicates from five
independent pre-cultures were used to test each interaction. In the second one twelve
independent replicates were used to test each interaction. Fig 4 shows the combined results
of the two experiments.
PKA regulatory and catalytic subunits interactions in different conditions
In order to study the perturbation of the PKA complex in different conditions, we
constructed diploid strains using strains from the DHFR collection (Tarassov et al, 2008).
We crossed the MATa strains (BY4741 background) with the Tpk1 or the Tpk2 genes fused
to the DHFR F[1,2] with a MATα strain (BY4742 background) having the Bcy1 gene
tagged with DHFR F[3] to generate the diploid strains JFL001 and JFL002.
Strains carrying one additional copy of the PDE genes (PDE1 or PDE2) were obtained by
transforming the JFL001 and JFL002 strains with plasmids from the MoBY collection (Ho
et al, 2009). Plasmids of the MoBY collection carry one yeast gene (in this case PDE1 or
PDE2) under the control of its native promoter and terminator as well as a URA selection
cassette and a yeast centromeric sequence (CEN) (Ho et al, 2009). We also transformed the
JFL001 and JFL002 strains with an empty pRS316 plasmid, which also has a URA cassette
and a CEN sequence, to generate a no insert control strain. We then constructed strains
carrying a deletion of one allele of a different PDE gene. We performed two independent
transformations to obtain haploid MATα BY4742 strains where TPK1 or TPK2 genes were
tagged with the DHFR F[1,2] and BCY1 tagged with the DHFR F[3] (JFL003 and JFL004).
Then we crossed the JFL003 and JFL004 strains with the ho∆ (control strain), pde1∆ and
pde2∆ MATa strains from the YKO deletion collection (Winzeler et al, 1999). The strains
of the YKO collection are BY4741 MATa strains where a single gene is interrupted by a
fragment containing a KanMX cassette, which provides resistance to the antibiotic G418.
We therefore obtained diploid strains that carry a deletion of one allele of a different PDE
gene (JFL005, JFL006, JFL007 and JFL008) and control strains (JFL009 and JFL010). All
strains were grown in rich medium (YPD) with antibiotics. Strains carrying the DHFR
F[1,2] cassette were grown in presence of Nourseothricin (Werner BioAgents; 100
µg/mL), those carrying DHFR F[3] were grown in presence of Hygromycin B (Bioshop
Canada Inc.; 250 µg/mL). Finally, strains carrying a KanMX cassette were grown in
presence of G418 (Bioshop Canada Inc.; 200 µg/mL).
Co-immunoprecipitation and western blotting of Tpk2-Bcy1 (PKA)
Strain JFL011 used for immunoprecipitation was constructed as follows. Plasmid pYM17
(Janke et al, 2004) that contains six repeats of the HA tag and a natNT2 marker was
amplified with specific oligonucleotides for the integration at the BCY1 locus (C-terminus)
in a BY4742 strain. Then, we amplified plasmid pYM20 (Janke et al, 2004) that has nine
Myc tags in tandem repeats with oligos for integration at the TPK2 locus (C-terminus). All
oligonucleotides mentioned above are listed on Supplementary Table 3. This PCR product
was transformed into a BY4741 strain for homologous recombination. These haploid
strains were crossed to generate the JFL011 diploid strain. Six independent cultures of the
strain JFL011 were grown in 5mL of YPD overnight at 30°C with shaking. The next day,
cells were diluted to OD600 of 0.1 in 100mL and grown to OD600 of 0.5. Three of the
cultures were treated with caffeine (final concentration 6mM in water) while the
remaining three (controls) were treated with the same volume of water. All cultures were
incubated for 30 min. After incubation, the equivalent of 15 OD600 units of cells was collected,
washed once with 2mL of zymolyase buffer (1M sorbitol, 0.01M phosphate buffer pH 7.6,
0.02M EDTA) and then resuspended in 4mL of the same buffer. 4µL of β-mercaptoethanol
and 5µL of Zymolyase 20T (20mg/mL) were added to the cells and
incubated at 37°C for 23 min with agitation. The spheroplasts were washed with 1M
sorbitol, resuspended in 200µL of lysis buffer (50mM Tris-HCl pH 7.4, 0.01M EDTA,
150mM NaCl, 1% Triton X-100, PMSF 1mM, Aprotinin 2µg/mL, Leupeptin 20µg/mL,
Pepstatin A 2µg/mL) and incubated 2h on ice. Lysates were immunoprecipitated for 2h at
4°C with THE™ c-Myc Tag Antibody, mAB, Mouse (GenScript A00704) coupled to
Dynabeads M-280 Tosylactivated (Life Technologies Corporation), washed 3 times with
500µL of cold washing buffer (0.1M Na-phosphate pH 7.4, 0.08% Tween 20) and eluted in
40 µL of boiling 2X Laemmli Buffer for 10 min. The primary antibodies for Western
blotting were the rabbit Anti-HA antibody (Rockland 600-401-384) for the HA tag, and
THE™ c-Myc Tag Antibody. The secondary antibodies were IRDye 680 conjugated Goat
Anti-Rabbit (926-32221) and IRDye 800 conjugated Goat Anti-mouse (926-32210)
(LiCor). Dried membranes were scanned and processed using an Odyssey Infrared Imaging
System. Pixel quantification was performed using ImageJ64 (ImageJ).
Results and discussion
The DHFR-qPCA signal reflects the amount of protein complex formed in the cell
We first examined whether the DHFR-PCA signal provides a quantitative measure of PPIs
(Fig. 1), which is a minimal requirement for the measurement of changes in PPIs in
different conditions. Here by quantitative we mean that the PCA signal correlates with the
quantity of protein complex formed by two interacting proteins and changes in PCA signal
can be reproducibly measured. We first tested this hypothesis by combining protein
abundance data (Wang et al, 2012) with the PCA data from Tarassov et al. (Tarassov et al,
2008), where PCA signal is measured as colony size on agar plates containing
methotrexate. We found a highly significant correlation between the average abundance of
two interacting proteins and PCA signal (rho = 0.18, p-value < 2.2e-16), and this
correlation is significantly improved by considering the abundance of the least expressed
protein of the interacting pair (rho = 0.32, p-value < 2.2e-16, Fig 2). This result suggests
that PCA signal reflects the abundance of the protein complexes formed.
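The comparison between the two abundance summaries can be illustrated as follows. This is a minimal sketch with hypothetical toy values, not the data of Wang et al. or Tarassov et al.; the function names are illustrative, and Spearman's rho is computed here as the Pearson correlation of average ranks.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical (abundance of A, abundance of B, colony size) triplets.
pairs = [(100, 5, 1.0), (50, 40, 3.0), (1000, 10, 1.5), (20, 30, 2.5)]
signal = [size for _, _, size in pairs]
rho_mean = spearman_rho([(a + b) / 2 for a, b, _ in pairs], signal)
rho_min = spearman_rho([min(a, b) for a, b, _ in pairs], signal)
```

Using the least abundant partner reflects the idea that the amount of complex cannot exceed the amount of its scarcer subunit, which is why this summary is expected to track the PCA signal more closely than the average.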
Fig. 2 Relationship between colony size (DHFR-PCA signal) and the abundance of the least
expressed protein of interacting pairs. Grey dots represent raw data and blue dots binned data.
We then tested whether changes in PPIs could be detected in the absence of modification of
protein abundance. We randomly selected 15 PPIs from the yeast DHFR-PCA network (Tarassov et al, 2008)
with high (7 pairs), medium (2 pairs) and low (6 pairs) protein abundances (see
Supplementary Table 1). We constructed yeast strains carrying different combinations
of DHFR tagged genes in diploid cells: one allele of each locus (1:1) or both alleles of each
locus (2:2) (Fig. 3A). These constructs allowed us to manipulate directly, by four-fold, the
amount of DHFR reconstituted without modifying protein abundance.
Fig. 3 (A) Fifteen diploid strains with one (1:1) or both alleles (2:2) of two genes (GENE A, light
blue and GENE B, yellow) tagged with DHFR fragments (DHFR F[1,2] in red and DHFR F[3] in
dark blue, respectively) were constructed in order to test whether the DHFR-PCA signal could be
modulated without changing protein abundance (the genes are under the control of their native
promoters). In these diploid strains, only the number of alleles of each locus that are fused to the
DHFR fragments varies. (B) Parameters used to describe yeast growth curves (Slope (∆OD/∆t),
Efficiency (ODfinal-ODinitial) and Lag time) and their correlations. (C) Example of raw data of a
DHFR-qPCA experiment showing the growth profiles for the Vps29-Vps35 interaction. Each curve
represents an independent biological replicate. While in DMSO (control) the 1:1 and 2:2 strains
have the same growth profile, in methotrexate (MTX) the 2:2 has a significantly shorter lag time
than the 1:1 (t-test; p-value < 0.001) (D) Results of DHFR-qPCA test for 14 PPIs (1:1 and 2:2
backgrounds; 15th interaction shown in panel C). All independent replicates are shown for each
interaction. Grey points represent growth in DMSO (control), while colored points represent growth
in MTX. Red points show interactions among proteins with high expression levels; blue, medium
expression and black low expression. Dashed lines associate the same interaction in the two
different backgrounds. The significance of the difference in lag time between the 1:1 and 2:2
backgrounds in MTX is shown for each interaction (t-test; ***: p-value < 0.001; *: p-value between
0.01 and 0.05; NS: non-significant). (E) Spot-dilution assays show that differences in growth rates
can also be detected on solid medium. Results for the Vps29-Vps35 interaction are shown (cell
dilution 1:10). An isogenic strain carrying the two DHFR fragments alone expressed on plasmids
provides a negative control. DMSO is the MTX solvent and is thus used as a control for growth.
When only one allele of each gene is fused to the DHFR fragments (A’ and B’ for the
tagged alleles, A and B for the untagged alleles), four types of protein complexes can be
formed: A’-B’, A-B’, A’-B, A-B. Therefore in the 1:1 strains, only the A’-B’ complexes
(1/4) would provide a DHFR-PCA signal. If both alleles of both genes are tagged (2:2
strains), all complexes would be of type A’-B’ and thus 100% of PPIs of the complex
would provide a DHFR-PCA signal. In both cases, the concentrations of proteins A and B
are unaltered. We applied high-resolution growth profiling (see Methods) to these strains in
DMSO (control) and MTX and estimated growth parameters from the growth profile. As a
first step, we determined which growth parameter would maximise the power to detect
changes in PPIs. We compared the slope, the efficiency and the lag time required to reach
the maximum growth rate (Fig. 3B). We found that all parameters are strongly correlated
with each other (Fig. 3B) and thus largely redundant. Because the lag time maximizes
correlation between replicates (rho = 0.90, p-value < 6.81e-5) and is therefore less sensitive
to experimental error, we used it as an estimate of growth and thus PCA signal. For 14 out
of 15 protein pairs, lag time was significantly lower for the 2:2 strains than for the 1:1
strains (Fig. 3C-D). These four-fold differences in DHFR-PCA signal can also be detected
on solid medium (Fig. 3E). We also sought to test whether the PCA signal would reflect the
known dissociation constant (Kd) of a protein complex. Block et al. (Block et al, 1996)
showed how point mutations in the Ras Binding Domain (RBD) affect the Kd of the Ras-
RBD complex. We tested the interactions between Ras and six RBD mutants with different
Kds by DHFR-qPCA. As expected, a decrease in Kd corresponds to an increase in PCA
signal (Fig. 4).
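The four-fold difference expected between the 1:1 and 2:2 strains follows from a simple enumeration of the possible complexes. The sketch below is illustrative only: the function name is hypothetical, and it assumes that both alleles of a gene are expressed equally and that tagged and untagged proteins pair at random.

```python
from fractions import Fraction
from itertools import product


def reporting_fraction(alleles_a, alleles_b):
    """Fraction of A-B complexes in which both partners carry a DHFR fragment.

    alleles_a, alleles_b: one boolean per allele, True if the allele is tagged.
    Assumes equal expression of both alleles and random pairing of the products.
    """
    combos = list(product(alleles_a, alleles_b))
    reporting = sum(1 for a, b in combos if a and b)
    return Fraction(reporting, len(combos))


one_to_one = reporting_fraction((True, False), (True, False))  # A'/A and B'/B
two_to_two = reporting_fraction((True, True), (True, True))    # A'/A' and B'/B'
fold_change = two_to_two / one_to_one
```

Under these assumptions the 1:1 strains report one complex in four and the 2:2 strains report all of them, which gives the four-fold difference in reconstituted DHFR while the total amount of proteins A and B stays the same.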
Fig. 4 DHFR-qPCA signal for interactions between Ras and different RBD mutants. PCA signal
increases with a decrease in Kd of the different mutant Ras-RBD protein complexes. The R89L
mutant shows no interaction and its M score is arbitrarily set on this graph. The numbers in
parenthesis indicate the Kd of each complex in µM units. Seventeen independent biological
replicates in two independent experiments were used to perform DHFR-qPCA assays for each
mutant Ras-RBD complex.
Altogether, these results (Fig. 2,3,4) indicate that the DHFR-qPCA provides a quantitative
readout of the amount of protein complexes formed between interaction partners, even
when changes in the amount of complex formed do not involve changes in protein
abundance.
DHFR-qPCA allows the study of the effects of metabolic, drug and genetic perturbations on protein complexes
We next sought to test directly the approach on a canonical signaling pathway by using
different perturbations and combinations of perturbations. For this purpose, we chose the
well-characterized protein kinase A (PKA) complex. The PKA is a tetramer formed by two
regulatory and two catalytic subunits (Zhang et al, 2012) that is regulated by cAMP levels
(Fig. 5A).
Fig. 5 Condition-dependence of the interactions between the PKA regulatory and catalytic
subunits in response to different perturbations.
(A) The activation/inactivation of the PKA is regulated by intracellular levels of cAMP, which is
modulated among other mechanisms by the enzymes adenylate cyclase and phosphodiesterase
(Pde1 and Pde2 in yeast). (B) Comparison of the DHFR-qPCA signal for the Bcy1-Tpk2 interaction
in glucose and galactose (C) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in cells grown in
media supplemented with caffeine at different concentrations. (D) Co-immunoprecipitation of Bcy1
and Tpk2 in standard medium (YPD) and in YPD supplemented with caffeine confirms the DHFR-
qPCA results (t-test, p-value < 0.05). (E) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in cells
grown in media supplemented with methyl methanesulfonate (MMS). (F) DHFR-qPCA signal for
the Bcy1-Tpk2 interaction in cells grown in media supplemented with galactose and MMS at
different concentrations. (G) DHFR-qPCA signal for the Bcy1-Tpk2 interaction in strains carrying
an additional copy (on a low copy number plasmid) or a deletion of one copy (heterozygous strain)
of the genes coding for the PDE enzymes (left and right panel, respectively). In all cases, n
represents the number of independent biological replicates.
The PKA pathway regulates different processes such as glucose metabolism (Dechant &
Peter, 2008), protein translation (Ashe et al, 2000), ribosome biogenesis (Martin et al,
2004), stress responses (Ramachandran et al, 2011), autophagy (Budovskaya et al, 2005)
and lifespan (Longo, 2003). In yeast, the regulatory subunits are encoded by the gene
BCY1, while the catalytic subunits are encoded by three different genes: TPK1, TPK2 and
TPK3 (Johnson et al, 1987; Toda et al, 1987). We studied the interactions between the
regulatory subunit (Bcy1) and the catalytic subunits (Tpk1 and Tpk2, respectively) by
DHFR-qPCA in response to four different perturbations: 1) galactose, which leads to a
decrease in PKA activity relative to glucose (Portela et al, 2003); 2) caffeine, which was
shown to indirectly inhibit the PKA (Soulard et al, 2010) through the TORC1 pathway; 3)
methyl methanesulfonate (MMS), a DNA damaging agent, which has recently been
associated with the PKA pathway (Bandyopadhyay et al, 2010) and was shown to lead to
the phosphorylation of the PKA regulatory subunit (Searle et al, 2011) and 4) dosage of the
PKA regulator PDE (phosphodiesterase), which negatively regulates the PKA by degrading
cAMP (Ma et al, 1999). Each of these conditions is known or has been hypothesized to
affect the PKA pathway in yeast but has not been assessed at the level of the protein
complex dissociation.
We estimated the effect of each perturbation in control conditions (without MTX) and
subtracted this effect to estimate the net PCA signal in MTX. We obtained the difference in
lag time between MTX and DMSO (∆L) from which we computed a relative interaction
score value (M, see Methods). Our results show that the DHFR-qPCA can detect changes
in the PKA activity as a response to metabolic, drug and genetic perturbations (Fig. 5 for
Bcy1-Tpk2; Supplementary Fig 1 for Bcy1-Tpk1). We observed more interaction between
the regulatory and catalytic subunits of the PKA in galactose (Fig. 5B; Supplementary Fig
1A) and in the presence of caffeine (Fig. 5C; Supplementary Fig 1B) when compared with
standard growth conditions. This confirms that the PKA is inhibited in these conditions
(Portela et al, 2003; Soulard et al, 2010) and this inhibition involves changes in Bcy1-Tpk1
and Bcy1-Tpk2 interactions. Further, our results show that the DHFR-qPCA can detect
concentration-dependent effects on the PKA complex, as PKA inhibition increases as
caffeine concentration increases (Fig. 5C; Supplementary Fig 1B). In this particular case,
we also measured changes in PKA complex using co-immunoprecipitation and confirmed
this quantitative effect on the PKA (for Bcy1-Tpk2) (Fig. 5D). The DHFR-qPCA signal
appears to provide a larger amplitude of changes in the interaction than co-
immunoprecipitation, which is most likely due to the fact that the DHFR-qPCA assay is a
fitness based assay which serves as a signal amplifier. This property could be exploited for
instance for the measurement of subtle changes in PPIs. We also found that the effect of
MMS on the PKA was stronger in galactose than in glucose (Fig. 5E,F; Supplementary Fig
1C,D). The PKA might indeed be maximally activated in glucose, so that the addition of MMS
may not activate it further to an extent that can be detected with this assay. It is also
possible that DNA damage affects the PKA pathway in a carbon source-dependent fashion,
with a greater effect in non-fermentable carbon sources. Finally, we found that modifying
the dosage of PDE1 or PDE2 leads to significant changes in the amount of PKA complex
formed, consistent with their roles as negative regulators of the PKA complex (Ma et al,
1999) (Fig. 5G; Supplementary Fig 1E). This result shows that the DHFR-qPCA allows the
detection of subtle quantitative genetic perturbations (50% of gene product) that affect
PPIs.
Conclusions
PPIs regulate many cellular processes and are therefore expected to be dynamic and
condition-dependent, i.e. the degree of association among proteins will depend on the
conditions to which cells are exposed. There is therefore a strong need for the development
of simple assays to measure changes in PPIs. Here we show that the yeast DHFR-qPCA is a
quantitative technique that allows the screening of PPIs in different conditions at low cost.
Our results, with those of Schlecht et al. (Schlecht et al, 2012), show that the DHFR-qPCA
can be used to study PPIs in different conditions and at high-throughput. Ninety-six
interactions could be tested simultaneously in a standard plate-reader or more in dedicated
instruments (see Methods). Unlike PCA assays based on luciferase (Stefan et al, 2007), this
assay does not allow dynamic PPIs to be measured in real time, because it is based on fitness.
However, this offers the advantage that fitness differences among conditions or strains can
be amplified through generations and may thus allow the detection of very small changes in
interactions. The PCA signal might be saturated for interactions with low Kd and/or highly
abundant proteins. However, we expect that this would occur for a limited number of
interactions under natural conditions as we see a strong correlation between the abundance
of proteins and PCA signal over five orders of magnitude of protein abundance without
saturation (Fig. 2). With the availability of entire yeast collections tagged with the DHFR
fragments (Tarassov et al, 2008), any pairwise interaction of interest could be investigated
in different conditions and in different genetic backgrounds. The DHFR-qPCA will pave
the way to the study of the effects of drug, genetic and environmental perturbations on in
vivo PPI networks, thus allowing the exploration of new spaces of the model eukaryotic
interactome.
Acknowledgements
This work was supported by a Canadian Institutes of Health Research (CIHR) Grant GMX-191597
and partly by a Human Frontier Science Program grant (RGY0073/2010) and a
Genome Québec grant. C. R. Landry is a CIHR New Investigator. L. Freschi was supported
by a fellowship from the Fonds de recherche du Québec – Nature et technologies
(FRQNT). L. Freschi and F. Torres-Quiroz were supported by fellowships from the Quebec
Research Network on Protein Function, Structure and Engineering (PROTEO). We thank
all members of the Landry laboratory and N. Aubin-Horth for their comments on the
manuscript.
Supplementary information
Gene A-DHFR F[1,2]   Gene B-DHFR F[3]   Interaction (gene names)   Interaction (ORF names)   Abundance
VMA21                VPH1               VMA21-VPH1                 YGR105W-YOR270C           H
ARX1                 YBR267W            ARX1-YBR267W               YDR101C-YBR267W           H
DHH1                 EDC3               DHH1-EDC3                  YDL160C-YEL015W           H
DHH1                 LSM7               DHH1-LSM7                  YDL160C-YNL147W           H
DHH1                 PBP1               DHH1-PBP1                  YDL160C-YGR178C           H
SGN1                 PUB1               SGN1-PUB1                  YIR001C-YNL016W           H
TOM70                ALO1               TOM70-ALO1                 YNL121C-YML086C           H
CKB1                 CKA2               CKB1-CKA2                  YGL019W-YOR061W           M
NOT5                 MOT2               NOT5-MOT2                  YPR072W-YER068W           M
LSB3                 CUE5               LSB3-CUE5                  YFR024C-A-YOR042W         L
MMS2                 SIP5               MMS2-SIP5                  YGL087C-YMR140W           L
PEX14                PEX17              PEX14-PEX17                YGL153W-YNL214W           L
SLA1                 END3               SLA1-END3                  YBL007C-YNL084C           L
YKE2                 GIM5               YKE2-GIM5                  YLR200W-YML094W           L
VPS29                VPS35              VPS29-VPS35                YHR012W-YJL154C           L
For each pair we used the data by Ghaemmaghami et al. 1 and we calculated the average abundance. Then, we classified the pairs in 3 classes: low abundance (L), medium abundance (M) and high abundance (H).
Supplementary Table 1. Protein-protein interactions selected to test for the relationship between growth and the amount of DHFR complex formed.
References
1. S. Ghaemmaghami, W. Huh, K. Bower, R. W. Howson, A. Belle, N. Dephoure, E. K. O'Shea and J. S. Weissman, Nature, 2003, 425, 737-741.
Strain Genotype
LTQ001 MATa, VMA21-DHFR F[1,2]-natNT2, VPH1-DHFR F[3]-hphNT1
LTQ002 MATa, ARX1-DHFR F[1,2]-natNT2, YBR267W-DHFR F[3]-hphNT1
LTQ003 MATa, DHH1-DHFR F[1,2]-natNT2, EDC3-DHFR F[3]-hphNT1
LTQ004 MATa, DHH1-DHFR F[1,2]-natNT2, LSM7-DHFR F[3]-hphNT1
LTQ005 MATa, DHH1-DHFR F[1,2]-natNT2, PBP1-DHFR F[3]-hphNT1
LTQ006 MATa, SGN1-DHFR F[1,2]-natNT2, PUB1-DHFR F[3]-hphNT1
LTQ007 MATa, TOM70-DHFR F[1,2]-natNT2, ALO1-DHFR F[3]-hphNT1
LTQ008 MATa, CKB1-DHFR F[1,2]-natNT2, CKA2-DHFR F[3]-hphNT1
LTQ009 MATa, NOT5-DHFR F[1,2]-natNT2, MOT2-DHFR F[3]-hphNT1
LTQ010 MATa, LSB3-DHFR F[1,2]-natNT2, CUE5-DHFR F[3]-hphNT1
LTQ011 MATa, MMS2-DHFR F[1,2]-natNT2, SIP5-DHFR F[3]-hphNT1
LTQ012 MATa, PEX14-DHFR F[1,2]-natNT2, PEX17-DHFR F[3]-hphNT1
LTQ013 MATa, SLA1-DHFR F[1,2]-natNT2, END3-DHFR F[3]-hphNT1
LTQ014 MATa, YKE2-DHFR F[1,2]-natNT2, GIM5-DHFR F[3]-hphNT1
LTQ015 MATa, VPS29-DHFR F[1,2]-natNT2, VPS35-DHFR F[3]-hphNT1
LTQ016 MATα, VMA21-DHFR F[1,2]-natNT2, VPH1-DHFR F[3]-hphNT1
LTQ017 MATα, ARX1-DHFR F[1,2]-natNT2, YBR267W-DHFR F[3]-hphNT1
LTQ018 MATα, DHH1-DHFR F[1,2]-natNT2, EDC3-DHFR F[3]-hphNT1
LTQ019 MATα, DHH1-DHFR F[1,2]-natNT2, LSM7-DHFR F[3]-hphNT1
LTQ020 MATα, DHH1-DHFR F[1,2]-natNT2, PBP1-DHFR F[3]-hphNT1
LTQ021 MATα, SGN1-DHFR F[1,2]-natNT2, PUB1-DHFR F[3]-hphNT1
LTQ022 MATα, TOM70-DHFR F[1,2]-natNT2, ALO1-DHFR F[3]-hphNT1
LTQ023 MATα, CKB1-DHFR F[1,2]-natNT2, CKA2-DHFR F[3]-hphNT1
LTQ024 MATα, NOT5-DHFR F[1,2]-natNT2, MOT2-DHFR F[3]-hphNT1
LTQ025 MATα, LSB3-DHFR F[1,2]-natNT2, CUE5-DHFR F[3]-hphNT1
LTQ026 MATα, MMS2-DHFR F[1,2]-natNT2, SIP5-DHFR F[3]-hphNT1
LTQ027 MATα, PEX14-DHFR F[1,2]-natNT2, PEX17-DHFR F[3]-hphNT1
LTQ028 MATα, SLA1-DHFR F[1,2]-natNT2, END3-DHFR F[3]-hphNT1
LTQ029 MATα, YKE2-DHFR F[1,2]-natNT2, GIM5-DHFR F[3]-hphNT1
LTQ030 MATα, VPS29-DHFR F[1,2]-natNT2, VPS35-DHFR F[3]-hphNT1
JFL001 MATa/MATα, TPK1-DHFR F[1,2]-natNT2/TPK1, BCY1/BCY1-DHFR F[3]-hphNT1
JFL002 MATa/MATα, TPK2-DHFR F[1,2]-natNT2/TPK2, BCY1/BCY1-DHFR F[3]-hphNT1
JFL003 MATα, TPK1-DHFR F[1,2]-natNT2, BCY1-DHFR F[3]-hphNT1
JFL004 MATα, TPK2-DHFR F[1,2]-natNT2, BCY1-DHFR F[3]-hphNT1
JFL005 MATa/MATα, TPK1/TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, pde1∆-KanMX/PDE1
JFL006 MATa/MATα, TPK1/TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, pde2∆-KanMX/PDE2
JFL007 MATa/MATα, TPK2/TPK2-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, pde1∆-KanMX/PDE1
JFL008 MATa/MATα, TPK2/TPK2-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, pde2∆-KanMX/PDE2
JFL009 MATa/MATα, TPK1/TPK1-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, ho∆-KanMX/HO
JFL010 MATa/MATα, TPK2/TPK2-DHFR F[1,2]-natNT2, BCY1/BCY1-DHFR F[3]-hphNT1, ho∆-KanMX/HO
JFL011 MATa/MATα, TPK2-Myc-hphNT1/TPK2, BCY1/BCY1-HA-natNT2
Supplementary Table 2. Genotypes of the strains constructed in this study.
Experiments Primer Information Primer Sequence 5’ to 3’
qPCA C Oligo Forward YGR105W (VMA21) GTTTAGCTGCTGCAATGGCC
qPCA C Oligo Forward YOR270C (VPH1) AAGTTTTTCGTGGGTGAAGG
qPCA C Oligo Forward YDR101C (ARX1) GCCAAGGATAAGAGGTTCGG
qPCA C Oligo Forward YBR267W (YBR267W) GACTCAACAGCGTGTTTGGC
qPCA C Oligo Forward YDL160C (DHH1) ACAGGCGTATCCTCCACCGC
qPCA C Oligo Forward YEL015W (EDC3) CTGGCTGGCCTTTGATTGCC
qPCA C Oligo Forward YDL160C (DHH1) ACAGGCGTATCCTCCACCGC
qPCA C Oligo Forward YNL147W (LSM7) TTATAGGTGTCCTAAAAGGC
qPCA C Oligo Forward YDL160C (DHH1) ACAGGCGTATCCTCCACCGC
qPCA C Oligo Forward YGR178C (PBP1) AGCGAACGGGTCGGCAATGC
qPCA C Oligo Forward YIR001C (SGN1) AAAAACACTTCAACAGTGCC
qPCA C Oligo Forward YNL016W (PUB1) ACAGCAGCAGCAACAGGGCG
qPCA C Oligo Forward YNL121C (TOM70) ATTACTTTTGCTGAAGCCGC
qPCA C Oligo Forward YML086C (ALO1) AGGATTTGAAAAAGTTCCGG
qPCA C Oligo Forward YGL019W (CKB1) GATGAGGCAGTATCTGGTCC
qPCA C Oligo Forward YOR061W (CKA2) ATTAGCTGTTCCTGAAGTGG
qPCA C Oligo Forward YPR072W (NOT5) AATCTGAGGAGGAATCATGG
qPCA C Oligo Forward YER068W (MOT2) TAAGGTTCCTATTCAGCAGC
qPCA C Oligo Forward YFR024C-A (LSB3) ACCATTCAGAAAGGGTGACG
qPCA C Oligo Forward YOR042W (CUE5) GAACCCCTGGATACTACACC
qPCA C Oligo Forward YGL087C (MMS2) ACTGGAAAAGAGCCTACACC
qPCA C Oligo Forward YMR140W (SIP5) CGAACTTGAAGATCAAATGG
qPCA C Oligo Forward YGL153W (PEX14) GATAGCAACGCCTCCATTCC
qPCA C Oligo Forward YNL214W (PEX17) TTAACAGATAGGTCCCGAGC
qPCA C Oligo Forward YBL007C (SLA1) TTACAGAACCAACCTACTGG
qPCA C Oligo Forward YNL084C (END3) GTCGATAACTGATGACTTGG
qPCA C Oligo Forward YLR200W (YKE2) ATGCGAAAAGAACATAAGGG
qPCA C Oligo Forward YML094W (GIM5) TTCCTTGTCCATCGAGGCCC
qPCA C Oligo Forward YHR012W (VPS29) TAATTCACCAAGTTTCTGCC
qPCA C Oligo Forward YJL154C (VPS35) CACCAACTGAAGTATATCCC
qPCA Oligo Reverse to test DHFR integration CCATCTTTTCGTAAATTTCTG
PKA BCY1-DHFR integration Forward TGCAGTAGACGTATTAAAGCTCAATGATCCTACAAGACATGGCGGTGGCGGATCAGGAGGC
PKA BCY1-DHFR integration Reverse AGGAAATTCATGTGGATTTAAGATCGCTTCCCCTTTTTACTTCGACACTGGATGGCGGCGTTAG
PKA TPK1-DHFR integration Forward TCAAGGTGAAGACCCATATGCTGATCTTTTCCGGGACTTCGGCGGTGGCGGATCAGGAGGC
PKA TPK1-DHFR integration Reverse AATATAGATACGAGAGGAAAATACAACAAAACATTAGTCATTCGACACTGGATGGCGGCGTTAG
PKA TPK2-DHFR integration Forward TCAAGGCGATGATCCATATGCTGAATACTTTCAAGATTTCGGCGGTGGCGGATCAGGAGGC
PKA TPK2-DHFR integration Reverse GTACTTGAAAATTGTTTTTGTGTTTTTTGGTTCATGGAACTTCGACACTGGATGGCGGCGTTAG
PKA C Oligo Forward BCY1 GTGATCAAGGGGAGAACTTTTATTT
PKA C Oligo Forward TPK1 CGACTCTAACACGATGAAAACCTAT
PKA C Oligo Forward TPK2 GGTATCGGTGACACGTCT
CoIP BCY1-HA Forward TACTGGGTCCTGCAGTAGACGTATTAAAGCTCAATGATCCTACAAGACATCGTACGCTGCAGGTCGAC
CoIP BCY1-HA Reverse AAGAGAAAGGAAATTCATGTGGATTTAAGATCGCTTCCCCTTTTTACTTAATCGATGAATTCGAGCTCG
CoIP TPK1-MYC Forward ACTACGGTGTTCAAGGTGAAGACCCATATGCTGATCTTTTCCGGGACTTCCGTACGCTGCAGGTCGAC
CoIP TPK1-MYC Reverse AAAAAAAAATATAGATACGAGAGGAAAATACAACAAAACATTAGTCATTAATCGATGAATTCGAGCTCG
CoIP TPK2-MYC Forward ATTATGGTATTCAAGGCGATGATCCATATGCTGAATACTTTCAAGATTTCCGTACGCTGCAGGTCGAC
CoIP TPK2-MYC Reverse AGAGAAAGTACTTGAAAATTGTTTTTGTGTTTTTTGGTTCATGGAACTTAATCGATGAATTCGAGCTCG
CoIP Oligo Reverse to test MYC or HA integration CGACAGTCACATCATGC
Kd Oligo used to check Ras and RBD's plasmids constructions CAACATTTTCGGTTTGTATTAC
Kd Oligo Forward to amplify DHFR F[1,2] and clone in p413Gal1-Ras contain a restriction site BspEI ATCGCAGGCTCCGGAGGTGGAGGTTCTGGAGGTATGGTTCGACCATTGAACTGC
Kd Oligo Reverse to amplify DHFR F[1,2] and clone in p413Gal1-Ras contain a restriction site XhoI CGATGCCCGCCCCCGCTCGAGCTATGTTCTAGATTAGGTACCCAA
Kd Oligo Forward to amplify DHFR F[3] and clone in p415Gal1-RBD contain a restriction site BspEI CGTTGAGGCTCCGGAGGTGGAGGTTCTGGAGGTATGAGTAAAGTAGACATGGTT
Kd Oligo Reverse to amplify DHFR F[3] and clone in p415Gal1-RBD contain a restriction site XhoI AGATCGCCGCCCCCGCTCGAGCTAAGTTCTAGATTAGTCTTTCTT
C Oligos were used to confirm the integration at the proper locus.
Supplementary Table 3. Oligonucleotides used in this study.
Supplementary Fig. 1. Dynamics of the interactions between the PKA regulatory and catalytic subunits in response to different perturbations. (A) Comparison of the DHFR-qPCA signal for the Bcy1-Tpk1 interaction in glucose and galactose (B) DHFR-qPCA signal for the interaction Bcy1-Tpk1 in cells grown in media supplemented with caffeine at different concentrations. (C) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with methyl methanesulfonate. (D) DHFR-qPCA signal for the Bcy1-Tpk1 interaction in cells grown in media supplemented with galactose and methyl methanesulfonate at different concentrations. (E) DHFR-qPCA signal for the interaction Bcy1-Tpk1 in strains carrying an additional copy (on a low copy number plasmid) or a deletion of one copy (heterozygous strain) of the genes coding for the PDE enzymes (left and right panel, respectively). In all cases, n represents the number of independent replicates.
Supplementary Fig. 2. Comparing PPIs using the M relative interaction score. (A) The difference between the lag times in DMSO and MTX (∆L) is calculated for all interactions. (B) M scores are calculated for each interaction by subtracting the ∆L of a specific interaction from the maximum ∆L of all interactions. (C) Bar graphs are generated to compare the relative interaction scores of all interactions tested.
157
References Albuquerque CP, Smolka MB, Payne SH, Bafna V, Eng J, Zhou HL (2008) A multidimensional chromatography technology for in-depth phosphoproteome analysis. Mol Cell Proteomics 7: 1389-1396 Alfaro JF, Gong CX, Monroe ME, Aldrich JT, Clauss TR, Purvine SO, Wang Z, Camp DG, 2nd, Shabanowitz J, Stanley P, Hart GW, Hunt DF, Yang F, Smith RD (2012) Tandem mass spectrometry identifies many mouse brain O-GlcNAcylated proteins including EGF domain-specific O-GlcNAc transferase targets. Proc Natl Acad Sci U S A 109: 7280-7285 Amberg DC, Burke DJ, Strathern JN (2005) Methods in yeast genetics: Cold Spring harbor Laboratory Press. Amoutzias GD, He Y, Gordon J, Mossialos D, Oliver SG, Van de Peer Y (2010) Posttranslational regulation impacts the fate of duplicated genes. Proceedings of the National Academy of Sciences of the United States of America 107: 2967-2971 Ashe MP, De Long SK, Sachs AB (2000) Glucose depletion rapidly inhibits translation initiation in yeast. Mol Biol Cell 11: 833-848 Ba ANN, Moses AM (2010) Evolution of characterized phosphorylation sites in budding yeast. Molecular Biology and Evolution 27: 2027-2037 Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guenole A, van Attikum H, Shokat KM, Kolodner RD, Huh WK, Aebersold R, Keogh MC, Krogan NJ et al (2010) Rewiring of Genetic Networks in Response to DNA Damage. Science 330: 1385-1389 Barr RK, Bogoyevitch MA (2001) The c-Jun N-terminal protein kinase family of mitogen-activated protein kinases (JNK MAPKs). Int J Biochem Cell B 33: 1047-1063 Basu U, Wang YB, Alt FW (2008) Evolution of Phosphorylation-Dependent Regulation of Activation-Induced Cytidine Deaminase. Mol Cell 32: 285-291 Beausoleil SA, Jedrychowski M, Schwartz D, Elias JE, Villen J, Li J, Cohn MA, Cantley LC, Gygi SP (2004) Large-scale characterization of HeLa cell nuclear phosphoproteins. 
Proceedings of the National Academy of Sciences of the United States of America 101: 12130-12135 Bell SP, Dutta A (2002) DNA replication in eukaryotic cells. Annu Rev Biochem 71: 333-374 Beltrao P, Albanese V, Kenner LR, Swaney DL, Burlingame A, Villen J, Lim WA, Fraser JS, Frydman J, Krogan NJ (2012) Systematic functional prioritization of protein posttranslational modifications. Cell 150: 413-425 Beltrao P, Bork P, Krogan NJ, van Noort V (2013) Evolution and functional cross-talk of protein post-translational modifications. Mol Syst Biol 9: 714 Beltrao P, Trinidad JC, Fiedler D, Roguev A, Lim WA, Shokat KM, Burlingame AL, Krogan NJ (2009) Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species. PLoS Biology 7: e1000134-e1000134
158
Benayoun BA, Veitia RA (2009) A post-translational modification code for transcription factors: sorting through a sea of signals. Trends in cell biology 19: 189-197 Block C, Janknecht R, Herrmann C, Nassar N, Wittinghofer A (1996) Quantitative structure-activity analysis correlating Ras/Raf interaction in vitro to Raf activation in vivo. Nat Struct Biol 3: 244-251 Bodenmiller B, Mueller LN, Mueller M, Domon B, Aebersold R (2007) Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nature Methods 4: 231-237 Boekhorst J, van Breukelen B, Heck A, Jr., Snel B (2008) Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome biology 9: R144 Boulais J, Trost M, Landry CR, Dieckmann R, Levy ED, Soldati T, Michnick SW, Thibault P, Desjardins M (2010) Molecular characterization of the evolution of phagosomes. Mol Syst Biol 6: 423 Brewster RC, Weinert FM, Garcia HG, Song D, Rydenfelt M, Phillips R (2014) The transcription factor titration effect dictates level of gene expression. Cell 156: 1312-1323 Brooks CL, Gu W (2003) Ubiquitination, phosphorylation and acetylation: the molecular basis for p53 regulation. Curr Opin Cell Biol 15: 164-171 Budovskaya YV, Stephan JS, Deminoff SJ, Herman PK (2005) An evolutionary proteomics approach identifies substrates of the cAMP-dependent protein kinase. Proceedings of the National Academy of Sciences of the United States of America 102: 13933-13938 Bullock AN, Das S, Debreczeni JE, Rellos P, Fedorov O, Niesen FH, Guo K, Papagrigoriou E, Amos AL, Cho S, Turk BE, Ghosh G, Knapp S (2009) Kinase domain insertions define distinct roles of CLK kinases in SR protein phosphorylation. Structure 17: 352-362 Bullock AN, Debreczeni J, Amos AL, Knapp S, Turk BE (2005) Structure and substrate specificity of the Pim-1 kinase. 
The Journal of biological chemistry 280: 41675-41682 Bunkoczi G, Salah E, Filippakopoulos P, Fedorov O, Muller S, Sobott F, Parker SA, Zhang H, Min W, Turk BE, Knapp S (2007) Structural and functional characterization of the human protein kinase ASK1. Structure 15: 1215-1226 Caron C, Boyault C, Khochbin S (2005) Regulatory cross-talk between lysine acetylation and ubiquitination: role in the control of protein stability. BioEssays : news and reviews in molecular, cellular and developmental biology 27: 408-415 Chen C, Turk BE (2010) Analysis of serine-threonine kinase specificity using arrayed positional scanning peptide libraries. Current protocols in molecular biology / edited by Frederick M Ausubel [et al] Chapter 18: Unit 18 14 Chi A, Huttenhower C, Geer LY, Coon JJ, Syka JEP, Bai DL, Shabanowitz J, Burke DJ, Troyanskaya OG, Hunt DF (2007) Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America 104: 2193-2198 Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, Mann M (2009) Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science 325: 834-840
159
Cohen P (2000) The regulation of protein function by multisite phosphorylation - a 25 year update. Trends in Biochemical Sciences 25: 596-601 Conant GC, Wolfe KH (2008) Turning a hobby into a job: how duplicated genes find new functions. Nature reviews Genetics 9: 938-950 Courcelles M, Lemieux S, Voisin L, Meloche S, Thibault P (2011) ProteoConnections: a bioinformatics platform to facilitate proteome and phosphoproteome analyses. Proteomics 11: 2654-2671 Crick F (1970) Central dogma of molecular biology. Nature 227: 561-563 Davis TL, Walker JR, Allali-Hassani A, Parker SA, Turk BE, Dhe-Paganon S (2009) Structural recognition of an optimized substrate for the ephrin family of receptor tyrosine kinases. Febs J 276: 4395-4404 Dean AM, Thornton JW (2007) Mechanistic approaches to the study of evolution: the functional synthesis. Nature reviews Genetics 8: 675-688 Dechant R, Peter M (2008) Nutrient signals driving cell growth. Curr Opin Cell Biol 20: 678-687 Deribe YL, Pawson T, Dikic I (2010) Post-translational modifications in signal integration. Nature structural & molecular biology 17: 666-672 Dinkel H, Chica C, Via A, Gould CM, Jensen LJ, Gibson TJ, Diella F (2011) Phospho.ELM: a database of phosphorylation sites--update 2011. Nucleic acids research 39: D261-267 Dulai KS, von Dornum M, Mollon JD, Hunt DM (1999) The evolution of trichromatic color vision by opsin gene duplication in New World and Old World primates. Genome research 9: 629-638 Ear PH, Michnick SW (2009) A general life-death selection strategy for dissecting protein functions. Nature methods 6: 813-816 Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792-1797 Efstratiadis A, Posakony JW, Maniatis T, Lawn RM, O'Connell C, Spritz RA, DeRiel JK, Forget BG, Weissman SM, Slightom JL, Blechl AE, Smithies O, Baralle FE, Shoulders CC, Proudfoot NJ (1980) The structure and evolution of the human beta-globin gene family. 
Cell 21: 653-668 Eisenberg E, Levanon EY (2003) Human housekeeping genes are compact. Trends Genet 19: 362-365 Fazili Z, Sun WP, Mittelstaedt S, Cohen C, Xu XX (1999) Disabled-2 inactivation is an early step in ovarian tumorigenicity. Oncogene 18: 3104-3113 Ferris SD, Whitt GS (1979) Evolution of the Differential Regulation of Duplicate Genes after Polyploidization. J Mol Evol 12: 267-317 Filippakopoulos P, Kofler M, Hantschel O, Gish GD, Grebien F, Salah E, Neudecker P, Kay LE, Turk BE, Superti-Furga G, Pawson T, Knapp S (2008) Structural coupling of SH2-kinase domains links Fes and Abl substrate recognition and kinase activation. Cell 134: 793-803
160
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531-1545 Freschi L, Courcelles M, Thibault P, Michnick SW, Landry CR (2011) Phosphorylation network rewiring by gene duplication. Mol Syst Biol 7 Freschi L, Osseni M, Landry CR (2014) Functional divergence and evolutionary turnover in mammalian phosphoproteomes. PLoS Genet 10: e1004062 Gasch AP, Moses AM, Chiang DY, Fraser HB, Berardini M, Eisen MB (2004) Conservation and evolution of cis-regulatory systems in ascomycete fungi. PLoS Biol 2: e398-e398 Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M et al (2003) A protein interaction map of Drosophila melanogaster. Science 302: 1727-1736 Glotzer M, Murray AW, Kirschner MW (1991) Cyclin is degraded by the ubiquitin pathway. Nature 349: 132-138 Gnad F, de Godoy LMF, Cox J, Neuhauser N, Ren S, Olsen JV, Mann M (2009) High-accuracy identification and bioinformatic analysis of in vivo protein phosphorylation sites in yeast. Proteomics 9: 4642-4652 Gnad F, Gunawardena J, Mann M (2011) PHOSIDA 2011: the posttranslational modification database. Nucleic acids research 39: D253-260 Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M (2007) PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome biology 8: R250 Gordon JL, Byrne KP, Wolfe KH (2009) Additions, Losses, and Rearrangements on the Evolutionary Route from a Reconstructed Ancestor to the Modern Saccharomyces cerevisiae Genome. PLoS Genetics 5: e1000485-e1000485 Gordon R (1994) Evolution Escapes Rugged Fitness Landscapes by Gene or Genome Doubling - the Blessing of Higher Dimensionality. Comput Chem 18: 325-331 Gough NR, Wong W (2010) Focus Issue: The Evolution of Complexity. 
Sci Signal 3: eg5-eg5 Gray VE, Kumar S (2011) Rampant purifying selection conserves positions with posttranslational modifications in human proteins. Mol Biol Evol 28: 1565-1568 Gruhler A, Olsen JV, Mohammed S, Mortensen P, Faergeman NJ, Mann M, Jensen ON (2005) Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Molecular & Cellular Proteomics: MCP 4: 310-327 Gu X, Zhang Z, Huang W (2005) Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proceedings of the National Academy of Sciences of the United States of America 102: 707-712 Gu ZL, Nicolae D, Lu HHS, Li WH (2002) Rapid divergence in expression between duplicate genes inferred from microarray data. Trends in Genetics 18: 609-613
161
Gwinn DM, Shackelford DB, Egan DF, Mihaylova MM, Mery A, Vasquez DS, Turk BE, Shaw RJ (2008) AMPK phosphorylation of raptor mediates a metabolic checkpoint. Mol Cell 30: 214-226 Hansen TF, Carter AJR, Chiu CH (2000) Gene conversion may aid adaptive peak shifts. J Theor Biol 207: 495-511 Hart GW, Slawson C, Ramirez-Correa G, Lagerlof O (2011) Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annu Rev Biochem 80: 825-858 He XL, Zhang JZ (2005) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169: 1157-1164 Herbig U, Griffith JW, Fanning E (2000) Mutation of cyclin/cdk phosphorylation sites in HsCdc6 disrupts a late step in initiation of DNA replication in human cells. Mol Biol Cell 11: 4117-4130 Hill JA, Otto SP (2007) The role of pleiotropy in the maintenance of sex in yeast. Genetics 175: 1419-1427 Ho CH, Magtanong L, Barker SL, Gresham D, Nishimura S, Natarajan P, Koh JLY, Porter J, Gray CA, Andersen RJ, Giaever G, Nislow C, Andrews B, Botstein D, Graham TR, Yoshida M, Boone C (2009) A molecular barcoded yeast ORF library enables mode-of-action analysis of bioactive compounds. Nat Biotechnol 27: 369-377 Holmberg CI, Tran SEF, Eriksson JE, Sistonen L (2002) Multisite phosphorylation provides sophisticated regulation of transcription factors. Trends in Biochemical Sciences 27: 619-627 Holt LJ, Tuch BB, Villen J, Johnson AD, Gygi SP, Morgan DO (2009) Global Analysis of Cdk1 Substrate Phosphorylation Sites Provides Insights into Evolution. Science 325: 1682-1686 Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. 
Nucleic acids research 40: D261-270 Hsueh KW, Fu SL, Chang CB, Chang YL, Lin CH (2013) A novel Aurora-A-mediated phosphorylation of p53 inhibits its interaction with MDM2. Biochimica et biophysica acta 1834: 508-515 Hunter T (2000) Signaling - 2000 and beyond. Cell 100: 113-127 Hunter T (2007) The age of crosstalk: phosphorylation, ubiquitination, and beyond. Mol Cell 28: 730-738 Hurles M (2004) Gene duplication: the genomic trade in spare parts. PLoS Biology 2: 900-904 Hutti JE, Jarrell ET, Chang JD, Abbott DW, Storz P, Toker A, Cantley LC, Turk BE (2004) A rapid method for determining protein kinase phosphorylation specificity. Nature methods 1: 27-29 Huttlin EL, Jedrychowski MP, Elias JE, Goswami T, Rad R, Beausoleil SA, Villen J, Haas W, Sowa ME, Gygi SP (2010) A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143: 1174-1189 Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic acids research 32: 1037-1049 Ideker T, Krogan NJ (2012) Differential network biology. Mol Syst Biol 8
162
ImageJ -- imagej.nih.gov/ij/ Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences of the United States of America 98: 4569-4574 Janke C, Magiera MM, Rathfelder N, Taxis C, Reber S, Maekawa H, Moreno-Borchart A, Doenges G, Schwob E, Schiebel E, Knop M (2004) A versatile toolbox for PCR-based tagging of yeast genes: new fluorescent proteins, more markers and promoter substitution cassettes. Yeast 21: 947-962 Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P (2006) Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature 443: 594-597 Jin H, Zangar RC (2009) Protein modifications as potential biomarkers in breast cancer. Biomarker insights 4: 191-200 Johnson KE, Cameron S, Toda T, Wigler M, Zoller MJ (1987) Expression in Escherichia-Coli of Bcy1, the Regulatory Subunit of Cyclic Amp-Dependent Protein-Kinase from Saccharomyces-Cerevisiae - Purification and Characterization. J Biol Chem 262: 8636-8642 Kaganovich M, Snyder M (2012) Phosphorylation of yeast transcription factors correlates with the evolution of novel sequence and function. Journal of Proteome Research 11: 261-268 Kamemura K, Hayes BK, Comer FI, Hart GW (2002) Dynamic interplay between O-glycosylation and O-phosphorylation of nucleocytoplasmic proteins: alternative glycosylation/phosphorylation of THR-58, a known mutational hot spot of c-Myc in lymphomas, is regulated by mitogens. J Biol Chem 277: 19229-19235 Kantarci S, Al-Gazali L, Hill RS, Donnai D, Black GC, Bieth E, Chassaing N, Lacombe D, Devriendt K, Teebi A, Loscertales M, Robson C, Liu T, MacLaughlin DT, Noonan KM, Russell MK, Walsh CA, Donahoe PK, Pober BR (2007) Mutations in LRP2, which encodes the multiligand receptor megalin, cause Donnai-Barrow and facio-oculo-acoustico-renal syndromes. 
Nature genetics 39: 957-959 Kapoor M, Hamm R, Yan W, Taya Y, Lozano G (2000) Cooperative phosphorylation at multiple sites is required to activate p53 in response to UV radiation. Oncogene 19: 358-364 Kastan MB (2008) DNA damage responses: mechanisms and roles in human disease: 2007 G.H.A. Clowes Memorial Award Lecture. Molecular cancer research : MCR 6: 517-524 Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617-624 Kent WJ (2002) BLAT---The BLAST-Like Alignment Tool. Genome Research 12: 656-664 Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M et al (2009) Human Protein Reference Database--2009 update. Nucleic acids research 37: D767-772
163
Khmelinskii A, Roostalu J, Roque H, Antony C, Schiebel E (2009) Phosphorylation-Dependent Protein Interactions at the Spindle Midzone Mediate Cell Cycle Regulation of Spindle Elongation. Dev Cell 17: 244-256 Khoury GA, Baliban RC, Floudas CA (2011) Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Scientific reports 1 Kikani CK, Antonysamy SA, Bonanno JB, Romero R, Zhang FF, Russell M, Gheyi T, Iizuka M, Emtage S, Sauder JM, Turk BE, Burley SK, Rutter J (2010) Structural bases of PAS domain-regulated kinase (PASK) activation in the absence of activation loop phosphorylation. The Journal of biological chemistry 285: 41034-41043 Kim DS, Hahn Y (2011) Identification of novel phosphorylation modification sites in human proteins that originated after the human-chimpanzee divergence. Bioinformatics 27: 2494-2501 Kim W, Bennett EJ, Huttlin EL, Guo A, Li J, Possemato A, Sowa ME, Rad R, Rush J, Comb MJ, Harper JW, Gygi SP (2011) Systematic and quantitative assessment of the ubiquitin-modified proteome. Mol Cell 44: 325-340 Koivomagi M, Valk E, Venta R, Iofik A, Lepiku M, Balog ER, Rubin SM, Morgan DO, Loog M (2011) Cascades of multisite phosphorylation control Sic1 destruction at the onset of S phase. Nature 480: 128-131 Kondrashov FA, Koonin EV (2004) A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet 20: 287-290 Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV (2002) Selection in the evolution of gene duplications. Genome biology 3 Kozlov SV, Graham ME, Jakob B, Tobias F, Kijas AW, Tanuji M, Chen P, Robinson PJ, Taucher-Scholz G, Suzuki K, So S, Chen D, Lavin MF (2011) Autophosphorylation and ATM activation: additional sites add to the complexity. 
J Biol Chem 286: 9107-9119 Krogan NJ, Cagney G, Yu HY, Zhong GQ, Guo XH, Ignatchenko A, Li J, Pu SY, Datta N, Tikuisis AP, Punna T, Peregrin-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440: 637-643 Kurmangaliyev YZ, Goland A, Gelfand MS (2011) Evolutionary patterns of phosphorylated serines. Biol Direct 6 Landry CR, Levy ED, Michnick SW (2009) Weak functional constraints on phosphoproteomes. Trends in Genetics: TIG 25: 193-197 Latham JA, Dent SY (2007) Cross-regulation of histone modifications. Nature structural & molecular biology 14: 1017-1024 Levine AJ, Oren M (2009) The first 30 years of p53: growing ever more complex. Nature reviews Cancer 9: 749-758
164
Levy E, Michnick S, Landry C (2012) Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philosophical transactions of the Royal Society of London Series B, Biological sciences 367: 2594-2606 Li M, Luo J, Brooks CL, Gu W (2002) Acetylation of p53 inhibits its ubiquitination by Mdm2. J Biol Chem 277: 50607-50611 Li SM, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JDJ, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF et al (2004) A map of the interactome network of the metazoan C-elegans. Science 303: 540-543 Li X, Gerber SA, Rudner AD, Beausoleil SA, Haas W, Villen J, Elias JE, Gygi SP (2007) Large-scale phosphorylation analysis of alpha-factor-arrested Saccharomyces cerevisiae. Journal of Proteome Research 6: 1190-1197 Lienhard GE (2008) Non-functional phosphorylations? Trends in Biochemical Sciences 33: 351-352 Lim WA, Pawson T (2010) Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142: 661-667 Lin DI, Barbash O, Kumar KG, Weber JD, Harper JW, Klein-Szanto AJ, Rustgi A, Fuchs SY, Diehl JA (2006) Phosphorylation-dependent ubiquitination of cyclin D1 by the SCF(FBX4-alphaB crystallin) complex. Mol Cell 24: 355-366 Liu X, Yu X, Zack DJ, Zhu H, Qian J (2008) TiGER: a database for tissue-specific gene expression and regulation. BMC Bioinformatics 9: 271 Livanova NB, Chebotareva NA, Eronina TB, Kurganov BI (2002) Pyridoxal 5'-phosphate as a catalytic and conformational cofactor of muscle glycogen phosphorylase B. Biochemistry Biokhimiia 67: 1089-1098 Longo VD (2003) The Ras and Sch9 pathways regulate stress resistance and longevity. 
Exp Gerontol 38: 807-811 Lu CT, Huang KY, Su MG, Lee TY, Bretana NA, Chang WC, Chen YJ, Chen YJ, Huang HD (2013) DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res 41: D295-305 Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155 Lynch M, Force A (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics 154: 459-459 Lynch M, Sung W, Morris K, Coffey N, Landry CR, Dopman EB, Dickinson WJ, Okamoto K, Kulkarni S, Hartl DL, Thomas WK (2008) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proceedings of the National Academy of Sciences of the United States of America 105: 9272-9277 Ma PS, Wera S, Van Dijck P, Thevelein JM (1999) The PDE1-encoded low-affinity phosphodiesterase in the yeast Saccharomyces cerevisiae has a specific function in controlling agonist-induced cAMP signaling. Mol Biol Cell 10: 91-104
165
Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M (2008) Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Molecular & cellular proteomics : MCP 7: 299-307 Madeo F, Schlauer J, Zischka H, Mecke D, Frohlich KU (1998) Tyrosine phosphorylation regulates cell cycle-dependent nuclear localization of Cdc48p. Mol Biol Cell 9: 131-141 Malik R, Nigg EA, Korner R (2008) Comparative conservation analysis of the human mitotic phosphoproteome. Bioinformatics 24: 1426-1432 Mann M, Jensen ON (2003) Proteomic analysis of post-translational modifications. Nat Biotechnol 21: 255-261 Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S (2002) The protein kinase complement of the human genome. Science 298: 1912-1934 Marcantonio M, Trost M, Courcelles M, Desjardins M, Thibault P (2008) Combined enzymatic and data mining approaches for comprehensive phosphoproteome analyses: application to cell signaling events of interferon-gamma-stimulated macrophages. Molecular & Cellular Proteomics: MCP 7: 645-660 Martin DE, Soulard A, Hall MN (2004) TOR regulates ribosomal protein gene expression via PKA and the forkhead transcription factor FHL1. Cell 119: 969-979 Meetei AR, Medhurst AL, Ling C, Xue Y, Singh TR, Bier P, Steltenpool J, Stone S, Dokal I, Mathew CG, Hoatlin M, Joenje H, de Winter JP, Wang W (2005) A human ortholog of archaeal DNA repair protein Hef is defective in Fanconi anemia complementation group M. Nature genetics 37: 958-963 Michnick SW, Ear PH, Manderson EN, Remy I, Stefan E (2007) Universal strategies in research and drug discovery based on protein-fragment complementation assays. Nat Rev Drug Discov 6: 569-582 Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, Olhovsky M, Pasculescu A, Alexander J, Knapp S, Blom N, Bork P, Li S, Cesareni G, Pawson T, Turk BE et al (2008) Linear motif atlas for phosphorylation-dependent signaling. 
Science signaling 1: ra2 Minguez P, Parca L, Diella F, Mende DR, Kumar R, Helmer-Citterich M, Gavin AC, van Noort V, Bork P (2012) Deciphering a global network of functionally associated post-translational modifications. Mol Syst Biol 8: 599 Miura Y, Sakurai Y, Endo T (2012) O-GlcNAc modification affects the ATM-mediated DNA damage response. Biochimica et biophysica acta 1820: 1678-1685 Mok J, Kim PM, Lam HYK, Piccirillo S, Zhou X, Jeschke GR, Sheridan DL, Parker SA, Desai V, Jwa M, Cameroni E, Niu H, Good M, Remenyi A, Ma J-LN, Sheu Y-J, Sassi HE, Sopko R, Chan CSM, De Virgilio C et al (2010) Deciphering protein kinase specificity through large-scale analysis of yeast phosphorylation site motifs. Science Signaling 3: ra12-ra12 Moll UM, Petrenko O (2003) The MDM2-p53 interaction. Molecular cancer research : MCR 1: 1001-1008 Morell M, Ventura S, Aviles FX (2009) Protein complementation assays: Approaches for the in vivo analysis of protein interactions. Febs Lett 583: 1684-1691 Moses AM, Landry CR (2010) Moving from transcriptional to phospho-evolution: generalizing regulatory evolution? Trends in Genetics: TIG 26: 462-467
166
Moses AM, Liku ME, Li JJ, Durbin R (2007) Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites. Proceedings of the National Academy of Sciences of the United States of America 104: 17713-17718 Mukherjee S, Keitany G, Li Y, Wang Y, Ball HL, Goldsmith EJ, Orth K (2006) Yersinia YopJ acetylates and inhibits kinase activation by blocking phosphorylation. Science 312: 1211-1214 Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis B-J, Boone C, Giaever G, Nislow C, Emili A, Zhang Z (2008) The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Research 18: 1092-1099 Nash P, Tang X, Orlicky S, Chen Q, Gertler FB, Mendenhall MD, Sicheri F, Pawson T, Tyers M (2001) Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication. Nature 414: 514-521 Nguyen Ba AN, Moses AM (2010) Evolution of Characterized Phosphorylation Sites in Budding Yeast. Molecular Biology and Evolution 27: 2027-2037 Nussinov R, Tsai CJ, Xin F, Radivojac P (2012) Allosteric post-translational modification codes. Trends in biochemical sciences 37: 447-455 Ohno S (1970) Evolution by gene duplication, London, New York,: Allen & Unwin; Springer-Verlag. Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127: 635-648 Olsen JV, Mann M (2013) Status of large-scale analysis of post-translational modifications by mass spectrometry. Mol Cell Proteomics 12: 3444-3452 Papp B, Pál C, Hurst LD (2003) Evolution of cis-regulatory elements in duplicated genes of yeast. Trends in Genetics: TIG 19: 417-422 Pearlman SM, Serber Z, Ferrell JE (2011) A Mechanism for the Evolution of Phosphorylation Sites. 
Cell 147: 934-946 Pike AC, Rellos P, Niesen FH, Turnbull A, Oliver AW, Parker SA, Turk BE, Pearl LH, Knapp S (2008) Activation segment dimerization: a mechanism for kinase autophosphorylation of non-consensus sites. Embo J 27: 704-714 Pincus D, Letunic I, Bork P, Lim WA (2008) Evolution of the phospho-tyrosine signaling machinery in premetazoan lineages. Proc Natl Acad Sci U S A 105: 9680-9684 Portela P, Van Dijck P, Thevelein JM, Moreno S (2003) Activation state of protein kinase A as measured in permeabilised Saccharomyces cerevisiae cells correlates with PKA-controlled phenotypes in vivo. Fems Yeast Res 3: 119-126
167
Prabakaran S, Lippens G, Steen H, Gunawardena J (2012) Post-translational modification: nature's escape from genetic imprisonment and the basis for dynamic information encoding. Wiley interdisciplinary reviews Systems biology and medicine 4: 565-583 Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, McCartney RR, Schmidt MC, Rachidi N, Lee S-J, Mah AS, Meng L, Stark MJR, Stern DF, De Virgilio C, Tyers M et al (2005) Global analysis of protein phosphorylation in yeast. Nature 438: 679-684 Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18 Suppl 1: S71-77 Ramachandran V, Shah KH, Herman PK (2011) The cAMP-Dependent Protein Kinase Signaling Pathway Is a Key Regulator of P Body Foci Formation. Mol Cell 43: 973-981 Reinders J, Wagner K, Zahedi RP, Stojanovski D, Eyrich B, van der Laan M, Rehling P, Sickmann A, Pfanner N, Meisinger C (2007) Profiling phosphoproteins of yeast mitochondria reveals a role of phosphorylation in assembly of the ATP synthase. Mol Cell Proteomics 6: 1896-1906 Rennefahrt UE, Deacon SW, Parker SA, Devarajan K, Beeser A, Chernoff J, Knapp S, Turk BE, Peterson JR (2007) Specificity profiling of Pak kinases allows identification of novel phosphorylation sites. The Journal of biological chemistry 282: 15667-15678 Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, Strauss SH, Brunner AM, DiFazio SP (2012) Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. 
Genome research 22: 95-105 Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li SM et al (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature 437: 1173-1178 Ruan HB, Singh JP, Li MD, Wu J, Yang X (2013) Cracking the O-GlcNAc code in metabolism. Trends in endocrinology and metabolism: TEM 24: 301-309 Scannell DR, Wolfe KH (2008) A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome research 18: 137-147 Schlecht U, Miranda M, Suresh S, Davis RW, St Onge RP (2012) Multiplex assay for condition-dependent changes in protein-protein interactions. Proceedings of the National Academy of Sciences of the United States of America Schwammle V, Aspalter CM, Sidoli S, Jensen ON (2014) Large scale analysis of co-existing post-translational modifications in histone tails reveals global fine structure of cross-talk. Molecular & cellular proteomics : MCP 13: 1855-1865 Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342
168
Searle JS, Wood MD, Kaur M, Tobin DV, Sanchez Y (2011) Proteins in the Nutrient-Sensing and DNA Damage Checkpoint Pathways Cooperate to Restrain Mitotic Progression following DNA Damage. Plos Genetics 7
Seo J, Lee KJ (2004) Post-translational modifications and their biological functions: Proteomic analysis and systematic approaches. J Biochem Mol Biol 37: 35-44
Seoighe C, Wolfe KH (1999) Yeast genome evolution in the post-genome era. Current opinion in microbiology 2: 548-554
Serber Z, Ferrell JE (2007) Tuning bulk electrostatics to regulate protein function. Cell 128: 441-444
Sharma K, D'Souza RC, Tyanova S, Schaab C, Wisniewski JR, Cox J, Mann M (2014) Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell reports 8: 1583-1594
Sheridan DL, Kong Y, Parker SA, Dalby KN, Turk BE (2008) Substrate discrimination among mitogen-activated protein kinases through distinct docking sequence motifs. J Biol Chem 283: 19511-19520
Skou JC (1965) Enzymatic Basis for Active Transport of Na+ and K+ across Cell Membrane. Physiol Rev 45: 596-617
Souciet J-L, Dujon B, Gaillardin C, Johnston M, Baret PV, Cliften P, Sherman DJ, Weissenbach J, Westhof E, Wincker P, Jubin C, Poulain J, Barbe V, Ségurens B, Artiguenave F, Anthouard V, Vacherie B, Val M-E, Fulton RS, Minx P et al (2009) Comparative genomics of protoploid Saccharomycetaceae. Genome Research 19: 1696-1709
Soulard A, Cremonesi A, Moes S, Schutz F, Jeno P, Hall MN (2010) The Rapamycin-sensitive Phosphoproteome Reveals That TOR Controls Protein Kinase A Toward Some But Not All Substrates. Mol Biol Cell 21: 3475-3486
Sprang SR, Acharya KR, Goldsmith EJ, Stuart DI, Varvill K, Fletterick RJ, Madsen NB, Johnson LN (1988) Structural changes in glycogen phosphorylase induced by phosphorylation. Nature 336: 215-221
Stefan E, Aquin S, Berger N, Landry CR, Nyfeler B, Bouvier M, Michnick SW (2007) Quantification of dynamic protein complexes using Renilla luciferase fragment complementation applied to protein kinase A activities in vivo. Proceedings of the National Academy of Sciences of the United States of America 104: 16916-16921
Tan CS, Bodenmiller B, Pasculescu A, Jovanovic M, Hengartner MO, Jorgensen C, Bader GD, Aebersold R, Pawson T, Linding R (2009) Comparative analysis reveals conserved protein phosphorylation networks implicated in multiple diseases. Science signaling 2: ra39
Tarassov K, Messier V, Landry CR, Radinovic S, Molina MMS, Shames I, Malitskaya Y, Vogel J, Bussey H, Michnick SW (2008) An in vivo map of the yeast protein interactome. Science 320: 1465-1470
Tarrant MK, Cole PA (2009) The Chemical Biology of Protein Phosphorylation. Annu Rev Biochem 78: 797-825
Taylor SS, Radzio-Andzelm E, Hunter T (1995) Protein kinases. 8. How do protein kinases discriminate between serine/threonine and tyrosine? Structural insights from the insulin receptor protein-tyrosine kinase. Faseb J 9: 1255-1266
The R project for Statistical Computing -- www.r-project.org/
Thingholm TE, Jørgensen TJD, Jensen ON, Larsen MR (2006) Highly selective enrichment of phosphorylated peptides using titanium dioxide. Nature Protocols 1: 1929-1935
Tirosh I, Barkai N (2007) Comparative analysis indicates regulatory neofunctionalization of yeast duplicates. Genome Biology 8: R50
Toda T, Cameron S, Sass P, Zoller M, Wigler M (1987) Three different genes in S. cerevisiae encode the catalytic subunits of the cAMP-dependent protein kinase. Cell 50: 277-287
Treco DA, Winston F (2008) Growth and manipulation of yeast. Current protocols in molecular biology Chapter 13: Unit 13.12
Trinidad JC, Barkan DT, Gulledge BF, Thalhammer A, Sali A, Schoepfer R, Burlingame AL (2012) Global identification and characterization of both O-GlcNAcylation and phosphorylation at the murine synapse. Mol Cell Proteomics 11: 215-229
Ubersax JA, Ferrell JE, Jr. (2007) Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol 8: 530-541
Ubersax JA, Woodbury EL, Quang PN, Paraz M, Blethrow JD, Shah K, Shokat KM, Morgan DO (2003) Targets of the cyclin-dependent kinase Cdk1. Nature 425: 859-864
Uckun FM, Ma H, Zhang J, Ozer Z, Dovat S, Mao C, Ishkhanian R, Goodman P, Qazi S (2012) Serine phosphorylation by SYK is critical for nuclear localization and transcription factor function of Ikaros. Proc Natl Acad Sci U S A 109: 18072-18077
van Hoof A (2005) Conserved Functions of Yeast Genes Support the Duplication, Degeneration and Complementation Model for Gene Duplication. Genetics 171: 1455-1461
Vazquez F, Ramaswamy S, Nakamura N, Sellers WR (2000) Phosphorylation of the PTEN tail regulates protein stability and function. Mol Cell Biol 20: 5010-5018
Verma R, Annan RS, Huddleston MJ, Carr SA, Reynard G, Deshaies RJ (1997) Phosphorylation of Sic1p by G1 Cdk required for its degradation and entry into S phase. Science 278: 455-460
Vidal M, Cusick ME, Barabasi AL (2011) Interactome networks and human disease. Cell 144: 986-998
Wagner A (2001a) Birth and death of duplicated genes in completely sequenced eukaryotes. Trends Genet 17: 237-239
Wagner A (2001b) The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Molecular Biology and Evolution 18: 1283-1292
Wang J, Torii M, Liu H, Hart GW, Hu ZZ (2011) dbOGAP - an integrated bioinformatics resource for protein O-GlcNAcylation. BMC Bioinformatics 12: 91
Wang M, Weiss M, Simonovic M, Haertinger G, Schrimpf SP, Hengartner MO, von Mering C (2012) PaxDb, a Database of Protein Abundance Averages Across All Three Domains of Life. Mol Cell Proteomics 11: 492-500
Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT (2004a) The DISOPRED server for the prediction of protein disorder. Bioinformatics 20: 2138-2139
Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT (2004b) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. Journal of molecular biology 337: 635-645
Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, Chu AM, Connelly C, Davis K, Dietrich F, Dow SW, El Bakkoury M, Foury F, Friend SH, Gentalen E, Giaever G et al (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285: 901-906
Wolfe KH, Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387: 708-713
Wong A, Zhang YW, Jeschke GR, Turk BE, Rudnick G (2012) Cyclic GMP-dependent Stimulation of Serotonin Transport Does Not Involve Direct Transporter Phosphorylation by cGMP-dependent Protein Kinase. J Biol Chem 287: 36051-36058
Yang ZH (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Molecular biology and evolution 24: 1586-1591
Zeidan Q, Hart GW (2010) The intersections between O-GlcNAcylation and phosphorylation: implications for multiple signaling pathways. Journal of cell science 123: 13-22
Zhang JZ (2003) Evolution by gene duplication: an update. Trends Ecol Evol 18: 292-298
Zhang P, Smith-Nguyen EV, Keshwani MM, Deal MS, Kornev AP, Taylor SS (2012) Structure and Allostery of the PKA RII beta Tetrameric Holoenzyme. Science 335: 712-716
Zhao Y, Jensen ON (2009) Modification-specific proteomics: strategies for characterization of post-translational modifications using enrichment techniques. Proteomics 9: 4632-4641
Zhu H, Klemic JF, Chang S, Bertone P, Casamayor A, Klemic KG, Smith D, Gerstein M, Reed MA, Snyder M (2000) Analysis of yeast protein kinases using protein chips. Nat Genet 26: 283-289
Zielinska DF, Gnad F, Jedrusik-Bode M, Wisniewski JR, Mann M (2009) Caenorhabditis elegans has a phosphoproteome atypical for metazoans that is enriched in developmental and sex determination proteins. J Proteome Res 8: 4039-4049
Zielinska DF, Gnad F, Wisniewski JR, Mann M (2010) Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints. Cell 141: 897-907