1 protein interaction networks : modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_l10.pdf ·...

7
Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring 2020 1 Protein Interaction Networks : Models Recall that protein interaction networks represent a multitude of signaling and regulatory pathways that cells use to sense and react to its environment, as well as manage internal processes like cell growth, cell division, etc. 1 The complexity of these networks is dizzying, and that complexity evokes a simple basic science question: How does evolution generate biological innovation in molecular networks? Novel functionality can appear both at the molecular level—a protein that does something new— and at the systems level—a behavior of the network that is new. As best we can tell, the process that generates new functionality is grounded in gene duplication, at least among eukaryotes, coupled with mutation and selection. 2 1.1 Gene duplication and mutation, under evolution When a cell divides, it duplicates its entire genome in order to put one copy in each of its two daughter cells. The genome copying process is orchestrated by a host of gene copying proteins, which are not error-free. One error they can make is to copy some stretch of DNA twice, so that the new genome has two copies of that sequence while the original has only one. When the copied stretch includes a protein-coding gene, we say that gene was duplicated. 3 After such an event, evolution takes over to determine what happens to the duplicated gene. For our purposes, we will consider only genes that code for proteins that appear in the protein-protein interaction network. (But, the following diverging paths apply equally well to any duplicated sequence, their function, and interactions.) There are three possibilities for the duplicated gene, with respect to the fitness of the organism (the likelihood of it leaves offspring): 1. disadvantageous: selection subsequently eliminates individuals with the duplicate from the population (because of competition or apoptosis), and hence we do not see these in the wild. 2. neutral: neither helps nor hurts fitness, and hence the proportion of population that has the duplicate follows an unbiased random walk, as in genetic drift. 4 1 PPINs don’t represent any signaling or regulation processes that bypasses proteins, e.g., those mediated by RNA, of which there are many. 2 Bacteria can acquire new genes from each other via “horizontal gene transfer” and even from viruses (bacterio- phages). Eukaryotes sometimes acquire new genes from viruses, or even horizontally, e.g., some plants. 3 Less commonly, the machinery will duplicate or delete much larger stretches of the genome, entire chromosomes, and even the entire genome. Many plant species we eat have experienced a whole-genome duplication event in their history. Plants seem general more robust to these events than do animals. For example, in humans, Down syndrome is caused by having an extra chromosome 21.. 4 See https://en.wikipedia.org/wiki/Population_genetics#Genetic_drift 1

Upload: others

Post on 14-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Protein Interaction Networks : Modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_L10.pdf · 2020-03-17 · Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring

Biological Networks, CSCI 3352Lecture 10

Prof. Aaron ClausetSpring 2020

1 Protein Interaction Networks : Models

Recall that protein interaction networks represent a multitude of signaling and regulatory pathwaysthat cells use to sense and react to its environment, as well as manage internal processes like cellgrowth, cell division, etc.1 The complexity of these networks is dizzying, and that complexityevokes a simple basic science question:

How does evolution generate biological innovation in molecular networks?

Novel functionality can appear both at the molecular level—a protein that does something new—and at the systems level—a behavior of the network that is new. As best we can tell, the processthat generates new functionality is grounded in gene duplication, at least among eukaryotes, coupledwith mutation and selection.2

1.1 Gene duplication and mutation, under evolution

When a cell divides, it duplicates its entire genome in order to put one copy in each of its twodaughter cells. The genome copying process is orchestrated by a host of gene copying proteins,which are not error-free. One error they can make is to copy some stretch of DNA twice, so thatthe new genome has two copies of that sequence while the original has only one. When the copiedstretch includes a protein-coding gene, we say that gene was duplicated.3

After such an event, evolution takes over to determine what happens to the duplicated gene. Forour purposes, we will consider only genes that code for proteins that appear in the protein-proteininteraction network. (But, the following diverging paths apply equally well to any duplicatedsequence, their function, and interactions.) There are three possibilities for the duplicated gene,with respect to the fitness of the organism (the likelihood of it leaves offspring):

1. disadvantageous: selection subsequently eliminates individuals with the duplicate from thepopulation (because of competition or apoptosis), and hence we do not see these in the wild.

2. neutral: neither helps nor hurts fitness, and hence the proportion of population that has theduplicate follows an unbiased random walk, as in genetic drift.4

1PPINs don’t represent any signaling or regulation processes that bypasses proteins, e.g., those mediated by RNA,of which there are many.

2Bacteria can acquire new genes from each other via “horizontal gene transfer” and even from viruses (bacterio-phages). Eukaryotes sometimes acquire new genes from viruses, or even horizontally, e.g., some plants.

3Less commonly, the machinery will duplicate or delete much larger stretches of the genome, entire chromosomes,and even the entire genome. Many plant species we eat have experienced a whole-genome duplication event in theirhistory. Plants seem general more robust to these events than do animals. For example, in humans, Down syndromeis caused by having an extra chromosome 21..

4See https://en.wikipedia.org/wiki/Population_genetics#Genetic_drift

1

Page 2: 1 Protein Interaction Networks : Modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_L10.pdf · 2020-03-17 · Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring

Biological Networks, CSCI 3352Lecture 10

Prof. Aaron ClausetSpring 2020

3. advantageous: selection subsequently favors individuals with the duplicate, and hence pop-ulation genetics tells us that the frequency of the duplicate should, eventually, converge onfixation in the population.

In either the neutral or advantageous situations, there are two stylized possibilities for how evolutionwill modify the gene (see figure below5):

A. neofunctionalization: after duplication, the duplicate evolves toward new functions, andthe original retains its existing functions. Examples of apparent neofunctionalization eventsinclude (i) the duplication of a digestive gene in a species of arctic fish that subsequentlybecame an antifreeze gene, and (ii) duplications that have led to novel snake venoms.

B. subfunctionalization: the duplicate and the original divide up the original functions (eachloses different previous functions due to mutations). In a signaling pathway, subfunctional-izations may explain the evolution of redundant paths that represent a logical AND (bothare required).

We say “stylized possibilities” because these represent extremes. More commonly, we might expecta mixture of these, because functionality is not usually binary; instead, a gene with differentfunctions will do each of them well to differing degrees, and after a duplication event, selection isfree to vary those degrees more independently. These variation could lead to further optimizationof each gene, so that each performs far better some task on which the original did just okay.

5Image from https://en.wikipedia.org/wiki/Gene_duplication.

2

Page 3: 1 Protein Interaction Networks : Modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_L10.pdf · 2020-03-17 · Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring

Biological Networks, CSCI 3352Lecture 10

Prof. Aaron ClausetSpring 2020

From a network perspective, a duplication event is one in which we take an existing network G,choose some node within it, and make a copy of both it and all its links. This duplication eventis the basic idea underlying all gene duplication models of protein interaction network evolution.Post-duplication, we may then modify the links of either the original or the copied node in orderto capture differing degrees of sub- or neo-functionalization, or even function loss.6

1.2 Duplication-divergence network models

Of the several flavors of gene duplication network models, all are built on the idea of a duplicationevent, and differ mainly in how they translate the biological idea of functional divergence into thelanguage of network edges. The rate of gene duplications is far lower than the rate of edge modifi-cation, and this allows us to assume a “separation of timescales.” Gene duplications (copy errors)happen on timescales measured in 10s of generations, while edge modifications (evolution) happenon a timescale measured in 1000s and 10,000s of generations. As a result, we assume here that allmodifications to a particular node’s interactions occur before any new gene duplication event occurs.

In the network, each duplication event is initiated by first choosing a single node u uniformly atrandom (meaning, we assume that every node is equally likely to be duplicated). We then makev, a copy of u, and give it copies of each of u’s edges. That is, if (u, x) existed before duplication,after duplication, there also exists a (v, x). In models like the popular duplication-mutation withcomplementation (DMC) model, edges are undirected, while in some other duplication-divergencemodels (e.g., the simulation in Section 1.3 below), edges may be directed.

duplication

u

<latexit sha1_base64="wR/8DHtXNMj5YLV5ya/3+dU3PMM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2Vmmm/XHGr7hxklXg5qUCORr/81RvELI1QGiao1l3PTYyfUWU4Ezgt9VKNCWVjOsSupZJGqP1sfuiUnFllQMJY2ZKGzNXfExmNtJ5Ege2MqBnpZW8m/ud1UxPe+BmXSWpQssWiMBXExGT2NRlwhcyIiSWUKW5vJWxEFWXGZlOyIXjLL6+S9kXVq1Uvm7VK/SqPowgncArn4ME11OEOGtACBgjP8ApvzqPz4rw7H4vWgpPPHMMfOJ8/4KmM9Q==</latexit>

x

<latexit sha1_base64="eKoNv4YJl9BKr1SL7aqaXZyXLmQ=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKyREL7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8AeU1jPg=</latexit>

y

<latexit sha1_base64="Vai9mb4MqWcLXMNZy+LKkUz2jYM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2VmpN+ueJW3TnIKvFyUoEcjX75qzeIWRqhNExQrbuemxg/o8pwJnBa6qUaE8rGdIhdSyWNUPvZ/NApObPKgISxsiUNmau/JzIaaT2JAtsZUTPSy95M/M/rpia88TMuk9SgZItFYSqIicnsazLgCpkRE0soU9zeStiIKsqMzaZkQ/CWX14l7YuqV6teNmuV+lUeRxFO4BTOwYNrqMMdNKAFDBCe4RXenEfnxXl3PhatBSefOYY/cD5/AOa5jPk=</latexit>

z

<latexit sha1_base64="MBYVJynjt47motuWVliM1ruuQF8=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKwJEr7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8Aeg9jPo=</latexit>

u

<latexit sha1_base64="wR/8DHtXNMj5YLV5ya/3+dU3PMM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2Vmmm/XHGr7hxklXg5qUCORr/81RvELI1QGiao1l3PTYyfUWU4Ezgt9VKNCWVjOsSupZJGqP1sfuiUnFllQMJY2ZKGzNXfExmNtJ5Ege2MqBnpZW8m/ud1UxPe+BmXSWpQssWiMBXExGT2NRlwhcyIiSWUKW5vJWxEFWXGZlOyIXjLL6+S9kXVq1Uvm7VK/SqPowgncArn4ME11OEOGtACBgjP8ApvzqPz4rw7H4vWgpPPHMMfOJ8/4KmM9Q==</latexit>

v

<latexit sha1_base64="zp67ACcg+xnHfdvcfw2tNcxaTpA=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzCwJIXyBFw8a49VP8ubfOMAeFKykk0pVd7q7gkRwbVz328ltbG5t7+R3C3v7B4dHxeOTpo5TxbDBYhGrdkA1Ci6xYbgR2E4U0igQ2ApG93O/NUaleSwfzSRBP6IDyUPOqLFSfdwrltyyuwBZJ15GSpCh1it+dfsxSyOUhgmqdcdzE+NPqTKcCZwVuqnGhLIRHWDHUkkj1P50ceiMXFilT8JY2ZKGLNTfE1MaaT2JAtsZUTPUq95c/M/rpCa886dcJqlByZaLwlQQE5P516TPFTIjJpZQpri9lbAhVZQZm03BhuCtvrxOmldlr1K+rldK1ZssjjycwTlcgge3UIUHqEEDGCA8wyu8OU/Oi/PufCxbc042cwp/4Hz+AOItjPY=</latexit>

x

<latexit sha1_base64="eKoNv4YJl9BKr1SL7aqaXZyXLmQ=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKyREL7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8AeU1jPg=</latexit>

y

<latexit sha1_base64="Vai9mb4MqWcLXMNZy+LKkUz2jYM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2VmpN+ueJW3TnIKvFyUoEcjX75qzeIWRqhNExQrbuemxg/o8pwJnBa6qUaE8rGdIhdSyWNUPvZ/NApObPKgISxsiUNmau/JzIaaT2JAtsZUTPSy95M/M/rpia88TMuk9SgZItFYSqIicnsazLgCpkRE0soU9zeStiIKsqMzaZkQ/CWX14l7YuqV6teNmuV+lUeRxFO4BTOwYNrqMMdNKAFDBCe4RXenEfnxXl3PhatBSefOYY/cD5/AOa5jPk=</latexit>

z

<latexit sha1_base64="MBYVJynjt47motuWVliM1ruuQF8=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKwJEr7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8Aeg9jPo=</latexit>

E = {(u, x), (u, y), (u, z)}

<latexit sha1_base64="99Kj1x5wZfgq0U9hcHgvCAJnADE=">AAACBHicbVDLSsNAFJ34rPUVddnNYBFaKCWR+tgIBRFcVrAPaEKZTKft0MkkzEzEGLpw46+4caGIWz/CnX/jNM1CWw/M5XDOvdy5xwsZlcqyvo2l5ZXVtfXcRn5za3tn19zbb8kgEpg0ccAC0fGQJIxy0lRUMdIJBUG+x0jbG19O/fYdEZIG/FbFIXF9NOR0QDFSWuqZhSt4AZ0ElqLKfbmia5zWhzJ0Jj2zaFWtFHCR2BkpggyNnvnl9AMc+YQrzJCUXdsKlZsgoShmZJJ3IklChMdoSLqacuQT6SbpERN4pJU+HARCP65gqv6eSJAvZex7utNHaiTnvan4n9eN1ODcTSgPI0U4ni0aRAyqAE4TgX0qCFYs1gRhQfVfIR4hgbDSueV1CPb8yYukdVy1a9WTm1qxfprFkQMFcAhKwAZnoA6uQQM0AQaP4Bm8gjfjyXgx3o2PWeuSkc0cgD8wPn8A0weVAA==</latexit>

E = {(u, x), (u, y), (u, z),

(v, x), (v, y), (v, z)}

<latexit sha1_base64="qwkuDZxTHbo6ExB9eZV0MziTW88=">AAACGnicbZBLS8NAEMc3Pmt9RT16WSxCC6UkUh8XoSCCxwr2AU0om+2mXbp5sLsJxtDP4cWv4sWDIt7Ei9/GbZqDtg7s8Oc3M8zO3wkZFdIwvrWl5ZXVtfXCRnFza3tnV9/bb4sg4pi0cMAC3nWQIIz6pCWpZKQbcoI8h5GOM76a1jsx4YIG/p1MQmJ7aOhTl2IkFerr5jW8hFYKy1H1vlJVOcnyQ6UKLQuW44zGGY0Vhdakr5eMmpEFXBRmLkogj2Zf/7QGAY484kvMkBA90wilnSIuKWZkUrQiQUKEx2hIekr6yCPCTrPTJvBYkQF0A66eL2FGf0+kyBMi8RzV6SE5EvO1Kfyv1ouke2Gn1A8jSXw8W+RGDMoATn2CA8oJlixRAmFO1V8hHiGOsFRuFpUJ5vzJi6J9UjPrtdPbeqlxlttRAIfgCJSBCc5BA9yAJmgBDB7BM3gFb9qT9qK9ax+z1iUtnzkAf0L7+gGV65uc</latexit>

This duplication “mechanism” (meaning, a formal notion of cause and effect) implies that onlynodes that already have connections can gain new ones, which occurs only if one of their neigh-bors duplicates. However, duplication alone would lead to an increasingly clique-like network,and would not capture the effects of subfunctionalization and neofunctionalization. To incorporatethese, we “rewire” some of the duplicated edges, which captures the effect of mutation on connectiv-ity. Different models diverge on precisely how those modifications are done. Here, we will considera simple model, and leave an exploration of a more complicated model like DMC for an in-class lab.

6It’s also the basic idea of how the early World Wide Web evolved, and both that model (from 1999), and thegene-duplication models are instances of what we might call “vertex copy models,” in which the network grows overtime by copying or duplicating existing nodes.

3

Page 4: 1 Protein Interaction Networks : Modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_L10.pdf · 2020-03-17 · Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring

Biological Networks, CSCI 3352Lecture 10

Prof. Aaron ClausetSpring 2020

Once the new node v has been created, with is cohort of edges, we flip a coin for each of v’s connec-tions (v, z), such that with probability q we keep the connection (v, z) and with probability 1 − qwe rewire that connection to be (v, w), where w is a uniformly random node (a so-called “uniformattachment” mechanism). In this way, v has the same degree as u, and any edge that is rewiredrepresents both a subfunctionalization (loss of previous function) and a neofunctionalization (ac-quisition of a new function).

Finally, in some models, we include what is called complementation, which means adding an edge(v, u), which captures the idea that proteins can often “dimerize,” i.e., bind with themselves. Hereis a schematic illustrating a duplication and divergence event in a toy network.

duplication

u

<latexit sha1_base64="wR/8DHtXNMj5YLV5ya/3+dU3PMM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2Vmmm/XHGr7hxklXg5qUCORr/81RvELI1QGiao1l3PTYyfUWU4Ezgt9VKNCWVjOsSupZJGqP1sfuiUnFllQMJY2ZKGzNXfExmNtJ5Ege2MqBnpZW8m/ud1UxPe+BmXSWpQssWiMBXExGT2NRlwhcyIiSWUKW5vJWxEFWXGZlOyIXjLL6+S9kXVq1Uvm7VK/SqPowgncArn4ME11OEOGtACBgjP8ApvzqPz4rw7H4vWgpPPHMMfOJ8/4KmM9Q==</latexit>

x

<latexit sha1_base64="eKoNv4YJl9BKr1SL7aqaXZyXLmQ=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKyREL7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8AeU1jPg=</latexit>

y

<latexit sha1_base64="Vai9mb4MqWcLXMNZy+LKkUz2jYM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2VmpN+ueJW3TnIKvFyUoEcjX75qzeIWRqhNExQrbuemxg/o8pwJnBa6qUaE8rGdIhdSyWNUPvZ/NApObPKgISxsiUNmau/JzIaaT2JAtsZUTPSy95M/M/rpia88TMuk9SgZItFYSqIicnsazLgCpkRE0soU9zeStiIKsqMzaZkQ/CWX14l7YuqV6teNmuV+lUeRxFO4BTOwYNrqMMdNKAFDBCe4RXenEfnxXl3PhatBSefOYY/cD5/AOa5jPk=</latexit>

z

<latexit sha1_base64="MBYVJynjt47motuWVliM1ruuQF8=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKwJEr7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8Aeg9jPo=</latexit>

u

<latexit sha1_base64="wR/8DHtXNMj5YLV5ya/3+dU3PMM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2Vmmm/XHGr7hxklXg5qUCORr/81RvELI1QGiao1l3PTYyfUWU4Ezgt9VKNCWVjOsSupZJGqP1sfuiUnFllQMJY2ZKGzNXfExmNtJ5Ege2MqBnpZW8m/ud1UxPe+BmXSWpQssWiMBXExGT2NRlwhcyIiSWUKW5vJWxEFWXGZlOyIXjLL6+S9kXVq1Uvm7VK/SqPowgncArn4ME11OEOGtACBgjP8ApvzqPz4rw7H4vWgpPPHMMfOJ8/4KmM9Q==</latexit>

v

<latexit sha1_base64="zp67ACcg+xnHfdvcfw2tNcxaTpA=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzCwJIXyBFw8a49VP8ubfOMAeFKykk0pVd7q7gkRwbVz328ltbG5t7+R3C3v7B4dHxeOTpo5TxbDBYhGrdkA1Ci6xYbgR2E4U0igQ2ApG93O/NUaleSwfzSRBP6IDyUPOqLFSfdwrltyyuwBZJ15GSpCh1it+dfsxSyOUhgmqdcdzE+NPqTKcCZwVuqnGhLIRHWDHUkkj1P50ceiMXFilT8JY2ZKGLNTfE1MaaT2JAtsZUTPUq95c/M/rpCa886dcJqlByZaLwlQQE5P516TPFTIjJpZQpri9lbAhVZQZm03BhuCtvrxOmldlr1K+rldK1ZssjjycwTlcgge3UIUHqEEDGCA8wyu8OU/Oi/PufCxbc042cwp/4Hz+AOItjPY=</latexit>

x

<latexit sha1_base64="eKoNv4YJl9BKr1SL7aqaXZyXLmQ=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKyREL7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8AeU1jPg=</latexit>

y

<latexit sha1_base64="Vai9mb4MqWcLXMNZy+LKkUz2jYM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2VmpN+ueJW3TnIKvFyUoEcjX75qzeIWRqhNExQrbuemxg/o8pwJnBa6qUaE8rGdIhdSyWNUPvZ/NApObPKgISxsiUNmau/JzIaaT2JAtsZUTPSy95M/M/rpia88TMuk9SgZItFYSqIicnsazLgCpkRE0soU9zeStiIKsqMzaZkQ/CWX14l7YuqV6teNmuV+lUeRxFO4BTOwYNrqMMdNKAFDBCe4RXenEfnxXl3PhatBSefOYY/cD5/AOa5jPk=</latexit>

z

<latexit sha1_base64="MBYVJynjt47motuWVliM1ruuQF8=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKwJEr7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8Aeg9jPo=</latexit>

E = {(u, x), (u, y), (u, z)}

<latexit sha1_base64="99Kj1x5wZfgq0U9hcHgvCAJnADE=">AAACBHicbVDLSsNAFJ34rPUVddnNYBFaKCWR+tgIBRFcVrAPaEKZTKft0MkkzEzEGLpw46+4caGIWz/CnX/jNM1CWw/M5XDOvdy5xwsZlcqyvo2l5ZXVtfXcRn5za3tn19zbb8kgEpg0ccAC0fGQJIxy0lRUMdIJBUG+x0jbG19O/fYdEZIG/FbFIXF9NOR0QDFSWuqZhSt4AZ0ElqLKfbmia5zWhzJ0Jj2zaFWtFHCR2BkpggyNnvnl9AMc+YQrzJCUXdsKlZsgoShmZJJ3IklChMdoSLqacuQT6SbpERN4pJU+HARCP65gqv6eSJAvZex7utNHaiTnvan4n9eN1ODcTSgPI0U4ni0aRAyqAE4TgX0qCFYs1gRhQfVfIR4hgbDSueV1CPb8yYukdVy1a9WTm1qxfprFkQMFcAhKwAZnoA6uQQM0AQaP4Bm8gjfjyXgx3o2PWeuSkc0cgD8wPn8A0weVAA==</latexit>

divergence

x

<latexit sha1_base64="eKoNv4YJl9BKr1SL7aqaXZyXLmQ=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKyREL7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8AeU1jPg=</latexit>

y

<latexit sha1_base64="Vai9mb4MqWcLXMNZy+LKkUz2jYM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2VmpN+ueJW3TnIKvFyUoEcjX75qzeIWRqhNExQrbuemxg/o8pwJnBa6qUaE8rGdIhdSyWNUPvZ/NApObPKgISxsiUNmau/JzIaaT2JAtsZUTPSy95M/M/rpia88TMuk9SgZItFYSqIicnsazLgCpkRE0soU9zeStiIKsqMzaZkQ/CWX14l7YuqV6teNmuV+lUeRxFO4BTOwYNrqMMdNKAFDBCe4RXenEfnxXl3PhatBSefOYY/cD5/AOa5jPk=</latexit>

z

<latexit sha1_base64="MBYVJynjt47motuWVliM1ruuQF8=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKwJEr7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8Aeg9jPo=</latexit>

u

<latexit sha1_base64="wR/8DHtXNMj5YLV5ya/3+dU3PMM=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkfhwLXjy2YGuhDWWznbRrN5uwuxFK6C/w4kERr/4kb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGrTkA1Ci6xZbgR2EkU0igQ+BCMb2f+wxMqzWN5byYJ+hEdSh5yRo2Vmmm/XHGr7hxklXg5qUCORr/81RvELI1QGiao1l3PTYyfUWU4Ezgt9VKNCWVjOsSupZJGqP1sfuiUnFllQMJY2ZKGzNXfExmNtJ5Ege2MqBnpZW8m/ud1UxPe+BmXSWpQssWiMBXExGT2NRlwhcyIiSWUKW5vJWxEFWXGZlOyIXjLL6+S9kXVq1Uvm7VK/SqPowgncArn4ME11OEOGtACBgjP8ApvzqPz4rw7H4vWgpPPHMMfOJ8/4KmM9Q==</latexit>

v

<latexit sha1_base64="zp67ACcg+xnHfdvcfw2tNcxaTpA=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzCwJIXyBFw8a49VP8ubfOMAeFKykk0pVd7q7gkRwbVz328ltbG5t7+R3C3v7B4dHxeOTpo5TxbDBYhGrdkA1Ci6xYbgR2E4U0igQ2ApG93O/NUaleSwfzSRBP6IDyUPOqLFSfdwrltyyuwBZJ15GSpCh1it+dfsxSyOUhgmqdcdzE+NPqTKcCZwVuqnGhLIRHWDHUkkj1P50ceiMXFilT8JY2ZKGLNTfE1MaaT2JAtsZUTPUq95c/M/rpCa886dcJqlByZaLwlQQE5P516TPFTIjJpZQpri9lbAhVZQZm03BhuCtvrxOmldlr1K+rldK1ZssjjycwTlcgge3UIUHqEEDGCA8wyu8OU/Oi/PufCxbc042cwp/4Hz+AOItjPY=</latexit>

w

<latexit sha1_base64="BVybMKowyYLxP9IKaDk2Ns5vLZg=">AAAB6HicbVDLTgJBEOzFF+IL9ehlIjHxRHYNPo4kXjxCIo8ENmR26IWR2dnNzKyGEL7AiweN8eonefNvHGAPClbSSaWqO91dQSK4Nq777eTW1jc2t/LbhZ3dvf2D4uFRU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWj25nfekSleSzvzThBP6IDyUPOqLFS/alXLLlldw6ySryMlCBDrVf86vZjlkYoDRNU647nJsafUGU4EzgtdFONCWUjOsCOpZJGqP3J/NApObNKn4SxsiUNmau/JyY00nocBbYzomaol72Z+J/XSU1440+4TFKDki0WhakgJiazr0mfK2RGjC2hTHF7K2FDqigzNpuCDcFbfnmVNC/KXqV8Wa+UqldZHHk4gVM4Bw+uoQp3UIMGMEB4hld4cx6cF+fd+Vi05pxs5hj+wPn8AeOxjPc=</latexit>

E = {(u, x), (u, y), (u, z),

(v, x), (v, y), (v, w), (v, u)}

<latexit sha1_base64="upsoXcMSdQA2PC9LH2hhrNPw54s=">AAACJHicbZBLS8NAEMc39VXrq+rRy2JRWgglkfoAEQoieKxgH9CEstlu26WbB7ubaA39MF78Kl48+MCDFz+L2zQHbR3Y4c9vZpidvxMwKqRhfGmZhcWl5ZXsam5tfWNzK7+90xB+yDGpY5/5vOUgQRj1SF1SyUgr4AS5DiNNZ3g5qTcjwgX1vVs5Cojtor5HexQjqVAnf34FL6AVw0NYDPX7kq7yKMkPJR1a1oRHCY8SHul3SQ5L0Bp38gWjbCQB54WZigJIo9bJv1tdH4cu8SRmSIi2aQTSjhGXFDMyzlmhIAHCQ9QnbSU95BJhx8mRY3igSBf2fK6eJ2FCf0/EyBVi5Dqq00VyIGZrE/hfrR3K3pkdUy8IJfHwdFEvZFD6cOIY7FJOsGQjJRDmVP0V4gHiCEvla06ZYM6ePC8aR2WzUj6+qRSqJ6kdWbAH9kERmOAUVME1qIE6wOARPINX8KY9aS/ah/Y5bc1o6cwu+BPa9w+4iZ4d</latexit>

E = {(u, x), (u, y), (u, z),

(v, x), (v, y), (v, z), (v, u)}

<latexit sha1_base64="R4pw0wd3h7yx3rSnPgPIZ4qDTw8=">AAACJHicbZBLS8NAEMc39VXrq+rRy2JRWiglkfoAEQoieKxgW6EJZbPdtEs3m7C7KdbQD+PFr+LFgw88ePGzuEl70NaBHf78ZobZ+bsho1KZ5peRWVhcWl7JrubW1jc2t/LbO00ZRAKTBg5YIO5cJAmjnDQUVYzchYIg32Wk5Q4uk3prSISkAb9Vo5A4Pupx6lGMlEad/PkVvIB2DA9hMSrfl8o6j9L8UCpD2074MOXDlA8TrnNUgva4ky+YFTMNOC+sqSiAadQ7+Xe7G+DIJ1xhhqRsW2aonBgJRTEj45wdSRIiPEA90taSI59IJ06PHMMDTbrQC4R+XMGU/p6IkS/lyHd1p49UX87WEvhfrR0p78yJKQ8jRTieLPIiBlUAE8dglwqCFRtpgbCg+q8Q95FAWGlfc9oEa/bkedE8qljVyvFNtVA7mdqRBXtgHxSBBU5BDVyDOmgADB7BM3gFb8aT8WJ8GJ+T1owxndkFf8L4/gG9M54g</latexit>

complementationduplication

}

<latexit sha1_base64="bGOSNrY5xls2ecIB0Pbah+RBoUo=">AAAB6XicbVDLSgNBEOyNrxhfUY9eBoPgKexKfBwDXjxGMQ9IljA7mU2GzM4uM71CWPIHXjwo4tU/8ubfOEn2oIkFDUVVN91dQSKFQdf9dgpr6xubW8Xt0s7u3v5B+fCoZeJUM95ksYx1J6CGS6F4EwVK3kk0p1EgeTsY38789hPXRsTqEScJ9yM6VCIUjKKVHnrTfrniVt05yCrxclKBHI1++as3iFkacYVMUmO6npugn1GNgkk+LfVSwxPKxnTIu5YqGnHjZ/NLp+TMKgMSxtqWQjJXf09kNDJmEgW2M6I4MsveTPzP66YY3viZUEmKXLHFojCVBGMye5sMhOYM5cQSyrSwtxI2opoytOGUbAje8surpHVR9WrVy/tapX6Vx1GEEziFc/DgGupwBw1oAoMQnuEV3pyx8+K8Ox+L1oKTzxzDHzifP52IjWM=</latexit>

}

<latexit sha1_base64="bGOSNrY5xls2ecIB0Pbah+RBoUo=">AAAB6XicbVDLSgNBEOyNrxhfUY9eBoPgKexKfBwDXjxGMQ9IljA7mU2GzM4uM71CWPIHXjwo4tU/8ubfOEn2oIkFDUVVN91dQSKFQdf9dgpr6xubW8Xt0s7u3v5B+fCoZeJUM95ksYx1J6CGS6F4EwVK3kk0p1EgeTsY38789hPXRsTqEScJ9yM6VCIUjKKVHnrTfrniVt05yCrxclKBHI1++as3iFkacYVMUmO6npugn1GNgkk+LfVSwxPKxnTIu5YqGnHjZ/NLp+TMKgMSxtqWQjJXf09kNDJmEgW2M6I4MsveTPzP66YY3viZUEmKXLHFojCVBGMye5sMhOYM5cQSyrSwtxI2opoytOGUbAje8surpHVR9WrVy/tapX6Vx1GEEziFc/DgGupwBw1oAoMQnuEV3pyx8+K8Ox+L1oKTzxzDHzifP52IjWM=</latexit>

mutation

}

<latexit sha1_base64="bGOSNrY5xls2ecIB0Pbah+RBoUo=">AAAB6XicbVDLSgNBEOyNrxhfUY9eBoPgKexKfBwDXjxGMQ9IljA7mU2GzM4uM71CWPIHXjwo4tU/8ubfOEn2oIkFDUVVN91dQSKFQdf9dgpr6xubW8Xt0s7u3v5B+fCoZeJUM95ksYx1J6CGS6F4EwVK3kk0p1EgeTsY38789hPXRsTqEScJ9yM6VCIUjKKVHnrTfrniVt05yCrxclKBHI1++as3iFkacYVMUmO6npugn1GNgkk+LfVSwxPKxnTIu5YqGnHjZ/NLp+TMKgMSxtqWQjJXf09kNDJmEgW2M6I4MsveTPzP66YY3viZUEmKXLHFojCVBGMye5sMhOYM5cQSyrSwtxI2opoytOGUbAje8surpHVR9WrVy/tapX6Vx1GEEziFc/DgGupwBw1oAoMQnuEV3pyx8+K8Ox+L1oKTzxzDHzifP52IjWM=</latexit>

One immediate limitation of this model is that it is a pure growth model: at every step in time,we add a node to the network, and we never either delete nodes or delete existing edges. Thisis pretty unrealistic, as real PPIs are constructed by both growth and loss. Edges can be lost bydecay due to mutations (if they are not sufficiently deleterious), and nodes can be lost by the samemechanisms that lead to duplication. Hence, duplication-divergence models provide an incompleteexplanation for the observed structure of PPIs.

1.2.1 Network measures

Degree distribution: There are two ways the degree of some node i could increase:

1. one of i’s neighbors is duplicated by the new node, in which case with probability q theconnection to i will also be duplicated, or

2. it is chosen directly for uniform attachment.

Let’s treat these two possibilities separately, and assume we’re working in a multi-graph setting(which makes the mathematics simpler). The probability that any particular node is chosen to be

4

Page 5: 1 Protein Interaction Networks : Modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_L10.pdf · 2020-03-17 · Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring

Biological Networks, CSCI 3352Lecture 10

Prof. Aaron ClausetSpring 2020

duplicated is 1/n, where n is the current size of the network. Without loss of generality, let’s saythat node i has degree ki already. Then, the probability that such a randomly chosen node connectsto i is ki/n. Because each connection from the duplicated node is preserved independently withprobability q, the probability that i increases its degree as a result of the duplication step is kiq/n.

On the other hand, the probability that i receives a new connection as a result of the uniformattachment depends on the mean degree of the network. For convenience, let’s fix that value at aconstant value c. Because connections of the duplicated node are copied independently with prob-ability q, the number of the duplicated node’s connections that will be discarded is (1 − q)c, eachof which is replaced with a uniformly random connection for the new node. Thus, the probabilitythat i receives one of these connections is (1− q)c/n.

Since either of these two possibilities could lead to i increasing its degree, we combining the twoterms to obtain the total probability that node i will increase its degree:

Pr(ki → ki + 1) =kiq

n+

(1− q)cn

=kiq + (1− q)c

n(1)

(2)

One notable thing about this expression is that q and c are constants. Hence, the probabilitythat i increases its degree is proportional to its current degree. This pattern is called “preferentialattachment,” and is mathematically equivalent to something called the Yule process.7

Using a technique called a master equation, we can take the above expression and mathematicallyshow that the long-term degree distribution converges on a form like

Pr(k) ∝ k−α α = 1 + 1/q (3)

which is exactly a power-law distribution.

The fidelity or accuracy of the duplication mechanism (how accurately genes are copied) tells ushow heavy-tailed a distribution the mechanism produces. Perfect or near-perfect copying (q ≈ 1)yields exponents at or close to 2, while poor copying (q ≈ 0) yields arbitrarily larger exponents. Itis crucial to remember, however, that this analysis, and in fact, the entire mechanism, ignores any

7The Yule process was first studied by the statistician Udny Yule (1871-1951), and later very famously studied bythe Nobel Prize and Turing award winning economist Herbert Simon (1916-2001) and Derek de Solla Price (1922-1983). Price’s study of it was the first network version, and he called it cumulative advantage, to reflect the fact thata node’s degree is like “wealth” that feeds on itself. In the late 1990s, it was reinvented by Albert-Laszlo Barabasiand Reka Albert and given the name preferential attachment.

5

Page 6: 1 Protein Interaction Networks : Modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_L10.pdf · 2020-03-17 · Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring

Biological Networks, CSCI 3352Lecture 10

Prof. Aaron ClausetSpring 2020

effect for deletion. How different could the dynamics be if we had a model of gene deletion (loss)?

That point aside, the fact that this model produces a growing network implies that the oldest nodestend to be the ones with the largest degrees – they’ve simply had more changes to be copied, andhence more chances to have their degree increase. This is a testable prediction of the model. Usingphylogenetic techniques, and genomic information from a wide variety of species, we could estimatethe date of origin backward in our lineage for different human proteins. We would then calculate thecorrelation between protein age and protein degree, and compare that with what the model predicts.

Clustering coefficient: We expect it to be much larger than in a configuration model, becauseeach time a node duplicates, it creates q ki new triangles, because these are the edges that areduplicated but not rewired.

Mean geodesic distance: As in random graph models, we expect the diameter and mean geodesicpath length to be O(log n).

1.3 Simulating PPIN evolution

Gene duplication models are relatively straightforward to simulate, and this will be out in-classactivity on Thursday. The figure below shows three snapshots of a single simulation of a di-rected version of the simple duplication-divergence model described above, with q = 1/2, forn = {10, 100, 1000} nodes. The mean degree here is close to 1, which makes the larger networksvery tree-like.

6

Page 7: 1 Protein Interaction Networks : Modelstuvalu.santafe.edu/.../courses/3352/csci3352_2020_L10.pdf · 2020-03-17 · Biological Networks, CSCI 3352 Lecture 10 Prof. Aaron Clauset Spring

Biological Networks, CSCI 3352Lecture 10

Prof. Aaron ClausetSpring 2020

Several things are noticeable about these networks. For instance, they exhibit many short loops, un-like random graph models, particularly for small n. As n increases, the degree distribution exhibitsa heavy tail structure, and visually, these high-degree nodes become easy to spot within the network.

By plotting the mean degree of a node vs. when in the simulation it was created (averaged overmany simulations), we can clearly show the strong correlation between age and degree. The figurebelow shows this correlation for three choices of the copy probability q. When q is very small,most connections are random rather than copied, meaning that most nodes have similar degrees.(There’s still some correlation, because the older a node is, the more chances it’s had to receive auniform attachment link.) In contrast, for very high q, almost no connections are rewired, leadingto a greater concentration of edges among the oldest nodes. (The dashed black line shows thetransition from the original seed network to the portion of time when the network is growing.)

100

101

102

103

100

101

102

103

time of creation, t

me

an

de

gre

e

q=0.01

q=0.50

q=0.95

Supplemental readings

1. M. Middendorf, E. Ziv, and C.H. Wiggins, “Inferring network mechanisms: The Drosophilamelanogaster protein interaction netework.” Proc. Natl. Acad. Sci. USA 102(9), 3192–3197(2005).https://dx.doi.org/10.1073/pnas.0409515102

2. A. Vazquez et al., “Modeling of protein interaction networks.” Complexus 1, 38–44 (2003).https://arxiv.org/abs/cond-mat/0108043

3. M. M. Saint-Antoine and A. Singh, “Network Inference in Systems Biology: Recent Develop-ments, Challenges, and Applications.” Preprint (2019).https://arxiv.org/abs/1911.04046

7