Random sequence-matching model for emergent gene-regulatory networks

Ayşe Erzan (Istanbul Technical University, Gürsey Institute, Collegium Budapest)

Duygu Balcan (İTÜ), Muhittin Mungan (BÜ), Alkan Kabakçıoğlu (Padova), Ayşe H. Bilge (İTÜ), Yasemin Şengün (İTÜ)

Post on 19-Dec-2015




Outline

• Random and “real” networks

• “central dogma” of gene regulation

• RNA interference and more

• sequence matching model for gene regulatory networks

• simulations and analytical results

• comparison with experiments

• outcomes of similar models


“Classical” random networks

Erdős and Rényi, Publ. Math. Inst. Hung. Acad. Sci. 5, 17 (1960)

N vertices; each of the N(N-1)/2 possible connections is present with probability p

• “Degree distribution”: Poissonian for large N

$P(k) \simeq e^{-z} z^{k} / k!$, with $z = \langle k \rangle = pN$ and $z_c = 1$

• Average minimal path length

$\ell_{ER} \sim \ln N / \ln z = (1 + \ln p / \ln N)^{-1}$

• Clustering coefficient

$C_{ER} = z/N = p$
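As an illustration of the relation z = pN, here is a minimal sketch that samples one Erdős-Rényi graph and measures the mean degree. The function name `erdos_renyi_degrees` and the values N = 400, p = 0.01 are choices made for this example only; they are not taken from the talk.

```python
import random

def erdos_renyi_degrees(n, p, seed=0):
    """Sample one graph G(n, p) and return the list of vertex degrees."""
    rng = random.Random(seed)
    degree = [0] * n
    # Each of the n(n-1)/2 possible connections is present with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

degrees = erdos_renyi_degrees(n=400, p=0.01)
mean_degree = sum(degrees) / len(degrees)
print(mean_degree)  # should lie close to z = p * N = 4
```

For large N the histogram of `degrees` should approach the Poisson form quoted above.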


Random Networks

The probability of a connection between any two nodes is the same, p

Each of the N nodes has on average Np connections

“Small world” property: distances between nodes grow very weakly with N

The most highly connected nodes directly reach 25% of the rest


Naturally occurring networks

• Social and economic networks
• Citation and collaborative networks
• Technological networks
• WWW, communications networks
• Biological networks: neural networks, food networks, co-evolutionary networks, genomic networks

R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002); S.N. Dorogovtsev and J.F.F. Mendes, Adv. Phys. 51, 1079 (2002)


“Real Networks”

A considerable number of very highly connected nodes

Their first neighbors make up 60% of the total

The most frequent nodes are those with very few connections (1)

Small world!


“Small world” / scale-free networks

• High clustering coefficient

$\langle C_i \rangle = 2E_i / [k_i (k_i - 1)] > C_{ER} = z/N$

• Short average minimum path length $\langle \ell_{min} \rangle$ (comparable to an ER network for the same C and N; differs from regular lattices)

• Scale-free degree distribution

$P(k) \sim k^{-\gamma}$, with cutoff $k_c$

A realisation: the Barabási-Albert model of “preferential attachment”, a growing network in which the probability of attaching a new edge to vertex i is $\sim k_i$

$P(k) \sim k^{-3}$ (exact)

(Models with preferential attachment 2)
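The preferential-attachment rule can be sketched as follows. This is a minimal illustration, not the implementation behind the cited results: the seed graph (a clique of m+1 nodes), the "stub list" trick for sampling proportionally to degree, and the name `barabasi_albert_degrees` are all assumptions of this example.

```python
import random

def barabasi_albert_degrees(n, m=1, seed=0):
    """Grow a network to n nodes; each new node attaches m edges to
    distinct existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Assumed seed graph: a clique of m + 1 nodes, each of degree m.
    degree = [m] * (m + 1)
    # Node i appears degree[i] times in `stubs`, so a uniform draw from
    # `stubs` selects a node with probability proportional to its degree.
    stubs = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        degree.append(m)
        for t in targets:
            degree[t] += 1
            stubs.append(t)
        stubs.extend([new] * m)
    return degree

deg = barabasi_albert_degrees(n=200, m=2)
print(max(deg), min(deg))  # a few hubs, many nodes of minimal degree
```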


Gene regulation networks - transcription regulatory networks

The “central dogma”

[Diagram: DNA (promoter1 gene1, promoter2 gene2, promoter3 gene3) → transcription → RNA (mRNA chain) → translation (ribosome, tRNA, rRNA, amino acids) → proteins (structural and regulatory; transcription factors)]

Adapted from Alvis Brazma, www.ebi.ac.uk/microarray/research/networks/genetics


Transcription regulatory network of protein interactions in E. coli (from S. Maslov)

Data from the Regulon Database: 606 interactions, 424 operons

Out-degree: $1 < k_{out} < 85$ (broader)

In-degree: $1 < k_{in} < 6$


Transcription regulatory network of protein interactions in Homo sapiens (from S. Maslov)

Data obtained from a literature search: 1449 regulations, 689 proteins

$k_{out} < 96$, $k_{in} < 40$

$n(k_{out}) \sim k_{out}^{-\gamma}$, $\gamma = 2.5$


• A. Wagner, Mol. Biol. Evol. 18, 1283 (2001).

Duplication and divergence of genes - interaction between their regulatory proteins


for a review of properties see:

R.V. Sole and R. Pastor Satorras, in Handbook of Graphs and Networks (Bornholdt and Schuster eds., Wiley-VCH, Berlin 2002)

Transcription regulatory (genomic) and protein interaction networks (interactions between regulatory proteins)

Previous “wisdom”

• Out-degree distribution scale free with $\gamma = 2.5$ !!? (A. Wagner, Mol. Biol. Evol. 18, 1283 (2001); Jeong, Mason, Barabási, Oltvai, Nature 411, 41 (2001); Maslov and Sneppen, Science 296, 910 (2002))
• Narrower in-degree distribution than out-degree distribution
• Small world with non-classical clustering


RNA interference


New paradigm? Post Transcriptional Gene Suppression (PTGS)


“RNA can bind directly on similar DNA sequences

and silence genes at the transcriptional stage”


Watson-Crick base pairing between nucleic acids

DNA: Adenine, Guanine, Thymine, Cytosine; A-T, C-G

RNA: Adenine, Guanine, Uracil, Cytosine; A-U, C-G

• stabilisation, replication and transcription of DNA
• RNA interference (siRNA binding to mRNA or chromosomal DNA)
• binding of regulatory proteins onto mRNA

Basic mechanism (lock-and-key combinations): sequence matching

• D. Balcan, AE, Eur. Phys. J. B 38, 253 (2004)
• M. Mungan, A. Kabakçıoğlu, D. Balcan, AE, q-bio.MN/0406049


• Three-dimensional architecture (secondary structure) is also sequence dependent:
  - amino-acid recognition by tRNA
  - amino-acid binding by rRNA in the ribosome
  - binding of transcription factors to promoter regions

Greater generality for modeling genomic interactions? Stay tuned!


Modeling the “chromosome”: a random sequence of 0, 1, 2

10010101112110111012100121020011000010101000210110101011010201120010111011022

[On the slide, two genes in this sequence are labelled $G_i$ and $G_j$.]

$x_i = 0, 1$: coding regions, each with probability $(1-p)/2$

$x_i = 2$: start/stop signs for “genes”, with probability $p$

Gene: the string $G_i = \{x_{i1}, x_{i2}, \ldots, x_{il}\}$ with $x_i \neq 2$

$l$ = length of gene; $\langle l \rangle = (1-p)/p$; $\langle n(l) \rangle = L p^2 (1-p)^{l}$

$L$ = length of chromosome; $l = 0$: a null gene

Emergent gene expression networks?
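A minimal sketch of this construction, assuming the chromosome is stored as a Python string and that only stretches bounded by a 2 on both sides count as genes. The function names `random_chromosome` and `genes_of` are hypothetical, introduced for this example.

```python
import random

def random_chromosome(length, p, seed=0):
    """Random string over {0, 1, 2}: '2' with probability p,
    '0' and '1' with probability (1 - p) / 2 each."""
    rng = random.Random(seed)
    symbols = []
    for _ in range(length):
        u = rng.random()
        if u < p:
            symbols.append("2")
        elif u < p + (1 - p) / 2:
            symbols.append("0")
        else:
            symbols.append("1")
    return "".join(symbols)

def genes_of(chromosome):
    """Strings of 0s and 1s delimited by a 2 on both sides.
    Adjacent 2s give a null gene (empty string)."""
    return chromosome.split("2")[1:-1]

chromosome = random_chromosome(length=20000, p=0.05, seed=1)
genes = genes_of(chromosome)
mean_length = sum(len(g) for g in genes) / len(genes)
print(len(genes), mean_length)  # mean length should be near (1 - p) / p = 19
```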


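Sequence matching → gene regulation? Model connectivity matrix of the genomic network:

$w_{ij} = 1$ iff the string $G_i$ is embedded inside the string $G_j$ ($G_i \subset G_j$, $l_i \le l_j$); $w_{ij} = 0$ otherwise.

[Example on the slide: the gene 1101 within the chromosome fragment 1101 2011000101201000110211 11011101 201010 1122. The edges are directed and represent interference (suppression); the labels $k_{out} = 2$ and $k_{in} = 1$ mark the out- and in-degrees in the example.]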
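The embedding rule can be written down directly with substring tests. The sketch below makes two assumptions that the slide does not spell out: self-loops are excluded, and null genes (empty strings) send no edges. The gene list is a made-up example.

```python
def connectivity_matrix(genes):
    """w[i][j] = 1 iff gene i occurs as a contiguous substring of gene j.
    Assumptions of this sketch: no self-loops, null genes have no out-edges."""
    n = len(genes)
    w = [[0] * n for _ in range(n)]
    for i, gi in enumerate(genes):
        for j, gj in enumerate(genes):
            if i != j and gi and gi in gj:
                w[i][j] = 1
    return w

genes = ["1101", "0110100", "11011101", "010"]  # illustrative genes only
w = connectivity_matrix(genes)
k_out = [sum(row) for row in w]                      # edges leaving each gene
k_in = [sum(w[i][j] for i in range(len(genes)))      # edges arriving at each gene
        for j in range(len(genes))]
print(k_out, k_in)
```

Short genes tend to have large out-degree and long genes large in-degree, which is the asymmetry the later slides quantify.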


Simulations: clustering coefficient

$C_i = 2E(i) / \{k(i)[k(i)-1]\}$: the number of edges connecting nearest neighbours of i, divided by the total number of possible connections between them

Computed separately for incoming or outgoing bonds to the site i

<Cout> = 0.034

<Cin> = 0.648

<C> =0.534 = < z > / < s >

Non-classical behaviour
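A minimal sketch of the clustering coefficient for a directed 0/1 matrix, computed over either the out-neighbours or the in-neighbours of a site. How edges among neighbours are counted in the directed case is not stated on the slide; here a neighbour pair is assumed to be connected if an edge exists in either direction, and sites with fewer than two neighbours are assigned 0. Both conventions, and the function name `clustering`, are assumptions of this example.

```python
def clustering(w, i, direction="out"):
    """C_i = 2 E(i) / (k(i) [k(i) - 1]) over out- or in-neighbours of i."""
    n = len(w)
    if direction == "out":
        nbrs = [j for j in range(n) if w[i][j]]
    else:
        nbrs = [j for j in range(n) if w[j][i]]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for sites with k < 2
    # Count neighbour pairs joined by an edge in either direction.
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if w[nbrs[a]][nbrs[b]] or w[nbrs[b]][nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))

# Matrix for the illustrative genes "1", "11", "111" under the embedding rule.
w_example = [[0, 1, 1],
             [0, 0, 1],
             [0, 0, 0]]
print(clustering(w_example, 0, "out"), clustering(w_example, 2, "in"))
```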


“Percolation threshold” $p_c$:

The giant cluster breaks up for $p < p_c(L)$ (Lp = frequency of stop-start signs): N (number of genes) too small, genes too long

Exponent ~ -3/4 (preliminary)


“Extremely small world” networks!

Cluster radius = average minimum path length

Directed edges (in or out): $\ell_{min} = 1$ (transitivity!)

Undirected edges: $\ell_{min} = 1$, $\ell_{max} \le 4$, e.g. the path 11111 - 1 - 001101 - 0 - 00000

$\langle \ell_{min} \rangle$ depends very weakly on p for fixed L

$p_c < p < 1/2$: most genes are of length unity

$\ell_{min}$ is undefined for $p \le p_c(L)$

L = 15000: $\langle \ell_{min} \rangle \sim 1.66$; $\langle \ell_{min} \rangle \to 1.87$ as $p \to p_c$


Random point mutations

• $x \in \{0, 1\}$: $x \to (x + 1) \bmod 2$

• $x = 2$: $x_i \leftrightarrow x_{i \pm 1}$, i.e. random-walk steps taken by the STOP and GO signs

Long-range modifications due to the change in reading frame

Simulations: the network is robust under random mutations


Degree distribution: preliminary simulation results

Peaks ~ geometrically spaced for $k_{out}$ small (log-periodic), ~ periodic for $k_{out}$ large

Last peak: the size distribution of the giant cluster (single-bit genes connect to almost all others)


Maxima of the peaks: $n_m \sim k_{out}^{-\gamma}$

$\gamma \approx 0.9$ for small k, $\gamma \approx 0.4$ for large k

No double scaling for p = 0.05: $\gamma \approx 0.45$

0


$n(k) \sim k^{-\gamma}$, $\gamma \in (1.1, 1.8)$


Simulation results: crossover in the scaling behaviour of the degree distribution

[Figure: degree distribution with the crossover marked at $d_c$. Solid line: analytical; circles: simulation.]


Analytical calculations

1. The matching probabilities

Probability of a given string of length l to be reproduced in a randomly chosen string of length k, for an alphabet of r letters:

$p(l,k) = 1 - (1 - r^{-l})^{k-l+1} \approx r^{-l}(k-l+1)$ for l large, neglecting correlations between overlaps

$r^{l}$: number of l-strings with r letters, so $r^{-l}$ is the probability of a match at a given position

$(k-l+1)$: number of shifted l-substrings in a k-string (1 for k = l)

Very good approximation for $r^{-l}(k-l) < 1$
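The quality of this approximation can be checked by brute force for short strings. The sketch below compares the closed form with an exact enumeration over all pairs of strings; the function names `p_match` and `p_match_exact` are introduced here for illustration, and the enumeration is only feasible for small l and k.

```python
from itertools import product

def p_match(l, k, r=2):
    """Closed-form estimate 1 - (1 - r**-l)**(k - l + 1); zero if l > k."""
    if l > k:
        return 0.0
    return 1.0 - (1.0 - r ** (-l)) ** (k - l + 1)

def p_match_exact(l, k, r=2):
    """Fraction of (x, y) pairs, x of length l and y of length k,
    for which x occurs as a contiguous substring of y."""
    if l > k:
        return 0.0
    alphabet = "0123456789"[:r]
    xs = ["".join(t) for t in product(alphabet, repeat=l)]
    ys = ["".join(t) for t in product(alphabet, repeat=k)]
    hits = sum(1 for x in xs for y in ys if x in y)
    return hits / (len(xs) * len(ys))

for k in (3, 6, 10):
    print(k, p_match(3, k), p_match_exact(3, k))
```

The two agree exactly for k = l, where only one alignment exists; for k > l the difference comes from the neglected correlations between overlapping alignments.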


Computing the matching probabilities

Strings x (of length l) and y (of length $k \ge l$)

$y_{a,l}$ = substring of y, of length l, that has been shifted by a

$U(x, y_{a,l})$ = Hamming distance between x and $y_{a,l}$ ($U = 0$: match; $U \neq 0$: no match)

$1 - f_a(x,y;\beta) = 1 - \exp[-\beta U(x, y_{a,l})] \to 0$ or $1$ for $\beta \to \infty$ (counts no-matches)

$p(l,k;x,\beta) = 1 - (\text{number of no-matches}) / r^{k}$, summed over y

$p(l,k;x,\beta) = 1 - r^{-k} \sum_{y} \prod_{0 \le a \le k-l} [1 - f_a(x,y;\beta)]$ (no match for any shift a)

Cluster expansion: do the x averages; 2-point averages over the f factorise; approximate all higher orders with factorised ones.

For $k \ge l$ get

$p(l,k) = 1 - (1 - z^{l})^{k-l+1} \approx z^{l}(k-l+1)$ for l large, with $z = [1 + (r-1)e^{-\beta}] / r$


Matching probability for r = 2

$p(l,k) = 1 - (1 - 2^{-l})^{k-l+1} \approx 2^{-l}(k-l+1)$ for $l \le k$; 0 otherwise

[Figure: $p(l,k)$ versus l. Curves for embedding strings of length k = 16, 14, 12, 10, 8, 6, 4, 2 from top to bottom, $k \ge l$. Symbols: exact enumeration; solid lines: the above expression.]


2. Understanding the sequence matching data

Matching l with d: long “genes” small degree


3. Calculation of the out-degree distribution

Number of out-edges from a randomly chosen gene of length l to genes of length k:

$X_{lk} = \sum_{\alpha} X_{lk}(\alpha)$

$\alpha$: different realisations of genes of length k

$X_{lk}(\alpha)$: independent random variables, binomially distributed with parameter $p(l,k)$; Poisson for small $p(l,k)$ (large l)

Total number of out-edges from a randomly chosen gene of length l:

$X_l = \sum_{k} X_{lk}$, Gaussian distributed via the Central Limit Theorem, with mean $\langle X_l \rangle$ and variance $\langle X_l^2 \rangle - \langle X_l \rangle^2$

$X_l$ is Poisson for large l


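• Mean out-degree for genes of length l, for the model with an exponential gene length distribution

$\langle n(k) \rangle = L p^2 q^{k}$, where $q = 1 - p$ is the probability of a coding element

$d_l = \langle X_l \rangle = \sum_{k \ge l} \langle X_{lk} \rangle = \sum_{k \ge l} p(l,k) \langle n(k) \rangle = L p\, (qz)^{l} / (p + q z^{l}) \sim (qz)^{l}$

• Variance of the out-degree distribution for genes of length l

$\sigma_l^2 = \langle X_l^2 \rangle - \langle X_l \rangle^2 = d_l\, p\, (1 - z^{l}) / [1 - q (1 - z^{l})^2] \sim d_l$ for large l

For large l, $d_l \simeq \sigma_l^2$: Poissonian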
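The closed form for the mean out-degree follows from summing a geometric series, and this can be checked numerically. The sketch below compares the closed form with the direct sum over k; the parameter values (L = 15000, p = 0.05, z = 1/2) and the truncation `k_max` are choices made for this example.

```python
def mean_out_degree_closed(l, L, p, z):
    """d_l = L p (q z)^l / (p + q z^l), with q = 1 - p."""
    q = 1.0 - p
    return L * p * (q * z) ** l / (p + q * z ** l)

def mean_out_degree_sum(l, L, p, z, k_max=5000):
    """Direct sum over k >= l of p(l, k) <n(k)>, truncated at k_max."""
    q = 1.0 - p
    total = 0.0
    for k in range(l, k_max + 1):
        n_k = L * p * p * q ** k                        # <n(k)> = L p^2 q^k
        p_lk = 1.0 - (1.0 - z ** l) ** (k - l + 1)      # matching probability
        total += p_lk * n_k
    return total

for l in (1, 3, 8):
    print(l, mean_out_degree_closed(l, 15000, 0.05, 0.5),
          mean_out_degree_sum(l, 15000, 0.05, 0.5))
```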


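Out-degree distribution for small l (large d): scaling behaviour of the envelope

Peak height $h_l$ times peak width $\sigma_l$ gives the peak area: $h_l \sigma_l \sim n(l)$

$h_l \sim n(l) / \sigma_l \sim L p^2 q^{l} / d_l^{1/2}$

$d_l \sim (qz)^{l}$, so $h_l \sim (q/z)^{l/2}$

$h(d) \sim d^{-\gamma_1}$: $(qz)^{-\gamma_1} = (q/z)^{1/2}$ gives

$\gamma_1 = \frac{1}{2} (\ln z - \ln q) / (\ln z + \ln q) \approx \frac{1}{2} - p / \ln r$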
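As a numerical check of the envelope exponent, the sketch below evaluates the logarithmic expression in the form that reproduces the quoted small-p limit ½ − p/ln r, namely γ₁ = ½ (ln z − ln q)/(ln z + ln q), and compares it with that limit. It assumes perfect matching, so that z = 1/r; the function names are introduced for this example.

```python
import math

def envelope_exponent(p, z):
    """gamma_1 = (1/2) (ln z - ln q) / (ln z + ln q), with q = 1 - p."""
    q = 1.0 - p
    return 0.5 * (math.log(z) - math.log(q)) / (math.log(z) + math.log(q))

def envelope_exponent_small_p(p, r):
    """Small-p limit 1/2 - p / ln r (perfect matching, z = 1/r)."""
    return 0.5 - p / math.log(r)

for p in (0.01, 0.05, 0.1):
    print(p, envelope_exponent(p, 0.5), envelope_exponent_small_p(p, 2))
```

The weak dependence on p and on the alphabet size r is what the summary refers to as approximately universal behaviour.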


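Out-degree distribution for large l (small d)

$P(X_l = d) = d_l^{\,d} \exp(-d_l) / d!$ (Poisson)

$P(d) = \sum_l n(l)\, P(X_l = d) = Lp \sum_l p\, q^{l}\, d_l^{\,d} \exp(-d_l) / d!$

$\propto \int_0^{\infty} dx\, x^{d - \gamma_1 - 1/2}\, e^{-x} / d!$ for large l

$P(d) \propto \Gamma(d + \tfrac{1}{2} - \gamma_1) / \Gamma(d + 1) \sim d^{-\gamma_1 - 1/2}$ (Gamma functions)

where $\gamma_1 = \frac{1}{2} (\ln z - \ln q) / (\ln z + \ln q) \approx \frac{1}{2} - p / \ln r$

Scaling exponent: $\gamma_1 + \frac{1}{2} = 1 - p / \ln r$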


Out-degree distribution: finite size effects

Dotted lines: full Gaussian distribution taken for $P(X_l = d)$

Solid lines: finite size correction, $d_l^{out} = (\sigma_l^{out})^2$, with $P(X_l = d)$ Poisson

Thus, both for large and small l,

$P(d) = Lp \sum_l p\, q^{l}\, d_l^{\,d} \exp(-d_l) / d!$

provides a good representation
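The Poisson superposition can be evaluated directly. The sketch below is one way to do so, working with logarithms to avoid overflow in $d_l^{\,d}/d!$; the cutoff `l_max`, the parameter values and the function name `out_degree_distribution` are assumptions of this example. As written, P(d) is an expected number of genes with out-degree d (it sums to the total number of genes with l ≥ 1), not a normalised probability.

```python
import math

def out_degree_distribution(d, L, p, z, l_max=200):
    """P(d) = sum over l of n(l) * Poisson(d; d_l), with n(l) = L p^2 q^l
    and d_l = L p (q z)^l / (p + q z^l). Null genes (l = 0) are left out."""
    q = 1.0 - p
    total = 0.0
    for l in range(1, l_max + 1):
        n_l = L * p * p * q ** l
        d_l = L * p * (q * z) ** l / (p + q * z ** l)
        # log of d_l^d exp(-d_l) / d!, using lgamma(d + 1) = log(d!)
        log_poisson = d * math.log(d_l) - d_l - math.lgamma(d + 1)
        total += n_l * math.exp(log_poisson)
    return total

L_chrom, p_stop, z_match = 2000, 0.05, 0.5
dist = [out_degree_distribution(d, L_chrom, p_stop, z_match) for d in range(601)]
print(dist[:5])
```

Plotting `dist` on log-log axes should show the separated peaks at large d merging into a smooth curve at small d, which is the crossover discussed in the slides.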


4. Crossover in the scaling behaviour

Peaks are well separated for $l < l_c \sim 8$

$d_l \sim (q/2)^{l}$; $\sigma_l \sim d_l^{1/2} \to 0$ slower than $d_l$

The crossover occurs where $d_l - d_{l+1} \sim \sigma_l$

More precisely: $(d_l - d_{l+1}) / 2 = \sigma_l$, giving $d_c \approx 6.6$

(From requiring that the minimum between the two Gaussian peaks centered at $d_l$ and $d_{l+1}$ vanish)


5. Simulation and analytical results: the in-degree distribution

Solid line: finite size effect taken care of by inserting $d_l^{in} = (\sigma_l^{in})^2$


The in-degree distribution

The second peak can be obtained accurately from

$d_l^{in} = \sum_{k \le l} n(k)\, p(k,l)$

$(\sigma_l^{in})^2 = \sum_{k \le l} n(k)\, p(k,l)\, [1 - p(k,l)]$

$p(d^{in}) = p\, q^{l}\, [2\pi (\sigma_l^{in})^2]^{-1/2} \exp[-(d - d_l^{in})^2 / 2 (\sigma_l^{in})^2]$


modelling gene interactions

A. Kabakçıoğlu, M. Mungan, D. Balcan, AE, preprint: does sequence matching also operate in the case of transcriptional gene interaction?

Claim: the secondary structures (conformations) of transcription factors are determined by their amino acid sequence, which is coded for by the corresponding DNA sequence. The different folds expose precise regulatory sites, which are recognized by regulatory sequences on the genome?


Experimental data from expression of mRNA in DNA arrays

M. Gustafsson, M. Hörnquist, A. Lombardi, “Large-scale reverse engineering by Lasso,” q-bio.MN/040312, using microarray data from P.T. Spellman et al., Mol. Biol. Cell 9, 3273 (1998)

Yeast data (Saccharomyces cerevisiae)


Expected model out-degree distribution, with

Gaussian RS length distribution


Model with a Gaussian RS length distribution (single realisation; adjustable parameters $\langle l \rangle$, $\sigma_l$) and yeast data


Comparison of network of a single realisation of the

model chromosome and yeast microarray experiment


Consensus data (http://cgsigma.cshl/org )

for length distribution of Regulatory Segments

RS length Gaussian distribution with parameters fixed

by comparison with out-degree of yeast data


Single realisation for two independent sets of Regulatory Sequences, $S_i$ and $S'_i$, associated with each node of the network

Connectivity rule: $S_i \subset S'_j$

Note: the expected distributions will not change


Thanks Chrisantha!

Advances in Artificial Life: 5th Eur. Conf. (ECAL99), Vol. 1674, LNAI, Springer

N=L/4 ppromoter seq.

of length p =4

“2”


averaged over 20 genomes - “oscillatory behavior”

from superposition of Poisson peaks


Evolution of gene networks by gene duplication

Wagner, PNAS 91, 4387 (1994); Vazquez, Flammini, Maritan and Vespignani, cond-mat/0108043; Solé, Pastor-Satorras, Smith and Kepler, Adv. Complex Syst. 5, 43 (2002)

• take a random network
• duplicate a gene together with its connections
• remove the connections with a given probability, and establish new connections to random nodes with a second probability

Result: a scale-free proteomic model with $\gamma = 2.5$; C and the minimum path length compare well with data


Sequence similarity


Gaussian network, evolution

by duplication of randomly chosen RSs, mutation

(Yasemin Şengün)


Summary

• Random gene interaction network model with sequence matching, for
  - arbitrary alphabet
  - finite temperature (partial matching)
• Out-degree distribution: power law for small d, log-periodic for large d
• Exponents $\gamma = 1 - p / \ln r$, $\gamma_1 = 0.5 - p / \ln r$: ~ universal for small p
• Single realisations compare well with experiment
• Not scale free: crossover behaviour?