Bioinformatics
Biological Networks
Academic year 2008-2009
Francesca Cordero
Networks for Wilson
"...The greatest challenge today, not just in cell biology but all science, is the accurate and complete description of complex systems. Scientists have broken down many kinds of systems. They think they know most of the elements and forces. The next task is to reassemble them, at least in mathematical models that capture the key properties of the entire ensembles. ..."
Abstraction:
Simplification → Adoption of a formalism → Simulation
Abstraction - Why?
Biology studies complex systems. In humans:
• ≈ 24,000 genes
• tens of thousands of different proteins
• at least 26 types of post-translational modifications
• ≈ 80 possible subcellular localizations
• an "estimated" average of 10 interactions per protein, with some proteins having hundreds of interactions
• 518 kinases, 218 phosphatases, 80 small GTPases
roughly 300,000-400,000 protein-protein interactions
Abstraction - Why?
What to abstract?
• The set of functional interactions of the biological system
How to abstract?
• The network as an abstraction of biological phenomena
How to represent it?
• The graph, as a symbolic representation
Examples:
• Signal transduction systems = sets of physically connected intracellular molecules devoted to information processing
• Central nervous system = set of physically connected neurons
• Immune system = set of physically and functionally connected cells and molecules
High-throughput Technologies
DNA microarrays → Gene expression → Genomics
Protein-mAb arrays, 2D-PAGE + mass spectrometry → Interactome, Phosphoproteome, Metabolome, etc. → Proteomics
Genomics & Proteomics =⇒ "Massive" acquisition of experimental data
=⇒ "Rapid" and "low-cost" reconstruction of biological networks
Systems Biology
Studies biology through static and dynamic analyses, using systems of differential equations, algorithms, and simulation of the dynamics, i.e. change over time.
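As a minimal sketch of what dynamic analysis with differential equations can look like, the Python snippet below integrates a hypothetical two-step pathway A → B → C with mass-action kinetics using the explicit Euler method. The species, the rate constants k1 and k2, the step size, and the function name simulate_chain are illustrative assumptions, not values from the lecture.

```python
def simulate_chain(a0=1.0, k1=0.5, k2=0.2, dt=0.01, steps=1000):
    """Euler integration of dA/dt = -k1*A, dB/dt = k1*A - k2*B, dC/dt = k2*B.

    All parameters are hypothetical and chosen only for illustration.
    """
    a, b, c = a0, 0.0, 0.0
    trajectory = [(0.0, a, b, c)]
    for i in range(1, steps + 1):
        flux1 = k1 * a  # rate of A -> B
        flux2 = k2 * b  # rate of B -> C
        # Update all species from the fluxes computed at the current time step.
        a, b, c = a - dt * flux1, b + dt * (flux1 - flux2), c + dt * flux2
        trajectory.append((i * dt, a, b, c))
    return trajectory


trajectory = simulate_chain()
t_end, a_end, b_end, c_end = trajectory[-1]
print(f"t={t_end:.1f}  A={a_end:.3f}  B={b_end:.3f}  C={c_end:.3f}")
```

Because every unit of flux leaving one species enters the next, the total A + B + C stays constant, which is a quick sanity check for this kind of simulation.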
Dynamic Analysis
[Figure: glycolysis drawn as a network of metabolites and enzyme-catalysed transitions T1-T10. Metabolites: glucose, glucose 6-phosphate, fructose 6-phosphate, fructose 1,6-bisphosphate, dihydroxyacetone phosphate, glyceraldehyde 3-phosphate, bisphosphoglycerate, phosphoglycerate, 2-phosphoglycerate, phosphoenolpyruvate, pyruvate. Enzymes: hexokinase, phosphoglucose isomerase, phosphofructokinase, aldolase, triose phosphate isomerase, glyceraldehyde 3-phosphate dehydrogenase, phosphoglycerate kinase, phosphoglycerate mutase, enolase, pyruvate kinase.]
Static Analysis
Topological analysis is the study of the local and global architecture of the network. It does not allow simulating change over time, but it does identify the structural properties of the network that make the system's dynamics possible. Studying the topological properties helps answer questions such as:
• Is the system robust (= tolerant to perturbations)?
• What are the possible regulators/effectors of a molecule?
• Are there cell-specific regulatory mechanisms?
• What are the side effects of a drug?
• What are the mechanisms of action of a drug?
• What are the general molecular alterations during septic shock?
• What is the molecular mechanism of resistance to therapy in AML?
Network measures
Network biology offers a quantifiable description of the networks that characterize various biological systems. The most basic network measures that allow us to compare and characterize different complex networks are:
• Degree: the number of links k that a node has to other nodes. In directed networks each node has an incoming degree k_in and an outgoing degree k_out. An undirected network with N nodes and L links is characterized by an average degree 〈k〉 = 2L/N.
• Degree distribution: P(k) gives the probability that a selected node has exactly k links. P(k) is obtained by counting the number of nodes N(k) with k = 1, 2, ... links and dividing by the total number of nodes N.
• Shortest path and mean path length: distance in networks is measured by the path length, which tells us how many links we need to pass through to travel between two nodes. In directed networks, the distance l_AB from node A to node B is often different from the distance l_BA. The mean path length 〈l〉 is the average over the shortest paths between all pairs of nodes.
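The three measures above can be computed directly from an adjacency structure. The following Python sketch is one possible way to do it; the four-node toy network and all function names are assumptions made for illustration.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """P(k) = N(k) / N for an undirected network stored as {node: set_of_neighbours}."""
    counts = Counter(len(neigh) for neigh in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def shortest_path_lengths(adj, source):
    """Breadth-first search: number of links on the shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_path_length(adj):
    """Average shortest-path length over all connected ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical toy network with links A-B, B-C, B-D, C-D.
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C"}}
n_links = sum(len(neigh) for neigh in adj.values()) // 2
avg_degree = 2 * n_links / len(adj)  # <k> = 2L/N
pk = degree_distribution(adj)
mean_l = mean_path_length(adj)
print(avg_degree, pk, round(mean_l, 3))
```

For this toy network L = 4 and N = 4, so 〈k〉 = 2, and the six node pairs have shortest paths summing to 8 links, giving 〈l〉 = 8/6.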
Network measures
• Clustering coefficient: in many networks, if node A is connected to B and B is connected to C, then it is highly probable that A also has a direct link to C. This phenomenon can be quantified using the clustering coefficient C_I = 2 n_I / (k (k − 1)), where k is the degree of node I and n_I is the number of links connecting the k neighbours of node I to each other. The average coefficient 〈C〉 characterizes the overall tendency of nodes to form clusters. C(k) is the average clustering coefficient of all nodes with k links.
In the figure's example, C_A = 2/20 and C_F = 0.
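A short Python sketch of the clustering coefficient follows. The toy network is an assumption built only to reproduce the two values quoted above: node A has five neighbours of which a single pair (B and C) is linked, and none of node F's neighbours are linked to each other.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """C_I = 2 n_I / (k (k - 1)), with n_I the number of links among the k neighbours."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined for k < 2, reported here as 0
    n_links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * n_links / (k * (k - 1))


# Hypothetical undirected network stored as {node: set_of_neighbours}.
adj = {
    "A": {"B", "C", "D", "E", "F"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A"},
    "F": {"A", "G", "H"},
    "G": {"F"},
    "H": {"F"},
}

c_a = clustering_coefficient(adj, "A")  # one linked pair among 5 neighbours: 2*1/(5*4)
c_f = clustering_coefficient(adj, "F")  # no linked pair among 3 neighbours
avg_c = sum(clustering_coefficient(adj, v) for v in adj) / len(adj)
print(c_a, c_f, avg_c)
```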
Examples of metabolic networks
Architectural features of cellular networks
Three network models:
• Random networks
– The Erdős–Rényi (ER) model starts with N nodes and connects each pair of nodes with probability p.
– The node degrees follow a Poisson distribution, which means that most nodes have approximately the same number of links.
– The clustering coefficient is independent of a node's degree.
– The mean path length is proportional to the logarithm of the network size, l ≈ log N.
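The ER construction described above is short enough to write out. The sketch below is a minimal Python version; the function name erdos_renyi, the network size, the probability, and the seed are illustrative assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Connect each of the n*(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


n, p = 200, 0.05
edges = erdos_renyi(n, p, seed=1)
avg_degree = 2 * len(edges) / n  # <k> = 2L/N
print(f"<k> = {avg_degree:.2f}, expected p*(N-1) = {p * (n - 1):.2f}")
```

Each node has N − 1 potential partners, so the expected degree is p(N − 1), and individual degrees fluctuate only mildly around it.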
Random networks
Architectural features of cellular networks
• Scale-free networks
– The networks are highly non-uniform: most of the nodes have only a few links, whereas a few nodes have a very large number of links (HUBS).
– The probability that a node has k links follows P(k) ~ k^(-γ), where γ is the degree exponent. The probability that a node is highly connected is statistically more significant than in a random graph.
– Scale-free networks are characterized by a power-law degree distribution; such distributions appear as a straight line on a log-log plot.
– C(k) is independent of k.
– The average path length follows l ~ log log N, which is significantly shorter than the log N that characterizes random small-world networks.
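Since a power law is a straight line on a log-log plot, the degree exponent γ can be read off as minus the slope of that line. The Python sketch below fits the slope by ordinary least squares on synthetic data that follow an exact power law; the function name and the synthetic distribution are assumptions, and for noisy real data this simple fit is only a rough estimate.

```python
import math


def fit_degree_exponent(pk):
    """Least-squares slope of log P(k) versus log k; returns gamma = -slope."""
    points = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


# Synthetic, exactly power-law distribution P(k) ~ k^(-2.5) for k = 1..100.
gamma_true = 2.5
norm = sum(k ** -gamma_true for k in range(1, 101))
pk = {k: k ** -gamma_true / norm for k in range(1, 101)}

gamma_est = fit_degree_exponent(pk)
print(round(gamma_est, 6))
```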
Scale-free networks
Architectural features of cellular networks
• Hierarchical networks
– Construction:
1. Start from a small cluster of four densely linked nodes.
2. Three replicas of this module are generated and the three external nodes of the replicated clusters are connected to the central node of the old cluster, which produces a larger 16-node module.
3. Three replicas of this 16-node module are generated and connected to the central node of the old module.
4. The result is a new module of 64 nodes.
– The resulting network has a power-law degree distribution.
– The average clustering coefficient is independent of system size, 〈C〉 ≈ 0.6.
– The clustering coefficient scales as C(k) ~ k^(-1), a straight line of slope −1 on a log-log plot.
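The construction steps above can be sketched in Python as follows. One detail is an assumption of this sketch rather than something stated on the slide: at every step, the "external" nodes of a replica are taken to be the copies of the previous step's external nodes. Node 0 plays the role of the central node, and the function name is hypothetical.

```python
def hierarchical_network(levels):
    """Iteratively replicate a 4-node cluster; returns (number_of_nodes, edge_set)."""
    # Step 1: four fully linked nodes; node 0 is the centre, 1-3 are external.
    edges = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
    n = 4
    peripheral = [1, 2, 3]
    for _ in range(levels):
        new_edges = set(edges)
        new_peripheral = []
        for replica in range(1, 4):
            offset = replica * n
            # Copy the whole current module.
            for u, v in edges:
                new_edges.add((u + offset, v + offset))
            # Link the replica's external nodes to the original central node.
            for node in peripheral:
                new_edges.add((0, node + offset))
                new_peripheral.append(node + offset)
        edges, peripheral, n = new_edges, new_peripheral, 4 * n
    return n, edges


n_nodes, edges = hierarchical_network(levels=2)
hub_degree = sum(1 for u, v in edges if 0 in (u, v))
print(n_nodes, len(edges), hub_degree)
```

Two replication rounds give the 64-node module of step 4, with the central node emerging as the largest hub.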
Hierarchical networks
Which type are the cellular networks?
The analysis of the metabolic networks of 43 different organisms from all three domains of life (eukaryotes, bacteria, and archaea) indicates that cellular metabolism has a scale-free topology.
• MOST metabolic substrates participate in only one or two reactions, while a FEW, such as pyruvate or coenzyme A, participate in dozens and function as metabolic hubs.
• Genetic regulatory networks
Which type are the cellular networks?
BUT other networks, such as the transcription regulatory networks of S. cerevisiae and Escherichia coli, show MIXED scale-free and exponential characteristics:
• Most transcription factors regulate only a few genes.
• A few general transcription factors interact with many genes.
• Most genes are regulated by one to three transcription factors.
Small-world effect
Any two nodes can be connected with a path of only a few links.
In the cell, the "small-world effect" was documented for metabolism: paths of only three to four reactions can link most pairs of metabolites.
This short path length indicates that local perturbations in metabolite concentrations could reach the whole network very quickly.
Why scale-free architecture?
• Most networks are the result of a growth process, during which new nodes join the system over an extended time period.
• Nodes prefer to connect to nodes that already have many links.
Duplicated genes produce identical proteins that interact with the same protein partners.
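Growth and preferential attachment, the two ingredients listed above, can be combined in a few lines of Python. This is a sketch under stated assumptions: the seed network is a small fully linked cluster, every new node brings m links, and the function name and parameter values are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m distinct existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed network: m + 1 fully linked nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in 'stubs' once per link end, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((t, new))
            stubs.extend((t, new))
    return edges


edges = preferential_attachment(n=200, m=2, seed=0)
degree = Counter(node for edge in edges for node in edge)
print("links:", len(edges), "largest degree:", max(degree.values()))
```

Nodes that joined early and already hold many links keep attracting new ones, which is how hubs emerge in this model.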
Motif and Modules
Modularity refers to a group of physically or functionally linked molecules (nodes) that work together to achieve a (relatively) distinct function.
Biology is full of examples of modularity:
• Protein-protein and protein-RNA complexes (physical modules)
• Temporally coregulated groups of molecules that govern various stages of the cell cycle
• Temporally coregulated groups of molecules that convey extracellular signals in bacterial chemotaxis
In a network representation, a module (or cluster) appears as a highly interconnected group of nodes.
Motif and Modules
High clustering indicates that networks are locally partitioned into various subgraphs of highly interlinked groups of nodes.
• Modules by definition imply that there are groups of nodes that are relatively isolated from the rest of the system.
• However, in a scale-free network hubs are in contact with a high fraction of nodes, which makes the existence of relatively isolated modules unlikely.
• Clustering and hubs nevertheless coexist naturally, which indicates that topological modules are not independent, but combine to form a hierarchical network.
Robustness
Scale-free networks do not have a critical threshold for disintegration: if 80% of randomly selected nodes fail, the remaining 20% still form a compact cluster with a path connecting any two nodes. This is because random failure affects mainly the numerous small-degree nodes, the absence of which does not disrupt the network's integrity.
• It is increasingly accepted that adaptation and robustness are inherent network properties, and not a result of the fine-tuning of a component's characteristics.
• Robustness is inevitably accompanied by vulnerabilities (to the disruption of well-selected network components).
• The ability of a module to evolve also has a key role in developing or limiting robustness ("frozen" modules such as nucleic acid synthesis).
• Modularity and robustness: the weak communication between modules probably limits the effects of local perturbations in cellular networks.
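The contrast between random failure and the loss of a well-selected component can be illustrated by measuring the largest connected component after removing nodes. The Python sketch below uses a deliberately extreme toy network (one hub linked to nine small-degree nodes); the network and the function name are assumptions for illustration only.

```python
def largest_component_size(adj, removed=()):
    """Size of the largest connected component once the 'removed' nodes have failed."""
    removed = set(removed)
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Hypothetical toy network: one hub connected to nine degree-1 nodes.
adj = {"hub": {f"n{i}" for i in range(9)}}
adj.update({f"n{i}": {"hub"} for i in range(9)})

intact = largest_component_size(adj)
after_leaf_failure = largest_component_size(adj, removed=["n0"])
after_hub_failure = largest_component_size(adj, removed=["hub"])
print(intact, after_leaf_failure, after_hub_failure)
```

Losing a small-degree node barely changes the connected cluster, whereas losing the hub fragments the network into isolated nodes.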
Example
"Virtual" cell (interactome) = GLOBAL network = 14,500 nodes (proteins) and 94,600 interactions. Mammals only (> 90% Homo sapiens, the rest mouse and rat). Interactions verified by different experimental approaches.
• The experiment: stimulation of human T lymphocytes with SDF-1alpha
• LIST of proteins phosphorylated via CXCR4 (SDF-1a)
• The list is used to extract information from the global network, that is, to reconstruct the subnetwork specifically generated by the phosphorylated proteins.
• The phosphorylation level is the attribute of the probe nodes.
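Extracting the subnetwork spanned by a probe list amounts to keeping only the interactions whose two endpoints are both in the list. The Python sketch below shows this on a tiny made-up network; the protein identifiers P1-P5, the phosphorylation values, and the function name are placeholders, not data from the experiment.

```python
def extract_subnetwork(edges, probe):
    """Keep the interactions of the global network that connect two probe nodes."""
    probe = set(probe)
    return [(u, v) for u, v in edges if u in probe and v in probe]


# Hypothetical global network and probe list (node attribute = phosphorylation level).
global_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4"), ("P4", "P5")]
phospho_level = {"P1": 2.1, "P2": 0.8, "P4": 1.5}

sub_edges = extract_subnetwork(global_edges, phospho_level)
print(sub_edges)
```

Keeping the phosphorylation level in a dictionary keyed by protein makes it available as a node attribute of the extracted subnetwork.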
Paper - Module Networks
• The complex functions of a living cell are carried out through the concerted activity of many genes and gene products. This activity is often coordinated by the organization of the genome into regulatory modules, or sets of coregulated genes that share a common function.
• Identifying this organization is crucial for understanding cellular responses to internal and external signals.
• Genome-wide expression profiles
Goal
The module networks procedure: a method based on probabilistic graphical models for inferring regulatory modules from gene expression data.
Paper - Module Networks
Assumption
The regulators are themselves transcriptionally regulated, so that their expression profiles provide information about their activity level.
Procedure
Paper - Module Networks
The algorithm searches simultaneously for a partition of genes into modules and for a regulation program for each module that explains the expression behavior of genes in the module. The regulation program of a module.
Paper - Module Networks
Respiration module: the transcription factor Hap4 is the module's top regulator; in fact, the Hap4 DNA-binding sequence motif is present in 29 of the 55 genes in the module.
Paper - Module Networks
All modules: summary of module analysis
Paper - Module Networks
To obtain a global perspective on the relationships between different modules:
• Compiled a graph of modules and cis-regulatory motifs
• Connected modules to their significantly enriched motifs
Paper - Module Networks
Experimental tests
Paper - Module Networks
Regulatory components
Paper - Module Networks
A key question regarding the validity of this approach is how regulatory events can be inferred from gene expression data. To identify a regulatory relation in expression data, both the regulator and its targets must be transcriptionally regulated, resulting in detectable changes in their expression.
Failures:
• Regulatory activity by post-transcriptional changes
• Several regulators participate in the same regulatory event