
Post on 19-Dec-2015


Page 1: Biological networks Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu

Biological networks

Bing Zhang

Department of Biomedical Informatics

Vanderbilt University

bing.zhang@vanderbilt.edu


BCHM352, Spring 2011 2

Protein-protein interaction (PPI)

Definition
  Physical association of two or more protein molecules

Examples
  Receptor-ligand interactions
  Kinase-substrate interactions
  Transcription factor-co-activator interactions
  Multiprotein complexes, e.g. multimeric enzymes

Figure: RNA polymerase II, 12 subunits (Cramer et al. Science 292:1863, 2001)


Significance of protein interaction

Most proteins mediate their function through interacting with other proteins
  To form molecular machines
  To participate in various regulatory processes

Distortions of protein interactions can cause diseases


Yeast two-hybrid

Method
  Bait strain: a protein of interest, the bait (B), fused to a DNA-binding domain (DBD)
  Prey strains: ORFs fused to a transcriptional activation domain (AD)
  Mate the bait strain to prey strains and plate diploid cells on selective media (e.g. without histidine)
  If bait and prey interact in the diploid cell, they reconstitute a transcription factor, which activates a reporter gene whose expression allows the diploid cell to grow on selective media
  Pick colonies, isolate DNA, and sequence to identify the ORF interacting with the bait

Pros
  High-throughput
  Can detect transient interactions

Cons
  False positives
  Non-physiological (done in the yeast nucleus)
  Can’t detect multiprotein complexes

Uetz P. Curr Opin Chem Biol. 6:57, 2002


Tandem affinity purification

Method
  TAP tag: Protein A, calmodulin-binding domain, TEV protease cleavage site
  The bait protein gene is fused with the DNA sequence encoding the TAP tag
  Tagged bait is expressed in cells and forms native complexes
  Complexes are purified by the TAP method
  Components of each complex are identified through gel separation followed by MS/MS

Pros
  High-throughput
  Physiological setting
  Can detect large stable protein complexes

Cons
  High false-positive rate
  Can’t detect transient interactions
  Can’t detect interactions not present under the given condition
  Tagging may disturb complex formation
  Binary interaction relationships are not clear

Chepelev et al. Biotechnol & Biotechnol 22:1, 2008


Large scale protein interaction identification

Experimental
  Yeast two-hybrid
  Tandem affinity purification

Computational
  Gene fusion
  Ortholog interaction
  Phylogenetic profiling
  Microarray gene co-expression

Valencia et al. Curr. Opin. Struct. Biol, 12:368, 2002


Protein interaction data in the public domain

Database of Interacting Proteins (DIP): http://dip.doe-mbi.ucla.edu/

The Molecular INTeraction database (MINT): http://mint.bio.uniroma2.it/mint/

The Biomolecular Interaction Network Database (BIND): http://www.binddb.org/

The General Repository for Interaction Datasets (BioGRID): http://www.thebiogrid.org/

Human Protein Reference Database (HPRD): http://www.hprd.org

Online Predicted Human Interaction Database (OPHID): http://ophid.utoronto.ca

The Munich Information Center for Protein Sequences (MIPS): http://mips.gsf.de


HPRD



Protein interaction networks

Saccharomyces cerevisiae: Jeong et al. Nature, 411:41, 2001

Drosophila melanogaster: Giot et al. Science, 302:1727, 2003

Caenorhabditis elegans: Li et al. Science, 303:540, 2004

Homo sapiens: Rual et al. Nature, 437:1173, 2005


Gene regulatory networks

Experimental
  Chromatin immunoprecipitation (ChIP)
  ChIP-chip
  ChIP-seq

Computational
  Promoter sequence analysis
  Reverse engineering from microarray gene expression data

Public databases
  Transfac (http://www.gene-regulation.com)
  MSigDB (http://www.broadinstitute.org/gsea/msigdb)
  hPDI (http://bioinfo.wilmer.jhu.edu/PDI/)

Shen-Orr et al. Nat Genet, 31:64, 2002


KEGG metabolic network



Network visualization tools

Cytoscape http://www.cytoscape.org


Gehlenborg et al. Nature Methods, 7:S56, 2010


Graph representation of networks

Graph: a set of objects called nodes (or vertices) connected by links called edges. In mathematics and computer science, the graph is the basic object of study in graph theory.

Figure: RNA polymerase II, with a node and an edge labeled (Cramer et al. Science 292:1863, 2001)
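As a minimal sketch of the graph definition, a graph can be stored as a set of nodes plus a set of edges. The node names below are hypothetical placeholders, not actual subunits from the RNA polymerase II figure.

```python
# A graph G = (V, E): a set of nodes V and a set of edges E.
# Node names are hypothetical placeholders for illustration only.
edges = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"C", "D"}),
}

# Every endpoint of an edge is a node of the graph.
nodes = set().union(*edges)

print(f"{len(nodes)} nodes, {len(edges)} edges")
```

Using `frozenset` for each edge makes the pair unordered, which suits an undirected graph; a directed graph would use ordered tuples instead.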


Undirected graph vs directed graph

Protein interaction network (Krogan et al. Nature 440:637, 2006)
  Nodes: proteins
  Edges: physical interactions
  Undirected

Transcriptional regulatory network (Lee et al. Science 298:799, 2002)
  Nodes: transcription factors and genes
  Edges: transcriptional regulation
  Directed: TF -> target gene (figure labels: Fhl1, RPL2B)

Metabolic network (Ravasz et al. Science 297:1551, 2002)
  Nodes: metabolites
  Edges: enzymes
  Directed: substrate -> product
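The difference between undirected and directed edges can be sketched with an adjacency dictionary. This is an illustration under assumptions: `add_edge` is a hypothetical helper, the protein names are placeholders, and the Fhl1 -> RPL2B edge is assumed from the labels on the slide.

```python
def add_edge(adjacency, source, target, directed):
    """Record an edge in a dict mapping each node to the set of nodes it links to."""
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set())
    if not directed:
        # An undirected edge can be followed in both directions.
        adjacency[target].add(source)


# Undirected: a physical interaction between two proteins (placeholder names).
ppi = {}
add_edge(ppi, "ProteinX", "ProteinY", directed=False)

# Directed: a transcription factor regulating a target gene (assumed edge).
regulatory = {}
add_edge(regulatory, "Fhl1", "RPL2B", directed=True)

print(ppi)
print(regulatory)
```

In the undirected case each node appears in the other's neighbor set; in the directed case only the source lists the target.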


Degree, path, shortest path

Degree: the number of edges adjacent to a node; a simple measure of node centrality. In a directed network, in-degree counts incoming edges and out-degree counts outgoing edges.

Path: a sequence of nodes such that each node has an edge to the next node in the sequence.

Shortest path: a path between two nodes that minimizes the sum of the lengths of its constituent edges (in an unweighted network, the number of edges).

Examples from the figures
  YDL176W: degree 3
  Fhl1: out-degree 4, in-degree 0
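Degree and shortest path can be computed directly from an adjacency dictionary. The sketch below assumes an unweighted directed network, so breadth-first search finds a path with the fewest edges. Only the Fhl1 degrees come from the slide; the target names `g1` to `g5` are hypothetical.

```python
from collections import deque

# Directed toy network: node -> set of nodes it points to.
# Fhl1's degrees match the slide; g1..g5 are hypothetical targets.
successors = {
    "Fhl1": {"g1", "g2", "g3", "g4"},
    "g1": {"g5"},
    "g2": set(),
    "g3": set(),
    "g4": set(),
    "g5": set(),
}


def out_degree(graph, node):
    """Number of edges leaving the node."""
    return len(graph[node])


def in_degree(graph, node):
    """Number of edges arriving at the node."""
    return sum(node in targets for targets in graph.values())


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph[current]:
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


print(out_degree(successors, "Fhl1"), in_degree(successors, "Fhl1"))
print(shortest_path(successors, "Fhl1", "g5"))
```

Because edges are directed, `shortest_path(successors, "g5", "Fhl1")` returns `None`: no path leads back to the transcription factor.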


Obama vs Lady Gaga: who is more influential?

          Twitter followers (in-degree)   Twitter following (out-degree)
Obama     7,035,548                       701,301
Gaga      8,873,525                       144,263
Eminem    3,509,469                       0


Network properties (I): hubs

Random network
  130 nodes, 215 edges
  Homogeneous: most nodes have approximately the same number of links
  The five nodes with the highest number of links (red) reach 27% of the nodes

Scale-free network
  130 nodes, 215 edges
  Heterogeneous: the majority of the nodes have one or two links, but a few nodes (hubs) have a large number of links
  The five nodes with the highest degrees (red) reach 60% of the nodes

Albert et al., Nature, 406:378, 2000
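The "reach" of the highest-degree nodes can be sketched as follows. The function and toy graphs are hypothetical. Counting only non-hub nodes directly linked to a hub, divided by all nodes, is an assumption about the convention behind the 27% and 60% figures.

```python
def fraction_reached_by_hubs(adjacency, k=5):
    """Fraction of all nodes that are direct neighbors of the k highest-degree nodes."""
    hubs = sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)[:k]
    reached = set()
    for hub in hubs:
        reached |= adjacency[hub]
    reached -= set(hubs)  # do not count the hubs themselves
    return len(reached) / len(adjacency)


def star(n_leaves):
    """Heterogeneous toy graph: one hub linked to every other node."""
    adjacency = {"hub": set()}
    for i in range(n_leaves):
        leaf = f"leaf{i}"
        adjacency["hub"].add(leaf)
        adjacency[leaf] = {"hub"}
    return adjacency


def ring(n):
    """Homogeneous toy graph: every node has exactly two links."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


star_reach = fraction_reached_by_hubs(star(6), k=1)
ring_reach = fraction_reached_by_hubs(ring(7), k=1)
print(f"star: {star_reach:.2f}, ring: {ring_reach:.2f}")
```

With seven nodes each, the single best-connected node reaches six of seven nodes in the star but only two of seven in the ring, mirroring the contrast between the heterogeneous and homogeneous networks.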


Scale-free biological networks

Metabolic network, C. elegans: Jeong et al, Nature, 407:651, 2000

Protein interaction network, H. sapiens: Stelzl et al. Cell, 122:957, 2005

Gene co-expression network, S. cerevisiae: Noort et al, EMBO Reports, 5:280, 2004


Network properties (II): small world network

Stanley Milgram’s small world experiment
  Social network
  Average path length between two people

Small world network: a graph in which most nodes can be reached from every other by a small number of steps.

Biological interpretation: Efficiency in transfer of biological information
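The average path length that characterizes a small-world network can be sketched with breadth-first search. This assumes an unweighted, undirected, connected graph with at least two nodes; the function names and toy graphs are hypothetical.

```python
from collections import deque


def distances_from(adjacency, start):
    """Number of steps from start to every reachable node (breadth-first search)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def average_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for start in adjacency:
        for target, steps in distances_from(adjacency, start).items():
            if target != start:
                total += steps
                pairs += 1
    return total / pairs


# A chain of four nodes, and the same chain with one shortcut between its ends.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
shortcut = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}

print(average_path_length(chain))
print(average_path_length(shortcut))
```

Adding a single shortcut edge lowers the average from 10/6 to 8/6 steps, a small-scale view of how a few long-range links keep path lengths short.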

Six degrees of separation (figure labels: Omaha, Wichita, Boston)

Instructions to participants: "If you do not know the target person on a personal basis, do not try to contact him directly. Instead, mail this folder to a personal acquaintance who is more likely than you to know the target person."


Network properties (III): motifs

Network motifs: Patterns that occur in the real network significantly more often than in randomized networks.

Three-node patterns (Milo et al., Science, 298:824, 2002)
  Feed-forward loop
  Feedback loop
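Counting the two three-node patterns in a directed network can be sketched as below. This covers only the counting step: deciding whether a pattern is a motif also requires comparing the count with randomized networks, which is not shown. The function and node names are hypothetical.

```python
from itertools import permutations


def count_three_node_patterns(edges):
    """Return (feed-forward loops, feedback loops) in a directed edge list."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    feed_forward = 0
    feedback = 0
    for a, b, c in permutations(nodes, 3):
        # Feed-forward loop: a regulates b, b regulates c, and a also regulates c.
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            feed_forward += 1
        # Feedback loop: a -> b -> c -> a.
        if (a, b) in edge_set and (b, c) in edge_set and (c, a) in edge_set:
            feedback += 1
    # Each cycle is seen once per rotation (a,b,c), (b,c,a), (c,a,b).
    return feed_forward, feedback // 3


toy_edges = [
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),  # one feed-forward loop
    ("P", "Q"), ("Q", "R"), ("R", "P"),  # one feedback loop
]
print(count_three_node_patterns(toy_edges))
```

Checking every ordered triple is cubic in the number of nodes, which is fine for a teaching example but too slow for genome-scale networks.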


Network properties (IV): modularity

A module is a group of physically or functionally linked molecules (nodes) that work together to achieve a relatively distinct function. Modularity refers to the organization of a network into such modules.

Examples
  Transcriptional module: a set of co-regulated genes sharing a common function
  Protein complex: an assembly of proteins that builds up some cellular machinery; it commonly spans a dense sub-network of proteins in a protein interaction network
  Signaling pathway: a chain of interacting proteins propagating a signal in the cell

Figures: protein interaction modules (Palla et al, Nature, 435:841, 2005); gene co-expression modules (Shi et al, BMC Syst Biol, 4:74, 2010)


Network distance vs functional similarity

Proteins that lie closer to one another in a protein interaction network are more likely to have similar functions and to be involved in similar biological processes.

Sharan et al. Mol Syst Biol, 3:88, 2007



Network-based disease gene prioritization

Kohler et al. Am J Hum Genet. 82:949, 2008


For a specific disease, candidate genes can be ranked based on their proximity to known disease genes.
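Ranking candidates by proximity can be sketched with a multi-source breadth-first search. Shortest-path distance to the nearest known disease gene is used here as a simple stand-in for proximity; it is not the scoring used in the cited paper. The function and all gene names are hypothetical.

```python
from collections import deque


def rank_candidates(adjacency, known_disease_genes, candidates):
    """Order candidates by distance to the nearest known disease gene, closest first."""
    # Multi-source breadth-first search: all known genes start at distance 0.
    distance = {gene: 0 for gene in known_disease_genes}
    queue = deque(known_disease_genes)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    # Unreachable candidates go last; ties are broken alphabetically.
    return sorted(candidates, key=lambda g: (distance.get(g, float("inf")), g))


# Hypothetical interaction network: D1 and D2 are known disease genes.
network = {
    "D1": {"A"},
    "A": {"D1", "B"},
    "B": {"A", "C", "D2"},
    "C": {"B"},
    "D2": {"B"},
    "E": set(),
}

print(rank_candidates(network, ["D1", "D2"], ["C", "A", "E", "B"]))
```

Candidates A and B each sit one step from a known disease gene, C sits two steps away, and E is disconnected, so the ranking is A, B, C, E.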


Summary

Biological networks
  Protein-protein interaction network
  Gene regulatory network
  Metabolic network

Graph representation of networks
  Graph, node, edge, undirected graph, directed graph, degree, path, shortest path

Network properties
  Hubs and scale-free degree distribution
  Small-world
  Motifs
  Modularity

Network-based applications
  Disease gene prioritization