Introduction to Bioinformatics

Biological Networks

Department of Computing

Imperial College London

March 18, 2010

Lecture hour 18

Nataša Pržulj


Types of Biological Network Comparisons:


1. Network Alignment


• Finding structural similarities between two networks


Recall subgraph isomorphism (NP-complete):

• An isomorphism is a bijection between nodes of two networks G and H that preserves edge adjacency

• Exact comparisons are inappropriate in biology (biological variation)

• Network alignment

– The more general problem of finding the best way to “fit” G into H even if G does not exist as an exact subgraph of H

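[Figure: example isomorphism between network G (nodes A, B, C, D) and network H (nodes 1, 2, 3, 4), with the node mapping A→1, B→2, C→4, D→3]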
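The definition of an isomorphism above can be checked mechanically. The following is a minimal Python sketch, not taken from the lecture: the edge lists are an assumption (the slide figure's edges are not recoverable from the transcript, so both networks are taken to be 4-cycles), and the function names are hypothetical. Only the node mapping A→1, B→2, C→4, D→3 comes from the slide.

```python
from itertools import permutations


def is_isomorphism(edges_g, edges_h, mapping):
    """Return True if `mapping` is injective and preserves edge adjacency."""
    g = {frozenset(e) for e in edges_g}
    h = {frozenset(e) for e in edges_h}
    # A bijection must send distinct nodes to distinct nodes.
    if len(set(mapping.values())) != len(mapping):
        return False
    # Every edge of G must land on an edge of H, and every edge of H must be hit.
    return {frozenset(mapping[u] for u in e) for e in g} == h


def all_isomorphisms(nodes_g, nodes_h, edges_g, edges_h):
    """Brute force over all n! bijections; only feasible for tiny networks."""
    found = []
    for image in permutations(nodes_h):
        mapping = dict(zip(nodes_g, image))
        if is_isomorphism(edges_g, edges_h, mapping):
            found.append(mapping)
    return found


# ASSUMED edges: both networks are 4-cycles (not stated in the transcript).
edges_g = [("A", "B"), ("B", "D"), ("D", "C"), ("C", "A")]
edges_h = [(1, 2), (2, 3), (3, 4), (4, 1)]

slide_mapping = {"A": 1, "B": 2, "C": 4, "D": 3}   # mapping shown on the slide
naive_mapping = {"A": 1, "B": 2, "C": 3, "D": 4}   # breaks edge B-D

print(is_isomorphism(edges_g, edges_h, slide_mapping))
print(is_isomorphism(edges_g, edges_h, naive_mapping))
```

The brute-force search tries every bijection, which grows factorially with the number of nodes; this is one way to see why exact matching does not scale and heuristics are used instead.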


• Methods vary in these aspects:

A. Global vs. local

B. Pairwise vs. multiple

C. Functional vs. topological information


A. Local alignment:

• Mappings are chosen independently for each region of similarity

• Can be ambiguous, with one node having pairings in different local alignments

• Example algorithms: PathBLAST, NetworkBLAST, MaWISh, Graemlin


A. Global alignment:

• Provides a unique alignment from every node in the smaller network to exactly one node in the larger network

• May lead to suboptimal matchings in some local regions

• Example algorithms: GRAAL, IsoRank, IsoRankN, Extended Graemlin


B. Pairwise alignment:

• Two networks are aligned

• Example algorithms: GRAAL, PathBLAST, MaWISh, IsoRank

Multiple alignment:

• More than two networks are aligned

• Example algorithms: Graemlin, Extended PathBLAST, Extended IsoRank


C. Functional information:

• Information external to network topology, e.g., protein sequence, is used to define “similarity” between nodes

• Careful: this mixes different biological data types, which might agree or contradict each other

• Example algorithms: all except for GRAAL; some (e.g., IsoRank) can exclude sequence, but then perform poorly

Topological information:

• Only network topology is used to define node “similarity”

• Good, since it answers how much and what type of biological information can be extracted from topology only


• Since the problem is computationally hard in general (it generalizes subgraph isomorphism), heuristic approaches are devised.

• Key algorithmic components of network alignment algorithms:

1. Scoring function, which measures:

– the similarity of each subnetwork to a predefined structure of interest

– the level of conservation of this structure across the subnetworks being scored

– the alignment quality

2. Search procedure: methods for searching for conserved subnetworks of interest


1. Measuring the alignment quality

1) Edge correctness (EC)

• Percentage of edges in G that are aligned to edges in H

[Figure: example alignment of networks G and H]

EC = 1/2 = 50%
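Edge correctness as defined above can be computed directly from an alignment. The sketch below is illustrative only: the two small networks and the alignment are assumptions chosen so that one of the two edges of G is conserved, reproducing a value of 50%; they are not the networks from the slide figure, and the function name is hypothetical.

```python
def edge_correctness(edges_g, edges_h, alignment):
    """Fraction of edges in G whose aligned endpoints are adjacent in H."""
    g = {frozenset(e) for e in edges_g}
    h = {frozenset(e) for e in edges_h}
    conserved = sum(
        1 for e in g if frozenset(alignment[u] for u in e) in h
    )
    return conserved / len(g)


# ASSUMED toy data (not from the slides).
edges_g = [("a", "b"), ("b", "c")]
edges_h = [(1, 2), (2, 3), (3, 4)]
alignment = {"a": 1, "b": 2, "c": 4}   # a-b maps to edge 1-2; b-c maps to non-edge 2-4

ec = edge_correctness(edges_g, edges_h, alignment)
print(f"EC = {ec:.0%}")
```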


3) Can the alignment be attributed to chance?

• Compare it with a random alignment of the two networks

• Compare it with the amount of alignment found between model networks (random graphs) of the size of the data

4) Biological quality of the alignment:

• Do the aligned protein pairs have the same biological function?

• Does the alignment identify evolutionarily conserved functional modules?

• How much of the network alignment is supported by sequence alignment? Note: We should not expect networks and sequences to give identical results!
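For point 3, the comparison with random alignments can be sketched as an empirical p-value: the fraction of random injective node mappings whose edge correctness is at least as high as the observed one. This is a minimal sketch under assumptions, not the lecture's procedure; the function names, the toy networks, the number of trials, and the add-one correction are all choices made for illustration.

```python
import random


def edge_correctness(edges_g, edges_h, alignment):
    """Fraction of edges in G whose aligned endpoints are adjacent in H."""
    g = {frozenset(e) for e in edges_g}
    h = {frozenset(e) for e in edges_h}
    return sum(1 for e in g if frozenset(alignment[u] for u in e) in h) / len(g)


def random_alignment_pvalue(nodes_g, nodes_h, edges_g, edges_h,
                            observed_ec, trials=1000, seed=0):
    """Estimate how often a random injective mapping does at least as well."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw distinct target nodes in H, one per node of G.
        targets = rng.sample(nodes_h, len(nodes_g))
        alignment = dict(zip(nodes_g, targets))
        if edge_correctness(edges_g, edges_h, alignment) >= observed_ec:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# ASSUMED toy data (not from the slides).
nodes_g, edges_g = ["a", "b", "c"], [("a", "b"), ("b", "c")]
nodes_h, edges_h = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]

p = random_alignment_pvalue(nodes_g, nodes_h, edges_g, edges_h, observed_ec=1.0)
print(f"empirical p-value for EC = 100%: {p:.3f}")
```

The second comparison on the slide, against alignments of model networks, would replace the random mappings with alignments computed on random graphs of the same size as the data.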


2. The search procedure (commonly used):

• How is “similarity” between nodes defined?

– Using information external to network topology, e.g., the sequence alignment score for protein pairs

– Using only network topology, e.g., node degree, common neighbourhood, graphlet degree vectors (GDVs); e.g., GRAAL (GRAph ALigner) uses GDVs

• Use the most “similar” nodes across the two networks as “anchors” or “seed nodes” in the two networks

• “Extend around” the seed nodes: the “seed and extend” method, a greedy approach
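The seed-and-extend idea above can be sketched in a few lines. This is a deliberately crude illustration, not GRAAL or any published algorithm: node similarity is assumed to be closeness of node degree (GRAAL uses graphlet degree vectors instead), the function names are hypothetical, and the two toy networks are made up.

```python
def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def seed_and_extend(edges_g, edges_h):
    """Greedy alignment: pick the most similar pair, then grow around it."""
    adj_g, adj_h = adjacency(edges_g), adjacency(edges_h)

    def similarity(u, v):
        # ASSUMPTION: purely topological score, closer degrees score higher.
        return -abs(len(adj_g[u]) - len(adj_h[v]))

    # Seed: the most similar pair across the two networks (first one on ties).
    seed = max(((u, v) for u in sorted(adj_g) for v in sorted(adj_h)),
               key=lambda pair: similarity(*pair))
    alignment = {seed[0]: seed[1]}
    used = {seed[1]}

    while True:
        # Extend: unaligned neighbours of an aligned pair are the only candidates.
        candidates = [
            (u2, v2)
            for u, v in alignment.items()
            for u2 in sorted(adj_g[u]) if u2 not in alignment
            for v2 in sorted(adj_h[v]) if v2 not in used
        ]
        if not candidates:
            break
        u2, v2 = max(candidates, key=lambda pair: similarity(*pair))
        alignment[u2] = v2
        used.add(v2)
    return alignment


# ASSUMED toy data: a 3-node path aligned into a 4-node path.
edges_g = [("a", "b"), ("b", "c")]
edges_h = [(1, 2), (2, 3), (3, 4)]
result = seed_and_extend(edges_g, edges_h)
print(result)
```

Because each step commits to the locally best pair and never backtracks, the result depends on the seed and on tie-breaking, which is the usual trade-off of a greedy search.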


In a network alignment, interactions are either:

• matches (conserved interactions), which contribute to EC

• mismatches (gaps):

1. Insertions

2. Deletions

[Figure: networks G1 and G2]

If G2 evolved from G1:

Gaps: B’1C’ vs BC (insertion of node 1)

C’2D’ vs CD (insertion of node 2)

If G1 evolved from G2:

Gaps: B’1C’ vs BC (deletion of node 1)

C’2D’ vs CD (deletion of node 2)
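The match and gap cases above can be told apart programmatically. In this minimal sketch, an edge of G1 is a match if its aligned endpoints are adjacent in G2, and a gap if they are instead joined through one intermediate node of G2. The edge lists are an assumption reconstructed from the gap notation on the slide (B’1C’ and C’2D’), and the function name and labels are hypothetical.

```python
def classify_interactions(edges_g1, edges_g2, alignment):
    """Label each G1 edge as a match, a gap (with bridging nodes), or a mismatch."""
    adj2 = {}
    for u, v in edges_g2:
        adj2.setdefault(u, set()).add(v)
        adj2.setdefault(v, set()).add(u)

    labels = {}
    for u, v in edges_g1:
        a, b = alignment[u], alignment[v]
        if b in adj2.get(a, set()):
            labels[(u, v)] = ("match", [])
            continue
        # A gap: the aligned endpoints share an intermediate neighbour in G2.
        bridges = sorted(adj2.get(a, set()) & adj2.get(b, set()))
        labels[(u, v)] = ("gap", bridges) if bridges else ("mismatch", [])
    return labels


# ASSUMED edges, inferred from the slide's notation.
edges_g1 = [("B", "C"), ("C", "D")]
edges_g2 = [("B'", "1"), ("1", "C'"), ("C'", "2"), ("2", "D'")]
alignment = {"B": "B'", "C": "C'", "D": "D'"}

labels = classify_interactions(edges_g1, edges_g2, alignment)
for edge, (kind, via) in labels.items():
    print(edge, kind, via)
```

Whether a gap is read as an insertion or a deletion depends on the assumed direction of evolution, as on the slide; the code only detects that an intermediate node exists.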


One heuristic approach:

• A merged representation of the two networks being compared is created, called a “network alignment graph”, in which:

– Nodes represent sets of molecules, one from each network

– Edges represent conserved molecular interactions across different networks

– The alignment is simple when there exists a 1-to-1 correspondence between molecules across the two networks, but in general there may be a complex many-to-many correspondence

• Then a greedy algorithm is applied to identify conserved subnetworks embedded in the “network alignment graph”

Sharan and Ideker (2006) Nature Biotechnology 24(4): 427-433
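The construction of a network alignment graph can be sketched as follows. This is a simplified illustration rather than the method of Sharan and Ideker: it assumes the correspondences between molecules (e.g., from sequence similarity) are given as a list of pairs, and it only adds an edge when the interaction is directly present in both networks. The function name and toy data are hypothetical.

```python
from itertools import combinations


def alignment_graph(edges_1, edges_2, correspondences):
    """Nodes are (molecule in network 1, molecule in network 2) pairs;
    an edge joins two pairs when both networks contain the interaction."""
    e1 = {frozenset(e) for e in edges_1}
    e2 = {frozenset(e) for e in edges_2}
    nodes = sorted(set(correspondences))
    edges = set()
    for (a1, a2), (b1, b2) in combinations(nodes, 2):
        if frozenset((a1, b1)) in e1 and frozenset((a2, b2)) in e2:
            edges.add(((a1, a2), (b1, b2)))
    return nodes, edges


# ASSUMED toy data; "r" corresponds to both "z" and "w" (many-to-many).
edges_1 = [("p", "q"), ("q", "r")]
edges_2 = [("x", "y"), ("y", "z"), ("y", "w")]
correspondences = [("p", "x"), ("q", "y"), ("r", "z"), ("r", "w")]

nodes, edges = alignment_graph(edges_1, edges_2, correspondences)
print(len(nodes), "nodes,", len(edges), "conserved interactions")
```

A conserved subnetwork then shows up as a connected or dense region of this merged graph, which is what the greedy search looks for.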

[Figure: “network alignment graph”, from Sharan and Ideker (2006) Nature Biotechnology 24(4): 427-433]

“Network alignment graph”:

• Facilitates the search for conserved network regions

• E.g.,

– conserved dense clusters of interactions may indicate protein complexes,

– conserved linear paths may correspond to signalling pathways

• Finding conserved pathways was done by finding “high-scoring” paths in the alignment graph (Kelley et al., PNAS, 2003): PathBLAST

– Identified five regions conserved across PPI networks of yeast S. cerevisiae and Helicobacter pylori

– Later extended to detect conserved protein clusters rather than paths

Introduction to Bioinformatics

Biological Networks

Department of Computing

Imperial College London

March 18, 2010

Lecture hour 19

Review of the Course Material