Probabilistic Ensembles for Improved Inference in Protein-Structure Determination

Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences; Dept. of Biostatistics and Medical Informatics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011




Protein Structure Determination


Proteins are essential to most cellular functions, including structural support, catalysis/enzymatic activity, and cell signaling.

Protein structures determine function

X-ray crystallography is the main technique for determining structures.

Task Overview

Given: a protein sequence and an electron-density map (EDM) of the protein

Do: automatically produce a protein structure that contains all atoms and is physically feasible

Example sequence: SAVRVGLAIM...

Challenges & Related Work

(Figure: a resolution scale from 1 Å to 4 Å, placing our method, ACMI, alongside ARP/wARP and TEXTAL & RESOLVE.)

Resolution is a property of the protein.

Higher resolution means better quality.

Outline

- Protein Structures
- Prior Work on ACMI
- Probabilistic Ensembles in ACMI (PEA)
- Experiments and Results


Our Technique: ACMI

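Phase 1: Perform Local Match, producing the prior probability of each amino acid's (AA's) location
Phase 2: Apply Global Constraints, producing the posterior probability of each AA's location
Phase 3: Sample Structure, producing all-atom protein structures (samples 1…M)

(Figure: a chain of backbone positions b_{k-1}, b_k, b_{k+1}, where b_k is the location of the k-th amino acid.)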

Results [DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]




Phase 2 – Probabilistic Model


ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)

(Figure: an MRF with one node per amino acid, e.g., ALA1, GLY2, LYS3, LEU4, SER5.)
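For reference, a pairwise MRF over the amino-acid locations b_1, …, b_N factorizes in the standard textbook form below, where E is the set of MRF edges and Z is a normalizing constant. The node potentials ψ_k and pairwise potentials ψ_{ij} are generic placeholders here; these slides do not specify exactly how ACMI builds them (for example, how Phase 1 match scores or the global constraints enter).

```latex
P(b_1, \dots, b_N) = \frac{1}{Z} \prod_{k=1}^{N} \psi_k(b_k) \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j)
```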

Probabilistic Model

Number of nodes: ~1,000
Number of edges: ~1,000,000

Approximate Inference

The best structure is intractable to calculate; i.e., we cannot infer the underlying structure analytically.

Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution: a local, message-passing scheme that distributes evidence between nodes.

Loopy Belief Propagation

(Figure: neighboring nodes LYS31 and LEU32 with beliefs p_LYS31 and p_LEU32; LYS31 sends the message m_{LYS31→LEU32} to LEU32.)
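The message in the figure can be written with the standard sum-product update, shown here with i = LYS31 and j = LEU32. This is a hedged sketch: ψ_i and ψ_{ij} are the node and pairwise potentials of a generic pairwise MRF, N(i) is the set of i's neighbors, and the sum assumes each node ranges over a finite set of candidate locations. On a graph with loops these updates are iterated, and the resulting beliefs p_i are only approximate marginals.

```latex
\begin{aligned}
m_{i \to j}(b_j) &= \sum_{b_i} \psi_i(b_i)\, \psi_{ij}(b_i, b_j) \prod_{u \in \mathcal{N}(i) \setminus \{j\}} m_{u \to i}(b_i), \\
p_i(b_i) &\propto \psi_i(b_i) \prod_{u \in \mathcal{N}(i)} m_{u \to i}(b_i).
\end{aligned}
```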

Loopy Belief Propagation

(Figure: LEU32 sends the message m_{LEU32→LYS31} back to LYS31; beliefs p_LYS31 and p_LEU32.)
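The message passing in these two figures can be sketched on a tiny discrete MRF. This is an illustrative toy, not ACMI's implementation: the function name `loopy_bp`, the synchronous ("flooding") update schedule, and the 3-node cycle in the usage example are all assumptions chosen to show how messages go in both directions along each edge and how beliefs combine evidence from neighbors on a graph with a loop.

```python
def loopy_bp(unary, pairwise, edges, n_iters=50):
    """Synchronous sum-product loopy BP on a discrete pairwise MRF (toy sketch).

    unary[i]          -- nonnegative potentials psi_i(x) for each state x of node i
    pairwise[(i, j)]  -- 2D list psi_ij[x_i][x_j] for each undirected edge in `edges`
    Returns a dict of normalized beliefs p_i(x) for every node.
    """
    nbrs = {i: [] for i in unary}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, xi, xj):
        # Look up the pairwise potential regardless of which direction it was stored in.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # One message per direction, e.g. m_{LYS31->LEU32} and m_{LEU32->LYS31}; start uniform.
    msgs = {(i, j): [1.0] * len(unary[j]) for i in unary for j in nbrs[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    # Product of messages into i from every neighbor except j.
                    incoming = 1.0
                    for u in nbrs[i]:
                        if u != j:
                            incoming *= msgs[(u, i)][xi]
                    total += unary[i][xi] * psi(i, j, xi, xj) * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]  # normalizing does not change beliefs
        msgs = new  # all messages updated from the previous iteration's values

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for u in nbrs[i]:
            b = [bx * m for bx, m in zip(b, msgs[(u, i)])]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


if __name__ == "__main__":
    unary = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}  # evidence only on node 0
    agree = [[2.0, 1.0], [1.0, 2.0]]                        # neighbors prefer equal states
    edges = [(0, 1), (1, 2), (2, 0)]                        # a single loop
    beliefs = loopy_bp(unary, {e: agree for e in edges}, edges)
    for node, b in beliefs.items():
        print(node, [round(v, 3) for v in b])
```

Node 0's evidence propagates around the loop, so nodes 1 and 2 end up leaning toward the same state even though their own potentials are uniform.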

Shortcomings of Phase 2

Inference is very difficult:
- ~1,000,000 possible outputs for one amino acid
- ~250–1,250 amino acids in one protein
- Evidence is noisy
- O(N^2) constraints

Solutions are approximate, so there is room for improvement.
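A rough count shows why exact inference is out of reach. Taking the slide's figures as round numbers (about 10^6 possible outputs per amino acid, N amino acids, pairwise constraints between residues), the illustrative arithmetic at the low end of the range is:

```latex
(10^{6})^{N} = 10^{6N} \;\xrightarrow{\;N = 250\;}\; 10^{1500} \text{ joint placements},
\qquad
\binom{N}{2} = \frac{N(N-1)}{2} = O(N^{2}) \;\xrightarrow{\;N = 250\;}\; 31{,}125 \text{ residue pairs}.
```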


Ensemble Methods

Ensembles: the use of multiple models to improve predictive performance

Ensembles tend to outperform the best single model [Dietterich ’00], e.g., the Netflix Prize.

Phase 2: Standard ACMI

(Figure: a single inference protocol runs on the MRF and produces one distribution P(b_k) for each amino acid.)

Phase 2: Ensemble ACMI

(Figure: C protocols, Protocol 1 through Protocol C, each run inference on the same MRF, producing P_1(b_k), P_2(b_k), …, P_C(b_k).)

Probabilistic Ensembles in ACMI (PEA)

New ensemble framework (PEA):
- Run inference multiple times, under different conditions
- Output: multiple, diverse estimates of each amino acid's location

Phase 2 now has several probability distributions for each amino acid, so what?
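The slides do not say which "different conditions" PEA's components run under. Purely as an illustration, the sketch below assumes each component runs sequential loopy BP with a differently shuffled message-update order on the same MRF; the names `sequential_bp` and `pea_phase2` and the shuffling scheme are hypothetical. Because loopy BP on a graph with cycles is order-sensitive, especially under a fixed budget of sweeps, the components can return different marginals, giving C estimates P_1(b_k), …, P_C(b_k) per amino acid. The default of 4 components mirrors the experimental setup described later.

```python
import random


def sequential_bp(unary, pairwise, edges, order, n_sweeps=3):
    """Sum-product loopy BP that updates directed messages one at a time in `order`."""
    nbrs = {i: [] for i in unary}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, xi, xj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    msgs = {(i, j): [1.0] * len(unary[j]) for (i, j) in order}
    for _ in range(n_sweeps):
        for (i, j) in order:
            # In-place update: messages later in this sweep already see the new value.
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    incoming = 1.0
                    for u in nbrs[i]:
                        if u != j:
                            incoming *= msgs[(u, i)][xi]
                    total += unary[i][xi] * psi(i, j, xi, xj) * incoming
                out.append(total)
            z = sum(out)
            msgs[(i, j)] = [v / z for v in out]

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for u in nbrs[i]:
            b = [bx * m for bx, m in zip(b, msgs[(u, i)])]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


def pea_phase2(unary, pairwise, edges, n_components=4, seed=0):
    """Hypothetical PEA Phase 2: one inference run per component, each with its own order."""
    rng = random.Random(seed)
    directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    components = []
    for _ in range(n_components):
        order = directed[:]
        rng.shuffle(order)
        components.append(sequential_bp(unary, pairwise, edges, order))
    return components  # components[c][k] plays the role of P_c(b_k)
```

Any other source of diversity (different initializations, damping, or schedules) would slot into `pea_phase2` the same way; the key point is that Phase 2 now returns a list of marginals instead of one.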


Backbone Step (Prior work)

Place next backbone atom

(1) Sample candidate positions b'_k from the empirical Cα–Cα–Cα pseudoangle distribution

(Figure: previously placed atoms b_{k-2} and b_{k-1}, with several candidate positions b'_k whose weights are not yet known.)

Backbone Step (Prior work)

Place next backbone atom

(2) Weight each sample by its Phase 2 computed marginal

(Figure: candidates b'_k placed after b_{k-2} and b_{k-1}, with weights 0.25, 0.20, 0.15, ….)

Backbone Step (Prior work)

Place next backbone atom

(3) Select b_k with probability proportional to sample weight

(Figure: candidates b'_k placed after b_{k-2} and b_{k-1}, with weights 0.25, 0.20, 0.15, ….)
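Steps (1) to (3) can be sketched as a single function. This is a toy version under stated assumptions: it works in 2D rather than 3D, replaces ACMI's empirical Cα–Cα–Cα pseudoangle distribution with a made-up Gaussian (mean 105°, sd 15°), and takes the Phase 2 marginal as a caller-supplied function. The names `backbone_step` and `toy_marginal` and all distribution parameters are hypothetical; only the 3.8 Å spacing between consecutive Cα atoms is a standard value.

```python
import math
import random

CA_CA_DIST = 3.8  # typical distance (angstroms) between consecutive C-alpha atoms


def backbone_step(b_km2, b_km1, marginal, n_samples=50, rng=None):
    """Toy 2D sketch of placing b_k given b_{k-2} and b_{k-1}."""
    if rng is None:
        rng = random.Random()
    # Unit vector pointing from b_{k-1} back toward b_{k-2}.
    ux, uy = b_km2[0] - b_km1[0], b_km2[1] - b_km1[1]
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm

    # (1) Sample candidates b'_k. The pseudoangle b_{k-2}-b_{k-1}-b'_k is drawn from
    #     a hypothetical Gaussian, turning either way, at distance CA_CA_DIST.
    candidates = []
    for _ in range(n_samples):
        angle = math.radians(rng.gauss(105.0, 15.0)) * rng.choice((-1.0, 1.0))
        vx = ux * math.cos(angle) - uy * math.sin(angle)
        vy = ux * math.sin(angle) + uy * math.cos(angle)
        candidates.append((b_km1[0] + CA_CA_DIST * vx, b_km1[1] + CA_CA_DIST * vy))

    # (2) Weight each candidate by its Phase 2 marginal.
    weights = [marginal(c) for c in candidates]

    # (3) Select b_k with probability proportional to its weight.
    return rng.choices(candidates, weights=weights, k=1)[0]


def toy_marginal(p):
    # Hypothetical stand-in for a Phase 2 marginal: a smooth bump around (5.0, 3.0).
    return math.exp(-((p[0] - 5.0) ** 2 + (p[1] - 3.0) ** 2))


if __name__ == "__main__":
    b_k = backbone_step((0.0, 0.0), (3.8, 0.0), toy_marginal, rng=random.Random(1))
    print(b_k)
```

Repeating this step residue by residue, and repeating the whole walk M times, is what yields the M sampled structures of Phase 3.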

Backbone Step for PEA

(Figure: for a candidate b'_k placed after b_{k-2} and b_{k-1}, the ensemble components give estimates P_1(b'_k), P_2(b'_k), …, P_C(b'_k), e.g., 0.23, 0.15, and 0.04; an aggregator combines them into a single weight w(b'_k).)

Backbone Step for PEA: Average

(Figure: the AVG aggregator combines the component estimates 0.23, 0.15, and 0.04 for candidate b'_k into their mean, 0.14.)

Backbone Step for PEA: Maximum

(Figure: the MAX aggregator combines the component estimates 0.23, 0.15, and 0.04 for candidate b'_k into their maximum, 0.23.)

Backbone Step for PEA: Sample

(Figure: the SAMP aggregator uses one sampled component's estimate for candidate b'_k, here 0.15, out of 0.23, 0.15, and 0.04.)
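The three aggregators can be written as small functions over the C component estimates for one candidate b'_k. The function names are hypothetical, and SAMP is assumed to pick one component uniformly at random; that matches the example (it returns one of the component values) but the slides do not say whether the component is drawn per candidate or once per step.

```python
import random
from statistics import mean


def agg_avg(estimates):
    """AVG: mean of the component estimates P_1(b'_k), ..., P_C(b'_k)."""
    return mean(estimates)


def agg_max(estimates):
    """MAX: the most confident component's estimate."""
    return max(estimates)


def agg_samp(estimates, rng=random):
    """SAMP: the estimate of one component chosen uniformly at random (assumed)."""
    return rng.choice(estimates)


if __name__ == "__main__":
    ests = [0.23, 0.15, 0.04]  # component estimates for one candidate, from the slides
    print(round(agg_avg(ests), 2))  # 0.14
    print(agg_max(ests))            # 0.23
    print(agg_samp(ests) in ests)   # True
```

Whichever aggregator is used, its output w(b'_k) simply replaces the single Phase 2 marginal as the candidate's weight in step (2) of the backbone step.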

Review: Previous work on ACMI

(Figure: in Phase 2, a single protocol produces P(b_k); in Phase 3, candidates placed after b_{k-2} and b_{k-1} are weighted 0.25, 0.20, and 0.15.)

Review: PEA

(Figure: in Phase 2, several protocols each produce a distribution; in Phase 3, an aggregator (AGG) combines them into candidate weights 0.14, 0.26, and 0.05 for positions placed after b_{k-2} and b_{k-1}.)


Experimental Methodology

PEA (Probabilistic Ensembles in ACMI):
- 4 ensemble components
- Aggregators: AVG, MAX, SAMP

ACMI:
- ORIG: standard ACMI (prior work)
- EXT: run inference 4 times as long
- BEST: test the best of the 4 PEA components

Phase 2 Results

*p-value < 0.01

Protein Structure Results

*p-value < 0.05

(Figure panels: Correctness; Completeness.)

Protein Structure Results

Impact of Ensemble Size

Conclusions

ACMI is the state-of-the-art method for determining protein structures in poor-resolution images

Probabilistic Ensembles in ACMI (PEA) improves approximate inference and produces better protein structures.

Future work:
- A general solution for inference
- Larger ensemble sizes

Acknowledgements

Phillips Laboratory at UW-Madison
UW Center for Eukaryotic Structural Genomics (CESG)
NLM R01-LM008796
NLM Training Grant T15-LM007359
NIH Protein Structure Initiative Grant GM074901

Thank you!