Probabilistic Ensembles for Improved Inference in Protein-Structure Determination
Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences; Dept. of Biostatistics and Medical Informatics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011
Protein Structure Determination
Proteins are essential to most cellular functions:
- Structural support
- Catalysis/enzymatic activity
- Cell signaling
Protein structures determine function
X-ray crystallography is the main technique for determining structures
Task Overview
Given: a protein sequence (e.g., SAVRVGLAIM...) and an electron-density map (EDM) of the protein
Do: automatically produce a protein structure that
- contains all atoms
- is physically feasible
Challenges & Related Work
[Figure: resolution scale from 1 Å to 4 Å, with related methods ARP/wARP, TEXTAL & RESOLVE, and our method, ACMI, placed along it]
Resolution is a property of the protein
Higher resolution: better quality
Outline
- Protein Structures
- Prior Work on ACMI
- Probabilistic Ensembles in ACMI (PEA)
- Experiments and Results
Our Technique: ACMI
Phase 1: Perform Local Match (output: prior probability of each AA's location)
Phase 2: Apply Global Constraints (output: posterior probability of each AA's location)
Phase 3: Sample Structure (output: all-atom protein structures 1…M)
[Figure: consecutive backbone positions b_{k-1}, b_k, b_{k+1}]
Results [DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]
Phase 2 – Probabilistic Model
ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)
[Figure: MRF with one node per amino acid, e.g., ALA1, GLY2, LYS3, LEU4, SER5]
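For reference, a pairwise MRF scores a whole trace as a product of per-node and per-edge terms. The sketch below is the standard pairwise factorization, not a formula taken from the slides; the symbols φ_i (a per-amino-acid potential, which in ACMI would reflect the Phase 1 local match) and ψ_ij (a pairwise constraint potential) are notation introduced here.

```latex
P(b_1, \dots, b_N \mid \text{EDM}) \;\propto\; \prod_{i=1}^{N} \phi_i(b_i) \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j)
```

Here b_i is the location of amino acid i, N is the number of amino acids, and E is the set of edges, i.e., pairs of amino acids that share a constraint.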
Probabilistic Model
- Nodes: ~1,000
- Edges: ~1,000,000
Approximate Inference
The best structure is intractable to calculate, i.e., we cannot infer the underlying structure analytically
Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution:
- Local, message-passing scheme
- Distributes evidence between nodes
Loopy Belief Propagation
[Figure: node LYS31 sends message m_{LYS31→LEU32} to node LEU32; each node holds a belief, p_{LYS31} and p_{LEU32}]
Loopy Belief Propagation
[Figure: node LEU32 sends message m_{LEU32→LYS31} back to node LYS31, updating beliefs p_{LYS31} and p_{LEU32}]
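The message passing on the two slides above can be sketched as a minimal sum-product loopy BP. It uses the standard update m_{i→j}(x_j) ∝ Σ_{x_i} φ_i(x_i) ψ_ij(x_i, x_j) Π_{k ∈ N(i), k ≠ j} m_{k→i}(x_i) and the belief p_i(x_i) ∝ φ_i(x_i) Π_k m_{k→i}(x_i). This is a hypothetical toy (three nodes in a loop, two states each, made-up potentials), not ACMI's implementation, whose nodes have on the order of a million candidate locations.

```python
# Minimal sum-product loopy belief propagation on a toy pairwise MRF.
# Hypothetical example: 3 nodes in a loop, 2 discrete states each.

def loopy_bp(unary, pairwise, edges, n_iters=50):
    """unary[i][x]: potential of node i in state x (e.g., a Phase 1 prior).
    pairwise[(i, j)][xi][xj]: potential on edge (i, j).
    Returns approximate marginals (beliefs) for every node."""
    neighbors = {v: [] for v in unary}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, xi, xj):
        # Look up the edge potential regardless of the key's orientation.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # msgs[(i, j)][xj]: message from node i to node j, initialized uniform.
    msgs = {(i, j): [1.0] * len(unary[j]) for i in unary for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for i, j in msgs:
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    term = unary[i][xi] * psi(i, j, xi, xj)
                    # Include evidence from every neighbor of i except j.
                    for k in neighbors[i]:
                        if k != j:
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = sum(out)
            new_msgs[(i, j)] = [v / z for v in out]
        msgs = new_msgs  # synchronous update: all messages use last round's values

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbors[i]:
            b = [bv * mv for bv, mv in zip(b, msgs[(k, i)])]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


# Toy usage: node A has a strong prior for state 0; every edge favors agreement.
agree = [[2.0, 1.0], [1.0, 2.0]]
unary = {"A": [0.9, 0.1], "B": [0.5, 0.5], "C": [0.5, 0.5]}
pairwise = {("A", "B"): agree, ("B", "C"): agree, ("A", "C"): agree}
edges = [("A", "B"), ("B", "C"), ("A", "C")]
beliefs = loopy_bp(unary, pairwise, edges)
print(beliefs)
```

Because the graph has a loop, these beliefs are approximations; here A's evidence propagates so that B and C, which start uninformed, also lean toward state 0.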
Shortcomings of Phase 2
Inference is very difficult:
- ~1,000,000 possible outputs for one amino acid
- ~250-1250 amino acids in one protein
- Evidence is noisy
- O(N^2) constraints
Solutions are only approximate, so there is room for improvement
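A rough count using only the slide's own figures shows why exact inference is out of reach. Writing |S| ≈ 10^6 for the number of candidate locations per amino acid (the notation |S| is introduced here) and N for the number of amino acids, the number of joint traces grows exponentially in N, while the number of amino-acid pairs, and hence of pairwise constraints if every pair carries one (for example, that two residues cannot occupy the same space), grows as O(N^2):

```latex
\begin{aligned}
|S|^{N} &\approx \left(10^{6}\right)^{N}, &\quad N = 250 &\;\Rightarrow\; \left(10^{6}\right)^{250} = 10^{1500},\\
\binom{N}{2} &= \frac{N(N-1)}{2}, &\quad N = 1250 &\;\Rightarrow\; \frac{1250 \cdot 1249}{2} = 780{,}625 .
\end{aligned}
```

Even the smallest protein in the quoted range has on the order of 10^1500 candidate traces, so Phase 2 settles for approximate per-amino-acid marginals rather than enumerating traces.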
Ensemble Methods
Ensembles: the use of multiple models to improve predictive performance
They tend to outperform the best single model [Dietterich '00], e.g., the Netflix Prize
Phase 2: Standard ACMI
[Figure: a single protocol runs inference on the MRF, producing one estimate P(b_k)]
Phase 2: Ensemble ACMI
[Figure: C protocols (Protocol 1, Protocol 2, …, Protocol C) each run inference on the same MRF, producing estimates P_1(b_k), P_2(b_k), …, P_C(b_k)]
Probabilistic Ensembles in ACMI (PEA)
New ensemble framework (PEA):
- Run inference multiple times, under different conditions
- Output: multiple, diverse estimates of each amino acid's location
Phase 2 now has several probability distributions for each amino acid. So what?
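The PEA framework above can be sketched by running one inference routine several times and keeping every result. Beyond the "Protocol 1…C" labels, the slides do not spell out the "different conditions"; this sketch assumes, hypothetically, that each protocol is a different random order for asynchronous message updates. The toy MRF and the names `async_bp` and `pea_phase2` are illustrative only; the default of 4 components mirrors the experimental setup.

```python
import random

# Hypothetical PEA-style Phase 2: run the same loopy BP several times, each
# under a different "protocol" (here: a random asynchronous update order).

def async_bp(unary, pairwise, edges, order, n_sweeps=3):
    """Asynchronous sum-product BP; messages are updated in place in `order`."""
    neighbors = {v: [] for v in unary}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, xi, xj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    msgs = {(i, j): [1.0] * len(unary[j]) for i, j in order}
    for _ in range(n_sweeps):
        for i, j in order:
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    term = unary[i][xi] * psi(i, j, xi, xj)
                    for k in neighbors[i]:
                        if k != j:
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = sum(out)
            msgs[(i, j)] = [v / z for v in out]  # later updates see this value

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbors[i]:
            b = [bv * mv for bv, mv in zip(b, msgs[(k, i)])]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


def pea_phase2(unary, pairwise, edges, n_components=4, seed=0):
    """Return C estimates; entry c maps each node to its distribution P_c."""
    rng = random.Random(seed)
    directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    ensemble = []
    for _ in range(n_components):
        order = directed[:]
        rng.shuffle(order)  # a different message schedule per component
        ensemble.append(async_bp(unary, pairwise, edges, order))
    return ensemble


agree = [[2.0, 1.0], [1.0, 2.0]]
unary = {"A": [0.9, 0.1], "B": [0.5, 0.5], "C": [0.5, 0.5]}
pairwise = {("A", "B"): agree, ("B", "C"): agree, ("A", "C"): agree}
edges = [("A", "B"), ("B", "C"), ("A", "C")]
ensemble = pea_phase2(unary, pairwise, edges)
for c, est in enumerate(ensemble, start=1):
    print(f"P_{c}(B) = {est['B']}")
```

On a loopy graph with a fixed, small number of sweeps, different schedules can stop at slightly different estimates, which is the kind of diversity an ensemble can exploit; on this tiny toy the differences are small.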
Backbone Step (Prior work)
Goal: place the next backbone atom
(1) Sample candidate positions b'_k from the empirical Cα-Cα-Cα pseudoangle distribution, given the previous positions b_{k-2} and b_{k-1}
(2) Weight each sample by its Phase 2 computed marginal (e.g., 0.25, 0.20, 0.15, …)
(3) Select b_k from the samples with probability proportional to sample weight
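The three steps above amount to a sample, weight, and resample loop. In this minimal sketch, `propose` is a hypothetical stand-in for the empirical Cα-Cα-Cα pseudoangle distribution (no geometry is modeled), and candidate positions are toy grid labels whose marginals echo the 0.25/0.20/0.15 on the slides; the zero-weight label `g4` is added only to show that candidates without Phase 2 support are never selected.

```python
import random

def backbone_step(propose, marginal, n_samples, rng):
    """One hypothetical Phase 3 step: sample, weight, select.

    propose(rng) -> candidate position b'_k (stand-in for the pseudoangle prior)
    marginal(b)  -> Phase 2 marginal probability of candidate b
    """
    # (1) Draw candidate positions for the next backbone atom.
    candidates = [propose(rng) for _ in range(n_samples)]
    # (2) Weight each candidate by its Phase 2 marginal.
    weights = [marginal(b) for b in candidates]
    if sum(weights) == 0:
        return None  # no candidate is supported by Phase 2
    # (3) Select one candidate with probability proportional to its weight.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy usage: candidate positions are grid labels with made-up marginals.
toy_marginal = {"g1": 0.25, "g2": 0.20, "g3": 0.15, "g4": 0.0}

def propose(r):
    return r.choice(sorted(toy_marginal))  # uniform over the 4 labels

rng = random.Random(0)
picks = [backbone_step(propose, toy_marginal.get, 8, rng) for _ in range(1000)]
print({g: picks.count(g) for g in toy_marginal})
```

Over many steps, higher-marginal positions are chosen more often, but lower-marginal ones still appear, so repeated runs of Phase 3 can yield different structures.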
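Backbone Step for PEA
[Figure: for a candidate b'_k following b_{k-2}, b_{k-1}, the C ensemble estimates P_1(b'_k), P_2(b'_k), …, P_C(b'_k) (e.g., 0.23, 0.15, 0.04) are combined by an aggregator into a single weight w(b'_k)]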
Backbone Step for PEA: Average
[Figure: AVG aggregator, w(b'_k) = (1/C) Σ_c P_c(b'_k); e.g., (0.23 + 0.15 + 0.04) / 3 = 0.14]
Backbone Step for PEA: Maximum
[Figure: MAX aggregator, w(b'_k) = max_c P_c(b'_k); e.g., max(0.23, 0.15, 0.04) = 0.23]
Backbone Step for PEA: Sample
[Figure: SAMP aggregator, w(b'_k) is the estimate P_c(b'_k) of one sampled component c; e.g., 0.15]
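The three aggregators can be sketched as one function over a candidate's C ensemble estimates. AVG and MAX follow directly from the slides (mean 0.14, maximum 0.23); the slides show only the value SAMP returns, so drawing one component uniformly at random is an assumption here. The function name `aggregate` is hypothetical.

```python
import random
from statistics import mean

def aggregate(estimates, method, rng=random):
    """Combine ensemble estimates P_1(b'_k), ..., P_C(b'_k) into one weight w(b'_k)."""
    if method == "AVG":   # mean of the component estimates
        return mean(estimates)
    if method == "MAX":   # most optimistic component
        return max(estimates)
    if method == "SAMP":  # one randomly chosen component (assumed uniform)
        return rng.choice(estimates)
    raise ValueError(f"unknown aggregator: {method}")


estimates = [0.23, 0.15, 0.04]  # example values from the slides
for method in ("AVG", "MAX", "SAMP"):
    print(method, aggregate(estimates, method, random.Random(1)))
```

AVG smooths disagreement between components, MAX rewards any component that supports a candidate, and SAMP keeps the ensemble's spread by using a single component per call.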
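Review: Previous work on ACMI
[Figure: in Phase 2, a single protocol produces P(b_k); in Phase 3, candidate positions following b_{k-2}, b_{k-1} are weighted directly by P(b_k) (e.g., 0.25, 0.20, 0.15, …)]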
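Review: PEA
[Figure: in Phase 2, several protocols each run inference on the MRF; in Phase 3, an aggregator (AGG) combines their estimates into one weight per candidate position following b_{k-2}, b_{k-1} (e.g., 0.14, 0.26, 0.05, …)]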
Experimental Methodology
PEA (Probabilistic Ensembles in ACMI):
- 4 ensemble components
- Aggregators: AVG, MAX, SAMP
ACMI:
- ORIG: standard ACMI (prior work)
- EXT: run inference 4 times as long
- BEST: test the best of the 4 PEA components
Phase 2 Results
[Figure: Phase 2 results; * marks p-value < 0.01]
Protein Structure Results
[Figure: correctness and completeness of the resulting protein structures; * marks p-value < 0.05]
Protein Structure Results (continued)
Impact of Ensemble Size
Conclusions
ACMI is the state-of-the-art method for determining protein structures in poor-resolution images
Probabilistic Ensembles in ACMI (PEA) improves approximate inference and produces better protein structures
Future work:
- General solution for inference
- Larger ensemble size
Acknowledgements
- Phillips Laboratory at UW-Madison
- UW Center for Eukaryotic Structural Genomics (CESG)
- NLM R01-LM008796
- NLM Training Grant T15-LM007359
- NIH Protein Structure Initiative Grant GM074901
Thank you!