p2p-based simulator for protein folding shun-yun hu 2005/06/03
P2P-based Simulator for Protein Folding
Shun-Yun Hu
2005/06/03
Introduction
A Look at Simulations
- Simulations are important tools in scientific research
- Larger scale and higher resolution are constantly sought
- However, computational resources can be limited
An Untapped Potential
- 300 million PCs on the Internet (2000 est.)
- 80% to 90% of CPU time is wasted
- Large supply of computing resources, growing rapidly
Examples
SETI@Home (UC Berkeley, space radio analysis)
- 5.3 M participants worldwide
- 2.2 M years of single-processor CPU time
- Equivalent to a 54-teraflop machine (current top 3 supercomputers: 70.72, 51.87, and 35.86 teraflops)
Folding@Home (Stanford, protein 3D structure)
- 30,000 volunteers
- 1 M days of single-processor CPU time
- 23 papers published in Science, Nature, Nature Structural Biology, PNAS, JMB, etc.
The Grand Question
Can we build the ultimate simulator for large-scale simulation utilizing millions of computers world-wide?
Potential applications:
- Nuclear reactions
- Star clusters
- Atomic-scale modeling in materials science
- Weather, earthquakes
- Biology (protein, ecosystem, brain, ...)
Promise & Challenge of P2P
Promises
- Growing resource, decentralized: scalable
- Commodity hardware: affordable
Challenges
- Topology maintenance: dynamic join/leave
- Efficient content retrieval: no global knowledge
A Simulation Scenario
How can we utilize P2P for simulation purposes?
Answer: depends on what you want to simulate
We observe that many simulations…
- are spatially oriented (i.e., based on coordinate systems)
- run in discrete time-steps
- exhibit localized interaction (i.e., short-range interaction)
Example: molecular dynamics (MD) simulation. Protein folding?
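The three properties above can be seen together in a minimal sketch of a particle simulation. Everything in it is an illustrative assumption rather than material from the slides: the cutoff radius, the time-step length, the unit mass, and the toy repulsive force, which is not a real MD force field.

```python
import math

CUTOFF = 1.5   # assumed interaction range (localized interaction)
DT = 0.01      # assumed length of one discrete time-step


def pair_force(dx, dy):
    """Toy short-range repulsion; zero beyond CUTOFF (an assumption)."""
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > CUTOFF:
        return 0.0, 0.0
    strength = (CUTOFF - dist) / dist
    return strength * dx, strength * dy


def step(positions, velocities):
    """Advance all particles by one time-step using explicit Euler."""
    new_positions, new_velocities = [], []
    for i, (xi, yi) in enumerate(positions):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i != j:
                dfx, dfy = pair_force(xi - xj, yi - yj)
                fx += dfx
                fy += dfy
        vx, vy = velocities[i]
        vx, vy = vx + fx * DT, vy + fy * DT   # unit mass assumed
        new_velocities.append((vx, vy))
        new_positions.append((xi + vx * DT, yi + vy * DT))
    return new_positions, new_velocities


# Two nearby particles interact; the distant third one is unaffected.
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
velocities = [(0.0, 0.0)] * 3
positions, velocities = step(positions, velocities)
print(positions)
```

Because each particle only needs the coordinates of particles within the cutoff, the work for one time-step could in principle be split by region, which is the property the rest of the talk relies on.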
Protein Folding Problem
Thermodynamic Hypothesis: the native structure has the lowest free energy.
Simulation Difficulties
Timescale limitation of classical MD methods
- A small protein folds in tens of μs (1 μs = 10^-6 s)
- Full-atomic simulation of 1 ns (10^-9 s) takes one CPU day
- A gap of 1,000 to 10,000 times (it might take decades)
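The size of the gap follows from simple arithmetic. The sketch below uses the slide's figure of 1 ns of simulated time per CPU day; the 10 μs folding time is an assumed representative value for "tens of μs", and the function name is hypothetical.

```python
NS_PER_CPU_DAY = 1.0      # from the slide: 1 ns of simulation per CPU day
FOLDING_TIME_US = 10.0    # assumed value for "tens of microseconds"


def cpu_days_needed(folding_time_us, ns_per_cpu_day=NS_PER_CPU_DAY):
    """CPU days needed to simulate one folding event on a single processor."""
    folding_time_ns = folding_time_us * 1_000.0   # 1 microsecond = 1,000 ns
    return folding_time_ns / ns_per_cpu_day


days = cpu_days_needed(FOLDING_TIME_US)
years = days / 365.0
print(f"{days:.0f} CPU days, about {years:.1f} CPU years")
```

Under these assumptions a single trajectory needs 10,000 CPU days, roughly 27 years on one processor, which is consistent with the "decades" estimate.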
Rough energy landscape
- Funnel-like (quick initial descent)
- Local minimum traps
Folding@Home Parallelization
Timescale limitation
- Folding time is statistically distributed.
- Running many trajectories yields a folding event in much shorter time.
Free energy barriers
- Most time is spent “waiting” in a free energy minimum.
- Configurations are re-initialized after crossing a barrier.
Limitations
- Only small proteins can be simulated.
- Simulation within a time-step is not decomposable.
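The "many trajectories" idea can be illustrated with a small Monte Carlo sketch. It assumes that folding times are exponentially distributed (the slide only says "statistically distributed"), in which case the first of M independent trajectories folds after an expected time of mean/M. The function names and the 10 μs mean are hypothetical.

```python
import random


def first_folding_time(mean_time, n_trajectories, rng):
    """Time until the earliest of n independent trajectories folds."""
    rate = 1.0 / mean_time
    return min(rng.expovariate(rate) for _ in range(n_trajectories))


def average_first_folding_time(mean_time, n_trajectories, n_trials=2000, seed=0):
    """Monte Carlo estimate of the expected first-folding time."""
    rng = random.Random(seed)
    total = sum(
        first_folding_time(mean_time, n_trajectories, rng)
        for _ in range(n_trials)
    )
    return total / n_trials


MEAN_FOLDING_TIME_US = 10.0   # assumed mean folding time in microseconds

for m in (1, 10, 100):
    estimate = average_first_folding_time(MEAN_FOLDING_TIME_US, m)
    predicted = MEAN_FOLDING_TIME_US / m
    print(f"{m:4d} trajectories: estimated {estimate:.3f} us, predicted {predicted:.3f} us")
```

The speed-up comes from running independent short trajectories, not from splitting one trajectory, which is why the limitation about non-decomposable time-steps still applies.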
Molecular Dynamics in P2P
- Many atoms (nodes) on a 2D plane (> 1,000)
- Positions (coordinates) may change at each time-step
- How can each node synchronize positions with those in its Area of Interest (AOI)?
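As a point of reference, an AOI query is trivial when one process knows every position. The sketch below is that centralized baseline, not the P2P solution; the node names, coordinates, and radius are hypothetical.

```python
import math


def aoi_neighbors(positions, node_id, radius):
    """Return the ids of all other nodes within `radius` of `node_id`."""
    x0, y0 = positions[node_id]
    return sorted(
        other
        for other, (x, y) in positions.items()
        if other != node_id and math.hypot(x - x0, y - y0) <= radius
    )


positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 0.0)}
print(aoi_neighbors(positions, "a", 5.0))   # neighbors before the move

positions["c"] = (4.0, 0.0)                 # node c moves during a time-step
print(aoi_neighbors(positions, "a", 5.0))   # the neighbor set must be refreshed
```

The difficulty in a P2P setting is that no node holds the full `positions` table, so each node has to learn about newcomers to its AOI from the neighbors it already knows.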
Area of Interest
Proposed Approach
Voronoi-based Overlay Network (VON)
- Supports spatially-oriented simulations
- Scalable, efficient, fully-distributed P2P
VON Design Concepts
- Identify enclosing and boundary neighbors (EN & BN)
- Each node constructs a Voronoi diagram of all AOI neighbors
- ENs are minimally maintained
- Mutual collaboration in neighbor discovery by BNs
Circle: area of interest (AOI)
White: self
Yellow: enclosing neighbor (EN)
Light blue: boundary neighbor (BN)
Pink: EN & BN
Green: AOI neighbor
Dark blue: unknown neighbor
Use Voronoi to solve the neighbor discovery problem
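A node's enclosing neighbors are the nodes whose Voronoi regions border its own, which is the same as its neighbors in the Delaunay triangulation. The sketch below finds them by brute force (testing every triangle for an empty circumcircle); it is an illustration for a handful of points, not the VON implementation, and it does not cover boundary-neighbor detection. All names and coordinates are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Brute-force Delaunay: keep triangles whose circumcircle is empty."""
    edges = set()
    ids = sorted(points)
    for i, j, k in combinations(ids, 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        empty = all(
            (points[m][0] - ux) ** 2 + (points[m][1] - uy) ** 2 >= r2 - 1e-9
            for m in ids
            if m not in (i, j, k)
        )
        if empty:
            edges.update({frozenset((i, j)), frozenset((j, k)), frozenset((i, k))})
    return edges


def enclosing_neighbors(points, node_id):
    """Nodes that share a Delaunay edge (a Voronoi border) with node_id."""
    return sorted(
        next(iter(edge - {node_id}))
        for edge in delaunay_edges(points)
        if node_id in edge
    )


points = {
    "self": (0.0, 0.0),
    "east": (2.0, 0.0),
    "north": (0.0, 2.0),
    "west": (-2.0, 0.0),
    "south": (0.0, -2.0),
    "far": (5.0, 0.0),   # hidden behind "east", so not an enclosing neighbor
}
print(enclosing_neighbors(points, "self"))
```

In this layout "self" is enclosed by its four surrounding nodes, while "far" is known only to "east" and "north". That is the situation in which a node depends on its neighbors to announce nodes approaching its AOI.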
Summary
Idle CPU and networks are untapped potential resources for large-scale simulation
Protein folding is a global minimum search problem in a complex energy landscape
Parallelization using P2P computing is an interesting yet unexplored possibility