p2p-based simulator for protein folding shun-yun hu 2005/06/03

P2P-based Simulator for Protein Folding

Shun-Yun Hu

2005/06/03

Introduction

A Look at Simulations
  Simulations are important tools in scientific research
  Larger scale and higher resolution are constantly sought
  However, computational resources can be limited

An Untapped Potential
  300 million PCs on the Internet (2000 est.)
  As much as 80% to 90% of CPU time is wasted
  Large supply of computing resources, growing rapidly

Examples

SETI@Home (UC Berkeley, space radio analysis)
  5.3 M world-wide participants
  2.2 M years of single-processor CPU time
  Equivalent to a 54-teraflop machine (current top 3: 70.72, 51.87, 35.86 teraflops)

Folding@Home (Stanford, protein's 3D structure)
  30,000 volunteers
  1 M days of single-processor CPU time
  Published 23 papers in Science, Nature, Nature Structural Biology, PNAS, JMB, etc.

The Grand Question

Can we build the ultimate simulator for large-scale simulation utilizing millions of computers world-wide?

Potential applications:
  Nuclear reaction
  Star clusters
  Atomic-scale modeling in material science
  Weather, earthquakes
  Biology (protein, ecosystem, brain, ...)

Promise & Challenge of P2P

Promises
  Growing resource, decentralized: scalable
  Commodity hardware: affordable

Challenges
  Topology maintenance: dynamic join/leave
  Efficient content retrieval: no global knowledge

A Simulation Scenario

How can we utilize P2P for simulation purposes?

Answer: depends on what you want to simulate

We observe that many simulations:
  are spatially oriented (i.e. based on coordinate systems)
  run in discrete time-steps
  exhibit localized interaction (i.e. short-range interaction)

Example: molecular dynamics (MD) simulation. Protein folding?
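The three properties above (coordinates, discrete time-steps, short-range interaction) can be illustrated with a minimal sketch. It is not an MD force field: the linear repulsion, the unit mass, and the names `step`, `cutoff`, and `dt` are assumptions made only for illustration.

```python
from math import hypot

def step(positions, velocities, cutoff, dt):
    """Advance every particle by one time-step.

    Only particles closer than `cutoff` interact (localized interaction).
    The force is a toy linear repulsion that fades to zero at the cutoff,
    and every particle is assumed to have unit mass.
    """
    new_velocities = []
    for i, (xi, yi) in enumerate(positions):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            r = hypot(dx, dy)
            if 0.0 < r < cutoff:
                strength = (cutoff - r) / r  # pushes i away from j
                fx += strength * dx
                fy += strength * dy
        vx, vy = velocities[i]
        new_velocities.append((vx + fx * dt, vy + fy * dt))
    # Move each particle using its updated velocity.
    new_positions = [
        (x + vx * dt, y + vy * dt)
        for (x, y), (vx, vy) in zip(positions, new_velocities)
    ]
    return new_positions, new_velocities

positions = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0)]
velocities = [(0.0, 0.0)] * 3
positions, velocities = step(positions, velocities, cutoff=1.0, dt=0.1)
```

The two nearby particles push each other apart, while the distant one is untouched. That locality is what would let each peer compute its own update from nearby nodes only.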

Protein Folding Problem

Thermodynamic Hypothesis: native structure has lowest free energy.

Simulation Difficulties

Timescale limitation of classical MD methods
  A small protein folds in 10s of μs (10^-6 s)
  Full-atomic simulation of 1 ns (10^-9 s) takes one CPU day
  A 1,000 to 10,000-fold gap (it might take decades)

Rough energy landscape
  Funnel-like (quick initial descent)
  Local minimum traps
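The timescale gap can be checked with simple arithmetic. The sketch below assumes the rate stated above (1 ns of simulated time per CPU day) and a folding time of 10 μs; the function name `cpu_years` is hypothetical.

```python
def cpu_years(folding_time_us, ns_per_cpu_day=1.0):
    """CPU years needed to simulate one folding event on a single processor."""
    ns_needed = folding_time_us * 1000.0   # 1 microsecond = 1,000 ns
    cpu_days = ns_needed / ns_per_cpu_day  # 10 us -> 10,000 CPU days
    return cpu_days / 365.0

years = cpu_years(10.0)  # roughly 27 years for a 10 us folding time
```

At 10 μs the gap is 10,000 CPU days, which is where the "decades" estimate comes from.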

Folding@Home Parallelization
  Timescale limitation
    Folding time is statistically distributed.
    Running many trajectories yields a folding event in much shorter time.
  Free energy barriers
    Most time is spent “waiting” in a free energy minimum.
    Re-initialize configurations after crossing a barrier.
  Limitations
    Can simulate only small proteins
    Simulation within a time-step is not decomposable
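Why many trajectories help can be sketched with a small Monte Carlo experiment. It assumes folding times are exponentially distributed (single-exponential kinetics), which the slides do not state explicitly. Under that assumption the first of M independent trajectories folds after an average of mean_time / M. The function name and parameters are hypothetical.

```python
import random

def mean_first_folding_time(mean_time, n_trajectories, trials=5000, seed=0):
    """Estimate the average time until the first of many trajectories folds.

    Assumes each trajectory's folding time is exponentially distributed
    with the given mean (an illustrative assumption).
    """
    rng = random.Random(seed)
    rate = 1.0 / mean_time
    total = 0.0
    for _ in range(trials):
        # The earliest folding event among independent trajectories.
        total += min(rng.expovariate(rate) for _ in range(n_trajectories))
    return total / trials

estimate = mean_first_folding_time(mean_time=1.0, n_trajectories=100)
expected = 1.0 / 100  # analytical mean of the minimum of 100 exponentials
```

The speed-up comes from sampling many independent runs, not from splitting a single time-step across machines, which matches the limitation noted above.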

Molecular Dynamics in P2P

Many atoms (nodes) on a 2D plane (> 1,000)
Positions (coordinates) may change at each time-step
How to synchronize positions with those in the Area of Interest (AOI)?

Area of Interest
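A minimal sketch of the AOI idea, assuming the AOI is a circle of fixed radius around a node. The names `aoi_neighbors` and `aoi_changes` are hypothetical. The global `positions` dictionary exists only to keep the example short; in a P2P setting no node would hold it.

```python
from math import dist

def aoi_neighbors(positions, me, radius):
    """IDs of all nodes within `radius` of node `me` (its AOI neighbors)."""
    return {
        node_id
        for node_id, point in positions.items()
        if node_id != me and dist(point, positions[me]) <= radius
    }

def aoi_changes(before, after, me, radius):
    """Nodes that entered and left the AOI of `me` between two time-steps."""
    old = aoi_neighbors(before, me, radius)
    new = aoi_neighbors(after, me, radius)
    return new - old, old - new

before = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0)}
after = {"a": (4.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0)}
entered, left = aoi_changes(before, after, "a", radius=2.5)
```

Here node `a` moves away from `b` and toward `c`, so `c` enters its AOI and `b` leaves. The open problem is how `a` learns about `c` without any global knowledge.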

Proposed Approach

Voronoi-based Overlay Network (VON)
  Supports spatially oriented simulations
  Scalable, efficient, fully distributed P2P

VON Design Concepts

Identify enclosing and boundary neighbors (EN & BN)
Each node constructs a Voronoi diagram of all AOI neighbors
ENs are minimally maintained
Mutual collaboration in neighbor discovery by BNs

Figure legend:
  Circle: Area of Interest (AOI)
  White: self
  Yellow: enclosing neighbor (EN)
  Light blue: boundary neighbor (BN)
  Pink: both EN and BN
  Green: AOI neighbor
  Dark blue: unknown neighbor

Use Voronoi to solve the neighbor discovery problem
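A rough sketch of how enclosing neighbors could be found, assuming they are the nodes whose Voronoi cells share an edge with a node's own cell. It does not build the Voronoi diagram. It uses the equivalent Delaunay test by brute force instead: a triple of points forms a Delaunay triangle when no other point lies inside its circumcircle. This is not the VON implementation; the function names are hypothetical and the method only suits small point sets in general position.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center, radius squared) of the circle through a, b, c.

    Returns None when the three points are collinear.
    """
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2

def enclosing_neighbors(points, me):
    """Indices of points that share a Delaunay triangle with points[me]."""
    others = [i for i in range(len(points)) if i != me]
    result = set()
    for j, k in combinations(others, 2):
        circle = circumcircle(points[me], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        # Delaunay condition: no other point strictly inside the circle.
        if all(
            (points[m][0] - ux) ** 2 + (points[m][1] - uy) ** 2 >= r2 - 1e-9
            for m in others
            if m not in (j, k)
        ):
            result.update((j, k))
    return result

points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (3, 0)]
center_ens = enclosing_neighbors(points, 0)  # the four surrounding nodes
far_ens = enclosing_neighbors(points, 5)     # includes node 1, two units away
```

The node at (3, 0) has the node at (1, 0) as an enclosing neighbor even though it is 2 units away. With a smaller AOI radius such a neighbor would still need to be kept, which is one reading of "ENs are minimally maintained".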

Summary

Idle CPUs and networks are an untapped resource for large-scale simulation

Protein folding is a global minimum search problem in a complex energy landscape

Parallelization using P2P computing is an interesting yet unexplored possibility
