Massively Distributed Computing and
An NRPGM Project on
Protein Structure and Function
Computational Biology Lab, Physics Dept & Life Science Dept
National Central University
From Gene to Protein
About Protein
• Function
  – Storage, transport, messengers, regulation… everything that sustains life
  – Structure: shell, silk, spider silk, etc.
• Structure
  – String of amino acids with a 3D structure
  – Homology and topology
• Importance
  – Science, health & medicine
  – Industry: enzymes, detergents, etc.
• An example: 3hvt.pdb
Protein Structure & Function
• Primary sequence → native state with 3D structure
  – Structure → function
  – Expensive and time-consuming
• Misfolding means malfunction
  – Mad cow disease (the "prion" protein misfolds)
The Folding Problem
• The complexity of the folding mechanism and pathway is a huge challenge to science and computational technology
Molecular Dynamics (MD)
• A molecule's behavior is determined by
  – Ensemble statistics
  – Newtonian mechanics
• Experiment in silico
• All-atom with water
  – Huge number of particles
• Super-heavy-duty computation
• Software for macromolecular MD is available
  – CHARMm, AMBER, GROMACS
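The "Newtonian mechanics" part of MD can be illustrated with a minimal velocity-Verlet integrator, a common MD time-stepping scheme. This is a toy sketch on plain Python lists with a user-supplied force function; it is not how CHARMm, AMBER, or GROMACS are implemented internally, and the harmonic force in the usage lines is purely illustrative.

```python
from typing import Callable, List, Tuple


def velocity_verlet(
    x: List[float],
    v: List[float],
    masses: List[float],
    force: Callable[[List[float]], List[float]],
    dt: float,
    n_steps: int,
) -> Tuple[List[float], List[float]]:
    """Integrate m * a = F(x) with the velocity-Verlet scheme."""
    x, v = list(x), list(v)
    f = force(x)
    for _ in range(n_steps):
        # Half-step velocity update, then full-step position update.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Forces at the new positions complete the velocity update.
        f = force(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
    return x, v


def harmonic_force(x: List[float]) -> List[float]:
    # Illustrative spring force F = -k x with k = 1 (not a real force field).
    return [-xi for xi in x]


x, v = velocity_verlet([1.0], [0.0], [1.0], harmonic_force, dt=0.01, n_steps=1000)
energy = 0.5 * v[0] ** 2 + 0.5 * x[0] ** 2  # stays close to the initial 0.5
```

Velocity Verlet is a usual choice because it is time-reversible and keeps the total energy close to its initial value over long runs, which matters when a trajectory contains 10^5 or more steps.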
Simple Statistics on MD Simulation
• Atoms in a typical protein-and-water simulation: 32,000
• Approximate number of interactions in a force calculation: 10^9
• Machine instructions per force calculation: 1,000
• Total number of machine instructions: 10^23
• Typical time-step size: 10^-15 s
• Number of MD time steps: 10^11
• Physical time for simulation: 10^-4 s
• Total calculation time (CPU: P4 3.0 GHz): 10,000 days
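The figures on this slide are linked by simple products. The sketch below checks them, assuming (our assumption, not stated on the slide) that the ~10^9 interactions come from the all-pairs count N(N-1)/2. It does not try to derive the 10,000-day wall-clock figure, which depends on benchmark details the slide does not give.

```python
n_atoms = 32_000
all_pairs = n_atoms * (n_atoms - 1) // 2   # about 5.1e8, i.e. order 1e9

interactions = 10**9                       # value quoted on the slide
instructions_per_force = 1_000
n_steps = 10**11
time_step_s = 1e-15

# 1e9 interactions x 1e3 instructions = 1e12 instructions per time step.
instructions_per_step = interactions * instructions_per_force
# 1e12 instructions per step x 1e11 steps = 1e23 instructions in total.
total_instructions = instructions_per_step * n_steps
# 1e11 steps x 1e-15 s = 1e-4 s of simulated physical time.
physical_time_s = n_steps * time_step_s
```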
Protein Studies by Massively Distributed Computing
A Project in the National Research Program on Genomic Medicine
• Scientific
  – Protein folding, structure, function, protein–molecule interaction
  – Algorithms, force fields
• Computing
  – Massively distributed computing
• Education
  – Everyone and anyone with a personal PC can take part
• Industry
  – Collaborative development
Distributed Computing
• Concept
  – Computation through the Internet
  – Utilize idle PC power (through a screen saver)
• Advantages
  – An inexpensive way to acquire huge computational power
  – Perfectly suited to the task
    • A huge number of runs is needed to beat the statistics
    • Parallel computation is not ALWAYS needed
• Massive data: good management is necessary
• Public education: anyone with a PC can take part
Hardware Strategies
• Parallel computation (not our approach)
  – PC cluster
  – IBM Blue Gene, 10^6 CPUs
• Massively distributed computing
  – Grid computing (formal; in the future)
  – Server to individual clients (available now; inexpensive)
• Examples: SETI@home, Folding@home, Genome@home
• Our project: protein@CBL
Software Components
• Dynamics of macromolecules
  – Molecular dynamics, all-atom or mean-field solvent
  – Computer codes
    • GROMACS (for distributed computing; freeware)
    • AMBER and others (for in-house computing; licensed)
• Distributed computing
  – COSM: a stable, reliable, and secure system for large-scale distributed processing (freeware)
COSM's Structure
• Client
  – System tests (test all COSM functions)
  – Self-tests
  – Connect to server
  – Send request
  – Receive assignment
  – Run simulation
  – Put result
  – Get accept
• Server
  – System test
  – Open multithreaded working channel; connect to client
  – Receive request
  – Send assignment
  – Get result
  – Put accept
• Packets exchanged: Request, Assignment, Result, Accept
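The request/assignment/result/accept exchange can be sketched as an in-process simulation. Everything here is hypothetical: `ToyServer`, `client_cycle`, and the job IDs are illustrative names, not the COSM API, and there is no networking or threading; the sketch only shows the order in which the four packet types travel.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Iterable


class Packet(Enum):
    """The four packet types named on the slide."""
    REQUEST = auto()
    ASSIGNMENT = auto()
    RESULT = auto()
    ACCEPT = auto()


@dataclass
class Message:
    kind: Packet
    payload: object = None


class ToyServer:
    """Hypothetical stand-in for the server side (not the COSM API)."""

    def __init__(self, jobs: Iterable[str]):
        self.jobs = deque(jobs)
        self.results: Dict[str, object] = {}

    def handle(self, msg: Message) -> Message:
        if msg.kind is Packet.REQUEST:
            # Recv Request -> Send Assignment
            return Message(Packet.ASSIGNMENT, self.jobs.popleft())
        if msg.kind is Packet.RESULT:
            # Get Result -> Put Accept
            job_id, value = msg.payload
            self.results[job_id] = value
            return Message(Packet.ACCEPT, job_id)
        raise ValueError(f"unexpected packet: {msg.kind}")


def client_cycle(server: ToyServer, run_simulation: Callable[[str], object]) -> str:
    """Send request, receive assignment, run, put result, get accept."""
    reply = server.handle(Message(Packet.REQUEST))
    if reply.kind is not Packet.ASSIGNMENT:
        raise RuntimeError("expected an assignment")
    job_id = reply.payload
    result = run_simulation(job_id)
    reply = server.handle(Message(Packet.RESULT, (job_id, result)))
    if reply.kind is not Packet.ACCEPT or reply.payload != job_id:
        raise RuntimeError("result was not accepted")
    return job_id


server = ToyServer(["job-1", "job-2"])  # hypothetical job IDs
done = client_cycle(server, lambda job: f"trajectory for {job}")
```

Having the server echo the job ID in its Accept lets the client confirm that the specific result it sent was stored before it moves on to the next request.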
Structure at Server End
[Diagram components: Send (COSM), Receive, Jobs, Clients, Protein database, Databank]
• Temporary databank
• Job analysis
• Automatic temperature swaps by parallel tempering
• Exceptions: human intervention
Structure at Client End
• Receive job from server
• MD run
• Return result to server
• Delete files
• Restart if crash
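The client-side loop (run, return result, delete files, restart if crash) might look like the sketch below. The function `run_job_with_restart`, the retry limit, and the `flaky_md_run` stand-in are hypothetical; a real client would launch the MD code instead of a Python callable.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_job_with_restart(job: str, md_run: Callable[[str, Path], str],
                         max_attempts: int = 3) -> str:
    """Run one job in a scratch directory, restarting if the run crashes.

    The scratch directory is removed after every attempt ("Delete files");
    only the returned result is sent back to the server.
    """
    last_error = None
    for _ in range(max_attempts):
        workdir = Path(tempfile.mkdtemp(prefix="md_job_"))
        try:
            return md_run(job, workdir)
        except Exception as exc:  # treat any exception as a crashed MD run
            last_error = exc
        finally:
            shutil.rmtree(workdir, ignore_errors=True)
    raise RuntimeError(f"job {job!r} failed after {max_attempts} attempts") from last_error


# Usage with a hypothetical MD runner that crashes on its first attempt.
attempt_dirs = []


def flaky_md_run(job: str, workdir: Path) -> str:
    attempt_dirs.append(workdir)
    (workdir / "output.dat").write_text("placeholder")
    if len(attempt_dirs) == 1:
        raise RuntimeError("simulated crash")
    return f"result for {job}"


result = run_job_with_restart("job-7", flaky_md_run)
```

Putting the cleanup in `finally` means files are deleted on both the success and crash paths, so a volunteer's disk does not fill up with partial trajectories.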
Multi-temperature Annealing
• The project is suited to multi-temperature runs: parallel tempering
• Two configurations with energies and temperatures (E1, T1) and (E2, T2) have their temperatures swapped with probability
  $P = \min\{1, \exp[-(E_2 - E_1)(1/kT_1 - 1/kT_2)]\}$
• Mode of operation
  – Send the same peptide at different temperatures to many clients; let them run; collect the results; swap T's by multiple parallel tempering; randomly redistribute the peptides with their new T's to clients
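One server-side round of this procedure can be sketched as follows, applying the swap rule above to neighboring rungs of a sorted temperature ladder and then shuffling the jobs among clients. Several details are assumptions not given on the slides: alternating even/odd neighbor pairs (a common scheme), energies in kJ/mol with k in kJ/(mol·K), and the peptide/client names and energies, which are illustrative rather than data.

```python
import math
import random
from typing import Dict, List

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K); assumes energies in kJ/mol


def swap_neighbors(temps: List[float], replica_at: List[str],
                   energies: Dict[str, float], offset: int,
                   rng: random.Random, k: float = K_B) -> List[str]:
    """Attempt swaps on ladder pairs (offset, offset+1), (offset+2, offset+3), ...

    temps is sorted ascending; replica_at[i] is the peptide currently at temps[i].
    """
    new = list(replica_at)
    for i in range(offset, len(temps) - 1, 2):
        a, b = new[i], new[i + 1]
        # P = min{1, exp[-(E2 - E1)(1/kT1 - 1/kT2)]} with (E1, T1) for a, (E2, T2) for b
        exponent = -(energies[b] - energies[a]) * (1 / (k * temps[i]) - 1 / (k * temps[i + 1]))
        if exponent >= 0 or rng.random() < math.exp(exponent):
            new[i], new[i + 1] = b, a
    return new


def redistribute(temps: List[float], replica_at: List[str],
                 clients: List[str], rng: random.Random) -> Dict[str, tuple]:
    """Randomly hand each client one (peptide, new temperature) job."""
    jobs = list(zip(replica_at, temps))
    order = list(clients)
    rng.shuffle(order)
    return dict(zip(order, jobs))


rng = random.Random(0)
temps = [300.0, 315.0, 333.0, 348.0]
energies = {"p0": -100.0, "p1": -150.0, "p2": -1000.0, "p3": 0.0}  # illustrative values
ladder = swap_neighbors(temps, ["p0", "p1", "p2", "p3"], energies, offset=0, rng=rng)
plan = redistribute(temps, ladder, ["c1", "c2", "c3", "c4"], rng)
```

Here p1 (colder ladder slot 1, lower energy) moves down to 300 K with probability 1, while the p2/p3 swap has a probability of roughly 10^-7 and is rejected. Calling `swap_neighbors` with `offset=1` on the next round lets every adjacent pair be tried over time.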
Multi-temperature Annealing
[Diagram: many clients return "peptides" at their old temperatures to the server; the server swaps temperatures by multiple parallel tempering, stores data in the databank, and sends the peptides back out at new temperatures]
Potential of Massively Distributed Computing
• Simulation of folding a small peptide for 100 ns
  – Each run (10^5 simulation steps; 100 ps): ~100 min of PC time
  – 1000 runs (100 ns) per "fold": ~10^5 min
  – Approx. 70 days on a single PC running 24 h/day
• An ideal client contributes 8 h/day
  – 100 clients: 70 × 3 / 100 ≈ 2 days per fold
  – 10,000 clients: ~50 folds/day (small peptide)
• A mid-sized protein needs > 1 ms to fold
  – 7 × 10^5 days on a single PC
  – 10,000 clients: 210 days
  – 10^6 clients (!!): 2–3 days
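The estimates on this slide follow from one formula. The sketch below reproduces them under the same idealization the slide uses: runs are spread perfectly over identical clients, with no communication overhead and no waiting between parallel-tempering rounds. The function name and defaults are our own.

```python
def days_per_fold(runs_per_fold: int = 1000, minutes_per_run: float = 100.0,
                  clients: int = 1, hours_per_day: float = 24.0) -> float:
    """Wall-clock days per fold, assuming runs spread perfectly over identical clients."""
    total_minutes = runs_per_fold * minutes_per_run  # 1000 runs x 100 min = 1e5 min
    single_pc_days = total_minutes / (60 * 24)       # about 69.4 days at 24 h/day
    duty_factor = 24.0 / hours_per_day               # 8 h/day costs a factor of 3
    return single_pc_days * duty_factor / clients


small_single = days_per_fold()                                          # ~69.4 days
small_100 = days_per_fold(clients=100, hours_per_day=8)                 # ~2.1 days per fold
folds_per_day_10k = 1 / days_per_fold(clients=10_000, hours_per_day=8)  # ~48 folds/day
# Mid-sized protein: 1 ms at 100 ps per run = 1e7 runs.
mid_10k = days_per_fold(runs_per_fold=10**7, clients=10_000, hours_per_day=8)  # ~208 days
mid_1m = days_per_fold(runs_per_fold=10**7, clients=10**6, hours_per_day=8)    # ~2.1 days
```

The slide's rounded values (~70 days, ~2 days, ~50 folds/day, 210 days, 2–3 days) match these results.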
Learning Curve
• Launched: August 2002
• Small PC cluster: October 2002
  – In-house runs to learn the codes
• Infrastructure for distributed computation
  – Installation of GROMACS & COSM: January–March 2003
• Test runs and debugging
  – Intra-laboratory test run: March–October 2003
  – NCU test run: July–October 2003
• Launched on the WWW: 20 November 2003
  – Traffic jam: multiple servers needed (see the Multi-Server Architecture slide)
• Scientific studies: November 2003 onward
In-House Test Runs
• Time: began March 2003
• Clients
  – About 25 PCs in CBL and outside (MS Windows)
• Goal: test and debug
  – Test server–client communication
• Lots of debugging
  – Test data distribution, collection, and management
  – Test parallel tempering
Multi-Server Architecture
Step 1: Client sends a request to the Front-End Server (FES).
Step 2: FES assigns the IP of a Back-End Server i (BESi) to the client.
Step 3: Client requests a job from BESi.
Step 4: BESi sends the job to the client.
Step 5: Client sends the result to BESi.
Repeat cycle.
[Diagram: clients connected to the Front-End Server and to Back-End Servers 1 to N]
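Step 2 can be sketched as a front-end server that hands out back-end addresses. The slides do not say how the FES chooses a back end; round-robin is our assumption, and the class name and the documentation-range IP addresses are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


class FrontEndServer:
    """Hypothetical FES that gives each client the IP of a back-end server (BES)."""

    def __init__(self, backend_ips: List[str]):
        if not backend_ips:
            raise ValueError("at least one back-end server is required")
        self._next_backend = cycle(backend_ips)  # round-robin assignment policy (assumed)
        self.assigned: Dict[str, str] = {}

    def assign(self, client_id: str) -> str:
        """Step 2: return the IP of the BES this client should contact for jobs."""
        ip = next(self._next_backend)
        self.assigned[client_id] = ip
        return ip


fes = FrontEndServer(["192.0.2.1", "192.0.2.2"])  # documentation-range IPs
ips = [fes.assign(c) for c in ["client-a", "client-b", "client-c"]]
```

Because the FES only hands out an address and then drops out of Steps 3–5, the job traffic that caused the "traffic jam" is spread across the back-end servers.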
Current Status and Plans for the Immediate Future
• Last beta version Pac v0.9
  – Released on July 15
  – To CBL lab members & physics dept
  – About 25 clients
• First alpha version Pac v1.0 released October 1, 2003
• Current version Pac v1.2
  – Released for distributed computing on 20 November 2003
  – In search of clients
    • Portal in "Educities" http://www.educities.edu.tw/: ~2,500 downloads, ~500 real clients
    • PCs in university administrative units
    • City halls and county government offices
    • Talks and visits to universities and high schools
Current Simulations
• 1SOL (20 res.): a PIP2- and F-actin-binding site of gelsolin, residues 150–169. One helix.
• 1ZDD (35 res.): disulfide-stabilized mini protein A domain. Two helices.
• 1L2Y (20 res.): NMR structure of Trp-cage miniprotein construct TC5b; synthetic.
A Small Test Case: 1SOL
• Target peptide: 1SOL.pdb
  – 20 amino acids; 3-loop helix and 1 hairpin; 352 atoms; ~4000 bond interactions
  – Unit time step = 1 fs
• Compare constant temperature and parallel tempering
  – Constant T at 300 K
  – Parallel tempering with about 20 peptides; results returned to the server for swapping after each "run", i.e. every 10^5 time steps (100 ps)
Parallel Tempering (1SOL)
Swap rule: $P = \min\{1, \exp[-(E_2 - E_1)(1/kT_1 - 1/kT_2)]\}$
[Figure: temperature (K) vs. number of runs (in units of 100 ps), runs 1–24; series labeled 273, 285, 300, 315, 333, 348, 366, 384, 405, 426, 447, 471, 498 K]
Preliminary Result on 1SOL
[Figure panels: initial structure; native conformation; constant temperature (20 ns); parallel tempering (1.6 ns)]
A Second Test Case: 1L2Y
• Simulation target: Trp-cage
  – 20 amino acids, 2 helical loops
  – A short, artificial peptide that folds by itself
  – Has been simulated with AMBER
  – Folding mechanism not well understood
Swap History (1L2Y)
[Figure: temperature (K) vs. number of runs (in units of 100 ps), x-axis ticks 1 to 69; series labeled 270, 290, 310, 330, 350, 370, 390, 410, 430, 450, 470 K]
Preliminary Result on 1L2Y (11 peptides)
[Figure panels: initial state; native state; PAC, 6 ns]
Modifications Needed
• Reduce the size of the water box
  – Saves computation time
• Rewrite the energy function
  – Ignore the water–water interaction
• Increase the cut-off radius
• Try different simulation algorithms for changing pressure and temperature
• Others…
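Two of these modifications, a larger cut-off radius and ignoring water–water interactions, can be shown as a filter on which atom pairs enter the energy sum. This is a hypothetical brute-force sketch: the `Atom` record, the function name, and the coordinates are illustrative. Production MD codes such as GROMACS build pair (neighbor) lists rather than looping over all pairs.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical minimal atom record: (molecule type, (x, y, z) position in nm)
Atom = Tuple[str, Tuple[float, float, float]]


def interacting_pairs(atoms: List[Atom], cutoff: float,
                      skip_water_water: bool = True) -> List[Tuple[int, int]]:
    """Index pairs (i, j) within the cut-off radius, optionally dropping water-water pairs.

    Brute-force O(N^2) loop, for illustration only.
    """
    pairs = []
    for (i, (type_i, pos_i)), (j, (type_j, pos_j)) in combinations(enumerate(atoms), 2):
        if skip_water_water and type_i == "water" and type_j == "water":
            continue  # the "ignore the water-water interaction" modification
        if math.dist(pos_i, pos_j) <= cutoff:
            pairs.append((i, j))
    return pairs


atoms = [("protein", (0.0, 0.0, 0.0)), ("water", (0.3, 0.0, 0.0)),
         ("water", (0.6, 0.0, 0.0)), ("protein", (2.0, 0.0, 0.0))]
```

With a 1.0 nm cut-off only the two protein–water pairs near the first atom survive. Raising the cut-off to 1.5 nm brings in the water at 0.6 nm and the second protein atom, while the water–water pair stays excluded unless `skip_water_water=False`.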
Looking Ahead
• Better understanding of the annealing procedure
• Better understanding of the energetics
• Expand the client community
• Develop serious collaboration with biologists
  – Structural biologists, e.g., NMR people
  – Protein function people
  – Drug designers
• "…investigation of motions that have particular functional implications and to obtain information that is not accessible to experiment." Karplus and McCammon, Nature Struct. Biol. 2002
The Team
• Funded by NRPGM/NSC
• Computational Biology Laboratory, Physics Dept & Life Sciences Dept, National Central University
  – PI: Professor HC Lee (Phys & LS/NCU)
  – Co-PI: Professor Hsuen-Yi Chen (Phys/NCU)
  – Jia-Lin Lo (PhD student)
  – Jun-Ping Yiu (MSc research assistant)
  – Chien-Hao Wei (MSc RA)
  – Engin Lee (MSc student)
  – Postdoctoral fellow (TBA)
  – We are looking for collaborators, research associates, programmers, students, etc.
http://protein.ncu.edu.tw
Thank you for your attention