molecular dynamics simulations of proteins with petascale special-purpose computer mdgrape-3 makoto...
TRANSCRIPT
Molecular Dynamics Simulations of Proteins with Petascale Special-Purpose Computer MDGRAPE-3
Makoto Taiji
Deputy Project DirectorComputational & Experimental Systems Biology GroupGenomic Sciences Center, RIKEN
What is GRAPE?
GRAvity PipE Special-purpose accelerator for classical p
article simulationsAstrophysical N-body simulationsMolecular Dynamics Simulations
MDGRAPE-3 : Petaflops GRAPE for Molecular Dynamics simulations
J. Makino & M. Taiji, Scientific Simulations with Special-Purpose Computers,John Wiley & Sons, 1997.
MDGRAPE-3 (aka Protein Explorer)
Petaflops special-purpose computer for molecular dynamics simulations
Started at April 2002,Finished at June 2006
Part of Protein 3000 project – a project to determine 3,000 protein structures
M. Taiji et al, Proc. Supercomputing 2003, on CDROM.M. Taiji, Proc. Hot Chips 16, on CDROM (2004).
EGFR TT RNA Polymerase
Molecular Dynamics Simulations
Alpha.mpg
Folding of Chignolin,10-residue β-hairpindesign peptide(by Dr. A. Suenaga)
Bonding
+-
CoulombVdW
Force calculation dominatescomputational time
Require large computational power
How GRAPE works Accelerator to calculate forces
HostComputer
GRAPE
Most of Calculation → GRAPEOthers → Host computer
Particle Data
Results
•Communication = O (N) << Calculation = O (N2)•Easy to build, Easy to use•Cost Effective
History of GRAPE computers
106
103
100
10-3
09060300979491
Pea
k P
erfo
rman
ce (
Gflo
ps)
Year
Megaflops
Gigaflops
Teraflops
Petaflops
MD-GRAPE
MDM
MDGRAPE-3
11A
3
5
2
2A
4
GRAPE-6
GRAPE-DR
EarthSimulator
BlueGene/L
ORNL
Roadrunner
Eight Gordon Bell Prizes`95, `96, `99, `00 (double), `01, `03, `06
Why we build special-purpose computers?
Bottleneck of high-performance computing: Parallelization limit / Memory bandwidth Power Consumption = Heat Dissipation
These problems will become more serious in future.
Special-purpose approach: can solve parallelization limit for some applications relax power consumption ~100 times better cost-performance
Broadcast Parallelization Molecular Dynamics Case Two-body forces
For parallel calculation of Fi,
we can use the same
Broadcast Parallelization
- relax Bandwidth Problem
Pipeline 1
Pipeline 2
Pipeline i
Highly-Parallel Operations in Molecular Dynamics Processors For special-purpose computers
Broadcast Memory Architecture Efficient : 720 operations/cycle/chip
in MDGRAPE-3 chip possible to increase according to Moore’s law
In case of MD:MDGRAPE 600 nm 1 pipeline 1Gflops
MDGRAPE-2 250 nm 4 pipelines 16Gflops
MDGRAPE-3 130 nm 20 pipelines 180Gflops
Power Efficiency of Special-Purpose Computers If we compare at the same technology
Pentium 4 (0.13 m, 3GHz, FSB800) … 14W/Gflops MDGRAPE-3 chip (0.13m) … 0.1W/Gflops
Why ? Highly-parallel at low frequency
MDGRAPE-3: 250MHz, 720-equivalent operationsfor example, single-precision multiplier has 3 pipeline stages
Tuning accuracyMost of calculations are done in single precision
Slow I/O84-bit wide input and output port at 125 MHz (GTL)
Force Pipeline Calculate two-body
central forces
• 8 multipliers, 9 adders, and 1 function evaluator= 33 equivalent operations for Coulomb force calculation
A. H. Karp, Scientific Programming, 1, pp133–141 (1992)• Function Evaluator: approximate arbitrary functions by segmented
fourth-order polynomials• Multipliers : floating-point, single precision• Adders: floating-point, single precision / fixed-point 40 or 80 bit
Block Diagram of MDGRAPE-3 chip
Memory-in-a-chip Architecture
Memory for 32,768 particles
The same data is broadcasted to each pipeline
MDGRAPE-3 chip
216 GFLOPS@300MHz180 GFLOPS@250MHz17W at 300 MHz
Hitachi HDL4N130 nmVcore=+1.2V15.7 mm X 15.7 mm6.1 M random gates + 9 Mbit memory1444 pin FCBGA
MDGRAPE-3 Board 12 Chips/Board 2 boards/2U subrack = 5 Tflops Connected to PCI-X bus
via LVDS 10Gbit/s interface
MDGRAPE-3 system 4,778 dedicated LSI “MDGRAPE-3 chi
p” 300MHz(216Gflops) 3,890 250MHz(180Gflops) 888
Nominal Peak Performance: 1 Petaflops
Total 400 boards with 12(some 11) MDGRAPE-3 chips
Host : Intel Xeon Cluster, 370 cores Dual-core Xeon 5150(Woodcrest 2.66
GHz) 2way server x 85 Nodesprovided by Intel Corporation
Xeon 3.2DGHz 2way server x 15 Nodes
System Integration: Japan SGI Power Consumption : 200kW Size : 22 standard 19inch racks Cost : 8.6 M$ (including Labor)Special-Purpose
Computer(4,778 chips)
MDGRAPE-3boards
HostComputer(200 CPUs, 328 Cores)
PC100
PC81
Infin
iban
dS
witc
h
PC80
PC61
Infin
iban
dS
witc
h
PC60
PC41
Infin
iban
dS
witc
h
PC40
PC21
Infin
iban
dS
witc
h
PC20
PC1
Infin
iban
dS
witc
h
Infin
iban
dS
witc
h
Four IB 4x40Gbps
MDGRAPE-3 system
Gordon Bell Prize 2006 :Gordon Bell Prize 2006 :185 TFLOPS simulations of Amyloi185 TFLOPS simulations of Amyloid-Forming Peptides from Yeast Supd-Forming Peptides from Yeast Sup35 Proteins35 Proteins
Sustained Performance ofParallel System Gordon Bell 2006 Honorable M
ention, Peak Performance Amyloid forming process of Ye
ast Sup 35 peptides Systems with 17 million atoms Cutoff simulations
(Rcut = 45 Å) Nominal peak : 860 Tflops Running speed : 370 Tflops Sustained performance: 185 Tf
lops Efficiency ~ 45 %
Gordon Bell Prize 2007 Finalist
281 TFLOPS for X-ray protein structure analysis GA-DS (Genetic Algorithm- Direct Space)
method Discrete Fourier Transformation by MDGRAPE-3
Peak Performance: 303 TFLOPS Efficiency ~ 93%
What can be done by large-scale molecular dynamics simulations? Predictions of
Protein-Protein interactions Protein-Ligand interactions (screening) Protein-DNA/RNA interactions Effect of SNPs Effect of chemical modifications
Molecular mechanism of proteins in atomic level Protein Folding – dynamics of folding pathways
We provide methods to bridge knowledge on molecular structures and functions
Ligand screening using MD
High-precision estimation of binding free energy of protein-ligand complex with MD
1000 poses, 200 ligands/day Suitable for Lead Optimization
Evaluate affinities before chemical synthesis
Currently we mainly focus on MMPBSA which is suitable for large-scale applications
N. Okimoto et al., in preparation.
Combining Molecular Docking and Molecular Dynamics Simulations
Combination of molecular docking and high-speed MD simulations improve these drawbacks.
Molecular docking techniques Advantages: they explore the vast conformational space of ligand
s in a short time. Drawbacks: predictions of binding free energies are less reliable.
Molecular Dynamics (MD) simulation techniques Advantages: they can treat both ligand and protein in a flexible way, c
onsider the solvation effect, and predict accurate binding free energies. Drawbacks: they are time-consuming and the system are often trappe
d in local minima.
High-performance special purpose computer, MDGRAPE-3 MDGRAPE-3, enable high-speed and high-accuracy MD simulations.
Our Screening Approach
Library of Chemical Compounds
(6,000,000)Enriched Library Target Protein
X-ray, NMR, Modeling
Successfully Docked Compounds
Lead Compound
Lead Optimization
Drug Candidates Experimentaltesting
Drug
Molecular Docking
MD Simulations Using MDGRAPE-3MDGRAPE-3
Drug-like compounds
De Novo Design
Pre-filtering•Lipinski’s rule•Similarity analysis•Pharmacophore analysis
Test of Our Approach
Lead Discovery
Speed of Screening
Multiconformer MD plus MM-PBSA+PCA calculations
1200 poses/128 CPUs/a day ~ 300 ligands/128 CPUs/ a day
Singleconformer MD plus MM-PBSA+PCA calculations
1200 ligands/128 CPUs/a day
AMD simulation 2.5 hours
MM-PBSA 1.25 hours PCA 1.25 hours
Core 1
Core 2
1CPU
2.5 hours/1pose/1 CPUs
Screening of 10,000 compounds•Screening of 10,000 compounds by molecular docking : 1 days•Screening of top 1,000 compounds by MD (5,000 poses): 4-5 days
Screening Performance: Trypsin- Enrichment Curves - (goldscore/ligprep)
0102030405060708090
100
0 10 20 30 40 50 60 70 80 90 100
GOLD(GOLDscore)G01G06G09G12random
% of Top 1,000 compounds
% o
f ac
tive
com
poun
ds
3 times5 times
Other Applications
Protein/Peptide Designs Transcription Factors Growth Factors Fusion Proteins Enzymes
Large-Scale Calculations Virus Capsid Proteins Molecular Motors Ribosome Membranes
Next Generation: MDGRAPE-4
Tile Processor with special-purpose tile 200 General-purpose Processor Tiles
0.5TFlops (DP), 1TFlops (SP) 20 Special-Purpose Tiles for Molecular Dynamics
0.7TFlops Fast on-chip network with 200~300GB/s router
bandwidth 5 Memory Access Tiles
160~320GB/s 45nm <<100W peak power with Duraid Madeina, Univ. Tokyo
R R R R
R R R R
R R R R
R R R R
RR RR
R
R
R
R
R
ProcessorTiles
Memory-AccessTiles
Special-PurposeTiles
R R R RR
R
R
R
R
R
R
SynchronizedGroups
Block Diagram of MDGRAPE-4 Processor
Acknowledgements
Dr. Yousuke OhnoDr. Atsushi SuenagaDr. Noriyuki FutatsugiDr. Gentaro MorimotoMs. Hiroko KondoMs. Ryoko Yanai
Dr. Tetsu Narumi(Keio Univ.)
Mr. Duraid Madeina(U. Tokyo)
Ministry of Education, Culture, Sports, Science & Technology Intel Corporation for early processor support Japan SGI for system integration