Molecular Dynamics Simulations of Proteins with Petascale Special-Purpose Computer MDGRAPE-3

Makoto Taiji

Deputy Project Director
Computational & Experimental Systems Biology Group
Genomic Sciences Center, RIKEN

What is GRAPE?

GRAvity PipE

Special-purpose accelerator for classical particle simulations
• Astrophysical N-body simulations
• Molecular Dynamics Simulations

MDGRAPE-3 : Petaflops GRAPE for Molecular Dynamics simulations

J. Makino & M. Taiji, Scientific Simulations with Special-Purpose Computers, John Wiley & Sons, 1997.

MDGRAPE-3 (aka Protein Explorer)

Petaflops special-purpose computer for molecular dynamics simulations

Started in April 2002, finished in June 2006

Part of Protein 3000 project – a project to determine 3,000 protein structures

M. Taiji et al., Proc. Supercomputing 2003, on CDROM.
M. Taiji, Proc. Hot Chips 16, on CDROM (2004).

EGFR TT RNA Polymerase

Molecular Dynamics Simulations


Folding of Chignolin, a 10-residue β-hairpin design peptide (by Dr. A. Suenaga)

[Figure: force terms in MD: bonding, Coulomb (+/-), and van der Waals (VdW)]

Force calculation dominates computational time

Requires large computational power

How GRAPE works

Accelerator to calculate forces

[Figure: the host computer sends particle data to GRAPE; GRAPE returns the results]

Most of the calculation → GRAPE
Others → Host computer

• Communication = O(N) << Calculation = O(N^2)
• Easy to build, easy to use
• Cost effective
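The scaling argument above can be illustrated with a minimal sketch. It assumes a plain Coulomb interaction in arbitrary units with a small softening term `eps`; the names `pairwise_forces` and `cost_model` are illustrative and are not part of any GRAPE software. The host moves a number of words proportional to N, while the accelerator evaluates N(N-1) pair interactions.

```python
import numpy as np

def pairwise_forces(pos, charge, eps=1e-9):
    """Direct-sum Coulomb forces, F_i = sum_j q_i q_j (r_i - r_j) / |r_i - r_j|^3.

    Arbitrary units; eps is a softening term assumed here for numerical safety.
    """
    rij = pos[:, None, :] - pos[None, :, :]        # (N, N, 3) displacements r_i - r_j
    r2 = np.sum(rij * rij, axis=-1) + eps          # softened squared distances
    np.fill_diagonal(r2, np.inf)                   # exclude self-interaction
    inv_r3 = r2 ** -1.5
    qq = charge[:, None] * charge[None, :]
    return np.sum((qq * inv_r3)[:, :, None] * rij, axis=1)

def cost_model(n):
    """Words moved between host and accelerator vs. pair interactions evaluated."""
    words_transferred = 4 * n + 3 * n   # send (x, y, z, q), receive (fx, fy, fz)
    pair_interactions = n * (n - 1)
    return words_transferred, pair_interactions

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        words, pairs = cost_model(n)
        print(n, words, pairs, pairs / words)
```

The ratio of pair interactions to transferred words grows linearly with N, which is why a comparatively slow link between host and accelerator is acceptable for large systems.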

History of GRAPE computers

[Figure: peak performance (Gflops, log scale from 10^-3 to 10^6, spanning Megaflops to Petaflops) versus year ('91 to '09). Plotted machines: GRAPE generations labelled 1, 1A, 2, 2A, 3, 4, 5, GRAPE-6, GRAPE-DR, MD-GRAPE, MDM, and MDGRAPE-3, compared with Earth Simulator, BlueGene/L, ORNL, and Roadrunner.]

Eight Gordon Bell Prizes: '95, '96, '99, '00 (double), '01, '03, '06

Why do we build special-purpose computers?

Bottlenecks of high-performance computing:
• Parallelization limit / memory bandwidth
• Power consumption = heat dissipation

These problems will become more serious in the future.

Special-purpose approach:
• Can solve the parallelization limit for some applications
• Relaxes power consumption
• ~100 times better cost-performance

Broadcast Parallelization

Molecular dynamics case: two-body forces

For parallel calculation of Fi, the same particle data can be broadcast to every pipeline (Pipeline 1, Pipeline 2, ..., Pipeline i).

• Broadcast parallelization relaxes the bandwidth problem
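The broadcast scheme can be sketched in software as follows. This is only an emulation under stated assumptions: each "pipeline" is modelled as one row of an array holding its own i-particle, the interaction is a softened Coulomb force in arbitrary units, and the name `broadcast_forces` is hypothetical. The point is that one memory read per step feeds every pipeline at once.

```python
import numpy as np

def broadcast_forces(pos_i, q_i, pos_j, q_j, eps=1e-9):
    """Emulate P pipelines; pipeline p holds i-particle p and accumulates F_p.

    At each step a single j-particle is read once and broadcast to all pipelines.
    """
    n_pipes = len(pos_i)
    acc = np.zeros((n_pipes, 3))
    memory_reads = 0
    for xj, qj in zip(pos_j, q_j):
        memory_reads += 1                    # one read from particle memory...
        d = pos_i - xj                       # ...used by every pipeline in parallel
        r2 = np.sum(d * d, axis=1) + eps     # softened squared distance per pipeline
        acc += (q_i * qj / r2 ** 1.5)[:, None] * d
    return acc, memory_reads

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    forces, reads = broadcast_forces(rng.random((20, 3)), np.ones(20),
                                     rng.random((1000, 3)) + 2.0, np.ones(1000))
    print(forces.shape, reads)  # 20 pipelines served by 1000 reads, not 20000
```

With P pipelines the number of memory reads stays at the number of j-particles while the number of interactions computed is P times larger, so the required memory bandwidth does not grow with the degree of parallelism.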

Highly-Parallel Operations in Molecular Dynamics Processors

For special-purpose computers:
• Broadcast memory architecture
• Efficient: 720 operations/cycle/chip in the MDGRAPE-3 chip
• Possible to increase according to Moore's law

In the case of MD:

Chip        Process   Pipelines   Performance
MDGRAPE     600 nm    1           1 Gflops
MDGRAPE-2   250 nm    4           16 Gflops
MDGRAPE-3   130 nm    20          180 Gflops

Power Efficiency of Special-Purpose Computers

If we compare at the same technology:
• Pentium 4 (0.13 µm, 3 GHz, FSB800): 14 W/Gflops
• MDGRAPE-3 chip (0.13 µm): 0.1 W/Gflops

Why?
• Highly parallel at low frequency: MDGRAPE-3 runs at 250 MHz with 720 equivalent operations per cycle; for example, the single-precision multiplier has 3 pipeline stages
• Tuned accuracy: most calculations are done in single precision
• Slow I/O: 84-bit-wide input and output ports at 125 MHz (GTL)

Force Pipeline

Calculates two-body central forces

• 8 multipliers, 9 adders, and 1 function evaluator = 33 equivalent operations for Coulomb force calculation (A. H. Karp, Scientific Programming, 1, pp. 133–141 (1992))
• Function evaluator: approximates arbitrary functions by segmented fourth-order polynomials
• Multipliers: floating-point, single precision
• Adders: floating-point, single precision / fixed-point, 40 or 80 bit
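The idea behind the function evaluator can be illustrated with a minimal sketch of segmented fourth-order polynomial approximation. The slides do not describe the chip's actual segmentation or coefficient format, so uniform segments, a least-squares fit, and the names `build_table` and `evaluate` are all assumptions made for illustration.

```python
import numpy as np

def build_table(func, x_min, x_max, n_segments, degree=4, samples=32):
    """Fit one polynomial of the given degree per uniform segment (assumed scheme)."""
    edges = np.linspace(x_min, x_max, n_segments + 1)
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        x = np.linspace(lo, hi, samples)
        t = (x - lo) / (hi - lo)              # local coordinate in [0, 1]
        coeffs.append(np.polyfit(t, func(x), degree))
    return edges, np.array(coeffs)            # coefficients, highest power first

def evaluate(edges, coeffs, x):
    """Select the segment for each x, then evaluate its polynomial by Horner's rule."""
    x = np.asarray(x, dtype=float)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(coeffs) - 1)
    t = (x - edges[idx]) / (edges[idx + 1] - edges[idx])
    c = coeffs[idx]
    result = np.zeros_like(x)
    for k in range(c.shape[-1]):
        result = result * t + c[..., k]
    return result

if __name__ == "__main__":
    kernel = lambda r2: r2 ** -1.5            # Coulomb-like kernel as a function of r^2
    edges, coeffs = build_table(kernel, 1.0, 4.0, n_segments=16)
    x = np.linspace(1.0, 4.0, 1001)
    rel_err = np.max(np.abs(evaluate(edges, coeffs, x) / kernel(x) - 1.0))
    print(f"max relative error: {rel_err:.2e}")
```

Because the table only stores coefficients, swapping in a different kernel (for example a van der Waals term) changes the table contents but not the evaluation logic, which is the appeal of this approach for a fixed hardware pipeline.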

Block Diagram of MDGRAPE-3 chip

Memory-in-a-chip Architecture

Memory for 32,768 particles

The same data is broadcast to each pipeline

MDGRAPE-3 chip

• 216 GFLOPS at 300 MHz, 180 GFLOPS at 250 MHz
• 17 W at 300 MHz
• Hitachi HDL4N, 130 nm
• Vcore = +1.2 V
• 15.7 mm × 15.7 mm
• 6.1 M random gates + 9 Mbit memory
• 1444-pin FCBGA
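The chip figures quoted in these slides are mutually consistent, as a short arithmetic check shows. The sketch uses only numbers from the slides (720 equivalent operations per cycle, clocks of 250 and 300 MHz, 17 W at 300 MHz); the function name `peak_gflops` is illustrative.

```python
OPS_PER_CYCLE = 720  # equivalent operations per cycle per chip (from the slides)

def peak_gflops(clock_mhz, ops_per_cycle=OPS_PER_CYCLE):
    """Peak performance in Gflops = operations per cycle x clock frequency."""
    return ops_per_cycle * clock_mhz / 1000.0

print(peak_gflops(250))                  # 180.0
print(peak_gflops(300))                  # 216.0
print(round(17 / peak_gflops(300), 3))   # 0.079 W/Gflops, i.e. roughly 0.1
```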

MDGRAPE-3 Board

• 12 chips/board
• 2 boards/2U subrack = 5 Tflops
• Connected to PCI-X bus via LVDS 10 Gbit/s interface

MDGRAPE-3 system

• 4,778 dedicated LSI “MDGRAPE-3 chips”
  300 MHz (216 Gflops): 3,890
  250 MHz (180 Gflops): 888
• Nominal peak performance: 1 Petaflops
• Total 400 boards with 12 (some with 11) MDGRAPE-3 chips
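The nominal peak follows directly from the chip counts above. The sketch below uses only the quoted figures; the number of 11-chip boards is derived from them rather than stated on the slide, and the helper names are illustrative.

```python
# Clock (MHz) -> (chip count, per-chip peak in Gflops), as quoted on the slide.
CHIPS = {300: (3890, 216.0), 250: (888, 180.0)}

def system_totals(chips=CHIPS):
    """Return total chip count and nominal peak performance in Gflops."""
    n_chips = sum(count for count, _ in chips.values())
    peak = sum(count * gflops for count, gflops in chips.values())
    return n_chips, peak

def boards_with_11_chips(n_chips, n_boards=400, full=12):
    """Each board holds 12 or 11 chips, so every missing chip marks one 11-chip board."""
    return n_boards * full - n_chips

n_chips, peak = system_totals()
print(n_chips)                         # 4778
print(peak / 1e6)                      # 1.00008 (Pflops)
print(boards_with_11_chips(n_chips))   # 22
```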

• Host: Intel Xeon cluster, 370 cores
  Dual-core Xeon 5150 (Woodcrest, 2.66 GHz) 2-way server × 85 nodes, provided by Intel Corporation
  Xeon 3.2DGHz 2-way server × 15 nodes

• System integration: Japan SGI
• Power consumption: 200 kW
• Size: 22 standard 19-inch racks
• Cost: 8.6 M$ (including labor)

[Figure: MDGRAPE-3 system. The special-purpose computer (4,778 chips on MDGRAPE-3 boards) is attached to the host computer (200 CPUs, 328 cores). Host nodes PC1 to PC100 are grouped in sets of 20 (PC1-PC20, PC21-PC40, PC41-PC60, PC61-PC80, PC81-PC100), each set on an InfiniBand switch, with a further InfiniBand switch joining the groups. Label: Four IB 4x 40Gbps.]

Gordon Bell Prize 2006: 185 TFLOPS simulations of Amyloid-Forming Peptides from Yeast Sup35 Proteins

Sustained Performance of Parallel System

• Gordon Bell 2006 Honorable Mention, Peak Performance
• Amyloid-forming process of Yeast Sup35 peptides
• Systems with 17 million atoms
• Cutoff simulations (Rcut = 45 Å)
• Nominal peak: 860 Tflops
• Running speed: 370 Tflops
• Sustained performance: 185 Tflops
• Efficiency ~ 45%

Gordon Bell Prize 2007 Finalist

• 281 TFLOPS for X-ray protein structure analysis
• GA-DS (Genetic Algorithm - Direct Space) method
• Discrete Fourier transformation by MDGRAPE-3
• Peak performance: 303 TFLOPS
• Efficiency ~ 93%

What can be done by large-scale molecular dynamics simulations?

Predictions of:
• Protein-protein interactions
• Protein-ligand interactions (screening)
• Protein-DNA/RNA interactions
• Effect of SNPs
• Effect of chemical modifications

Molecular mechanisms of proteins at the atomic level

Protein folding: dynamics of folding pathways

We provide methods to bridge knowledge of molecular structures and functions.

Ligand screening using MD

High-precision estimation of binding free energy of protein-ligand complex with MD

• 1,000 poses, 200 ligands/day
• Suitable for lead optimization
• Evaluate affinities before chemical synthesis

Currently we mainly focus on MM-PBSA, which is suitable for large-scale applications.

N. Okimoto et al., in preparation.

Combining Molecular Docking and Molecular Dynamics Simulations

Molecular docking techniques
• Advantages: they explore the vast conformational space of ligands in a short time.
• Drawbacks: predictions of binding free energies are less reliable.

Molecular dynamics (MD) simulation techniques
• Advantages: they can treat both ligand and protein flexibly, consider the solvation effect, and predict accurate binding free energies.
• Drawbacks: they are time-consuming, and the system is often trapped in local minima.

Combining molecular docking with high-speed MD simulations mitigates these drawbacks.

The high-performance special-purpose computer MDGRAPE-3 enables high-speed and high-accuracy MD simulations.

Our Screening Approach

[Figure: screening workflow. Library of chemical compounds (6,000,000) → pre-filtering (Lipinski's rule, similarity analysis, pharmacophore analysis) → enriched library of drug-like compounds → molecular docking against the target protein (X-ray, NMR, modeling) → successfully docked compounds → MD simulations using MDGRAPE-3 → lead compound → lead optimization (de novo design) → drug candidates → experimental testing → drug. Stage annotations: Lead Discovery, Test of Our Approach.]

Speed of Screening

Multi-conformer MD plus MM-PBSA + PCA calculations:
1,200 poses per day on 128 CPUs, i.e. ~300 ligands per day on 128 CPUs

Single-conformer MD plus MM-PBSA + PCA calculations:
1,200 ligands per day on 128 CPUs

[Figure: time per pose on one dual-core CPU (Core 1, Core 2). MD simulation: 2.5 hours; MM-PBSA: 1.25 hours; PCA: 1.25 hours. Throughput: 2.5 hours per pose per CPU.]

Screening of 10,000 compounds
• Screening of 10,000 compounds by molecular docking: 1 day
• Screening of top 1,000 compounds by MD (5,000 poses): 4-5 days
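The throughput figures on this slide can be reproduced from the quoted 2.5 hours per pose per CPU. The sketch assumes perfect scaling across 128 CPUs with no scheduling overhead, which the slides do not state; the function names are illustrative.

```python
def poses_per_day(n_cpus, hours_per_pose=2.5):
    """Poses processed per day, assuming ideal scaling over n_cpus."""
    return n_cpus * 24.0 / hours_per_pose

def days_for(n_poses, n_cpus, hours_per_pose=2.5):
    """Wall-clock days needed to process n_poses on n_cpus."""
    return n_poses / poses_per_day(n_cpus, hours_per_pose)

print(poses_per_day(128))              # 1228.8, quoted as ~1,200 poses/day
print(round(days_for(5000, 128), 2))   # 4.07, in line with the quoted 4-5 days
```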

Screening Performance: Trypsin - Enrichment Curves (goldscore/ligprep)

[Figure: % of active compounds (0 to 100) versus % of top 1,000 compounds (0 to 100). Curves: GOLD (GOLDscore), G01, G06, G09, G12, random. Annotations: 3 times, 5 times.]

Other Applications

Protein/Peptide Designs
• Transcription factors
• Growth factors
• Fusion proteins
• Enzymes

Large-Scale Calculations
• Virus capsid proteins
• Molecular motors
• Ribosome
• Membranes

Next Generation: MDGRAPE-4

Tile processor with special-purpose tiles
• 200 general-purpose processor tiles: 0.5 TFlops (DP), 1 TFlops (SP)
• 20 special-purpose tiles for molecular dynamics: 0.7 TFlops
• Fast on-chip network with 200~300 GB/s router bandwidth
• 5 memory access tiles: 160~320 GB/s
• 45 nm
• <<100 W peak power

with Duraid Madeina, Univ. Tokyo

[Figure: a grid of routers (R) connecting processor tiles, memory-access tiles, and special-purpose tiles, with synchronized groups marked.]

Block Diagram of MDGRAPE-4 Processor

Acknowledgements

Dr. Yousuke Ohno
Dr. Atsushi Suenaga
Dr. Noriyuki Futatsugi
Dr. Gentaro Morimoto
Ms. Hiroko Kondo
Ms. Ryoko Yanai

Dr. Tetsu Narumi (Keio Univ.)

Mr. Duraid Madeina (U. Tokyo)

Ministry of Education, Culture, Sports, Science & Technology
Intel Corporation for early processor support
Japan SGI for system integration
