FCCM 2004

Reconfigurable Molecular Dynamics Simulator

Navid Azizi, Ian Kuon, Aaron Egier, Ahmad Darabiha and Paul Chow

University of Toronto

2

Why is Molecular Dynamics interesting?

Simulates interaction of atoms over time

Many possible applications

– Biomolecules

Computationally intensive to handle >1000 atoms

Large computer clusters used in the past

Can a reconfigurable simulation system do this better?

3

What is Molecular Dynamics?

Simulate using classical Newtonian mechanics: F = m a

Integrate acceleration to get position and velocity changes

Use a very small timestep ~ 1 femtosecond

4

Molecular Dynamics Background

Simulation Procedure per timestep

– Sum force over all interacting atoms

– Calculate acceleration

– Integrate acceleration to update atom position and velocity

– Repeat for all atoms
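The per-timestep procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `timestep`, the caller-supplied `pair_force` callback and the simple explicit update are hypothetical and are not taken from the slides, which use a Velocity Verlet update in hardware.

```python
def timestep(pos, vel, mass, pair_force, dt):
    """Advance every atom by one timestep (illustrative explicit update).

    pos, vel: lists of [x, y, z]; mass: list of floats.
    pair_force(d) returns the force on atom i given d = pos[i] - pos[j].
    """
    n = len(pos)
    force = [[0.0, 0.0, 0.0] for _ in range(n)]

    # Sum the force over all interacting pairs (the O(n^2) part).
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            f = pair_force(d)
            for k in range(3):
                force[i][k] += f[k]
                force[j][k] -= f[k]  # equal and opposite force on atom j

    new_pos, new_vel = [], []
    for i in range(n):
        a = [force[i][k] / mass[i] for k in range(3)]        # a = F / m
        v = [vel[i][k] + a[k] * dt for k in range(3)]        # integrate acceleration
        p = [pos[i][k] + v[k] * dt for k in range(3)]        # integrate velocity
        new_vel.append(v)
        new_pos.append(p)
    return new_pos, new_vel
```

The double loop over pairs is the part that dominates run time as the atom count grows.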

5

Background – Forces

Two types of forces

–Bonded – O(n): no hardware acceleration required

–Non-bonded – O(n²): needs hardware acceleration

6

Background – Force Calculation

[Figure: Lennard-Jones potential plotted against distance]

Lennard-Jones (LJ) potential models the interaction between atoms

Force on an atom is the negative gradient of the potential

$$\phi_{LJ}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$
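As a small worked example of the Lennard-Jones relation, the sketch below evaluates the potential 4ε[(σ/r)^12 − (σ/r)^6] and the radial force obtained as its negative derivative. The function names and the floating-point arithmetic are assumptions made for illustration; the hardware described in these slides uses integer lookup tables instead.

```python
def lj_potential(r, epsilon, sigma):
    """Lennard-Jones potential: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon, sigma):
    """Radial force F(r) = -d(phi)/dr; a positive value is repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
```

The potential crosses zero at r = σ and the force vanishes at r = 2^(1/6) σ, where the potential reaches its minimum of −ε.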

7

Background – Simulating Large Volumes

Any interesting volume has far too many atoms to simulate

Solution – Periodic Boundary Conditions

[Figure: the box being simulated, containing atoms A and B, surrounded by eight replicated boxes, each containing the images A' and B']

8

Background – Time Integration

Position and Velocity updated through time integration

–Velocity Verlet Update rule used

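$$v(t) = v\left(t - \frac{\delta t}{2}\right) + \frac{\delta t}{2}\,a(t)$$

$$r(t + \delta t) = r(t) + \delta t \times v(t) + \frac{\delta t^{2}}{2}\,a(t)$$

$$v\left(t + \frac{\delta t}{2}\right) = v(t) + \frac{\delta t}{2}\,a(t)$$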
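A minimal sketch of one Velocity Verlet step for a single coordinate, assuming the half-step form in which the stored velocity is v(t − δt/2). The function name `verlet_step` and the use of floating point are illustrative assumptions, not the hardware implementation.

```python
def verlet_step(r, v_half_prev, a, dt):
    """One Velocity Verlet step for a single coordinate.

    r:           position r(t)
    v_half_prev: velocity at the previous half step, v(t - dt/2)
    a:           acceleration a(t)
    Returns (r(t + dt), v(t), v(t + dt/2)).
    """
    v = v_half_prev + 0.5 * dt * a            # v(t) = v(t - dt/2) + (dt/2) a(t)
    r_next = r + dt * v + 0.5 * dt * dt * a   # r(t + dt) = r(t) + dt v(t) + (dt^2/2) a(t)
    v_half_next = v + 0.5 * dt * a            # v(t + dt/2) = v(t) + (dt/2) a(t)
    return r_next, v, v_half_next
```

With constant acceleration, the position update reproduces r = ½at² when starting from rest, which makes a convenient sanity check.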

9

Architectural Design – System Overview

Composed of 6 blocks

–Pair Generator (PG)

–Lennard-Jones Force Computer (LJFC)

–Acceleration Update (AU)

–Verlet Update (VU)

–System Controller (SC)

–Memory Controllers

For each timestep

–PG generates a pair of particles

–LJFC computes force between particles

–AU accumulates the forces

–VU updates position and velocity

[Figure: block diagram of Pair Gen, Force Computer, Acceleration Update and Verlet Update, with System Control linked to the Sun Workstation, and memory controllers for the main memory, the function value memory and the slope memory]

10

Architectural Design – System Controller

Coordinates the activities of the different blocks during the simulation

Determines the end of the timestep and signals start of new timestep

–Signals memory controllers to switch position banks to eliminate read after write hazards (more on this later)

Serves as an Interface between the MD hardware system and the software modules on the Sun Workstation

–The Sun Workstation may signal the FPGA that it wants to read the memory contents

–After the completion of a complete timestep, the SC pauses the simulation

11

Architectural Design – Pair Generator

Calculates distance between all atom pairs

Must consider Periodic Boundary Conditions

[Figure: Pair Gen between Particle Memory and Force Computer, with a 4-bit example for two atoms at rx = 0001 and rx = 1111]

0001 − 1111 = 0010

1111 − 0001 = 1110

Smallest distance = 0010
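The 4-bit example suggests that a fixed-width subtraction wraps around the periodic box on its own. The sketch below illustrates that idea under an assumption the slides do not state explicitly: that the box length equals 2**bits coordinate units. The name `wrapped_delta` is hypothetical.

```python
def wrapped_delta(a, b, bits):
    """Signed shortest displacement a - b on a ring of 2**bits positions."""
    mask = (1 << bits) - 1
    d = (a - b) & mask              # subtract modulo 2**bits, like fixed-width hardware
    if d & (1 << (bits - 1)):       # top bit set: read the result as a negative number
        d -= 1 << bits
    return d


# The slide's example: atoms at 0001 and 1111 are 2 units apart across the boundary.
print(wrapped_delta(0b0001, 0b1111, 4))  # 2
print(wrapped_delta(0b1111, 0b0001, 4))  # -2
```

Taking the magnitude of either result gives the smallest distance, 0010, without an explicit comparison against the replicated boxes.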

12

Architectural Design – LJ Force Calculator

[Figure: Force Computer between Pair Gen and Acceleration Update, reading the function value memory and the slope memory through their memory controllers]

Uses r², obtained from PG, to look up the value of the force and its slope (derivative)

Performs an interpolation to obtain a more accurate force magnitude

Uses component distances, rx, ry, rz, and force magnitude to calculate acceleration vector

Sends the acceleration vector to the AU

13

Architectural Design – LJ Lookup

Use part of r² as an index into the lookup table

Look up table has a 19-bit address, but r² is larger than 19 bits

–To choose which bits of r² to use, must look at the characteristic of the LJ function

Characteristic of LJ Function

–Changes rapidly near distances of 0

–At large distances the change in force is very little

[Figure: pseudo-acceleration plotted against squared separation, from 0 to 2E-18]

14

Architectural Design – LJ Lookup

Use lower-order bits to interpolate

Use the higher-order bits as a cutoff

–Since at very large distances, force is nearly 0

Use middle-order bits to lookup force value and slope

[Figure: r² divided into high bits (cutoff), middle bits (address to lookup tables) and low bits (residual for interpolation)]
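The division of r² into three bit fields can be sketched as follows. Only the 19-bit address width comes from the slides; the width of the low field (`LOW_BITS = 8`) and the name `split_r2` are assumptions chosen purely for illustration.

```python
ADDR_BITS = 19   # from the slides: the lookup table has a 19-bit address
LOW_BITS = 8     # assumed width of the interpolation residual (not given in the slides)


def split_r2(r2):
    """Split an integer r^2 into (residual, table address, beyond_cutoff)."""
    residual = r2 & ((1 << LOW_BITS) - 1)                  # low bits: interpolation residual
    address = (r2 >> LOW_BITS) & ((1 << ADDR_BITS) - 1)    # middle bits: table address
    high = r2 >> (LOW_BITS + ADDR_BITS)                    # high bits: cutoff test
    return residual, address, high != 0
```

Any pair whose high bits are non-zero lies beyond the cutoff, where the force is treated as negligible.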

15

Architectural Design – LJ Force Calculator

Multiply Slope by lower-order bits

Add to function value

Multiply by component distances to obtain pseudo-acceleration

–Look up table values contain multiplication by time to simplify downstream calculations

[Figure: three-stage LJFC pipeline (Input 1, Stage 1, Stage 2, Stage 3, Output, Input 2). Signals shown: PG_RAD_X, PG_RAD_Y, PG_RAD_Z, PG_RADIUS_MAG_SQ, LJMC_RADIUS_MAG_SQ_SLOPE, LJMC_RADIUS_MAG_SQ_VALUE, LJMC_SLOPE, LJMC_VALUE, AU_ACC_X, AU_ACC_Y, AU_ACC_Z]
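The lookup-and-interpolate steps listed on this slide can be sketched as below. The table construction, the function names and the floating-point values are all illustrative assumptions; the real design stores integer values in SRAM and folds the timestep into the table contents.

```python
def build_tables(f, n_entries, step):
    """Tabulate f and its forward-difference slope at multiples of `step`."""
    values = [f(i * step) for i in range(n_entries)]
    slopes = [(f((i + 1) * step) - f(i * step)) / step for i in range(n_entries)]
    return values, slopes


def interpolate(values, slopes, index, residual):
    """Function value plus slope times the residual (the low-order part)."""
    return values[index] + slopes[index] * residual


def pseudo_acceleration(values, slopes, index, residual, rx, ry, rz):
    """Scale the interpolated magnitude by each component distance."""
    mag = interpolate(values, slopes, index, residual)
    return (mag * rx, mag * ry, mag * rz)
```

For a function that is linear between table entries the interpolation is exact; for the Lennard-Jones curve the error depends on how finely the table is spaced where the curve changes rapidly.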

16

Architectural Design – Acceleration Update

[Figure: Acceleration Update block between the Force Computer and Verlet Update, connected to the Memory Controller]

Receives incremental pseudo-accelerations from LJFC and sums them to obtain the total pseudo-acceleration

When pseudo-acceleration is complete, send data to

–Memory Controller to store in Memory

–Verlet Update to calculate new velocity, position

17

Architectural Design – Velocity Verlet Block

Not performance critical

No pipelining in current implementation

[Figure: Verlet Update datapath producing p(t+dt) and v(t+½dt) from p(t), v(t-½dt) and a(t), using dt and dt²]

18

Architectural Design – Memory Controllers

Memory controllers are interfaces between off-chip SRAMs and the rest of the MD system.

Memory access is a bottleneck in the current system.

Two types of memory controllers:

– Memory Control (MC) to access position, velocity and acceleration arrays in the primary SRAM.

– Lennard-Jones Memory Control (LJMC) to access lookup tables containing value and slope of Lennard-Jones function.

19

Memory Controllers - MC

MC receives requests from PG, VU and AU blocks

– SRAMs on TM3 are single-ported, 64-bit wide

– A priority scheme arbitrates between simultaneous memory access requests

Each system-level Read/Write through MC consists of three 64-bit Read/Writes for X, Y and Z directions in series.

– 7-clock cycle system-level read

– 6-clock cycle system-level write

Block Name Read/Write Target Array

PG R Position

VU R & W Position & Velocity

AU W Acceleration

20

Memory Controllers - LJMC

Two instantiations of LJMC used for Lennard-Jones Force Calculation:

– To access Lennard-Jones function value lookup table

– To access Lennard-Jones slope lookup table

Only one source of request for memory access, no arbitration is required.

Only read requests

Each system-level read from LJMC is one single read and takes 4 clock cycles.

21

Precision and Scaling Factors

Architecture uses integer operations to reduce complexity

Precision: number of bits used to represent a value

Scaling Factor: the weight of the least significant bit of the value

[Figure: a value's bit field annotated with its Precision and Scaling Factor]

22

Precision and Scaling Factors – cont.

To choose appropriate scaling values & precision

–Calculations made with particles at varying distances

–Found absolute minimum and absolute maximum for different values along our datapath

–Scaling Factor was found by taking log2 of the minimum value

–Precision was found by taking log2 of the difference between minimum and maximum

–Extra bits were placed to the upper and lower portions of most values

Quantity        Scaling Factor    Precision (bits)
Position        2^-64             38
Velocity        2^-15             51
Acceleration    2^-64             37
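One way to read the procedure above is sketched here: the scaling factor is taken as the power of two at or below the smallest magnitude, and the precision as the number of bits needed to reach the largest magnitude from there. This reading, the function names, and the omission of sign and guard bits are all assumptions; the slides do not give the exact rule.

```python
import math


def choose_format(min_abs, max_abs):
    """Return (scale_exp, precision) so that the LSB weighs 2**scale_exp.

    Assumes 0 < min_abs <= max_abs; sign and guard bits are not included.
    """
    scale_exp = math.floor(math.log2(min_abs))
    precision = math.floor(math.log2(max_abs)) - scale_exp + 1
    return scale_exp, precision


def quantize(x, scale_exp):
    """Convert a real value to its integer representation."""
    return round(x / 2.0 ** scale_exp)


def dequantize(q, scale_exp):
    """Convert an integer representation back to a real value."""
    return q * 2.0 ** scale_exp
```

For example, values between 0.25 and 7.0 would get an LSB weight of 2^-2 and 5 bits, since 7.0 / 0.25 = 28 fits below 2^5.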

23

Simulation Environment is Configurable

Simulation reconfigurability

– Change precision, scaling factors, number of atoms, forces

– No wasted hardware

– No time overhead when precision is reduced

Entire process is automated

– One input file controls entire process

– Hardware – C program creates appropriate VHDL

– Software interface, Software initialization

– Always match the hardware

24

Implementation

Used the Transmogrifier 3 (TM3)

–4 interconnected Virtex-E 2000s

–2MB memories connected to each Virtex-E

–Slow by today’s standards

25

Verification

Tested accuracy of implementation

–Compared TM3 results with software

[Figure: total, kinetic and potential energy (kJ/mol) versus timestep, 0 to 700, for the TM3 and for software]

26

System Performance

For an 8192-atom MD system running on the TM3

–Frequency: 26 MHz

–Timestep Duration: 37 sec

For an 8192-atom software system running on a 2.4 GHz Pentium 4

–Timestep Duration: 10.8 sec

MD system is 3.4X slower than software

27

How to improve this?

Memory

New FPGA

Parallelism

28

Memory Requirements of MD System

Acceleration Array (8192 atom system uses 0.17 MB)

Velocity Array (8192 atom system uses 0.17 MB)

Position Array (8192 atom system uses 0.34 MB)

Lookup Tables: 2 MB

29

Improving Performance – Memory Organization

On TM3 there is only one external SRAM per FPGA

Single SRAM for all atom information causes large slowdowns

– Handshaking

– Serial reads for x, y and z

– Hardware issues

Better memory system

2.1 seconds/timestep (5X faster than software)

30

Improving Performance – Memory Organization

On TM3 there is only one external SRAM per FPGA

– 1 SRAM used to store positions, velocities, accelerations

– 1 SRAM used to store function lookup table

– 1 SRAM used to store slope lookup table

The SRAM used to store positions, velocities, accelerations caused large slowdowns

– Factor of 3 slowdown due to handshaking that is needed when arbitrating accesses to a single memory array, which semantically are different arrays

– Factor of 3 slowdown because the width of the data bus was too small, so the x, y, and z components of position/velocity/acceleration were read serially

– Factor of 2 slowdown due to wait states required by the memory system (a TM3 issue) between each access

Having a better memory system allows for a speedup of 18 (3 × 3 × 2)

2.1 second timestep (5X faster than software)

Not increasing the amount of memory

31

Improving Performance – Clock Speed

Run on modern FPGA

Not all possible clock speed improvements have been explored

Expect a factor of 4 increase, to about 100 MHz

Better memory system + Faster Clock Speed

0.51 seconds/timestep (21X faster than software)
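The projected figures follow from simple arithmetic on the numbers quoted in these slides, which the sketch below reproduces. The constant and function names are hypothetical, and the calculation assumes the individual speedup factors multiply cleanly, as the slides do.

```python
TM3_MEASURED_S = 37.0        # measured TM3 timestep for 8192 atoms (from the slides)
SOFTWARE_S = 10.8            # 2.4 GHz Pentium 4 timestep (from the slides)
MEMORY_FACTORS = (3, 3, 2)   # handshaking, serial x/y/z reads, wait states
CLOCK_FACTOR = 4             # 26 MHz to roughly 100 MHz


def projected_timestep(pipelines=1, faster_clock=True):
    """Projected seconds per timestep after the proposed improvements."""
    speedup = 1
    for factor in MEMORY_FACTORS:
        speedup *= factor                     # 3 * 3 * 2 = 18
    if faster_clock:
        speedup *= CLOCK_FACTOR
    return TM3_MEASURED_S / (speedup * pipelines)


def speedup_over_software(timestep_s):
    """How many times faster than the software baseline a timestep is."""
    return SOFTWARE_S / timestep_s
```

With the memory fix alone the projection is about 2.1 s per timestep; adding the faster clock gives about 0.51 s, or roughly 21 times faster than software, and each extra pipeline divides the time again.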

32

Improving Performance – Parallel Architecture

Better memory system + Faster Clock Speed + Parallelize

0.51/n seconds/timestep (21n X faster than software)

[Figure: Pair Gen feeding n parallel pipelines, LJFC(0) to LJFC(n-1), each with its own Value Mem and Slope Mem, followed by AU(0) to AU(n-1) and VU(0) to VU(n-1), with arrays labelled PAi, PAj, AA and VA]

33

Improving Performance – Parallel Architecture

Multiple MD engines are placed in parallel

Logic Requirements

–LJFC, AU, VU do not need to be changed

–For each pipeline, one LJFC, AU, VU fit on one FPGA

–PG is more sophisticated, since it must send different pairs of particles to different pipelines

Memory Requirements

–3 additional 512K x 32 memory chips are needed for each pipeline

–No increase in memory used to store position, velocity or acceleration as they are distributed throughout the system

[Figure: Pair Gen feeding pipelines LJFC(0), LJFC(1), ..., LJFC(k), ..., LJFC(n-1), each with its own Value Mem and Slope Mem, followed by the matching AU and VU blocks, with arrays labelled PAi, PAj, AA and VA]

Better memory system + Faster Clock Speed + Parallelize

0.51/n second timestep (21n X faster than software)

34

Cost, Power Benefits

Performance

– MD system can deliver a 21X performance benefit over software

– Assume a conservative 10X performance advantage

Cost

– Microprocessor motherboard-sized board can fit 4 FPGAs

– 4 FPGAs (each $200) + board + SRAM + misc. ~ $1500

– Microprocessor Motherboard + CPU + DRAM ~ $1500

35

Comparison of MD Simulator and Supercomputers

Per Board            1 Pentium + 1 GB DRAM    4 FPGAs + 24 MB SRAM
Performance          1X                       40X
Cost                 ~$1500                   ~$1500
Power                106 W                    40 W

Metric               Improvement of MD System
Performance/Power    100X
Performance/Cost     40X
Performance/Space    40X
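The improvement metrics can be checked directly from the per-board figures. The helper name `improvement` is hypothetical; the inputs are the numbers in the table, with the 40X performance figure coming from 4 FPGAs at an assumed 10X each.

```python
def improvement(perf_ratio, resource_md, resource_pc):
    """Performance-per-resource gain of the MD board over the PC board."""
    return perf_ratio * resource_pc / resource_md


perf_per_power = improvement(40, 40.0, 106.0)     # power in watts
perf_per_cost = improvement(40, 1500.0, 1500.0)   # cost in dollars
print(perf_per_power, perf_per_cost)
```

The performance/power figure works out to 106, which the slide rounds to 100X; equal cost and equal board size leave the other two ratios at 40X.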

36

Conclusions

Easily reconfigurable MD System designed

Molecular dynamics simulation can be done on FPGAs

Simple enhancements will improve speed

– Power, Cost and Space savings over software

37

Future Work

Improve accuracy

Target newer FPGA platform

Support new forces

Acknowledgements

Funding for the TM3 Project was provided by Micronet and Xilinx