Page 1: Performance Tuning of LAMMPS Dissipative Particle Dynamics …itoc.sjtu.edu.cn/wp-content/uploads/2018/05/IPCC_Asia2018_summit… · Performance Tuning of LAMMPS Dissipative Particle

Performance Tuning of LAMMPS Dissipative Particle Dynamics Simulation on Intel MIC

Department of High Performance Computing, CNIC, CAS
Center of Scientific Computing Applications & Research, CAS

Shun Xu, Zhong Jin

2018.5.11

Intel® Parallel Computing Centers (IPCC) Asia Summit 2018

Page 2:

Outline

1. Introduction to Dissipative Particle Dynamics (DPD) simulations

2. Performance tuning of LAMMPS DPD

3. Conclusions

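Page 3:

Introduction to Dissipative Particle Dynamics (DPD)

Particles i and j interact when they lie within the cutoff range $r_c$. The force on particle i is the sum of three pairwise forces: the first is conservative, the second dissipative, and the third random.

$$ m_i \frac{d^2 \mathbf{r}_i}{dt^2} = \mathbf{F}_i = \sum_{j \neq i} \left[ \mathbf{f}^C(\mathbf{r}_{ij}) + \mathbf{f}^D(\mathbf{r}_{ij}, \mathbf{v}_{ij}) + \mathbf{f}^R(\mathbf{r}_{ij}) \right] $$

Relative displacement: $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$

Relative velocity: $\mathbf{v}_{ij} = \mathbf{v}_i - \mathbf{v}_j$

Normalized $\mathbf{r}_{ij}$: $\hat{\mathbf{r}}_{ij} = \mathbf{r}_{ij} / r_{ij}$, with $r_{ij} = |\mathbf{r}_{ij}|$

Strength coefficient: $a_{ij}$

$$ \mathbf{f}^C(\mathbf{r}_{ij}) = a_{ij} W_C(r_{ij}) \, \hat{\mathbf{r}}_{ij} $$

$$ \mathbf{f}^D(\mathbf{r}_{ij}, \mathbf{v}_{ij}) = -\gamma_{ij} W_D(r_{ij}) \, (\hat{\mathbf{r}}_{ij} \cdot \mathbf{v}_{ij}) \, \hat{\mathbf{r}}_{ij} $$

$$ \mathbf{f}^R(\mathbf{r}_{ij}) = \sigma_{ij} W_R(r_{ij}) \, \zeta_{ij} \, \delta t^{-1/2} \, \hat{\mathbf{r}}_{ij} $$

The dissipative and random forces are subject to the following constraints so that the system satisfies the Boltzmann weight distribution:

$$ W_D(r_{ij}) \equiv W_R^2(r_{ij}), \qquad \sigma_{ij}^2 = 2 \gamma_{ij} k_B T, \qquad \zeta_{ij} = \zeta_{ji} \in N(0, 1) $$

The weight function, for any $s \geq 1$:

$$ W_R(r_{ij}) = W_C^s(r_{ij}) = \left( 1 - \frac{r_{ij}}{r_c} \right)^s $$

[Figure labels: Repulsive F; Attractive F]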
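As a minimal sketch of the three force terms above for a single pair, the following C++ function evaluates the conservative, dissipative and random contributions. It assumes the Groot-Warren convention r_ij = r_i - r_j and v_ij = v_i - v_j (so that a positive a_ij repels), s = 1 in the weight function, and a random-force prefactor of δt^(-1/2). The names Vec3, DpdParams and dpd_pair_force are hypothetical and are not part of LAMMPS.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration; not LAMMPS data structures.
struct Vec3 { double x, y, z; };

struct DpdParams {
  double a;      // conservative strength a_ij
  double gamma;  // friction coefficient gamma_ij
  double kBT;    // thermal energy k_B T
  double rc;     // cutoff radius r_c
  double dt;     // timestep delta t
};

// Force on particle i due to particle j.
// Assumed convention: r_ij = r_i - r_j and v_ij = v_i - v_j.
// zeta is one sample from N(0,1) shared by the pair (zeta_ij = zeta_ji).
inline Vec3 dpd_pair_force(const Vec3& ri, const Vec3& rj,
                           const Vec3& vi, const Vec3& vj,
                           const DpdParams& p, double zeta) {
  const double dx = ri.x - rj.x, dy = ri.y - rj.y, dz = ri.z - rj.z;
  const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
  // No force beyond the cutoff; overlapping particles have no defined direction.
  if (r >= p.rc || r == 0.0) return {0.0, 0.0, 0.0};

  const double ex = dx / r, ey = dy / r, ez = dz / r;      // unit vector r_hat_ij
  const double wR = 1.0 - r / p.rc;                        // W_R = W_C for s = 1
  const double wD = wR * wR;                               // W_D = W_R^2
  const double sigma = std::sqrt(2.0 * p.gamma * p.kBT);   // sigma^2 = 2 gamma k_B T

  const double rdotv =
      ex * (vi.x - vj.x) + ey * (vi.y - vj.y) + ez * (vi.z - vj.z);

  const double fC = p.a * wR;                              // conservative
  const double fD = -p.gamma * wD * rdotv;                 // dissipative
  const double fR = sigma * wR * zeta / std::sqrt(p.dt);   // random

  const double f = fC + fD + fR;
  return {f * ex, f * ey, f * ez};
}
```

With zero relative velocity and zeta = 0, only the conservative term remains, which gives a simple sanity check on the sign: particle i is pushed away from particle j.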

Page 4:

A LAMMPS DPD simulation case: two-phase separation

N = 32000
Temp = 0.25
Density = 3
pair_coeff 1 1 25.0 2.5
pair_coeff 1 2 30.0 2.5
pair_coeff 2 2 25.0 2.5

Syntax: pair_coeff i j aij γij [rc]

(Run under LAMMPS without Intel MIC acceleration.)

Page 5:

The scientific significance of accelerated DPD simulations

• To handle the large numbers of particles that soft-matter DPD simulations need and that are usually difficult to compute
• To observe (macroscopic) properties of a DPD system over a long time
• To connect DPD to multiscale simulations more smoothly

A review paper in 2010, "Dissipative Particle Dynamics in Soft Matter and Polymeric Applications - A Review" (E. Moeendarbary, T.Y. Ng, M. Zangeneh, Int. J. Appl. Mech. 2 (2010) 161–190), mentioned that "The DPD is one [of] the most reliable mesoscopic simulation techniques for phenomenological investigation of soft matter and polymeric systems."

[1] P.J. Hoogerbrugge, J.M.V.A. Koelman, Europhys. Lett. 19 (3) (1992) 155–160.
[2] P. Español, P.B. Warren, Europhys. Lett. 30 (4) (1995) 191.
[3] R.D. Groot, P.B. Warren, J. Chem. Phys. 107 (11) (1997) 4423–4435.

Page 6:

Outline

1. Introduction to Dissipative Particle Dynamics (DPD) simulations

2. Performance tuning of LAMMPS DPD

3. Conclusions

Page 7:

Intel Xeon Phi Accelerating MD

• LAMMPS: production-level MD software
• MiniMD: lightweight version for performance testing

SIMD optimized

1. LAMMPS Intel Xeon (Phi) USER-INTEL package (Intel KNC/KNL)
2. NAMD 2015-12-22 Linux-x86_64-multicore-MIC (Intel Xeon Phi coprocessor acceleration)
3. GROMACS 5.0-RC with Intel Xeon Phi coprocessor native/symmetric support (offload mode support planned)

Page 8:

Integrate DPD code into LAMMPS USER-INTEL package

Page 9:

About the USER-INTEL package

USER-INTEL is a LAMMPS plug-in package whose framework code is maintained by W. Michael Brown from Intel and Kurpad Anupama. It is mainly based on the previously developed USER-OMP plug-in package.

Main features:
• support for three kinds of precision: single, double and mixed
• vector optimization of key functions
• support for Intel Xeon Phi KNL and KNC in offload mode

• Suffixes are used in order: intel, then omp
• Offload is turned on by defining the macro variable -DLMP_INTEL_OFFLOAD
• Thread affinity setting is controlled by defining -DINTEL_OFFLOAD_NOAFFINITY

Compile: in the LAMMPS src directory, run make yes-USER-INTEL && make intel_phi

Several core code files:
• fix_intel.cpp/.h: basic functions for MIC interactions
• intel_buffers.cpp/.h: buffer management between the host and the MIC device (modified)
• intel_intrinsics.h: routines for AVX-512 and AVX2
• verlet_intel.cpp/.h: Verlet integration on Intel
• pair_xxx_intel.cpp/.h: xxx potential on MIC (KNC or KNL)

Page 10:

LAMMPS simulation in KNC offload mode

[Diagram labels: CPU; MIC; input.in; output.log; neighbor list; short-range terms; update F, v, x; MPI task rank=0 ... MPI task rank=n]

Each subdomain maps to one MPI task.

Each CPU MPI task can launch several MIC threads to carry out its offloaded calculations.

Page 11:

LAMMPS in MPI-OpenMP vs. MIC-offload mode

1. Normal: MPI + host OpenMP threads
2. Advanced: MPI + host and MIC offload OpenMP threads

OpenMP threads are partitioned between the CPU and MIC devices.

[Diagram labels: MPI task rank=0 ... MPI task rank=n; Thread 0 ... Thread m]

Page 12:

Initial setting in the PairDPDIntel class

    void PairDPDIntel::init_style()
    {
      //…
      int ifix = modify->find_fix("package_intel");
      // Pack the force constants into buffers that match the precision mode
      if (fix->precision() == FixIntel::PREC_MODE_MIXED)
        pack_force_const(force_const_single, fix->get_mixed_buffers());
      else if (fix->precision() == FixIntel::PREC_MODE_DOUBLE)
        pack_force_const(force_const_double, fix->get_double_buffers());
      else
        pack_force_const(force_const_single, fix->get_single_buffers());
    }

At the beginning of the calculation, PairDPDIntel calls init_style() to obtain a buffer variable of type IntelBuffers<flt_t, acc_t>. The buffer is created in the precision selected for the run.

Page 13:

PairDPDIntel::compute<flt_t, acc_t> function

In each branch, the first eval call (offload flag 1) is for the MIC offload and the second (offload flag 0) is for the host CPU.

    if (eflag) {
      if (force->newton_pair) {
        eval<1, 1, 1>(1, ovflag, buffers, fc, 0, offload_end);      // MIC offload
        eval<1, 1, 1>(0, ovflag, buffers, fc, host_start, inum);    // host CPU
      } else {
        eval<1, 1, 0>(1, ovflag, buffers, fc, 0, offload_end);      // MIC offload
        eval<1, 1, 0>(0, ovflag, buffers, fc, host_start, inum);    // host CPU
      }
    } else {
      if (force->newton_pair) {
        eval<1, 0, 1>(1, ovflag, buffers, fc, 0, offload_end);      // MIC offload
        eval<1, 0, 1>(0, ovflag, buffers, fc, host_start, inum);    // host CPU
      } else {
        eval<1, 0, 0>(1, ovflag, buffers, fc, 0, offload_end);      // MIC offload
        eval<1, 0, 0>(0, ovflag, buffers, fc, host_start, inum);    // host CPU
      }
    }

Load balance setting: the MIC handles list indices from 0 to offload_end, and the host handles indices from host_start to inum.
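The paired eval calls above split one index range between the coprocessor and the host. A minimal sketch of such a split follows, assuming that a single balance fraction decides how many of the inum entries are offloaded. RangeSplit and split_range are hypothetical names for illustration; the USER-INTEL package computes its own offload_end and host_start internally, and this is not that code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: the MIC handles [0, offload_end),
// the host handles [host_start, inum).
struct RangeSplit {
  int offload_end;
  int host_start;
};

// Split inum list entries so that roughly `balance` (0..1) of them go to
// the coprocessor. Assumption: the two ranges are disjoint and contiguous.
inline RangeSplit split_range(int inum, double balance) {
  if (balance < 0.0) balance = 0.0;  // clamp: nothing offloaded
  if (balance > 1.0) balance = 1.0;  // clamp: everything offloaded
  const int offload_end = static_cast<int>(std::floor(balance * inum));
  return {offload_end, offload_end};
}
```

For example, split_range(1000, 0.75) assigns indices 0 to 749 to the coprocessor and 750 to 999 to the host. Tuning the fraction until both sides finish at about the same time is the point of a load balance setting.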

Page 14:

Vectorization calculation

• Vector calculation on both sides:
1. MIC thread: vector width of 512 bits
2. Host CPU: AVX vector width of 512 bits (AVX-512)

• Data alignment optimization:
1. Memory for variables is aligned to 64 bytes (512 bits)
2. The atom data structure combines 4 floating-point numbers, so that its size divides the 512-bit vector width evenly
3. Advanced SIMD directives are used, such as #pragma simd reduction(...)
4. Mixed precision (both single and double floats) is used as a trade-off between speed and accuracy
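The alignment and mixed-precision points above can be illustrated with a small C++ sketch. It assumes a 4-float atom record (x, y, z plus one spare slot), so that four records fill one 64-byte vector, and it does the arithmetic in float while accumulating in double. AtomF and sum_squared_distance are hypothetical names and are not the package's data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical atom record: x, y, z plus one spare slot (e.g. a type tag),
// giving 4 floats = 16 bytes, so four records fill one 64-byte vector.
struct alignas(16) AtomF {
  float x, y, z, w;
};
static_assert(sizeof(AtomF) == 16, "record must be 4 floats = 16 bytes");
static_assert(64 % sizeof(AtomF) == 0, "record size must divide 64 bytes");

// Mixed precision: per-element arithmetic in float, accumulation in double.
inline double sum_squared_distance(const AtomF* atoms, int n, const AtomF& ref) {
  double acc = 0.0;  // double accumulator limits round-off growth
  for (int i = 0; i < n; ++i) {
    const float dx = atoms[i].x - ref.x;
    const float dy = atoms[i].y - ref.y;
    const float dz = atoms[i].z - ref.z;
    acc += static_cast<double>(dx * dx + dy * dy + dz * dz);
  }
  return acc;
}
```

An array of such records can be declared with alignas(64) so that its first element starts on a vector boundary, which lets a compiler use aligned loads.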

Page 15:

PairDPDIntel::eval()

Highlight: random number usage in the DPD potential calculation
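One common way to keep random numbers out of the way of a vectorized force loop is to generate them in a batch beforehand and let the loop read them by index. The sketch below shows that idea with the C++ standard library as a stand-in generator; it is an assumption about the general pattern, not a copy of PairDPDIntel::eval(). The name fill_gaussian is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fill buf with n samples from N(0, 1), one per neighbor pair to be
// evaluated. std::mt19937_64 is a stand-in generator chosen for this sketch.
inline void fill_gaussian(std::vector<double>& buf, std::size_t n,
                          std::mt19937_64& rng) {
  std::normal_distribution<double> gauss(0.0, 1.0);
  buf.resize(n);
  for (std::size_t k = 0; k < n; ++k) buf[k] = gauss(rng);
}
```

In a threaded run, each thread would need its own generator and seed so that the streams do not collide; how seeds are assigned is left open here.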

Page 16:

PairDPDIntel::eval()

Highlight: SIMD reduction in the DPD potential calculation
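A SIMD reduction lets the compiler vectorize a loop that accumulates into a scalar, such as the total force on one atom from its neighbors. The sketch below uses the OpenMP form of the directive (#pragma omp simd reduction) as a portable analogue of the Intel-specific pragma named in the slides, and it sums only a conservative-type term with a linear weight. The function name sum_force_x and its arguments are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Sum the x-component of a * W(r) * dx / r over n neighbors, where
// W(r) = 1 - r/rc inside the cutoff and 0 outside.
// Assumption: every r[jj] is strictly positive.
inline double sum_force_x(const double* dx, const double* r, int n,
                          double a, double rc) {
  double fx = 0.0;
  // Each SIMD lane keeps a private partial sum; they are combined at the end.
  #pragma omp simd reduction(+:fx)
  for (int jj = 0; jj < n; ++jj) {
    const double w = (r[jj] < rc) ? 1.0 - r[jj] / rc : 0.0;
    fx += a * w * dx[jj] / r[jj];
  }
  return fx;
}
```

If the compiler is not given an OpenMP SIMD option, the pragma is ignored and the loop still produces the same sum, only without the vectorization hint.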

Page 17:

Integration of source codes for KNC and KNL

pair_dpd_offload_intel.cpp and pair_dpd_offload_intel.h are integrated into pair_dpd_intel.cpp and pair_dpd_intel.h.

• Use the macro variable LMP_INTEL_OFFLOAD to separate the KNC and KNL/CPU codes
• Use thread parallelism

Page 18:

Intel KNL for test: Intel® Xeon Phi™ Processor 7210

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 4
Core(s) per socket: 64
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 87
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping: 1
CPU MHz: 1182.289
BogoMIPS: 2599.92
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
NUMA node0 CPU(s): 0-255

All 16 GB of MCDRAM is used as cache memory.

Page 19:

CC = mpiicpc

KNL:
OPTFLAGS = -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
CCFLAGS = -qopenmp -qno-offload -fno-alias -ansi-alias -restrict \
          -DLMP_INTEL_USELRT $(OPTFLAGS)

KNL_AVX512:
OPTFLAGS = -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
CCFLAGS = -qopenmp -qno-offload -fno-alias -ansi-alias -restrict \
          -DLMP_INTEL_USELRT $(OPTFLAGS)

KNL_AVX512_MKL:
OPTFLAGS = -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
CCFLAGS = -qopenmp -qno-offload -fno-alias -ansi-alias -restrict \
          -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG $(OPTFLAGS)

Run for 1, 2, 4 threads per core:

export OMP_NUM_THREADS=$threads
mpirun -np 64 lmp_intel_knl -in in.intel.dpd -log dpd.64c4t.log \
       -pk intel 0 -sf intel -screen none -v d 1

Page 20:

[Bar chart: LAMMPS DPD on Intel KNL, 512000 atoms * 4000 steps. Vertical axis: timesteps/sec. (0 to 90). Horizontal axis: N threads/core (1, 2, 4) with 64 cores. Series: KNL, KNL_AVX512, KNL_AVX512_MKL. Annotated relative speeds: 1X, 1.61X, 2.89X.]

Page 21:

Intel® Trace Analyzer for MPI behavior

[Screenshots: KNL and KNL_AVX512_MKL]

Page 22:

Outline

1. Introduction to Dissipative Particle Dynamics (DPD) simulations

2. Performance tuning of LAMMPS DPD

3. Conclusions

Page 23:

Conclusions

• The optimization of LAMMPS DPD for the Intel platform is highlighted.
• The aim is to promote the applications of LAMMPS DPD.

Page 24:

Acknowledgements

• The LAMMPS DPD module inside the USER-INTEL package was initially developed within the CAS-IPCC project.

• Thanks to W. Michael Brown from Intel for his support.

Page 25:

Thank you for your attention!