Computational Physics
Shuai Dong
High-performance computing
• PC, cluster, supercomputer
• Parallelism by OpenMP
• Parallelism by MPI
• GPU programming
• Numerical libraries
PC: Personal Computer
• Usually 1 CPU (central processing unit) per computer
• Even so, a single CPU is already very powerful.
x86 (x86-64)-compatible microprocessors: the most widely used CPUs in PCs.
• Intel: Pentium; Core i3, i5, i7
• AMD: Athlon, Phenom; A4, A6, A10, FX
Workstation: more powerful
• Tower workstation
• 4U workstation
• 2U workstation
• 1U workstation
U = rack unit
1U rack
• Usually more than 1 CPU per node.
• Intel: Xeon
• AMD: Opteron, EPYC
Non-x86/x86-64-compatible CPUs
• Intel: Itanium, IA-64 (Intel Architecture 64)
• IBM: POWER (Performance Optimization With Enhanced RISC)
• 56th Research Institute of the PLA General Staff: Shenwei (Sunway), RISC architecture
• ICT, CAS: Loongson, MIPS architecture
• Nvidia: Tegra, ARM architecture
Cluster
• More than 1 computer (node)
• Connected by a network
• Working together
More powerful than a PC; cheaper than a supercomputer.
Can be small: blades
Transformers
• The philosophy of the PC cluster: like the Constructicons (挖地虎) combining into Devastator (大力神), many ordinary machines join to form one powerful one.
Supercomputer
• More CPUs
• Faster interconnect
• Very expensive
Parallelism
Big problem? Needs long CPU time? The solutions:
• Simplify the problem!
• Use a faster CPU!
• Use more than 1 CPU! → Parallelism
Proverb: Many hands make light work!
Code example
for(int i=0;i<1000000;i++)
{
    a[i]=i;
}
If you only have one CPU, the process is:
1. a[0]=0;
2. a[1]=1;
3. a[2]=2;
........
If you have two CPUs, you can divide the task between them:
1. a[0]=0;  a[1]=1;
2. a[2]=2;  a[3]=3;
3. a[4]=4;  a[5]=5;
........
   (even)   (odd)
The run time is halved!
OpenMP
• Loops parallelized automatically by the compiler (via directives)
• One process, several threads
Example: 9.1.openmp.cpp

#include <iostream>
#include <unistd.h>
using namespace std;

int main()
{
    const int n=8;
    #pragma omp parallel for
    for(int i=0;i<n;i++){
        cout<<i;
        sleep(1);
    }
    cout<<endl;
    return 0;
}
g++ 9.1.openmp.cpp
time ./a.out
g++ -fopenmp 9.1.openmp.cpp
time ./a.out

export OMP_NUM_THREADS=2
9.2.openmp2.cpp
MPI: Message Passing Interface
• MPI is a language-independent communications protocol used to program parallel computers
Several software implementations:
• Open MPI
• MPICH/MPICH2
• LAM/MPI
• Intel MPI
• Microsoft MPI
Code example: 9.3.mpi.cpp

#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char *argv[])
{
    MPI_Init(&argc,&argv);
    int mpi_procs,mpi_rank;
    MPI_Comm_size(MPI_COMM_WORLD,&mpi_procs);
    MPI_Comm_rank(MPI_COMM_WORLD,&mpi_rank);
    cout<<"It is the "<<mpi_rank<<" of total "<<mpi_procs<<endl;
    MPI_Finalize();
    return 0;
}
Compile and run an MPI program
• Install one MPI implementation, e.g. OpenMPI
• mpic++ 9.3.mpi.cpp
• mpirun -np 10 ./a.out
MPI for molecular dynamics
• If you have N degrees of freedom in a molecular dynamics simulation, you can use m processors to share the task.
• e.g. 100 atoms in a 3D box give 300 degrees of freedom. We can use 20 CPUs together, each dealing with 5 atoms.
Darts method powered by MPI
Area of the quarter circle: πr²/4 = π/4; area of the unit square: r² = 1.

Darts in circle / total darts ≈ π/4
MPI Darts
Perhaps the earliest "MPI darts": the repeating crossbow attributed to Zhuge Liang.
The modern MPI darts: the Katyusha rocket launcher, Soviet Union.
MPI Darts
• 9.4.mpidarts.cpp
• Another method:• 9.5.mpidarts2.cpp
The working mechanism of MPI
• Make n copies of the program.
• Each copy holds one process.
• All copies (processes) run in parallel.
• Processes can communicate with each other.
To use GPU
• GPU: Graphic Processing Unit
Very powerful for molecular dynamics simulations
GPU Programming
• The use of graphics processing units for rendering is well known, but their power for general parallel computation has only recently been explored.
• Parallel algorithms running on GPUs can often achieve up to 100x speedup over similar CPU algorithms, with many existing applications in physics simulations, signal processing, financial modeling, neural networks, and countless other fields.
Power of GPU
GPU computing language
• Open general-purpose GPU computing language: OpenCL (Open Computing Language)
• Proprietary framework: Nvidia's CUDA since 2006.
• CUDA (Compute Unified Device Architecture)
• Learn by yourself
Numerical libraries
• fftw: Fastest Fourier Transform in the West
• blas: Basic Linear Algebra Subprograms
• lapack: Linear Algebra Package
• mkl: (Intel) Math Kernel Library
• acml: AMD Core Math Library
• .......
Some basic knowledge
• source file: main.cc, func.cc
• object file: main.o, func.o (func.obj)
• dynamic library: libfunc.so (libfunc.dll)
• static library: libfunc.a (libfunc.lib)
• executable program: a.out (a.exe)
• compile & link
Code example: 9.6.lib.cpp

double dabs(double d)
{
    if(d<0) d=-d;
    return d;
}

g++ -c 9.6.lib.cpp
g++ -shared -fPIC 9.6.lib.o -o libabs.so
ar crv libabs.a 9.6.lib.o
Code example: 9.6.link.cpp

#include <iostream>
using namespace std;

double dabs(double d);

int main()
{
    double a=0;
    cout<<"Please input the number:\t";
    cin>>a;
    cout<<"The absolute value is:\t"<<dabs(a)<<endl;
    return 0;
}

g++ 9.6.link.cpp
g++ 9.6.link.cpp -L. -labs
g++ 9.6.link.cpp libabs.a
ldd a.out
Environment variable: LD_LIBRARY_PATH
(e.g. export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH, so the loader can find libabs.so at run time)
Using lapack
• 9.7.Diagonalization.cpp
Call a Fortran function from C/C++ code:

extern "C"{
    void dsyev_(char *jobz, char *uplo, int *n, double *a, int *lda,
                double *w, double *work, int *lwork, int *info);
}

g++ 9.7.Diagonalization.cpp -llapack
Environment variable: LD_LIBRARY_PATH
Tips
• Find the usage of lapack functions in the website: http://www.netlib.org/lapack/ or the manual of MKL.
• For a symmetric/Hermitian matrix, only the upper/lower triangular elements are used. But take care of the difference in storage order between C/C++ and Fortran, especially for Hermitian matrices.
Read the manual
• http://www.netlib.org/lapack/explore-html/dd/d4c/dsyev_8f.html
• http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/dsyev.htm
Another example
• To use the FFTW library to perform FFTs.
• http://www.fftw.org/
9.8.FFTW.cpp
Format of a paper
• Title
• Author
• Affiliation
• Date
• Abstract
• Keywords
• Main body (introduction, method/algorithm, results and discussion, summary, acknowledgment; including figures and tables)
• References
• Supplementary: your_code