Computational Physics
Shuai Dong
High-performance computing
• PC, cluster, supercomputer
• Parallelism by OpenMP
• Parallelism by MPI
• GPU programming
• Numerical libraries
PC: Personal Computer
• Usually 1 CPU (central processing unit) per computer
• Even so, a single CPU is already very powerful.
x86 (x86-64)-compatible microprocessors: the most widely used CPUs in PCs.
• Intel: Pentium; Core i3, i5, i7
• AMD: Athlon, Phenom; A4, A6, A10, FX
Workstation: more powerful
• Tower workstation
• 4U workstation
• 2U workstation
• 1U workstation
U = rack unit
1U rack
• Usually more than 1 CPU per node.
• Intel: Xeon
• AMD: Opteron, EPYC
Non-x86/x86-64-compatible CPUs
• Intel: Itanium, IA-64 (Intel Architecture 64)
• IBM: POWER (Performance Optimization With Enhanced RISC)
• 56th Research Institute of the PLA General Staff: Shenwei (Sunway), RISC architecture
• ICT, CAS: Loongson, MIPS architecture
• Nvidia: Tegra, ARM architecture
Cluster
• More than 1 computer (node)
• Connected by a network
• Working together
More powerful than a PC; cheaper than a supercomputer.
Can be small: blades
Transformers
• The philosophy of the PC cluster: like the Constructicons (挖地虎) combining into Devastator (大力神), many ordinary machines join to form one powerful one.
Supercomputer
• More CPUs
• Faster interconnect
• Very expensive
Parallelism
Big problem? Needs long CPU time? The solutions:
• Simplify the problem!
• Use a faster CPU!
• Use more than 1 CPU! → Parallelism
Proverb: Many hands make light work!
Code example
for(int i=0;i<1000000;i++)
{
    a[i]=i;
}
If you only have one CPU, the process is:
1. a[0]=0;
2. a[1]=1;
3. a[2]=2;
........
If you have two CPUs, you can divide the task between them:
1. a[0]=0;  a[1]=1;
2. a[2]=2;  a[3]=3;
3. a[4]=4;  a[5]=5;
........
   (even)   (odd)
The run time is halved!
OpenMP
• Loops parallelized automatically by the compiler (via directives)
• One process, several threads
Example: 9.1.openmp.cpp

#include <iostream>
#include <unistd.h>
using namespace std;

int main()
{
    const int n=8;
    #pragma omp parallel for
    for(int i=0;i<n;i++){
        cout<<i;
        sleep(1);
    }
    cout<<endl;
    return 0;
}
g++ 9.1.openmp.cpp
time ./a.out
g++ -fopenmp 9.1.openmp.cpp
time ./a.out

export OMP_NUM_THREADS=2
9.2.openmp2.cpp
MPI: Message Passing Interface
• MPI is a language-independent communications protocol used to program parallel computers
Several software implementations:
• Open MPI
• MPICH/MPICH2
• LAM/MPI
• Intel MPI
• Microsoft MPI
Code example: 9.3.mpi.cpp

#include <iostream>
#include <mpi.h>
using namespace std;

int main(int argc, char *argv[])
{
    MPI_Init(&argc,&argv);
    int mpi_procs,mpi_rank;
    MPI_Comm_size(MPI_COMM_WORLD,&mpi_procs);
    MPI_Comm_rank(MPI_COMM_WORLD,&mpi_rank);
    cout<<"It is the "<<mpi_rank<<" of total "<<mpi_procs<<endl;
    MPI_Finalize();
    return 0;
}
Compile and run an MPI program
• Install one MPI implementation, e.g. OpenMPI
• mpic++ 9.3.mpi.cpp
• mpirun -np 10 ./a.out
MPI for molecular dynamics
• If you have N degrees of freedom in a molecular dynamics simulation, you can use m processors to share the task.
• e.g. 100 atoms in a 3D box give 300 degrees of freedom. We can use 20 CPUs together, each dealing with 5 atoms.
Darts method powered by MPI
Area of the quarter circle: πr²/4 = π/4; area of the unit square: r² = 1.

Darts in circle / total darts ≈ π/4
MPI Darts
Perhaps the earliest "MPI darts": the repeating crossbow attributed to Zhuge Liang.
The modern MPI darts: the Katyusha rocket launcher, Soviet Union.
MPI Darts
• 9.4.mpidarts.cpp
• Another method:• 9.5.mpidarts2.cpp
The working mechanism of MPI
• Make n copies of the program.
• Each copy holds one process.
• All copies (processes) run in parallel.
• Processes can communicate with each other.
To use GPU
• GPU: Graphic Processing Unit
Very powerful for molecular dynamics simulations
GPU Programming
• The use of graphics processing units for rendering is well known, but their power for general parallel computation has only recently been explored.
• Parallel algorithms running on GPUs can often achieve up to 100x speedup over similar CPU algorithms, with many existing applications in physics simulations, signal processing, financial modeling, neural networks, and countless other fields.
Power of GPU
GPU computing language
• Open general-purpose GPU computing language: OpenCL (Open Computing Language)
• Proprietary framework: Nvidia's CUDA since 2006.
• CUDA (Compute Unified Device Architecture)
• Learn by yourself
Numerical libraries
• fftw: Fastest Fourier Transform in the West
• blas: Basic Linear Algebra Subprograms
• lapack: Linear Algebra Package
• mkl: (Intel) Math Kernel Library
• acml: AMD Core Math Library
• .......
Some basic knowledge
• source file: main.cc, func.cc
• object file: main.o, func.o (func.obj)
• dynamic library: libfunc.so (libfunc.dll)
• static library: libfunc.a (libfunc.lib)
• executable program: a.out (a.exe)
• compile & link
Code example: 9.6.lib.cpp

double dabs(double d)
{
    if(d<0) d=-d;
    return d;
}

g++ -c 9.6.lib.cpp
g++ -shared -fPIC 9.6.lib.o -o libabs.so
ar crv libabs.a 9.6.lib.o
Code example: 9.6.link.cpp

#include <iostream>
using namespace std;

double dabs(double d);

int main()
{
    double a=0;
    cout<<"Please input the number:\t";
    cin>>a;
    cout<<"The absolute value is:\t"<<dabs(a)<<endl;
    return 0;
}

g++ 9.6.link.cpp
g++ 9.6.link.cpp -L. -labs
g++ 9.6.link.cpp libabs.a
ldd a.out
Environment variable: LD_LIBRARY_PATH
(e.g. export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH, so the loader can find libabs.so at run time)
Using lapack
• 9.7.Diagonalization.cpp
Call a Fortran function from C/C++ code:

extern "C"{
    void dsyev_(char *jobz, char *uplo, int *n, double *a, int *lda,
                double *w, double *work, int *lwork, int *info);
}

g++ 9.7.Diagonalization.cpp -llapack
Environment variable: LD_LIBRARY_PATH
Tips
• Find the usage of lapack functions in the website: http://www.netlib.org/lapack/ or the manual of MKL.
• For a symmetric/Hermitian matrix, only the upper/lower triangular elements are used. But take care of the difference in storage order between C/C++ and Fortran, especially for Hermitian matrices.
Read the manual
• http://www.netlib.org/lapack/explore-html/dd/d4c/dsyev_8f.html
• http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/dsyev.htm
Another example
• To use the FFTW library to perform FFTs.
• http://www.fftw.org/
9.8.FFTW.cpp
Format of a paper
• Title
• Author
• Affiliation
• Date
• Abstract
• Keywords
• Main body (introduction, method/algorithm, results and discussion, summary, acknowledgment; including figures and tables)
• References
• Supplementary: your_code