research on high performance computing systems...performance of npb bt class b omni openmp:...
TRANSCRIPT
Center for Computational Sciences, University of Tsukuba
http://www.ccs.tsukuba.ac.jp/
OmniRPC: A Grid RPC System for Parallel ComputingOmniRPC is a Grid RPC system for parallel programming and is designed as Thread-saf e RPC in order to allow client programs to use in OpenMP.Support of master-workers programming model for parametric search grid applications. Integrating computing resources on various execution platform:
XtremWeb, Condor, CyberGRIP, Grid Engine.Clusters with private IP address and NATs.
Applications: CONFLEX-G: Grid-enabled Molecular Conformational Space Sarch Program HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics. Master-Worker parallel program for large eigen value program (Prof. Sakurai, U. Tsukuba) PAML-G: Phylogenetic Analysis by Maximum Likelihood on the grid. Used as a backend for YML (Prof. Petiton, LIFL, France)Webpage: http://www.omni.hpcc.jp/OmniRPC/
0 2 4 6 8 100
500
1000
1500
2000
2500
3000
3500
MPI version SCASH-MPI(prefetch)SCASH-MPI
Per
form
ance
[Mop
/s]
Number of Nodes
Performance of NPB BT Class B
Omni OpenMP: Cluster-enabled OpenMP by SCASH-MPI
FFTE: A High-Performance FFT Library
Performance of Parallel 3D FFTs
321684210
1000
2000
3000
4000
5000
6000FFTW 2.1.5 (dual)FFTW 2.1.5 (single)FFTE 3.2 (dual)FFTE 3.2 (single)
Per
form
ance
[MF
LOP
S]
Number of CPUs
Client PC
PC Cluster
OmniRPCAgent
OmniRPCAgent
Volunteer PCsOmniRPCAgent
OmniRPCAgent
CyberGRIPGRM
CyberGRIPSRMD
XtremWebDispatcher
OmniRPCClient Program
OmniRPCStub Program
C ondor Pool
The Omni OpenMP compiler enables an OpenMP program to run on a PC-cluster with MPI enabled.
The compiler can generate parallel codes for a software DSM system
SCASH-MPI is a software distributed shared memory system (DSM) using MPI as a communication layer of SCASH DSM.
The original SCASH is a DSM in the SCore cluster software.A thread is created to handle remote memory access request by MPI.By using MPI as a communication layer, one can exploit a high-performance network of clusters with high portability.History-based prefetching for optimizing communication.
PerformanceBenchmark : NPB2.3 BT (parallelized with OpenMP)Environment : Dual Opteron 1.8GHz * 8nodes + Gigabit Ethernet
FFTE is a Fortran subroutine library for computing the Fast Fourier Transform (FFT) in one or more dimensions.The API of FFTE is similar to sequential SGI SCSL or Intel MKL routines.Features:
High speed. (Supports SSE2/SSE3)Complex and mixed-radix transforms.Paralle l transforms : Shared / Distribute d memory parallel computers (OpenMP, MPI and OpenMP+MPI)High portability: Fortran77 + OpenMP + MPIFFTE’s 1-D parallel FFT routine has been incorporated into the HPC Challenge (HPCC) benchmark.
Performance Data: N1xN2xN3 = 2^24xP Machines: Dual Xeon 2.8GHz + Myrinet2000
Research on High Performance Computing Systems