research on high performance computing systems...performance of npb bt class b omni openmp:...

Center for Computational Sciences, University of Tsukuba http://www.ccs.tsukuba.ac.jp/ OmniRPC: A Grid RPC System for Parallel Computing OmniRPC is a Grid RPC system for parallel programming and is designed as Thread-saf e RPC in order to allow client programs to use in OpenMP . Support of master-workers programming model for parametric search grid applications. Integrating computing resources on various execution platform: XtremWeb, Condor, CyberGRIP, Grid Engine. Clusters with private IP address and NATs. Applications: CONFLEX-G: Grid-enabled Molecular Conformational Space Sarch Program HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics. Master-Worker parallel program for large eigen value program (Prof. Sakurai, U. Tsukuba) PAML-G: Phylogenetic Analysis by Maximum Likelihood on the grid. Used as a backend for YML (Prof. Petiton, LIFL, France) Webpage: http://www.omni.hpcc.jp/OmniRPC/ 0 2 4 6 8 10 0 500 1000 1500 2000 2500 3000 3500 MPI version SCASH-MPI(prefetch) SCASH-MPI Performance [Mop/s] Number of Nodes Performance of NPB BT Class B Omni OpenMP: Cluster-enabled OpenMP by SCASH-MPI FFTE: A High-Performance FFT Library Performance of Parallel 3D FFTs 32 16 8 4 2 1 0 1000 2000 3000 4000 5000 6000 FFTW 2.1.5 (dual) FFTW 2.1.5 (single) FFTE 3.2 (dual) FFTE 3.2 (single) Performance [MFLOPS] Number of CPUs Client PC PC Cluster OmniRPC Agent OmniRPC Agent Volunteer PCs OmniRPC Agent OmniRPC Agent CyberGRIP GRM CyberGRIP SRMD XtremWeb Dispatcher OmniRPC Client Program OmniRPC Stub Program C ondorP ool The Omni OpenMP compiler enables an OpenMP program to run on a PC-cluster with MPI enabled. The compiler can generate parallel codes for a software DSM system SCASH-MPI is a software distributed shared memory system (DSM) using MPI as a communication layer of SCASH DSM. The original SCASH is a DSM in the SCore cluster software. A thread is created to handle remote memory access request by MPI. By using MPI as a communication layer, one can exploit a high-performance network of clusters with high portability. History-based prefetching for optimizing communication. Performance Benchmark : NPB2.3 BT (parallelized with OpenMP) Environment : Dual Opteron 1.8GHz * 8nodes + Gigabit Ethernet FFTE is a Fortran subroutine library for computing the Fast Fourier Transform (FFT) in one or more dimensions. The API of FFTE is similar to sequential SGI SCSL or Intel MKL routines. Features: High speed. (Supports SSE2/SSE3) Complex and mixed-radix transforms. Paralle l transforms : Shared / Distribute d memory parallel computers (OpenMP, MPI and OpenMP+MPI) High portability: Fortran77 + OpenMP + MPI FFTE’s 1-D parallel FFT routine has been incorporated into the HPC Challenge (HPCC) benchmark. Performance Data: N1xN2xN3 = 2^24xP Machines: Dual Xeon 2.8GHz + Myrinet2000 Research on High Performance Computing Systems

Upload: others

Post on 23-May-2020

2 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Research on High Performance Computing Systems...Performance of NPB BT Class B Omni OpenMP: Cluster-enabled OpenMP by SCASH-MPI FFTE: A High-Performance FFT Library Performance of

Center for Computational Sciences, University of Tsukuba

http://www.ccs.tsukuba.ac.jp/

OmniRPC: A Grid RPC System for Parallel ComputingOmniRPC is a Grid RPC system for parallel programming and is designed as Thread-saf e RPC in order to allow client programs to use in OpenMP.Support of master-workers programming model for parametric search grid applications. Integrating computing resources on various execution platform:

XtremWeb, Condor, CyberGRIP, Grid Engine.Clusters with private IP address and NATs.

Applications: CONFLEX-G: Grid-enabled Molecular Conformational Space Sarch Program HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics. Master-Worker parallel program for large eigen value program (Prof. Sakurai, U. Tsukuba) PAML-G: Phylogenetic Analysis by Maximum Likelihood on the grid. Used as a backend for YML (Prof. Petiton, LIFL, France)Webpage: http://www.omni.hpcc.jp/OmniRPC/

0 2 4 6 8 100

500

1000

1500

2000

2500

3000

3500

MPI version SCASH-MPI(prefetch)SCASH-MPI

Per

form

ance

[Mop

/s]

Number of Nodes

Performance of NPB BT Class B

Omni OpenMP: Cluster-enabled OpenMP by SCASH-MPI

FFTE: A High-Performance FFT Library

Performance of Parallel 3D FFTs

321684210

1000

2000

3000

4000

5000

6000FFTW 2.1.5 (dual)FFTW 2.1.5 (single)FFTE 3.2 (dual)FFTE 3.2 (single)

Per

form

ance

[MF

LOP

Number of CPUs

Client PC

PC Cluster

OmniRPCAgent

Volunteer PCsOmniRPCAgent

OmniRPCAgent

CyberGRIPGRM

CyberGRIPSRMD

XtremWebDispatcher

OmniRPCClient Program

OmniRPCStub Program

C ondor Pool

The Omni OpenMP compiler enables an OpenMP program to run on a PC-cluster with MPI enabled.

The compiler can generate parallel codes for a software DSM system

SCASH-MPI is a software distributed shared memory system (DSM) using MPI as a communication layer of SCASH DSM.

The original SCASH is a DSM in the SCore cluster software.A thread is created to handle remote memory access request by MPI.By using MPI as a communication layer, one can exploit a high-performance network of clusters with high portability.History-based prefetching for optimizing communication.

PerformanceBenchmark : NPB2.3 BT (parallelized with OpenMP)Environment : Dual Opteron 1.8GHz * 8nodes + Gigabit Ethernet

FFTE is a Fortran subroutine library for computing the Fast Fourier Transform (FFT) in one or more dimensions.The API of FFTE is similar to sequential SGI SCSL or Intel MKL routines.Features:

High speed. (Supports SSE2/SSE3)Complex and mixed-radix transforms.Paralle l transforms : Shared / Distribute d memory parallel computers (OpenMP, MPI and OpenMP+MPI)High portability: Fortran77 + OpenMP + MPIFFTE’s 1-D parallel FFT routine has been incorporated into the HPC Challenge (HPCC) benchmark.

Performance Data: N1xN2xN3 = 2^24xP Machines: Dual Xeon 2.8GHz + Myrinet2000

Research on High Performance Computing Systems

OpenMP Offload Evaluating Support for Features · OpenMP 1.0 OpenMP 2.0 OpenMP 2.0 OpenMP 2.5 OpenMP 3.0 OpenMP 3.1 OpenMP 4.0 OpenMP 4.5 OpenMP 5.0 OpenACC 1.0 OpenACC 2.5 OpenACC

Performance and Energy Analysis of OpenMP Runtime Systems

Getting Performance from OpenMP Programs on NUMA …

Improving OpenMP Performance Johnnie W. Baker March 2, 2011

OpenMP China MCP 1. Agenda Motivation: The Need The OpenMP Solution OpenMP Features OpenMP Implementation Getting Started with OpenMP on KeyStone

How to Get Good Performance Agenda by Using OpenMPHow to Get Good Performance by Using OpenMP!!" Agenda! • !Loop optimizations! • !Measuring OpenMP performance! • !Best practices!

Performance Characteristics of Hybrid MPI/OpenMP ...faculty.cs.tamu.edu/wuxf/papers/IJNDC2013.pdf · Performance Characteristics of Hybrid MPI/OpenMP Scientific Applications on a

Hybrid MPI/OpenMP programming - University of Utah · Hybrid MPI/OpenMP programming. Martin . Č. uma Center for High Performance Computing University of Utah [email protected]

SCALABILITY AND PERFORMANCE ANALYSIS OF OPENMP … · sary extensions to Periscope to perform scalability and performance analysis on OpenMP codes. Periscope is an online-based performance

Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines