
Page 1: Title

2006/1/23

An Introduction of GridMPI

Yutaka Ishikawa (1,2) and Motohiko Matsuda (2)

(1) University of Tokyo
(2) Grid Technology Research Center, AIST (National Institute of Advanced Industrial Science and Technology)

This work is partially supported by the NAREGI project.

http://www.gridmpi.org/

Page 2: Motivation

• MPI, the Message Passing Interface, has been widely used to program parallel applications.
• Users want to run such applications over the Grid environment without any modification of the programs.
• However, the performance of existing MPI implementations does not scale up in the Grid environment.

[Figure: a single (monolithic) MPI application over the Grid environment, spanning computing resources at site A and site B connected by a wide-area network]

Page 3: Motivation (continued)

• Focus on a metropolitan-area, high-bandwidth environment: 10 Gbps, 500 miles (less than 10 ms one-way latency).
  – We have already demonstrated that the performance of the NAS Parallel Benchmark programs scales up if the one-way latency is smaller than 10 ms, using an emulated WAN environment.

[Figure: as on Page 2, a single (monolithic) MPI application over the Grid environment, spanning computing resources at site A and site B across a wide-area network]

Motohiko Matsuda, Yutaka Ishikawa, and Tomohiro Kudoh, "Evaluation of MPI Implementations on Grid-connected Clusters using an Emulated WAN Environment," CCGrid 2003, 2003.

Page 4: Issues

• High Performance Communication Facilities for MPI on Long and Fat Networks
  – TCP vs. MPI communication patterns
  – Network topology
    • Latency and bandwidth
• Interoperability
  – Most MPI library implementations use their own network protocol.
• Fault Tolerance and Migration
  – To survive a site failure
• Security

TCP vs. MPI traffic (illustrated by the sketch below):
  TCP: designed for streams.
  MPI: burst traffic; applications repeat computation and communication phases, and the traffic changes with the communication pattern.
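The contrast is easiest to see in code. The following sketch is illustrative only (it is not taken from the slides; the array size and iteration count are arbitrary): it alternates a pure-compute phase with an MPI_Alltoall, so every process hits the network at the same instant and then goes quiet again, producing exactly the on/off burst pattern that stream-oriented TCP congestion control is not tuned for.

```c
/* Sketch of the bursty MPI traffic pattern: compute (network idle),
 * then everyone communicates at once (traffic burst), repeated.
 * Sizes and iteration counts are arbitrary, for illustration only. */
#include <mpi.h>
#include <stdlib.h>

#define N (1 << 20)                        /* elements per process */

static void compute(double *v, int n)
{
    for (int i = 0; i < n; i++)            /* stand-in for the compute phase */
        v[i] = v[i] * 0.5 + 1.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sbuf = malloc(N * sizeof(double));
    double *rbuf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) sbuf[i] = i;

    for (int iter = 0; iter < 100; iter++) {
        compute(sbuf, N);                         /* quiet period: no traffic */
        MPI_Alltoall(sbuf, N / size, MPI_DOUBLE,  /* burst: every process     */
                     rbuf, N / size, MPI_DOUBLE,  /* sends simultaneously     */
                     MPI_COMM_WORLD);
    }

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}
```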

Page 5: Issues (continued)

(Same issue list and TCP vs. MPI contrast as Page 4.)

[Figure: several sites connected over the Internet, each using a different vendor's MPI library (Vendor A, B, C, and D), illustrating the interoperability issue]

Page 6: GridMPI Features

• MPI-2 implementation
• IMPI (Interoperable MPI) protocol and extensions for the Grid
  – MPI-2
  – New collective protocols
  – Checkpoint
• Integration of vendor MPI
  – IBM, Solaris, Fujitsu, and MPICH2
• High-performance TCP/IP implementation on long and fat networks
  – Pacing the transmission rate so that burst transmission is controlled according to the MPI communication pattern (see the sketch below).
• Checkpoint
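The pacing idea can be sketched at user level. The helper below is hypothetical and is not GridMPI's actual mechanism (the real precise software pacing, cited on Page 17, shapes traffic at a much finer grain inside the communication layer): it splits a large message into chunks and sleeps after each chunk so the sending rate never exceeds a target bandwidth, smoothing the burst that a single large send() would otherwise inject into the network.

```c
/* Minimal user-level sketch of the pacing idea: cap the sending rate at
 * target_bps by inserting a gap after each chunk. Hypothetical helper,
 * not GridMPI code; error handling is elided for brevity. */
#include <stddef.h>
#include <time.h>
#include <sys/socket.h>
#include <sys/types.h>

static void paced_send(int sock, const char *buf, size_t len,
                       size_t chunk, double target_bps)
{
    /* time budget per full chunk at the target rate, in nanoseconds */
    const double ns_per_chunk = (double)chunk * 8.0 * 1e9 / target_bps;

    for (size_t off = 0; off < len; ) {
        size_t n = (len - off < chunk) ? len - off : chunk;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        ssize_t sent = send(sock, buf + off, n, 0);
        if (sent <= 0)
            return;                        /* error handling elided */
        off += (size_t)sent;

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double elapsed_ns = (t1.tv_sec - t0.tv_sec) * 1e9
                          + (double)(t1.tv_nsec - t0.tv_nsec);

        if (elapsed_ns < ns_per_chunk) {   /* ahead of schedule:          */
            struct timespec gap = { 0, (long)(ns_per_chunk - elapsed_ns) };
            nanosleep(&gap, NULL);         /* wait so the average rate    */
        }                                  /* stays at target_bps         */
    }
}
```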

[Figure: Cluster X running a vendor MPI and Cluster Y running YAMPII, interconnected through IMPI]

Page 7: Evaluation

• It is almost impossible to reproduce the execution behavior of communication performance in a wide-area network.
• A WAN emulator, GtrcNET-1, is used to scientifically examine implementations, protocols, communication algorithms, etc.

GtrcNET-1 (developed at AIST, http://www.gtrc.aist.go.jp/gnet/):
• Injection of delay, jitter, error, ...
• Traffic monitoring, frame capture
• Four 1000Base-SX ports
• One USB port for the host PC
• FPGA (XC2V6000)

Page 8: Experimental Environment

8 PCs + 8 PCs; each node:
  CPU: Pentium 4 / 2.4 GHz, Memory: DDR400 512 MB
  NIC: Intel PRO/1000 (82547EI)
  OS: Linux 2.6.9-1.6 (Fedora Core 2)
  Socket buffer size: 20 MB (a sketch of setting such a buffer follows below)
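For reference, a 20 MB socket buffer is far beyond the Linux defaults of that era. The sketch below uses the standard sockets API (it is not GridMPI code; GridMPI presumably does the equivalent internally or relies on system-wide settings) to request large send and receive buffers on a TCP socket; the kernel additionally clamps the request to the net.core.wmem_max / net.core.rmem_max limits, which therefore have to be raised as well.

```c
/* Request large send/receive buffers on a socket (standard sockets API,
 * shown for illustration only). */
#include <stdio.h>
#include <sys/socket.h>

static int set_socket_buffers(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
        perror("setsockopt");
        return -1;
    }
    return 0;
}

/* Usage (20 MB, matching the experimental setting above):
 *   set_socket_buffers(sock, 20 * 1024 * 1024);
 * The kernel silently caps the request at net.core.wmem_max /
 * net.core.rmem_max, so those sysctls must also be raised. */
```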

[Figure: two clusters of 8 PCs each (Node0 to Node7 and Node8 to Node15), each connected through a Catalyst 3750 switch to the GtrcNET-1 WAN emulator]

• Bandwidth: 1 Gbps
• Delay: 0 ms to 10 ms

Page 9: GridMPI vs. MPICH-G2 (1/4)

FT (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes

[Graph: relative performance vs. one-way delay (0 to 12 msec) for FT (GridMPI) and FT (MPICH-G2)]

Page 10: GridMPI vs. MPICH-G2 (2/4)

IS (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes

[Graph: relative performance vs. one-way delay (0 to 12 msec) for IS (GridMPI) and IS (MPICH-G2)]

Page 11: GridMPI vs. MPICH-G2 (3/4)

LU (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes

[Graph: relative performance vs. one-way delay (0 to 12 msec) for LU (GridMPI) and LU (MPICH-G2)]

Page 12: GridMPI vs. MPICH-G2 (4/4)

NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes

[Graph: relative performance vs. one-way delay (0 to 12 msec) for SP, BT, MG, and CG, each with GridMPI and MPICH-G2]

No parameters were tuned in GridMPI.

Page 13: GridMPI on an Actual Network

• The NAS Parallel Benchmarks were run on 16 nodes: an 8-node (2.4 GHz) cluster at Tsukuba and an 8-node (2.8 GHz) cluster at Akihabara.
• The performance is compared with
  – the result on 16 nodes (2.4 GHz)
  – the result on 16 nodes (2.8 GHz)

[Figure: a Pentium 4 2.4 GHz x 8 cluster at Tsukuba and a Pentium 4 2.8 GHz x 8 cluster at Akihabara, each connected internally by 1G Ethernet, linked by the JGN2 network: 10 Gbps bandwidth, 1.5 msec RTT, 60 km (40 mi.)]

[Bar chart: relative performance of BT, CG, EP, FT, IS, LU, MG, and SP against the 2.4 GHz and 2.8 GHz 16-node baselines]

Page 14: Demonstration

• Easy installation
  – Download the source
  – Make it and set up configuration files
• Easy use
  – Compile your MPI application (see the example below)
  – Run it!
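As a concrete instance of the compile-and-run step, here is a minimal MPI program. The build and launch commands in the trailing comment are the usual mpicc/mpirun form and are illustrative only; the exact invocation depends on the installation, so check the GridMPI documentation.

```c
/* hello.c -- minimal MPI program for trying out the compile-and-run flow. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

/* Illustrative build and run (exact commands depend on the installation):
 *   mpicc -o hello hello.c
 *   mpirun -np 16 ./hello
 */
```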

[Figure: same testbed as Page 13: the Tsukuba (2.4 GHz x 8) and Akihabara (2.8 GHz x 8) clusters, each on 1G Ethernet, linked by the JGN2 network with 10 Gbps bandwidth, 1.5 msec RTT, 60 km (40 mi.)]

Page 15: NAREGI Software Stack (Beta Ver. 2006)

[Diagram: the NAREGI software stack, built on Globus, Condor, and UNICORE (OGSA / WSRF). Components: Grid-Enabled Nano-Applications; Grid PSE; Grid Programming (Grid RPC, GridMPI); Grid Visualization; Grid Workflow; Super Scheduler; Grid VM; Distributed Information Service; Data; all on top of High-Performance & Secure Grid Networking.]

Page 16: GridMPI Current Status

• GridMPI version 0.9 has been released
  – MPI-1.2 features are fully supported
  – MPI-2.0 features are supported except for MPI-IO and one-sided communication primitives
  – Conformance tests:
    • MPICH Test Suite: 0/142 (Fails/Tests)
    • Intel Test Suite: 0/493 (Fails/Tests)
• GridMPI version 1.0 will be released this spring
  – MPI-2.0 will be fully supported

http://www.gridmpi.org/

Page 17: Concluding Remarks

• GridMPI is integrated into the NAREGI package.
• GridMPI is not only for production but is also our research vehicle for the Grid environment, in the sense that new ideas for the Grid are implemented and tested in it.
• We are currently studying high-performance communication mechanisms for long and fat networks:
  – Modifications of TCP behavior
    • M. Matsuda, T. Kudoh, Y. Kodama, R. Takano, and Y. Ishikawa, "TCP Adaptation for MPI on Long-and-Fat Networks," IEEE Cluster 2005, 2005.
  – Precise software pacing
    • R. Takano, T. Kudoh, Y. Kodama, M. Matsuda, H. Tezuka, and Y. Ishikawa, "Design and Evaluation of Precise Software Pacing Mechanisms for Fast Long-Distance Networks," PFLDnet 2005, 2005.
  – Collective communication algorithms with respect to network latency and bandwidth (a sketch of one such scheme follows below).
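As one example of a latency-aware collective, the sketch below implements a two-level broadcast. This is a standard hierarchical technique and is not necessarily the algorithm used in GridMPI: the data crosses the wide-area link once between per-site leaders, then each leader fans it out inside its own low-latency cluster. It assumes every process knows its site_id and that the data originates at global rank 0, which is also the leader (local rank 0) of its site.

```c
/* Two-level (hierarchical) broadcast: one WAN hop between site leaders,
 * then LAN-only broadcasts inside each site. Illustrative sketch only;
 * assumes the data starts at global rank 0, the leader of its site. */
#include <mpi.h>

int hierarchical_bcast(void *buf, int count, MPI_Datatype type,
                       int site_id, MPI_Comm comm)
{
    int world_rank;
    MPI_Comm_rank(comm, &world_rank);

    /* Sub-communicator of all processes located at the same site. */
    MPI_Comm local;
    MPI_Comm_split(comm, site_id, world_rank, &local);

    int local_rank;
    MPI_Comm_rank(local, &local_rank);

    /* Sub-communicator of the site leaders (local rank 0 of each site). */
    MPI_Comm leaders;
    MPI_Comm_split(comm, local_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leaders);

    /* Step 1: the data crosses the wide-area network once per site. */
    if (leaders != MPI_COMM_NULL)
        MPI_Bcast(buf, count, type, 0, leaders);

    /* Step 2: each leader broadcasts inside its own cluster (low latency). */
    MPI_Bcast(buf, count, type, 0, local);

    if (leaders != MPI_COMM_NULL)
        MPI_Comm_free(&leaders);
    MPI_Comm_free(&local);
    return MPI_SUCCESS;
}
```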

Page 18: BACKUP

Page 19: GridMPI Version 1.0

– YAMPII, developed at the University of Tokyo, is used as the core implementation
– Intra-cluster communication by YAMPII (TCP/IP, SCore)
– Inter-cluster communication by IMPI (TCP/IP)

[Diagram: GridMPI software architecture. The MPI API sits on a Request Layer (Request Interface) with a LACT Layer for collectives and IMPI. Below, a P2P Interface provides transports over TCP/IP, PMv2, MX, O2G, and vendor MPI, while an RPIM Interface handles process invocation via ssh, rsh, SCore, Globus, and vendor MPI.]

Page 20: GridMPI vs. Others (1/2)

NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes

[Graph: relative performance vs. one-way delay (0 to 12 msec) for FT, IS, LU, SP, BT, MG, and CG, each with GridMPI and MPICH-G2]

Page 21: GridMPI vs. Others (1/2)

NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes

[Bar chart: relative performance of BT, CG, LU, MG, and SP at 0 ms, 5 ms, and 10 ms delay for GridMPI, GridMPI (with PSP), MPICH, LAM/MPI, YAMPII, MPICH2, and MPICH-G2]

Page 22: GridMPI vs. Others (2/2)

NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes

[Bar chart: relative performance of FT and IS at 0 ms, 5 ms, and 10 ms delay for GridMPI, GridMPI (with PSP), MPICH, LAM/MPI, YAMPII, MPICH2, and MPICH-G2]

Page 23: GridMPI vs. Others

NAS Parallel Benchmarks 3.2 on 16 x 16 processes

[Bar chart: relative performance of FT (0 ms, 5 ms, 10 ms delay) and IS (0 ms, 2 ms, 5 ms, 10 ms delay) for GridMPI, GridMPI (with PSP), MPICH, LAM/MPI, YAMPII, MPICH2, and MPICH-G2]

Page 24: GridMPI vs. Others

NAS Parallel Benchmarks 3.2

[Bar chart: relative performance of BT, CG, LU, MG, and SP at 0 ms, 5 ms, and 10 ms delay for GridMPI, GridMPI (with PSP), MPICH, LAM/MPI, YAMPII, MPICH2, and MPICH-G2]