Flexible Control of Data Transfer between Parallel Programs
Joe Shang-chieh Wu, Alan Sussman
Department of Computer Science, University of Maryland, USA
Grid 2004
Corona and solar wind
Global magnetospheric MHD
Thermosphere-ionosphere model
Rice convection model
Particle and Hybrid model
What is the problem?
• Coupling existing (parallel) programs
  – For physical simulations, more accurate answers can be obtained
  – For visualization, flexible transmission of data between simulation and visualization codes
• Exchange data across shared or overlapped regions in multiple parallel programs
• Couple multi-scale (space & time) programs
• Focus on multiple time scale problems (when to exchange data)
Roadmap
• Motivation
• Approximate Matching
• Matching properties
• Performance results
• Conclusions and future work
Is it important?
• Petroleum reservoir simulations – multi-scale, multi-resolution codes
• Special issue of IEEE Computing in Science & Engineering, May/Jun 2004:
  "It's then possible to couple several existing calculations together through an interface and obtain accurate answers."
• Earth System Modeling Framework – several US federal agencies and universities (http://www.esmf.ucar.edu)
Solving multiple space scales
1. Appropriate tools
2. Coordinate transformation
3. Domain knowledge
Matching is OUTSIDE components
• Separate matching (coupling) information from the participating components
  – Maintainability – components can be developed/upgraded individually
  – Flexibility – participants/components can be changed easily
  – Functionality – supports variable-sized time interval numerical algorithms or visualizations
• Matching information is specified separately by the application integrator
• Runtime matching via simulation time stamps
Separate codes from matching

Exporter Ap0:
  define region Sr12
  define region Sr4
  define region Sr5
  ...
  Do t = 1, N, Step0
    ...               // computation
    export(Sr12, t)
    export(Sr4, t)
    export(Sr5, t)
  EndDo

Importer Ap1:
  define region Sr0
  ...
  Do t = 1, M, Step1
    import(Sr0, t)
    ...               // computation
  EndDo

[Diagram: exported regions Ap0.Sr12, Ap0.Sr4, and Ap0.Sr5 are paired with importer regions Ap1.Sr0, Ap2.Sr0, and Ap4.Sr0]
Configuration file:

  #
  Ap0 cluster0 /bin/Ap0 2 ...
  Ap1 cluster1 /bin/Ap1 4 ...
  Ap2 cluster2 /bin/Ap2 16 ...
  Ap4 cluster4 /bin/Ap4 4
  #
  Ap0.Sr12 Ap1.Sr0 REGL 0.05
  Ap0.Sr12 Ap2.Sr0 REGU 0.1
  Ap0.Sr4  Ap4.Sr0 REG  1.0
  #
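A minimal sketch of how such a file could be parsed; the class and function names are illustrative, not part of the described system. Assumed layout: '#' lines separate sections; the first section lists "program cluster executable nprocs ..." deployment lines, the second lists "exporter.region importer.region policy precision" matching rules.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    program: str
    cluster: str
    executable: str
    nprocs: int

@dataclass
class Rule:
    export_region: str   # e.g. "Ap0.Sr12"
    import_region: str   # e.g. "Ap1.Sr0"
    policy: str          # LUB, GLB, REG, REGU, REGL, FASTR, FASTU, FASTL
    precision: float

def parse_config(text):
    # Split into '#'-delimited sections, dropping blank and "..." filler lines.
    sections, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "#":
            if current:
                sections.append(current)
                current = []
        elif line and line != "...":
            current.append(line.split())
    if current:
        sections.append(current)
    deployments = [Deployment(f[0], f[1], f[2], int(f[3])) for f in sections[0]]
    rules = [Rule(f[0], f[1], f[2], float(f[3])) for f in sections[1]]
    return deployments, rules
```

The point of the format is that the coupling rules live entirely in this file: changing a policy or swapping an importer requires no change to the component codes.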
Matching implementation
• The library is implemented with POSIX threads
• Each process in each program uses library threads to exchange control information in the background, while the application computes in the foreground
• One process in each parallel program runs an extra representative thread to exchange control information between parallel programs
  – Minimizes communication between parallel programs
  – Keeps collective correctness within each parallel program
  – Improves overall performance
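The scheme above can be sketched as follows (using Python's threading module rather than POSIX threads; the class and message names are illustrative): a background thread drains a queue of control messages while the foreground continues computing.

```python
import queue
import threading

class RepresentativeThread(threading.Thread):
    """Background thread that processes inter-program control messages."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.handled = []

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:           # shutdown sentinel
                break
            self.handled.append(msg)  # e.g. record an exported timestamp

rep = RepresentativeThread()
rep.start()
# The foreground "computation" keeps running while control info arrives:
for t in (1.1, 1.2, 1.5, 1.9):
    rep.inbox.put(("export", "A", t))
rep.inbox.put(None)
rep.join()
```

Because only this one thread per program talks to the other programs, inter-program control traffic stays off the critical path of the computing processes.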
Approximate Matching
• Exporter Ap0 produces a sequence of data object A at simulation times 1.1, 1.2, 1.5, and 1.9
  – [email protected], [email protected], [email protected], [email protected]
• Importer Ap1 requests the same data object A at time 1.3
  – [email protected]
• Is there a match for [email protected]? If yes, which one, and why?
Supported matching policies

<importer request, exporter matched, desired precision> = <x, f(x), p>

• LUB: minimum f(x) with f(x) ≥ x
• GLB: maximum f(x) with f(x) ≤ x
• REG: f(x) minimizing |f(x) - x|, with |f(x) - x| ≤ p
• REGU: f(x) minimizing f(x) - x, with 0 ≤ f(x) - x ≤ p
• REGL: f(x) minimizing x - f(x), with 0 ≤ x - f(x) ≤ p
• FASTR: any f(x) with |f(x) - x| ≤ p
• FASTU: any f(x) with 0 ≤ f(x) - x ≤ p
• FASTL: any f(x) with 0 ≤ x - f(x) ≤ p
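As a sketch (not the library's actual implementation), the eight policies can be written down directly; the function name, argument names, and the first-found choice for the FAST* policies are illustrative assumptions.

```python
def match(policy, x, stamps, p=None):
    """Return the exporter timestamp matched to importer request x, or None."""
    ge = [s for s in stamps if s >= x]   # candidates at or above x
    le = [s for s in stamps if s <= x]   # candidates at or below x
    if policy == "LUB":
        return min(ge) if ge else None
    if policy == "GLB":
        return max(le) if le else None
    near = [s for s in stamps if abs(s - x) <= p]
    above = [s for s in ge if s - x <= p]
    below = [s for s in le if x - s <= p]
    if policy == "REG":
        return min(near, key=lambda s: abs(s - x)) if near else None
    if policy == "REGU":
        return min(above) if above else None
    if policy == "REGL":
        return max(below) if below else None
    if policy == "FASTR":           # "any" acceptable stamp: take the first
        return near[0] if near else None
    if policy == "FASTU":
        return above[0] if above else None
    if policy == "FASTL":
        return below[0] if below else None
    raise ValueError(f"unknown policy {policy!r}")
```

This also answers the [email protected] question from the previous slide: against stamps 1.1, 1.2, 1.5, 1.9, LUB gives 1.5, GLB gives 1.2, and REGL with p = 0.05 yields no match, since the nearest earlier stamp (1.2) is 0.1 below the request, outside the precision.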
Experimental setup

Question: How much overhead is introduced by runtime matching?

• 6 PIII-600 processors, connected by channel-bonded Fast Ethernet
• Solve u_tt = u_xx + u_yy + f(t,x,y) in 2-D by the finite element method
• u(t,x,y): 512x512 array, on 4 processors (Ap1)
• f(t,x,y): 32x512 array, on 2 processors (Ap2)
• All data in Ap2 is sent (exported) to Ap1 using matching criterion <REGL, 0.05>
• Ap1 receives (imports) data under 3 different scenarios
• 1001 matches made for each scenario (results averaged over multiple runs)
Experiment result 1

Ap1 execution time (average):

         P10     P11     P12     P13
Case A   341ms   336ms   610ms   614ms
Case B   620ms   618ms   618ms   618ms
Case C   624ms   612ms   340ms   339ms
Experiment result 2

Ap1 pseudo code:

  Do t = 1, N
    import(data, t)
    compute u
  EndDo

The import call expands into a match request followed by the data transfer:

  Do t = 1, N
    Request a match for data@t
    Receive data
    compute u
  EndDo

Ap1 overhead in the slowest process:

         Matching time   Data transfer time   Computation time   Matching overhead
Case A   944us           6.1ms                605ms              13%
Case B   708us           2.9ms                613ms              20%
Case C   535us           6.8ms                614ms              7%
Experiment result 3

Comparison of matching time:

         Slowest process   Fastest process
Case A   944us (P13)       4394us (P11)
Case B   708us (P10)       3468us (others)
Case C   535us (P10)       3703us (P13)

• Fastest process (P11): high-cost, remote match
• Slowest process (P13): low-cost, local match
• The high-cost match can be hidden
Conclusions & Future work
• Conclusions
  – A low-overhead approach for flexible data exchange between e-Science components operating at different time scales
• Ongoing & future work
  – Performance experiments in a Grid environment
  – Caching strategies to efficiently deal with slow importers
  – Real applications: space weather is the first one