
Page 1: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Shang-Chieh Joe Wu*
Alan Sussman

Department of Computer Science
University of Maryland, USA

*graduating soon

Page 2: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations


Roadmap

• Motivation

• Approximate Matching [Grid 2004]

• Collective Semantics

• Dissection of Execution Time

• Smart Buffering

• Future Work

Page 3: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

What is the overall problem?

• Obtain more accurate results by coupling existing (parallel) physical simulation components

• Different time and space scales for data produced in shared or overlapped regions

• Runtime decisions for which time-stamped data objects should be exchanged

• Performance becomes a concern

Page 4: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Coupling, is it important?

• Special issue in May/Jun 2004 of IEEE/AIP Computing in Science & Engineering (CSE)

“It’s then possible to couple several existing calculations together through an interface and obtain accurate answers.”

• Multi-scale multi-resolution simulations and models – multiphysics (May/Jun 2005 CSE)
– adaptive small-scale noise capture (hydrodynamics)
– complex fluid and dense suspension (fluid dynamics)
– patch dynamics (material science)

• Earth System Modeling Framework
– several US federal agencies and universities (http://www.esmf.ucar.edu)

Page 5: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Matching is OUTSIDE components

• Separate matching (coupling) information from the participating components
– Maintainability: components can be developed/upgraded individually
– Flexibility: change participants/components easily
– Functionality: support variable-sized time interval numerical algorithms or visualizations

• Matching information is specified separately by the application integrator

• Runtime match via simulation timestamps

• POSIX thread-based implementation

Page 6: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Separate codes from matching

Exporter App0

define region R1
define region R4
define region R5
...
Do t = 1, N, Step0
  ... // computation jobs
  export(R1,t)
  export(R4,t)
  export(R5,t)
EndDo

Importer App1

define region R2
...
Do t = 1, M, Step1
  import(R2,t)
  ... // computation jobs
EndDo

[Figure: exporter regions App0.R1, App0.R4, App0.R5 and importer regions App1.R0, App2.R0, App4.R0]

Configuration file

#
App0 cluster0 /bin/App0 2 ...
App1 cluster1 /bin/App1 4 ...
App2 cluster2 /bin/App2 16 ...
App4 cluster4 /bin/App4 4
#
App0.R1 App1.R0 REGL 0.05
App0.R1 App2.R0 REGU 0.1
App0.R4 App4.R0 REG 1.0
#

SPMD
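As a rough illustration of how such a configuration file might be read, here is a minimal Python sketch. It assumes that `#` lines delimit two sections (components, then connections), that the fourth field of a component line is a process count, and that the elided trailing fields (`...`) can be ignored. The class and function names (`Component`, `Connection`, `parse_config`) are hypothetical and not part of the framework described in the slides.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    cluster: str
    executable: str
    nprocs: int  # assumption: the fourth field is a process count


@dataclass
class Connection:
    exporter: str   # exported region, e.g. "App0.R1"
    importer: str   # imported region, e.g. "App1.R0"
    policy: str     # matching policy name, e.g. "REGL"
    precision: float


def parse_config(text):
    """Split the file into a component section and a connection section."""
    components, connections = [], []
    section = 0  # incremented at every "#" delimiter line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "#":
            section += 1
            continue
        fields = line.split()
        if section == 1:
            name, cluster, exe, nprocs = fields[:4]  # extra fields ignored
            components.append(Component(name, cluster, exe, int(nprocs)))
        elif section == 2:
            exp, imp, policy, precision = fields[:4]
            connections.append(Connection(exp, imp, policy, float(precision)))
    return components, connections


EXAMPLE = """\
#
App0 cluster0 /bin/App0 2
App1 cluster1 /bin/App1 4
App2 cluster2 /bin/App2 16
App4 cluster4 /bin/App4 4
#
App0.R1 App1.R0 REGL 0.05
App0.R1 App2.R0 REGU 0.1
App0.R4 App4.R0 REG 1.0
#
"""

components, connections = parse_config(EXAMPLE)
for c in connections:
    print(c.exporter, "->", c.importer, c.policy, c.precision)
```

Keeping the connections in a separate section mirrors the point of the slide: the component codes only call export and import, while who is matched with whom, and under which policy, lives outside the components.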

Page 7: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Basic Operation

[Figure: an importer component and an exporter component, layered on the Approximate Match Library and the Distributed Array Transfer Library. The importer sends Request Array@T3.1; the exporter holds exported distributed arrays at timestamps T1, T2, T3, T4; the approximate match returns Matched Array@T3 as the imported distributed array.]

Arrays are distributed among multiple processes

Page 8: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Collective Semantics

• Collective operations
– All processes in the same component must perform the same operation, but not necessarily at the same time

• Approximate match is a collective operation
– All processes in the same exporter component asynchronously generate distributed data with the same timestamps (T1 T2 T3 T4)
– All processes in the same importer component asynchronously make requests with the same timestamps (T3.1)
– All processes in the same exporter component must reply to the requests with the same timestamps (T3 matches T3.1)
– Consistent decisions must be made about which copy of data (Array@T3) should be transferred for shared or overlapped regions

Page 9: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Performance Concerns

• Approximate match is a runtime-based approach, so source code-based optimizations help little

• Different components execute at different speeds, and export/import data at their own rates

• Not all exported data are required by importer components
– Exported data, whose size might be very large, may be buffered when matching decisions cannot yet be made

• Not all processes in the same component execute at the same speed
– Some complex components can be very hard to perfectly load balance across all processes

Page 10: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Dissection of Execution Time

• Execution time is composed of
– Computation Time
– Local Copy Time (might be unnecessary)
– Runtime Match Time + Remote Data Transfer Time

• The same match decisions, for each request, are made repeatedly by all exporter processes in exporter components

• Smart buffering
– Faster processes help slower processes in the same exporter component

Page 11: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Smart Buffering

• Exported data are buffered in framework

• A slow exporting process may be able to avoid memory copies, based on
– Its responses for previously received import requests (self-help)
– The responses for previous requests satisfied by the fastest process in the same component (buddy-help)
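The slides do not spell out the decision rule, so the following Python sketch is only one plausible reading of how already-known match decisions could let a slow process skip a memory copy. It assumes the REGL policy (the match is the largest exported timestamp at or below the request, within the precision), that import requests arrive with increasing timestamps, and that a decision is recorded only once it is final. The function name `export_action` and the `decisions` mapping are hypothetical.

```python
def export_action(t, decisions):
    """Decide what a slow exporter process does with its data at timestamp t.

    decisions maps each resolved request timestamp x to the matched export
    timestamp f(x), or to None when nothing matched. The entries may come
    from this process's own earlier replies (self-help) or from the fastest
    process in the same component (buddy-help).
    """
    if not decisions:
        return "buffer"           # nothing known yet: must keep a copy
    latest_request = max(decisions)
    if t > latest_request:
        return "buffer"           # a future request may still match t
    if t in decisions.values():
        return "transfer"         # t was matched: this process owes its piece
    return "skip"                 # t can no longer be matched: no copy needed


# The fastest process has already resolved request 3.1 to the export at 3.0.
known = {3.1: 3.0}
for t in (1.0, 2.0, 3.0, 4.0):
    print(t, export_action(t, known))
```

Under these assumptions the slow process copies nothing for timestamps 1.0 and 2.0, sends its piece of the array at 3.0, and only buffers 4.0, whose fate is still open. Because approximate match is collective, every process in the component must reach the same decision, which is what makes it safe to reuse a decision made elsewhere.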

Page 12: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Smart Buffering Example

[Figure: timelines of the fastest process and a slower process along a timestamp axis, marking the request time treq, the request region, and the match.]

Page 13: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Load Balance

• No assumptions about load balance inside each component

• Smart buffering will help with load imbalance at runtime
– Slower processes can avoid some unnecessary work (memory copies)
– Component tunes itself at runtime when some processes fall behind
– Framework-level approach: no restrictions on algorithms/applications

Page 14: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Micro-Benchmark Experiment

• u_tt = u_xx + u_yy + f(t,x,y): solve the 2-d diffusion equation by the finite element method

• A 1024x1024 array is evenly distributed over the participating processes

• The importer component U runs on 4/8/16/32 P4 2.8GHz processors, connected by Myrinet

• The exporter component F runs on 4 PIII-650 processors, connected by channel-bonded Fast Ethernet

• The two clusters are connected by Gigabit Ethernet

• 1001 data objects exported, and 50 data objects transferred (20:1)

• One process (fs) in the exporter component F performs extra computation, and its data exporting time is measured

• Smart buffering can be observed when fs is falling (far) behind other processes
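To give a feel for the numbers in this setup, the short Python calculation below estimates how much copying is at stake per exporter process. The element size of 8 bytes is an assumption (the slides do not state the array's element type), and the result is an upper bound on what could be avoided, not a measured value.

```python
ARRAY_ELEMENTS = 1024 * 1024   # 1024x1024 distributed array
EXPORTER_PROCESSES = 4         # exporter component F
BYTES_PER_ELEMENT = 8          # assumption: double-precision elements
EXPORTED = 1001                # data objects exported
TRANSFERRED = 50               # data objects actually transferred

# Each exporter process holds an equal share of the array.
per_process_bytes = ARRAY_ELEMENTS // EXPORTER_PROCESSES * BYTES_PER_ELEMENT

# Exports that are never transferred are the ones whose copies are wasted.
unmatched = EXPORTED - TRANSFERRED
avoidable_bytes = unmatched * per_process_bytes

print("per-export copy per process (MiB):", per_process_bytes / 2**20)
print("exports never transferred:", unmatched)
print("upper bound on avoidable copying per process (MiB):",
      avoidable_bytes / 2**20)
```

With these assumptions, each export costs a 2 MiB local copy per process, and 951 of the 1001 exports are never transferred, so buffering only matched data would avoid up to 1902 MiB of copying per process over the run.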

Page 15: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Smart Buffering Results

8 Importer Processes – Exporter component does NOT run slower

[Figure: Data Exporting Time for Slowest Process]

Page 16: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Smart Buffering Results

32 Importer Processes – Exporter component runs more slowly from the beginning

[Figure: Data Exporting Time for Slowest Process. Annotation: Only Buffer Matched Data (Optimal State)]

Page 17: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Smart Buffering Results

16 Importer Processes

[Figure: Data Exporting Time for Slowest Process. Annotations: Nearly No Skips, Some Skips, Enter Optimal State]

Page 18: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Related Work

• Parallel Data Redistribution
– Shared data among coupled parallel models
– InterComm (Meta-Chaos), PAWS, MCT, CUMULVS, Roccom, etc.
– MxN Working group in Common Component Architecture (CCA) Forum

• Coordination Languages
– Creating and coordinating execution threads in a distributed computing environment
– Linda (tuple space model + directives). Delirium, Strand (new languages). C-Linda, Fortran-M (extending old languages), plus many others.

Page 19: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Conclusion

• Described a runtime-based approach to speed up slower processes in the same exporter component in (loosely) coupled simulations

• Try to minimize unnecessary buffering for exported data that ends up not being transferred during component execution. Post-processing in the simulation components, or other tools, is not needed

• Perfect synchronization across participating components is not required – can especially benefit “hard-to-load-balance” components

Page 20: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations


Future Work

• Investigate buffering issues between processes, such as non-blocking transfers or RDMA over InfiniBand

• Performance optimizations for slow importers (pattern-based semantic cache)

• Applying the framework to a set of large-scale coupled scientific applications from the space weather domain (in progress)

Page 21: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations


The End

Page 22: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations

Questions ?

Page 23: Taking Advantages of Collective Operation Semantics for Loosely Coupled Simulations


Supported matching policies

<importer request, exporter matched, desired precision> = <x, f(x), p>

• LUB minimum f(x) with f(x) ≥ x

• GLB maximum f(x) with f(x) ≤ x

• REG f(x) minimizes |f(x)-x| with |f(x)-x| ≤ p

• REGU f(x) minimizes f(x)-x with 0 ≤ f(x)-x ≤ p

• REGL f(x) minimizes x-f(x) with 0 ≤ x-f(x) ≤ p

• FASTR any f(x) with |f(x)-x| ≤ p

• FASTU any f(x) with 0 ≤ f(x)-x ≤ p

• FASTL any f(x) with 0 ≤ x-f(x) ≤ p
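The policy definitions above can be turned into a small reference function. The Python sketch below is an illustration only: it evaluates a policy over a complete list of exported timestamps, whereas the framework has to decide at runtime while exports are still arriving. The function name `match` and its argument order are hypothetical; for the FAST policies, "any" is taken here to mean the first acceptable timestamp in list order.

```python
def match(policy, x, p, exported):
    """Return the matched timestamp f(x) for request x, or None if no match.

    x is the importer's requested timestamp, p the desired precision
    (ignored by LUB and GLB), and exported the list of exporter timestamps.
    """
    above = [t for t in exported if t >= x]
    below = [t for t in exported if t <= x]
    if policy == "LUB":
        return min(above, default=None)
    if policy == "GLB":
        return max(below, default=None)
    if policy == "REG":
        candidates = [t for t in exported if abs(t - x) <= p]
        return min(candidates, key=lambda t: abs(t - x), default=None)
    if policy == "REGU":
        return min((t for t in above if t - x <= p), default=None)
    if policy == "REGL":
        return max((t for t in below if x - t <= p), default=None)
    acceptable = {
        "FASTR": lambda t: abs(t - x) <= p,
        "FASTU": lambda t: 0 <= t - x <= p,
        "FASTL": lambda t: 0 <= x - t <= p,
    }
    if policy in acceptable:
        # "any" acceptable timestamp: take the first one found
        return next((t for t in exported if acceptable[policy](t)), None)
    raise ValueError(f"unknown policy: {policy}")


timestamps = [1.0, 2.0, 3.0, 4.0]
for policy, p in [("GLB", 0.0), ("LUB", 0.0), ("REGL", 0.5), ("REGL", 0.05)]:
    print(policy, p, match(policy, 3.1, p, timestamps))
```

Because the function is deterministic, every process that evaluates it over the same timestamps reaches the same answer, which is the consistency that the collective semantics require.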