AMLAPI: Active Messages over Low-level Application Programming Interface

Simon Yau, smyau@cs

Tyson Condie, tcondie@cs

Background

AM (Active Messages) is a low-level communication architecture for high-performance parallel computing.

LAPI is IBM’s version of AM.

The two have very similar APIs.

Programs running on an AM platform should be able to run on LAPI.

Use an AMLAPI layer to emulate AM using LAPI.

Similarities

Both are low-level message-passing style architectures.

Both use active messages:
– One node initiates an active message.
– The receiving node executes a handler upon reception of the active message.
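The active-message idea shared by both systems can be illustrated with a minimal sketch. It is not the AM or LAPI API: `ActiveMessage`, `Node`, `register`, and `receive` are hypothetical names chosen for illustration, and the "network" is just a direct method call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ActiveMessage:
    handler_id: int        # names the handler to run on the receiving node
    args: Tuple[int, ...]  # small arguments carried with the message


class Node:
    """Hypothetical receiving node holding a table of registered handlers."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Callable[..., None]] = {}
        self.log: List[str] = []

    def register(self, handler_id: int, fn: Callable[..., None]) -> None:
        self.handlers[handler_id] = fn

    def receive(self, msg: ActiveMessage) -> None:
        # Reception of the message triggers the handler it names.
        self.handlers[msg.handler_id](*msg.args)


receiver = Node()
receiver.register(0, lambda a, b: receiver.log.append(f"sum={a + b}"))

# The initiating node would send this over the network; here it is
# handed to the receiver directly.
receiver.receive(ActiveMessage(handler_id=0, args=(2, 3)))
```

The point of the sketch is that the message itself selects the code that runs on arrival, rather than being matched against a pending receive call.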

Differences

AM virtualizes the network interface with endpoints and bundles, which allows multiple threads at each endpoint.

AM requires handlers to be executed in the context of the application program; LAPI handlers execute in the context of the polling thread.

LAPI separates handlers into header and completion handlers.

LAPI uses counters for synchronization (these guarantee that handlers have executed); AM only guarantees that the network has accepted the data.
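Counter-based synchronization can be sketched as follows. This is a hedged illustration of the general idea only: `CompletionCounter`, `increment`, and `wait_for` are hypothetical names, not LAPI calls, and a local thread stands in for the remote node.

```python
import threading


class CompletionCounter:
    """Hypothetical counter that is bumped only after a handler has run."""

    def __init__(self) -> None:
        self._value = 0
        self._cond = threading.Condition()

    def increment(self) -> None:
        with self._cond:
            self._value += 1
            self._cond.notify_all()

    def wait_for(self, target: int, timeout: float = 5.0) -> bool:
        # Block until the counter reaches `target` (or the timeout expires).
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= target, timeout)


results = []
done = CompletionCounter()


def remote_handler(payload: str) -> None:
    results.append(payload.upper())
    done.increment()  # signals that the handler has actually executed


worker = threading.Thread(target=remote_handler, args=("hello",))
worker.start()
ok = done.wait_for(1)  # the sender learns the handler ran, not just that data left
worker.join()
```

Under the AM-style guarantee, by contrast, the sender would only know that the network accepted the data, so nothing could be assumed about `results` at that point.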

AM & LAPI Execution Model

[Figure: sender and receiver timelines for each model.
AM Execution: Send Msg, Do work, Get Msg, Execute Handler (and send reply).
LAPI Execution: Send Msg, Do work, Get Msg, Exec Header Handler, Poll, Send Footer, Get Footer, Exec Footer Handler, Poll.]

To Emulate AM on LAPI

Emulate endpoints and bundles:
– Maintain a list of endpoints per box.
– Each endpoint is represented by the box id and its position in the list.

Associate each endpoint bundle with a task queue:
– An AM is done with a LAPI call which schedules a task on the queue at the remote end.
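The bookkeeping described above might look like the following sketch. All names (`EndpointName`, `Bundle`, `Endpoint`, `Box`, `create_endpoint`, `lookup`) are hypothetical and are not taken from the AMLAPI source; the sketch only assumes what the slide states: a per-box endpoint list, endpoint names made of box id plus list position, and one task queue per bundle.

```python
from collections import deque
from typing import List, NamedTuple


class EndpointName(NamedTuple):
    box_id: int  # which box (node) owns the endpoint
    index: int   # position in that box's endpoint list


class Bundle:
    """Hypothetical bundle: a group of endpoints sharing one task queue."""

    def __init__(self) -> None:
        self.task_queue: deque = deque()


class Endpoint:
    def __init__(self, bundle: Bundle) -> None:
        self.bundle = bundle


class Box:
    def __init__(self, box_id: int) -> None:
        self.box_id = box_id
        self.endpoints: List[Endpoint] = []

    def create_endpoint(self, bundle: Bundle) -> EndpointName:
        self.endpoints.append(Endpoint(bundle))
        return EndpointName(self.box_id, len(self.endpoints) - 1)

    def lookup(self, name: EndpointName) -> Endpoint:
        if name.box_id != self.box_id:
            raise KeyError(f"endpoint {name} lives on another box")
        return self.endpoints[name.index]


box = Box(box_id=7)
bundle = Bundle()
first = box.create_endpoint(bundle)
second = box.create_endpoint(bundle)
```

Because a name is just a pair of integers, it can travel inside a message and be resolved with a list index on the receiving box.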

Design

Sending an AM:
– Package a LAPI message and send it to the receiving node.
– At the receiving node, multiplex the message to the appropriate endpoint and put the associated function pointer, with its arguments, onto the task queue.

Receiving an AM:
– When the user polls, check the task queue and execute a task from it.
– Execute only one task, since we do not want the user thread to spend too much time in the handler.
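The one-task-per-poll rule can be sketched as below. `Endpoint`, `deliver`, and `poll` are hypothetical names for illustration; `deliver` stands in for whatever the LAPI side does when a message arrives.

```python
from collections import deque
from typing import Callable


class Endpoint:
    """Hypothetical endpoint with a queue of pending handler invocations."""

    def __init__(self) -> None:
        self.task_queue: deque = deque()

    def deliver(self, handler: Callable[..., None], *args) -> None:
        # Called on message arrival: queue the handler, do not run it yet.
        self.task_queue.append((handler, args))

    def poll(self) -> bool:
        # Run at most one queued task so the user thread returns quickly.
        if not self.task_queue:
            return False
        handler, args = self.task_queue.popleft()
        handler(*args)
        return True


seen = []
ep = Endpoint()
ep.deliver(seen.append, "first")
ep.deliver(seen.append, "second")

ran = ep.poll()                     # executes only the oldest task
pending_after_one_poll = len(ep.task_queue)
```

Running the handler inside `poll` is what keeps it in the context of the application thread, as AM requires, rather than in the LAPI polling thread.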

Picture

[Figure: sender and receiver timelines. Labels: Send Msg, Do work, Get Msg, Header Handler, Poll, Send Footer, Get Footer, Footer Handler, Poll… Execute Handler…]

1. Sender executes AM_Send.
2. Sender piggybacks information about the AM call and executes LAPI_Send.
3. Network ships the message to the receiver.
4. Receiver’s network gets the request message and causes the polling thread to execute the header handler.
5. Header handler allocates buffer space to which the message is copied.
6. LAPI copies the data into the buffer and calls the footer handler.
7. Footer handler posts the AM handler, with the arguments and AM information, on the queue of the destination endpoint.
8. When the user application polls, it pulls the handler from the task queue and executes it.
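Steps 4 to 8 can be simulated with a small sketch. It does not use the real LAPI interface: `header_handler`, `footer_handler`, `transport_deliver`, `am_poll`, and the header dictionary layout are all assumptions made for illustration, and `transport_deliver` stands in for the network and the LAPI polling thread.

```python
from collections import deque

# One task queue per destination endpoint (endpoint ids are hypothetical).
task_queues = {0: deque(), 1: deque()}


def header_handler(header: dict) -> bytearray:
    # Step 5: allocate buffer space for the incoming payload.
    return bytearray(header["length"])


def footer_handler(buffer: bytearray, header: dict, am_handlers: dict) -> None:
    # Step 7: post the AM handler and its arguments on the destination queue.
    fn = am_handlers[header["handler_id"]]
    task_queues[header["endpoint"]].append((fn, header["args"], bytes(buffer)))


def transport_deliver(header: dict, payload: bytes, am_handlers: dict) -> None:
    # Stand-in for steps 4 to 6: run the header handler, copy the data,
    # then call the footer handler.
    buffer = header_handler(header)
    buffer[:] = payload
    footer_handler(buffer, header, am_handlers)


def am_poll(endpoint: int) -> bool:
    # Step 8: the application thread pulls one task and executes it.
    queue = task_queues[endpoint]
    if not queue:
        return False
    fn, args, data = queue.popleft()
    fn(*args, data)
    return True


received = []
handlers = {3: lambda tag, data: received.append((tag, data))}
header = {"endpoint": 1, "handler_id": 3, "args": ("greeting",), "length": 5}

transport_deliver(header, b"hello", handlers)
queued_before_poll = len(task_queues[1])
handled = am_poll(1)
```

Nothing user-visible happens until `am_poll` runs, which mirrors the design: the header and footer handlers only stage the work.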

Evaluation Platform: SP3

Interconnect:
– Advertised bandwidth = 350 MB/s
– Advertised latency = ~17 microseconds

SMPs:
– 8 x Power3 processor SMPs
– 4 GB of memory per node

Processor:
– Superscalar, pipelined 64-bit RISC
– 8 instructions per clock at 375 MHz
– 64 KB L1 cache, 8 MB L2 cache

OS:
– AIX with IBM Parallel Environment

Micro Benchmarks:

Round trip latency: 473 us

LAPI round trip latency: 32 us

[Figure: LAPI & AMLAPI Bandwidth on SP3. x-axis: Message Size (bytes), 0 to 300000. y-axis: Bandwidth (KB/s), 0 to 140000. Series: LAPI Bandwidth (KB/s), AM Bandwidth (KB/s).]

Explanation

Percentage breakdown of overhead at 262144 byte message:
– LAPI: 51%
– Context switch: 10%
– Packing AM info: 17%
– Copying to Endpoint VM: 22%

[Figure: Time spent on transmission of a message. x-axis: Message size, 0 to 300000. y-axis: Time (ms), 0 to 2.5. Series: LAPI (communication), Context Switch & Polling, Packing AM info, Copying to Endpoint VM Segment.]

Copying data from the message buffer to an endpoint’s VM segment takes up the bulk of the overhead.

Context switching and packing AM info take up the rest.

Since SP3 is an SMP, the LAPI threads and the application thread run on different processors. Moving data from the LAPI thread’s processor requires invalidating the processor cache on which the LAPI thread runs.
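As a worked check on the breakdown at 262144 bytes (LAPI 51%, context switch 10%, packing AM info 17%, copying to endpoint VM 22%), the following sketch computes each component's share of the part that AMLAPI adds on top of LAPI. Treating the LAPI slice as the underlying communication cost, and the other three slices as the added overhead, is an assumption about how to read the chart.

```python
# Percentages reported for a 262144-byte message.
breakdown = {
    "LAPI": 51,
    "Context switch": 10,
    "Packing AM info": 17,
    "Copying to endpoint VM": 22,
}

# Assumption: everything except the LAPI slice is overhead added by AMLAPI.
added = {name: pct for name, pct in breakdown.items() if name != "LAPI"}
added_total = sum(added.values())

# Share of the added overhead attributable to each component, in percent.
shares = {name: round(100 * pct / added_total, 1) for name, pct in added.items()}
```

Under that reading, the added overhead is 49% of the total, and the copy is the largest single component of it at roughly 45%.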

Conclusion

Using low-level glue-ware is a viable option for making programs portable if the communication layers match.

Future work:
– Macro benchmarks
– Improve short message latency by header handler
– “Zero copy” to endpoint VM: make AM handler run in LAPI context