Send and Receive Based Message-Passing for SCMP

Charles W. Lewis, Jr. Thesis Defense, Virginia Tech, April 28th, 2004

Page 1: Send and Receive Based Message-Passing for SCMP

Send and Receive Based Message-Passing for SCMP

Charles W. Lewis, Jr.

Thesis Defense, Virginia Tech

April 28th, 2004

Page 2: Send and Receive Based Message-Passing for SCMP

This presentation introduces the SCMP architecture, discusses problems with the current SCMP message-passing system, and focuses on the design and performance of a new SCMP message-passing system.

1. Overview of SCMP
2. Original Message-Passing System
3. New Message-Passing System
4. Performance Comparisons

Figure: message exchanges between nodes A and B. Original system: Thread, Data, Sync. New system: RTS, CTS, Data.

Page 3: Send and Receive Based Message-Passing for SCMP

Problems with current design trends motivate the SCMP concept.

As transistor sizes shrink, so do communication wires. This leads to higher cross-chip communication latencies.

ILP faces diminishing returns.

Large and complex uni-processors require extensive amounts of design and verification.

Page 4: Send and Receive Based Message-Passing for SCMP

SCMP provides PLP through replication.

Up to 64 identical nodes on-chip

Replicated nodes reduce complexity

2-D network eliminates cross-chip wires

Figure: SCMP Network with 64 Nodes

Page 5: Send and Receive Based Message-Passing for SCMP

SCMP provides TLP through multi-thread hardware support.

Up to 16 threads

Round-robin thread scheduling by hardware

On every node:
– 4-stage RISC pipeline
– 8MB memory
– Networking hardware

Figure: SCMP Node

Page 6: Send and Receive Based Message-Passing for SCMP

The original messaging system has two message types.

Thread Message

H  T  Payload
1  0  X | Y | THREAD | 1
0  0  Address
0  0  Register Data
.  .  .
0  1  Register Data

Data Message

H  T  Payload
1  0  X | Y | DATA | Stride
0  0  Address
0  0  Data
.  .  .
0  1  Data

Because they contain handling information, these message formats borrow from the Active Messages message-passing system.
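The data message layout above can be sketched as a small flit builder. This is only an illustrative model: the function name `build_data_message`, the tuple representation of a flit, and the use of Python values for payloads are assumptions, not part of the SCMP hardware or libraries. It shows the one property the table makes explicit: the first flit carries H=1, and only the last flit carries T=1.

```python
from typing import Any, List, Tuple

# A flit is modeled as (H bit, T bit, payload). This representation is an
# assumption made for illustration only.
Flit = Tuple[int, int, Any]


def build_data_message(x: int, y: int, stride: int,
                       address: int, data: List[int]) -> List[Flit]:
    """Build the flits of an original-format data message.

    Layout taken from the Data Message table: a header flit (H=1, T=0),
    an address flit, then data flits, the last of which has T=1.
    """
    if not data:
        raise ValueError("a data message needs at least one data flit")
    flits: List[Flit] = [
        (1, 0, (x, y, "DATA", stride)),  # header: destination, type, stride
        (0, 0, address),                 # destination address
    ]
    for i, word in enumerate(data):
        tail = 1 if i == len(data) - 1 else 0  # only the last flit is a tail
        flits.append((0, tail, word))
    return flits
```

For example, `build_data_message(2, 1, 1, 0x100, [7, 8, 9])` yields five flits: one header, one address flit, and three data flits with the tail bit set on the last one.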

Page 7: Send and Receive Based Message-Passing for SCMP

Network uses wormhole and dimension-order routing.

Figure: section of the 2-D mesh with nodes 0 to 3 in one row and nodes 4 to 7 in the next.

Every router multiplexes virtual channel buffers over physical channels.

Head flits claim virtual channel resources as they travel.

If one message blocks, other messages may still continue as long as enough virtual channels are free.

Messages move along the X axis, then the Y axis.

Tail flits release virtual channel resources as they travel.
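The "X axis, then Y axis" rule can be sketched as a path computation on a mesh. The function name `xy_route` and the row-major node numbering (node = y * width + x, as the 0 to 7 figure suggests) are assumptions for illustration; virtual channels and flit-level wormhole behavior are not modeled.

```python
from typing import List


def xy_route(src: int, dst: int, width: int) -> List[int]:
    """Return the nodes visited by dimension-order (X then Y) routing.

    Nodes are assumed to be numbered row-major on a mesh `width` nodes wide.
    """
    x, y = src % width, src // width
    dst_x, dst_y = dst % width, dst // width
    path = [src]
    # Resolve the X offset completely before moving in Y.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(y * width + x)
    # Then resolve the Y offset.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(y * width + x)
    return path
```

On a mesh four nodes wide, a message from node 0 to node 7 travels through nodes 1, 2, and 3 before turning into the second row.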

Page 8: Send and Receive Based Message-Passing for SCMP

Dimension-order routing is deadlock free as long as messages eventually drain.

Even with VCs, network can still deadlock if messages don’t drain.

If all contexts are consumed, thread messages block at NIU
– Threads may not release until a data message is received
– Data messages must not be stopped by congested thread messages

Data messages must have a separate path through network.

Figure: Router with separate Thread VCs and Data VCs between the West and East ports.

Page 9: Send and Receive Based Message-Passing for SCMP

The NIU bears most of the messaging load.

Figure: NIU block diagram showing the Thread Buffer, Data Buffer, Receive Buffer, thread contexts (Context 1, Context 2), and Memory, with the Injection Channel to the router and the Ejection Channel from the router.

Page 10: Send and Receive Based Message-Passing for SCMP

Messages are built through assembly instructions.

Instruction  Arguments                           Description
sendh        d_node, type, d_address, d_stride   send a header flit
send         data                                send one data flit
send2        data, data                          send two data flits
sende        data                                send one data flit and end message
send2e       data, data                          send two data flits and end message
sendm        l_address, {l_stride, count}        send data block from memory
sendme       l_address, {l_stride, count}        send data block from memory and end message

Page 11: Send and Receive Based Message-Passing for SCMP

The thread library facilitates thread messages.

Operation     Arguments                 Description
createThread  int dst_node              create a thread on dst_node
              void(*addr)()
              void(*callback)()
              …
parExecute    int num_nodes             create threads on num_nodes nodes
              void(*addr)()
              …
getBlock      unsigned int node_id      request a block of values from node node_id
              char *dst_addr
              unsigned int dst_stride
              char **src_addr
              unsigned int src_offset
              unsigned int src_stride
              unsigned int num_words

Page 12: Send and Receive Based Message-Passing for SCMP

The send library facilitates data messages.

Operation           Arguments         Description
sendDataIntValue    int dst_node      send an integer to dst_node
                    int *dst_addr
                    int value
sendDataFloatValue  int dst_node      send a double to dst_node
                    double *dst_addr
                    double value
sendDataBlock       int dst_node      send a block of values from memory to dst_node (blocking)
                    int *dst_addr
                    int dst_stride
                    int *src_addr
                    int src_stride
                    int num_words
sendDataBlockNB     int dst_node      send a block of values from memory to dst_node (non-blocking)
                    int *dst_addr
                    int dst_stride
                    int *src_addr
                    int src_stride
                    int num_words

Page 13: Send and Receive Based Message-Passing for SCMP

The original message-passing system uses requests and replies.

Node A requires data held by Node B

Node A creates a thread on Node B

New thread on Node B sends data to Node A

New thread on Node B sends SYNC message when done

Figure: A sends a Thread message to B; B returns a Data message and a Sync message to A.

Page 14: Send and Receive Based Message-Passing for SCMP

Dynamic memory is a problem.

Data Message

H  T  Payload
1  0  X | Y | DATA | Stride
0  0  Address
0  0  Data
.  .  .
0  1  Data

Request thread on node B must know:
– Source Address
– Source Stride
– Destination Address
– Destination Stride
– Number of Values to Send

How can Node A know the source address and stride if Node B allocates the buffer dynamically?

Program must contain global pointers

Page 15: Send and Receive Based Message-Passing for SCMP

In-order delivery of messages is a problem.

SCMP network does not guarantee in-order delivery of messages

SYNC message may reach Node A before data message

Node A will read bad values from memory

Figure: Sync and Data messages both in flight from B to A.

Page 16: Send and Receive Based Message-Passing for SCMP

Request threads and finite thread contexts are a problem.

If a node holds highly demanded data, request threads may consume all of its contexts

Additional thread messages will block in the network

Figure: a node with all of its Contexts occupied and Thread messages waiting at its NIU.

Page 17: Send and Receive Based Message-Passing for SCMP

Send-and-Receive message-passing eliminates all of these problems.

A thread must execute a receive before data will be accepted
– Don’t need request messages

Messages are identified abstractly
– Don’t need global pointers

Completion notification occurs locally
– Don’t need SYNC messages

Page 18: Send and Receive Based Message-Passing for SCMP

Rendezvous mode uses an RTS/CTS handshake.

Node B holds data required by Node A

Node B sends Node A an RTS message when send is executed

After receive is executed Node A sends Node B a CTS message

Node B sends data after receiving CTS

Figure: B sends RTS to A; A replies with CTS; B then sends Data to A.
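The handshake order can be illustrated with a tiny trace model. The function `rendezvous_trace` and its event names are hypothetical; the sketch only assumes what the slide states: B emits RTS when its send executes, A emits CTS once its receive has executed and an RTS is known, and B sends the data after the CTS.

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (source node, destination node, message type)


def rendezvous_trace(send_first: bool) -> List[Message]:
    """Return the network messages of one rendezvous transfer from B to A.

    `send_first` selects whether B's send executes before A's receive.
    """
    trace: List[Message] = []
    rts_seen = False         # A has been told that B wants to send
    receive_posted = False   # A has executed its receive
    order = ["send", "receive"] if send_first else ["receive", "send"]
    for event in order:
        if event == "send":
            trace.append(("B", "A", "RTS"))
            rts_seen = True
        else:
            receive_posted = True
        # CTS goes out only when both halves of the rendezvous have happened.
        if rts_seen and receive_posted:
            trace.append(("A", "B", "CTS"))
            trace.append(("B", "A", "Data"))
    return trace
```

In this model the wire order is RTS, CTS, Data whichever side arrives first; what changes is which node waits.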

Page 19: Send and Receive Based Message-Passing for SCMP

Ready mode foregoes the handshake to reduce message latency.

Node B holds data required by Node A

Node B sends data when send is executed

User must ensure that receive has executed on Node A

Figure: B sends Data directly to A.

Page 20: Send and Receive Based Message-Passing for SCMP

The implementation centers around two tables.

Send Table Entry

Bits   33-2  1-0
Field  id    state

Receive Table Entry

Bits   83-50  49-29    28-13   12-7    6-3      2-0
Field  id     address  stride  r_node  r_cntxt  state
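As a rough illustration of the receive table entry, the following sketch packs and unpacks the fields using the bit positions read from the slide (id 83-50, address 49-29, stride 28-13, r_node 12-7, r_cntxt 6-3, state 2-0). The names `RECEIVE_FIELDS`, `pack_entry`, and `unpack_entry` are assumptions, and a Python integer stands in for the 84-bit hardware entry.

```python
from typing import Dict

# (field name, high bit, low bit), as read from the Receive Table Entry figure.
RECEIVE_FIELDS = [
    ("id", 83, 50),
    ("address", 49, 29),
    ("stride", 28, 13),
    ("r_node", 12, 7),
    ("r_cntxt", 6, 3),
    ("state", 2, 0),
]


def pack_entry(values: Dict[str, int]) -> int:
    """Pack field values into one integer, checking each field's width."""
    word = 0
    for name, high, low in RECEIVE_FIELDS:
        width = high - low + 1
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << low
    return word


def unpack_entry(word: int) -> Dict[str, int]:
    """Split a packed entry back into its named fields."""
    return {
        name: (word >> low) & ((1 << (high - low + 1)) - 1)
        for name, high, low in RECEIVE_FIELDS
    }
```

The 6-bit r_node and 4-bit r_cntxt widths are consistent with the 64 nodes and 16 threads per node mentioned earlier in the talk.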

Page 21: Send and Receive Based Message-Passing for SCMP

Send Table Entries may be in 4 states, and Receive Table Entries may be in 5 states.

Send Table Entry States

Value  State
00     Empty
01     In Use
10     In Progress
11     Complete

Receive Table Entry States

Value  State
000    Empty
001    In Use
010    In Progress
011    RTS Received
10X    NOT USED
110    NOT USED
111    Complete

Page 22: Send and Receive Based Message-Passing for SCMP

The new messaging system has four message types.

Thread Message

H  T  Payload
1  0  X | Y | THREAD
1  1  Handler Address
0  0  Register Data
.  .  .
0  1  Register Data

Data Message

H  T  Payload
1  0  X | Y | DATA
1  1  Message ID
0  0  Data
.  .  .
0  1  Data

RTS Message

H  T  Payload
1  0  X | Y | RTS | cntxt | node
0  1  Message ID

CTS Message

H  T  Payload
1  0  X | Y | CTS | cntxt
0  1  Message ID

Page 23: Send and Receive Based Message-Passing for SCMP

The NIU now contains a data queue for every context.

Figure: NIU block diagram showing a Data Buffer for each context (Context 1, Context 2), the Thread Buffer, Receive Buffer, CTS Buffer, RTS Buffer, and Memory, with the Injection Channel to the router and the Ejection Channel from the router.

Page 24: Send and Receive Based Message-Passing for SCMP

Only five new instructions and one modified instruction are needed.

Instruction  Arguments                      Description
sendh        d_node, type, d_address        send a header flit
send         data                           send one data flit
send2        data, data                     send two data flits
sende        data                           send one data flit and end message
send2e       data, data                     send two data flits and end message
sendm        l_address, {l_stride, count}   send data block from memory
sendme       l_address, {l_stride, count}   send data block from memory and end message
ldss         r, message_id                  poll send operation status
ldsr         r, message_id                  poll receive operation status
str          message_id, address, stride    store a receive to table
rms          message_id                     clear a send operation
rmr          message_id                     clear a receive operation

Page 25: Send and Receive Based Message-Passing for SCMP

The thread library remains nearly the same.

Operation     Arguments                 Description
createThread  int dst_node              create a thread on dst_node
              void(*addr)()
              void(*callback)()
              …
parExecute    int num_nodes             create threads on num_nodes nodes
              void(*addr)()
              …
getBlock      unsigned int node_id      request a block of values from node node_id
              char *dst_addr
              unsigned int dst_stride
              char **src_addr
              unsigned int src_offset
              unsigned int src_stride
              message_id
              unsigned int num_words

Page 26: Send and Receive Based Message-Passing for SCMP

The new send library is more familiar.

Operation      Arguments        Description
SCMPSendInt    int dst_node     send an integer to dst_node
               int message_id
               int value
SCMPSendFloat  int dst_node     send a double to dst_node
               int message_id
               double value
SCMPSend       int dst_node     send a block of values from memory to dst_node (blocking)
               int message_id
               int *address
               int stride
               int num_words
SCMPSendNB     int dst_node     send a block of values from memory to dst_node (non-blocking)
               int message_id
               int *address
               int stride
               int num_words
SCMPPollSend   int message_id   poll status of send operation
SCMPWaitSend   int message_id   suspend until message sends
SCMPClearSend  int message_id   clear send operation

Page 27: Send and Receive Based Message-Passing for SCMP

The receive library is all new.

Operation         Arguments        Description
SCMPReceive       int message_id   receive a message and store it at address (blocking)
                  int *address
                  int stride
SCMPReceiveNB     int message_id   receive a message and store it at address (non-blocking)
                  int *address
                  int stride
SCMPPollReceive   int message_id   poll status of receive operation
SCMPWaitReceive   int message_id   suspend until message arrives
SCMPClearReceive  int message_id   clear receive operation

Page 28: Send and Receive Based Message-Passing for SCMP

Rendezvous Mode Operation at the Sender

Figure: flowchart of sender-side operation with true/false branches. Chart labels: sendh; No Entry?; Create: Entry->In Use; Queue Head And Tag; SUSPEND; Head Flit @ Queue Head; Send RTS; CTS Message Arrives; No Entry?; ERROR; Queue Waiting; ERROR; Entry->In Progress; Send Flit; Tail Flit Not Sent; Entry->Complete.

Page 29: Send and Receive Based Message-Passing for SCMP

Rendezvous Mode Operation at the Receiver

Figure: flowchart of receiver-side operation with true/false branches. Chart labels: str; No Entry; RTS Rcvd; Record str; Entry->In Use; Send CTS; Entry->In Progress; SUSPEND; RTS Message Arrives; No Entry; In Use; Record RTS; Entry->RTS Rcvd; Send CTS; Entry->In Progress; Block RTS; Data Message Arrives; No Entry; DISCARD; In Progress; Block Data; Store Flit; Tail Flit Not Stored; Entry->Complete.
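One plausible reading of the receiver flowchart is the table-driven state machine below. The state values come from the Receive Table Entry States table, but the exact transitions are an assumption reconstructed from the chart labels, and the names `RxState`, `TRANSITIONS`, and `run_receiver` are hypothetical. Blocking, discarding, and thread suspension are not modeled; unexpected events simply raise an error.

```python
from enum import Enum
from typing import List, Tuple


class RxState(Enum):
    EMPTY = 0b000
    IN_USE = 0b001
    IN_PROGRESS = 0b010
    RTS_RECEIVED = 0b011
    COMPLETE = 0b111


# (current state, event) -> (next state, message sent or None).
# Events: "str" = receive stored to table, "rts" = RTS message arrives,
# "tail" = tail flit of the data message stored, "rmr" = receive cleared.
TRANSITIONS = {
    (RxState.EMPTY, "str"): (RxState.IN_USE, None),
    (RxState.EMPTY, "rts"): (RxState.RTS_RECEIVED, None),
    (RxState.IN_USE, "rts"): (RxState.IN_PROGRESS, "CTS"),
    (RxState.RTS_RECEIVED, "str"): (RxState.IN_PROGRESS, "CTS"),
    (RxState.IN_PROGRESS, "tail"): (RxState.COMPLETE, None),
    (RxState.COMPLETE, "rmr"): (RxState.EMPTY, None),
}


def run_receiver(events: List[str]) -> Tuple[RxState, List[str]]:
    """Apply events to one receive table entry; return final state and sent messages."""
    state = RxState.EMPTY
    sent: List[str] = []
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} not handled in state {state.name}")
        state, message = TRANSITIONS[(state, event)]
        if message is not None:
            sent.append(message)
    return state, sent
```

Under these assumptions, exactly one CTS is sent whether the receive or the RTS arrives first, and the entry reaches Complete only after the tail flit is stored.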

Page 30: Send and Receive Based Message-Passing for SCMP

RTS and CTS Messages also need separate VC paths.

RTS messages can block in the network.

For a given RTS message to leave the network, RTS messages ahead of it must be satisfied
– CTS message to source
– Data message back

RTS and CTS messages have their own VC paths.

Figure: Router with separate Thread VCs, Data VCs, RTS VCs, and CTS VCs between the West and East ports.

Page 31: Send and Receive Based Message-Passing for SCMP

Ready Mode Operation at the Sender

Figure: flowchart of sender-side operation with true/false branches. Chart labels: sendh; No Entry?; Create: Entry->In Use; Queue Head And Tag; SUSPEND; Head Flit @ Queue Head; No Entry?; ERROR; Entry->In Progress; Send Flit; Tail Flit Not Sent; Entry->Complete.

Page 32: Send and Receive Based Message-Passing for SCMP

Ready Mode Operation at the Receiver

Figure: flowchart of receiver-side operation with true/false branches. Chart labels: str; No Entry; Record str; Entry->In Use; SUSPEND; Data Message Arrives; No Entry; DISCARD; In Progress; Block Data; Store Flit; Tail Flit Not Stored; Entry->Complete.

Page 33: Send and Receive Based Message-Passing for SCMP

Stressmark testing was used to verify that performance was not hurt.

DIS Stressmark Suite
– Neighborhood Stressmark
– Matrix Stressmark
– Transitive Closure Stressmark

LU Factorization Stressmark

Page 34: Send and Receive Based Message-Passing for SCMP

The neighborhood stressmark measures image texture.

Every node owns a portion of the total rows

Every row owns complete sum and difference histograms

Each node determines, and requests, the pairs for pixels in its rows

Each node fills in sum and difference histogram

Histograms are shared
– Each node manages only a portion of each histogram
– Only the correct portion is sent to a node
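The sum and difference histograms can be sketched serially as follows. This is a single-node illustration under stated assumptions: the function name `sum_diff_histograms`, the pairing of each pixel with the pixel at a fixed offset (dx, dy), and the list-of-lists image are not taken from the stressmark specification, and the partitioning of rows and histograms across nodes is not modeled.

```python
from collections import Counter
from typing import List, Tuple


def sum_diff_histograms(image: List[List[int]], dx: int, dy: int) -> Tuple[Counter, Counter]:
    """Count pixel-pair sums and differences for pairs separated by (dx, dy)."""
    rows, cols = len(image), len(image[0])
    sums: Counter = Counter()
    diffs: Counter = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                first, second = image[r][c], image[r2][c2]
                sums[first + second] += 1
                diffs[first - second] += 1
    return sums, diffs
```

In a distributed version, the second pixel of a pair may live in rows owned by another node, which is where the requests described above come from.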

Page 35: Send and Receive Based Message-Passing for SCMP

Queues with 16 flits perform best.

Figure: Speedup versus Number of Processors for queue lengths of 2, 4, 8, 16, 32, 64, and 128 flits.

Page 36: Send and Receive Based Message-Passing for SCMP

The new system outperforms the old under the neighborhood stressmark.

Figure: Speedup versus Number of Processors for Ideal, Original SCMP System, and New SCMP System, in three panels: Seven-Bit Pixels, Eleven-Bit Pixels, and Fifteen-Bit Pixels.

Page 37: Send and Receive Based Message-Passing for SCMP

Matrix stressmark solves a linear system of equations using the Conjugate Gradient Method.

$Ax = b$

Additional vectors r and p used for intermediate steps

Every node has:
– Rows of A
– Elements of b and r
– Complete x and p

After each iteration p must be globally redistributed
– Share with columns
– Share with rows
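For reference, a serial sketch of the textbook conjugate gradient iteration shows where r and p appear. It assumes A is symmetric positive definite and stored as a dense list of lists; the function name, tolerance, and iteration cap are illustrative choices, and none of the row distribution or redistribution of p across nodes is modeled.

```python
from typing import List


def conjugate_gradient(A: List[List[float]], b: List[float],
                       tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve A x = b for symmetric positive definite A (textbook CG)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)   # residual r = b - A x, with x starting at zero
    p = list(r)   # search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        # Matrix-vector product: each row of A needs the complete vector p.
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

The matrix-vector product is the step that needs all of p for every row of A, which matches the slide's note that p must be redistributed after each iteration.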

Page 38: Send and Receive Based Message-Passing for SCMP

The new system provides marginal improvement over the original under the matrix stressmark.

Figure: Speedup versus Number of Processors for Ideal, Original SCMP System, and New SCMP System.

Page 39: Send and Receive Based Message-Passing for SCMP

The transitive closure stressmark solves the all-pairs shortest-path problem.

Floyd-Warshall Algorithm
– Adjacency Matrix
  $D[i][j]$
– Iterative Improvements
  $D[i][j] = \min(D[i][j],\ D[i][k] + D[k][j])$

Each node owns sub-block of adjacency matrix
– Each node needs portion of row k
– Each node needs portion of column k

Figure: adjacency matrix divided into sub-blocks owned by nodes 0 to 15.
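A serial sketch of the Floyd-Warshall update makes the data dependence visible: at step k, updating D[i][j] reads D[i][k] and D[k][j], which is why each node needs its portion of row k and column k. The function name and the use of `float("inf")` for missing edges are illustrative assumptions; the block distribution across nodes is not modeled.

```python
from typing import List

INF = float("inf")  # marks "no edge" in the adjacency matrix


def floyd_warshall(D: List[List[float]]) -> List[List[float]]:
    """Return all-pairs shortest-path distances for adjacency matrix D."""
    n = len(D)
    dist = [row[:] for row in D]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Uses column k (dist[i][k]) and row k (dist[k][j]).
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist
```

For the three-vertex cycle `[[0, 3, INF], [INF, 0, 1], [2, INF, 0]]`, the result fills in every missing distance, for example 0 to 2 via 1 with cost 4.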

Page 40: Send and Receive Based Message-Passing for SCMP

The new system provides marginal improvement over the original under the transitive closure stressmark.

Figure: Speedup versus Number of Processors for Ideal, Original SCMP System, and New SCMP System.

Page 41: Send and Receive Based Message-Passing for SCMP

The LU factorization stressmark is used by linear system solvers.

Factors matrix into a lower triangular matrix and an upper triangular matrix.

Matrix is divided into blocks

Pivot block is factored

Pivot column and row blocks are divided by pivot.

Inner active matrix blocks are modified by the pivot row and column blocks.

Figure: blocked matrix showing the Pivot block, the Pivot Row, the Pivot Column, and the Inner Active Matrix.
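The same sequence of steps can be seen in a scalar (block size 1) Doolittle factorization without row exchanges. This is a simplified stand-in rather than the blocked stressmark algorithm: the function name `lu_factor` is an assumption, only the pivot column is scaled in this scalar variant, and a zero pivot is reported as an error instead of being handled by pivoting.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor(A: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (L, U) with L unit lower triangular and L * U == A."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(v) for v in row] for row in A]
    for k in range(n):
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no row exchanges")
        for i in range(k + 1, n):
            # Pivot column entry divided by the pivot.
            L[i][k] = U[i][k] / U[k][k]
            # Inner active matrix updated by pivot row and column.
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U
```

In the blocked form described on the slide, each scalar here becomes a block, and the inner update becomes a block matrix multiply that needs the pivot row and pivot column blocks from other nodes.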

Page 42: Send and Receive Based Message-Passing for SCMP

The new system outperforms the original under the LU factorization stressmark.

Figure: Speedup versus Number of Processors for Ideal, Original SCMP System, and New SCMP System, in two panels: Rendezvous Mode and Ready Mode.

Page 43: Send and Receive Based Message-Passing for SCMP

Send-and-Receive Messaging for SCMP is worthwhile.

Fixes Problems With Original SCMP Messaging System
– Global Buffer Pointers
– Races between Data and SYNC messages
– Request Thread Storms

Programming Model is more familiar

Performance is better

Questions?