
(C) 2004 Daniel Sorin Duke Architecture

Using Speculation to Simplify Multiprocessor Design

Daniel J. Sorin¹, Milo M. K. Martin², Mark D. Hill³, David A. Wood³

¹ Dept. of Electrical & Computer Engineering, Duke University
² Dept. of Computer & Information Science, Univ. of Pennsylvania
³ Computer Sciences Dept., University of Wisconsin-Madison

IPDPS 2004 – Daniel Sorin

My Talk in One Slide

• Shared memory multiprocessors are complicated
  – Difficult to design for every possible corner case

• Proposal: Use speculation to target the common case
  – Speculate that corner cases won’t happen
  – Detect if they do occur and recover system
  – Ensure forward progress

• Case studies
  – Simplify cache coherence protocols
  – Simplify the interconnection network


Speculation for Simplicity

• Why we want to avoid complexity
  – Time and money for design and verification

• Design for the common case
  – But we have to make ALL cases work correctly

• Examples of this philosophy in uniprocessors
  – Trapping to software for infrequent/obsolescent instructions
  – Pentium 4 recovers from edge-case scheduler deadlocks

• But this idea hadn’t been used for multiprocessors
  – Key: we now have efficient multiprocessor recovery


Framework for Speculation

• Four keys to design simplification with speculation

1) Ensure that mis-speculations are rare

2) Detect all mis-speculations

3) Recover from mis-speculations

4) Ensure forward progress even for worst-case
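The four keys can be sketched as a small software loop. This is only an illustrative analogy, not the hardware mechanism from the talk: the names `run`, `MisSpeculation`, `checkpoint_interval`, and `slow_start_window` are hypothetical, and "slow-start" is modeled here as a conservative mode in which the corner case cannot occur.

```python
class MisSpeculation(Exception):
    """Raised by an operation that detects an unhandled corner case."""


def run(ops, checkpoint_interval=4, slow_start_window=4):
    """Run ops speculatively; roll back and replay conservatively on mis-speculation.

    Each op is called as op(state, conservative) and may raise MisSpeculation.
    Returns (final_state, number_of_recoveries).
    """
    # The slow-start window must cover the failing op after a rollback,
    # otherwise the same mis-speculation could recur forever.
    assert slow_start_window >= checkpoint_interval

    state = {}
    checkpoint_state, checkpoint_pos = {}, 0
    slow_start = 0          # ops still to be run in conservative mode
    recoveries = 0
    pos = 0
    while pos < len(ops):
        if pos % checkpoint_interval == 0:
            checkpoint_state, checkpoint_pos = dict(state), pos
        try:
            ops[pos](state, slow_start > 0)
        except MisSpeculation:
            # Key 3: recover to the last checkpoint.
            state, pos = dict(checkpoint_state), checkpoint_pos
            # Key 4: force conservative operation for a while.
            slow_start = slow_start_window
            recoveries += 1
            continue
        if slow_start > 0:
            slow_start -= 1
        pos += 1
    return state, recoveries


def increment(state, conservative):
    state["count"] = state.get("count", 0) + 1


def racy_increment(state, conservative):
    # Key 2: the corner case is detected rather than handled.
    if not conservative:
        raise MisSpeculation("corner case hit")
    state["count"] = state.get("count", 0) + 1


ops = [increment, increment, racy_increment, increment, increment]
final_state, recoveries = run(ops)
```

In this toy run the third operation mis-speculates once, the loop rolls back to the checkpoint at position 0, and the replay completes in conservative mode, so all five increments take effect after a single recovery.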


SafetyNet Checkpoint/Recovery

• We use SafetyNet [ISCA 2002] for system recovery
• All-hardware checkpoint/recovery for shared memory multiprocessors
• Periodically, takes logical checkpoints of system
  – Including caches, coherence state, memory, directory state
  – Implements checkpointing with incremental logging
  – Consistent checkpoints using logical time coordination
• Can recover 100,000+ cycles
• Negligible performance impact
  – Incremental logging performed off critical path
• Small log buffers (512 KB) at caches & memories
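To make "checkpointing with incremental logging" concrete, here is a minimal software sketch of an undo log. It is an analogy only and makes assumptions the slide does not state: the class `LoggedMemory` and its methods are hypothetical, and it assumes the old value is logged on the first write to an address in each checkpoint interval.

```python
_ABSENT = object()  # sentinel: the address held no value before the write


class LoggedMemory:
    def __init__(self):
        self.data = {}
        self.checkpoint_number = 0
        self.log = []          # entries: (checkpoint_number, address, old_value)
        self._logged = set()   # addresses already logged in the current interval

    def write(self, addr, value):
        # Log the old value only once per address per checkpoint interval.
        if addr not in self._logged:
            self.log.append((self.checkpoint_number, addr, self.data.get(addr, _ABSENT)))
            self._logged.add(addr)
        self.data[addr] = value

    def new_checkpoint(self):
        # Taking a checkpoint copies nothing; it only starts a new logging interval.
        self.checkpoint_number += 1
        self._logged.clear()

    def recover_to(self, checkpoint_number):
        # Undo, newest first, every write made at or after the given checkpoint.
        while self.log and self.log[-1][0] >= checkpoint_number:
            _, addr, old = self.log.pop()
            if old is _ABSENT:
                self.data.pop(addr, None)
            else:
                self.data[addr] = old
        self.checkpoint_number = checkpoint_number
        self._logged.clear()


mem = LoggedMemory()
mem.write(0x10, 1)
mem.new_checkpoint()      # checkpoint 1
mem.write(0x10, 2)
mem.write(0x20, 7)
mem.write(0x10, 3)        # not logged again: 0x10 already logged this interval
mem.new_checkpoint()      # checkpoint 2
mem.write(0x20, 9)
mem.recover_to(1)         # restore the state as of checkpoint 1
```

The point of the sketch is that a checkpoint costs almost nothing when taken; the work is spread across the writes that follow it, which is what allows the logging to stay off the critical path.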


The Need for Multiprocessor Recovery

• Assumption: multiprocessors will have system-wide recovery mechanisms for purposes of availability
  – As fault rates keep increasing, recovery is crucial

• Will be all-hardware (like SafetyNet) for performance
  – But many alternative designs are possible

• We leverage this recovery mechanism for recovering from mis-speculations


Outline

• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
• Evaluation
• Conclusions


Directory Protocol Complexity

• We want adaptive routing in interconnection network
  – Better performance and availability
  – But adaptive routing precludes point-to-point ordering

• So what?
  – Point-to-point ordering simplifies protocol design
  – Eliminates several potential corner case races


Race Case in Directory Protocol

• Example race if no point-to-point ordering in network

[Diagram: message sequence among P1, Dir, and P2]

1) P2 sends RequestReadWrite to Dir; P1 sends Writeback to Dir

2) RequestReadWrite arrives first at Dir, gets forwarded to P1

3) Dir sends Writeback Ack to P1

4) Forwarded RequestReadWrite arrives at P1 after Writeback Ack

• Problem: P1 sees Forwarded Request in state Invalid

• Not possible if point-to-point order in interconnection network


Simplifying a Directory Protocol

• Speculate that adaptive network provides ordering

1) Why is mis-speculation rare?
  – Not many re-orderings
  – Most re-orderings don’t matter!

2) How do we detect all mis-speculations?
  – If we get a Forwarded RequestReadWrite in state Invalid

3) How do we recover?
  – SafetyNet

4) How do we ensure forward progress?
  – Slow-start operation for a while after recovery
  – Guarantees that this race can’t keep recurring
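The detection rule in point 2 can be sketched as a toy model of P1's cache controller. The state names, the `CacheController` class, and the `mis_speculated` helper are hypothetical; only the rule "Forwarded RequestReadWrite in state Invalid means mis-speculation" comes from the slides, and the handling of the in-order case is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    MODIFIED = auto()
    WRITEBACK_PENDING = auto()   # Writeback sent, waiting for Writeback Ack
    INVALID = auto()


class MisSpeculation(Exception):
    """The network re-ordered two messages in a way the protocol does not handle."""


class CacheController:
    def __init__(self):
        self.state = State.MODIFIED

    def start_writeback(self):
        self.state = State.WRITEBACK_PENDING

    def receive(self, message):
        if message == "WritebackAck":
            self.state = State.INVALID
            return None
        if message == "ForwardedRequestReadWrite":
            if self.state is State.INVALID:
                # No dedicated handling for this race: flag it and let
                # system recovery deal with it.
                raise MisSpeculation("Forwarded RequestReadWrite in state Invalid")
            # Assumed in-order behavior: still holding the block, so supply it.
            return "Data"
        raise ValueError(f"unknown message: {message}")


def mis_speculated(messages):
    """Deliver messages to P1 in the given order; report whether the race was hit."""
    p1 = CacheController()
    p1.start_writeback()
    try:
        for message in messages:
            p1.receive(message)
    except MisSpeculation:
        return True
    return False


in_order = mis_speculated(["ForwardedRequestReadWrite", "WritebackAck"])
reordered = mis_speculated(["WritebackAck", "ForwardedRequestReadWrite"])
```

The simplification is that the controller needs no extra transient state for the re-ordered case; a single check turns the race into a detected mis-speculation.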


Simplifying a Snooping Coherence Protocol

• During design, we missed a corner case

[State diagram: State M, State trans1, and State trans2, with transitions labeled Writeback and Request ReadWrite; the handling of a Request ReadWrite arriving in State trans2 is marked "???"]

• Solution: it’s rare, treat it as mis-speculation
• Detect by seeing RequestReadWrite in state trans2
• Recovery with SafetyNet
• Forward progress with slow-start after recovery


Outline

• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
  – Deadlock
  – Avoiding deadlock
• Evaluation
• Conclusions


Two Causes of Deadlock

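[Figure: two causes of deadlock]

• Endpoint deadlock: P1 and P2 each have a queue full of requests, so neither can accept the Response it is waiting for

• Switch deadlock: switch1 and switch2 each have buffers full of messages, so neither Message M1 nor Message M2 can advance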


Avoiding Deadlock

• Simple but wasteful solution: full buffering
  – But it’s rare that we ever need full buffering

• More efficient solution: virtual channels (networks)

• For endpoint deadlock
  – Need a virtual network per type of message

• For switch deadlock
  – Need some number of virtual channels per virtual network
  – Depends on network topology and routing scheme

• A major source of design complexity
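A minimal sketch of why a virtual network per message type avoids endpoint deadlock: with one shared buffer, a Response can be stuck behind Requests, while separate buffers always leave room for it. The `InputBuffers` class, the capacity of 2, and the network names are hypothetical and chosen only for illustration.

```python
from collections import deque


class InputBuffers:
    def __init__(self, capacity, network_of):
        self.capacity = capacity        # entries per virtual network
        self.network_of = network_of    # message type -> virtual network name
        self.queues = {name: deque() for name in set(network_of.values())}

    def try_enqueue(self, msg_type, payload):
        """Return True if the message was buffered, False if its network is full."""
        queue = self.queues[self.network_of[msg_type]]
        if len(queue) >= self.capacity:
            return False
        queue.append((msg_type, payload))
        return True


# One shared network: requests can fill the buffer and block the response.
shared = InputBuffers(2, {"Request": "vn0", "Response": "vn0"})
shared.try_enqueue("Request", "r1")
shared.try_enqueue("Request", "r2")
shared_accepts_response = shared.try_enqueue("Response", "d1")

# One virtual network per message type: the response has its own buffer.
split = InputBuffers(2, {"Request": "vn_req", "Response": "vn_resp"})
split.try_enqueue("Request", "r1")
split.try_enqueue("Request", "r2")
split_accepts_response = split.try_enqueue("Response", "d1")
```

The cost is visible even in this toy: every additional message type adds another set of buffers, which is part of the complexity the talk proposes to avoid.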


Simplifying Deadlock Avoidance

• Speculate that deadlock won’t occur, despite using less than full buffering and no virtual channels

1) Why is mis-speculation rare?
  – Can usually avoid deadlock with reasonable buffering

2) How do we detect all mis-speculations?
  – Timeout mechanism for cache coherence transactions

3) How do we recover?
  – SafetyNet

4) How do we ensure forward progress?
  – Slow-start operation for a while after recovery
  – Guarantees that deadlock can’t keep recurring
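The timeout-based detection in point 2 might look like the following sketch. The `TransactionWatchdog` class and the 1,000-cycle timeout are hypothetical. Note that a timeout can also fire for a transaction that is merely slow; under this scheme that would presumably cost an unnecessary recovery rather than a correctness problem.

```python
class TransactionWatchdog:
    def __init__(self, timeout_cycles):
        self.timeout_cycles = timeout_cycles
        self.issued_at = {}   # transaction id -> cycle at which it was issued

    def issue(self, txn_id, cycle):
        self.issued_at[txn_id] = cycle

    def complete(self, txn_id):
        self.issued_at.pop(txn_id, None)

    def timed_out(self, cycle):
        """Return the ids of outstanding transactions older than the timeout."""
        return sorted(
            txn_id
            for txn_id, issued in self.issued_at.items()
            if cycle - issued > self.timeout_cycles
        )


watchdog = TransactionWatchdog(timeout_cycles=1000)
watchdog.issue("A", cycle=0)
watchdog.issue("B", cycle=500)
watchdog.complete("A")                  # A finished normally
early = watchdog.timed_out(cycle=1200)  # B is only 700 cycles old
late = watchdog.timed_out(cycle=1501)   # B has been outstanding for 1001 cycles
```

A non-empty result would be treated as a suspected deadlock and would trigger recovery followed by slow-start operation.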


Outline

• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
• Evaluation
  – Goals
  – Methodology
  – Results
• Conclusions


Goals

• Discover the point at which mis-speculation recoveries impact performance
  – Determines whether our simplified snooping protocol and our simplified interconnection network are viable

• Determine whether our simplified directory protocol can usefully speculate on point-to-point ordering


Methodology

• Full-system simulation
  – Simics provides full-system functionality
  – We added detailed timing model for memory system

• Workloads
  – Online transaction processing (OLTP) with DB2
  – SPECjbb2000 Java middleware
  – Apache static web serving
  – Slashcode dynamic web serving
  – Barnes-Hut scientific simulation


How Rare Must Mis-speculation Be?

We can tolerate high mis-speculation rates – these rates are much higher than what our simplified designs incur
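A back-of-the-envelope model shows why fairly frequent recoveries can still be cheap. The formula and every number below are illustrative assumptions, not results from the talk: the model charges each recovery a fixed number of lost cycles and ignores any slowdown from slow-start operation.

```python
def recovery_overhead(recoveries_per_second, cycles_lost_per_recovery, clock_hz):
    """Fraction of execution time spent redoing work lost to recoveries."""
    return recoveries_per_second * cycles_lost_per_recovery / clock_hz


# Hypothetical inputs: 10 recoveries per second, 100,000 cycles lost per
# recovery, and a 1 GHz clock.
overhead = recovery_overhead(10, 100_000, 1e9)
```

With these assumed inputs the overhead comes to about 0.1% of execution time, since 10 × 100,000 cycles is one million cycles out of the one billion available per second.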


Adaptive Routing with Speculative Ordering

Adaptive routing can provide better performance by routing around congestion, even with mis-speculations


Conclusions

• Simplify multiprocessor design with speculation
  – Treat corner cases as mis-speculations & recover from them

• Must be able to ensure that
  – Mis-speculations are sufficiently rare
  – Can detect all mis-speculations
  – Can recover from mis-speculations
  – Can provide forward progress in all cases

• Showed how to simplify
  – Cache coherence protocols
  – Interconnection network deadlock avoidance

• Applicable to other complicated designs
