

(C) 2004 Daniel Sorin Duke Architecture

Using Speculation to Simplify Multiprocessor Design

Daniel J. Sorin¹, Milo M. K. Martin², Mark D. Hill³, David A. Wood³

¹ Dept. of Electrical & Computer Engineering, Duke University
² Dept. of Computer & Information Science, Univ. of Pennsylvania
³ Computer Sciences Dept., University of Wisconsin-Madison

IPDPS 2004 – Daniel Sorin

My Talk in One Slide

• Shared memory multiprocessors are complicated
  – Difficult to design for every possible corner case

• Proposal: Use speculation to target the common case
  – Speculate that corner cases won’t happen
  – Detect if they do occur and recover system
  – Ensure forward progress

• Case studies
  – Simplify cache coherence protocols
  – Simplify the interconnection network


Speculation for Simplicity

• Why we want to avoid complexity
  – Time and money for design and verification

• Design for the common case
  – But we have to make ALL cases work correctly

• Examples of this philosophy in uniprocessors
  – Trapping to software for infrequent/obsolescent instructions
  – Pentium 4 recovers from edge-case scheduler deadlocks

• But this idea hadn’t been used for multiprocessors
  – Key: we now have efficient multiprocessor recovery


Framework for Speculation

• Four keys to design simplification with speculation

1) Ensure that mis-speculations are rare

2) Detect all mis-speculations

3) Recover from mis-speculations

4) Ensure forward progress even in the worst case
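The four keys can be read as a control loop. The following is a minimal software sketch of that loop, not the hardware mechanism from the talk: the names `run_with_speculation`, `MisSpeculation`, `increment`, and `racy_increment` are hypothetical, checkpoints are plain dictionary copies, and "slow-start" is modeled as a flag that forces a step onto a conservative path that is assumed never to mis-speculate.

```python
from typing import Callable, Dict, List


class MisSpeculation(Exception):
    """Raised by a detector when a speculated-away corner case occurs."""


# A step receives the system state and a flag saying whether it must
# take the conservative (slow-start) path.
Step = Callable[[Dict[str, int], bool], None]


def run_with_speculation(steps: List[Step], state: Dict[str, int],
                         checkpoint_interval: int = 2,
                         slow_start_steps: int = 2) -> int:
    """Run all steps speculatively and return the number of recoveries."""
    if slow_start_steps < 1:
        raise ValueError("slow-start must cover at least the failing step")
    recoveries = 0
    saved_state, saved_pc = dict(state), 0
    safe_until = 0  # steps with index below this run in slow-start mode
    pc = 0
    while pc < len(steps):
        if pc % checkpoint_interval == 0:
            saved_state, saved_pc = dict(state), pc   # periodic checkpoint
        try:
            steps[pc](state, pc < safe_until)          # key 2: step may detect
            pc += 1
        except MisSpeculation:
            recoveries += 1
            safe_until = pc + slow_start_steps         # key 4: slow-start
            state.clear()
            state.update(saved_state)                  # key 3: roll back
            pc = saved_pc
    return recoveries


def increment(state: Dict[str, int], safe_mode: bool) -> None:
    state["x"] += 1


def racy_increment(state: Dict[str, int], safe_mode: bool) -> None:
    # Hypothetical corner case: the fast path always hits the race here.
    if not safe_mode:
        raise MisSpeculation("corner case on the fast path")
    state["x"] += 1
```

Key 1 (rarity) is not enforced by the loop itself; it is a property of the workload and determines how much time is lost to rollbacks.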


SafetyNet Checkpoint/Recovery

• We use SafetyNet [ISCA 2002] for system recovery

• All-hardware checkpoint/recovery for shared memory multiprocessors

• Periodically takes logical checkpoints of system
  – Including caches, coherence state, memory, directory state
  – Implements checkpointing with incremental logging
  – Consistent checkpoints using logical time coordination

• Can recover 100,000+ cycles

• Negligible performance impact
  – Incremental logging performed off critical path

• Small log buffers (512 KB) at caches & memories
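To illustrate what incremental logging means, here is a small software analogy, not SafetyNet's hardware design. It assumes an undo log that saves a location's old value only on the first write after a checkpoint, and it keeps a single checkpoint. The class name `LoggedMemory` and its methods are hypothetical.

```python
_ABSENT = object()  # marks an address that had no value at the checkpoint


class LoggedMemory:
    """Memory with one checkpoint, implemented as an incremental undo log."""

    def __init__(self):
        self.data = {}
        self.log = {}  # address -> value the address held at the checkpoint

    def checkpoint(self):
        # Taking a checkpoint copies nothing; it just starts a fresh log.
        self.log.clear()

    def write(self, addr, value):
        # Log the old value only on the first write since the checkpoint.
        if addr not in self.log:
            self.log[addr] = self.data.get(addr, _ABSENT)
        self.data[addr] = value

    def recover(self):
        # Undo every logged write, restoring the checkpointed contents.
        for addr, old in self.log.items():
            if old is _ABSENT:
                del self.data[addr]
            else:
                self.data[addr] = old
        self.log.clear()
```

In this sketch the log grows with the number of distinct locations written in an interval, not with the total number of writes, which is one way a bounded log buffer can cover a long recovery window.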


The Need for Multiprocessor Recovery

• Assumption: multiprocessors will have system-wide recovery mechanisms for purposes of availability
  – As fault rates keep increasing, recovery is crucial

• Will be all-hardware (like SafetyNet) for performance
  – But many alternative designs are possible

• We leverage this recovery mechanism for recovering from mis-speculations


Outline

• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
• Evaluation
• Conclusions


Directory Protocol Complexity

• We want adaptive routing in interconnection network
  – Better performance and availability
  – But adaptive routing precludes point-to-point ordering

• So what?
  – Point-to-point ordering simplifies protocol design
  – Eliminates several potential corner case races


Race Case in Directory Protocol

• Example race if no point-to-point ordering in network

[Figure: message diagram among P1, Dir, and P2, with messages RequestReadWrite, Writeback, and Forwarded RequestReadWrite]

RequestReadWrite arrives first at Dir, gets forwarded to P1


Race Case in Directory Protocol

[Figure: message diagram among P1, Dir, and P2, with messages RequestReadWrite, Forwarded RequestReadWrite, Writeback, and Writeback Ack]

Forwarded RequestReadWrite arrives after Writeback Ack


Race Case in Directory Protocol

• Problem: P1 sees Forwarded Request in state Invalid

[Figure: message diagram among P1, Dir, and P2, with messages RequestReadWrite, Forwarded RequestReadWrite, Writeback, and Writeback Ack]

Not possible with point-to-point ordering in the interconnection network


Simplifying a Directory Protocol

• Speculate that adaptive network provides ordering

1) Why is mis-speculation rare?
  – Not many re-orderings
  – Most re-orderings don’t matter!

2) How do we detect all mis-speculations?
  – If we get a Forwarded RequestReadWrite in state Invalid

3) How do we recover?
  – SafetyNet

4) How do we ensure forward progress?
  – Slow-start operation for a while after recovery
  – Guarantees that this race can’t keep recurring
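The detection rule in step 2 can be sketched as a toy model of P1's cache controller. This is an illustration under assumptions, not the protocol from the talk: the class `P1Cache`, the transient state name `WB_PENDING`, and the way P1 answers a forwarded request while its writeback is pending are all hypothetical. The only behavior taken from the slides is that a Forwarded RequestReadWrite seen in state Invalid is treated as a mis-speculation.

```python
class MisSpeculation(Exception):
    """Signals that the speculated network ordering was violated."""


class P1Cache:
    """Toy controller for one block that P1 initially owns (state M)."""

    def __init__(self):
        self.state = "M"
        self.sent = []

    def evict(self):
        if self.state != "M":
            raise ValueError("can only write back an owned block")
        self.sent.append("Writeback")
        self.state = "WB_PENDING"  # hypothetical transient state

    def receive(self, message):
        if message == "WritebackAck" and self.state == "WB_PENDING":
            self.state = "I"
        elif message == "ForwardedRequestReadWrite":
            if self.state == "I":
                # No handler for this corner case: detect it and let the
                # system-wide recovery mechanism roll back.
                raise MisSpeculation("forwarded request in state Invalid")
            self.sent.append("Data")  # supply the block to the requester
            if self.state == "M":
                self.state = "I"
        else:
            raise ValueError(f"unexpected {message} in state {self.state}")


def deliver(messages):
    """Evict the block at P1, then deliver messages in the given order."""
    p1 = P1Cache()
    p1.evict()
    for message in messages:
        p1.receive(message)
    return p1
```

Delivering `["ForwardedRequestReadWrite", "WritebackAck"]` models the ordered case and completes normally; swapping the two models the adaptive-routing race and raises `MisSpeculation`.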


Simplifying a Snooping Coherence Protocol

• During design, we missed a corner case

[Figure: state diagram with State M, State trans1, and State trans2, transitions labeled Writeback and RequestReadWrite, and one outcome marked "???"]

• Solution: it’s rare, treat it as mis-speculation

• Detect by seeing RequestReadWrite in state trans2

• Recovery with SafetyNet

• Forward progress with slow-start after recovery


Outline

• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
  – Deadlock
  – Avoiding deadlock
• Evaluation
• Conclusions


Two Causes of Deadlock

[Figure: two panels. Endpoint deadlock: P1 and P2, each with a Response blocked by buffers full of requests. Switch deadlock: switch1 and switch2, with Message M1 and Message M2 blocked by buffers full of messages.]


Avoiding Deadlock

• Simple but wasteful solution: full buffering
  – But it’s rare that we ever need full buffering

• More efficient solution: virtual channels (networks)

• For endpoint deadlock
  – Need a virtual network per type of message

• For switch deadlock
  – Need some number of virtual channels per virtual network
  – Depends on network topology and routing scheme

• A major source of design complexity
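As a rough illustration of why endpoint deadlock calls for a virtual network per message type, the sketch below compares one shared input buffer with per-type buffers. It is a simplified model under assumptions: `EndpointBuffers` and `fill_with_requests_then_respond` are hypothetical names, the buffer size of 2 is arbitrary, and the per-type configuration has more total buffering than the shared one.

```python
from collections import deque


class EndpointBuffers:
    """Bounded input buffering at an endpoint, shared or split by type."""

    def __init__(self, message_types, slots, virtual_networks):
        self.slots = slots
        if virtual_networks:
            # One independent queue (virtual network) per message type.
            self.queues = {t: deque() for t in message_types}
        else:
            # Every message type competes for the same queue.
            shared = deque()
            self.queues = {t: shared for t in message_types}

    def try_accept(self, message_type):
        queue = self.queues[message_type]
        if len(queue) >= self.slots:
            return False  # no free slot: the message stays in the network
        queue.append(message_type)
        return True


def fill_with_requests_then_respond(virtual_networks):
    """Fill the endpoint with requests, then see if a response still fits."""
    buffers = EndpointBuffers(["Request", "Response"], slots=2,
                              virtual_networks=virtual_networks)
    while buffers.try_accept("Request"):
        pass
    return buffers.try_accept("Response")
```

With a shared buffer the response is refused once requests fill the queue, which is the blocked condition in the endpoint deadlock picture; with per-type queues the response is still accepted.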


Simplifying Deadlock Avoidance

• Speculate that deadlock won’t occur, despite using less than full buffering and no virtual channels

1) Why is mis-speculation rare?
  – Can usually avoid deadlock with reasonable buffering

2) How do we detect all mis-speculations?
  – Timeout mechanism for cache coherence transactions

3) How do we recover?
  – SafetyNet

4) How do we ensure forward progress?
  – Slow-start operation for a while after recovery
  – Guarantees that deadlock can’t keep recurring
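The timeout-based detection in step 2 might look like the following sketch. The talk does not give a timeout value or a mechanism design, so the class `TransactionWatchdog`, its method names, and the cycle counts used with it are assumptions for illustration only.

```python
class TransactionWatchdog:
    """Tracks outstanding coherence transactions and flags stalled ones."""

    def __init__(self, timeout_cycles):
        self.timeout_cycles = timeout_cycles
        self.start_cycle = {}  # transaction id -> cycle it was issued

    def start(self, txn_id, now):
        self.start_cycle[txn_id] = now

    def complete(self, txn_id):
        # A completed transaction can no longer time out.
        self.start_cycle.pop(txn_id, None)

    def expired(self, now):
        """Return ids of transactions outstanding for at least the timeout."""
        return sorted(txn for txn, started in self.start_cycle.items()
                      if now - started >= self.timeout_cycles)
```

Any id returned by `expired` would be treated as a suspected deadlock and trigger recovery. A timeout that fires on a transaction that is merely slow causes an unnecessary recovery, which costs time but does not affect correctness, so the threshold can be chosen conservatively.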


Outline

• A Framework for Speculation
• Simplifying Cache Coherence Protocols
• Simplifying the Interconnection Network
• Evaluation
  – Goals
  – Methodology
  – Results
• Conclusions


Goals

• Discover the point at which mis-speculation recoveries impact performance
  – Determines whether our simplified snooping protocol and our simplified interconnection network are viable

• Determine whether our simplified directory protocol can usefully speculate on point-to-point ordering


Methodology

• Full-system simulation
  – Simics provides full-system functionality
  – We added detailed timing model for memory system

• Workloads
  – Online transaction processing (OLTP) with DB2
  – SPECjbb2000 Java middleware
  – Apache static web serving
  – Slashcode dynamic web serving
  – Barnes-Hut scientific simulation


How Rare Must Mis-speculation Be?

We can tolerate high mis-speculation rates – these rates are much higher than what our simplified designs incur
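A back-of-the-envelope model shows why fairly frequent recoveries can still be cheap. This is not the evaluation from the talk: the function `recovery_overhead` is hypothetical, and the clock rate and recovery rate in the example are illustrative assumptions, with the cycles lost per recovery set to the 100,000-cycle scale mentioned for SafetyNet.

```python
def recovery_overhead(recoveries_per_second, cycles_lost_per_recovery,
                      clock_hz):
    """Fraction of execution time spent on rolled-back work and recovery.

    Assumes recoveries are independent and each costs a fixed number of
    cycles, which ignores effects such as slow-start after a recovery.
    """
    return recoveries_per_second * cycles_lost_per_recovery / clock_hz


# Illustrative numbers only: 10 recoveries per second, 100,000 cycles lost
# per recovery, on a 1 GHz clock.
example = recovery_overhead(10, 100_000, 1e9)
```

Under these assumed numbers about 0.1% of cycles are lost, so the overhead stays small until recoveries become far more frequent.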


Adaptive Routing with Speculative Ordering

Adaptive routing can provide better performance by routing around congestion, even with mis-speculations


Conclusions

• Simplify multiprocessor design with speculation
  – Treat corner cases as mis-speculations & recover from them

• Must be able to ensure that
  – Mis-speculations are sufficiently rare
  – All mis-speculations can be detected
  – The system can recover from mis-speculations
  – Forward progress is provided in all cases

• Showed how to simplify
  – Cache coherence protocols
  – Interconnection network deadlock avoidance

• Applicable to other complicated designs