
Circuit-Switched Coherence

Natalie Enright Jerger*, Li-Shiuan Peh+, Mikko Lipasti*

*University of Wisconsin - Madison +Princeton University

2nd IEEE International Symposium on Networks-on-Chip


Motivation

- Network on Chip for general purpose multi-core
  - Replacing dedicated global wires
  - Efficient/scalable communication on-chip
- Router latency overhead can be significant
  - Exploit application characteristics to lower latency
  - Co-design coherence protocol to match network functionality

04/18/23 Natalie Enright Jerger - University of Wisconsin


Executive Summary

- Hybrid Network
  - Interleaves circuit-switched and packet-switched flits
  - Optimize setup latency
  - Improve throughput over traditional circuit-switching
  - Reduce interconnect delay by up to 22%
- Co-design cache coherence protocol
  - Improves performance by up to 17%


Switching Techniques

- Packet Switching
  - Efficient bandwidth utilization
  - Router latency overhead
- Circuit Switching
  - Poor bandwidth utilization
    - Stalled requests due to unavailable resources
  - Low latency
    - Avoids router overhead after circuit is established

Best of both worlds? Efficient bandwidth utilization + low latency


Circuit-Switched Coherence

- Two key observations
  - Commercial workloads are very sensitive to communication latency
  - Significant pair-wise sharing

Construct fast pair-wise circuits?

Commercial Workloads: SpecJBB, SpecWeb, TPC-H, TPC-W
Scientific Workloads: Barnes-Hut, Ocean, Radiosity, Raytrace


Traditional Circuit Switching

Traditional circuit-switching hurts performance by up to ~7%

*Data collected for a 16-core chip multiprocessor with in-order cores


Circuit Switching Redesigned

- Latency is critical
- Utilize Circuit Switching for lower latency
  - A circuit connects resources across multiple hops to avoid router overhead
- Traditional circuit-switching performs poorly
- My contributions
  - Novel setup mechanism
  - Bandwidth stealing


Outline

- Motivation
- Router Design
  - Setup Mechanism
  - Bandwidth Stealing
- Coherence Protocol Co-design
  - Pair-wise sharing
  - 3-hop optimization
  - Region prediction
- Results
- Conclusions


Traditional Circuit Switching Path Setup (with Acknowledgement)

- Significant latency overhead prior to data transfer
- Other requests forced to wait for resources

[Figure: path setup diagram; legend: Configuration Probe, Acknowledgement, Data, Circuit; labels: 0, 5, 9]


Novel Circuit Setup Policy

- Overlap circuit setup with 1st data transfer
- Reconfigure existing circuits if no unused links available
  - Allows piggy-backed request to always achieve low latency
- Multiple circuit planes prevent frequent reconfiguration

[Figure: circuit setup diagram; legend: Configuration Packet, Data, Circuit; labels: A, 0, 5]
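The setup policy on this slide can be sketched as a small bookkeeping model. This is only an illustration under stated assumptions: the class name `CircuitPlanes`, the single-source view, and the oldest-first replacement choice are hypothetical and are not taken from the talk. It shows why more planes mean fewer reconfigurations: a new circuit takes an unused plane if one exists and only replaces an existing circuit when none is free.

```python
from collections import OrderedDict


class CircuitPlanes:
    """Toy model of one source's circuits: plane id -> destination (hypothetical)."""

    def __init__(self, num_planes=2):
        self.num_planes = num_planes
        self.circuits = OrderedDict()  # insertion order = age, oldest first

    def setup(self, dest):
        """Return (plane, reconfigured) for a circuit to dest."""
        for plane, current_dest in self.circuits.items():
            if current_dest == dest:
                return plane, False  # a circuit to dest already exists: reuse it
        if len(self.circuits) < self.num_planes:
            plane = len(self.circuits)  # an unused plane is still available
            self.circuits[plane] = dest
            return plane, False
        # No unused plane: reconfigure an existing circuit.
        # Assumption: the oldest circuit is the one replaced.
        plane, _ = self.circuits.popitem(last=False)
        self.circuits[plane] = dest
        return plane, True


planes = CircuitPlanes(num_planes=2)
results = [planes.setup(d) for d in (5, 9, 5, 3)]
```

With two planes, the first three requests need no reconfiguration; only the request to a third destination replaces a circuit.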


Setup Network

- Light-weight setup network
  - Narrow: Circuit plane identifier (2 bits) + Destination (4 bits)
  - Low load: no virtual channels, small area footprint
- Stores circuit configuration information
  - Multiple narrow circuit planes prevent frequent reconfiguration
- Reconfiguration
  - Buffered, traverses packet-switched pipeline
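To make the width of the setup network concrete, here is a minimal sketch of packing the two fields named on the slide into a 6-bit value. The field widths come from the slide; the bit layout (plane identifier in the high bits) and the function names are assumptions made for illustration.

```python
PLANE_BITS = 2  # circuit plane identifier width (from the slide)
DEST_BITS = 4   # destination width (from the slide); enough for 16 nodes


def pack_setup_flit(plane, dest):
    """Pack a setup message into PLANE_BITS + DEST_BITS bits (assumed layout)."""
    if not 0 <= plane < (1 << PLANE_BITS):
        raise ValueError("plane does not fit in 2 bits")
    if not 0 <= dest < (1 << DEST_BITS):
        raise ValueError("destination does not fit in 4 bits")
    return (plane << DEST_BITS) | dest  # plane in the high bits, dest in the low bits


def unpack_setup_flit(flit):
    """Recover (plane, dest) from a packed setup message."""
    return flit >> DEST_BITS, flit & ((1 << DEST_BITS) - 1)


flit = pack_setup_flit(plane=2, dest=9)
```

Four destination bits address the 16 nodes of a 4x4 mesh, and two plane bits distinguish up to four circuit planes.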


Packet-Switched Bandwidth Stealing

- Remember: the problem with traditional circuit-switching is poor bandwidth utilization
  - Need to overcome this limitation
- Hybrid Circuit-Switched Solution: packet-switched messages snoop incoming links
  - When there are no circuit-switched messages on the link, a waiting packet-switched message can steal idle bandwidth
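The stealing rule can be sketched as a per-cycle choice on one link. This is a simplified illustration, not the router's actual arbitration logic: the function names and the one-flit-per-cycle model are assumptions.

```python
from collections import deque


def arbitrate_link(circuit_flit, packet_queue):
    """Choose the flit that uses the link this cycle."""
    if circuit_flit is not None:
        return circuit_flit  # the circuit-switched flit keeps its slot
    if packet_queue:
        return packet_queue.popleft()  # idle slot: a waiting packet flit steals it
    return None  # nothing to send


def simulate(circuit_slots, packet_flits):
    """Run the rule over a sequence of cycles; None marks an idle circuit slot."""
    queue = deque(packet_flits)
    return [arbitrate_link(slot, queue) for slot in circuit_slots]


schedule = simulate(["C0", None, "C1", None, None], ["P0", "P1"])
```

In this run the packet-switched flits fill the two cycles in which the circuit carries nothing, so the link is idle only in the last cycle.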


Hybrid Circuit-Switched Router Design

[Figure: router block diagram; input ports Inj, N, S, E, W; five blocks labeled T; Allocators; Crossbar; output ports W, E, S, N, Ej]


HCS Pipeline

- Circuit-switched messages: 1 stage
- Packet-switched messages: 3 stages
  - Aggressive speculation reduces stages

[Figure: pipeline diagrams. Packet-switched: Buffer Write, Virtual Channel/Switch Allocation, Switch Traversal (router), then Link Traversal (link). Circuit-switched: Switch Traversal (router), then Link Traversal (link).]
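A back-of-the-envelope comparison of the two pipelines, as a sketch. The stage counts come from the slide; the one-cycle link traversal, the row-major node numbering, and the function names are assumptions, and contention and setup cost are ignored.

```python
CIRCUIT_ROUTER_STAGES = 1  # from the slide
PACKET_ROUTER_STAGES = 3   # from the slide
LINK_CYCLES = 1            # assumption: one cycle per link traversal


def mesh_hops(src, dst, width=4):
    """Hop count between two nodes of a width x width mesh (row-major ids assumed)."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def zero_load_latency(hops, router_stages, link_cycles=LINK_CYCLES):
    """Cycles to cross `hops` routers and links with no contention."""
    return hops * (router_stages + link_cycles)


hops = mesh_hops(0, 15)  # corner to corner in a 4x4 mesh
circuit_cycles = zero_load_latency(hops, CIRCUIT_ROUTER_STAGES)
packet_cycles = zero_load_latency(hops, PACKET_ROUTER_STAGES)
```

Under these assumptions a message on an established circuit spends two cycles per hop where a packet-switched message spends four.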


Sharing Characterization

Temporal sharing relationship: 67-76% of misses are serviced by one of the 2 cores most recently shared with

Commercial Workloads: SpecJBB, SpecWeb, TPC-H, TPC-W
Scientific Workloads: Barnes-Hut, Ocean, Radiosity, Raytrace


Directory Coherence

Directory (before):
Address  State      Sharers
A        Exclusive  2
B        Shared     1,2

Messages (cores 1 and 2):
1. Read A
2. Forward Read A
3. Data Response A

Directory (after):
Address  State      Sharers
A        Shared     1,2
B        Shared     1,2
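The three-message read on this slide can be traced with a short sketch. The dictionary layout, the function name, and the message endpoints (requester to directory, directory to owner, owner to requester) are assumptions used to illustrate the flow; the slide gives only the message names and the directory contents.

```python
def directory_read(directory, addr, requester):
    """Trace a read that the directory forwards to the current holder."""
    entry = directory[addr]
    owner = entry["sharers"][0]  # the core that currently holds the line
    messages = [
        (1, requester, "directory", f"Read {addr}"),
        (2, "directory", owner, f"Forward Read {addr}"),
        (3, owner, requester, f"Data Response {addr}"),
    ]
    entry["state"] = "Shared"
    entry["sharers"] = sorted(set(entry["sharers"]) | {requester})
    return messages


directory = {
    "A": {"state": "Exclusive", "sharers": [2]},
    "B": {"state": "Shared", "sharers": [1, 2]},
}
messages = directory_read(directory, "A", requester=1)
```

After the call, the entry for A matches the second table on the slide: Shared, with sharers 1 and 2.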


Coherence Protocol Co-Design

- Goal: Better exploit circuits through coherence protocol
- Modifications:
  - Allow a cache to send a request directly to another cache
  - Notify the directory in parallel
  - Prediction mechanism for pair-wise sharers
  - Directory is sole ordering point


Circuit-Switched Coherence Optimization

Directory (before):
Address  State      Sharers
A        Exclusive  2
B        Shared     1,2

Messages (cores 1 and 2):
1. Read A, with Update A sent in the same step
2. Data Response A
3. Ack A

Directory (after):
Address  State      Sharers
A        Shared     1,2
B        Shared     1,2
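A sketch comparing the step at which data arrives in the two flows. The step numbers and message names follow the slides; the endpoints of each message (in particular that the Ack travels from the directory to the requester) and the function names are assumptions, and mispredicted owners are not modeled.

```python
def baseline_read(addr, requester, owner):
    """Directory-forwarded read: three messages in series."""
    return [
        (1, requester, "directory", f"Read {addr}"),
        (2, "directory", owner, f"Forward Read {addr}"),
        (3, owner, requester, f"Data Response {addr}"),
    ]


def direct_read(addr, requester, predicted_owner):
    """Read sent straight to the predicted owner; directory notified in parallel."""
    return [
        (1, requester, predicted_owner, f"Read {addr}"),
        (1, requester, "directory", f"Update {addr}"),
        (2, predicted_owner, requester, f"Data Response {addr}"),
        (3, "directory", requester, f"Ack {addr}"),  # assumed endpoints
    ]


def data_arrival_step(messages):
    """Step number at which the data response is delivered."""
    return next(step for step, _, _, text in messages
                if text.startswith("Data Response"))


baseline_step = data_arrival_step(baseline_read("A", 1, 2))
direct_step = data_arrival_step(direct_read("A", 1, 2))
```

The data response moves from the third step to the second, while the directory still sees every request and so remains the ordering point.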


Region Prediction

- Each memory region spans 1KB
- Takes advantage of spatial and temporal sharing

Directory (before):
Address  State   Sharers
A[0]     Shared  2
A[1]     Shared  2

Region Table (before):
A  --
B  3

Messages (cores 1 and 2):
1. Miss A[0]
2. Forward Read A[0]
3. Data Response A[0]
4. Region A Update
5. Read A[1]

Region Table (after):
A  2
B  3

Directory (after):
Address  State   Sharers
A[0]     Shared  1,2
A[1]     Shared  2
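The region table can be sketched as a map from region number to the core that last supplied data for that region. The 1 KB region size is from the slide; the class name, the unbounded dictionary, and the use of byte addresses are assumptions for illustration.

```python
REGION_BYTES = 1024  # each memory region spans 1 KB (from the slide)


class RegionPredictor:
    """Toy region table: region number -> core predicted to hold the data."""

    def __init__(self):
        self.table = {}

    @staticmethod
    def region(address):
        return address // REGION_BYTES

    def predict(self, address):
        """Predicted sharer for this address, or None (go through the directory)."""
        return self.table.get(self.region(address))

    def update(self, address, supplier):
        """Record which core supplied data for this address's region."""
        self.table[self.region(address)] = supplier


predictor = RegionPredictor()
first = predictor.predict(0x1000)      # no entry yet: the miss goes to the directory
predictor.update(0x1000, supplier=2)   # core 2 answered; remember it for the region
second = predictor.predict(0x1040)     # another line in the same 1 KB region
other = predictor.predict(0x1400)      # a line in the next region
```

This mirrors the slide: after the miss on A[0] is answered by core 2, the later read of A[1] in the same region can be sent to core 2 directly.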


Simulation Methodology

- PHARMSim
  - Full-system multi-core simulator
- Detailed network level model
  - Cycle accurate router model
  - Flit-level contention modeled
- More results in paper


Simulation Workloads

Commercial
  SPECjbb              Java server workload, 24 warehouses, 200 requests
  SPECweb              Web server, 300 requests
  TPC-W                Web e-commerce, 40 transactions
  TPC-H                Decision support system

Scientific
  Barnes-Hut           8k particles, full run
  Ocean                514x514, parallel phase
  Radiosity            Parallel phase
  Raytrace             Car input, parallel phase

Synthetic
  Uniform Random       Destinations selected with uniform random distribution
  Permutation Traffic  Each node communicates with one other node (pair-wise)
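The two synthetic patterns can be sketched as destination functions for a 16-node network. The slide does not say whether a node may pick itself under uniform random traffic or which pairing the permutation uses; excluding the source and pairing node i with node i + 8 (mod 16) are assumptions chosen only to give a concrete example.

```python
import random

NUM_NODES = 16  # 4x4 mesh


def uniform_random_dest(src, rng):
    """Pick a destination uniformly among the other nodes (self excluded by assumption)."""
    dest = rng.randrange(NUM_NODES - 1)
    return dest if dest < src else dest + 1  # skip over src


def pairwise_dest(src):
    """Fixed partner for each node; this particular pairing is hypothetical."""
    return (src + NUM_NODES // 2) % NUM_NODES


rng = random.Random(0)  # seeded so runs are repeatable
random_dests = [uniform_random_dest(3, rng) for _ in range(1000)]
partners = [pairwise_dest(n) for n in range(NUM_NODES)]
```

Under the pairing used here every node's partner maps back to it, so traffic flows between eight fixed pairs, which is the case where pair-wise circuits are reused most.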


Simulation Configuration

Processors
  Cores                     16 in-order general purpose

Memory System
  L1 I/D Caches             32 KB, 2-way set associative, 1 cycle
  Private L2 Caches         512 KB, 4-way set associative, 6 cycles, 64 Byte lines
  Shared L3 Cache           16 MB (1 MB bank/tile), 4-way set associative, 12 cycles
  Main Memory Latency       100 cycles

Interconnect: 4x4 2-D Mesh
  Packet-switched baseline  Optimized 1-3 router stages, 4 virtual channels with 4 buffers each
  Hybrid Circuit Switching  1 router stage, 2 or 4 circuit planes


Network Results

Communication latency is key: shave off precious cycles in network latency


Flit breakdown

Reduce interconnect latency for a significant fraction of messages


HCS + Protocol Optimization

The improvement from HCS + Protocol Optimization is greater than the sum of the improvements from HCS and Protocol Optimization alone.

Protocol Optimization drives up circuit reuse, better utilizing HCS


Uniform Random Traffic

HCS successfully overcomes bandwidth limitations associated with Circuit Switching


Related Work

- Router optimizations
  - Express Virtual Channels [Kumar, ISCA 2007]
  - Single-cycle router [Mullins, ISCA 2004]
  - Many more…
- Hybrid Circuit-Switching
  - Wave-switching [Duato, ICPP 1996]
  - SoCBus [Wiklund, IPDPS 2003]
- Coherence Protocols
  - Significant research in removing overhead of indirection


Circuit-Switched Coherence Summary

- Replace packet-switched mesh with hybrid circuit-switched mesh
  - Interleave circuit and packet switched flits
- Reconfigurable circuits
  - Dedicated bandwidth for frequent pair-wise sharers
- Low latency and low power
  - Avoid switching/routing
- Devise novel coherence mechanisms to take advantage of benefits of circuit switching


Thank you

www.ece.wisc.edu/[email protected]


Circuit Setup

- Novel Setup Policy
  - Overlap circuit setup with first data transfer
  - Store circuit information at each router
- Reconfigure existing circuits if no unused links available
  - Allows piggy-backed request to always achieve low latency
- Multiple narrow circuit planes prevent frequent reconfiguration
- Reconfiguration
  - Buffered, traverses packet-switched pipeline
