hybrid on-chip data networks

49
Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University

Upload: others

Post on 12-Sep-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hybrid On-chip Data Networks

Hybrid On-chip Data Networks

Gilbert HendryKeren Bergman

Lightwave Research Lab

Columbia University

Page 2: Hybrid On-chip Data Networks

2

Chip-Scale Interconnection Networks

Intel Polaris IBM Cell AMD Opteron

• Chip multi-processors create need for high performance interconnects

• Performance bottleneck of on-chip networks and I/O• Power dissipation constraints of the chip package

• > 50% of total power comes from interconnects*

* N. Magen et al., “Interconnect-power dissipation in a microprocessor,” SLIP 2004.

Page 3: Hybrid On-chip Data Networks

3

Motivation

• CMPs of the future = 3D stacking• Lots of data on chip• Photonics offers

key advantages

Page 4: Hybrid On-chip Data Networks

4

Why Photonics?

TX RX

ELECTRONICS: Buffer, receive and re-transmit at

every router.

Each bus lane routed independently. (P ∝ NLANES)

Off-chip BW is pin-limited and power hungry.

Photonics changes the rules for Bandwidth, Energy, and Distance.

OPTICS: Modulate/receive high bandwidth

data stream once per communication event.

Broadband switch routes entire multi-wavelength stream.

Off-chip BW = On-chip BW for nearly same power.

RX

TX

RX RX

TX

RX

TX RXTX TX TXTXTX TX

RX

Page 5: Hybrid On-chip Data Networks

5

Hybrid Network Premise

Optical processing difficult and limited

Source, destination routing inefficient

Use electronics for routing, optics for switching and transmission

Hybrid Circuit-Switching

Page 6: Hybrid On-chip Data Networks

6

Hybrid Circuit-Switched Networks

Step 1: Path SETUP request

Electronic SETUP Msg

Source coreDestination Core

Page 7: Hybrid On-chip Data Networks

7

Hybrid Circuit-Switched Networks

Step 2: Path ACK

Electronic ACK Msg

Page 8: Hybrid On-chip Data Networks

8

Hybrid Circuit-Switched Networks

Step 3: Transmit Data

Photonic Switch Use Information

Page 9: Hybrid On-chip Data Networks

9

Hybrid Circuit-Switched Networks

Meanwhile: Path Contention

Path BLOCKED Msg(Backoff)

Page 10: Hybrid On-chip Data Networks

10

Hybrid Circuit-Switched Networks

Step 4: Path TEARDOWN

Electronic TEARDOWN Msg

Source coreDestination Core

Page 11: Hybrid On-chip Data Networks

11

Hybrid Circuit-Switched Networks

• Energy-efficient end-to-end transmission

• High bandwidth through WDM

• Electronic network still available for small control messages*

• Network-level support for secure regions

• Path setup latency• Path setup contention

(no fairness)

Pros: Cons:

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]

Page 12: Hybrid On-chip Data Networks

Programming and Communication

Page 13: Hybrid On-chip Data Networks

13

Shared Memory

Implicit Communication

Explicit Communication

scaling

“… [ OpenMPon large systems] often performs worse than message passing due to a combination of false sharing, coherence traffic, contention, and system issues that arise from the difference in scheduling and network interface moderation”

~ ExascaleReport

Page 14: Hybrid On-chip Data Networks

14

Partitioned Global Address Space

Implicit Communication

Explicit Communication

[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

Access Method

Local Read Optical Receive

Local Write Optical send

Remote Read Electronic request, optical receive

Remote Write Optical send

Shared R/W ?

Page 15: Hybrid On-chip Data Networks

15

Message Passing

Implicit Communication

Explicit Communication

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]

• Complex, dynamic access patterns

• Relatively larger blocks of data• Scientific computing

Page 16: Hybrid On-chip Data Networks

16

Streaming

Implicit Communication

Explicit Communication

1

2

3

4

Input DataOutput Data

Persistent optical circuits

• Embedded / specialized systems (Graphics, Image + Signal Proc.)• Execution mode of general-purpose systems (Cell Processor)

Page 17: Hybrid On-chip Data Networks

Electronic Plane

Page 18: Hybrid On-chip Data Networks

18

Electronic Router

Arbiter

Control Router

Data Switch

Buffer Crossbar

Buffer Cntrl

Data Path

XbarCntrlRequest Bus

Flow Control

XbarAllocation

Data Switch Allocation

Routing Logic

Credits In

XbarCntrl

Ring Cntrl

Ring Cntrl

• Low frequency operation (~ 1GHz)• 1 VC (typically)• Small buffers (64-28)• Narrow Channels (8-32)

Page 19: Hybrid On-chip Data Networks

19

Network Gateway

Core

Core

Core

Core

Tx/Rx

Net

wor

k IF

Bidirectional Waveguide

Bidirectional Electronic Channel

Control RouterElectronic Crossbar

5-port photonic switch

To/From Control plane

To/From Data plane

Se

rializ

atio

n

Driv

ers

Des

eria

lizat

ion

Re

ceiv

ers

[P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]

External Concentration

Page 20: Hybrid On-chip Data Networks

The Photonic Plane

Page 21: Hybrid On-chip Data Networks

21

Wavelength Division Multiplexing

λ

waveguide

Page 22: Hybrid On-chip Data Networks

22

Silicon Photonic Waveguide Technology

[Vlasov and McNab, Optics Express 12 (8) 1622 (2004)]

C23(1559 nm)

C28

(1555 nm)C46

(1541 nm)C51

(1537 nm)

before injection into waveguide

after 5-cm waveguide and EDFA

[B. G. Lee et al., Photon. Technol. Lett. 20 (10) 767 (2008)]

1.28 Tb/s Data Transmission Experiment(occupies small slice of available WG BW)

100 ps

Silicon photonic waveguides provide low-power optical interconnects in CMOS-compatible platform.

Low-loss (1.7 dB/cm), high-bandwidth (> 200 nm) silicon photonic waveguides can be fabricated in commercial CMOS process.

Page 23: Hybrid On-chip Data Networks

23

Ring Resonator Operation

λ

λ

modulator/filter

Broadband spatial switch

Page 24: Hybrid On-chip Data Networks

24

Silicon Photonic Modulator and Detector Technology

[M Watts, Group Four Photonics (2008)]

[M Lipson, Optics Express (2007)]

85 fJ/bit demonstrated at 10 Gb/s Scalable to < 25 fJ/bit

18 Gb/s demonstrated

[S Koester, J. Lightw. Technol. (2007)]

Ge-on-Si Detectors: 40-GHz bandwidths 1 A/W responsivities

Receivers (detectors w/ CMOS amplifiers): 1.1 pJ/bit demonstrated at 10 Gb/s Scalable to < 50 fJ/bit

(CW) LASER

modulator detector

Page 25: Hybrid On-chip Data Networks

25

Higher Order Switch Designs

[A. Biberman, IEEE Phot. Tech. Letters (2010)]

Page 26: Hybrid On-chip Data Networks

26

On-Chip Topology Exploration

• Photonic Torus • Nonblocking Photonic Torus

[A. Shacham et al., Trans. on Comput., 2008] [M. Petracca et al. IEEE Micro, 2008]

Page 27: Hybrid On-chip Data Networks

27

On-Chip Topology Exploration

• TorusNX • Square Root

[J. Chan et al. JLT, May 2010]

Page 28: Hybrid On-chip Data Networks

28

Photonic Plane Characteristics

• Insertion Loss• Noise• Power

Page 29: Hybrid On-chip Data Networks

29

Insertion Loss and Optical Power Budget

Nonlinear Effects

WDM FactorO

pti

cal

P

ow

er B

ud

get

Worst-case Insertion Loss

Detector Sensitivity

Page 30: Hybrid On-chip Data Networks

30

Insertion Loss vs. Bandwidth

Network Size

Num

ber

of λ

Topologies

Page 31: Hybrid On-chip Data Networks

31

Simulation Results

4×46×6

8×810×10

12×12

14×14

16×16

18×18

0

10

20

30

40

50

Inse

rtio

n Lo

ss (

dB)

Topology Size (nodes)

Torus Topology

20.625.6

31.237.0

42.848.6

54.560.3

4×46×6

8×810×10

12×12

14×14

16×16

0

10

20

30

40

50

Inse

rtio

n Lo

ss (

dB)

Topology Size (nodes)

Non-BlockingTorus Topology

18.7 25.331.5

38.044.1

50.656.8

18×18

63.2

4×46×6

8×810×10

12×12

14×14

16×16

18×18

0

10

20

30

40

50

Inse

rtio

n Lo

ss (

dB)

Topology Size (nodes)

TorusNX Topology

15.819.5

23.227.1

31.034.9

38.842.7

4×48×8

16×16

0

10

20

30

40

50

Inse

rtio

n Lo

ss (

dB) Square Root Topology

12.221.5

30.6

Propagation Crossing Dropping Into a Ring

0 2 4 6 8Topology Size (nodes)

Page 32: Hybrid On-chip Data Networks

32

Simulation Results

0 100 200 3001

10

100

Num

ber

ofW

avel

engt

h C

hann

els

Number of Access Points

Torus Topology

100

Non-Blocking Torus Topology

10 20 301

10

Num

ber

ofW

avel

engt

h C

hann

els

Number of Access Points

20 dB30 dB40 dB

Improved

20 dB30 dB40 dB

Original

TorusNX Topology

0 100 200 3001

10

100

Num

ber

ofW

avel

engt

h C

hann

els

Number of Access Points

Square Root Topology

0 100 200 3001

10

100

Num

ber

ofW

avel

engt

h C

hann

els

Number of Access Points

20 dB30 dB40 dB

Improved

20 dB30 dB40 dB

Original

Original is based on the IL results from previous slide, Improved is based on a hypothetical improvement in crossing loss from 0.15 dB to 0.05 dB.

Optical power budget

Optical power budget

Page 33: Hybrid On-chip Data Networks

33

Photonic Plane Characteristics

• Insertion Loss• Noise• Power

Page 34: Hybrid On-chip Data Networks

34

Noise and Crosstalk

Laser Noise

Inter-Message Crosstalk

Intra-Message Crosstalk

Modulation Noise

Crosstalk

Filter

Coherent noise

Incoherent noise

Page 35: Hybrid On-chip Data Networks

35

Effects of Noise

Network Size

Op

tical

SN

R

Number of λ Network Load

Page 36: Hybrid On-chip Data Networks

36

Simulation Results

0

10

20

30

40

50

Opt

ical

SN

R(d

B)

100 101 102 103 104 105 106 107

Message Size (bit)

TorusNon-blocking TorusTorusNXSquare Root

The line at OSNR=16.9 dB is where a bit-error-rate of 10-12 can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling.

Results •Results are plotted for network size of 8×8 at saturation, at the detectors.• Maximum OSNR = ~45 dB (due to laser noise)• Minimum OSNR < 17 dB (due to message-to-message crosstalk)• Variations between networks due to varying likelihood of two message intersecting on network topology.

System Performance• SNR measures the likelihood of error-free transmission.• Lower SNR designs will require additional retransmission, resulting in lower throughput performance.

Page 37: Hybrid On-chip Data Networks

37

Photonic Plane Characteristics

• Insertion Loss• Noise• Power

Page 38: Hybrid On-chip Data Networks

38

Power Usage

0V1V

n-regionp-regionElectronic Control

0V

1V

OhmicHeater

Thermal Control

λ

Tran

sm

issio

n

Injected Wavelengths

Off-resonance profile

On-resonance profile

• Laser Power• Active Power

• Modulating• Detecting• Broadband

• Static Power• Thermal tuning

• Tx\Rx Power• Drivers• TIAs

Page 39: Hybrid On-chip Data Networks

39

Energy Per Bit

10-13

10-12

10-11

10-10

10-9

10-8E

nerg

ype

rB

it(J

/bit)

10-7

100 101 102 103 104 105 106 107

Message Size (bit)

TorusNon-blocking TorusTorusNXSquare Root

Page 40: Hybrid On-chip Data Networks

40

Power Breakdown

Router Logic43%

Router Buffer44%

Electronic Wire3%

Detector3%

Modulator4%

PSE2%

Thermal1%

Router Logic45%

Router Buffer44%

Electronic Wire2%

Detector2%

Modulator4%

PSE2%

Thermal1%

• Results based on randomly generated traffic with message sizes of 100 kbit, with network in saturation.• Data was collected on 64 nodes topologies constrained to a total surface area of 2 cm × 2 cm.

Torus Topology Nonblocking Torus Topology

• 7 wavelengths @ 10 Gbps/each• Power Dissipation = 1.59 W

• 12 wavelengths @ 10 Gbps/each• Power Dissipation = 4.31 W

Page 41: Hybrid On-chip Data Networks

41

Power Breakdown

Router Logic37%

Router Buffer31%

Electronic Wire1%

Detector10%

Modulator17%

PSE1%

Thermal3%

Router Logic34%

Router Buffer31%

Electronic Wire7%

Detector8%

Modulator14%

PSE2%

Thermal4%

Square Root Topology TorusNX Topology

• 38 wavelengths @ 10 Gbps/each• Power Dissipation = 3.22 W

• 27 wavelengths @ 10 Gbps/each• Power Dissipation = 1.89 W

Page 42: Hybrid On-chip Data Networks

Performance

Page 43: Hybrid On-chip Data Networks

43

Performance

• Uniform random traffic• 256 cores, 64-node network

Page 44: Hybrid On-chip Data Networks

44

Scientific Applications0.00001

0.0001

0.001

0.01

Cactus GTC MADbench PARATEC

Exe

cutio

n T

ime

(s) E-Mesh P-Mesh

0.00001

0.0001

0.001

0.01

0.1

Cactus GTC MADbench PARATEC

Ene

rgy

(J)

E-Mesh P-Mesh

Page 45: Hybrid On-chip Data Networks

Other Interesting Issues

Page 46: Hybrid On-chip Data Networks

46

Memory Access

Processor Core

Network Router

Memory Access Point

[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

Page 47: Hybrid On-chip Data Networks

47

Other Arbitration Means - TDM

[G. Hendry et al. Silicon Nanophotonic Network-On-Chip Using TDM Arbitration. In HOTI, Aug. 2010]

Page 48: Hybrid On-chip Data Networks

48

Wavelength Granularity

• Original Re-design

λ λ

Scalable number of WDM channels

Page 49: Hybrid On-chip Data Networks

49

Conclusion

• Some applications / programming models definitely well-suited to a circuit-switched photonic network

• Interesting tradeoffs and design space• Photonic physical layout / design• System-level benefits from device improvement• Network-level improvements